Comments on OCEANOGRAPHIC CONVENTIONS for NETCDF, Draft no.1 - 10/7/92

> This document presumes the contents of "conventions.info"
> (unidata.ucar.edu anonymous ftp directory pub/netcdf) and will
> not duplicate what is already described there.  As both
> conventions.info and profile.oceanography will be evolving in
> parallel we will need to coordinate the documents throughout
> their evolutions.

A few comments about the "conventions.info" file.  The last time I
looked at conventions.info (3 weeks ago?) it was in rather bad shape.

  + Some variables occur several times, e.g., lat, lon.
  + Many variable names are too short and/or ambiguous.  Here are
    some examples...

        float Tmin      minimum temperature
        float Tmax      maximum temperature
        long  meana     mean anomaly
        float speed     movement speed associated with an
                        echo-object (MDR)

    o "Tmin" and "Tmax" are way too ambiguous.  T doesn't
      necessarily mean temperature, and there is no room for having
      multiple temperature sources and types (i.e., water, air,
      wading pool).
    o "meana" is wide open.  What kind of anomaly?  Gravity?  Food
      quality?  Hem lines?
    o Using a name as general as "speed" for a specific thing like
      MDR is also a no-no.  No room for storing speed of moving
      platforms, etc.

See further comments about variable name limitations below.

> Guidelines for Creating Profiles.
>   o keep it simple (avoid proliferation of attributes)

Yes, yes, yes!  The more complex it is, the fewer people will use it.
Why convert to a confusing new format when your old one works fine?

> Guidelines for Robust Applications.
>   o Uninterpretable attributes should be ignored
>   o Variables with unsupported data types should be ignored
>   o Applications that require recognized variable names should
>     ignore variable names they do not recognize

In my mind, you are trying to put too much functionality and brains
into the application.  Programming an application that can "figure
out" a file is bordering on AI and expert systems.  I think what you
fail to address is that at some point in a file's processing life, a
person will look at it and determine what variables to use.  For
example, if a file had variables lat, lon, and depth, AS WELL AS x,
y, and z, what should the application do?  Obviously a user has to
say "use lat, lon, & depth", or "use x, y, and depth", or "lat, x,
and z".  If an app can't recognize a variable, or determine the
structure, it should present the user with a list of information from
the file and ask, "What do YOU think?"  Also, an app that draws a
conclusion about how to interpret a file should be overridable by the
user.  Imagine having a "smart" app that's not smart enough, or too
smart for its own good.

> Oceanographic Profile Issues.
> ...
>     variables:
>         double time(nobs);
>         time:units = "milliseconds since (1992-9-16 10:09:55.3 -600)"
>     (This will be implemented shortly in the udunits library.)

Stick with this and let udunits define the format as much as possible.

> 2) How to determine the orientation of a coordinate variable
>    Alternative 1:
>    Minimal restrictions on the naming of coordinate variables
>    and choice of units.  Applications should apply a
>    multi-step algorithm to identify orientation as follows:
>      First - check the units of the coordinate variable:
>        Do the units imply a unique orientation (e.g. units of
>        time, "degrees longitude", "layer", etc.)?
>      If no, then check the name of the coordinate variable:
>        Does the variable name match a template (e.g. *depth*,
>        *lon*, *lat*, *time*, x*, y*, z*, t*, etc.)?
>    Is this approach too complex?

No.
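For the easy case, here is a minimal CDL sketch (the dimension sizes,
variable names, and unit strings are just made-up examples following
the templates quoted above) in which the units and names settle the
orientation question on the first or second check:

    dimensions:
            lat = 90;
            lon = 180;
            depth = 20;
            time = unlimited;
    variables:
            float lat(lat);
                    lat:units = "degrees latitude";   // units imply orientation
            float lon(lon);
                    lon:units = "degrees longitude";  // ditto
            float depth(depth);
                    depth:units = "meters";           // units alone are ambiguous,
                                                      // but the name matches *depth*
            double time(time);
                    time:units = "seconds since (1992-10-7 0:0:0)";

An app walking through the checks above never has to guess here; it's
the files that flunk both checks that need a human.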
> What about cases where the orientation remains ambiguous?

Have the app ask the user what they think.

>    Alternative 2:
>    Introduce a variable attribute 'orientation' with a
>    suitable naming convention for orientation strings (e.g.
>    "west-east", "south-north")
>    Should this be an optional attribute that can be applied
>    when the Alternative 1 technique fails?

Getting too complex!  Remember "Keep It Simple".

> 4) Case-insensitive Names
>    Should application programs be case-sensitive with respect
>    to attribute and variable names?
>    Alternative 1: Case-insensitive.

Easier to use.  You don't mess up because you queried on "lat",
"LAT", and "Lat", but for some reason the file creator liked "lAT".
Stranger things can happen.

>    Alternative 2: Case-sensitive.
>    There are conveniences to the use of e.g. "time",
>    "Time", and "TIME" within the same file.

I think putting two variables in a file, one named "time" and the
other "TIME", is bad design.  It's really, really ambiguous, fuzzy,
and bad.  Creating a readable CDL file is like writing readable C or
FORTRAN code.  Who likes FORTRAN with all capital letters and no
spaces between reserved words?  It's HARDTOREADANDMAKEANYSENSEOFIT.

> 6) Need a global attribute to indicate profile type and revision

Alternative 1 looks good.

> 7) Standardized (Conventional) Variable Names
>    The meteorological community has suggested a list of
>    standardized variable names (see conventions.info).

"conventions.info", as I said above, is too messy for me to base any
kind of standard on.

>    Should this list be extended to include additional oceanographic
>    variables?

It must be modified before OCE data can be properly included.

>    How should these names fit into the
>    framework of "resources" as described in conventions.info?
>    (We need input from folks familiar with "resources" in this
>    context.)

I would suggest both standardized names and the use of a
"configuration" file that helps one read a file.  The configuration
file could be used to specify the variables to be used in a file.
For example, assume a ship data file like this...

    dimensions:
            time = unlimited;
    variables:
            float GPS_lat (time);           // position from GPS
            float GPS_lon (time);
            float SatNav_lat (time);        // position from Magnavox Sat Nav
            float SatNav_lon (time);
            float LORAN_lat (time);         // position from Loran-C
            float LORAN_lon (time);
            float GPS_time (time);          // GPS clock time at fix
            float SatNav_time (time);       // SatNav clock time at fix
            float LORAN_time (time);        // LORAN clock time at fix
            float PC_time (time);           // PC acquisition time at fix
            float SBE_Sea_Surface_Temperature1 (time);  // SeaBird SST #1
            float SBE_Sea_Surface_Temperature2 (time);  // SeaBird SST #2
            float SBE_Sea_Conductivity1 (time);         // SeaBird Conductivity #1
            float SBE_Sea_Conductivity2 (time);         // SeaBird Conductivity #2
            float Salinity1 (time);         // T1 and C1
            float Salinity2 (time);         // T1 and C2
            float Salinity3 (time);         // T2 and C1
            float Salinity4 (time);         // T2 and C2

Writing an app that could interpret this and handle a request like
"plot time vs. salinity", or "plot the ship's track", would be
impossible without user intervention.  In lieu of having the app flat
out ask the user something like...

    Of these latitudes...
        1 = GPS_lat
        2 = SatNav_lat
        3 = LORAN_lat
    Which do you want to use (1-3) ?

for every ambiguous thing (of which there are many), I would suggest
using a configuration file that might look something like this...
    TIME                     = PC_time
    LAT                      = GPS_lat
    LON                      = GPS_lon
    SEA_SURFACE_TEMPERATURE  = SBE_Sea_Surface_Temperature1
    SEA_CONDUCTIVITY         = SBE_Sea_Conductivity2
    SALINITY                 = Salinity2

It's a thought.

> 8) Name String Lengths
>    Should attribute and variable names be further restricted
>    with respect to length beyond the limit of `MAX_NC_NAME'
>    described in conventions.info?

No, no, no!  Don't limit names!!!  Just look at some of the names in
the "conventions.info" file and you can see what happens when you
scrimp on name lengths!  For example, why use "Tmin" for minimum
temperature when you could go with "temperature_min" or
"minimum_temperature"?  Another bad example is "SST", which should be
"Sea_Surface_Temperature".  Two more examples are "DIR" (wind
direction) and "SPD" (wind speed).  Names like "wind_direction" and
"wind_speed" would be much better, and not as ambiguous.  Suppose I
want to store winds from a moving platform.  I would use...

    variables:
            float platform_speed (time);    // platform info
            float platform_heading (time);
            float true_wind_speed (time);   // corrected for platform motion
            float true_wind_heading (time); //    "      "     "       "
            float raw_wind_speed (time);    // not corrected
            float raw_wind_heading (time);  // not corrected

Get wordy!  Get descriptive!  Disk space is cheap!

> 10) Requiring non-coordinate variables to be 4 dimensional
>     Is it acceptable to insist that all non-coordinate variables
>     be represented as 4-dimensional (lat/long/depth/time)
>     structures?  Should there be other restrictions on number of
>     axes?
>
>     Alternative 1: dimensionality should not be restricted to
>     exactly 4 - the restriction would preclude some data types
>     and would force misrepresentation of others.  Some
>     restriction on the maximum number of dimensions for a
>     variable would, however, ease the burden on application
>     writing.
>
> 11) Mandatory ordering of geographical dimensions
>     Is it acceptable to mandate that if dimensions with
>     geographical significance are used in defining a variable
>     they will be ordered as lat-lon-depth-time (i.e. time as
>     the slowest moving axis)?
>
>     Alternative 1: yes with reservations - are there serious
>     performance penalties?
>
>     Alternative 2: no - applications require greater
>     flexibility than this.  Perhaps a standard ordering could
>     be defined and an attribute introduced that would indicate
>     permutations.  Example:
>
>         var:permutation = "TXYZ";

It took me a while, but I think what you are presuming here is that
when you make a data request, you will be asking "What was a
particular data value at this given position (X, Y, & Z) and time
(T)?" rather than "What was the data value at a time (T), and what
was the platform's position (X, Y, & Z) at this particular time
(T)?"  As a collector of raw data from a ship, I think in the second
case.  The first case just doesn't exist on a single, moving
platform.  It exists when you have created a model, or a grid, or
have a large number of sensors in an array.
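For what it's worth, here is a minimal CDL sketch of how the
Alternative 2 permutation attribute might look in context.  The
dimension and variable names are hypothetical, and I'm assuming that
T/X/Y/Z stand for time/longitude/latitude/depth and that the string
lists the storage order from slowest to fastest -- the convention
would have to pin that down:

    dimensions:
            time = unlimited;
            depth = 20;
            lat = 90;
            lon = 180;
    variables:
            // a gridded field stored with time as the record (slowest)
            // axis instead of the standard lat-lon-depth-time ordering
            float current_u(time, depth, lat, lon);
                    current_u:permutation = "TZYX";  // hypothetical string;
                                                     // the draft's example
                                                     // was "TXYZ"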
> 16) Vertical axis orientation
>     Often oceanographic data is organized with positive down on
>     vertical axes.  What is the best mechanism to indicate this
>     in a netCDF file?  (A similar question arises on latitude
>     axes which may be south-positive or north-positive.)

In the last 10 years of processing, I have never seen anything in my
corner of the field where + latitude meant anything but north, and -
meant south.  Same for east and west.  I would suggest that anyone
with data where "- is north" just re-process their data before trying
to warp an app to handle this case.

>     Alternative 1: Introduce a (boolean) coordinate variable
>     attribute "reversed".
>
>     Alternative 2: Combine this property together with others
>     that have been discussed in a new attribute
>
>         depth:properties = "reversed, coordinates, vertical";

The Z direction (+ is up vs. + is down) does seem to flop around a
bit.  How about an attribute such as...

    z:up = "+";        // or "up" or "positive"

or

    z:down = "-";      // or "down" or "negative"

> 17) Longitude axis encodings
>     Longitude encodings are not standardized - they may be
>     continuous across the dateline or continuous across the
>     prime meridian; either westward or eastward may be positive;
>     the range may be -180 to 180 or 0 to 360 or some other
>     choice.  How should netCDF convey this encoding?

One should distinguish between using longitude values as parts of
queries, and longitudes as stored values.  If you want to query on a
particular value, then true, -170 needs to be translated to +190, and
+190 to -170 (or whatever).  If you want to store a position (like a
drifter position), then it really doesn't matter.  Any program worth
its (sea) salt can handle -180 to 180 and 0 to 360 when extracting
lat/lon data for plotting or processing.

Kind of a comment on this...

> 19) Huge Data Sets / Multiple Files
>     Should we provide a standardized mechanism for associating
>     multiple files in a single "project"?  How should it
>     function?  As a time axis distributed among files?  As
>     multiple variables distributed among files?  Is this beyond
>     the scope of this document?
>
>     Alternative 1: a "parent" netCDF file with variables and
>     attributes suitably defined to point to "child" files.
>
>     Alternative 2: a file naming convention such as
>         my_cdf.001, my_cdf.002, my_cdf.003, ...
>     that will implicitly concatenate netCDF files along their
>     record (or time?) axis.

How about having some global variables like this...

    variables:
            char file_sequence;     // place-holder to carry the sequence
                    file_sequence:first = "W9205a_001.cdf";
                    file_sequence:last  = "W9205a_023.cdf";
                    file_sequence:next  = "W9205a_013.cdf";
                    file_sequence:prev  = "W9205a_011.cdf";

Obviously, you might not be able to put all the names in right away,
as the sequence may not be known until processing and acquisition are
completed.

A comment on the naming convention idea.  I would point out the
"badness" of naming netCDF files with any extension but ".cdf".  It's
much easier to say...

    % ls *.cdf

which would give you all the cdf files, than...

    % ls *cdf*

which would give you your cdf files, as well as anything else that
has "cdf" in its name, like your processing programs, subdirectories,
etc.

> 20) Representing Sigma Coordinate Systems

Could someone explain sigma coordinate systems to me?  I plead
ignorance.  Go ahead, laugh at me and drop me down a notch in your
mind.

**************************

> Real-Time and Shipboard data collection?

Ah, I've been waiting for this...

> What are the special issues?

I think it's important to note that raw cruise data and processed,
gridded models and the like are very different animals.  In my
raw/real-time/shipboard myopic view, I would say a file containing a
model or grid will be used for queries like "What is the current
vector at lat=45.36, lon=-126.97, depth=50.0, and time=325.36?",
whereas a file containing cruise data will be used for questions like
"What was the current vector, lat, and lon for the sample at
depth=50.0 and time=325.36?"  Very different questions.
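To make the contrast concrete, here is a rough CDL sketch of the two
kinds of files (all names and sizes are hypothetical); the shipboard
half is essentially the cruise-track layout discussed below:

    // --- file 1: a gridded model.  Position and time are independent
    // axes, so "value at (lat, lon, depth, time)" is a direct index.
    dimensions:
            time = unlimited;
            depth = 30;
            lat = 90;
            lon = 180;
    variables:
            float current_u(time, depth, lat, lon);
            float current_v(time, depth, lat, lon);

    // --- file 2: raw cruise data.  Everything, position included,
    // hangs off the one record axis, time: you look up the sample,
    // then read off where the ship was.
    dimensions:
            time = unlimited;
    variables:
            float lat(time);
            float lon(time);
            float depth(time);
            float current_u(time);
            float current_v(time);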
One could think of ship data as being 4-dimensional, and you could
make 4-d queries ("What is the wind speed at a given lat, lon, depth
(assume 0), and time?"), but the search would be horrendous -- check
through EVERY sample in the data file, looking for a match on lat,
lon, depth, and time.  And that's assuming the ship happened to go
over that particular point at that particular time.  I guess I'm
saying that you can think of ship data in a 4-d sense, and make 4-d
queries, but it's not really practical.  Mostly, I would say the data
is tied to a "1-d" coordinate system -- time.  Nearly all queries are
referenced to a given time: "Where were we at time T?  What was the
temperature at time T?  What was the towed vehicle's depth at time T?
What was the ship's speed at time T?"

> How to represent a cruise track?  (** a requirement? **)

I would assume that instead of creating variables like this...

    dimensions:
            time = unlimited;
    variables:
            float time(time);
            float lat(lat);
            float lon(lon);

a cruise track would be...

    dimensions:
            time = unlimited;
    variables:
            float time(time);
            float lat(time);
            float lon(time);

Many of the questions and rules outlined above dealing with
determining variable names and contents also apply.  Looking for
"*lat*" or whatnot works fine.

As to "** a requirement? **" -- egads, YES!  Where do you think
processed data comes from anyway?  It comes from raw data!  The only
way you get a set of ADCP data to build a large-scale 3-d grid of
currents is by going out in a ship, cruising around, and collecting
data!  The only way you know what the water depth is at a given point
is by going out there and "pinging" the bottom!

> Other Topic Issues (relating to Shipboard Data)

One point that I would stress for collecting raw data is that a file
should contain the true raw data as well as the calculated and
computed values.  It is a very "dangerous" step for us data
collectors to store only calculated engineering-unit values and not
the raw data they came from.  For example, looking at winds in OSU's
XMIDAS netCDF file, we store the following values...

    ship's speed (from 3 nav sources and speed log), in knots
    ship's heading (from 3 nav sources and gyro), in deg
    raw voltage values from the wind vane for wind speed (0-5 volts)
    raw voltage values from the wind vane for wind heading (0-5 volts)
    uncorrected wind speed (in knots)
    uncorrected wind heading (in deg)
    corrected (for ship's speed and hdg, using speed log) wind speed (in knots)
    corrected (for ship's speed and hdg, using gyro) wind heading (in deg)

This gives the PI that uses the data great leeway to work with it.
If the gyro or the speed log wigs out at some point, (s)he is free to
re-calculate the values from GPS or whatever.  If the calibration
coefficients had changed, or some drift occurred in the wind vane
over the period of a cruise, the uncorrected and corrected winds
could be re-computed from the raw voltage values with new calibration
coefficients.  Also, if someone messed up the program that did the
calculations and mistyped a coefficient, that could be disastrous.
Imagine having someone do a whole study based on the calculated
values one had delivered, only to discover that a mistake had been
made and all temperature values were 2 deg C too high, and thus there
IS no resumption of El Nino (extreme, I'll admit).  Our data files
are very large, as we store all kinds of data -- raw and calculated.
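In CDL, that raw-plus-derived layout for the wind data might look
something like the following sketch (the variable names and units
here are made up for illustration, not the actual XMIDAS names):

    dimensions:
            time = unlimited;
    variables:
            // raw sensor output, exactly as sampled
            float wind_speed_voltage(time);
                    wind_speed_voltage:units = "volts";   // 0-5 V from the vane
            float wind_heading_voltage(time);
                    wind_heading_voltage:units = "volts";
            // engineering units, no correction for ship motion
            float uncorrected_wind_speed(time);
                    uncorrected_wind_speed:units = "knots";
            float uncorrected_wind_heading(time);
                    uncorrected_wind_heading:units = "degrees";
            // corrected for ship's speed and heading
            float corrected_wind_speed(time);
                    corrected_wind_speed:units = "knots";
            float corrected_wind_heading(time);
                    corrected_wind_heading:units = "degrees";

Keep the voltages around and the PI can always re-derive the rest.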
We include the raw text strings from the GPS, SatNav (complete with
the word "MAGNAVOX", for you SatNav'ers), and Loran, as well as
floats that hold a subset of the info in the strings.  If you really
want to know exactly which satellites were used for each GPS fix,
it's there if you want to go for it.  We also store raw frequency
values and voltages for other instruments, along with the number of
samples taken over the sample period (1 minute), and the min, max,
and mean values observed over the sample period.  I guess we believe
that "disk is cheap" and "data is precious" to us.  Also, our
"disclaimer" is that our NSF mandate is to collect raw data, not to
process it.  That's the PI's job (in our case).

> Compressed data

I would die for compression, especially if it were automatic,
happened inside the netCDF routines, and I never knew it was there.

=======================================================================
Tim Holt / Marine Technician / R/V Wecoma
College of Oceanography / Oregon State
Corvallis, OR USA, 97331-5503    (503)737-4447
holtt@xxxxxxxxxxxx