Hi Dave,

> May I request that Netcdf names beginning with a digit be handled as
> a special exception, requiring some kind of permission mode to be
> set before the API will allow these variables to be created? In
> this way the default behavior for most users would be the same as in
> previous versions.
>
> I think it is generally desirable to have Netcdf names follow the
> same rigor and restrictions as variable names in common programming
> languages. Indeed Fortran, NCL, C/C++, Grads, and one or more
> Netcdf conventions all require the first character alphabetic or
> sometimes underscore. The original authors of the Netcdf specs had
> something like this in mind when they formalized this restriction.

You're right that our intent was to provide easy mappings between netCDF variables and variables in programming languages. However, there are also good reasons to support use of names such as "4LFTX" for a 4-layer lifted index and "5WAVH" for a 5-wave potential height. These come directly from a well-known table of model output parameter names, and they have apparently been in use for some time in netCDF archives, although we didn't realize this.

After the recent release that returned an error when asked to create such names, we also heard from another user who wrote:

  ... This breaks our application, which uses variable names that are completely numeric. This is partly for historical reasons, partly because numbers aren't language-specific (they are mapped to a language-specific string when they are displayed), and partly because it allows us to create more performant and space-efficient indices using integers (we have a lot of index information!).

The C-based netCDF libraries have always supported reading netCDF data with arbitrary names. On defining names in a new dataset, the libraries have enforced restrictions on names. The absence of a check on creating names beginning with a number ("numeric names" for short) was inadvertent.
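
To make the distinction concrete, here is a minimal Python sketch that classifies a name by its first character and applies an opt-in switch of the kind requested above. The function names and the allow_numeric flag are hypothetical, invented for illustration; this is not the netCDF library's API, and it does not reproduce the library's actual name validation.

```python
import string

# Assumption for this sketch: a "classic" name starts with an ASCII
# letter or underscore, and a "numeric name" starts with an ASCII digit.
_CLASSIC_FIRST = set(string.ascii_letters + "_")
_DIGIT_FIRST = set(string.digits)


def classify_name(name):
    # Look only at the first character; the rest of the name is ignored.
    if not name:
        raise ValueError("empty name")
    first = name[0]
    if first in _CLASSIC_FIRST:
        return "classic"
    if first in _DIGIT_FIRST:
        return "numeric"
    return "other"


def creation_allowed(name, allow_numeric=False):
    # Hypothetical creation-time check: numeric names pass only when the
    # caller opts in. Anything else is rejected here purely to keep the
    # sketch small; the real rules are broader.
    kind = classify_name(name)
    if kind == "classic":
        return True
    if kind == "numeric":
        return allow_numeric
    return False


if __name__ == "__main__":
    for candidate in ("LFTX", "_tmp", "4LFTX", "5WAVH"):
        print(candidate, classify_name(candidate), creation_allowed(candidate))
```

With the flag left at its default, "4LFTX" and "5WAVH" would be refused on creation, which is the behavior that broke the existing archives mentioned above.
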
We didn't realize enforcing this restriction in the new software would cause problems, but now that we know there are operational data collections that make use of numeric names, we have to consider our commitment to backward compatibility:

  http://www.unidata.ucar.edu/software/netcdf/workshops/2007/netcdf4/Compatibility.html

The problem of mapping numeric names to variable names in C and Fortran has a fairly simple solution, for example mapping the name "4LFTX" to the variable name "DIGIT_4_LFTX" in the C or Fortran programs generated by ncgen from CDL input. The ncdump utility can display numeric names by escaping the first character, so that ncgen can parse the resulting CDL and easily distinguish names from numbers. For example, when ncgen sees

  variables:
      float \4LFTX(time, level, lat, lon) ;

it will interpret this as a declaration for a variable named "4LFTX". This CDL support is already implemented and will be in the next daily snapshot release.

> Now I think the alpha restriction is important in practice for
> several diffuse reasons. Primarily, it will reduce compatibility
> problems over time between producers and consumers of Netcdf data,
> because of conventions. Also, it's common practice to copy variable
> names between files and program code, aiding clarity. Also, the
> occurrence of non-letters at the start of a name can be helpful as
> an early warning of malfunctions.

The role of conventions is important. The CF Conventions, for example, still require that variable, dimension, and attribute names begin with a letter or "_", so numeric names technically should not be used in data for which CF compliance is important. In practice some software that requires CF-compliant data may work fine with numeric names, because the netCDF libraries don't check name syntax when reading.

A larger change that has elicited few comments but that is relevant to this issue is the addition of support for Unicode names in the most recent release.
The first character of a name does not have to be in the US-ASCII character set. It can also be a non-ASCII Unicode character, like the rest of the characters in a name, to permit data to be more self-describing in contexts that use other alphabets. This is relevant because Unicode actually includes duplicate encodings of numerals within several other code blocks. Trying to extend rules for ASCII names to cover Unicode characters is complicated enough that we opted to allow any legal non-ASCII Unicode characters (UTF-8 encoded) in names, in addition to the rules for ASCII characters. The ncdump and ncgen utilities in the most recent netCDF releases handle UTF-8 Unicode in names.

...

> I see this as a small amount of work now to save more in distributed
> problems later. Thanks for your consideration.

You may be right; we may be opening a can of worms by extending the character set and loosening the rules for netCDF names.

Applications and archives need not support the new character sets or names. If you avoid use of numeric names or non-ASCII Unicode characters in names, previous versions of ncgen should continue to work with your data. If you specify that your applications or archives require CF-compliant netCDF, for example, you should not encounter the problems that the new, looser rules for names allow.

Postel's principle that you should "be liberal in what you accept and conservative in what you send" applies here, as a guide for application developers and data providers, respectively.

We will continue to try to maintain backward compatibility while supporting current users, existing data archives, and new standards and technologies. As always, we welcome your feedback.

--Russ
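
As an illustration of the ncdump/ncgen handling of numeric names described above, here is a rough Python sketch of the two transformations: escaping a leading digit for CDL, and mapping a numeric name to a legal C or Fortran identifier. The helper names are hypothetical, and the mapping rule is extrapolated from the single "4LFTX" to "DIGIT_4_LFTX" example; the scheme ncgen actually uses may differ in other cases.

```python
ASCII_DIGITS = "0123456789"


def cdl_escape(name):
    # Prefix a leading ASCII digit with a backslash so that a CDL parser
    # can tell a name from a number.
    if name and name[0] in ASCII_DIGITS:
        return "\\" + name
    return name


def cdl_unescape(token):
    # Inverse of cdl_escape: drop a backslash that precedes a leading digit.
    if len(token) >= 2 and token[0] == "\\" and token[1] in ASCII_DIGITS:
        return token[1:]
    return token


def to_identifier(name):
    # Assumed rule, generalized from one example: wrap the leading digit
    # as DIGIT_<d>_ and keep the rest of the name unchanged.
    if name and name[0] in ASCII_DIGITS:
        return "DIGIT_" + name[0] + "_" + name[1:]
    return name


if __name__ == "__main__":
    for original in ("4LFTX", "5WAVH", "time"):
        escaped = cdl_escape(original)
        print(original, escaped, cdl_unescape(escaped), to_identifier(original))
```

Names that already begin with a letter or underscore pass through both functions unchanged, so existing CDL and generated code would be unaffected under this sketch.
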