netCDF loses comments and changes formats

> Organization: ADISS Project
> Keywords: 199209241829.AA21251 ncdump ncgen CDL

Rich Lysakowski wrote:
> Some people at ISAS in Europe using netCDF on DOS systems reported to me
> that they have problems with the netCDF test routines.  After the code
> build is done and the tests put a test dataset thru ncdump and ncgen,
> comments are lost.  Another problem is that formatting of the datasets is
> different after doing a round trip thru the utilities.

This is not a bug in the MSDOS version, but rather is the intended behavior
on all platforms.  More explicitly, if an ASCII CDL file containing comments
is input to "ncgen -n" to create a binary netCDF file, and that file is then
used as input to ncdump to generate another CDL file, the latter is not
necessarily identical to the original CDL file: there are no comments in
the final CDL file, and its indentation and line breaks may be different
from the original CDL file.

For example, assume a file named `example.cdl' contains the following
declaration with a trailing comment:

  float Z(lat, lon);  // Z is geopotential height

After generating a binary netCDF file with "ncgen -n" and looking at the
result with "ncdump", the line will appear as:

        float Z(lat, lon) ;

The fact that CDL comments are not stored in the netCDF files generated by
ncgen is analogous to C or Fortran compilers not storing comments in object
files.  It would be possible to use some global attribute convention, for
example

  :_Line_7_trailing_comment = "Z is geopotential height"

to store this information in the netCDF file so that it would be preserved
and interpreted correctly by ncdump later, but a better way to store this
kind of information is to use ordinary netCDF attributes or variables, e.g.:

  Z:long_name = "geopotential height";
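If an existing CDL file already carries its documentation in trailing
comments, those comments can be moved into attributes before running ncgen
so that they survive the round trip.  The Python sketch below is a
hypothetical helper ("comment_to_long_name" is not part of ncgen or ncdump)
and handles only the simplest case: a single-line variable declaration
followed by a "//" comment, which it rewrites as the declaration plus a
"long_name" attribute.  It copies the comment text verbatim, so the result
may still need hand editing, and it does not handle "//" inside string
constants or declarations spanning several lines.

```python
import re

# Toy pattern (an assumption, not ncgen's grammar): one simple variable
# declaration on a single line, followed by a trailing // comment, e.g.
#   float Z(lat, lon);  // Z is geopotential height
_DECL_WITH_COMMENT = re.compile(
    r"^(?P<indent>\s*)"
    r"(?P<decl>\w+\s+(?P<name>\w+)\s*(?:\([^)]*\))?\s*;)"
    r"\s*//\s*(?P<comment>.*?)\s*$"
)


def comment_to_long_name(line: str) -> list[str]:
    """Hypothetical helper: turn a trailing comment into a long_name attribute.

    Lines that do not match the simple pattern are returned unchanged.
    """
    m = _DECL_WITH_COMMENT.match(line)
    if m is None:
        return [line]
    indent, decl, name, comment = m.group("indent", "decl", "name", "comment")
    # Escape backslashes and double quotes so the CDL string stays valid.
    escaped = comment.replace("\\", "\\\\").replace('"', '\\"')
    return [f"{indent}{decl}", f'{indent}{name}:long_name = "{escaped}" ;']
```

For the declaration from `example.cdl' above, the helper returns the
declaration without its comment, followed by the attribute line
Z:long_name = "Z is geopotential height" ; which ncgen stores in the
netCDF file and ncdump later prints back.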

>                                                         Loss of comments
> is bad because important scientific information is being lost.  Formatting 
> is an annoyance, and may have some effect on usability. 

In my opinion, important scientific information, even if in the form of
comments, should be stored in named variables or attributes.  If a comment
is important enough to be preserved with the data, it should be named rather
than just given a position in a CDL file.  The information is not useful to
programs if it can only be retrieved by (arbitrary) position in one of many
possible CDL files.  There may not even be an associated CDL file, since
most netCDF data is created through library interfaces rather than through
invocation of ncgen.

Similarly, the formatting of a CDL file is not preserved when it is
converted to a netCDF file because there are no variables or attributes that
have been assigned by convention to contain information about CDL line
indentation or line breaks.  There is no one-to-one correspondence between
CDL files and netCDF files; many CDL files can represent exactly the same
netCDF data and will yield the same netCDF file when input to ncgen.  This
is actually a good thing, because it provides an easy way to determine if
two CDL files represent the same data, even though they are formatted
differently.  Just run them through ncgen and compare the resulting netCDF
files.
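The comparison suggested above can be scripted.  The following Python
sketch is hypothetical ("cdl_files_equivalent" is not a Unidata tool).  It
assumes an "ncgen" on the PATH that accepts "-o OUTPUT INPUT" to write a
binary netCDF file, and that ncgen writes identical bytes for identical
data, which may not hold for every output format or ncgen version.  The
command is passed as a sequence so that a different ncgen build or a
wrapper can be substituted.

```python
import filecmp
import os
import subprocess
import tempfile
from typing import Sequence


def cdl_files_equivalent(cdl_a: str, cdl_b: str,
                         ncgen_cmd: Sequence[str] = ("ncgen",)) -> bool:
    """Return True if two CDL files produce byte-identical netCDF files.

    Assumes ncgen_cmd accepts "-o OUTPUT INPUT" and writes deterministic output.
    """
    with tempfile.TemporaryDirectory() as tmp:
        outputs = []
        for i, cdl in enumerate((cdl_a, cdl_b)):
            out = os.path.join(tmp, f"out{i}.nc")
            # Generate a binary netCDF file; raise if ncgen reports an error.
            subprocess.run([*ncgen_cmd, "-o", out, cdl], check=True)
            outputs.append(out)
        # shallow=False forces a byte-by-byte comparison of file contents.
        return filecmp.cmp(outputs[0], outputs[1], shallow=False)
```

Called as cdl_files_equivalent("old.cdl", "reformatted.cdl") (hypothetical
file names), it should return True when the two CDL files differ only in
comments, indentation, or line breaks.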

--Russ