Hi Jed,

> Is there any way - currently or planned for the future - to specify
> netCDF4-specific variable definitions (e.g., deflate) in a CDL file?

There is no current way, but it's in our plans (see below).

> As I often use ncgen on cdl files to create new netCDF files, this
> capability would be highly desirable.
>
> As a workaround, I did try creating an empty netCDF-4 file with ncgen
> -v4 -b ... and then writing a little f90 program to run nf90_redef()
> and then set the "deflate" flags for each variable. But the library
> returned an error, saying this action is not possible after the
> variables are defined. According to the documentation
>
> "Once enddef has been called, it is impossible to set the deflate for
> a variable."
>
> But I might imagine that one could call redef to change these flags?
> Unfortunately, this did not work for me.

No, it's actually not possible to change the compression (or chunking)
of a netCDF variable after it has been created, so doing a redef won't
work.

But this is a timely question, as ncdump is being reimplemented for
netCDF-4. There are several "performance characteristics" of netCDF
data that are currently not represented in ncdump output (or in NcML):

 - format variant (classic, 64-bit-offset, netcdf-4, netcdf-4-classic)
 - netCDF-4 variable compression
 - netCDF-4 variable chunking parameters
 - netCDF-4 endianness

At issue are two different views of the purpose of CDL/NcML:

 1. CDL and NcML are abstract textual representations of metadata and
    data, without details of optimizations for performance. This has
    advantages in being able to easily compare ncdump output of two
    files that use different performance-related format, compression,
    chunking, or endianness settings. The current implementation of
    ncdump follows this philosophy, creating CDL/NcML with no
    information about the file format variant of the input, but allows
    determining this information with the "-k" (kind) option.

 2. CDL and NcML are a completely faithful textual representation of
    data with all the details needed to generate performance-tuned
    binary data via a program such as ncgen, permitting ncdump and
    ncgen to be true inverses.

One way to implement the second philosophy is to optionally include in
ncdump output extra syntax to specify performance characteristics. We
have plans for this, but are still discussing how to do it.

One approach would simply add new syntax to CDL to represent
performance-related characteristics. For example, performance
characteristic specifications could be included after a variable
definition in parentheses:

    float relhum(time, level, lat, lon) (Compression: deflate=5) ;

It would then be the job of ncgen to parse the new syntax and make the
appropriate API calls.

A second approach would require ncdump and similar utilities to
generate synthetic attributes that don't really exist in the data file
but that contain information about format, chunking, compression, and
endianness. These synthetic attributes would represent
performance-related properties of the data that ncgen could use in
generating binary files from CDL/NcML data. File-level, group-level, or
variable-level attributes with names "_Compression", "_Chunking", and
"_Endianness" could be used for these attributes, with variable-level
attributes overriding group-level attributes, which in turn could
override file-level specifications. Although the ncgen utility would
respect these special attributes, they would not actually be stored in
the file, since that information is already available through the API.

A third approach would implement these attributes under the C and Java
APIs, so that whenever a variable is represented as compressed, the API
would behave as if performance-related attributes have been defined
even though such attributes do not actually exist in the file. Users
could specify performance characteristics by defining such attributes
instead of through existing API calls, but such attributes could only
be defined at certain times. For example, it's not possible to change
the file format of a file through an API call, so adding a new
"_Format" attribute to an existing file that used a different format
would result in an error. Similarly, performance characteristics of
variables fixed at variable definition time could not later be altered
by adding a variable attribute. But ncgen could use such attributes to
allow CDL to specify performance characteristics in creating new files.

If you have reasons why you think we should reject or favor one of
these approaches, please let us know soon. Thanks!

--Russ
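
Editorial illustration (not part of the original message): the override
order described for the second approach, where a variable-level
attribute wins over a group-level one, which in turn wins over a
file-level one, can be sketched in a few lines of Python. This is only a
model of the lookup rule, not netCDF code. The function name
effective_settings, the single group level, and the attribute value
strings such as "deflate=5" are all assumptions made for the example;
the message does not define a value format for these attributes.

```python
# Hypothetical model of the second approach: the special attributes may
# be given at file, group, or variable level, and the most specific
# level that defines an attribute supplies its value.

SPECIAL_ATTRS = ("_Compression", "_Chunking", "_Endianness")


def effective_settings(file_attrs, group_attrs, var_attrs):
    """Return the special attributes that would apply to one variable.

    Each argument is a plain dict of attribute name -> value. Only one
    group level is modeled here; nested groups are not handled.
    """
    resolved = {}
    for name in SPECIAL_ATTRS:
        # Look at the variable first, then its group, then the file.
        for level in (var_attrs, group_attrs, file_attrs):
            if name in level:
                resolved[name] = level[name]
                break
    return resolved


# Placeholder values: the string formats below are invented for the demo.
file_attrs = {"_Compression": "deflate=1", "_Endianness": "little"}
group_attrs = {"_Compression": "deflate=5"}
var_attrs = {"_Chunking": "1,10,90,180"}

settings = effective_settings(file_attrs, group_attrs, var_attrs)
print(settings)
```

In this sketch the group-level "_Compression" replaces the file-level
one, while "_Endianness" falls through to the file level because
neither the group nor the variable defines it.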