-------- Original Message --------
Subject: valid_min, valid_max, scaled, and missing values
Date: Fri, 23 Feb 2001 14:24:51 -0700
From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
Organization: UCAR Unidata Program
To: caron@xxxxxxxx

John,

First, the GDT conventions at

    http://www-pcmdi.llnl.gov/drach/GDT_convention.html

say:

    In cases where the data variable is packed via the scale_factor and
    add_offset attributes (section 32), the missing_value attribute
    matches the type of and should be compared with the data after
    unpacking.

whereas the CDC conventions at

    http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml

say:

    ... missing_value has the (possibly packed) data value data type.

Here's what Harvey had to say to netcdfgroup about valid_min and
valid_max (or valid_range) applying to the external packed values rather
than the internal unpacked values:

    http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1174

implying that the missing_value or _FillValue attributes should be in the
units of the packed rather than the unpacked data.

And Harvey said (in http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1095):

    Yet I have encountered far too many netCDF files which contravene
    Section 8.1 in some way. For example, we are currently processing the
    NCEP data set from NCAR. An extract follows. It is obvious that a
    great deal of effort has gone into preparing this data with lots of
    metadata and (standard and non-standard) attributes, etc. But it is
    also obvious that there cannot be any valid data, because the valid
    minimum (87000) is greater than the maximum short (32767)! And
    Section 8.1 states that the type of valid_range should match that of
    the parent variable, i.e., it should be a short, not a float.
    Obviously the values given are unscaled external data values rather
    than internal scaled values.

        short slp(time, lat, lon) ;
                slp:long_name = "4xDaily Sea Level Pressure" ;
                slp:valid_range = 87000.f, 115000.f ;
                slp:actual_range = 92860.f, 111360.f ;
                slp:units = "Pascals" ;
                slp:add_offset = 119765.f ;
                slp:scale_factor = 1.f ;
                slp:missing_value = 32766s ;
                slp:precision = 0s ;

    It would be useful to have a utility which checked netCDF files for
    conformance to these conventions. It could also provide other data
    for checking validity, such as counting the number of valid and
    invalid data elements.

    I guess I have to take some of the blame. I was one of the authors of
    NUGC, and I was largely responsible for rewriting Section 8.1 last
    year while I was working at Unidata. I tried to make it clearer and
    simpler. In particular, I tried to simplify the relationship between
    valid_range, valid_min, valid_max, _FillValue and missing_value. But
    it seems that we have failed to make the current conventions
    sufficiently clear and simple.

In http://www.unidata.ucar.edu/glimpse/netcdfgroup-list/1079, here's what
John Sheldon of GFDL had to say about whether the missing value should be
in units of the packed or unpacked data:

    - Section 32: Missing values in a data variable

    I think that the data should be checked against the "missing_value"
    *before* unpacking. First, I think there is already a pretty strong
    convention that "missing_value" be of the same type as the data.
    Second, some packages simply display the packed values, and they
    wouldn't be able to detect missing values. Third, I've been burned
    and confused often enough by varying machine precision to be quite
    shy of comparing computed values.

    However, handling missing values when unpacking packed data does
    present a real problem! Imagine a subroutine which unpacks, say,
    SHORT values into a FLOAT array. This routine will be able to
    reliably detect missing values, but what value is it to put in the
    FLOAT array? We solve this by storing a global FLOAT attribute which
    specifies this number. If a file has no such attribute, we stuff a
    default value in it. In any case, we inform the user of what was used.
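A minimal sketch of Sheldon's check-before-unpack scheme, in plain Java;
the class, method, and parameter names here are hypothetical, not from
the thread or any library, and the numbers are borrowed from the slp
extract above:

    public class Unpack {

        /**
         * Unpacks short data into floats. Each packed value is compared
         * with missingValue BEFORE unpacking (an exact integer
         * comparison, so no floating-point precision worries); hits are
         * replaced with missingFloat, which might come from a global
         * attribute or be a default.
         */
        static float[] unpack(short[] packed, short missingValue,
                              float scaleFactor, float addOffset,
                              float missingFloat) {
            float[] unpacked = new float[packed.length];
            for (int i = 0; i < packed.length; i++) {
                if (packed[i] == missingValue) {
                    unpacked[i] = missingFloat;
                } else {
                    unpacked[i] = packed[i] * scaleFactor + addOffset;
                }
            }
            return unpacked;
        }

        public static void main(String[] args) {
            short[] packed = { -26905, 32766, -8405 };  // 32766 marks missing data
            float[] result = unpack(packed, (short) 32766, 1.f, 119765.f, -1.e30f);
            for (float f : result)
                System.out.println(f);  // 92860.0, -1.0E30, 111360.0
        }
    }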
But Jonathan Gregory replied:

    > Section 32: Missing values in a data variable
    >
    > I think that the data should be checked against the "missing_value"
    > *before* unpacking. [JS]

    Yes, you may well be correct. Thanks. The problem then becomes: what
    will you put in the array of unpacked data if you find a missing
    value in the packed data? We store a global attribute to hold this
    value (say, -1.E30). In the absence of this global attribute, we
    simply stuff in a fill-value, which is OK, but you lose the
    distinction between intentionally and unintentionally missing data.
    In any case, we tell the calling routine what float values we used in
    both cases.

So there evidently was no consensus on this issue, just differing
opinions. Since we have to pick one, I think I favor having the missing
value be in the packed units.

--Russ
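Under the packed-units reading Russ favors, the attributes in Harvey's
slp extract would be shorts in packed units, converted by
packed = (unpacked - add_offset) / scale_factor. A quick sanity check of
that arithmetic (the class name is hypothetical; the numbers are from the
extract above):

    public class PackedRange {
        public static void main(String[] args) {
            // From the slp extract: unpacked = packed * scale_factor + add_offset,
            // hence packed = (unpacked - add_offset) / scale_factor.
            float addOffset = 119765.f;
            float scaleFactor = 1.f;
            short validMin = (short) ((87000.f - addOffset) / scaleFactor);
            short validMax = (short) ((115000.f - addOffset) / scaleFactor);
            // Prints: slp:valid_range = -32765s, -4765s
            System.out.println("slp:valid_range = " + validMin + "s, " + validMax + "s");
        }
    }

So a valid_range of -32765s, -4765s would match the variable's type, fit
within a short, and leave the stated missing_value of 32766s outside the
valid range, as Section 8.1 intends.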