> -----Original Message-----
> From: owner-ncdigest@xxxxxxxxxxxxxxxx
> [SMTP:owner-ncdigest@xxxxxxxxxxxxxxxx]
> Sent: Friday, April 20, 2001 13:25
> To: ncdigest@xxxxxxxxxxxxxxxx
> Subject: ncdigest V1 #586
>
> Today's Topics:
> standard handling of scale/offset and missing data
>
> ----------------------------------------------------------------------
>
> Date: Thu, 19 Apr 2001 16:21:03 -0600
> From: John Caron <caron@xxxxxxxxxxxxxxxx>
> Subject: standard handling of scale/offset and missing data

As the main author of the "Attribute Conventions" section in the User's Guide, I must take responsibility for what is not clear. The following comments are intended to make these conventions clearer (but these are just my personal opinions).

> For example, in practice, valid_range seems to be in unpacked units
> rather than packed. The manual is not that clear (to me) and I could
> imagine it being used both ways.

I find all the terms 'packed/unpacked', 'scaled/unscaled' and 'raw/converted' confusing. (I used the terms 'packed/unpacked' in the User's Guide, but I now regret this.) We need terms which suggest 'actual value written on disk' and 'logical data value in memory'. How about 'external' and 'internal'? Any other suggestions? I will use 'internal' and 'external' in the following. For example, the internal type is often float with an external type of short.

Re 'valid_range': this should be an external type and value, as should valid_max, valid_min, _FillValue and missing_value. But add_offset and scale_factor should be an internal type/value.

> ---------------------------
> public class VariableStandardized extends Variable
>
> A "standardized" read-only Variable which implements:
> 1) packed data using scale_factor and add_offset
> 2) invalid data using valid_min, valid_max, valid_range, missing_data
> or _FillValue

I assume you mean 'missing_value', which I believe should be ignored on input (see below).

> if those "standard attributes" are present.
> If they are not present, it acts just like the original Variable.
>
> Implementation rules for scale/offset:
> 1) If scale_factor and/or add_offset variable attributes are present,
> then this is a "packed" Variable.
> 2) the Variable element type is converted to double, unless the
> scale_factor and add_offset variable attributes are both type float, in
> which case it converts it to float.
> 3) packed data is converted to unpacked data transparently during the
> read() call.

I am happy with these three rules.

> Implementation rules for missing data:
> 1) if valid_range is present, valid_min and valid_max attributes are
> ignored. Otherwise, the valid_min and/or valid_max is used to construct
> a valid range.

The User's Guide states it is illegal to have valid_range if either valid_min or valid_max is defined. If such a file exists in practice, I consider it better to force the user to delete attributes to avoid such ambiguity.

> 2) a missing_value attribute may also specify a scalar or vector of
> missing values.

Yes, but note that this attribute is merely a hint for output & should be ignored on input.

> 3) if there is no missing_value attribute, the _FillValue attribute
> can be used to specify a scalar missing value.

For what purpose? This could be reasonable on input if you are defining an internal missing value, but my understanding of your proposal is that you are simply defining an array of data.

Before writing the section, I thought long and hard about the relation between valid_range, missing_value and _FillValue. We finally agreed to essentially deprecate missing_value for simplicity. On input, if there is a valid_range then any value outside this is considered missing. If there is no valid_range then _FillValue defines a valid max if it is positive, otherwise it defines a valid min. On output, missing data may be written as any value outside the valid range.
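A minimal Java sketch of the input-side reading described above. The class ExternalMissingCheck and its methods are hypothetical (not part of any netCDF library, nor of the proposed VariableStandardized): the valid range and _FillValue are compared against the external value as stored on disk, and only values that pass are unpacked with scale_factor and add_offset. Treating valid_min/valid_max like the two ends of valid_range, counting the fill value itself as missing, and using NaN to mark missing internal values are assumptions of this sketch. Values are handled as double, as in the proposal.

```java
/**
 * Hypothetical helper (not part of any netCDF library) sketching the input side:
 * validity is tested on the external (on-disk) value, and only valid values are
 * unpacked with scale_factor and add_offset.
 */
final class ExternalMissingCheck {
    private final Double validMin;   // valid_range[0] (or valid_min); null if absent
    private final Double validMax;   // valid_range[1] (or valid_max); null if absent
    private final Double fillValue;  // _FillValue; null if absent
    private final double scale;      // scale_factor (internal units); 1.0 if absent
    private final double offset;     // add_offset (internal units); 0.0 if absent

    ExternalMissingCheck(Double validMin, Double validMax, Double fillValue,
                         double scale, double offset) {
        this.validMin = validMin;
        this.validMax = validMax;
        this.fillValue = fillValue;
        this.scale = scale;
        this.offset = offset;
    }

    /** True if the raw external value should be treated as missing. */
    boolean isMissing(double external) {
        if (validMin != null || validMax != null) {
            // A valid range takes precedence: anything outside it is missing.
            return (validMin != null && external < validMin)
                || (validMax != null && external > validMax);
        }
        if (fillValue != null) {
            // No valid range: a positive _FillValue acts as a valid max, otherwise
            // as a valid min. Assumption: the fill value itself counts as missing.
            return fillValue > 0 ? external >= fillValue : external <= fillValue;
        }
        return false; // no attributes: every value is valid
    }

    /** Converts one external value to internal units; NaN marks missing data. */
    double read(double external) {
        return isMissing(external) ? Double.NaN : external * scale + offset;
    }
}
```

For a short variable with valid_range = {0, 1000}, scale_factor = 0.5 and add_offset = 250, `new ExternalMissingCheck(0.0, 1000.0, null, 0.5, 250.0).read(100.0)` gives 300.0, while `read(-1.0)` gives NaN because -1 is outside the external range.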
However, a particular application may choose to use the missing_value (or an element of it if it is a vector) as the value to write for missing data. So it would make sense for generic applications to use the first element of the missing_value for output (provided this is outside the valid range).

> Implementation rules for missing data with scale/offset:
> 1) valid_range is always in the units of the converted (unpacked) data.

NO!!! See above.

> 2) _FillValue and missing_data values are always in the units of the
> raw (packed) data.

I agree.

> If hasMissingData(), then isMissingData(double val) is called to
> determine if the data is missing. Note that the data is converted and
> compared as a double.

Harvey Davies, CSIRO Atmospheric Research,
Private Bag No. 1, Aspendale 3195
E-mail: harvey.davies@xxxxxxxxxxxx
Phone: +61 3 9239 4556
Fax: +61 3 9239 4444
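The output-side suggestion in this message can be sketched the same way. Again the class MissingOnOutput and its methods are hypothetical: missingToWrite returns the first element of missing_value only if it lies outside the external valid range, and pack inverts the unpacking formula internal = external * scale_factor + add_offset, assuming an integer external type so the result is rounded. What to do when missing_value is absent or falls inside the range is left to the caller, since the message only requires some value outside the valid range.

```java
import java.util.OptionalDouble;

/**
 * Hypothetical helper (not part of any netCDF library) sketching the output side:
 * choose an external value for missing data and pack internal values for writing.
 */
final class MissingOnOutput {
    /**
     * Returns the first element of missing_value if it lies outside the external
     * valid range [validMin, validMax]; otherwise empty, leaving the caller to
     * choose some other out-of-range value.
     */
    static OptionalDouble missingToWrite(double[] missingValue,
                                         double validMin, double validMax) {
        if (missingValue == null || missingValue.length == 0) {
            return OptionalDouble.empty();
        }
        double first = missingValue[0];
        boolean outside = first < validMin || first > validMax;
        return outside ? OptionalDouble.of(first) : OptionalDouble.empty();
    }

    /**
     * Packs an internal value into external units by inverting
     * internal = external * scale + offset; NaN (missing) maps to missingExternal.
     */
    static double pack(double internal, double scale, double offset,
                       double missingExternal) {
        if (Double.isNaN(internal)) {
            return missingExternal;
        }
        // Round to the nearest integer (ties to even), assuming an integer external type.
        return Math.rint((internal - offset) / scale);
    }
}
```

For example, with valid_range = {0, 1000} and missing_value = {-999, -998}, missingToWrite returns -999, and `pack(300.0, 0.5, 250.0, -999.0)` returns 100.0.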
netcdfgroup