> Date: Fri, 20 Apr 2001 10:29:16 -0600 (MDT)
> From: Brian Eaton <eaton@xxxxxxxxxxxxx>
> Subject: Re: standard handling of scale/offset and missing data
>
> The treatment of the valid_range, _FillValue and missing_value attributes
> differs substantially between v-2.3 and v-3 of the User's Guide.

These changes appeared in version 2.4 of the User's Guide (UG).

> In v-2.3 the _FillValue is not connected to the definition of valid_range
> except that it is recommended that the _FillValue should be outside the
> valid_range. In v-3 if a valid range is not defined then _FillValue is
> used to define one.

The UG states that it is 'legal' (whatever that means) for _FillValue to be a valid value, but recommends it be outside the valid range.

> The type of scale_factor and add_offset should determine the unpacked
> type. What if I want to unpack bytes into ints?

I agree. This seems essential since (unlike HDF) netCDF has no standard attribute to specify the internal type.

> Date: Fri, 20 Apr 2001 13:31:59 -0600
> From: John Caron <caron@xxxxxxxxxxxxxxxx>
> Subject: Re: standard handling of scale/offset and missing data
>
> One clarification: VariableStandardized is read-only. I will
> (eventually) add a writeable version, in which case I would certainly
> follow your conventions. My task right now is to define behavior which
> does a reasonable job on (important) existing datasets. So I am more
> motivated to bend the rules than if I was trying to define the rules.

I understand. However, I have been very disappointed at the blatant (or ignorant) disregard for conventions displayed by far too many creators of important datasets. I am concerned that relaxing the rules in software such as yours will encourage such practice. I would prefer to get the creators (or users) of such datasets to rewrite the files. (I'm being deliberately provocative -- there may well be a reasonable compromise possible in many cases.)
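Brian's rule above -- the type of scale_factor and add_offset determines the unpacked type -- can be sketched as follows. This is my own illustration of the convention under discussion, not code from the thread; the function name `unpack` is an assumption.

```python
import numpy as np

# Illustrative sketch: the unpacked type follows the type of the
# scale_factor and add_offset attributes, so byte data can be unpacked
# into ints simply by giving those attributes an integer type.
def unpack(packed, scale_factor, add_offset):
    attr_type = np.result_type(np.asarray(scale_factor),
                               np.asarray(add_offset))
    return packed.astype(attr_type) * scale_factor + add_offset

packed = np.array([0, 50, 100], dtype=np.int8)              # packed bytes
floats = unpack(packed, np.float32(0.5), np.float32(10.0))  # -> float32
ints = unpack(packed, np.int32(2), np.int32(1000))          # -> int32
```

The second call is Brian's "unpack bytes into ints" case: nothing in the file names the unpacked type, so the attribute types carry that information.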
> > The User's Guide states it is illegal to have valid_range if either
> > valid_min or valid_max is defined. If such a file exists in practice,
> > I consider it better to force the user to delete attributes to avoid
> > such ambiguity.
>
> I guess the problem is that there's no library enforcement of such
> conventions, and so I am inclined to relax the rules if it doesn't
> cause confusion.

I believe it would cause confusion. I would prefer to provide an easy, efficient way of deleting the redundant attributes.

> >> 2) a missing_value attribute may also specify a scalar or vector of
> >> missing values.
> >
> > Yes, but note that this attribute is merely a hint for output & should
> > be ignored on input.
>
> I don't understand why you ignore it on input.

We wanted to keep things simple and reasonably efficient. The valid range is defined by valid_min, valid_max, valid_range and _FillValue. The test for missing involves zero, one or two comparisons. I would not like to have to do more than two comparisons. Even two is quite time consuming. We could have chosen to use missing_value if none of the above four attributes were defined, but we decided against this (just for simplicity, if I remember correctly).

> What if there is no valid_range specified?

As suggested above, we could have chosen to use missing_value like _FillValue when none of these four attributes was defined but missing_value was. I would prefer to force renaming of missing_value to _FillValue, but I'm prepared to admit this may be unreasonably harsh.

> What if the missing_data is inside the valid_range?

I assume you mean missing_value. There is no problem if missing_value is merely a hint for output. You simply always ignore it (on input at least)!

> >> 3) if there is no missing_value attribute, the _FillValue attribute
> >> can be used to specify a scalar missing value.
> >
> > For what purpose? This could be reasonable on input if you are defining
> > an internal missing value, but my understanding of your proposal is
> > that you are simply defining an array of data.
>
> I'm not sure if I understand. Through the hasMissing() and isMissing()
> methods I am providing a service of knowing when the data is
> missing/invalid.

I am thinking of an application which has an internal missing value for each variable. In this case the decision on whether data is missing is not part of the input process, but done later. I gather this is not the case with your proposed routines.

> OK, I understand _FillValue better, thanks. Two things though: 1) it
> seems reasonable to pre-fill an array with valid values, since perhaps
> only a few data points need to be written that way.

I agree there may be cases where you want to pre-fill with a valid value of, say, 0. The UG states this is legal even though against recommended practice. We should have worded this more clearly to make it plain that this is fine.

> The above rules would seem to preclude this. 2) Is the default fill
> value supposed to operate the same way? If not, it seems funny that
> they might have radically different meaning.

If none of the four above attributes is defined then all values are valid. (Well, not quite -- I guess NaN can hardly ever be considered 'valid'!! Incidentally, I feel we should rethink the recommendation not to use NaN and other IEEE special values now that the IEEE standard is so widely used. I use NaN a lot.)

> >> Implementation rules for missing data with scale/offset:
> >> 1) valid_range is always in the units of the converted (unpacked)
> >> data.
> >
> > NO!!! See above.
>
> The problem is that many important datasets use the internal units. I
> think there's a good argument that it is more natural since those would
> be the units a human would think in. Is there anything in the current
> manual that specifies this? I just reread it again and I don't see it.
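A practical footnote to the NaN aside above (my illustration, not from the thread): IEEE NaN compares unequal to everything, including itself, so a missing-value test written as a plain equality comparison will never match NaN; an explicit isnan test is needed.

```python
import math

nan = float("nan")  # IEEE quiet NaN

equal_to_itself = (nan == nan)  # False: NaN never compares equal, even to itself
is_missing = math.isnan(nan)    # True: the reliable test
```

This is one reason using NaN as a missing_value sentinel needs care in software that tests for missing data by equality.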
I must apologise for this omission. Despite it, the convention has always been that the valid range is external. It may well have been more logical for it to be internal, but it is too late to change it. You could argue for it to be internal if the datatype matched the internal type (i.e. that of scale_factor and add_offset), but I think this would cause confusion.

Harvey Davies, CSIRO Atmospheric Research, Private Bag No. 1, Aspendale 3195
E-mail: harvey.davies@xxxxxxxxxxxx
Phone: +61 3 9239 4556
Fax: +61 3 9239 4444
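Harvey's ruling that the valid range is external (in unpacked units) fixes an order of operations: unpack first, then range-check, using at most two comparisons as described earlier in the thread. A minimal NumPy sketch of that order (my own illustration; the function name and sample values are assumptions):

```python
import numpy as np

def unpack_and_check(packed, scale_factor, add_offset, valid_range):
    """Unpack, then mask values outside valid_range (external units)."""
    unpacked = packed * scale_factor + add_offset
    lo, hi = valid_range                         # external (unpacked) units
    valid = (unpacked >= lo) & (unpacked <= hi)  # at most two comparisons
    return np.where(valid, unpacked, np.nan)

packed = np.array([0, 100, 32767], dtype=np.int16)  # 32767 acting as fill
checked = unpack_and_check(packed, 0.01, 0.0, (0.0, 100.0))
# 32767 unpacks to 327.67, outside valid_range, so it is masked
```

Had valid_range been internal, the comparison would instead apply to the packed values before unpacking; the external convention means the bounds are stated in the units a data user sees.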