RE: ncdigest V1 #586

To: "'netcdfgroup@xxxxxxxxxxxxxxxx'" <netcdfgroup@xxxxxxxxxxxxxxxx>
Subject: RE: ncdigest V1 #586
From: "Davies, Harvey" <harvey.davies@xxxxxxxxxxxx>
Date: Fri, 20 Apr 2001 16:47:51 +1000
> -----Original Message-----
> From: owner-ncdigest@xxxxxxxxxxxxxxxx
> [SMTP:owner-ncdigest@xxxxxxxxxxxxxxxx]
> Sent: Friday, April 20, 2001 13:25
> To:   ncdigest@xxxxxxxxxxxxxxxx
> Subject:      ncdigest V1 #586
> 
> Today's Topics:
> standard handling of scale/offset and missing data
> 
> ----------------------------------------------------------------------
> 
> Date: Thu, 19 Apr 2001 16:21:03 -0600
> From: John Caron <caron@xxxxxxxxxxxxxxxx>
> Subject: standard handling of scale/offset and missing data
> 
As the main author of the "Attribute Conventions" section in the User's
Guide, I must take responsibility for what is not clear. The following
comments are intended to make these conventions clearer (but they are
just my personal opinions).

> For example, in practice, valid_range seems to be in unpacked units 
> rather than packed. The manual is not that clear (to me) and I could 
> imagine it being used both ways.
> 
I find all the terms 'packed/unpacked', 'scaled/unscaled' and
'raw/converted' confusing. (I used the terms 'packed/unpacked' in the
User's Guide, but I now regret this.) We need terms which suggest
'actual value written on disk' and 'logical data value in memory'. How
about 'external' and 'internal'? Any other suggestions?

I will use 'internal' and 'external' in the following. For example, the
internal type is often float with an external type of short.

Re 'valid_range': this should be of the external type and hold external
values, as should valid_max, valid_min, _FillValue and missing_value. But
add_offset and scale_factor should be of the internal type and hold
internal values.
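To make the convention concrete, here is a minimal Java sketch, assuming
a short external type, float scale_factor/add_offset, and NaN as the
internal marker for missing data. The class and method names are
hypothetical, not part of any netCDF library. The point is the order of
operations: validity is tested on the external value, then the value is
converted.

```java
// Hypothetical sketch of the external/internal convention described above.
// Names (ExternalInternal, read, ...) are illustrative, not a netCDF API.
final class ExternalInternal {
    private ExternalInternal() {}

    // valid_min/valid_max are compared against the external (on-disk) value,
    // in the external type (short here), before any scaling is applied.
    static boolean isValid(short external, short validMin, short validMax) {
        return external >= validMin && external <= validMax;
    }

    // scale_factor and add_offset are internal (float here): they map an
    // external value to its internal value.
    static float toInternal(short external, float scaleFactor, float addOffset) {
        return external * scaleFactor + addOffset;
    }

    // Read one value: test validity in external units, then convert.
    // Assumption: NaN is used as the internal marker for missing data.
    static float read(short external, short validMin, short validMax,
                      float scaleFactor, float addOffset) {
        if (!isValid(external, validMin, validMax)) {
            return Float.NaN;
        }
        return toInternal(external, scaleFactor, addOffset);
    }
}
```

For example, with an external valid range of -32767..32767 and a fill
of -32768, reading -32768 yields NaN, while reading 100 with
scale_factor 0.01 and add_offset 273.15 yields roughly 274.15.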

> - ---------------------------
> public class VariableStandardized extends Variable
> 
> A "standardized" read-only Variable which implements:
>   1) packed data using scale_factor and add_offset
>   2) invalid data using valid_min, valid_max, valid_range, missing_data 
> or _FillValue
> 
I assume you mean 'missing_value', which I believe should be ignored on
input (see below).

> if those "standard attributes" are present. If they are not present, it 
> acts just like the original Variable.
> 
> Implementation rules for scale/offset:
>    1) If scale_factor and/or add_offset variable attributes are present, 
> then this is a "packed" Variable.
>    2) the Variable element type is converted to double, unless the 
> scale_factor and add_offset variable attributes are both type float, in 
> which case it converts it to float.
>    3) packed data is converted to unpacked data transparently during the 
> read() call.
> 
I am happy with these three rules.
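The three rules can be sketched in Java roughly as follows. This
assumes attributes are modeled as a Map from name to a boxed Number,
which is purely illustrative. Rule 2 is read literally: float only when
both attributes are present and both are float.

```java
import java.util.Map;

// Hypothetical sketch of the three scale/offset rules quoted above.
// Attributes as Map<String, Number> is an assumption for illustration.
final class PackingRules {
    enum InternalType { UNCHANGED, FLOAT, DOUBLE }

    // Rule 1: the variable is packed if either attribute is present.
    static boolean isPacked(Map<String, Number> attrs) {
        return attrs.containsKey("scale_factor") || attrs.containsKey("add_offset");
    }

    // Rule 2: float only when both attributes are present and both are
    // Float; any other packed case becomes double.
    static InternalType internalType(Map<String, Number> attrs) {
        if (!isPacked(attrs)) {
            return InternalType.UNCHANGED;
        }
        boolean bothFloat = attrs.get("scale_factor") instanceof Float
                && attrs.get("add_offset") instanceof Float;
        return bothFloat ? InternalType.FLOAT : InternalType.DOUBLE;
    }

    // Rule 3: conversion applied on read. A missing scale_factor defaults
    // to 1 and a missing add_offset to 0. Returns double for simplicity;
    // a FLOAT variable would narrow the result.
    static double[] unpack(short[] raw, Map<String, Number> attrs) {
        double scale = attrs.getOrDefault("scale_factor", 1.0).doubleValue();
        double offset = attrs.getOrDefault("add_offset", 0.0).doubleValue();
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * scale + offset;
        }
        return out;
    }
}
```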

> Implementation rules for missing data:
>    1) if valid_range is present, valid_min and valid_max attributes are 
> ignored. Otherwise, the valid_min and/or valid_max is used to construct 
> a valid range.
> 
The User's Guide states that it is illegal to have valid_range if either
valid_min or valid_max is defined. If such a file exists in practice, I
consider it better to force the user to delete the conflicting
attributes to avoid such ambiguity.
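A minimal Java sketch of "force the user to delete attributes" might
simply refuse the file. It assumes attributes are modeled as
Map<String, double[]> with values in external units. The names are
hypothetical, and the _FillValue fallback discussed below is not handled
here (with no range attributes it returns an unbounded range).

```java
import java.util.Map;

// Hypothetical sketch: build the valid range, rejecting files that combine
// valid_range with valid_min or valid_max. Values are in external units.
final class ValidRangeCheck {
    // Closed interval [min, max]; an infinite bound means "no limit".
    static final class Range {
        final double min;
        final double max;

        Range(double min, double max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(double v) {
            return v >= min && v <= max;
        }
    }

    static Range validRange(Map<String, double[]> attrs) {
        boolean hasRange = attrs.containsKey("valid_range");
        boolean hasMin = attrs.containsKey("valid_min");
        boolean hasMax = attrs.containsKey("valid_max");
        if (hasRange && (hasMin || hasMax)) {
            // Make the user resolve the ambiguity instead of guessing.
            throw new IllegalArgumentException(
                    "valid_range must not be combined with valid_min/valid_max");
        }
        if (hasRange) {
            double[] r = attrs.get("valid_range");
            return new Range(r[0], r[1]);
        }
        double min = hasMin ? attrs.get("valid_min")[0] : Double.NEGATIVE_INFINITY;
        double max = hasMax ? attrs.get("valid_max")[0] : Double.POSITIVE_INFINITY;
        return new Range(min, max);
    }
}
```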

>    2) a missing_value attribute may also specify a scalar or vector of 
> missing values.
> 
Yes, but note that this attribute is merely a hint for output & should be
ignored on input.

>    3) if there is no missing_value attribute, the _FillValue attribute 
> can be used to specify a scalar missing value.
> 
For what purpose? This could be reasonable on input if you are defining
an internal missing value, but my understanding of your proposal is that
you are simply defining an array of data.

Before writing the section, I thought long and hard about the relation
between valid range, missing_value and _FillValue. We finally agreed to
essentially deprecate missing_value for simplicity. On input, if there
is a valid_range then any value outside it is considered missing. If
there is no valid_range then _FillValue defines a valid max if it is
positive, otherwise it defines a valid min. On output, missing data may
be written as any value outside the valid range. However, a particular
application may choose to use the missing_value (or an element of it if
it is a vector) as the value to write for missing data. So it would make
sense for generic applications to use the first element of missing_value
for output (provided this was outside the valid range).
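These input and output rules can be sketched in Java as follows, in
external units, with absent attributes passed as null. Two choices here
are assumptions rather than part of the rules above: the fill value
itself is treated as missing (an exclusive bound), and output falls back
to _FillValue when the first missing_value element is unusable. All
names are hypothetical.

```java
// Hypothetical sketch of the input/output missing-data rules, in external
// units. Absent attributes are passed as null.
final class MissingDataRules {
    private MissingDataRules() {}

    // Input rule. missing_value is deliberately not a parameter: it is only
    // a hint for output and is ignored on input.
    static boolean isMissing(double v, double[] validRange, Double fillValue) {
        if (validRange != null) {
            return v < validRange[0] || v > validRange[1];
        }
        if (fillValue != null) {
            double fill = fillValue;
            // A positive _FillValue acts as a valid max, otherwise as a valid min.
            // Assumption: the fill value itself is also treated as missing.
            return fill > 0 ? v >= fill : v <= fill;
        }
        return false; // no valid_range and no _FillValue: nothing is missing
    }

    // Output rule: use the first element of missing_value if it lies outside
    // the valid range. Falling back to _FillValue is an assumption.
    static double valueToWrite(double[] missingValue, double[] validRange, Double fillValue) {
        if (missingValue != null && missingValue.length > 0
                && isMissing(missingValue[0], validRange, fillValue)) {
            return missingValue[0];
        }
        if (fillValue != null && isMissing(fillValue, validRange, fillValue)) {
            return fillValue;
        }
        throw new IllegalStateException("no value outside the valid range is available");
    }
}
```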

> Implementation rules for missing data with scale/offset:
>    1) valid_range is always in the units of the converted (unpacked) data.
> 
NO!!! See above.

>    2) _FillValue and missing_data values are always in the units of the 
> raw (packed) data.
> 
I agree.

> If hasMissingData(), then isMissingData( double val) is called to 
> determine if the data is missing. Note that the data is converted and 
> compared as a double.
> 
Harvey Davies, CSIRO Atmospheric Research,
Private Bag No. 1, Aspendale 3195
E-mail: harvey.davies@xxxxxxxxxxxx
Phone: +61 3 9239 4556
  Fax: +61 3 9239 4444