
Re: Replies to comments on GDT

Harvey DAVIES (hld@dit.csiro.au)
Thu, 24 Jul 1997 23:20:32 +1000 (EST)

On Wed, 23 Jul 1997, Jonathan Gregory wrote:

> Section 6: Variable names
> 
> > I think there should be a recommendation that names consist of whole words
> > unless there is some strong reason to do otherwise.  So 'latitude' would be
> > preferred to 'lat'.  Note that such full-word variable names often obviate
> > the need for a 'long_name' attribute. [HD]
>
> I would be happy with such a recommendation. I do not think it would reduce
> the need for long_name, though. The long_name might really be quite detailed,
> for instance "volumetric soil moisture content at wilting point" (or this
> might be the quantity in GDT).

But surely the variable name 'latitude' is adequate!

> > I suppose the ability to define 0-dimensional variables could come in handy,
> > though such a quantity is probably more appropriately stored as a global
> > attribute. [JS]

There seems to be some confusion about what is meant by 0-dimensional.  I
would assume it means rank=0.  In other words an ordinary scalar value
(i.e. no dimensions).  JS seems to mean an array with a dimension of
size 0.
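A small sketch of the difference, with a variable's shape modelled as a plain Python tuple of dimension sizes (this is an illustration, not any netCDF API). A rank-0 variable has no dimensions and holds exactly one value. A variable with one dimension of size 0 holds no values at all.

```python
import math


def element_count(shape: tuple[int, ...]) -> int:
    """Number of values held by a variable with the given dimension sizes."""
    # The product over an empty shape is 1, so a rank-0 variable is a single scalar.
    return math.prod(shape)


scalar_shape = ()    # rank 0: no dimensions at all
empty_shape = (0,)   # rank 1: one dimension whose size is 0

print(len(scalar_shape), element_count(scalar_shape))  # rank 0, one value
print(len(empty_shape), element_count(empty_shape))    # rank 1, no values
```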

> > I wish to propose allowing missing (invalid) values in coordinate variables.
> > All corresponding data in the main variable would also have to be missing.
> > In particular this would simplify the problem of calendar dimensions which
> > GDT discuss.  You could simply allocate 31 days to every month and set data
> > for illegal dates (e.g. 30 Feb) to a missing value. [HD]
> 
> I am not happy about this idea, myself. To me it would imply that the data
> existed in principle, but was simply unavailable. See also Section 24.

I would argue strongly for a much broader concept of 'missing' or 'invalid'.
I see no reason why some of the missing values specified in the missing_value
vector should not mean things like 'meaningless' and 'undefined'.  This is
very similar to having missing values in the ocean for land-only variables
like soil moisture.  How else can such values be represented?
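As a sketch of what a vector of missing values with distinct meanings might look like in practice: the two codes and their meanings below are hypothetical, chosen only to illustrate 'not measured' versus 'meaningless' (an ocean point for a land-only variable).

```python
# Hypothetical missing_value vector: the codes and meanings are illustrative only.
MISSING_VALUE = (-999.0, -888.0)
MEANINGS = {
    -999.0: "not measured",
    -888.0: "meaningless (e.g. ocean point for soil moisture)",
}


def classify(values):
    """Separate valid data from invalid entries, recording why each entry is invalid."""
    valid, reasons = [], {}
    for index, value in enumerate(values):
        if value in MISSING_VALUE:
            reasons[index] = MEANINGS[value]
        else:
            valid.append(value)
    return valid, reasons


soil_moisture = [0.31, -888.0, 0.27, -999.0]
valid, reasons = classify(soil_moisture)
```

A generic application only needs to know that every code in the vector is invalid; a specific application can go further and distinguish the reasons.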

I would also argue strongly for the above proposal to allow missing (invalid)
values in coordinate variables.  I feel it is a neat solution to the date
problem and is likely to be useful in other contexts.
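A minimal sketch of the proposal for one year, assuming the Gregorian calendar (tested here via Python's datetime, which uses the proleptic Gregorian calendar) and using None as a stand-in for the coordinate's missing value. Each month gets 31 slots; the coordinate holds the nominal day index on that padded grid, and slots for dates that do not exist (e.g. 30 Feb) are missing. Another calendar would need its own validity test.

```python
import datetime

MISSING = None  # stand-in for the coordinate variable's missing value (assumption)


def padded_time_axis(year):
    """Time coordinate with 31 slots per month; non-existent dates are missing."""
    axis = []
    for month in range(1, 13):
        for day in range(1, 32):
            try:
                datetime.date(year, month, day)  # raises ValueError for e.g. 30 Feb
                axis.append((month - 1) * 31 + (day - 1))  # nominal day index
            except ValueError:
                axis.append(MISSING)
    return axis


def days_between(axis, i, j):
    """Elapsed days from slot i to slot j, counting only dates that exist."""
    return sum(1 for value in axis[i:j] if value is not MISSING)


axis = padded_time_axis(1997)
# 1 Jan is slot 0 and 1 Mar is slot 2 * 31 = 62.
elapsed = days_between(axis, 0, 62)
```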

> Section 11: Units
> 
> > I would like to see "none" added as a legitimate characterization, as it
> > would serve as a definite affirmation that the variable really does have no
> > units. [JS]
> 
> Good idea. Perhaps "one" or "unity" would be acceptable, since this could
> perhaps be inserted comfortably into the udunits "constants" section?

You ignored my comment that the required functionality is already provided
by udunits, which allows units=" " for this purpose.  If you do not like
using a blank, then udunits also allows units="1".

> I'm afraid I do not understand Harvey Davies's "measurement level" proposal.

Measurement level (measurement scale) describes the valid operations on a
variable and thus determines what statistics are valid.  The four levels are:

1. NOMINAL: Only valid operation is '='.  A measure of location is the 
   MODE (most frequent value).

2. ORDINAL: Comparisons are possible using operations '<' and '>'.  
   Non-parametric statistics can be used.  The usual measure of location is
   the MEDIAN (value with 50% of cases above & 50% below).

3. INTERVAL: Addition and subtraction are allowed.  So the ordinary
   ARITHMETIC-MEAN can be calculated and most standard statistical techniques
   can be used.

4. RATIO: Multiplication and division are allowed.  So the 
   GEOMETRIC-MEAN can be calculated.  Most physical and chemical measurements
   are at this level.
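The levels are cumulative: each one permits the operations and statistics of the levels below it. A sketch using Python's statistics module (the Level enum and the function are hypothetical names for illustration) returns every measure of location that is meaningful at a given level. median_low is used so the ordinal median is always an observed value, since averaging the two middle values would require addition, which ORDINAL data do not support.

```python
import statistics
from enum import IntEnum


class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def valid_locations(values, level):
    """All measures of location that are meaningful at the given measurement level."""
    result = {"mode": statistics.mode(values)}                # '=' only
    if level >= Level.ORDINAL:
        result["median"] = statistics.median_low(values)      # needs '<' and '>'
    if level >= Level.INTERVAL:
        result["arithmetic_mean"] = statistics.mean(values)   # needs '+' and '-'
    if level >= Level.RATIO:
        result["geometric_mean"] = statistics.geometric_mean(values)  # needs '*' and '/'
    return result


cloud_types = ["cirrus", "nimbus", "cirrus"]
beaufort = [3, 5, 5, 7]
print(valid_locations(cloud_types, Level.NOMINAL))
print(valid_locations(beaufort, Level.ORDINAL))
```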

Here are some meteorological examples:

1. NOMINAL: Cloud Type (e.g. 1=cirrus, 2=nimbus, etc.)

2. ORDINAL: Beaufort Wind Scale (from 0=calm to 12=hurricane).

3. INTERVAL: Temperature in Celsius. 

4. RATIO: Temperature in Kelvin. It makes sense to say that 200 K is twice
   the temperature of 100 K.

I am trying to think of a better example of an INTERVAL variable.  The above
temperature example is confusing in that it is the unit which makes it
INTERVAL, not the nature of the variable itself.  Perhaps a better example
would be altitude measured relative to an arbitrary datum whose absolute
altitude (height above standard sea-level) is unknown.
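The altitude example can be checked with a little arithmetic. In the sketch below, two heights are measured from the same arbitrary datum, and the datum is then moved by an unknown constant (the values are hypothetical). The difference between the heights is unchanged, but their ratio is not, which is exactly why such a variable is INTERVAL rather than RATIO. Celsius behaves the same way, with a shift of 273.15 to Kelvin.

```python
def difference_and_ratio(a, b, datum_shift):
    """Difference and ratio of two heights after moving the datum by datum_shift."""
    a_shifted, b_shifted = a + datum_shift, b + datum_shift
    return a_shifted - b_shifted, a_shifted / b_shifted


site_a, site_b = 200.0, 100.0  # metres above an arbitrary datum (hypothetical)
for shift in (0.0, 50.0, 1000.0):
    print(shift, difference_and_ratio(site_a, site_b, shift))
```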

> Section 24: Time axes

> > If the unit is a day then there should be a fixed number (31 for 'normal'
> > calendars such as Gregorian) days in each month.  The time coordinate
> > variable should have a missing value for each day which does not exist in the
> > calendar used.  I think this obviates the need for the 'calendar' global
> > attribute and allows for most kinds of calendars without having to hard-code
> > them into a standard. [HD]
> 
> This would deal with the particular case of calculating the interval between
> two dates when a time axis at daily intervals is provided. I am not sure that
> counting the non-missing days between two points in a vector would be more
> convenient than working it out using a calendar-dependent algorithm, although
> it would be more general, I agree. However, it would not help if you did not
> wish to provide time coordinates at daily intervals. What if I have time
> coordinates at monthly intervals? To indicate the lengths of the months, would
> I have to pad out the coordinate vector, and presumably the data too, with
> missing data values at daily intervals i.e. approximately 30 times more missing
> data than genuine data? Not only would wasted space be added to the file, but
> it could easily be misunderstood, no matter how explicit the convention is
> made.

I suggest storing monthly data as follows:

dimensions:
    month = 120;
variables:
    short length(month);
        length:units = "days";
    float temperature(month);
data:
    length = 31, 28, 31, 30, 31, ...
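A sketch of how the 'length' values might be generated and then used, assuming the series starts in January of a given year and follows the Gregorian calendar (Python's calendar module). With lengths stored, an application can form day-weighted annual means without any calendar knowledge of its own. The function names and start year are hypothetical.

```python
import calendar


def month_lengths(start_year, n_months):
    """Values for the 'length' variable: days in each month from January of start_year."""
    lengths = []
    for k in range(n_months):
        year, month = start_year + k // 12, k % 12 + 1
        lengths.append(calendar.monthrange(year, month)[1])  # (weekday, ndays)
    return lengths


def annual_means(values, lengths):
    """Day-weighted mean of each complete 12-month block."""
    means = []
    for i in range(0, len(values) - 11, 12):
        v, w = values[i:i + 12], lengths[i:i + 12]
        means.append(sum(x * n for x, n in zip(v, w)) / sum(w))
    return means


length = month_lengths(1997, 120)  # the 120 months in the CDL above: 10 years
```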

> Certainly, monthly and yearly mean data are among the most important types of
> climate data, so it is crucial to keep the representation of such data as
> simple and natural as possible, while representing them.  But I think it is
> good to avoid units of months and years. Although the udunits unit of "months"
> has a precise meaning (30.4368 days), this is probably not what you intend, and
> could lead applications to make mistakes if they do not check carefully what
> the intention is.

I do not see what the problem is with 'year' and 'month' in this context.  All
that matters is that there are 12 months in a year, a fact with which udunits
agrees!

> Section 32: Missing values in a data variable
> 
> > I think that the data should be checked against the "missing_value" *before*
> > unpacking. [JS]
> 
> Yes, you may well be correct. Thanks.

Of course you can use missing_value however you like in SPECIFIC
applications.  But the netCDF User's Guide now states that GENERIC
applications should use the valid range (as defined by valid_range or
valid_min/max), not missing_value.  (I confess that you have me to blame
for this change. You may want to throw something in the direction of
Australia, so I am donning my helmet as follows:  [(:-) )
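A minimal sketch of that generic rule, combined with JS's point that the test is applied to the packed values before unpacking. Attributes are modelled as a plain dict (an assumption for illustration, not a netCDF library API), None stands in for a masked value, and other details such as _FillValue are ignored.

```python
def unpack_with_valid_range(packed, attrs):
    """Mask packed values outside the valid range, then apply scale_factor/add_offset."""
    if "valid_range" in attrs:
        vmin, vmax = attrs["valid_range"]
    else:
        vmin = attrs.get("valid_min", float("-inf"))
        vmax = attrs.get("valid_max", float("inf"))
    scale = attrs.get("scale_factor", 1)
    offset = attrs.get("add_offset", 0)
    # The range test uses the packed value p, before any scaling is applied.
    return [p * scale + offset if vmin <= p <= vmax else None for p in packed]


attrs = {"valid_range": (0, 1000), "scale_factor": 0.1, "add_offset": 0.0}
temperatures = unpack_with_valid_range([-1, 0, 500, 32767], attrs)
```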

Harvey Davies, CSIRO Mathematical and Information Sciences,
723 Swanston Street, Carlton, Victoria 3053, Australia            
Email: harvey.davies@cmis.csiro.au
Phone: +61 3 9282 2623 or +61 3 9239 4556
  Fax: +61 3 9282 2600