On Thu, Jul 7, 2016 at 6:27 PM, Roy Mendelssohn - NOAA Federal <roy.mendelssohn@xxxxxxxx> wrote:

> Thank you very much Jeff. I think I'm too far to be able to explain
> myself. Perhaps this is the wrong list for this question, but I sent it
> in the hope that there is someone who has a deep understanding of netcdf
> data and uses R. Let me tell the story more simply. Assume that you read
> a numeric vector of data from a netcdf file:
>
> data <- c(9.1999979, 8.7999979, 7.9999979, 3.0999980, 6.1000018,
>           10.1000017, 10.4000017, 9.2000017)
>
> You know that the values above are a model output, and you also know
> that, physically, the first and last values must be equal -- but somehow
> they are not.

Classic floating point precision issues -- nothing to do with netcdf or R,
really. I think your data provider should have rounded before writing the
file, but what can you do?

> And now, you want to use a "periodic" spline for the values above:
>
> spline(1:8, data, method = "periodic")
>
> Voila! The spline method throws a warning message: "spline: first and
> last y values differ - using y[1] for both".

Actually, it seems that, warning aside, the spline function is doing the
right thing :-) -- though ideally it would let the user specify a precision
with which to check for "equality" -- you almost never want to check
equality of floating point values directly.

> Then I went on digging and discovered 2 attributes in the netcdf file:
> "precision = 2" and "least_significant_digit = 1". And I also found their
> definitions at [1].

Interesting -- something like that really should be in CF ....

> precision -- number of places to right of decimal point that are
> significant, based on packing used. Type is short.

Yeach! Using "right of the decimal point" rather than some number of
significant figures is pretty limiting (what if you have large-magnitude
numbers?).

> least_significant_digit -- power of ten of the smallest decimal place in
> unpacked data that is a reliable value. Type is short.

This sure sounds like the same thing -- with the same limitations, unless
it can be negative, in which case you could be specifying large-magnitude
numbers.

According to:

https://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html

packing involves storing values as integers in fixed point:

  unpacked_data_value = packed_data_value * scale_factor + add_offset

(the "+ >" in the original looks like a typo or quoting artifact). Anyway,
this scheme allows a value of any magnitude to be stored, but the NCEP
definitions seem to only support order-1 values. This is really a question
for NCEP. I can't find the reference in the docs at that link (no, I didn't
dig deep), so maybe there is more there, but from what the OP posted: note
that this is about how the data were packed, rather than how accurate the
data were/are in the first place. Which is odd, because if you pack and
unpack the data in the same way, then you should get the same values back,
which was not the case here. That indicates to me that the difference is in
the actual data, not a result of the packing method. So precision and
least_significant_digit are actually irrelevant to the OP's issue :-)

Nonetheless, I understand the confusion -- these seem to be the same thing,
and they are not consistent: precision = 2 seems to mean that you can trust
the first two digits after the decimal point -- i.e. down to the hundredths
place -- so you'd want to round to 2 decimal places (round(x, 2) in Python,
and the same in R). But least_significant_digit = 1 seems to mean that the
least significant digit is the tenths place -- one digit after the decimal
point -- in which case you would round to one digit: round(x, 1).

However, in the OP's case, the first and last values are the same to 5
digits after the decimal point, so going with round(x, 2) or 3 or 4 or 5
would all work. Note that I see a lot of the digits "999979" in there,
which looks like a binary representation issue (for maybe .9?), which makes
this data look "good" to 5 digits to me -- where "good" means re-creating
the packed data values, not the accuracy of the data in the first place.
NCEP should enhance those docs :-) I'd add an example -- worth a thousand
words!
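Something along these lines, maybe (in R, since that's what the OP is
using -- just a sketch, and the 1e-5 tolerance is an arbitrary choice for
illustration):

  # the OP's values, as read from the netcdf file
  data <- c(9.1999979, 8.7999979, 7.9999979, 3.0999980,
            6.1000018, 10.1000017, 10.4000017, 9.2000017)

  # never compare floats for exact equality -- use a tolerance
  isTRUE(all.equal(data[1], data[length(data)], tolerance = 1e-5))  # TRUE

  # round to the advertised precision ("precision = 2" -> 2 decimal
  # places); this makes the first and last values bit-identical ...
  rounded <- round(data, 2)
  rounded[1] == rounded[length(rounded)]                            # TRUE

  # ... so the periodic spline no longer complains
  s <- spline(1:8, rounded, method = "periodic")

Whether 1, 2, or 5 decimal places is the "right" amount of rounding is, of
course, exactly the question to put to NCEP.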
> Please, do not condemn me, English is not my main language :). At this
> point, as a scientist, what would you do according to the explanations
> above? I think I didn't exactly understand the difference between
> precision and least_significant_digit. One says "significant" and the
> latter says "reliable". Should I round the numbers to 2 decimal places or
> to 1 decimal place after the decimal point?

If the packing and unpacking are done the same way (which they pretty much
have to be), then you'll get the exact same floating point values out if
the inputs were the same -- so the difference between the first and last
values was in the original data, and is not an artifact of packing (see the
little round-trip sketch at the end of this message). I suspect that,
despite the wording in the docs, "precision" in this case refers to the
precision of the original data, not to a limitation of the packing scheme.
Is that data even packed in the original file?

So I'd probably round to 2 digits after the decimal point. Even better, get
some clarification from NCEP.

The data come out that way because of the way R encodes floating point
numbers; R does the same thing as every other system .... and netcdf
itself, if those data are stored as floats to begin with.

> But as the user later wrote:
>
> For instance, if you check the header information of the omega.2015.nc
> file, it says:
>
> $ ncdump -h omega.2015.nc
> ...
> omega:precision = 3s ;
> omega:least_significant_digit = 3s ;
>
> and if you check the output of rhum.2015.nc:
>
> $ ncdump -h rhum.2015.nc
> ...
> rhum:precision = 2s ;
> rhum:least_significant_digit = 0s ;

This is starting to look to me like, despite the definitions, they are
trying to capture significant figures here. i.e.: precision means the
number of sig figs, and least_significant_digit is telling you the
magnitude of the numbers. You really need to ask someone who is involved
in generating these files!

-CHB

> If you have a good answer, please reply all so that the original poster
> can see the response.

I don't seem to have the OP's email in this thread...

-CHB
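PS: a tiny illustration of the pack/unpack round trip mentioned above (the
scale_factor and add_offset values here are made up, purely for the sake of
the example):

  # made-up packing parameters, just for illustration
  scale_factor <- 0.01
  add_offset   <- 0.0

  pack   <- function(x) as.integer(round((x - add_offset) / scale_factor))
  unpack <- function(p) p * scale_factor + add_offset

  x <- c(9.2, 9.2)        # suppose the model really did produce equal values
  p <- pack(x)
  p[1] == p[2]                   # identical inputs pack to identical ints: TRUE
  unpack(p)[1] == unpack(p)[2]   # ... and unpack to identical floats: TRUE

  # so if the unpacked first and last values differ, the difference was
  # already in the data before packing -- it is not an artifact of the
  # packing scheme itself.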
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@xxxxxxxx