
Re: [netcdf-java] Erroneous data from linked HDF files

  • To: Christian Ward-Garrison <cwardgar@xxxxxxxx>
  • Subject: Re: [netcdf-java] Erroneous data from linked HDF files
  • From: Christopher Mueller <cmueller@xxxxxxxxxxxxxx>
  • Date: Fri, 8 Aug 2014 21:52:21 +0000

Sounds good, thanks Christian.

Best,
Chris

----- Reply message -----
From: "Christian Ward-Garrison" <cwardgar@xxxxxxxx>
To: "Christopher Mueller" <cmueller@xxxxxxxxxxxxxx>
Cc: "netcdf-java@xxxxxxxxxxxxxxxx" <netcdf-java@xxxxxxxxxxxxxxxx>
Subject: [netcdf-java] Erroneous data from linked HDF files
Date: Fri, Aug 8, 2014 16:52

Hi Chris,

It's definitely a good idea to emit a warning rather than silently returning 
bad data. I've created a JIRA issue for this problem, which you can follow if 
you wish: https://bugtracking.unidata.ucar.edu/browse/TDS-584

Cheers,
Christian


On Tue, Aug 5, 2014 at 7:49 AM, Christopher Mueller 
<cmueller@xxxxxxxxxxxxxx> wrote:
Hi Christian,

I actually prefer normal files myself, but had a need to use the files from the 
NASA OceanColor site, some of which are provided as .main + subordinate 
(linked) files.  I have used a subordinate file structure with HDF5 in the 
past, but when I did so I was working directly with the HDF files (via their 
HDF Java API), so the linking wasn’t an issue.  The primary reason I’m aware of 
for using subordinates is to keep the size of any single file smaller – though 
I think this is a somewhat antiquated reason that’s a holdover from the days of 
2GB file limits.

As to my particular problem, I’ve been able to incorporate the aforementioned 
HDF Java library into our application, which has allowed us to read the 
linked fileset without issue.  The downside is that we incur a requirement 
for platform-specific binaries, but we don’t have much of an alternative! :)  
Fortunately, we’re able to segregate the code into a “pre-process”, which means 
we don’t need to worry about distributing the platform-specific portions.

It’s understandable that there is no support for linked HDF files in the 
NetCDF-Java library; as you said, it’s probably not a frequently required 
feature.  However, it may be worth finding a way to at least recognize that a 
particular dataset is backed by linked files so that an appropriate error can 
be thrown.  My concern is that, as it stands now, the NetCDF-Java library 
returns data without any indication that the data is incorrect.  While in 
theory someone should know what they’re dealing with and recognize that the 
data is incorrect, I can envision a scenario where it becomes a problem.
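
A minimal sketch of the kind of check described above, assuming the only 
available signal is the OceanColor naming convention seen in this thread (a 
".main" file with ".x00", ".x01", ... siblings in the same directory). The 
class and method names are hypothetical and not part of NetCDF-Java; the check 
inspects file names only, not the HDF4 structures that actually record the 
links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; not part of NetCDF-Java. */
final class LinkedHdfCheck {

    /**
     * Returns the files named base.xNN (two digits) that sit next to a
     * "base.main" file, sorted by name. Returns an empty list if the path
     * does not end in ".main" or no such siblings exist.
     */
    static List<Path> subordinatesOf(Path mainFile) throws IOException {
        String name = mainFile.getFileName().toString();
        if (!name.endsWith(".main")) {
            return Collections.emptyList();
        }
        String prefix = name.substring(0, name.length() - ".main".length()) + ".x";
        Path dir = mainFile.toAbsolutePath().getParent();
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String n = p.getFileName().toString();
                // Assumed naming convention: <base>.x followed by two digits.
                if (n.startsWith(prefix) && n.substring(prefix.length()).matches("\\d{2}")) {
                    found.add(p);
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    /** Throws if the file looks like the head of a linked fileset. */
    static void requireStandalone(Path file) throws IOException {
        List<Path> subs = subordinatesOf(file);
        if (!subs.isEmpty()) {
            throw new IOException(file + " appears to have " + subs.size()
                + " subordinate file(s); a reader without linked-file support"
                + " may return incorrect data");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: LinkedHdfCheck <file>");
            return;
        }
        requireStandalone(Paths.get(args[0]));
        System.out.println("No subordinate files found next to " + args[0]);
    }
}
```

Calling requireStandalone before opening a file would turn the silent failure 
into an explicit error, at the cost of relying on file names alone.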

Best,
Chris

From: Christian Ward-Garrison <cwardgar@xxxxxxxx>
Date: Friday, August 1, 2014 at 7:17 PM
To: Christopher Mueller <cmueller@xxxxxxxxxxxxxx>
Cc: "netcdf-java@xxxxxxxxxxxxxxxx" <netcdf-java@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] Erroneous data from linked HDF files

Hi Chris,

First off, let me just say that this is an absolutely fantastic bug report. I 
wish I had better news for you, but the simple answer is that NetCDF-Java 
doesn't support linked HDF files. Frankly, you're the first user that's even 
mentioned them to us. Is there a particular reason that you prefer linked files 
to normal files?

Regards,
Christian


On Tue, Jul 15, 2014 at 1:29 PM, Christopher Mueller 
<cmueller@xxxxxxxxxxxxxx> wrote:

tl;dr There appears to be a bug in NetCDF Java when reading linked HDF4 files: 
data read from the linked file(s) is erroneous.

Resources

  *   ToolsUI
  *   HDFView
  *   The files mentioned below can be retrieved directly from OceanColor
      <http://oceancolor.gsfc.nasa.gov/cgi/l3/A20021822013212.L3b_MC_RRS.main.bz2?sub=bin>
      (one at a time), or (for convenience) as one tar.gz file from here
      <https://drive.google.com/uc?id=0B6UT7Mn4GZQhMjdLNDBBMFE0TTA&export=download>

Details

I'm reading data from the Aqua MODIS L3 Binned products available from the NASA 
OceanColor <http://oceancolor.gsfc.nasa.gov/> website. It should be noted that 
these files are HDF4 (4.2.9 according to the NetCDF Java ncdump output). Many 
of the products, such as chlorophyll, Particulate Inorganic Carbon, and Sea 
Surface Temperature, come as a single file. NetCDF Java reads these files 
without any difficulty.

However, one of the datasets of interest is the Remote Sensing Reflectance 
data, which is NOT provided as a single file, but as a "main" file and a set of 
subordinate files which are read via the "main" file as needed (see here for 
more information <http://oceancolor.gsfc.nasa.gov/PRODUCTS/modis_binned.html>):

  *   A20021822013212.L3b_MC_RRS.main
  *   A20021822013212.L3b_MC_RRS.x00
  *   A20021822013212.L3b_MC_RRS.x01
  *   A20021822013212.L3b_MC_RRS.x02
  *   A20021822013212.L3b_MC_RRS.x03
  *   A20021822013212.L3b_MC_RRS.x04
  *   A20021822013212.L3b_MC_RRS.x05
  *   A20021822013212.L3b_MC_RRS.x06
  *   A20021822013212.L3b_MC_RRS.x07
  *   A20021822013212.L3b_MC_RRS.x08
  *   A20021822013212.L3b_MC_RRS.x09
  *   A20021822013212.L3b_MC_RRS.x10
  *   A20021822013212.L3b_MC_RRS.x11

NetCDF Java (via ToolsUI) loads the .main file without issue, and permits 
reading of data variables (e.g. Rrs_412) without raising any errors. However, 
the data returned is not accurate. Below is a comparison of the data returned 
by ToolsUI and the same data returned by HDFView (which uses the HDF-java JNI 
<http://www.hdfgroup.org/products/java/JNI/> library):

Retrieving the first 10 values for variable "Rrs_412"

HDFView

Screen Capture <http://cl.ly/WZnD>

Opening the .main file in HDFView and looking at the Rrs_412 dataset gives the 
following values, which differ greatly from the ToolsUI output further below:


0.0055423053, 0.0106070135, 0.006894292, -0.0040368317, -0.0020879991, 
-0.0020279996, 0.009794002, 0.011879213, 0.010874448, 0.012330733


ToolsUI

Screen Capture <http://cl.ly/WZMW>

Opening the .main file and performing an Ncdump Data of variable: 
"Level-3_Binned_Data/Rrs_412(0:10:1).Rrs_412_sum"

Returns:


float Rrs_412_sum;

 data:

  {1.86057E-40, 9.403955E-38, 6.4099753E-10, 2.6076459E-9, 1.0297978E21, 
5.6431478E-11, 0.0, -2.9699963E36, 4.59183E-40, 3.67343E-40, 2.60329423E11}


Also, in ToolsUI, the other data variables (e.g. angstrom, aot_869 and Rrs_*) 
display values very similar to those of Rrs_412 (most are identical). This is 
not the case in HDFView.

Incidentally, reading the data via OceanColor's SeaDas 
<http://seadas.gsfc.nasa.gov/> application (which uses NetCDF Java under the 
hood) results in the same data as ToolsUI.
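
The difference between the two outputs quoted above can also be shown 
mechanically. The sketch below is only a heuristic sanity check, not anything 
NetCDF Java does: it counts values that are NaN, infinite, subnormal, or larger 
in magnitude than a caller-supplied bound. The bound of 1.0 is an assumption 
chosen for these reflectance values, and the class name is hypothetical. 
Subnormal floats and wildly varying exponents are typical of bytes being 
reinterpreted as floats, but a clean result does not prove the data is correct.

```java
/** Hypothetical sanity check; the thresholds are assumptions. */
final class FloatPlausibility {

    /** Counts values that are NaN, infinite, subnormal, or exceed maxAbs in magnitude. */
    static int countSuspicious(float[] values, float maxAbs) {
        int suspicious = 0;
        for (float v : values) {
            float a = Math.abs(v);
            // Nonzero but below the smallest normal float: a subnormal value.
            boolean subnormal = a != 0.0f && a < Float.MIN_NORMAL;
            if (Float.isNaN(v) || Float.isInfinite(v) || subnormal || a > maxAbs) {
                suspicious++;
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Values copied from the HDFView and ToolsUI listings in this message.
        float[] hdfView = {0.0055423053f, 0.0106070135f, 0.006894292f,
            -0.0040368317f, -0.0020879991f, -0.0020279996f, 0.009794002f,
            0.011879213f, 0.010874448f, 0.012330733f};
        float[] toolsUi = {1.86057E-40f, 9.403955E-38f, 6.4099753E-10f,
            2.6076459E-9f, 1.0297978E21f, 5.6431478E-11f, 0.0f,
            -2.9699963E36f, 4.59183E-40f, 3.67343E-40f, 2.60329423E11f};
        System.out.println("HDFView suspicious: " + countSuspicious(hdfView, 1.0f));
        System.out.println("ToolsUI suspicious: " + countSuspicious(toolsUi, 1.0f));
    }
}
```

With the bound set to 1.0, this flags 6 of the 11 ToolsUI values (three 
subnormals and three very large magnitudes) and none of the HDFView values.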

Wrap-up

The evidence above appears to indicate a bug in NetCDF Java: data read from 
linked HDF files through the .main file is incorrect.

Does anyone have any idea:

a) what could be causing the issue?
b) how it could be addressed?



Thanks in advance,
Chris

_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: 
http://www.unidata.ucar.edu/mailing_lists/