I agree that the CF conventions provide very useful metadata, so yes, supporting them is a great idea. But people can use those conventions without using netCDF: Zarr and plain HDF5, for instance, or plain in-memory data. But it sounds like you are on the right track with your storage_adaptor class.

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

On Feb 25, 2020, at 12:09 PM, John Buonagurio <jbuonagurio@xxxxxxxxxxxx> wrote:

On Mon, 24 Feb 2020 at 18:27 Chris Barker <chris.barker@xxxxxxxx> wrote:

> Sounds interesting. One suggestion: it's a subtle distinction, but rather than an "interface to netCDF", consider building a library for working with data that conforms to the netCDF data model, with netCDF-C as one back-end. This is how xarray is built, and it opens the door to other file formats, like Zarr or even GRIB.

Thanks Chris. I could isolate most netCDF-C API functions in a 'storage adaptor' class for future extensibility. The I/O API is basically the same across N-dimensional array formats, and many libraries (e.g. bjoern-andres/marray, xtensor) provide BLAS-like slicing over contiguous storage. Efficient views over chunked storage are more difficult, but this can be done with a generic caching iterator, similar to the approach used by nccopy. I'm working on a C++ implementation.

The problem is figuring out how to index and slice the array, since metadata encoding schemes differ vastly between formats. Flexible slicing is the primary goal of this library. Consider how difficult this would be with the Unidata netCDF libraries (from the code example):

```
auto slice = tcw.select(
    ncpp::selection<date::sys_days>{"time", start, end, 2},
    ncpp::selection<double>{"latitude", 77.5, 80},
    ncpp::selection<double>{"longitude", 7.5, 10}
);

// tcw(2002-07-01 12:00,80,7.5)   = -23261
// tcw(2002-07-01 12:00,80,10)    = -23675
// tcw(2002-07-01 12:00,77.5,7.5) = -23473
// tcw(2002-07-01 12:00,77.5,10)  = -23216
// ...
```

The CF conventions are the only major standard that can unambiguously associate an array with labeled axes (coordinate variables), coordinate chains (the `instance_dimension` attribute), and related variables (the `ancillary_variables` attribute). CF conventions are currently defined only for the netCDF data model, and there is not yet a standardized, portable metadata mapping from the netCDF data model to other backends. For example, if you convert a GRIB-2 file to netCDF using the Unidata CDM, ecCodes, or wgrib2, you may get very different results, including mislabeled variables. There are just too many edge cases.

xarray allows more control over dataset indexing and value conversion. While I recognize that this is useful, I initially want to prioritize simple, unambiguous operations on standards-compliant files. For example, date/time conversion should work the same way it does in ncdump, and coordinate variables have to meet certain preconditions. CF compliance also makes high-level, automatic indexing possible for discrete sampling geometries, using the `featureType` and `cf_role` attributes to determine the logical structure.

--
John Buonagurio
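To make the "storage adaptor" idea discussed above concrete, here is a minimal sketch of what such an abstraction might look like, assuming a hypothetical `storage_adaptor` interface with a netCDF-C backend. The class and method names are illustrative only, not ncpp's actual API:

```
// Minimal sketch of a backend-neutral storage adaptor (hypothetical API).
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

#include <netcdf.h>

// Everything format-specific lives behind this interface.
struct storage_adaptor {
    virtual ~storage_adaptor() = default;

    // Shape of the named variable.
    virtual std::vector<std::size_t> shape(const std::string& var) const = 0;

    // Read a hyperslab: per-dimension start/count, values converted to double.
    virtual void read(const std::string& var,
                      const std::vector<std::size_t>& start,
                      const std::vector<std::size_t>& count,
                      double* out) const = 0;
};

// netCDF-C backend: the only class that touches <netcdf.h>.
class netcdf_adaptor final : public storage_adaptor {
public:
    explicit netcdf_adaptor(const std::string& path) {
        check(nc_open(path.c_str(), NC_NOWRITE, &ncid_));
    }
    ~netcdf_adaptor() override { nc_close(ncid_); }

    std::vector<std::size_t> shape(const std::string& var) const override {
        int varid = 0, ndims = 0;
        check(nc_inq_varid(ncid_, var.c_str(), &varid));
        check(nc_inq_varndims(ncid_, varid, &ndims));
        std::vector<int> dimids(ndims);
        check(nc_inq_vardimid(ncid_, varid, dimids.data()));
        std::vector<std::size_t> lens(ndims);
        for (int i = 0; i < ndims; ++i)
            check(nc_inq_dimlen(ncid_, dimids[i], &lens[i]));
        return lens;
    }

    void read(const std::string& var,
              const std::vector<std::size_t>& start,
              const std::vector<std::size_t>& count,
              double* out) const override {
        int varid = 0;
        check(nc_inq_varid(ncid_, var.c_str(), &varid));
        check(nc_get_vara_double(ncid_, varid, start.data(),
                                 count.data(), out));
    }

private:
    static void check(int status) {
        if (status != NC_NOERR)
            throw std::runtime_error(nc_strerror(status));
    }
    int ncid_ = -1;
};
```

A Zarr or HDF5 backend would then just implement the same two virtual methods over a different storage engine, leaving the slicing and CF-interpretation layers untouched.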
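For contrast, here is a rough sketch of the same coordinate-based selection written directly against the netCDF-C API. The file name `tcw.nc` is hypothetical, the variable and dimension names come from the example above, and the time-axis lookup is elided because it would additionally require parsing the time `units` attribute; error checking is omitted for brevity:

```
// Hand-rolled coordinate-based selection with netCDF-C (sketch only;
// error checking omitted for brevity).
#include <netcdf.h>

#include <cstdio>
#include <vector>

// Scan a 1-D coordinate variable for the index range covering [lo, hi].
// Assumes the coordinate is monotonically increasing; a real implementation
// must also handle decreasing axes (e.g. latitude stored 90..-90).
static void coord_range(int ncid, const char* name, double lo, double hi,
                        size_t* start, size_t* count) {
    int varid = 0, dimid = 0;
    size_t len = 0;
    nc_inq_varid(ncid, name, &varid);
    nc_inq_dimid(ncid, name, &dimid);     // coordinate variable: same name
    nc_inq_dimlen(ncid, dimid, &len);
    std::vector<double> coords(len);
    nc_get_var_double(ncid, varid, coords.data());
    size_t first = 0, last = len - 1;
    while (first < len && coords[first] < lo) ++first;
    while (last > first && coords[last] > hi) --last;
    *start = first;
    *count = last - first + 1;
}

int main() {
    int ncid = 0, varid = 0;
    nc_open("tcw.nc", NC_NOWRITE, &ncid);    // hypothetical file name
    nc_inq_varid(ncid, "tcw", &varid);

    // One hand-written lookup per axis; this is the bookkeeping that
    // ncpp::selection performs internally.
    size_t start[3], count[3];
    coord_range(ncid, "latitude", 77.5, 80.0, &start[1], &count[1]);
    coord_range(ncid, "longitude", 7.5, 10.0, &start[2], &count[2]);
    start[0] = 0;                            // first time step only: a real
    count[0] = 1;                            // version must decode the time
                                             // "units" attribute by hand.

    std::vector<double> slice(count[0] * count[1] * count[2]);
    nc_get_vara_double(ncid, varid, start, count, slice.data());

    for (double v : slice) std::printf("%g\n", v);
    nc_close(ncid);
    return 0;
}
```

Every one of these scans hides an opportunity for off-by-one and axis-direction mistakes, which is exactly the difficulty the selection API above is meant to absorb.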
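Finally, a small sketch of how the CF attributes mentioned above (`featureType`, `ancillary_variables`, and friends) can be read with the netCDF-C attribute API to discover a file's logical structure. The file name `profiles.nc` and variable name `temperature` are hypothetical:

```
// Sketch: discovering CF structure through attributes (hypothetical file).
#include <netcdf.h>

#include <cstdio>
#include <string>

// Read a text attribute into a std::string; returns empty if absent.
static std::string get_text_att(int ncid, int varid, const char* name) {
    size_t len = 0;
    if (nc_inq_attlen(ncid, varid, name, &len) != NC_NOERR || len == 0)
        return {};
    std::string value(len, '\0');
    nc_get_att_text(ncid, varid, name, &value[0]);
    return value;
}

int main() {
    int ncid = 0, varid = 0;
    nc_open("profiles.nc", NC_NOWRITE, &ncid);  // hypothetical DSG file

    // The global featureType attribute identifies the sampling geometry
    // (e.g. "timeSeries", "profile", "trajectory").
    std::printf("featureType: %s\n",
                get_text_att(ncid, NC_GLOBAL, "featureType").c_str());

    // Per-variable attributes link auxiliary coordinates and related data.
    nc_inq_varid(ncid, "temperature", &varid);  // hypothetical variable
    std::printf("coordinates: %s\n",
                get_text_att(ncid, varid, "coordinates").c_str());
    std::printf("ancillary_variables: %s\n",
                get_text_att(ncid, varid, "ancillary_variables").c_str());

    nc_close(ncid);
    return 0;
}
```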