I agree that the CF conventions provide very useful metadata, so yes, supporting them is a great idea. But people can use those conventions without using netCDF: Zarr and plain HDF5, for instance, or plain in-memory data. But it sounds like you are on the right track with your storage_adaptor class.

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

On Feb 25, 2020, at 12:09 PM, John Buonagurio <jbuonagurio@xxxxxxxxxxxx> wrote:

On Mon, 24 Feb 2020 at 18:27 Chris Barker <chris.barker@xxxxxxxx> wrote:

> Sounds interesting. One suggestion: it's a subtle distinction, but rather than an "interface to netCDF", consider building a library for working with data that conforms to the netCDF data model, with netCDF-C as one back-end. This is how xarray is built, and it opens the door to other file formats, like Zarr or even GRIB.

Thanks Chris. I could isolate most netCDF-C API functions in a 'storage adaptor' class for future extensibility. The I/O API is basically the same across N-dimensional array formats, and many libraries (e.g. bjoern-andres/marray, xtensor) provide BLAS-like slicing over contiguous storage. Efficient views over chunked storage are more difficult, but this can be done with a generic caching iterator, similar to the approach used by nccopy. I'm working on a C++ implementation.

The problem is figuring out how to index and slice the array, since metadata encoding schemes differ vastly between formats. Flexible slicing is the primary goal of this library. Consider how difficult this would be with the Unidata netCDF libraries (from the code example):

```
auto slice = tcw.select(
    ncpp::selection<date::sys_days>{"time", start, end, 2},
    ncpp::selection<double>{"latitude", 77.5, 80},
    ncpp::selection<double>{"longitude", 7.5, 10}
);

// tcw(2002-07-01 12:00,80,7.5)   = -23261
// tcw(2002-07-01 12:00,80,10)    = -23675
// tcw(2002-07-01 12:00,77.5,7.5) = -23473
// tcw(2002-07-01 12:00,77.5,10)  = -23216
// ...
```

The CF conventions are the only major standard that can unambiguously associate an array with labeled axes (coordinate variables), coordinate chains (the `instance_dimension` attribute), and related variables (the `ancillary_variables` attribute). CF conventions are currently defined only for the netCDF data model, and there is not yet a standardized, portable metadata mapping from the netCDF data model to other backends. For example, if you convert a GRIB-2 file to netCDF using the Unidata CDM, ecCodes, or wgrib2, you may get very different results, including mislabeled variables. There are just too many edge cases.

xarray allows more control over dataset indexing and value conversion. While I recognize that this is useful, I initially want to prioritize simple, unambiguous operations on standards-compliant files. For example, date/time conversion should work the same way it does in ncdump, and coordinate variables have to meet certain preconditions. CF compliance also makes high-level, automatic indexing possible for discrete sampling geometries, using the `featureType` and `cf_role` attributes to determine the logical structure.

--
John Buonagurio
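To make the "storage adaptor" idea discussed above concrete, here is a minimal sketch of what such an abstraction might look like, assuming a hypothetical `storage_adaptor` interface with a netCDF-C backend. The class and method names are illustrative only, not ncpp's actual API:

```
// Minimal sketch of a backend-neutral storage adaptor (hypothetical API).
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

#include <netcdf.h>

// Everything format-specific lives behind this interface.
struct storage_adaptor {
    virtual ~storage_adaptor() = default;

    // Shape of the named variable.
    virtual std::vector<std::size_t> shape(const std::string& var) const = 0;

    // Read a hyperslab: per-dimension start/count, values converted to double.
    virtual void read(const std::string& var,
                      const std::vector<std::size_t>& start,
                      const std::vector<std::size_t>& count,
                      double* out) const = 0;
};

// netCDF-C backend: the only class that touches <netcdf.h>.
class netcdf_adaptor final : public storage_adaptor {
public:
    explicit netcdf_adaptor(const std::string& path) {
        check(nc_open(path.c_str(), NC_NOWRITE, &ncid_));
    }
    ~netcdf_adaptor() override { nc_close(ncid_); }

    std::vector<std::size_t> shape(const std::string& var) const override {
        int varid = 0, ndims = 0;
        check(nc_inq_varid(ncid_, var.c_str(), &varid));
        check(nc_inq_varndims(ncid_, varid, &ndims));
        std::vector<int> dimids(ndims);
        check(nc_inq_vardimid(ncid_, varid, dimids.data()));
        std::vector<std::size_t> lens(ndims);
        for (int i = 0; i < ndims; ++i)
            check(nc_inq_dimlen(ncid_, dimids[i], &lens[i]));
        return lens;
    }

    void read(const std::string& var,
              const std::vector<std::size_t>& start,
              const std::vector<std::size_t>& count,
              double* out) const override {
        int varid = 0;
        check(nc_inq_varid(ncid_, var.c_str(), &varid));
        check(nc_get_vara_double(ncid_, varid, start.data(),
                                 count.data(), out));
    }

private:
    static void check(int status) {
        if (status != NC_NOERR)
            throw std::runtime_error(nc_strerror(status));
    }
    int ncid_ = -1;
};
```

A Zarr or HDF5 backend would then just implement the same two virtual methods over a different storage engine, leaving the slicing and CF-interpretation layers untouched.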
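For contrast, here is a rough sketch of the same coordinate-based selection written directly against the netCDF-C API. The file name `tcw.nc` is hypothetical, the variable and dimension names come from the example above, and the time-axis lookup is elided because it would additionally require parsing the time `units` attribute; error checking is omitted for brevity:

```
// Hand-rolled coordinate-based selection with netCDF-C (sketch only;
// error checking omitted for brevity).
#include <netcdf.h>

#include <cstdio>
#include <vector>

// Scan a 1-D coordinate variable for the index range covering [lo, hi].
// Assumes the coordinate is monotonically increasing; a real implementation
// must also handle decreasing axes (e.g. latitude stored 90..-90).
static void coord_range(int ncid, const char* name, double lo, double hi,
                        size_t* start, size_t* count) {
    int varid = 0, dimid = 0;
    size_t len = 0;
    nc_inq_varid(ncid, name, &varid);
    nc_inq_dimid(ncid, name, &dimid);     // coordinate variable: same name
    nc_inq_dimlen(ncid, dimid, &len);
    std::vector<double> coords(len);
    nc_get_var_double(ncid, varid, coords.data());
    size_t first = 0, last = len - 1;
    while (first < len && coords[first] < lo) ++first;
    while (last > first && coords[last] > hi) --last;
    *start = first;
    *count = last - first + 1;
}

int main() {
    int ncid = 0, varid = 0;
    nc_open("tcw.nc", NC_NOWRITE, &ncid);    // hypothetical file name
    nc_inq_varid(ncid, "tcw", &varid);

    // One hand-written lookup per axis; this is the bookkeeping that
    // ncpp::selection performs internally.
    size_t start[3], count[3];
    coord_range(ncid, "latitude", 77.5, 80.0, &start[1], &count[1]);
    coord_range(ncid, "longitude", 7.5, 10.0, &start[2], &count[2]);
    start[0] = 0;                            // first time step only: a real
    count[0] = 1;                            // version must decode the time
                                             // "units" attribute by hand.

    std::vector<double> slice(count[0] * count[1] * count[2]);
    nc_get_vara_double(ncid, varid, start, count, slice.data());

    for (double v : slice) std::printf("%g\n", v);
    nc_close(ncid);
    return 0;
}
```

Every one of these scans hides an opportunity for off-by-one and axis-direction mistakes, which is exactly the difficulty the selection API above is meant to absorb.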
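Finally, a small sketch of how the CF attributes mentioned above (`featureType`, `ancillary_variables`, and friends) can be read with the netCDF-C attribute API to discover a file's logical structure. The file name `profiles.nc` and variable name `temperature` are hypothetical:

```
// Sketch: discovering CF structure through attributes (hypothetical file).
#include <netcdf.h>

#include <cstdio>
#include <string>

// Read a text attribute into a std::string; returns empty if absent.
static std::string get_text_att(int ncid, int varid, const char* name) {
    size_t len = 0;
    if (nc_inq_attlen(ncid, varid, name, &len) != NC_NOERR || len == 0)
        return {};
    std::string value(len, '\0');
    nc_get_att_text(ncid, varid, name, &value[0]);
    return value;
}

int main() {
    int ncid = 0, varid = 0;
    nc_open("profiles.nc", NC_NOWRITE, &ncid);  // hypothetical DSG file

    // The global featureType attribute identifies the sampling geometry
    // (e.g. "timeSeries", "profile", "trajectory").
    std::printf("featureType: %s\n",
                get_text_att(ncid, NC_GLOBAL, "featureType").c_str());

    // Per-variable attributes link auxiliary coordinates and related data.
    nc_inq_varid(ncid, "temperature", &varid);  // hypothetical variable
    std::printf("coordinates: %s\n",
                get_text_att(ncid, varid, "coordinates").c_str());
    std::printf("ancillary_variables: %s\n",
                get_text_att(ncid, varid, "ancillary_variables").c_str());

    nc_close(ncid);
    return 0;
}
```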