Konrad Hinsen <hinsen@xxxxxxxxxxxxxxx> wrote:

> But I must read along both major dimensions, depending on the type of
> analysis I am doing. From your explanation it seems that one of the two
> access types will always be very slow. Shouldn't it be possible for
> the netCDF library to organize the data in such a way that a scan
> along any dimension is doable with acceptable efficiency? For example,
> each contiguous file block could correspond to a subarray of
> approximately equal extent along each dimension.

The data organization you describe is known as chunking or blocking.
It balances the access along "chunked" dimensions at the cost of slower
access along a favored dimension. NetCDF does not implement chunking,
but HDF 4.1 and HDF5 support chunking for data layout.

There are some other advantages to chunking (the possibility of
compression with direct access to data subsets, efficient support for
multiple unlimited dimensions) that make it an attractive alternative
to the standard layout for multidimensional data. For more details on
chunking performance, see the HDF5 section on "Dataset Chunking Issues"
at

    http://hdf2.ncsa.uiuc.edu/HDF5/doc/Chunking.html

Support for chunking in netCDF would require a different data format.
Implementation of the netCDF interface on the HDF5 format would provide
this capability and is still in our eventual plans, if we can find the
resources for the necessary development ...

> Could I gain anything from not using an unlimited dimension? In some
> cases I know the final size before creating the file, and in others
> it might be worthwhile to make a fixed-size copy before some
> lengthy analysis.

Each fixed-size variable is stored contiguously. Each record variable
is spread across all the records in a dataset, with only one record's
worth of its data stored in each record. So reading all the data in a
record variable (when there are multiple record variables) will in
general require more disk accesses than reading all the data in a
similarly shaped fixed-size variable.

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                     http://www.unidata.ucar.edu
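
The layout trade-offs described in the message can be illustrated with a
small counting model. The Python sketch below is not from the original
message and does not call the netCDF or HDF libraries. The function names
(runs_row_major, chunks_touched, runs_for_variable) are hypothetical, and
the cost model is a deliberate simplification: it assumes a 2-D array and
counts each separate contiguous run or chunk touched as one disk access.

```python
from math import ceil


def runs_row_major(shape, axis):
    """Contiguous runs touched when reading one full line along `axis`
    of a 2-D array stored contiguously in row-major order (assumed model)."""
    rows, _cols = shape
    # Reading along axis 1 (one row) is a single contiguous run.
    # Reading along axis 0 (one column) touches one element in each row,
    # so it needs one separate run per row.
    return 1 if axis == 1 else rows


def chunks_touched(shape, chunk, axis):
    """Chunks touched when reading one full line along `axis` of a 2-D
    array stored as tiles of shape `chunk` (assumed model)."""
    # The line crosses the array once along `axis`, passing through
    # one chunk per chunk-length step in that direction.
    return ceil(shape[axis] / chunk[axis])


def runs_for_variable(nrecs, is_record, n_record_vars):
    """Contiguous runs needed to read ALL data of one variable, following
    the description in the message: fixed-size variables are contiguous,
    record variables hold one record's worth of data in each record."""
    if not is_record or n_record_vars == 1:
        # Nothing is interleaved between this variable's pieces.
        return 1
    # With several record variables, each record holds only one slice of
    # this variable, so every record is a separate run.
    return nrecs


if __name__ == "__main__":
    shape = (1000, 1000)
    chunk = (100, 100)
    print("row-major, along rows:   ", runs_row_major(shape, axis=1))
    print("row-major, along columns:", runs_row_major(shape, axis=0))
    print("chunked, along rows:     ", chunks_touched(shape, chunk, axis=1))
    print("chunked, along columns:  ", chunks_touched(shape, chunk, axis=0))
    print("fixed-size variable:     ", runs_for_variable(1000, False, 3))
    print("record variable:         ", runs_for_variable(1000, True, 3))
```

In this model the contiguous layout reads a row in 1 run but a column in
1000, while the 100 x 100 chunked layout touches 10 chunks in either
direction. That is the balance described above: access along the favored
dimension gets slower in exchange for comparable cost along both. Real
costs also depend on chunk size, caching, and the file system, which this
sketch ignores.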