Re: ncdigest V1 #501

To: Konrad Hinsen <hinsen@xxxxxxxxxxxxxxx>
Subject: Re: ncdigest V1 #501
From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
Date: Tue, 29 Jun 1999 13:43:38 -0600
Konrad Hinsen <hinsen@xxxxxxxxxxxxxxx> wrote:

> But I must read along both major dimensions, depending on the type of
> analysis I am doing. From your explanation it seems that one of the two
> access types will always be very slow. Shouldn't it be possible for
> the netCDF library to organize the data in such a way that a scan
> along any dimension is doable with acceptable efficiency? For example,
> each contiguous file block could correspond to a subarray of
> approximately equal extent along each dimension.

The data organization you describe is known as chunking or blocking.
It balances access along the "chunked" dimensions at the cost of
slower access along the dimension a contiguous layout would favor.
NetCDF does not implement chunking, but HDF 4.1 and HDF5 support
chunking for data layout.  There are some other advantages to chunking
(the possibility of compression with direct access to data subsets,
efficient support for multiple unlimited dimensions) that make it an
attractive alternative to the standard layout for multidimensional
data.  For more details on chunking performance, see the HDF5 section
on "Dataset Chunking Issues" at

  http://hdf2.ncsa.uiuc.edu/HDF5/doc/Chunking.html

Support for chunking in netCDF would require a different data format.
Implementation of the netCDF interface on the HDF5 format would
provide this capability and is still in our eventual plans, if we can
find the resources for the necessary development ...
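
As a rough illustration of the chunking trade-off, here is a small
Python sketch that counts how many separate contiguous pieces of a file
must be touched to scan one full line of a 2-D array along either
dimension.  It is only a model: it assumes row-major storage and that
each chunk is read whole, and it ignores caching.  The function names
and the 1000 x 1000 array with 32 x 32 chunks are made up for the
example; none of this is netCDF or HDF API.

```python
def ceil_div(n, d):
    """Integer ceiling of n / d for positive integers."""
    return -(-n // d)


def contiguous_pieces(shape, axis):
    """Contiguous pieces touched when scanning one full line along `axis`
    of a 2-D array stored row-major in a single contiguous block."""
    n_rows, n_cols = shape
    # Along the last axis the whole line is one contiguous run.
    # Along the first axis each element lies in a different row, so
    # every element is a separate piece (unless rows hold one element).
    if axis == 1 or n_cols == 1:
        return 1
    return n_rows


def chunked_pieces(shape, chunk, axis):
    """Chunks touched when scanning one full line along `axis`,
    assuming each chunk is stored contiguously and read whole."""
    return ceil_div(shape[axis], chunk[axis])


shape = (1000, 1000)   # hypothetical array size
chunk = (32, 32)       # hypothetical chunk size

for axis in (0, 1):
    print("axis", axis,
          "contiguous:", contiguous_pieces(shape, axis),
          "chunked:", chunked_pieces(shape, chunk, axis))
```

With these sizes the contiguous layout needs 1 piece along the favored
dimension and 1000 along the other, while the chunked layout needs 32
along either one.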

> Could I gain anything from not using an unlimited dimension? In some
> cases I know the final size before creating the file, and in others
> it might be worth making a fixed-size copy before some
> lengthy analysis.

Each fixed-size variable is stored contiguously.  Each record variable
is spread across all the records in a dataset, with only one record's
worth of data stored in each record.  So reading all the data in a
record variable (when there are multiple record variables) will in
general require more disk accesses than reading all the data in a
similarly shaped fixed-size variable.
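
The interleaving of record variables can be sketched in a few lines of
Python.  This is a simplified model rather than the actual file format:
it ignores the file header and any padding, and the variable names and
byte sizes are invented for the example.  It lists the byte extents
that hold one record variable and merges the extents that happen to be
adjacent, so the length of the result is the number of separate
contiguous reads needed.

```python
def record_variable_extents(var_sizes, name, num_records, data_start=0):
    """Return (offset, length) extents holding record variable `name`.

    var_sizes maps each record variable to its bytes per record, in
    definition order (hypothetical sizes; header and padding ignored).
    """
    record_size = sum(var_sizes.values())
    offset_in_record = 0
    for var, size in var_sizes.items():
        if var == name:
            break
        offset_in_record += size
    else:
        raise KeyError(name)
    return [(data_start + r * record_size + offset_in_record, var_sizes[name])
            for r in range(num_records)]


def merge_extents(extents):
    """Merge extents that are directly adjacent on disk."""
    merged = []
    for offset, length in extents:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged


# Two record variables: each record holds one slab of "temp", then "pres".
two_vars = {"temp": 400, "pres": 400}
print(merge_extents(record_variable_extents(two_vars, "temp", 3)))

# A single record variable: its slabs end up back to back.
one_var = {"temp": 400}
print(merge_extents(record_variable_extents(one_var, "temp", 3)))
```

In this model, reading "temp" takes one read per record when a second
record variable is interleaved with it, but a single read when it is
the only record variable, which is also what a fixed-size variable of
the same shape would need.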

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                     http://www.unidata.ucar.edu