Hi Ed
Some comments on your response:
John Storrs wrote:
> > The 'unlimited' semantics go only half way to matching this requirement.
> > At the HDF5 storage level, all is well. H5dump shows that the stored size
> > of each variable is the initialized size, not the maximum initialized
> > size of all the variables to which the dimension is evidently set. So far
> > so good, but ncdump shows all the data padded to that size, reducing its
> > usefulness. This is presumably because the dimension provides the only
> > size exposed by the API, unless I overlook something. HDF5 knows about
> > the initialized sizes, but NetCDF doesn't expose them. So we cannot
> > easily read the data and nothing but the data. Do you have an initialized
> > size inquiry function tucked away somewhere, or do we have to store the
> > value as an attribute with each variable?
>
> If it is any consolation to you, netcdf-4 does not actually attempt to
> write or read the extra fill values.
Understood.
>
> That is, although the increase in the time dimension seems to cause
> all the variables that share this dimension to increase in size, in
> fact, no writes or reads take place for those other variables (as
> would happen with classic netcdf format). The netcdf-4 library just
> pretends that the other variables have increased in size, and, if you
> try and read such values, hands you arrays of the fill value.
> However, as John points out, the semantics of netCDF objects are such
> that, logically, all the variables share the dimension, and it must be
> the maximum size needed to hold data from any of the variables that
> share it.
Understood, but we will need a way of reading just the archived data.
Certainly we can store the size ourselves, in an attribute of each of these
variables. That would be another component of a new convention. But it would
be better if the NetCDF API exposed that information.
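
A minimal sketch of that attribute convention follows, in Python. It is a toy
in-memory model and does not call the NetCDF library: the Dataset and Variable
classes, the attribute name initialized_size and the fill value -9999.0 are all
assumptions made for illustration. It shows a padded read (what ncdump
displays) next to a read trimmed to the size recorded in the attribute.

```python
# Toy model of variables sharing one unlimited dimension.
# Nothing here is the NetCDF API; every name is hypothetical.

FILL = -9999.0                  # stand-in fill value (assumption)
SIZE_ATTR = "initialized_size"  # hypothetical per-variable attribute name


class Variable:
    def __init__(self, name):
        self.name = name
        self.data = []   # only the values actually written
        self.attrs = {}  # per-variable attributes


class Dataset:
    def __init__(self):
        self.variables = {}

    def add_variable(self, name):
        var = Variable(name)
        var.attrs[SIZE_ATTR] = 0
        self.variables[name] = var
        return var

    @property
    def dim_len(self):
        # The shared dimension is as long as the longest variable.
        return max((len(v.data) for v in self.variables.values()), default=0)

    def write(self, name, values):
        var = self.variables[name]
        var.data.extend(values)
        # Record the initialized size alongside the data.
        var.attrs[SIZE_ATTR] = len(var.data)

    def read_padded(self, name):
        # What a reader sees when only the dimension length is known:
        # the data followed by fill values up to the dimension length.
        var = self.variables[name]
        return var.data + [FILL] * (self.dim_len - len(var.data))

    def read_archived(self, name):
        # Use the stored attribute to return the data and nothing but the data.
        var = self.variables[name]
        return self.read_padded(name)[: var.attrs[SIZE_ATTR]]


ds = Dataset()
ds.add_variable("short_signal")
ds.add_variable("long_signal")
ds.write("short_signal", [1.0, 2.0])
ds.write("long_signal", [10.0, 20.0, 30.0, 40.0])

print(ds.read_padded("short_signal"))
print(ds.read_archived("short_signal"))
```

The cost of this approach is that every writer has to keep the attribute in
step with the data, which is why an inquiry function in the API would be
preferable.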
> > I don't think I want to explore VLEN to crack this, because it's new and
> > would complicate things. It seems to me that this is a use case others
> > will encounter, which needs a tidy solution. Any thoughts? I have to
> > present a strong case for NetCDF here next week, to counter an HDF5
> > proposal which doesn't have this problem, though it has many others.
>
> I would suggest that the HDF5 solution will probably involve VLENs, as
> they are natural for the data structure you describe. But are you
> content to always read/write the VLEN as a unit? That is, you can't
> read/write part of a VLEN, you have to do the whole VLEN at once. This
> might make it unsuitable.
Not necessary, because the variable data is actually fixed size - see my first
posting today.
I should make clear that the dimension issue under discussion applies to only
one potential application area of NetCDF in fusion research. The classic
model is already used successfully for archiving fusion modelling code data.
I'm trying to standardise across different application areas, to get the
benefits of uniformity (one interface to bind them!). I have to persuade
proponents of HDF5 here that NetCDF is better, because of its simpler,
higher-level interface.
Regards
John
--
John Storrs, Experiments Dept      e-mail: john.storrs@xxxxxxxxxxxx
Building D3, UKAEA Fusion                               tel: 01235 466338
Culham Science Centre                                    fax: 01235 466379
Abingdon, Oxfordshire OX14 3DB              http://www.fusion.org.uk