Re: [netcdfgroup] NetCDF4 for Fusion Data

Hi John and Ed,

Thanks for your responses.

John Storrs wrote:
> > Further to my previous postings about 'unlimited' dimensions, I now
> > understand the semantics better, and it's apparent that there is a
> > mismatch with the needs of our application.
> >
> > As previously described, we need to archive say 96 digitizer channels
> > which have the same sample times but potentially different sample counts.
> > From a logical point of view, the channel measurements share a single
> > time dimension - some move further along it than others, that's all. They
> > should clearly all reference a single time coordinate variable.
To clarify, the case I'm describing is archiving digitizer data from a
5-second plasma shot. The data is acquired from the digitizers after each
shot (~3000 channels in total), and a number of archive files are written
and permanently stored. The data for each channel has a fixed size, known
at the time the archive file is written. The proposal to use an unlimited
time dimension here is a stratagem to allow a set of variables storing
arrays of different (fixed) lengths to reference a single logical
dimension and a single coordinate array. That's the requirement - if
there's a better way of doing it, I need to know.
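
For concreteness, a minimal sketch of the layout I have in mind (file,
dimension and variable names are illustrative, and error checking is
omitted):

    #include <netcdf.h>

    /* Two channels of different lengths sharing one unlimited time
     * dimension and one coordinate variable. */
    int main(void)
    {
        int ncid, time_dim, time_var, ch01, ch02;
        size_t start = 0, count;
        static double t[5000], a[5000], b[3000];

        nc_create("shot.nc", NC_NETCDF4, &ncid);
        nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
        nc_def_var(ncid, "time", NC_DOUBLE, 1, &time_dim, &time_var);
        nc_def_var(ncid, "ch01", NC_DOUBLE, 1, &time_dim, &ch01);
        nc_def_var(ncid, "ch02", NC_DOUBLE, 1, &time_dim, &ch02);
        nc_enddef(ncid);

        count = 5000;  /* ch01 fills the whole record */
        nc_put_vara_double(ncid, time_var, &start, &count, t);
        nc_put_vara_double(ncid, ch01, &start, &count, a);
        count = 3000;  /* ch02 stops short: the padding problem case */
        nc_put_vara_double(ncid, ch02, &start, &count, b);

        nc_close(ncid);
        return 0;
    }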

> > Also, we may want to stick with our present compression strategy for
> > time, storing it as a (sequence of) triple: start time, time increment,
> > and count. We might put these values in an attribute of the time
> > coordinate variable, leaving the variable itself empty.
> Storing data values in attributes is a bad idea (IMHO). What would be
> the motivation?
I was seeing them as similar to the scale_factor and add_offset
attributes, which we will use for the digitizer data. They would be
components of a new convention. In the usual simple case of a start time
of 0.0 seconds, a fixed sample clock period of P seconds (e.g. 1.0e-6),
and a maximum unlimited time dimension size T, the time coordinates can be
stored as the triple "0.0, P, T". This is a compression strategy. If we
want to store the actual coordinate data we'll have to use a double array
to avoid loss of precision (single-precision floats are too small).
Compressing with the shuffle and deflate filters reduces the data size by
~95% in some tests, but still leaves ~2MB. What's really needed for this
type of data is a differential filter which reduces it to the triples
described. Does HDF5 offer that? I haven't seen it.
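
A fragment of what that convention might look like (the attribute names
"start", "increment" and "count" are hypothetical, not an existing
convention; assumes ncid and time_var from a file still in define mode):

    /* Store the triple on the (empty) time coordinate variable; a
     * reader reconstructs time[i] = start + i*increment for
     * 0 <= i < count. */
    double t_start = 0.0, t_incr = 1.0e-6;   /* P = 1.0e-6 s */
    int    t_count = 5000000;                /* T            */

    nc_put_att_double(ncid, time_var, "start", NC_DOUBLE, 1, &t_start);
    nc_put_att_double(ncid, time_var, "increment", NC_DOUBLE, 1, &t_incr);
    nc_put_att_int(ncid, time_var, "count", NC_INT, 1, &t_count);

    /* If the full double array is stored instead, shuffle + deflate
     * can be enabled per variable while still in define mode: */
    nc_def_var_deflate(ncid, time_var, 1 /*shuffle*/, 1 /*deflate*/, 4);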

> > The 'unlimited' semantics go only half way to matching this
> > requirement. At the HDF5 storage level, all is well. H5dump shows that
> > the stored size of each variable is its own initialized size, not the
> > maximum initialized size across all the variables sharing the
> > dimension, which is evidently what the dimension itself is set to. So
> > far so good, but ncdump shows all the data padded to that maximum size,
> > reducing its usefulness. This is presumably because the dimension
> > provides the only size exposed by the API, unless I overlook something.
> > HDF5 knows about the initialized sizes, but NetCDF doesn't expose them,
> > so we cannot easily read the data and nothing but the data. Do you have
> > an initialized-size inquiry function tucked away somewhere, or do we
> > have to store the value as an attribute with each variable?
>
> I would store the "actual size" of each variable as another variable. If
> you are constantly changing it, don't store it as an attribute; if it
> only changes occasionally, an attribute would probably be OK.
>
> Since you want these variables to share the same time coordinate, you
> have to pay the price that all of them logically have the same length.
> Generic applications will work fine, seeing missing values. Your
> specialized programs can take advantage of the extra info and operate
> more efficiently.
>
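
For reference, a fragment showing how a reader might honour a per-variable
size stored as you suggest (the attribute name "actual_size" is made up;
assumes ncid and varid are already known, and <stdlib.h> for malloc):

    /* Read only the valid part of a channel, using a per-variable
     * "actual_size" attribute rather than the padded dimension length. */
    size_t start = 0, count;
    int    actual_size;
    double *data;

    nc_get_att_int(ncid, varid, "actual_size", &actual_size);
    data  = malloc(actual_size * sizeof(double));
    count = (size_t)actual_size;
    nc_get_vara_double(ncid, varid, &start, &count, data);
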
> > I don't think I want to explore VLEN to crack this, because it's new
> > and would complicate things. It seems to me that this is a use case
> > others will encounter, which needs a tidy solution. Any thoughts? I
> > have to present a strong case for NetCDF here next week, to counter an
> > HDF5 proposal which doesn't have this problem, though it has many
> > others.
> >
> > Another point: nc_inq_ncid returns NC_NOERR if the named group doesn't
> > exist. Is this intended?
> >
> > Regards
> > John
> >
> > On Tuesday 20 January 2009, John Storrs wrote:
> >> I've uncovered a couple of problems:
> >>
> >> (1) Variables in an explicitly defined group, using the same 'unlimited'
> >> dimension but of different initialized sizes, result in an HDF error
> >> when ncdump is run (without flags) on the generated NetCDF4 file. No
> >> problems are reported when the file is generated (all netcdf call return
> >> values are checked in the usual way). The dimension is defined in the
> >> root group. Try writing data of size S to one variable, and size < S to
> >> the next. This error isn't seen if the variables are all in the root
> >> group. In that case, ncdump fills all variables to the maximum size,
> >> which I suppose is a feature and not a bug. An ncdump flag to disable
> >> this feature would be useful.
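
For completeness, a fragment that triggers the ncdump error described in
(1) (group and variable names are illustrative; assumes ncid is a netCDF-4
file in define mode):

    /* Dimension in the root group, variables written to different
     * sizes in a child group: ncdump on the resulting file reports
     * an HDF error, while the same layout in the root group merely
     * pads the shorter variable. */
    int grp, time_dim, v1, v2;
    size_t start = 0, count;
    static double buf[100];

    nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dim);
    nc_def_grp(ncid, "digitizers", &grp);
    nc_def_var(grp, "v1", NC_DOUBLE, 1, &time_dim, &v1);
    nc_def_var(grp, "v2", NC_DOUBLE, 1, &time_dim, &v2);
    nc_enddef(ncid);

    count = 100;   /* size S              */
    nc_put_vara_double(grp, v1, &start, &count, buf);
    count = 60;    /* size < S: the trigger */
    nc_put_vara_double(grp, v2, &start, &count, buf);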



--
John Storrs, Experiments Dept      e-mail: john.storrs@xxxxxxxxxxxx
Building D3, UKAEA Fusion                               tel: 01235 466338
Culham Science Centre                                    fax: 01235 466379
Abingdon, Oxfordshire OX14 3DB              http://www.fusion.org.uk



