NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.
To learn about what's going on, see About the Archive Site.
G'day all [Hi all netCDF users],

I have a few comments on this topic. (I have included a copy of Lindsay
Pender's email that did not get distributed to the netcdfgroup.) If you wish
to reply, please reply to the netcdfgroup so we can all be informed of this
discussion. I think there may be many interested parties in this group.

Cheers,
Phil Morgan

Additional comments at end by Lindsay Pender.

*******************************************************

LINDSAY PENDER WRITES:
======================

>From pender@xxxxxxxxxxx Mon Sep 14 21:14:10 1992
>Subject: NetCDF for underway oceanographic data storage
>
>I read with interest your mail message expressing your intention to use
>netCDF for underway data storage. I have also considered this approach for
>the same reasons you give, but came up with some conceptual difficulties
>when I was looking at ways to implement it. It may be that your data is
>different, but in our case we have data coming from many different sources,
>each with a different sampling rate. Some of our instruments are sampled at
>2.5 kHz, while others are as slow as once a minute. For an underway data
>storage system using netCDF, how do you store such data with only one
>'unlimited' dimension? What I have considered doing is to collect data from
>the various instruments into fixed-length blocks, and then after some
>suitable time write all of the data to a netCDF file with the now-known
>dimensions. Using this scheme, I would have to carry an extra variable for
>each instrument - the time stamp for each block.
>
>Any comments?
>
>Regards,
>
>Lindsay Pender

TIM HOLT WRITES:
================

>> OSU currently can manage its data by logging 1-minute averages for
>> all instruments. No one yet has asked for finer resolution from our
>> common-use equipment.
>> CTD, ADCP, and other such higher-resolution systems are managed and
>> logged by their own software and are currently independent of the new
>> netCDF system. Soon, though, I will need to merge in some finer-resolution
>> data (5-second GPS and ADCP). Here is my scheme, and I'm very curious
>> what kinds of alternatives others can suggest.
>>
>> I'll see if I can describe my idea with a CDL file. It may not be the
>> best way, but I guess it will work...
>>
>> <<< BEGIN multi_res.cdl >>>
>>
>> netcdf multires {
>>
>> dimensions:
>>     min_max_mean = 3;    // store 3 numbers: min, max, mean
>>     ten_hz = 600;        // number of 10.0 Hz samples in 1 minute
>>     five_hz = 300;       // number of 5.0 Hz samples in 1 minute
>>     twopoint5_hz = 150;  // number of 2.5 Hz samples in 1 minute
>>     one_hz = 60;         // number of 1.0 Hz samples in 1 minute
>>     five_second = 12;    // number of 0.2 Hz (5-second) samples in 1 minute
>>     time = unlimited;    // the "time" dimension
>>
>> variables:
>>     long  time(time);        // seconds since some fixed point in time
>>     float gps_lat(time);     // GPS latitude in sample period
>>     float gps_lon(time);     // GPS longitude in sample period
>>     short n_sats(time);      // number of satellites used in fix
>>     float raw_gps_lat(time, five_second);  // raw GPS latitude
>>     float raw_gps_lon(time, five_second);  // raw GPS longitude
>>     float sea_temp(time, min_max_mean);    // sea surface temperature
>>     float towed_ctd_temp(time, ten_hz);    // raw CTD temperature
>>     float towed_ctd_cond(time, ten_hz);    // raw CTD conductivity
>> }
>>
>> <<< END multi_res.cdl >>>
>>
>> The idea is to pick the least common denominator (1-minute data) and
>> pack anything at a finer resolution into a new dimension.
>>
>> I did try this scheme for a towed-vehicle logging/display system, but I
>> found the netCDF time overhead (on a PC) was too high for me to log raw
>> 24 Hz CTD data in real time. Too many variables to log -- more than in
>> the simple example above.
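Tim's packing idea can be sketched in plain Python (no netCDF library; the
function names here are illustrative, not from the original): pick the least
common denominator (1 minute) and reshape each finer-rate stream into
fixed-width rows, one row per record along the unlimited time dimension.

```python
def pack_minutes(samples, rate_hz):
    """Group a flat sample stream into 1-minute rows of width rate_hz * 60
    (e.g. 600 samples per row at 10 Hz, matching the ten_hz dimension)."""
    width = int(rate_hz * 60)
    if len(samples) % width:
        raise ValueError("stream is not a whole number of minutes")
    return [samples[i:i + width] for i in range(0, len(samples), width)]

def min_max_mean(samples, rate_hz):
    """Reduce each 1-minute row to the 3-element (min, max, mean) form,
    as in the min_max_mean dimension of the CDL above."""
    return [(min(row), max(row), sum(row) / len(row))
            for row in pack_minutes(samples, rate_hz)]
```

Each row of `pack_minutes` would become one record written along `time`; the
reduced form trades resolution for a fixed, small per-minute footprint.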
>> I still used the same idea, but went to a simpler ASCII file for quick
>> I/O.
>>
>> Comments???
>>
>> Tim Holt, OSU Oceanography
>> holtt@xxxxxxxxxxxx

Reading in data and saving to a file in real time will always be limited by
the sampling rate and the number of samples monitored. Saving directly to
netCDF format adds extra processing overhead. For fast sampling and/or a
large number of samples, it is best to save the "continuously" sampled data
records from an instrument directly to a file (ASCII, or binary [fastest]).
For example, read data into a record in Fortran (yes, I know it's an
extension) or a structure in C, etc., and write out the whole record in
binary format. This file of records acts as a buffer from which you can run
a program to convert the file of records into netCDF format. A picture of
this for 2 instruments follows (there could be N instruments, each with a
different number of component data elements).

+---------+
| instr#1 | -->> (Read&Save) -->> instr#1 file --> (convert) ---> netCDF
+---------+                       of records

+---------+
| instr#2 | -->> (Read&Save) -->> instr#2 file --> (convert) ---> netCDF
+---------+                       of records

If data is acquired at a relatively slow rate then you may well have plenty
of time to write directly to a netCDF format file. The netCDF output could
be a separate file for each instrument, or all data merged into a single
file.

Separate instrument log files (data at the same sampling rate in each file)
===========================================================================

If all data from one instrument is on the same time base then this is easy.
The time dimension can be set to "unlimited". Each instrument log file will
have its own time variable appropriate for the sampling rate. Comparisons
between different instruments will need to account for the different time
base in each file. This should be no problem, but we do have several
(instrument) files.
Other specialised instruments with their own data acquisition and data
storage formats (e.g. ADCP, CTD) could have their data converted to netCDF
files after acquisition is complete. If there are several data components at
different sampling rates, then data at the same sampling rate could be
grouped together in a file. Thus each file will have its own time base.

ONE "MERGED" FILE (data components with different sampling rates)
=================================================================

If data components are sampled at different rates but using the same clock,
then there will be a common denominator (common time base) and the method
suggested by Tim Holt IS EXCELLENT. Lindsay's concern about different
sampling rates can be accommodated by Tim's method as long as there is a
common clock from which the sampling rates are referenced.

lindsay>> Some of our instruments are sampled at 2.5 kHz, while others are
lindsay>> as slow as once a minute. For an underway data storage system
lindsay>> using netCDF how do you store such data with only one
lindsay>> 'unlimited' dimension?

The above case should encompass most common data acquisition situations.
However, if high-speed acquisition is sampled using different clocks, then
the streams do not have an exact common time base.

Lindsay Pender has 2 solutions
==============================

1. The easiest solution may be to record the time base for each data
component sampled at a different rate and from a different clock.

lindsay>> What I have considered doing, is to collect data from the various
lindsay>> instruments into fixed length blocks, and then after some suitable
lindsay>> time writing all of the data to a netCDF file with the now known
lindsay>> dimensions. Using this scheme, I would have to carry an extra
lindsay>> variable for each instrument - the time stamp for each block.

I believe that Lindsay is suggesting something like this ... (rough CDL)

dimensions:
    xsample_no = 300   // say, no. of samples in blocks of x
    ysample_no = 40    // say, no. of samples in blocks of y
    // These are the user-defined no. of blocks to read from each
    // instrument before writing out to a netCDF file
    indexx = 1000      // say
    indexy = 500       // say

variables:
    // Instrument #1 data
    float signalx(indexx, xsample_no, other dims)
    long  timex(indexx)    // time stamps for each block

    // Instrument #2 data
    float signaly(indexy, ysample_no, other dims)
    long  timey(indexy)    // time stamps for each block

This will require the acquisition program to count the number of samples and
write out a netCDF file at appropriate times. Application programs will need
to use the individual time stamps for each block of data from each
instrument. If data acquisition is fast and processor time is limited, it
may be necessary to write all data to a binary file and later convert it to
a netCDF file.

2. PADDING (info directly from Lindsay)

When sampling rates do not share a common clock, one could still use Tim
Holt's scheme by rounding up the block length for each (common) time
interval so that every sample from each instrument is guaranteed to fit
within the block. Note that the number of samples in consecutive blocks may
then differ, depending upon the relative timing of the block and instrument
sampling. This can be handled by using _FillValue for the unused samples in
each block.

============ end of file ============

=============================================================================
Phil Morgan          mail:  CSIRO Oceanography
    _--_|\                  GPO Box 1538,
   /      \                 Hobart Tas 7008, AUSTRALIA
   \_.--._/          email: morgan@xxxxxxxxxxxxxxxxxxx
      -----          phone: (002) 206236   +61 02 206236
      \   /          fax:   (002) 240530   +61 02 240530
       \*/
=============================================================================
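The padding scheme can be sketched in plain Python (the fill value and
function names here are assumed for illustration): round the block length up
past the worst case, keep one time stamp per block as in the
timex(indexx)/signalx(indexx, ...) layout, and mark the unused tail of each
block with a fill value, playing the role of netCDF's _FillValue.

```python
FILL_VALUE = -9999.0  # stand-in for the variable's _FillValue attribute

def pad_block(samples, block_len):
    """Pad one block of samples out to the rounded-up fixed length."""
    if len(samples) > block_len:
        raise ValueError("block_len must be rounded up past the worst case")
    return list(samples) + [FILL_VALUE] * (block_len - len(samples))

def make_blocks(blocks_with_times, block_len):
    """Turn a list of (time_stamp, samples) pairs into parallel arrays:
    one time stamp per block, and fixed-width padded blocks of data."""
    times = [t for t, _ in blocks_with_times]
    data = [pad_block(s, block_len) for _, s in blocks_with_times]
    return times, data
```

Readers then skip any sample equal to the fill value, so blocks holding
different numbers of real samples still share one fixed block dimension.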