Re: [netcdf-java] Dataset aggregation by globbing

Hi John,

> My first reaction is to roll your code into the ncml package, and have 
> openGlob() write NcML and then process it, and then enrich NcML
> processing, so that that functionality is also available in a more explicit 
> way. But I'd have to look more at what you've done.

I like this idea.  I'm having to write quite a bit of code to do the
aggregation and I'd like to get rid of this and use the NcML stuff
instead.  I wonder if there's a way to detect whether the glob
expansion results in an FMRC or just a simple aggregation along the
time dimension?

> BTW, how do you identify the time coord vs forecast time? Are you assuming CF 
> conventions?

Each file matching the glob pattern gets read.  For each variable in
each file I look at the list of Dates (there is usually just one dimension
in each file) using CoordinateAxis1DTime.getTimeDates().  I maintain a
hash in which the Date is the key and the value is a class containing
the filename, variable name and the time index in the file that
corresponds with this Date.  If the Date already exists as a key, I
keep the entry that is associated with the smaller time index (which I
assume is the shortest forecast time).  Hence I end up with a hash of
unique Dates, each of which maps to the file, variable and t index
with the shortest forecast time - i.e. the "best timeseries".
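
To make the bookkeeping concrete, here is a minimal sketch of that hash
using only the Java standard library.  The names BestTimeseriesIndex,
Location, addFile and lookup are made up for illustration and are not
taken from ncWMS or NetCDF-Java.  The sketch assumes the Dates for each
file have already been read (in practice via
CoordinateAxis1DTime.getTimeDates()) and are passed in as an array.

```java
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative "best timeseries" index; all names here are hypothetical. */
class BestTimeseriesIndex {

    /** Where the data for one Date lives: file, variable, and time index in that file. */
    static final class Location {
        final String filename;
        final String variableName;
        final int timeIndex;

        Location(String filename, String variableName, int timeIndex) {
            this.filename = filename;
            this.variableName = variableName;
            this.timeIndex = timeIndex;
        }
    }

    // A TreeMap keeps the Dates sorted, so iteration follows the time axis.
    private final Map<Date, Location> byDate = new TreeMap<>();

    /**
     * Registers the time values of one variable in one file.
     * Assumption: timeDates[i] is the Date at time index i of that file.
     */
    void addFile(String filename, String variableName, Date[] timeDates) {
        for (int i = 0; i < timeDates.length; i++) {
            Location candidate = new Location(filename, variableName, i);
            // If the Date is already present, keep whichever entry has the
            // smaller time index (assumed to be the shorter forecast time).
            byDate.merge(timeDates[i], candidate,
                    (existing, incoming) ->
                            incoming.timeIndex < existing.timeIndex ? incoming : existing);
        }
    }

    /** Returns the stored location for a Date, or null if no file covers it. */
    Location lookup(Date date) {
        return byDate.get(date);
    }

    /** Number of unique Dates in the aggregated time axis. */
    int size() {
        return byDate.size();
    }
}
```

When two runs overlap, the Date they share resolves to the file in
which it has the smaller time index, and Dates found in only one file
are kept as they are.  On a tie the sketch keeps the entry that was
added first, which is an arbitrary choice.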

Cheers, Jon

On Thu, Sep 25, 2008 at 2:29 AM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
> Hi Jon:
>
> This sounds like a nice idea, which we chatted about briefly at GO-ESSP.
>
> I could see an API like NetcdfDataset.openGlob(glob, type). The tricky part 
> is to define what it should do in various cases. I'm guessing you are handling 
> the common cases, which is quite nice to make so easy. The optional type is 
> to give a hint for doing something different, but I'm not yet sure what's 
> needed.
>
> My first reaction is to roll your code into the ncml package, and have 
> openGlob() write NcML and then process it, and then enrich NcML processing, 
> so that that functionality is also available in a more explicit way. But I'd 
> have to look more at what you've done.
>
> I don't think it's an IOSP, but perhaps you have another way of looking at it.
>
> My fmrc implementation seriously needs refactoring, so I'd like to see what 
> you've done there also. BTW, how do you identify the time coord vs forecast 
> time? Are you assuming CF conventions?
>
> Jon Blower wrote:
>> Dear all (esp. John),
>>
>> We use NcML a lot for both file aggregation and for "fixing" metadata
>> problems in underlying NetCDF files - it's a great technology.
>> However, it can be an inconvenience to create an NcML file simply to
>> aggregate "well-behaved" files.  In our ncWMS we allow users to
>> specify a group of files using glob expressions, e.g. "/path/to/*.nc"
>> or even more complex things like "/path/to/200?/*/foo.nc".  This
>> simply unions the matching files together along the time axis.  It
>> allows files to contain different combinations of variables.
>> Internally, the system creates some kind of hash map, so that when a
>> user requests a particular variable at a particular time, the
>> aggregation knows which actual file, and which time index within the
>> file, is being requested.
>>
>> We have found this to be very useful.  I wonder if it would be a good
>> idea to integrate this capability into the NetCDF-Java libraries so
>> that users can open an aggregation by running
>> NetcdfDataset.openDataset("/path/to/*.nc") or similar?  What do others
>> think?
>>
>> Our code is available for stealing, but it might need some work to
>> satisfy more use cases.  In particular, for a forecast model run
>> collection (fmrc) our code automatically generates the "best
>> timeseries" but doesn't allow access to other things like the run
>> dates.  I could have a go at creating an IOSP, if this is a good way
>> to begin the integration.
>>
>> Cheers, Jon
>>
>



-- 
Dr Jon Blower
Technical Director, Reading e-Science Centre
Environmental Systems Science Centre
University of Reading
Harry Pitt Building, 3 Earley Gate
Reading RG6 6AL. UK
Tel: +44 (0)118 378 5213
Fax: +44 (0)118 378 6413
j.d.blower@xxxxxxxxxxxxx
http://www.nerc-essc.ac.uk/People/Staff/Blower_J.htm

