Re: [netcdf-java] Dataset aggregation by globbing

To: "John Caron" <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] Dataset aggregation by globbing
From: "Jon Blower" <jdb@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Sep 2008 15:02:30 +0100

Hi John,

> My first reaction is to roll your code into the ncml package, and have 
> openGlob() write NcML and then process it, and then enrich NcML
> processing, so that that functionality is also available in a more explicit 
> way. But I'd have to look more at what you've done.

I like this idea.  I'm having to write quite a bit of code to do the
aggregation and I'd like to get rid of this and use the NcML stuff
instead.  I wonder if there's a way to detect whether the glob
expansion results in a FMRC or just a simple aggregation along the
time dimension?

> BTW, how do you identify the time coord vs forecast time? Are you assuming CF 
> conventions?

Each file matching the glob pattern gets read.  For each variable in
each file I look at the list of Dates (there is usually just one time
dimension in each file) using CoordinateAxis1DTime.getTimeDates().  I
maintain a hash in which the Date is the key and the value is a class
containing the filename, variable name and the time index in the file
that corresponds to this Date.  If the Date already exists as a key, I
keep the entry that is associated with the smaller time index (which I
assume corresponds to the shortest forecast time).  Hence I end up with
a hash of unique Dates, each of which maps to the file, variable and t
index with the shortest forecast time - i.e. the "best timeseries".
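
The bookkeeping above could be sketched roughly as follows.  This is
only an illustration and not the actual ncWMS code: the class and
method names (BestTimeseriesSketch, Location, addFile, lookup) are
made up for this example, the Dates are passed in as a plain list
rather than read through CoordinateAxis1DTime.getTimeDates(), and
keeping one Date-keyed map per variable name is an assumption on top
of the description above.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BestTimeseriesSketch {

    /** Hypothetical value class: where one time step of one variable lives. */
    static final class Location {
        final String filename;
        final String variableName;
        final int timeIndex;

        Location(String filename, String variableName, int timeIndex) {
            this.filename = filename;
            this.variableName = variableName;
            this.timeIndex = timeIndex;
        }
    }

    // Assumption: one Date-keyed map per variable name, so that files with
    // different combinations of variables do not overwrite each other.
    private final Map<String, TreeMap<Date, Location>> index = new HashMap<>();

    /** Registers the time axis values of one variable in one file. */
    void addFile(String filename, String variableName, List<Date> dates) {
        TreeMap<Date, Location> byDate =
                index.computeIfAbsent(variableName, k -> new TreeMap<>());
        for (int t = 0; t < dates.size(); t++) {
            Location existing = byDate.get(dates.get(t));
            // If the Date is already present, keep the entry with the smaller
            // time index, assumed to be the shortest forecast time.
            if (existing == null || t < existing.timeIndex) {
                byDate.put(dates.get(t), new Location(filename, variableName, t));
            }
        }
    }

    /** Returns the file and time index for a variable at a Date, or null. */
    Location lookup(String variableName, Date date) {
        TreeMap<Date, Location> byDate = index.get(variableName);
        return byDate == null ? null : byDate.get(date);
    }
}
```

A TreeMap is used here only so that the Dates for each variable come
back in time order; a plain hash would do for the lookup itself.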

Cheers, Jon

On Thu, Sep 25, 2008 at 2:29 AM, John Caron <caron@xxxxxxxxxxxxxxxx> wrote:
> Hi Jon:
>
> This sounds like a nice idea, which we chatted about briefly at GO-ESSP.
>
> I could see an API like NetcdfDataset.openGlob(glob, type). The tricky part 
> is to define what it should do in various cases. I'm guessing you are handling 
> the common cases, which is quite nice to make so easy. The optional type is 
> to give a hint for doing something different, but I'm not yet sure what's 
> needed.
>
> My first reaction is to roll your code into the ncml package, and have 
> openGlob() write NcML and then process it, and then enrich NcML processing, 
> so that that functionality is also available in a more explicit way. But I'd 
> have to look more at what you've done.
>
> I don't think it's an IOSP, but perhaps you have another way of looking at it.
>
> My fmrc implementation seriously needs refactoring, so I'd like to see what 
> you've done there also. BTW, how do you identify the time coord vs forecast 
> time? Are you assuming CF conventions?
>
> Jon Blower wrote:
>> Dear all (esp. John),
>>
>> We use NcML a lot for both file aggregation and for "fixing" metadata
>> problems in underlying NetCDF files - it's a great technology.
>> However, it can be an inconvenience to create an NcML file simply to
>> aggregate "well-behaved" files.  In our ncWMS we allow users to
>> specify a group of files using glob expressions, e.g. "/path/to/*.nc"
>> or even more complex things like "/path/to/200?/*/foo.nc".  This
>> simply unions the matching files together along the time axis.  It
>> allows files to contain different combinations of variables.
>> Internally, the system creates some kind of hash map, so that when a
>> user requests a particular variable at a particular time, the
>> aggregation knows which actual file, and which time index within the
>> file, is being requested.
>>
>> We have found this to be very useful.  I wonder if it would be a good
>> idea to integrate this capability into the NetCDF-Java libraries so
>> that users can open an aggregation by running
>> NetcdfDataset.openDataset("/path/to/*.nc") or similar?  What do others
>> think?
>>
>> Our code is available for stealing, but it might need some work to
>> satisfy more use cases.  In particular, for a forecast model run
>> collection (fmrc) our code automatically generates the "best
>> timeseries" but doesn't allow access to other things like the run
>> dates.  I could have a go at creating an IOSP, if this is a good way
>> to begin the integration.
>>
>> Cheers, Jon
>>
>



-- 
Dr Jon Blower
Technical Director, Reading e-Science Centre
Environmental Systems Science Centre
University of Reading
Harry Pitt Building, 3 Earley Gate
Reading RG6 6AL. UK
Tel: +44 (0)118 378 5213
Fax: +44 (0)118 378 6413
j.d.blower@xxxxxxxxxxxxx
http://www.nerc-essc.ac.uk/People/Staff/Blower_J.htm
