Re: [Fwd: [Fwd: Forecast Model Run Collection Aggregation prototype available]]

To: John Caron <caron@xxxxxxxxxxxxxxxx>
Subject: Re: [Fwd: [Fwd: Forecast Model Run Collection Aggregation prototype available]]
From: "dan.swank" <Dan.Swank@xxxxxxxx>
Date: Wed, 16 Aug 2006 17:21:12 -0400

John Caron wrote the following on 8/16/2006 3:39 PM:
> Hi Dan:
> 
> dan.swank wrote:
> 
>> This will be a challenge for sure.
>> The NARR, for example, will be an aggregation of ~75000 grib files.
>> Stored in a basic ./YYYYMM/YYYYMMDD tree.  The recursive datasetScan
>> tag added recently helps a ton with this.  Some of our datasets have
>> forecast hours, some don't.  Doing n forecast hour aggregation across
>> the 00hr will help tremendously with all of them, however.
>> While it works wonderfully for NetCDF, I cannot see the NcML agg.
>> working with this set of data ~
>> mainly due to the changing reference times.
>>  
>>
> I think the FMRC will probably solve it. However, a 75,000 file
> aggregation will be a challenge. I'm actually pretty sure we can solve it
> (with enough server memory!) but it does worry me that with a single
> dods call, someone could make a request that requires opening 75,000
> files to satisfy. OTOH, if that's the service you want to provide, it
> sure is a lot better doing it on the server!!! Any thoughts?

Throttles... If the dev team could create an element to specify
the maximum size of a request in either bytes returned or
number of files accessed, that would be great.
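
To make the idea concrete, here is a rough sketch of the check such a
throttle would perform. This is not an existing server feature: the
class, field, and exception names are all hypothetical, and the limits
shown are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


class RequestTooLarge(Exception):
    """Raised when a request exceeds a configured limit (hypothetical)."""


@dataclass(frozen=True)
class Throttle:
    # Hypothetical per-request limits; None means "no limit".
    max_bytes: Optional[int] = None
    max_files: Optional[int] = None

    def check(self, bytes_returned: int, files_accessed: int) -> None:
        """Reject a request before any file is opened if it is over a limit."""
        if self.max_bytes is not None and bytes_returned > self.max_bytes:
            raise RequestTooLarge(
                f"{bytes_returned} bytes requested, limit is {self.max_bytes}"
            )
        if self.max_files is not None and files_accessed > self.max_files:
            raise RequestTooLarge(
                f"{files_accessed} files needed, limit is {self.max_files}"
            )


# Placeholder limits: 500 MB returned, 1000 files opened per request.
throttle = Throttle(max_bytes=500 * 1024 * 1024, max_files=1000)
```

A request spanning the whole NARR archive (~75000 files) would be
refused by `throttle.check(...)` up front, while a one-month subset
would pass.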
> 
> Looking at the NARR data:
>  - it looks like you have them divided by day, then all for the same month.
>  - it looks like all the time coordinates are either 0 or 3 hour offsets
> from run time.

The NARR is a reanalysis, so it contains variables defined either
at the instantaneous initial time or as a 0 to 3 hour
average, total, or other operation.

>  - what's the difference between narr_a and narr_b? Should they be
> combined or kept separate?

The differences are explained here:
http://nomads.ncdc.noaa.gov/data.php?name=narrdiffs

>  - I assume new files are added now and then? How often? Ever deleted?

New NARR comes in from NCEP on an irregular basis.  Typically,
this happens once a month or less often.  This archive is set to
grow indefinitely; the files are never deleted.
> 
>> According to NCEP, our NAM & GFS will soon be forced into GRIB2.
>> But NCDC-NOMADS NWP is currently entirely a GRIB-1 archive.
>> Only recently home-grown NCDC datasets are created in NetCDF.
>>
>> For NAM & GFS, we have about 6 months online, which comes out to
>> about 700 files when stripped to a single forecast time
>> (say 00hr) aggregation.  But there are 61 forecast times for GFS, and 21
>> for NAM.
>>  
>>
> Do you store each hour separately, or are all the forecast hours for a
> run in the same file?

We store them as one file per forecast hour, each containing all
parameters and vertical levels for that forecast hour.
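
For a sense of scale, a back-of-the-envelope count based on the numbers
quoted above. It assumes the ~700 figure applies to each model
separately and that every run has a file for every forecast time, which
may not hold exactly.

```python
# Approximate number of runs online per model (about 6 months), taken
# from the ~700 files in a single-forecast-time aggregation.
RUNS_ONLINE = 700

# Forecast times per run, as quoted in the thread.
FORECAST_TIMES = {"GFS": 61, "NAM": 21}


def files_in_full_aggregation(runs: int, forecast_times: int) -> int:
    """One file per forecast hour per run, so the counts multiply."""
    return runs * forecast_times


totals = {
    model: files_in_full_aggregation(RUNS_ONLINE, n)
    for model, n in FORECAST_TIMES.items()
}

for model, total in totals.items():
    print(f"{model}: about {total} files")
```

That comes to roughly 42700 files for GFS and 14700 for NAM if all
forecast hours are aggregated.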


-Dan
