Re: [thredds] aggregating on both time and time_run dimensions?

To: thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] aggregating on both time and time_run dimensions?
From: Antonio S. Cofiño <cofinoa@xxxxxxxxx>
Date: Tue, 12 Aug 2025 11:45:46 +0200

Hi John,

Just to clarify before suggesting a solution, could you share the CDL (or NcML) structure of one of your new daily files? In particular, I’d like to confirm whether each file contains only a single time_run value and multiple time values for that same day.


Thanks,
Antonio

On 12/8/25 0:39, John Maurer wrote:

Hi TDS folks,
We have a new use case for aggregating our FMRC collections, but I'm having difficulty implementing it. To save space, the files are now daily files rather than multi-day files. This avoids repeating the same day across multiple files and drastically cuts storage requirements over the long term. Instead, new data for a given day overwrite any previous file for that day. The result is essentially a "Best Time Series" (nowcasts), with only the latest handful of files containing forecasts of future days.
Inside the files, we are storing both "time" and "time_run" coordinate variables so that an end user will know when the model was run for each time step. Since the runtime is no longer in the filenames (the date in each filename indicates the day of its time steps), I am not employing the traditional FMRC aggregations via featureCollection. Instead, I'm trying to figure out how to write an NcML aggregation over a file scan that aggregates over both the "time" and "time_run" coordinate variables in the files to achieve an FMRC-like effect.
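To make the target concrete, the aggregated view described above can be modeled in a few lines of Python. This is a minimal sketch, not a TDS or NcML implementation: it assumes (which is what the file's CDL would confirm) that each daily file stores time_run with one value per time step, and DailyFile, best_time_series, and steps are hypothetical names, with plain lists standing in for the netCDF variables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DailyFile:
    """Hypothetical in-memory stand-in for one daily file, e.g. model_20250812.nc."""
    day: str                    # YYYYMMDD from the filename: the day of the time steps
    times: list[datetime]       # values of the "time" coordinate in this file
    time_runs: list[datetime]   # ASSUMED: one "time_run" value per time step


def best_time_series(files):
    """Join daily files end to end along "time" (joinExisting-style).

    Each day lives in exactly one file because newer runs overwrite the file
    for that day, so no time step appears twice and a plain concatenation in
    day order already gives the best-time-series view. time_run is carried
    along as a per-time-step variable rather than as a second dimension.
    """
    times, time_runs = [], []
    for df in sorted(files, key=lambda d: d.day):
        if len(df.times) != len(df.time_runs):
            raise ValueError(f"{df.day}: time and time_run lengths differ")
        times.extend(df.times)
        time_runs.extend(df.time_runs)
    return times, time_runs


def steps(day, n=3):
    """Hypothetical helper: n hourly time steps starting at 00:00 on `day`."""
    start = datetime.strptime(day, "%Y%m%d")
    return [start + timedelta(hours=h) for h in range(n)]


# Files listed out of order; 20250813 holds forecast steps from the 20250812 run.
run_0811 = datetime(2025, 8, 11)
run_0812 = datetime(2025, 8, 12)
files = [
    DailyFile("20250813", steps("20250813"), [run_0812] * 3),
    DailyFile("20250811", steps("20250811"), [run_0811] * 3),
    DailyFile("20250812", steps("20250812"), [run_0812] * 3),
]
times, time_runs = best_time_series(files)
```

In this model time_run is a variable along time rather than a separate dimension, so no second join is needed; whether the same holds for the real files depends on their actual structure.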
I know that nested NcML aggregations are possible, but I don't know how they might be used to aggregate over two time variables. Is there a way?
If I do this, then *time_run* (the outer aggregation) only gets the penultimate file's values:
        <aggregation dimName="time_run" type="joinExisting">
          <netcdf>
            <aggregation dimName="time" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc" subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>
And if I do this, then *time* (the outer aggregation) only gets the penultimate file's values:
        <aggregation dimName="time" type="joinExisting">
          <netcdf>
            <aggregation dimName="time_run" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc" subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>
Any ideas or suggestions on how this can be accomplished? As a fallback, I might have to use a more brute-force approach and add runtimes to the filenames (e.g., model_20250812_20250801.nc), where the second date indicates the time_run coordinate. But then it's no longer a simple overwrite of model_20250812.nc, and I would need to remove the prior day's runtime (e.g., model_20250812_20250731.nc) when saving a new runtime.
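The housekeeping in this fallback can be scripted. Below is a minimal Python sketch, assuming filenames of the form model_<day>_<runtime>.nc with both dates written as YYYYMMDD, as in the example above; FILENAME_RE and stale_runtime_files are hypothetical names. It only decides which files are superseded; deleting them (for example with pathlib.Path.unlink) after the new file is written is left to the caller. If there were several runs per day, the pattern would need an hour field as well.

```python
import re
from collections import defaultdict

# ASSUMED naming scheme for the fallback: model_<valid day>_<run day>.nc
FILENAME_RE = re.compile(r"^model_(\d{8})_(\d{8})\.nc$")


def stale_runtime_files(names):
    """Return the files superseded by a newer runtime for the same day.

    YYYYMMDD strings sort chronologically, so string comparison is enough
    to find the newest runtime. Names that do not match the assumed
    pattern are ignored.
    """
    runs_by_day = defaultdict(list)
    for name in names:
        m = FILENAME_RE.match(name)
        if m:
            day, run = m.groups()
            runs_by_day[day].append((run, name))
    stale = []
    for entries in runs_by_day.values():
        entries.sort()                                  # oldest runtime first
        stale.extend(fname for _, fname in entries[:-1])  # keep only the newest
    return sorted(stale)


names = [
    "model_20250812_20250731.nc",
    "model_20250812_20250801.nc",
    "model_20250813_20250801.nc",
    "README.txt",
]
print(stale_runtime_files(names))  # ['model_20250812_20250731.nc']
```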
Many thanks!,
John Maurer
Data System Engineer
Pacific Islands Ocean Observing System (PacIOOS)
University of Hawaii at Manoa
