NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.

Re: [thredds] aggregating on both time and time_run dimensions?

Hi John,

Just to clarify before suggesting a solution, could you share the CDL (or NcML) structure of one of your new daily files? In particular, I’d like to confirm whether each file contains only a single time_run value and multiple time values for that same day.

Thanks,
Antonio

On 12/8/25 0:39, John Maurer wrote:
Hi TDS folks,
We have a new use case for aggregating our FMRC collections, but I'm having difficulty implementing it. To save space, we now write daily files rather than multi-day files. Rather than repeating the same day across multiple files, new data for a given day overwrites any previous file for that day, which drastically reduces storage requirements over the long term. The result is essentially a "Best Time Series" (nowcasts), with only the latest handful of files containing forecasts for future days.

Inside the files, we are storing both "time" and "time_run" coordinate variables so that an end user will know when the model was run for each time step. Since the run time is no longer in the filenames (the date in each filename indicates the day covered by its time steps), I am not using the traditional FMRC aggregation via featureCollection. Instead, I'm trying to figure out how to build an NcML aggregation over a file scan that aggregates both the "time" and "time_run" coordinate variables to achieve an FMRC-like effect.
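For concreteness, the NcML below is a hypothetical description of one such daily file. It assumes time_run shares the time dimension (time_run(time)), so each time step records the run that produced it. The dimension lengths, units, and the temp variable are invented for illustration, and whether the real files look like this is exactly what Antonio's question about the CDL is meant to establish.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical structure of ONE daily file (e.g. model_20250812.nc).
     Dimension lengths, units, and the "temp" variable are illustrative only. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="time" length="24" isUnlimited="true" />
  <dimension name="lat" length="100" />
  <dimension name="lon" length="100" />

  <!-- Valid time of each step within this day. -->
  <variable name="time" shape="time" type="double">
    <attribute name="standard_name" value="time" />
    <attribute name="units" value="hours since 1970-01-01 00:00:00" />
  </variable>

  <!-- Assumed layout: the run (reference) time is stored per time step,
       sharing the "time" dimension rather than having its own dimension. -->
  <variable name="time_run" shape="time" type="double">
    <attribute name="standard_name" value="forecast_reference_time" />
    <attribute name="units" value="hours since 1970-01-01 00:00:00" />
  </variable>

  <!-- Hypothetical data variable on the time dimension. -->
  <variable name="temp" shape="time lat lon" type="float" />
</netcdf>
```

If time_run instead has its own dimension or is a scalar in each file, the aggregation behaves differently, which is why the file structure matters here.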

I know that nested NcML aggregations are possible, but I don't know how they might be used to aggregate over two time variables. Is there a way?

If I do this, then *time_run* (the outer aggregation) only gets the penultimate file's values:

        <aggregation dimName="time_run" type="joinExisting">
          <netcdf>
            <aggregation dimName="time" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc" subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>

And if I do this, then *time* (the outer aggregation) only gets the penultimate file's values:

        <aggregation dimName="time" type="joinExisting">
          <netcdf>
            <aggregation dimName="time_run" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc" subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>
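One possible explanation for the "penultimate file" symptom, offered as an assumption rather than a diagnosis: in a joinExisting aggregation, variables that lack the aggregation dimension are read from a single "typical" dataset, which in TDS is likely the penultimate file by default (the typicalDataset setting; worth verifying). If time_run is stored as time_run(time) in every file, or the files can be rewritten that way, a nested aggregation may not be needed at all, because joinExisting concatenates every variable whose outer dimension is the aggregation dimension. A minimal sketch reusing the original scan element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Sketch only: assumes every daily file stores time_run(time), i.e. time_run
       shares the time dimension instead of having its own time_run dimension. -->
  <aggregation dimName="time" type="joinExisting">
    <!-- joinExisting concatenates all variables whose outer dimension is "time",
         so time_run is joined alongside time and the data variables.
         Assumes the scanned files sort chronologically by name (model_YYYYMMDD.nc). -->
    <scan location="/path/to/model/data/" suffix=".nc" subdirs="true" olderThan="5 min" />
  </aggregation>
</netcdf>
```

With that layout, each aggregated time step would keep the time_run of the run that last wrote its day, which matches the Best Time Series behavior described above.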

Any ideas or suggestions on how this can be accomplished? As a fallback, I might have to use a more brute-force approach and add the run time to the filenames (e.g., model_20250812_20250801.nc), where the second date indicates the time_run coordinate. But then it's no longer a simple overwrite of model_20250812.nc, and I would need to remove the prior day's run (e.g., model_20250812_20250731.nc) when saving a new one.

Many thanks!
John Maurer
Data System Engineer
Pacific Islands Ocean Observing System (PacIOOS)
University of Hawaii at Manoa

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
To subscribe: thredds-join@xxxxxxxxxxxxxxxx
To unsubscribe: thredds-leave@xxxxxxxxxxxxxxxx