[thredds] aggregating on both time and time_run dimensions?

To: THREDDS Users <thredds@xxxxxxxxxxxxxxxx>
Subject: [thredds] aggregating on both time and time_run dimensions?
From: John Maurer <jmaurer@xxxxxxxxxx>
Date: Mon, 11 Aug 2025 12:39:27 -1000

Hi TDS folks,
We have a new use case for aggregating our FMRC collections, but I'm having
difficulty implementing it. To save space, the files are now daily files
rather than multi-day files. This avoids repeating the same day across
multiple files and drastically cuts storage requirements over the
long term. Instead, new data for a given day overwrites any previous file
for that day. The result is essentially a "Best Time Series"
(nowcasts), with only the latest handful of files containing forecasts of
future days.

Inside the files, we store both "time" and "time_run" coordinate
variables so that an end user will know when the model was run for each
time step. Since the runtime is no longer in the filenames (the date in a
filename indicates the day of its time steps), I am not using the
traditional FMRC aggregation via a featureCollection. Instead, I'm trying to
figure out how to write an NcML aggregation over a file scan that
aggregates both the "time" and "time_run" coordinate variables to achieve
an FMRC-like effect.

I know that nested NcML aggregations are possible, but I don't know how
they might be used to aggregate over two time variables. Is there a way?

If I do this, then *time_run* (the outer aggregation) only gets the
penultimate file's values:

        <aggregation dimName="time_run" type="joinExisting">
          <netcdf>
            <aggregation dimName="time" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc"
                    subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>

And if I do this, then *time* (the outer aggregation) only gets the
penultimate file's values:

        <aggregation dimName="time" type="joinExisting">
          <netcdf>
            <aggregation dimName="time_run" type="joinExisting">
              <scan location="/path/to/model/data/" suffix=".nc"
                    subdirs="true" olderThan="5 min" />
            </aggregation>
          </netcdf>
        </aggregation>
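For reference, a minimal single-level variant is sketched below. It rests on an unverified assumption: that each daily file stores time_run as a variable along the time dimension, i.e. time_run(time), rather than as a dimension of its own. Under that assumption, a joinExisting aggregation concatenates every variable whose outer dimension is "time", so time_run would be joined alongside time without any nesting. If time_run is its own dimension in the files, this sketch does not apply.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Assumption: each daily file defines time_run(time), i.e. time_run
       shares the "time" dimension instead of having its own dimension. -->
  <!-- joinExisting concatenates every variable whose outer dimension is
       "time", so time_run would be joined together with time. -->
  <aggregation dimName="time" type="joinExisting">
    <scan location="/path/to/model/data/" suffix=".nc"
          subdirs="true" olderThan="5 min" />
  </aggregation>
</netcdf>
```

Without a dateFormatMark, scanned files are ordered by filename, which for names like model_20250812.nc is also chronological order.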

Any ideas or suggestions on how this can be accomplished? As a fallback, I
might have to take a more brute-force approach and append runtimes to the
filenames (e.g., model_20250812_20250801.nc), where the second date
indicates the time_run coordinate. But then it's no longer a simple
overwrite of model_20250812.nc, and I would need to remove the file from the
prior day's run (e.g., model_20250812_20250731.nc) when saving a new run.

Many thanks!
John Maurer
Data System Engineer
Pacific Islands Ocean Observing System (PacIOOS)
University of Hawaii at Manoa
