NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.

Re: [thredds] aggregating on both time and time_run dimensions?

Never mind, I think I've solved this. I was overthinking it. There's no
need to make time_run its own coordinate variable (i.e., its own
dimension), even though that's how FMRC does it. Instead, I just define a
time_run variable on the existing time dimension, like any other
time-series variable. That way the aggregation along time works, and a user
can get a time_run value for every time step.
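A minimal NcML sketch of this approach, assuming each daily file declares time_run on the time dimension (i.e., time_run(time)) and reusing the scan element from the original message. The namespace is the standard NcML 2.2 one; the rest is illustrative, not the exact configuration used:

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- One joinExisting aggregation along "time". In joinExisting, every
       variable whose outer dimension is the aggregation dimension is
       concatenated, so time_run(time) is aggregated alongside the data. -->
  <aggregation dimName="time" type="joinExisting">
    <!-- Same scan settings as in the original message. -->
    <scan location="/path/to/model/data/" suffix=".nc"
          subdirs="true" olderThan="5 min" />
  </aggregation>
</netcdf>
```

Because no nested aggregation is involved, there is no second dimension to join, and time_run simply becomes another time series. This also assumes the scanned files sort chronologically by name (as daily names like model_20250812.nc do), since a scan orders files by filename unless told otherwise.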
Cheers,
John


On Mon, Aug 11, 2025 at 12:39 PM John Maurer <jmaurer@xxxxxxxxxx> wrote:

> Hi TDS folks,
> We have a new use case for aggregating our FMRC collections, but I'm
> having difficulty implementing it. To save space, the files are now daily
> files rather than multi-day files. This avoids repeating the same day
> across multiple files and drastically cuts storage requirements over the
> long term. Instead, new data for a given day overwrites any previous file
> for that day. The result is essentially a "Best Time Series" (nowcasts),
> with only the latest handful of files containing forecasts of future days.
>
> Inside the files, we are storing both "time" and "time_run" coordinate
> variables so that an end user knows when the model was run for each time
> step. Since the runtime is no longer in the filenames (the date in each
> filename indicates the day its time steps cover), I'm not using the
> traditional FMRC aggregation via featureCollection. So I'm trying to
> figure out how to write an NcML aggregation over a file scan that
> aggregates over both the files' "time" and "time_run" coordinate variables
> to achieve an FMRC-like effect.
>
> I know that nested NcML aggregations are possible, but I don't know how
> they might be used to aggregate over two time variables. Is there a way?
>
> If I do this, then *time_run* (the outer aggregation) only gets the
> penultimate file's values:
>
>         <aggregation dimName="time_run" type="joinExisting">
>           <netcdf>
>             <aggregation dimName="time" type="joinExisting">
>               <scan location="/path/to/model/data/" suffix=".nc"
>                     subdirs="true" olderThan="5 min" />
>             </aggregation>
>           </netcdf>
>         </aggregation>
>
> And if I do this, then *time* (the outer aggregation) only gets the
> penultimate file's values:
>
>         <aggregation dimName="time" type="joinExisting">
>           <netcdf>
>             <aggregation dimName="time_run" type="joinExisting">
>               <scan location="/path/to/model/data/" suffix=".nc"
>                     subdirs="true" olderThan="5 min" />
>             </aggregation>
>           </netcdf>
>         </aggregation>
>
> Any ideas or suggestions on how this can be accomplished? As a fallback, I
> might have to use a more brute-force approach and add runtimes to the
> filenames (e.g., model_20250812_20250801.nc, where the second date
> indicates the time_run coordinate). But then it's no longer a simple
> overwrite of model_20250812.nc: when saving a new runtime, I would also
> need to remove the prior day's runtime (e.g., model_20250812_20250731.nc).
>
> Many thanks!
> John Maurer
> Data System Engineer
> Pacific Islands Ocean Observing System (PacIOOS)
> University of Hawaii at Manoa
>