
[thredds] 4.6.x NcML Aggregation Cache Generation

Hello,

I noticed that the way NcML aggregation cache xml files are created has changed in version 4.6.x. In previous versions, the cache xml file contained lines similar to:

  <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' ncoords='1' >
    <cache varName='ocean_time' >1.191888E8 </cache>
  </netcdf>

from the start. With large datasets, generating this file the first time the dataset was accessed took a while (30 minutes or more, sometimes crashing TDS), but subsequent accesses were much faster. The new approach generates the NcML cache more quickly, but without the cached joinExisting values:

  <netcdf id='/data/roms/espresso/2009_da/avg/espresso_avg_1379.nc' ncoords='1' >
  </netcdf>

and fills in the "<cache varName='ocean_time' >1.191888E8 </cache>" lines as data from the corresponding file is requested. A side effect, in my case at least, is that even requests for small amounts of data are relatively slow, presumably until all ocean_time cache values have been filled in. Once all values were cached, response times dropped significantly: from 15 s to less than 1 s in my very limited tests (~1600 files spanning 19,146 time records).

For anyone experiencing the same side effect, a workaround is to populate the whole aggregation cache xml file with the <cache> lines by requesting all records of the joinExisting variable (or successive chunks for very large datasets).
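
A rough sketch of the chunked request in Python, in case it is useful. The server address and dataset path are made up, and I am assuming the aggregation is exposed through the OPeNDAP (dodsC) service so that an ASCII request with an index constraint reads the time values. The helper names are mine, not part of TDS. The brackets are percent-encoded because some servlet containers reject raw "[" and "]" in a query string.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical values: substitute your own server and dataset path.
BASE_URL = "http://localhost:8080/thredds/dodsC/roms/espresso/2009_da/avg"
VAR_NAME = "ocean_time"


def chunk_ranges(n_records, chunk_size):
    """Return inclusive (start, stop) index pairs covering 0..n_records-1."""
    if n_records < 0 or chunk_size < 1:
        raise ValueError("n_records must be >= 0 and chunk_size >= 1")
    return [(start, min(start + chunk_size, n_records) - 1)
            for start in range(0, n_records, chunk_size)]


def chunk_url(base_url, var_name, start, stop):
    """Build an OPeNDAP ASCII request for var_name[start:1:stop]."""
    constraint = f"{var_name}[{start}:1:{stop}]"
    # Percent-encode the brackets; keep ':' readable.
    return f"{base_url}.ascii?{quote(constraint, safe=':')}"


def warm_cache(base_url, var_name, n_records, chunk_size=1000):
    """Request the joinExisting variable chunk by chunk, discarding the data."""
    for start, stop in chunk_ranges(n_records, chunk_size):
        with urlopen(chunk_url(base_url, var_name, start, stop)) as response:
            response.read()
```

For my dataset the call would be warm_cache(BASE_URL, VAR_NAME, n_records=19146). The chunk size of 1000 is arbitrary; a smaller value keeps each request short at the cost of more round trips.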

I can certainly see the reasoning behind the new way of caching and its benefits, but wanted to point out possible side effects and workarounds. Another workaround could be to use a combination of Python/Perl and NCO to generate the cache file (complete with cached joinExisting values) offline.
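
For the offline route, here is a minimal Python sketch that only formats one <netcdf> entry in the shape quoted above. It assumes the coordinate values have already been extracted from each file (for example with NCO) and are passed in as preformatted strings. The function name is hypothetical, and joining several values with spaces is my guess. The rest of the cache file (the enclosing element and its attributes) is not shown in this message, so the sketch does not attempt it.

```python
from xml.sax.saxutils import escape


def render_cache_entry(file_path, var_name, values):
    """Format one <netcdf> cache entry like the snippet quoted in this message.

    values: coordinate values as already-formatted strings, e.g. "1.191888E8".
    Separating multiple values with spaces is an assumption.
    """
    # Attributes are single-quoted, so apostrophes must be escaped as well.
    quote_apos = {"'": "&apos;"}
    file_id = escape(file_path, quote_apos)
    name = escape(var_name, quote_apos)
    # Each value is followed by one space, matching the quoted example.
    body = "".join(escape(value) + " " for value in values)
    return (
        f"  <netcdf id='{file_id}' ncoords='{len(values)}' >\n"
        f"    <cache varName='{name}' >{body}</cache>\n"
        f"  </netcdf>"
    )
```

Comparing the output for one file against an entry that TDS wrote itself would be a sensible check before generating the whole file this way.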

Dave