All,

I've been working on some big aggregations that are getting prohibitively expensive to scan. We've gone around in circles trying to get the aggregation caches to stick, but no matter what we do, they simply will not get picked up reliably. In general I've just made sure the data we publish has a fixed time dimension, so that scanning a whole bunch of files is cheap. With this dataset, rewriting to a fixed time dimension doesn't seem to be an option.

I'm wondering if someone knows definitively whether writing the time coordinate variable of a joinExisting aggregation into the .ncml will get picked up in a way that spares THREDDS from scanning all the files to find the time stamps, i.e.:

  <variable name="time" shape="time" type="int">
    <attribute name="units" type="String" value="days since . . ." />
    <values>6 18 etc. . .</values>
  </variable>

The dataset I'm talking about is here:

  http://esgdata1.nccs.nasa.gov/thredds/catalog/bypass/NEX-DCP30/bcsd/catalog.html

The top-level joined and unioned aggregations take 3-5 minutes to respond. If you go down a level, each of the smaller joinExisting aggregations takes about 2-3 seconds to respond. There are something like 93 joinExistings, so the time adds up as it scans all of the files to build the big union.

The tests I've done haven't given me the answer I want, but I'm not able to get hold of THAT much of this data for testing, since it is so spatially massive.

Thanks for any help you can provide.

- Dave
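For reference, the NcML aggregation documentation describes a related way to avoid the per-file scan for joinExisting: listing each file explicitly and supplying its coordinate value(s) with the coordValue attribute (and ncoords for the number of time steps per file), so the server does not need to open the files to discover the time coordinate. A minimal sketch; the file names, units, and values below are hypothetical:

  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <!-- coordValue gives the time value for each file, in the units of
           the files' time coordinate; ncoords gives the number of time
           steps in the file. With these present, the aggregation can be
           assembled without opening every file. -->
      <netcdf location="file_2006.nc" ncoords="1" coordValue="6"/>
      <netcdf location="file_2007.nc" ncoords="1" coordValue="18"/>
    </aggregation>
  </netcdf>

The trade-off is that the explicit list must be regenerated when files are added, whereas a scan element picks up new files automatically.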