All,

I've been working on some big aggregations that are getting prohibitively expensive to scan. We've gone around in circles trying to get the aggregation caches to stick, but no matter what we do, they simply will not get picked up reliably. In general I've just made sure the data we publish has a fixed time dimension, so that scanning a whole bunch of files is cheap. With this dataset, rewriting to a fixed time dimension doesn't seem to be an option.

I'm wondering if someone knows definitively whether writing the time coordinate variable of a joinExisting aggregation into the .ncml will get picked up in a way that spares THREDDS from scanning all the files to find the time stamps, i.e.:

  <variable name="time" shape="time" type="int">
    <attribute name="units" type="String" value="days since . . ." />
    <values>6 18 etc. . .</values>
  </variable>

The dataset I'm talking about is here:

  http://esgdata1.nccs.nasa.gov/thredds/catalog/bypass/NEX-DCP30/bcsd/catalog.html

The top-level joined and unioned aggregations take 3-5 minutes to respond. If you go down a level, each of the smaller joinExisting aggregations takes about 2-3 seconds to respond. There are something like 93 joinExistings, so the time adds up as it scans all of the files to build the big union.

The tests I've done haven't given me the answer I want, but I'm not able to get hold of THAT much of this data for testing, since it is so spatially massive.

Thanks for any help you can provide.

- Dave
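For reference, the NcML aggregation documentation describes a related way to avoid the per-file scan for joinExisting: listing each file explicitly and supplying its coordinate value(s) with the coordValue attribute (and ncoords for the number of time steps per file), so the server does not need to open the files to discover the time coordinate. A minimal sketch; the file names, units, and values below are hypothetical:

  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <aggregation dimName="time" type="joinExisting">
      <!-- coordValue gives the time value for each file, in the units of
           the files' time coordinate; ncoords gives the number of time
           steps in the file. With these present, the aggregation can be
           assembled without opening every file. -->
      <netcdf location="file_2006.nc" ncoords="1" coordValue="6"/>
      <netcdf location="file_2007.nc" ncoords="1" coordValue="18"/>
    </aggregation>
  </netcdf>

The trade-off is that the explicit list must be regenerated when files are added, whereas a scan element picks up new files automatically.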