Hi Wenli:

You're right that dynamic catalog generation is (sort of) the problem.

As a practical matter, it's not hard to figure out how often the datasets come in, and to use that as a heuristic for how often to crawl. In principle, the "Last-Modified" HTTP header could tell you whether the catalog has been modified, or perhaps "Expires" is better. The problem is that the server doesn't actually know what that value should be, but perhaps we can figure out a way to add that. This would only be approximate, but I assume that would be good enough for your purposes?

Realtime data is challenging. For gridded data, the granularity is large enough that "Expires" is probably useful. For datasets like radar and surface obs, the answer is always "yes, it changed since the last time you asked".

John

Wenli Yang wrote:
Hello,

We are doing a project on ingesting THREDDS catalogs into OGC catalogs (Catalog Service for Web, or CSW). We find that we have to go through an entire THREDDS catalog to update an ingested CSW server, because we don't know whether the THREDDS catalog has been modified until we have exhausted it.

There is a "date" element in the threddsMetadataGroup. The element can be used to identify the modified (or created, valid, issued, available, etc.) date/time of an individual and/or collection dataset. This element is very useful not only at the individual dataset level but also at the data collection level. For example, suppose a data collection A contains another collection AA, which contains another collection AAA, which contains datasets a, b, c, and d (i.e., A>AA>AAA>a,b,c,d). If the "modified" date stamp is applied to all the dataset nodes, individual as well as collection, a returning user would not need to follow the complete path to find out whether a new dataset has been added/modified/etc. in data collection AAA and/or other collections in the hierarchy.

However, it seems that this "date" element is not used widely, if at all, at the data collection level. In fact, I randomly browsed some of the data paths in Unidata's motherlode catalog (http://motherlode.ucar.edu:8080/thredds/catalog.html) and didn't find any "Last Modified" information until I got to the final dataset level.

I guess that the reason the THREDDS catalog does not show the modified date/time at the collection level is that the catalog is not automatically updated when a new dataset is inserted into the database/file system connected to the catalog. Once a user browses down to the catalog, the server scans the immediate child nodes to get all the available datasets/data collections. Thus, a user browsing down the hierarchy will always be presented with the most current datasets, even though the catalog does not update itself when new datasets are inserted.

The disadvantage of this approach is that a user always needs to go to the bottom level to find out whether any new datasets have been inserted. Similarly, in order to update our CSW catalog, our THREDDStoCSW ingestor will have to scan through an entire THREDDS catalog, which can be very large, such as the Unidata catalog.

Any comments/suggestions will be highly appreciated.

Wenli Yang
George Mason University
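
To make the proposal in the message above concrete, here is a minimal sketch of how a crawler might prune its traversal if every collection carried a "modified" date stamp. It is only an illustration under stated assumptions: the sample catalog is hypothetical, the function names (parse_date, modified_date, changed_since) are invented for this example, and the namespace is the one assumed for InvCatalog 1.0 documents. It also assumes that a collection's stamp is at least as new as the newest stamp beneath it, which is exactly what the server does not guarantee today.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace assumed for THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog: A > AA > AAA > a, b, c, d, plus a sibling branch AB.
# Every node carries a "modified" date stamp, as proposed in the message.
SAMPLE = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="A">
    <date type="modified">2006-03-10T00:00:00Z</date>
    <dataset name="AA">
      <date type="modified">2006-03-10T00:00:00Z</date>
      <dataset name="AAA">
        <date type="modified">2006-03-10T00:00:00Z</date>
        <dataset name="a"><date type="modified">2006-03-01T00:00:00Z</date></dataset>
        <dataset name="b"><date type="modified">2006-03-01T00:00:00Z</date></dataset>
        <dataset name="c"><date type="modified">2006-03-10T00:00:00Z</date></dataset>
        <dataset name="d"><date type="modified">2006-03-10T00:00:00Z</date></dataset>
      </dataset>
    </dataset>
    <dataset name="AB">
      <date type="modified">2006-02-01T00:00:00Z</date>
      <dataset name="x"><date type="modified">2006-02-01T00:00:00Z</date></dataset>
    </dataset>
  </dataset>
</catalog>"""


def parse_date(text):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as +00:00."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def modified_date(node):
    """Return the node's own 'modified' stamp, or None if it has none."""
    for date in node.findall("t:date", NS):
        if date.get("type") == "modified":
            return parse_date(date.text)
    return None


def changed_since(node, last_crawl, visited=None):
    """List leaf datasets modified after last_crawl.

    A node whose stamp is not newer than last_crawl is skipped together
    with everything below it. Nodes without a stamp are always explored.
    """
    if visited is not None:
        visited.append(node.get("name"))  # record which nodes were touched
    stamp = modified_date(node)
    if stamp is not None and stamp <= last_crawl:
        return []  # unchanged: prune this whole branch
    children = node.findall("t:dataset", NS)
    if not children:
        return [node.get("name")]  # a changed (or unstamped) leaf dataset
    changed = []
    for child in children:
        changed.extend(changed_since(child, last_crawl, visited))
    return changed


root = ET.fromstring(SAMPLE)
last_crawl = datetime(2006, 3, 5, tzinfo=timezone.utc)
visited = []
changed = []
for top in root.findall("t:dataset", NS):
    changed.extend(changed_since(top, last_crawl, visited))
print(changed)  # ['c', 'd']
```

In this sample the AB branch is rejected at its collection node, so dataset x is never visited. Without collection-level stamps the same crawl would have to descend to every leaf, which is the cost described in the message.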