Hi all,

I'm setting up a geospatial data and metadata portal based on a THREDDS catalog and the GeoNetwork engine and web application. I am working on Linux CentOS and my applications are deployed with Tomcat 8.

I am populating a THREDDS catalog from a filesystem containing meteorological data. GeoNetwork then harvests the catalog and populates the application. However, since I update the data on the web side, I would like to harvest each dataset only once. I tried to set the 'harvest' attribute in the catalog, but without success. Here's an excerpt of my catalog.xml file:

    <datasetScan name="AUXILIARY" ID="testAUXILIARY" path="AUXILIARY"
                 location="content/testdata/auxiliary-aux" harvest="true">
      <metadata inherited="true">
        <serviceName>all</serviceName>
        <dataType>Grid</dataType>
        <dataFormatType>NetCDF</dataFormatType>
        <DatasetType harvest="true"></DatasetType>
        <harvest>true</harvest>
        <keyword>WRF outputs</keyword>
        <documentation type="summary">This is a summary for my test ARPA catalog for WRF runs. Runs are made at 12Z and 00Z, with analysis and forecasts every 6 hours out to 60 hours. Horizontal = 93 by 65 points, resolution 81.27 km, LambertConformal projection. Vertical = 1000 to 100 hPa pressure levels.</documentation>
        <timeCoverage>
          <end>present</end>
          <duration>5 years</duration>
        </timeCoverage>
        <variables vocabulary="GRIB-1" />
        <variables vocabulary="">
          <variable name="Z_sfc" vocabulary_name="Geopotential H" units="gp m">Geopotential height, gpm</variable>
        </variables>
      </metadata>
      <filter>
        <include wildcard="*wrfout_*"/>
      </filter>
      <addDatasetSize/>
      <addTimeCoverage
          datasetNameMatchPattern="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
          startTimeSubstitutionPattern="$2-$3-$4T$5:00:00"
          duration="6 hours" />
      <namer>
        <regExpOnName regExp="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})"
                      replaceString="WRF $1-$2-$3T$4:00:00" />
        <regExpOnName regExp="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
                      replaceString="WRF Domain-$1 $2-$3-$4T$5:00:00" />
      </namer>
    </datasetScan>

Even though I set harvest="true" on the datasetScan, the attribute is not inherited by the generated datasets, so the harvester does not pick up the files. I could also ignore the 'harvest' attribute while harvesting, but my aim is to harvest only new files, using an auxiliary catalog that contains symbolic links (and updating the THREDDS path after harvesting).

Am I missing something? Specifically:

- How would you systematically add the harvest attribute to all inner datasets in a nested filesystem?
- Alternatively, would it make sense to create two catalogs using the time filter options (e.g. everything up to yesterday in one catalog, and today's files in another)?
- Can you show me an example of how those filters are used in a datasetScan?

Many thanks,
Chiara

--
Chiara Scaini
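
The symbolic-link idea mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a THREDDS or GeoNetwork feature. It reuses the regular expression from datasetNameMatchPattern to read the start time out of each file name, then links files at or after a cutoff into a separate directory that an auxiliary datasetScan could point at. The function names, the directory layout, and the sample file name in the usage note are hypothetical, and the link directory is assumed to live outside the data directory.

```python
import re
from datetime import datetime
from pathlib import Path

# Same expression as datasetNameMatchPattern in the catalog excerpt.
NAME_PATTERN = re.compile(
    r"([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
)


def start_time(filename):
    """Return the start time encoded in a file name, or None if it does not match."""
    match = NAME_PATTERN.search(filename)
    if match is None:
        return None
    # Mirrors startTimeSubstitutionPattern "$2-$3-$4T$5:00:00"
    # (minutes and seconds are forced to zero, as in the catalog).
    stamp = match.expand(r"\2-\3-\4T\5:00:00")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")


def link_new_files(data_dir, link_dir, cutoff):
    """Symlink files whose start time is >= cutoff into link_dir.

    Assumption: link_dir is outside data_dir, and file names are unique
    across the nested directories (the links are created in a flat layout).
    Returns the names of the files that qualify, in sorted order.
    """
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    # Same wildcard as the <include> filter in the catalog excerpt.
    for path in sorted(Path(data_dir).rglob("*wrfout_*")):
        if not path.is_file():
            continue
        when = start_time(path.name)
        if when is None or when < cutoff:
            continue
        target = link_dir / path.name
        if not target.exists() and not target.is_symlink():
            target.symlink_to(path.resolve())
        linked.append(target.name)
    return linked
```

For a hypothetical file named wrfout_d01_2016-05-10_12:00:00, start_time returns datetime(2016, 5, 10, 12, 0). Calling link_new_files with a cutoff of midnight today would leave only today's runs in the link directory, which is one way to approximate the "today's files in another catalog" split without relying on the harvest attribute.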