Re: [thredds] Set harvest attribute using datasetScan

To: Chiara Scaini <saetachiara@xxxxxxxxx>, thredds@xxxxxxxxxxxxxxxx
Subject: Re: [thredds] Set harvest attribute using datasetScan
From: Antonio S. Cofiño <cofinoa@xxxxxxxxx>
Date: Wed, 4 Jul 2018 19:46:35 +0200

Hi Chiara,

I'm answering inline.



On 04/07/18 18:23, Chiara Scaini wrote:

Hi all, I'm setting up a geospatial data and metadata portal based on thredds catalog and the Geonetwork engine and web application. I am working on Linux CentOS and my applications are deployed with Tomcat 8.

Which TDS version are you using?

I am populating a thredds catalog based on a filesystem containing meteorological data. Geonetwork then harvests the catalog and populates the application. However, given that I'm updating the data on the web side, I would like to harvest the data only once.
I tried to set the 'harvest' attribute from the catalog, but without success. Here's an excerpt of my catalog.xml file:

The "harvest" attribute is only defined for the dataset (and datasetScan) elements, but IMO it does not serve the purpose you are looking for (see [1]).

  <datasetScan name="AUXILIARY" ID="testAUXILIARY"
      path="AUXILIARY" location="content/testdata/auxiliary-aux" harvest="true">

This harvest is correct.

    <metadata inherited="true">
      <serviceName>all</serviceName>
      <dataType>Grid</dataType>
      <dataFormatType>NetCDF</dataFormatType>
        <DatasetType harvest="true"></DatasetType>
        <harvest>true</harvest>

This harvest is not defined in the THREDDS Client Catalog Specification (see [1]).
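
As a minimal sketch of the placement the specification does allow, harvest goes on the dataset element itself as an attribute, not inside metadata as a child element. The name, ID and urlPath values below are hypothetical placeholders, not taken from your catalog:

```xml
<!-- Hypothetical dataset entry: name, ID and urlPath are placeholders -->
<dataset name="Example WRF output" ID="testAUXILIARY/example"
         urlPath="AUXILIARY/example.nc" harvest="true"/>
```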

      <keyword>WRF outputs</keyword>
      <documentation type="summary">This is a summary for my testARPA catalog for WRF runs. Runs are made at 12Z and 00Z, with analysis and forecasts every 6 hours out to 60 hours. Horizontal = 93 by 65 points, resolution 81.27 km, LambertConformal projection. Vertical = 1000 to
         100 hPa pressure levels.</documentation>
       <timeCoverage>
         <end>present</end>
         <duration>5 years</duration>
       </timeCoverage>
       <variables vocabulary="GRIB-1" />
       <variables vocabulary="">
         <variable name="Z_sfc" vocabulary_name="Geopotential H" units="gp m">Geopotential height, gpm</variable>
       </variables>
    </metadata>

    <filter>
      <include wildcard="*wrfout_*"/>
    </filter>

How are the files distributed on disk? Are they under directories? If yes, then you need to add an include filter with the attribute collection="true" (see [2] and [3]).
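
A minimal sketch of such a filter, assuming the WRF files sit in subdirectories below the scanned location. The first include is the one from your catalog; the "*" wildcard on the second include is an assumption that every subdirectory should be traversed, so adjust it to your layout:

```xml
<filter>
  <!-- Files (atomic datasets): keep only the WRF output files -->
  <include wildcard="*wrfout_*"/>
  <!-- Directories (collection datasets): assumed pattern matching every subdirectory -->
  <include wildcard="*" atomic="false" collection="true"/>
</filter>
```

The atomic and collection attributes select whether a pattern is applied to files, to directories, or to both.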

    <addDatasetSize/>
    <addTimeCoverage datasetNameMatchPattern="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})"
                     startTimeSubstitutionPattern="$2-$3-$4T$5:00:00"
                     duration="6 hours" />

    <namer>
      <regExpOnName regExp="([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})" replaceString="WRF $1-$2-$3T$4:00:00" />
      <regExpOnName regExp="([0-9]{2})_([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})" replaceString="WRF Domain-$1 $2-$3-$4T$5:00:00" />
    </namer>

  </datasetScan>
Even if I set the harvest="true" attribute, it is not inherited by the datasets and thus the harvester does not get the files. I can also ignore the 'harvest' attribute while harvesting, but my aim is to harvest only new files using an auxiliary catalog that contains symbolic links (and updating the Thredds path after harvesting).
Am I missing something? How would you systematically add the harvest attribute to all inner datasets in a nested filesystem? Or would it make sense to create two catalogs using the time filter options (e.g. all up to yesterday in one catalog, and today's files in another)? Can you show me an example of usage of those filters in a datasetScan?
Many thanks,
Chiara


Hope this helps.
Regards

Antonio

[1] https://www.unidata.ucar.edu/software/thredds/current/tds/catalog/InvCatalogSpec.html#dataset
[2] https://www.unidata.ucar.edu/software/thredds/current/tds/catalog/InvCatalogServerSpec.html#datasetScan_Element#filter_Element
[3] https://www.unidata.ucar.edu/software/thredds/current/tds/reference/DatasetScan.html#Including_Only_the_Desired_Files


--
Antonio S. Cofiño
Dep. de Matemática Aplicada y
        Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es



--
Chiara Scaini

