Peter Cornillon wrote:
we expect that data holdings can be divided into two categories: 1) sites in which the monitoring (e.g., crawling) can be done occasionally (once a day, once an hour, once a week?), so the impact of the crawling is minimal; and 2) real-time sites that have constantly changing data. For these, we probably need a different strategy, and we are considering instrumenting the LDM as one possible solution.

But in sites that are being continuously updated, it seems to me that you need a local inventory, a file or some other way of keeping track of the contents of a data set. This is our notion of a file server or your configuration file in the Aggregation Server. This is the thing that you want to discover when searching for data sets, not all of the files (or granules or whatever) in the data set. This is what we are wrestling with in the crawler that we are looking at. In particular, I have asked Steve to look at ways of having the crawler group files into data sets automatically, then to reference the inventory for the data set rather than the entire data set, and to make the crawler capable of updating the inventory.
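As a rough illustration of the kind of grouping Steve might look at, here is a minimal sketch: a crawler buckets granules into data sets by a shared name pattern and keeps one inventory entry per data set. The file names and the grouping rule are hypothetical, not anything the prototype actually does.

```python
import re
from collections import defaultdict

# Hypothetical granule paths; a real crawler would get these from a
# directory listing or a DODS catalog rather than a literal list.
granules = [
    "sst/1999/sst.19990101.nc",
    "sst/1999/sst.19990102.nc",
    "chl/1999/chl.19990101.nc",
]

def dataset_key(path):
    """Group granules into a data set by collapsing the date portion."""
    name = path.rsplit("/", 1)[-1]
    return re.sub(r"\.\d{8}\.", ".*.", name)

inventory = defaultdict(list)   # data set -> list of member granules
for g in granules:
    inventory[dataset_key(g)].append(g)

# The inventory, not the individual granules, is what a search service
# would discover; updating it just means appending new granules.
for dataset, members in inventory.items():
    print(dataset, len(members), "granules")
```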
Just to make sure I understand your terminology:

  files = physical files
  datasets = logical files we want the user to see
  inventory = listing of datasets
  granule = ??

Question: what does it mean to "group files into data sets"? Like the agg server?
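For what it's worth, here is how I have been picturing those terms as a minimal data model; the names are mine, not anything in the servers:

```python
from dataclasses import dataclass, field

@dataclass
class Granule:
    """A single physical file on disk or behind a server."""
    path: str

@dataclass
class Dataset:
    """The logical collection the user should actually see."""
    name: str
    granules: list[Granule] = field(default_factory=list)

@dataclass
class Inventory:
    """A listing of data sets: the thing to discover when searching."""
    datasets: list[Dataset] = field(default_factory=list)
```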
Our hope is that the crawler would work locally, building the inventory locally, and could be made to run as often as you like. However, the inventory need not reside at the site containing the actual data, and the crawler could be run from a remote site as our prototype does. The point here is that there are two types of crawlers generating two types of lists: one that generates inventories of granules in data sets (generally local, and can be run as often as you like) and one that generates inventories of data sets - directories (generally run remotely, less often). Finally, I note that the inventory could be generated in other ways; for example, every time a granule is added to a data set, the inventory could automatically be updated. I really see the inventory issue as a local process. What is strange is the number of data sets that we encounter that do not have a formal inventory, and this is what gives rise to this problem.
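A sketch of the "update the inventory whenever a granule arrives" idea follows; the inventory file layout and the ingest hook are assumptions for illustration, not anything the LDM actually provides.

```python
import json
from pathlib import Path

INVENTORY = Path("inventory.json")   # hypothetical local inventory file

def add_granule(dataset: str, granule: str) -> None:
    """Append a newly arrived granule to its data set's inventory entry."""
    inv = json.loads(INVENTORY.read_text()) if INVENTORY.exists() else {}
    inv.setdefault(dataset, []).append(granule)
    INVENTORY.write_text(json.dumps(inv, indent=2))

# Called from whatever process ingests the real-time data, so the
# inventory stays current without any crawling at all:
add_granule("sst.*.nc", "sst/1999/sst.19990103.nc")
```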
Some possible terminology clarifications: We have been using the word "crawler" to mean a process that gets all of its information from the web/DODS server. So it can't see local disk files, but can be run remotely.
A process that must run locally, and can have access to whatever files exist, we have been calling a "scanner", as in disk scanner.
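To make the crawler/scanner distinction concrete, here is a sketch where the two differ only in where they get their listing, and either could feed the same inventory. The catalog URL format and directory layout are made up for the example.

```python
import urllib.request
from pathlib import Path

def scan_local(root: str) -> list[str]:
    """Scanner: runs on the data host and walks the local disk."""
    return [str(p) for p in Path(root).rglob("*.nc")]

def crawl_remote(catalog_url: str) -> list[str]:
    """Crawler: runs anywhere, sees only what the web/DODS server exposes."""
    with urllib.request.urlopen(catalog_url) as resp:
        text = resp.read().decode()
    # Naive assumption: the server returns one granule URL per line.
    return [line.strip() for line in text.splitlines() if line.strip()]
```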
Generating "inventories of granules in data sets" makes sense in the context of an agg server, but is there also meaning to it in the context of a normal DODS server?