orthogonality (was Re: New attempt)

To: Benno Blumenthal <benno@xxxxxxxxxxxxxxxx>,
Subject: orthogonality (was Re: New attempt)
From: Joe Wielgosz <joew@xxxxxxxxxxxxx>
Date: Tue, 04 Jun 2002 14:45:19 -0700

Benno, John,

As I am currently immersed in Web-service-think (particularly WSDL, which seems to be basically a more generalized version of what the THREDDS catalog is attempting for scientific data services), I would propose that the principle of orthogonality might be a useful tool for deciding on these issues.

For an XML design, orthogonality is the question of whether a given tag or attribute represents a distinct concept that can in no case be expressed using existing tags and attributes.

While a completely orthogonal tag set results in somewhat less succinct documents than an approach which defines a number of "special case" tags, the advantage is that it yields the maximum ratio of expressiveness to schema complexity.

WSDL is an extreme example of this approach. Unlike WSDL, we may want to define special-case tags for very common cases to make the actual documents less unwieldy.

The purpose of this message is not to suggest exactly which cases these might be; rather, I am just suggesting we take a look at the current and proposed DTD from this perspective.


-------------------

A summary of the completely orthogonal concepts I believe we have introduced thus far (not necessarily named the same as the tags that currently express the concept):


service type - a named mechanism for accessing scientific data

access - a (named?) binding of a URI to a specific service type

metadata type - a named convention for description of scientific data

metadata - a (named?) binding of a text fragment to a specific metadata type

metadata reference - a (named?) binding of a URI to a specific metadata type

dataset - a named collection of access objects and metadata

collection - a named collection of datasets

collection reference - the URI of a THREDDS XML document containing a collection
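
To make the list above concrete, here is a minimal Python sketch of the eight concepts as plain data types. It only illustrates how the concepts relate to each other; the class and field names are hypothetical and are not taken from the current or proposed DTD, and the example.org URI is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(frozen=True)
class ServiceType:
    name: str  # a named mechanism for accessing data, e.g. "DODS"


@dataclass(frozen=True)
class Access:
    uri: str                    # where the data can be reached
    service_type: ServiceType   # how to talk to that URI
    name: Optional[str] = None  # the "(named?)" part is left optional


@dataclass(frozen=True)
class MetadataType:
    name: str  # a named description convention, e.g. "COARDS"


@dataclass(frozen=True)
class Metadata:
    text: str                   # inline text fragment
    metadata_type: MetadataType
    name: Optional[str] = None


@dataclass(frozen=True)
class MetadataReference:
    uri: str                    # metadata kept elsewhere
    metadata_type: MetadataType
    name: Optional[str] = None


@dataclass
class Dataset:
    name: str
    accesses: List[Access] = field(default_factory=list)
    metadata: List[Union[Metadata, MetadataReference]] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass(frozen=True)
class CollectionReference:
    uri: str  # URI of a THREDDS XML document containing a collection


# Hypothetical usage: one dataset reachable through one service.
dods = ServiceType("DODS")
levitus = Dataset(
    name="LEVITUS94",
    accesses=[Access("http://example.org/dods/LEVITUS94", dods)],
    metadata=[Metadata("annual climatology", MetadataType("COARDS"))],
)
sources = Collection(name="SOURCES", datasets=[levitus])
```

In this sketch none of the types can be rebuilt out of the others, which is the informal test for orthogonality used in this message.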


----------------

In contrast, the following concepts are not clearly orthogonal to me:

service path, server path, collection path, dataset path, suffix - since they are only used in the context of the access object, they don't really add meaning - any catalog using these attributes is equivalent to one without them, which uses absolute URIs for all of its access objects

compound service / service list - this also doesn't, strictly speaking, add meaning, since services are only used in the context of access objects - thus, one access with a compound service type is functionally equivalent to n access objects with simple service types.

service subtype - unless the values for this attribute are given standard meanings, this is equivalent to a named access object. Even if the values do have standard meanings, there still seems to be some overlap with metadata type.

catalog, server - if you factor out the path attribute, these are equivalent to collections


documentation - equivalent to metadata with a human-readable metadata type

document - same as documentation, except with the connotation that it is not critical to interpreting the dataset
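
The two equivalence claims about path attributes and compound services can be sketched in Python as a simple rewrite. This is an illustration under stated assumptions, not how any THREDDS parser works: the function names are hypothetical, the joining rule (plain concatenation of base, path, and suffix) is an assumption, and the example.org URIs are placeholders.

```python
from typing import List, Tuple


def join_uri(base: str, path: str, suffix: str = "") -> str:
    """Concatenate a base, a relative path, and an optional suffix.

    Assumed rule: exactly one "/" between base and path.
    """
    return base.rstrip("/") + "/" + path.lstrip("/") + suffix


def expand_compound(services: List[Tuple[str, str]], path: str) -> List[Tuple[str, str]]:
    """Rewrite one access with a compound service as n simple accesses.

    `services` holds (service type, base URI) pairs; the result holds
    (service type, absolute URI) pairs, one per simple service.
    """
    return [(stype, join_uri(base, path)) for stype, base in services]


# Hypothetical compound service with two simple members.
compound = [
    ("DODS", "http://example.org/dods/"),
    ("FTP", "ftp://example.org/pub/"),
]
flat = expand_compound(compound, "LEVITUS94/annual.nc")
```

After the rewrite, `flat` carries only service types and absolute URIs, so a client reading it needs no knowledge of path attributes or service lists. The cost is the longer document, which is the trade-off described earlier in this message.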


--------------------

One that I am not sure about is the "attribute" tag, since I am not clear on how this is intended to be used. Is it for the THREDDS parser, or passed directly to the user? Will there be standardized names and values for attributes?

A reminder: I am not trying to say specifically whether any of these tags should be kept or dropped. I am merely suggesting that we might want to focus on tags and attributes that represent orthogonal concepts, and be a bit more choosy about the rest.

Also, I would suggest that any proposed extensions that *are* genuinely orthogonal to the original tag set (although I'm not sure we've had any thus far) be given special consideration, since by definition, there is no workaround if they are not included.


John, hope this is useful input.

- Joe



Benno Blumenthal wrote:

John Caron wrote:
I'm trying to think what is the meaning of serviceType="Catalog" in
    general?
    What should the client assume? It seems that if you want the client
    to be
    able to get the collection as a dataset, then you add a dataset
    element. If
    you want the client to "drill down" further, then a collection or
    catalogRef
    element can do that. What I have removed is the clear association
    between
    the dataset element and the collection, eg that these are the same
    thing. I
    have also made it more cumbersome(need two elements). I agree these are
    weaknesses.
The client does not know that these are two different ways of looking at the same thing -- the key piece of information that I was trying to convey. The client does not have to present both -- maybe one client only presents THREDDS choices because it has no DODS capabilities, while another client does not present the drill-down because it does have DODS capabilities.
The fact that you can produce a dataset as COARDS vs DIF, etc is
    also for me
    not so great of an example. Rather than modifying the underlying
    data access
    (eg DODS), it seems simpler to add a metadata element. I admit that
    this is
    just an idea which has not been done yet. And you already have a
    server that
    does in fact modify the data access. But think of it from a client POV.
    Should she search through the services looking for a service of type
    DODS,
    subtype COARDS? Or search through the metadata looking for COARDS
    metadata,
    independent of service type?
My point was that your metadata tag was services for metadata. Clients that can only handle COARDS metadata would ask for COARDS metadata services.
    A more compelling example would be where the dataset is served up
    through
    FTP and DODS, and ADDE, etc. But then I wonder/doubt whether one URL is
    likely to be able to be used for all these services.
DODS already has multiple services -- ascii is not necessarily present, some of the selection interfaces are optional, metadata is optional.
>> I am also concerned about XYZT clients (4D world view) -- how can I
    protect
     > them against higher (and other) dimensional data (ensemble member
    count,
     > spectral, different kinds of time (forecast start, lead, target
    time)?   I
     > could convert to multiple datasets, or spatial grids, but it
    would be nice
    to
     > advertise the service.  As well as supporting various binary and
    ascii
    data
     > formats.  Or the THREDDS dataset (as opposed to collection/catalog)
     > description...

    I am not clear of "protect", did you mean "project" ?
Protect -- 4D world clients simply fail when given something else; I would like them to have an alternative.
> 2) The access for the dataset LEVITUS94 is again via THREDDS
    (the present
     > collection) or via DODS (the access statement).  Adding another
    dataset
    inside
     > the collection called "Daily" is not the same meaning at all.

    Sorry, I should have had:

    <dataset name="LEVITUS94 dataset" urlPath="SOURCES/.LEVITUS94/dods"/>
    <catalogRef xlink:title="Drill down into dataset"
        xlink:href="http://iridl.ldeo.columbia.edu/SOURCES/LEVITUS94/thredds.xml"/>

    In this case the dataset is presented to the user for immediate
    selection
    AND a link is presented for drilling further down.
This is the wrong example: LEVITUS94 was a collection that was also available through DODS -- now you have lost it completely. It is an example for the subdatasets ANNUAL, etc., with the flaw that clients can no longer tell that these are two different ways of getting the same thing, as mentioned above.
Besides, multiple services will also show up in aggregations of THREDDS catalogs -- multiple servers serving the same dataset could be represented as a single entry with multiple services -- in this case, services with identical attributes except for path information.
The main thing service does is to let you specify a type and factor
    out the
    common URL base. then this is passed to "protocol aware" code.
    Because the
    das,dds,dods,info,ascii subservice URLs are always regular in how
    they are
    formed, it seems unnecessary to actually specify them. In principle
    subservices are probably useful but some concrete examples are needed.
As I mentioned earlier, not all DODS servers have all the services. Even if they did, it would not hurt to be able to list them.
     While I have not given examples, different datasets will have
    different
> services, which is why I kept specifying using the access tag.  Some
    datasets
     > will be incompatible with certain representations, so the service
    lists
    will
     > vary.   One could argue that the THREDDS standard collection is a
    dataset
    not
     > available via DODS -- certainly that is the case for me.

    I understand you want to compactly specify what services are
    available for
    datasets. I'm not sure we have enough examples to make sure we are
    doing it
    right. I am also oriented towards incremental design, doing what we
    can get
    right and iterating.
OK with me, but we have lost the structure I was trying to express -- alternate ways of accessing the same object, with emphasis on the same object.
Benno

--
Dr. M. Benno Blumenthal          benno@xxxxxxxxxxxxxxxx
International Research Institute for climate prediction
Lamont-Doherty Earth Observatory of Columbia University
Palisades NY 10964-8000                  (845) 680-4450



--
Joe Wielgosz
joew@xxxxxxxxxxxxx / (707)826-2631
---------------------------------------------------
Center for Ocean-Land-Atmosphere Studies (COLA)
Institute for Global Environment and Society (IGES)
http://www.iges.org
