Howdy, John,
John Graybeal wrote:
I'm in the odd position of agreeing in principle with several writers
(keep metadata with data, support non-networked computing, the values
are more than the numbers), and then disagreeing with many details. A
few examples are below.
Yeah, in a lot of ways, so am I.
On reading Steve Hankin's post, though, I must ask: What exactly is
being proposed? A binary data format for files? A set of such binary
data formats? Or a protocol for exchanging information? Is this simply
a recapture of 'everything netCDF and CF' so that OGC can put a stamp of
approval on it?
In a way, yes, this is what is proposed. THEN there's a formal way to
add, extend and improve CF-NetCDF within a known framework.
Ben wrote "This approach will result in a binary encoding which can be
used with different access protocols, e.g., WFS or SOS as well as WCS."
I don't really know what it means to 'use a binary encoding with SOS';
can we be more precise about that?
SOS can reference a binary ("netCDF") file and send it along. At that
point, the XML metadata could (or might not) be reduced, given the
self-documenting nature of well-constructed netCDF files. I'm less sure
it's a good fit for WFS, but I might be convinced.
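To make that concrete, here's a minimal sketch (in Python) of the
pattern I have in mind: the SOS response hands the client a URL to a
netCDF file, and the client fetches and reads it. The URL, file name,
and variable attributes below are made up for illustration.

    import urllib.request
    import netCDF4

    # Hypothetical URL an SOS response might carry in its result block:
    nc_url = "http://example.org/sos/obs/weather_20090820.nc"
    local_path, _ = urllib.request.urlretrieve(nc_url, "weather.nc")

    # Because the file is self-documenting, the XML around it can be thin:
    ds = netCDF4.Dataset(local_path)
    for name, var in ds.variables.items():
        print(name, getattr(var, "units", "?"), getattr(var, "standard_name", "?"))
    ds.close()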
In short, having read through the referenced 'core standard' proposal
[1], I can't tell what we're trying to do yet.
Other comments on this thread, for those needing distraction:
On Aug 20, 2009, at 10:00 AM, Ron Lake wrote:
I would argue that we should stop this idea that data are just numbers
and strings and everything else is "metadata". <snip> Let's start by
defining the objects of interest and THEN we can have metadata about
them.
After watching thoughtful communities try to carefully describe 'the
object of interest', I am sure the proposed 'start' will be a long slow
one. I'd rather stick with "one person's data is another person's
metadata", and try to avoid getting too excited about the precise
distinction between data and metadata, except when it is very narrowly
defined on a specific project (not the case in this thread, IMHO).
This is a key point. A lot of otherwise really sharp folks tend to
define everyone's data and metadata by their own prejudices, myself
included. After all, MY data's easy to identify and define, and I can
see how YOUR data should be identified and defined, too. What? You
don't agree with me? How dare you?
On Aug 20, 2009, at 9:54 AM, Tom Whittaker wrote:
One of the single biggest mistakes that the meteorological community
made in defining a distribution format for realtime, streaming data was
BUFR -- because the "tables" needed to interpret the contents of the
files are somewhere else... and sometimes, end users cannot find them!
Perhaps this is a problem with the way the tables are made available,
and not simply the fact they are separate from the data stream? After
all, many image files (for example) are not described internally at all,
but no one seems to have trouble working with those images.... (I know
that's oversimplifying the difference, but it's instructive nonetheless.)
Ah, but it's not quite the same, and "oversimplifying" understates the
difference. With the current, well-known image formats, there usually
IS metadata (or something describing the image) in the header. That's
just not the case with BUFR. You have some expectation of finding the
GIF header in a file you think is a GIF; it tells you how the thing's
compressed, what the core color table is, and the sampling. Then the
data are relatively easy to pick out. For BUFR, you're required to
bring prior knowledge -- the external tables -- to interpret the file.
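What I mean by "some expectation of finding the GIF header": thirteen
bytes at the front of the file tell you nearly everything you need to
start pulling pixels out. A quick Python sketch (any GIF will do):

    import struct

    with open("example.gif", "rb") as f:
        header = f.read(13)  # signature + logical screen descriptor

    sig, width, height, packed, bg_index, aspect = struct.unpack("<6sHHBBB", header)
    assert sig in (b"GIF87a", b"GIF89a")    # no prior knowledge needed to check
    has_gct = bool(packed & 0x80)           # is a global color table present?
    gct_size = 2 ** ((packed & 0x07) + 1)   # entries in that color table
    print(width, height, has_gct, gct_size)

BUFR offers no equivalent: the descriptor tables live outside the file.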
NetCDF and ncML maintain the essential metadata within the files:
types, units, coordinates -- and I strongly urge you (or whomever) not
to make the "BUFR mistake" again -- put the metadata into the files!
Maybe you think all the essential metadata is within the netCDF file,
but in my opinion it isn't. I often find the essential metadata,
particularly of the semantic variety, to be absent. And I know of
communities that have had significant difficulty with the provenance
(for example) within CF/netCDF files.
Yeah, but... the mechanisms are there to put the semantic content into
the netCDF file, and to carry at least the originator's history.
There's no guarantee that someone won't change the internal metadata,
but I don't think that's what you're asking about.
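For instance, it takes only a few lines to attach the CF semantic hooks
and an originator history. A sketch using the Python netCDF4 library
(the file name and values are invented):

    import numpy as np
    import netCDF4

    ds = netCDF4.Dataset("sst_example.nc", "w")
    ds.createDimension("time", 4)
    t = ds.createVariable("time", "f8", ("time",))
    t.units = "hours since 2009-08-20 00:00:00"
    t.standard_name = "time"
    sst = ds.createVariable("sst", "f4", ("time",))
    sst.units = "K"
    sst.standard_name = "sea_surface_temperature"   # the CF semantic hook
    t[:] = np.arange(4)
    sst[:] = [291.2, 291.4, 291.1, 290.9]
    ds.history = "2009-08-21: created by buoy ingest script v0.3"  # originator history
    ds.Conventions = "CF-1.4"
    ds.close()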
The generalization (point) of this observation is that different people
require different metadata, sometimes arbitrarily complex or peripheral
metadata. And I don't think you want ALL that metadata in the same file
as the data -- especially when the data may be coming not in a file, but
in a stream of records.
Another good point. I often think along the lines of inheritable and
file-unique metadata, and of how to obtain the inheritable stuff.
There's little reason to include it when it could be obtained with a URI
reference, but most disciplines can identify what their own file-unique
(or observation-unique, experiment-unique, or such) metadata are, and
those *should* be included.
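As a sketch of that split, using the file from above: point at the
inheritable, discipline-level record by URI, and keep only the
file-unique bits local. (metadata_link is the attribute name ACDD uses
for this; the URL and identifiers here are hypothetical.)

    import netCDF4

    ds = netCDF4.Dataset("sst_example.nc", "a")
    # Inheritable metadata lives elsewhere; reference it by URI:
    ds.metadata_link = "http://example.org/metadata/texas-mesonet-buoys.xml"
    # File-unique metadata stays in the file itself:
    ds.id = "buoy042_20090820"
    ds.comment = "Sensor 2 replaced 2009-08-19; earlier values suspect."
    ds.close()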
Do not require the end user to have an internet connection simply to
"read" the data... many people download the files and then take them
along when traveling, for example.
Ah, in the era of linked data, or Linked Data [2] -- which will be our
era five years from now, if not already -- this problem will be solved,
because everyone will insist on having an internet connection when
traveling. Witness the trajectory of internet availability at scientific
conferences.
If I simply downloaded the file at
<http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml>
I would not be able to read it. In fact, it looks like even if I also
got the "metadata" file at:
<http://schemas.opengis.net/om/1.0.0/examples/weatherRecord1.xml>
I would still not be able to read it, since it also refers to other
servers in the universe to obtain essential metadata.
Uh... I think you may be a bit wrong about what you saw in the
examples. The first file is crudely readable, if not comprehensively
described (to say the least), but by the designer's choice it references
more detailed metadata in a second file. (The file's creator didn't
have to do that per the spec, but in some observing systems I'd say it
makes sense.) Nothing in the second file appears to refer to 'essential
metadata' in other files... depending on what you consider essential, of
course. (The .xsd, for example, is more a format specification than a
piece of essential metadata. By analogy, I can't find a reference in a
netCDF file to any specification of its format, so I guess netCDF
wouldn't qualify as containing all the essential metadata in that sense
either.)
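For what it's worth, it's easy to enumerate exactly what the first file
points at. A short Python sketch, using only the standard library and
the URL quoted above:

    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml"
    tree = ET.parse(urllib.request.urlopen(url))

    XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
    for elem in tree.iter():                 # list every external reference
        if XLINK_HREF in elem.attrib:
            print(elem.tag, "->", elem.attrib[XLINK_HREF])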
Ah, but isn't that some of what we're trying to achieve here? Some
standard for the minimum metadata required to describe a dataset? I
honestly believe that won't be a single, all-inclusive definition --
more likely it will end up as a discipline-by-discipline effort -- but I
do believe there's potential for creating a starting point here.
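Just to suggest how small that starting point could be, here's a
hypothetical checker. The required-attribute lists are mine, purely for
illustration; each discipline would substitute its own.

    import netCDF4

    REQUIRED_GLOBAL = ["Conventions", "title", "history"]
    REQUIRED_PER_VARIABLE = ["units", "standard_name"]

    def missing_metadata(path):
        ds = netCDF4.Dataset(path)
        missing = [a for a in REQUIRED_GLOBAL if not hasattr(ds, a)]
        for name, var in ds.variables.items():
            missing += [name + ":" + a for a in REQUIRED_PER_VARIABLE
                        if not hasattr(var, a)]
        ds.close()
        return missing

    print(missing_metadata("sst_example.nc"))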
gerry
[1] Core standard OGC draft:
http://sites.google.com/site/galeonteam/Home/cf-netcdf-candidate-standard
[2] Linked Data: linkeddata.org
--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843