Howdy, John,
John Graybeal wrote:
I'm in the odd position of agreeing in principle with several writers
(keep metadata with data, support non-networked computing, the values
are more than the numbers), and then disagreeing with many details. A
few examples are below.
Yeah, in a lot of ways, so am I.
On reading Steve Hankin's post, though, I must ask: What exactly is
being proposed? A binary data format for files? A set of such binary
data formats? Or a protocol for exchanging information? Is this simply
a recapture of 'everything netCDF and CF' so that OGC can put a stamp of
approval on it?
In a way, yes, this is what is proposed. THEN there's a formal way to
add, extend and improve CF-NetCDF within a known framework.
Ben wrote "This approach will result in a binary encoding which can be
used with different access protocols, e.g., WFS or SOS as well as WCS."
I don't really know what it means to 'use a binary encoding with SOS';
can we be more precise about that?
SOS can reference a binary ("netCDF") file and send it along. At that
point, the XML metadata could (or might not) be reduced, given the
self-documenting nature of well-constructed netCDF files. I'm less sure
it's a good fit for WFS, but I might be convinced.
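To make that concrete, here's a minimal sketch (in Python) of the
pattern I have in mind: the SOS response hands the client a URL to a
netCDF file, and the client fetches and reads it. The URL, file name,
and variable attributes below are made up for illustration.

    import urllib.request
    import netCDF4

    # Hypothetical URL an SOS response might carry in its result block:
    nc_url = "http://example.org/sos/obs/weather_20090820.nc"
    local_path, _ = urllib.request.urlretrieve(nc_url, "weather.nc")

    # Because the file is self-documenting, the XML around it can be thin:
    ds = netCDF4.Dataset(local_path)
    for name, var in ds.variables.items():
        print(name, getattr(var, "units", "?"), getattr(var, "standard_name", "?"))
    ds.close()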
In short, having read through the referenced 'core standard' proposal
[1], I can't tell what we're trying to do yet.
Other comments on this thread, for those needing distraction:
On Aug 20, 2009, at 10:00 AM, Ron Lake wrote:
I would argue that we should stop this idea that data are just numbers
and strings and everything else is "metadata". <snip> Let's start by
defining the objects of interest and THEN we can have metadata about
them.
After watching thoughtful communities try to carefully describe 'the
object of interest', I am sure the proposed 'start' will be a long slow
one. I'd rather stick with "one person's data is another person's
metadata", and try to avoid getting too excited about the precise
distinction between data and metadata, except when it is very narrowly
defined on a specific project (not the case in this thread, IMHO).
This is a key point. A lot of otherwise really sharp folks tend to
define everyone's data and metadata by their own prejudices, myself
included. After all, MY data's easy to identify and define, and I can
see how YOUR data should be identified and defined, too. What? You
don't agree with me? How dare you?
On Aug 20, 2009, at 9:54 AM, Tom Whittaker wrote:
One of the single biggest mistakes that the meteorological community
made in defining a distribution format for realtime, streaming data was
BUFR -- because the "tables" needed to interpret the contents of the
files are somewhere else... and sometimes, end users cannot find them!
Perhaps this is a problem with the way the tables are made available,
and not simply the fact they are separate from the data stream? After
all, many image files (for example) are not described internally at all,
but no one seems to have trouble working with those images.... (I know
that's oversimplifying the difference, but it's instructive nonetheless.)
Ah, but it's not quite the same, and "oversimplifying" understates the
difference. With the current, well-known image formats, there usually
IS metadata (or something describing the image) in the header. That's
just not the case with BUFR. You have some expectation of finding the
GIF header in a file you think is a GIF; it tells you how the thing's
compressed, what the core color table is, and the sampling. Then the
data are relatively easy to pick out. For BUFR, you're required to
bring prior knowledge -- the external tables -- to interpret the file.
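What I mean by "some expectation of finding the GIF header": thirteen
bytes at the front of the file tell you nearly everything you need to
start pulling pixels out. A quick Python sketch (any GIF will do):

    import struct

    with open("example.gif", "rb") as f:
        header = f.read(13)  # signature + logical screen descriptor

    sig, width, height, packed, bg_index, aspect = struct.unpack("<6sHHBBB", header)
    assert sig in (b"GIF87a", b"GIF89a")    # no prior knowledge needed to check
    has_gct = bool(packed & 0x80)           # is a global color table present?
    gct_size = 2 ** ((packed & 0x07) + 1)   # entries in that color table
    print(width, height, has_gct, gct_size)

BUFR offers no equivalent: the descriptor tables live outside the file.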
NetCDF and ncML maintain the essential metadata within the files:
types, units, coordinates -- and I strongly urge you (or whomever) not
to make the "BUFR mistake" again -- put the metadata into the files!
Maybe you think all the essential metadata is within the netCDF file,
but in my opinion it isn't. I often find the essential metadata,
particularly of the semantic variety, to be absent. And I know of
communities that have had significant difficulty with the provenance
(for example) within CF/netCDF files.
Yeah, but... the mechanisms are there to put the semantic content into
the netCDF file, and to carry at least the originator's history.
There's no guarantee that someone won't change the internal metadata,
but I don't think that's what you're asking about.
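For instance, it takes only a few lines to attach the CF semantic hooks
and an originator history. A sketch using the Python netCDF4 library
(the file name and values are invented):

    import numpy as np
    import netCDF4

    ds = netCDF4.Dataset("sst_example.nc", "w")
    ds.createDimension("time", 4)
    t = ds.createVariable("time", "f8", ("time",))
    t.units = "hours since 2009-08-20 00:00:00"
    t.standard_name = "time"
    sst = ds.createVariable("sst", "f4", ("time",))
    sst.units = "K"
    sst.standard_name = "sea_surface_temperature"   # the CF semantic hook
    t[:] = np.arange(4)
    sst[:] = [291.2, 291.4, 291.1, 290.9]
    ds.history = "2009-08-21: created by buoy ingest script v0.3"  # originator history
    ds.Conventions = "CF-1.4"
    ds.close()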
The generalization (point) of this observation is that different people
require different metadata, sometimes arbitrarily complex or peripheral
metadata. And I don't think you want ALL that metadata in the same file
as the data -- especially when the data may be coming not in a file, but
in a stream of records.
Another good point. I often think along the lines of inheritable and
file-unique metadata, and of how to obtain the inheritable stuff.
There's little reason to include it when it could be obtained with a URI
reference, but most disciplines can identify what their own file-unique
(or observation-unique, experiment-unique, or such) metadata are, and
those *should* be included.
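As a sketch of that split, using the file from above: point at the
inheritable, discipline-level record by URI, and keep only the
file-unique bits local. (metadata_link is the attribute name ACDD uses
for this; the URL and identifiers here are hypothetical.)

    import netCDF4

    ds = netCDF4.Dataset("sst_example.nc", "a")
    # Inheritable metadata lives elsewhere; reference it by URI:
    ds.metadata_link = "http://example.org/metadata/texas-mesonet-buoys.xml"
    # File-unique metadata stays in the file itself:
    ds.id = "buoy042_20090820"
    ds.comment = "Sensor 2 replaced 2009-08-19; earlier values suspect."
    ds.close()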
Do not require the end user to have an internet connection simply to
"read" the data... many people download the files and then take them
along when traveling, for example.
Ah, in the era of linked data, or Linked Data [2] -- which will be our
era five years from now, if not already -- this problem will be solved,
because everyone will insist on having an internet connection when
traveling. Witness the trajectory of internet availability at scientific
conferences.
If I simply downloaded the file at
<http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml>
I would not be able to read it. In fact, it looks like even if I also
got the "metadata" file at:
<http://schemas.opengis.net/om/1.0.0/examples/weatherRecord1.xml>
I would still not be able to read it, since it also refers to other
servers in the universe to obtain essential metadata.
Uh... I think you may be a bit wrong about what you saw in the
examples. The first file is crudely readable, if not comprehensively
described (to say the least), but by the designer's choice it references
more detailed metadata in a second file. (The file's creator didn't
have to do that per the spec, but in some observing systems I'd say it
makes sense.) Nothing in the second file appears to refer to 'essential
metadata' in other files... depending on what you consider essential, of
course. (The .xsd, for example, is more a format specification than a
piece of essential metadata. By analogy, I can't find a reference in a
netCDF file to any specification of its format, so I guess netCDF
wouldn't qualify as containing all the essential metadata in that sense
either.)
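For what it's worth, it's easy to enumerate exactly what the first file
points at. A short Python sketch, using only the standard library and
the URL quoted above:

    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://schemas.opengis.net/om/1.0.0/examples/weatherObservation.xml"
    tree = ET.parse(urllib.request.urlopen(url))

    XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
    for elem in tree.iter():                 # list every external reference
        if XLINK_HREF in elem.attrib:
            print(elem.tag, "->", elem.attrib[XLINK_HREF])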
Ah, but isn't that some of what we're trying to achieve here? Some
standard for the minimum metadata required to describe a dataset? I
honestly believe that won't be a single, all-inclusive definition --
more likely it will end up as a discipline-by-discipline effort -- but I
do believe there's potential for creating a starting point here.
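Just to suggest how small that starting point could be, here's a
hypothetical checker. The required-attribute lists are mine, purely for
illustration; each discipline would substitute its own.

    import netCDF4

    REQUIRED_GLOBAL = ["Conventions", "title", "history"]
    REQUIRED_PER_VARIABLE = ["units", "standard_name"]

    def missing_metadata(path):
        ds = netCDF4.Dataset(path)
        missing = [a for a in REQUIRED_GLOBAL if not hasattr(ds, a)]
        for name, var in ds.variables.items():
            missing += [name + ":" + a for a in REQUIRED_PER_VARIABLE
                        if not hasattr(var, a)]
        ds.close()
        return missing

    print(missing_metadata("sst_example.nc"))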
gerry
[1] Core standard OGC draft:
http://sites.google.com/site/galeonteam/Home/cf-netcdf-candidate-standard
[2] Linked Data: linkeddata.org
--
Gerry Creager -- gerry.creager@xxxxxxxx
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843