Rich: Thanks for your comments. Data Explorer (formerly IBM Visualization Data Explorer, or informally DX) is a commercial software package developed by the group that I am in. The specification of the DX data model and the operations that DX supports are available publicly. Obviously, the code is not. Although DX has a lot of capabilities, the data model can support a greater variety of data than the currently available version of the software can visualize. For a simple example, consider a rank-2 tensor in 3-space on an irregular grid (sketched in CDL below). The model can handle it without difficulty. DX can do mathematical operations on such data, but realization (generating renderable geometry) is not so straightforward (i.e., a research problem). Today, DX would allow you to treat a 3-tuple from the tensor as a 3-vector, or a single element as a scalar, for example, and do appropriate things. There are also a few visual things that can be done if the tensor is symmetric.

Documentation on DX and the data model ranges from marketing literature with some level of detail about data types, etc., to papers in the literature (cf. R. Haber et al., "A Data Model for Scientific Visualization with Provisions for Regular and Irregular Grids", Proceedings IEEE Visualization '91, pp. 298-305, October 1991; B. Lucas et al., "An Architecture for a Scientific Visualization System", Proceedings IEEE Visualization '92, pp. 107-113, October 1992), to an internal report that I have written on data management methods for visualization, to software documentation (DX user's guide, DX programmer's guide). All of this is available publicly. We are also interested in the use of the external representation of the model independent of DX. Right now, it is a multiple-file representation without an API independent of DX. This could be something to discuss further, if there is interest.

The issue of public domain vs. commercial is certainly one that I faced at NASA over the years. The first data system that used CDF was the NASA Climate Data System (NCDS). NCDS uses some commercial software (e.g., RDBMS, UI, graphics). This was widely criticized at design time (a decade ago) because the NASA approach then was to build everything from scratch and ignore the outside world. Commercial software was chosen to reduce costs, especially with a finite budget. Custom software was used to develop things unavailable commercially or in the public domain, and to integrate the pieces. CDF was one such piece of software. Later in the 80s, the development of CDF was criticized because the NASA view by then was that we should not be developing software to put out in the public domain, but should adopt what was already available. It did not matter if the appropriate tools did not exist. C'est la vie.

Anyway, I do agree with Rich's assessment that the public domain is the proper place for standards (and benchmarks -- another subject we should discuss at some point). There is plenty of precedent for this view in other arenas of computing. Commercial systems may use, enhance, etc., such a standard, of course, which then provides value that a potential customer is willing to buy. If you will, that's our view about importing data in formats like netCDF or CDF, or generating images in TIFF, PS, etc. Given that we wanted to support data, and analysis thereof, for problems beyond what current systems like CDF, netCDF, HDF et al. could handle, we developed something ourselves. We did look at everything out there first -- no sense reinventing the wheel.
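As a concrete illustration of the tensor example above, here is a minimal CDL sketch of one way such data might be laid out in a netCDF file. This is only a sketch: the variable names and the positions/field_type attributes are made up for the example, not DX's actual import conventions.

    netcdf irregular_tensor {
    dimensions:
        npoints = 1024 ;   // nodes of the irregular grid
        space = 3 ;        // dimensionality of the embedding space
    variables:
        // explicit node coordinates, needed because the grid is irregular
        float locations(npoints, space) ;
        // one rank-2 (3x3) tensor per node
        float stress(npoints, space, space) ;
            // hypothetical attributes tying the data to its grid
            stress:positions = "locations" ;
            stress:field_type = "tensor" ;
    }

An application reading such a file could then treat stress(i,0,:) as a 3-vector, or stress(i,0,0) as a scalar, in the spirit of what DX allows today.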
Our extensions/conventions for netCDF were an early attempt to provide an importation mechanism on one public domain "standard". However, given the limits inherent in the netCDF data model and its vocabulary, the result was a subset of what the DX data model supports. We do have some interest in making the data model and an external format available publicly, independent of DX. I would be interested in discussing this further with anyone so inclined.

There are other issues that I wish to discuss at some other time, in two arenas. One relates to data set scaling, both in width (complexity) and depth (bulk size). Some of what we have developed addresses both of these: the complexity in terms of the model vocabulary, and the size in terms of support for parallel computation and use of high-performance I/O systems (h/w). NSSDC, for example, has addressed some of the size-scaling issues in CDF for disk access in conventional file systems, with direct access to the disk, subsampling from disk, etc. This has implications when dealing with more than a few tens of MB of data. The other area relates to semantics -- issues of higher-level information embedded as metadata, and driving applications.

Lloyd

------------------------------- Referenced Note ---------------------------

Date: Wed, 25 Nov 1992 23:01:38 PST
From: lysakowski@xxxxxxxxxxxxxxxxxxx
To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Broader Requirements for netCDF and standards - response to Lloyd Treinish's memo

Lloyd Treinish has done an excellent job of thinking through how to use netCDF "as is" to represent complex datatypes that are not inherently supported in netCDF now. We in the analytical instrument community have requirements that go far beyond where netCDF is currently. We need to support more complex data models sooner rather than later. Many kudos to Lloyd for taking the next major step -- again!!

From my quick reading of Lloyd's comments, the conventions used in Data Explorer (included below) detail a way to implement some parts of a more extensive data model using conventions in CDL. I think the description is very useful for describing how one might use netCDF and CDL to store more complex datatypes in netCDF files. I'd like to see the technical requirements, and the scope of those requirements, that the Data Explorer data model addresses now. Lloyd, is the Data Explorer data model specification a public-domain document?

Data Explorer sounds like a great package. It appears to address many requirements for several different domains of science.

--------------------------------------------------------------------------

We need to move on to other issues not addressed by Lloyd's input. I feel that we still haven't fully addressed the question of standards.
There are very important business, organizational, and people constraints on any solution that will be WIDELY accepted, i.e., become a standard. The analytical instrument vendors, universities, government agencies, and end-user companies that I've been working with on analytical data standards over the past 4 years have said, "If we have to buy it from company X, then it's not an open standard, and we don't want it." A major problem with Data Explorer (and other commercial systems "more advanced" than netCDF) is that it is proprietary and requires royalties paid to a for-profit company. I've been shot down hard for proposing proprietary technologies to standards groups and other researchers who, for whatever reason, feel they must base their work on public-domain standards. Until a public-domain version is made available that is free of charge, available over the Internet, and supported by a vendor-independent software engineering support group like Unidata or NASA, Data Explorer (or any other commercial package) doesn't serve the major needs of universities, standards communities, and even many sectors of industry for scientific data interchange and storage. I've hit up against this hard "reality" many times.

If such a public-domain version of a generic package (Data Explorer or any other package) for scientific data interchange and storage is ever made available to the scientific community, it must not be a "scaled-down" version that requires someone to buy the commercial version to get the full functionality. Unidata doesn't use such "hooks", because they don't serve Unidata's clientele.

We must not lose sight of the fact that technical solutions by themselves are not complete solutions or business solutions, whether your "business" is university research, industrial R&D, or government R&D. Too many technical solutions fail to make it "to market" because they are technical solutions only, and fail to satisfy all the other requirements, particularly business, organizational, and people requirements. This is not a soapbox conversation. I've had to take long, hard looks at what is making the analytical data standards successful. The technical part of it (the netCDF software) is an important, yet small, part of the solution. This is not always easy for technical people (including myself) to accept. The vendor-independent software support center (Unidata) is an organizational factor that is crucial to the success of netCDF. However, to be successful, the full range of requirements must be included in the solution. Unidata has done a better job of addressing the full range of requirements than most other organizations I've seen. Unidata does an enormous amount of work to make sure their code is fully available on all the major platforms, with no particular bias toward any group of users or vendors. They should be commended for all their great work.

I hope that this discussion leads to a broader discussion of requirements for systems and solutions in the future. This may be controversial, but it is meant to move the scientific community at large forward. NetCDF has broad applicability, and it needs to be extended to meet some of the requirements beyond those that Lloyd and others have begun to address in various memos. This is a good time to start discussing the broader requirements for future versions. Your feedback on this note will be much appreciated.

Rich Lysakowski
Director, Analytical Data Interchange and Storage Standards Project