The formatting of yesterday's message from Steve Hankin to the netcdfgroup mailing list was inadvertently corrupted here at Unidata, so I've appended a reposting of the message in a more readable form. Let me also take this opportunity to remind members of the mailing list that administrative requests for addition to or deletion from the list should be sent to netcdfgroup-adm@xxxxxxxxxxxxxxxx rather than to the full mailing list. Thanks.

Hello Rich, Tim, Ken, et al.,

I've been following your discussion about netCDF styles with interest, as my own group - a numerical modeling group - shares similar concerns: how to use netCDF to achieve a compatible representation of our model data (gridded, multi-gigabyte, multiple variables on staggered grids) as well as PMEL's EPIC data (down the hall) and data from outside institutions, too. This business of time axis representations is leading us all to similar solutions.

Rich has described a global variable called "base_date" which "specifies the Gregorian start date". Similarly, the file "conventions.info" available from unidata.ucar.edu suggests, e.g., variables:

        double time(nobs);
        time:units = "milliseconds since (1992-9-16 10:09:55.3 -600)"

Our own software, FERRET, uses a solution such as:

        float TIME(TIME) ;
        TIME:units = "seconds" ;
        TIME:time_origin = "14-JAN-1976 14:00:00" ;

and accepts int, long, float, or double data types.

While all of these are very similar solutions, they are also incompatible. How are time-date strings formatted? Where should the time origin be placed: in the units string? In a global attribute? In a variable attribute? If in an attribute, what is the attribute name? Is the data type mandated? Does the axis have to be a "coordinate variable" (dimension name = variable name)? Etc., etc.

Similar issues arise for if and how to map gridded data onto 4-dimensional grids. Mandatory ordering of axes? Mandatory axis names? Mandatory units choices? What to do with missing axes (e.g. the Z axis of vertically averaged flow)?
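To make the incompatibility concrete, here is a minimal sketch (plain Python, no netCDF library) that decodes the same stored time value under the two attribute layouts quoted above. The attribute contents follow the examples in the message; the function names, parsing details, and the simplified origin format (no fractional seconds or UTC offset) are illustrative assumptions, not part of either convention.

```python
from datetime import datetime, timedelta

def decode_units_since(value, units_attr):
    """Decode the conventions.info style: units = '<unit> since <origin>'."""
    unit, _, origin = units_attr.partition(" since ")
    seconds_per = {"seconds": 1.0, "milliseconds": 1e-3, "days": 86400.0}[unit]
    base = datetime.strptime(origin.strip(), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(seconds=value * seconds_per)

def decode_ferret(value, units_attr, time_origin_attr):
    """Decode the FERRET style: plain units plus a separate time_origin attribute."""
    seconds_per = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}[units_attr]
    base = datetime.strptime(time_origin_attr, "%d-%b-%Y %H:%M:%S")
    return base + timedelta(seconds=value * seconds_per)

# The same stored number, 3600.0, decodes to different instants depending
# on which convention a reader assumes the file follows:
a = decode_units_since(3600.0, "seconds since 1992-09-16 10:09:55")
b = decode_ferret(3600.0, "seconds", "14-JAN-1976 14:00:00")
```

A generic reader has no way to know which decoder to apply without an agreed convention, which is exactly the interoperability gap the questions above are probing.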
It seems to me that if we want to adopt conventions for these issues, now is the time to do it. NetCDF can fail to be a "standard" in any meaningful way if these issues are not addressed somewhat formally by "users" (us) acting as a community.

I have some personal experience with this type of standards failure as a member of the ANSI committee that creates CGM (the Computer Graphics Metafile). CGM, a broadly conceived standard, has expected user communities to develop "profiles" that dictate their particular style choices and ensure interoperability. The user communities have mostly failed to get organized, and there is chaos in the CGM world - enough to endanger its success as a standard.

I spoke to Russ Rew and he agreed that a "straw man" proposal on these conventions for oceanographers was in order. I will try to pull one together in the next few days - using "conventions.info" as a starting point but going into much greater detail. My main goal will be to enumerate the open issues. The list I generate will be VERY incomplete - I hope we can pass it around and add to it. When we have a moderately exhaustive list, then we can begin discussing solutions that encompass our issues. If you see a problem with this process, please fire away!

cheers - steve

Date: Tue, 29 Sep 1992 10:03:22 -0700 (PDT)
From: HANKIN@xxxxxxxxxxxx
To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: RE: sizes of netCDF objects...
<Date: Mon, 28 Sep 1992 14:44:05 PDT
<From: lysakowski@xxxxxxxxxxxxxxxxxxx
<Subject: sizes of netCDF objects...
<To: netcdfgroup@xxxxxxxxxxxxxxxx
<Cc: lysakowski@xxxxxxxxxxxxxxxxxxx
<
<Please respond to this message only if you are using netCDF for
<large (over a megabyte of data) to HUGE (100's of megabytes to
<gigabytes of data) datasets.

Our model outputs are typically about 2 Gbytes in size. We have an in-house direct access format that permits us to break this up into multiple files, and a strategy that allows a "data set" (an associated group of files) to be the equivalent of a netCDF hyperslab, such that the data set still shares the grid coordinates and indices of the full model output. This permits us in most cases to avoid working with the full multi-gigabyte data set.

At present we have adapted the hyperslab strategy to netCDF files (using a handful of netCDF attributes), but we have not yet implemented the ability to split a netCDF data set into multiple files. Because of this we haven't been using the netCDF format for our HUGE files yet - only files of order 10-20 Mbytes so far. But we will likely be facing performance issues similar to your own in the future.

<I need to do a short survey of netCDF usage for large to HUGE datasets.
<
<We are thinking about using netCDF for Nuclear Magnetic Resonance data for
<analytical laboratories and for Magnetic Resonance Imaging data.
<
<1) What are the largest datasets that you are using with netCDF now?

- see above

<2) For what applications?

- ocean GCM outputs

<3) What limitations are you experiencing for performance? (If you are
<experiencing limitations, please state what kind of hardware and software
<you are using so we know how to interpret your results.)

- Significant performance limitations on WRITEing files; excellent performance READing in all (very informal) tests to date. In WRITE operations the use of the RECORD (unlimited) dimension seems to impose a quadratic falling off in performance as the length of the record axis increases ... a potential gotcha for long time series saved incrementally ...

<4) What are your plans for larger datasets in the future? How far do you
<envision netCDF going before it breaks down, if at all?

- as above: we're still on the leading edge of our learning curve, too

<Thanks in advance.
<
<Rich Lysakowski
<ADISS Project Director

Steve Hankin          | NOAA/PMEL              | ph. (206) 526-6080
                      | 7600 Sand Point Way NE | FAX (206) 526-6744
                      | Seattle, WA 98115-0070 | hankin@xxxxxxxxxxxx
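The multi-file "data set" strategy described in Steve's reply boils down to index arithmetic: each file holds a contiguous run of records along the record axis, while every record keeps the global index it has in the full model output. The sketch below is a hypothetical illustration of that bookkeeping; the function names and the fixed records-per-file layout are assumptions for illustration, not PMEL's actual scheme.

```python
def locate_record(global_idx, records_per_file):
    """Map a global record index to (file number, index within that file)."""
    return divmod(global_idx, records_per_file)

def global_record(file_number, local_idx, records_per_file):
    """Inverse mapping: recover the record's index in the full model output."""
    return file_number * records_per_file + local_idx
```

With 100 records per file, for example, global record 250 lives in file 2 at local index 50, so a reader can extract a hyperslab - and still report coordinates of the full output - without ever opening the whole multi-gigabyte set at once.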