>To: support@xxxxxxxxxxxxxxxx
>From: Frederic J Chagnon <frederic@xxxxxxx>
>Subject: Creation of large netCDF datasets using ncgen and CDL files
>Organization: UCAR/Unidata
>Keywords: 200010132335.e9DNZ8417714

Hi Frederic,

> I am attempting to create a netCDF file to store some model output, and am
> using a method that has worked for the past year.  But now, the file I am
> creating is very large (3 Gb), and there seems to be a problem.

The netCDF format uses 32-bit file offsets and extents, so in general the
size of a netCDF file is limited to about 2 Gbytes (2^31, since off_t is a
signed type).  However, there are some exceptions, and below we suggest how
you could keep your data in a single file by taking advantage of one of
these exceptions.

On systems that support 64-bit offsets and a large-file environment with
files exceeding 2 Gbytes (IRIX64, SunOS 5.x for SPARC v9, OSF1/Alpha, ...)
it is possible to create and access very large netCDF files.  The remaining
size constraints are that the file offset to the beginning of the record
variables (if any) must be less than 2 Gbytes, and the relative offset to
the start of each fixed-length variable or each record variable within a
record must be less than 2 Gbytes.  Hence, a very large netCDF file might
have

 * no record variables, some ordinary fixed-length variables, and one very
   large (exceeding 2 Gbytes) fixed-length variable; or
 * some ordinary fixed-length and record variables, and one very large
   record variable; or
 * some ordinary fixed-length and record variables and a huge number of
   records.

If you create very large netCDF files, they will only be usable on other
systems that support very large files.  The netCDF file format has not
changed, so files less than 2 Gbytes in size are still writable and
readable on all systems on which netCDF is supported.

To eliminate even these weaker file size constraints would require a new
netCDF format.  So far the original format (version 1, since 1987) has been
sufficient for all versions of the software through the latest netCDF 3.5
release.  Implementing software that would support a new format (based on
HDF-5) while continuing to permit access to files in the previous format
has been in our long-term plans for netCDF, but there will be no release in
the near future that removes the above restrictions on variable size or
number of records in a netCDF file.
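For illustration, here is a minimal CDL sketch of the first layout in the
list above; the file, dimension, and variable names are invented for the
example and not taken from any real dataset.  The ordinary fixed-length
variable comes first and the single variable exceeding 2 Gbytes comes last,
so every variable still begins at an offset below 2^31 bytes even though
the whole file is about 2.4 Gbytes:

    netcdf bigfixed {
    dimensions:
            x = 1000 ;
            y = 1000 ;
            n = 300 ;
    variables:
            float mask(y, x) ;        // ordinary fixed-length variable, 4 Mbytes
            double huge(n, y, x) ;    // 300 * 1000 * 1000 * 8 bytes = 2.4 Gbytes
    }

As noted above, a file like this could only be created and accessed on one
of the large-file platforms mentioned.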
> Here's my method:
> I have a CDL file in which I define the extents of the domain and all the
> variables.  I use the ncgen -o command to create a netCDF file based on
> the CDL file.  To test the "goodness" of the file created, I use the
> ncdump -c command.  This method works well for me if I am creating
> datasets that are smaller than 1.5 GB.  However, I have just added more
> time step definitions to my CDL file, and while it creates the netCDF file
> seamlessly, the ncdump -c command returns an error.
>
> I have even downloaded the latest version of the netCDF libraries
> (netcdf-3.5-beta) in hope that it would solve the problem, but I have had
> no luck.
>
> I am puzzled, because if I modify the CDL file and decrease either the
> number of variables, or the size of the domain, the resulting netCDF file
> "works".
>
> Below is the error message from the ncdump -c command.  If you could
> enlighten me in any way on the matter, I would be most grateful.  I would
> be glad to supply my CDL file (but didn't want to bother you with an
> attachment unless needed.)  Thanks.
>
> > ncdump -c OSU_YEAR_2D.nc
> netcdf OSU_YEAR_2D {
> dimensions:
>         londot = 75 ;
>         latdot = 50 ;
>         loncrs = 75 ;
>         latcrs = 50 ;
>         levela = 23 ;
>         levelb = 24 ;
>         levelc = 1 ;
>         time = 8760 ;
> variables:
>         float GROUNDT(time, levelc, latcrs, loncrs) ;
>                 GROUNDT:long_name = "GROUND TEMPERATURE" ;
> (...)
>
>         double time(time) ;
>                 time:time_origin = "01-JAN-1998:00:00:00" ;
>                 time:units = "seconds" ;
>                 time:point_spacing = "even" ;
>
> // global attributes:
>                 :title = "OSU_YEAR SIMULATION OUTPUT" ;
> data:
>
>  londot = ncdump: Invalid argument

So it looks like the GROUNDT variable takes about 131 Mbytes.  If you have
more than 16 such variables, the offset of the 17th and all subsequent
variables would be greater than 2^31 = 2.1475 Gbytes, so such a file could
not be represented with the current netCDF format.

However, if you defined `time' as a record (unlimited) dimension, each
record variable would only require about 15000 bytes per record, and you
could have up to 2^31 records, so with this structure you should be able to
keep all your data in a single netCDF file.
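For illustration, based only on the CDL fragment above, the change would
look something like this; the variables and attributes not shown are
assumed to stay exactly as they are:

    netcdf OSU_YEAR_2D {
    dimensions:
            londot = 75 ;
            latdot = 50 ;
            loncrs = 75 ;
            latcrs = 50 ;
            levela = 23 ;
            levelb = 24 ;
            levelc = 1 ;
            time = UNLIMITED ;    // record dimension, replaces "time = 8760"
    variables:
            float GROUNDT(time, levelc, latcrs, loncrs) ;
                    GROUNDT:long_name = "GROUND TEMPERATURE" ;
            // ... other variables unchanged ...
            double time(time) ;
                    time:time_origin = "01-JAN-1998:00:00:00" ;
                    time:units = "seconds" ;
                    time:point_spacing = "even" ;

    // global attributes:
                    :title = "OSU_YEAR SIMULATION OUTPUT" ;
    }

Since time is already the leftmost dimension of GROUNDT, it can become the
record dimension without reordering anything.  Each record of GROUNDT is
then only 1 * 50 * 75 * 4 = 15000 bytes, which keeps the offsets within
each record far below the 2 Gbyte limit even with many such variables, and
the same ncgen -o invocation you already use will create the file.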
--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                     http://www.unidata.ucar.edu