On Mon, Jun 21, 2010 at 10:33:12PM +0400, Constantine Khroulev wrote:

> It seems to me that case 1 is slow because NetCDF (Classic) keeps
> the file header as small as possible (Section 4 of the NetCDF User's
> Guide is perfectly clear about this).

You can use nc__enddef (the double-underscore version) to adjust this
behavior and pad out the header. Then, when the header grows because
you've added another variable, you won't have to rewrite the entire
dataset. A new variable needs only a few bytes in the header, so by
adding, say, 4k of headroom you can store a lot of variables without
triggering a rewrite (see the sketch below).

> Case 2, on the other hand, seems to be slow because (please correct
> me if I'm wrong) variables are stored contiguously. (In other words:
> if variables A and B are defined in this order, then appending X
> bytes to A requires moving B over by X bytes.)

In parallel-netcdf land we take some (technically legal) liberties
with the file format so that you can pad out individual variables.
There might be a tuning option to do that in netcdf, but I don't know
it off the top of my head.

> My question is:
>
> How does NetCDF-4 compare to NetCDF Classic in this regard? Would
> switching to it improve write performance? (This is two questions,
> really: I'm interested in cases 1 and 2 separately.)

I imagine the new file format will handle this pretty well, but I'm
not an expert. You'll pay a bit of a price when you read back this
data, but it sounds like that's not a big deal for your workload.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
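For readers of this thread: below is a minimal C sketch of the header
padding Rob describes. The file and variable names ("padded.nc", "t",
"a", "b") and the 4k figure are illustrative choices, not anything
from the original message; nc__enddef's extra arguments are
(h_minfree, v_align, v_minfree, r_align).

    /* Sketch: pad the netCDF classic header with nc__enddef so that
     * adding a variable later need not rewrite the data section.
     * Compile with -lnetcdf. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <netcdf.h>

    #define CHECK(e) do { int s_ = (e); if (s_ != NC_NOERR) { \
            fprintf(stderr, "%s\n", nc_strerror(s_)); exit(1); } } while (0)

    int main(void)
    {
        int ncid, tdim, varid;

        CHECK(nc_create("padded.nc", NC_CLOBBER, &ncid));
        CHECK(nc_def_dim(ncid, "t", NC_UNLIMITED, &tdim));
        CHECK(nc_def_var(ncid, "a", NC_DOUBLE, 1, &tdim, &varid));

        /* nc__enddef instead of nc_enddef: h_minfree=4096 reserves 4k
         * of free space at the end of the header; the other arguments
         * (v_align=4, v_minfree=0, r_align=4) match nc_enddef's
         * default behavior. */
        CHECK(nc__enddef(ncid, 4096, 4, 0, 4));
        CHECK(nc_close(ncid));

        /* Later: add another variable.  Its header entry fits in the
         * reserved space, so the data section is not moved. */
        CHECK(nc_open("padded.nc", NC_WRITE, &ncid));
        CHECK(nc_redef(ncid));
        CHECK(nc_inq_dimid(ncid, "t", &tdim));
        CHECK(nc_def_var(ncid, "b", NC_DOUBLE, 1, &tdim, &varid));
        CHECK(nc_enddef(ncid));
        CHECK(nc_close(ncid));
        return 0;
    }

The tradeoff is a slightly larger file up front; once the reserved
space is exhausted, the next added variable triggers the usual
rewrite.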