Hi Simon,

In relation to a problem you noticed with using nccopy to rechunk data,
you asked:

> Is there an obvious mistake on my side or might there be a problem with
> variables in groups?
>
> I am using netcdf library version 4.3.1.1 of Feb 26 2014 12:06:45

You encountered undocumented behavior in nccopy, but it wasn't related to
groups. The new chunk size you chose for 4-byte float data, 820 by 1,
results in chunks of 3280 bytes, which is less than the (undocumented)
threshold for minimum chunk sizes, set in nccopy.c:

  #define CHUNK_THRESHOLD (8192) /* variables with fewer bytes don't get chunked */

Instead of using the smaller chunk size you requested, nccopy used default
chunking for your variable, resulting in the weird 55 by 17856 chunks
(approximately proportional to the shape of your original variable, 820 by
249984).

The intent of the CHUNK_THRESHOLD minimum is to avoid creating chunks
smaller than a physical disk block, as an I/O optimization: attempting to
read a smaller chunk will still cause a whole disk block to be read. So as
a workaround, you could specify 820 by 3 chunks instead and get the same
efficiency as 820 by 1 chunks, assuming your physical disk blocks are 8192
bytes. However, I think for the next release we should lower the default
threshold to 512 bytes and document the behavior.

Thanks for reporting the problem!

--Russ

Simon Stähler wrote:
> I want to use the nccopy utility to change the chunking of a large 2D
> dataset (first dimension time ("snapshots"), second point index
> ("gllpoints_all")). The original file has the structure:
>
> $ ncdump -sch original.nc
> netcdf original {
> dimensions:
>         snapshots = 820 ;
>         gllpoints_all = 249984 ;
> variables:
>
> // global attributes:
>                 :npoints = 249984 ;
>
> group: Snapshots {
>   variables:
>         float strain_dsus(snapshots, gllpoints_all) ;
>                 strain_dsus:_Storage = "chunked" ;
>                 strain_dsus:_ChunkSizes = 1, 249984 ;
>   } // group Snapshots
> }
>
> For further processing of the file, I want to change the chunks so that
> each contains all the time steps at one point. I do this with
>
> $ nccopy -c "snapshots/820,gllpoints_all/1" original.nc new.nc
>
> However, the resulting chunk sizes are somewhat weird: {55, 17856}
> instead of {820, 1}:
>
> $ ncdump -sch new.nc
> netcdf axisem_output_3 {
> dimensions:
>         snapshots = 820 ;
>         gllpoints_all = 249984 ;
>
> // global attributes:
>                 :npoints = 249984 ;
>
> group: Snapshots {
>   variables:
>         float strain_dsus(snapshots, gllpoints_all) ;
>                 strain_dsus:_Storage = "chunked" ;
>                 strain_dsus:_ChunkSizes = 55, 17856 ;
>   } // group Snapshots
> }
>
> Is there an obvious mistake on my side or might there be a problem with
> variables in groups?
>
> I am using netcdf library version 4.3.1.1 of Feb 26 2014 12:06:45
>
> cheers,
>
> Simon Stähler
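
Concretely, the workaround Russ suggests clears the threshold because 820
by 3 chunks of 4-byte floats come to 820 * 3 * 4 = 9840 bytes, just above
the 8192-byte CHUNK_THRESHOLD, whereas the requested 820 by 1 chunks come
to only 820 * 1 * 4 = 3280 bytes. Using the same file names as in Simon's
example, the command would be:

  $ nccopy -c "snapshots/820,gllpoints_all/3" original.nc new.nc

  # Verify that the requested chunking was applied; the reported
  # _ChunkSizes should now be 820, 3 rather than the default 55, 17856:
  $ ncdump -sch new.nc | grep _ChunkSizes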
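
For anyone who needs the exact 820 by 1 chunking, note that the 8192-byte
threshold lives in nccopy.c, not in the library, so a program written
against the netCDF-4 C API can request the chunk sizes directly with
nc_def_var_chunking(), which should honor them as given. Below is a
minimal sketch that only defines a new file with the same structure and
the exact chunking; the output file name "rechunked.nc" is illustrative,
and the data copy from the original file is left out:

  #include <stdio.h>
  #include <stdlib.h>
  #include <netcdf.h>

  /* Abort with the netCDF error message if a call fails. */
  #define CHECK(e) do { int s_ = (e); if (s_ != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(s_)); exit(1); } } while (0)

  int main(void)
  {
      int ncid, grpid, varid, dimids[2];
      /* Exact chunk shape from the example: all 820 time steps
       * at a single point. */
      size_t chunks[2] = {820, 1};

      CHECK(nc_create("rechunked.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
      CHECK(nc_def_dim(ncid, "snapshots", 820, &dimids[0]));
      CHECK(nc_def_dim(ncid, "gllpoints_all", 249984, &dimids[1]));
      CHECK(nc_def_grp(ncid, "Snapshots", &grpid));

      /* Dimensions defined in the root group are visible in child
       * groups, matching the layout of the original file. */
      CHECK(nc_def_var(grpid, "strain_dsus", NC_FLOAT, 2, dimids, &varid));
      CHECK(nc_def_var_chunking(grpid, varid, NC_CHUNKED, chunks));
      CHECK(nc_enddef(ncid));

      /* ... read from the original file and write the data here ... */

      CHECK(nc_close(ncid));
      return 0;
  }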