NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.
To learn about what's going on, see About the Archive Site.
You guys are replying faster than I can keep up! (Which is awful nice of you!) I was able to change the chunk size and get a file size that makes much more sense. With a chunk size of 1024, I get a file of 166 kBytes. What are the units of chunk size, by the way?

-Val

> On Apr 5, 2016, at 3:53 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>
> oh, and I've enclosed my code -- yours didn't actually run -- missing imports?
>
> On Tue, Apr 5, 2016 at 12:52 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>
> On Tue, Apr 5, 2016 at 12:13 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
>
> You might check the ChunkSizes attribute with 'ncdump -hs'. The newer netcdf
> sets larger default chunks than it used to. I had this issue with 1-d
> variables that used an unlimited dimension. Even if the dimension only had a
> small number of entries, the default chunk made the file much bigger.
>
> I had the same issue -- a 1-d variable had a chunksize of 1, which was really,
> really bad!
>
> But that doesn't seem to be the issue here -- I ran the same code, got
> the same results, and here is the dump:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>   timestamp_dim = UNLIMITED ; // (1 currently)
>   data_dim = UNLIMITED ; // (1 currently)
>   item_len = 100 ;
> variables:
>   double timestamp(timestamp_dim) ;
>     timestamp:_Storage = "chunked" ;
>     timestamp:_ChunkSizes = 524288 ;
>   variable_data_t data(data_dim) ;
>     data:_Storage = "chunked" ;
>     data:_ChunkSizes = 4194304 ;
>     data:_NoFill = "true" ;
>
> // global attributes:
>   :_Format = "netCDF-4" ;
> }
>
> If I read that right, nice big chunks.
> Note that if I don't use a VLType variable, I still get a 4 MB file -- though
> that could be the netcdf4 overhead:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>   timestamp_dim = UNLIMITED ; // (1 currently)
>   data_dim = UNLIMITED ; // (1 currently)
>   item_len = 100 ;
> variables:
>   double timestamp(timestamp_dim) ;
>     timestamp:_Storage = "chunked" ;
>     timestamp:_ChunkSizes = 524288 ;
>   ubyte data(data_dim, item_len) ;
>     data:_Storage = "chunked" ;
>     data:_ChunkSizes = 1, 100 ;
>
> // global attributes:
>   :_Format = "netCDF-4" ;
> }
>
> Something is up with the VLen.....
>
> -CHB
>
> (Assuming the variable is not compressed.)
>
> -- Ted
>
> __________________________________________________________
> | Edward Mansell <ted.mansell@xxxxxxxx>
> | National Severe Storms Laboratory
> |--------------------------------------------------------------
> | "The contents of this message are mine personally and
> | do not reflect any position of the U.S. Government or NOAA."
> |--------------------------------------------------------------
>
> On Apr 5, 2016, at 1:44 PM, Val Schmidt <vschmidt@xxxxxxxxxxxx> wrote:
> >
> > Hello netcdf folks,
> >
> > I'm testing some python code for writing sets of timestamps and variable
> > length binary blobs to a netcdf file, and the resulting file size is
> > perplexing to me.
> >
> > The following segment of python code creates a file with just two
> > variables, "timestamp" and "data", populates the first entry of the
> > timestamp variable with a float and the corresponding first entry of the
> > data variable with an array of 100 unsigned 8-bit integers. The total
> > amount of data is 108 bytes.
> >
> > But the resulting file is over 73 MB in size. Does anyone know why this
> > might be so large and what I might be doing to cause it?
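[Archive note: the chunk sizes in the dump above are enough to account for the observed file size. _ChunkSizes values are counted in array elements per dimension, not bytes, and HDF5 allocates at least one full chunk once a variable is written. A back-of-the-envelope sketch -- the 16 bytes per vlen element is an assumption about the on-disk size of the HDF5 variable-length descriptor, not a documented constant:]

```python
# _ChunkSizes values from the 'ncdump -hs' output above (elements, not bytes)
timestamp_chunk = 524288    # doubles, 8 bytes each
data_chunk = 4194304        # vlen elements; assume ~16 bytes each on disk

total_bytes = timestamp_chunk * 8 + data_chunk * 16
print(total_bytes)          # 71303168
print(total_bytes / 2**20)  # 68.0 MiB -- in the ballpark of the ~73 MB reported
```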
> >
> > Thanks,
> >
> > Val
> >
> >
> > from netCDF4 import Dataset
> > import numpy
> >
> > f = Dataset('scratch/text3.nc','w')
> >
> > dim = f.createDimension('timestamp_dim',None)
> > data_dim = f.createDimension('data_dim',None)
> >
> > data_t = f.createVLType('u1','variable_data_t')
> >
> > timestamp = f.createVariable('timestamp','d','timestamp_dim')
> > data = f.createVariable('data',data_t,'data_dim')
> >
> > timestamp[0] = time.time()
> > data[0] = uint8( numpy.ones(1,100))
> >
> > f.close()
> >
> > ------------------------------------------------------
> > Val Schmidt
> > CCOM/JHC
> > University of New Hampshire
> > Chase Ocean Engineering Lab
> > 24 Colovos Road
> > Durham, NH 03824
> > e: vschmidt [AT] ccom.unh.edu
> > m: 614.286.3726
> >
> > _______________________________________________
> > netcdfgroup mailing list
> > netcdfgroup@xxxxxxxxxxxxxxxx
> > For list information or to unsubscribe, visit:
> > http://www.unidata.ucar.edu/mailing_lists/
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959 voice
> 7600 Sand Point Way NE   (206) 526-6329 fax
> Seattle, WA 98115        (206) 526-6317 main reception
>
> Chris.Barker@xxxxxxxx
> <huge_nc_file.py>

------------------------------------------------------
Val Schmidt
CCOM/JHC
University of New Hampshire
Chase Ocean Engineering Lab
24 Colovos Road
Durham, NH 03824
e: vschmidt [AT] ccom.unh.edu
m: 614.286.3726