
Re: [netcdfgroup] Unexpectedly large netCDF4 files from python

Oh, and I've enclosed my code (appended below) -- yours didn't actually
run -- missing imports?




On Tue, Apr 5, 2016 at 12:52 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:

>
>
> On Tue, Apr 5, 2016 at 12:13 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
>
>> You might check the ChunkSizes attribute with 'ncdump -hs'. The newer
>> netcdf sets larger default chunks than it used to. I had this issue with
>> 1-d variables that used an unlimited dimension. Even if the dimension only
>> had a small number, the default chunk made it much bigger.
>
>
> I had the same issue -- a 1-d variable had a chunksize of 1, which was
> really, really bad!
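>
> FWIW, netCDF4-python lets you override the default by passing chunksizes
> to createVariable -- something like this (untested; the 1024 is picked
> arbitrarily):
>
> v = f.createVariable('timestamp', 'd', ('timestamp_dim',),
>                      chunksizes=(1024,))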
>
> But that doesn't seem to be the issue here -- I ran the same code, got
> the same results, and here is the dump:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     variable_data_t data(data_dim) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 4194304 ;
>         data:_NoFill = "true" ;
>
> // global attributes:
>         :_Format = "netCDF-4" ;
> }
>
> If I read that right, those are nice big chunks.
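>
> You can also check that from netCDF4-python directly -- something like
> this (untested) should print each variable's chunk shape, or
> 'contiguous':
>
> for name, var in f.variables.items():
>     print(name, var.chunking())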
>
> Note that if I don't use a VLType variable, I still get a 4MB file --
> though that could be the netcdf4 overhead:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     ubyte data(data_dim, item_len) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 1, 100 ;
>
> // global attributes:
>         :_Format = "netCDF-4" ;
> }
>
> something is up with the VLen.....
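>
> One thing worth trying (untested) is forcing a small chunk on the vlen
> variable when it's created, instead of the 4194304-element default shown
> in the dump above:
>
> data = f.createVariable('data', data_t, 'data_dim', chunksizes=(1,))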
>
> -CHB
>
>
>
>
>
>> (Assuming the variable is not compressed.)
>>
>> -- Ted
>>
>> __________________________________________________________
>> | Edward Mansell <ted.mansell@xxxxxxxx>
>> | National Severe Storms Laboratory
>> |--------------------------------------------------------------
>> | "The contents of this message are mine personally and
>> | do not reflect any position of the U.S. Government or NOAA."
>> |--------------------------------------------------------------
>>
>> On Apr 5, 2016, at 1:44 PM, Val Schmidt <vschmidt@xxxxxxxxxxxx> wrote:
>>
>> > Hello netcdf folks,
>> >
>> > I’m testing some python code for writing sets of timestamps and
>> variable length binary blobs to a netcdf file and the resulting file size
>> is perplexing to me.
>> >
>> > The following segment of python code creates a file with just two
>> variables, “timestamp” and “data”, and populates the first entry of the
>> timestamp variable with a float and the corresponding first entry of the
>> data variable with an array of 100 unsigned 8-bit integers. The total
>> amount of data is 108 bytes.
>> >
>> > But the resulting file is over 73 MB in size. Does anyone know why this
>> might be so large and what I might be doing to cause it?
>> >
>> > Thanks,
>> >
>> > Val
>> >
>> >
>> > from netCDF4 import Dataset
>> > import numpy
>> >
>> > f = Dataset('scratch/text3.nc','w')
>> >
>> > dim = f.createDimension('timestamp_dim',None)
>> > data_dim = f.createDimension('data_dim',None)
>> >
>> > data_t = f.createVLType('u1','variable_data_t')
>> >
>> > timestamp = f.createVariable('timestamp','d','timestamp_dim')
>> > data = f.createVariable('data',data_t,'data_dim')
>> >
>> > timestamp[0] = time.time()
>> > data[0] = uint8( numpy.ones(1,100))
>> >
>> > f.close()
>> >
>> > ------------------------------------------------------
>> > Val Schmidt
>> > CCOM/JHC
>> > University of New Hampshire
>> > Chase Ocean Engineering Lab
>> > 24 Colovos Road
>> > Durham, NH 03824
>> > e: vschmidt [AT] ccom.unh.edu
>> > m: 614.286.3726
>> >
>> >
>>
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx

from netCDF4 import Dataset
import numpy as np
import time

f = Dataset('text3.nc', 'w')

dim = f.createDimension('timestamp_dim', None)
data_dim = f.createDimension('data_dim', None)
item_len = f.createDimension('item_len', 100)

data_t = f.createVLType('u1', 'variable_data_t')

timestamp = f.createVariable('timestamp', 'd', 'timestamp_dim')
# vlen version -- this is the one that produces the huge (~73 MB) file:
# data = f.createVariable('data', data_t, 'data_dim')
# fixed-width version -- the file comes out around 4 MB instead:
data = f.createVariable('data', np.uint8, ('data_dim', 'item_len'))

timestamp[0] = time.time()
data[0] = np.ones((100,), dtype=np.uint8)

f.close()
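
A quick sanity check on the result -- a minimal sketch, assuming text3.nc
is in the current directory:

import os
from netCDF4 import Dataset

print(os.path.getsize('text3.nc'), 'bytes')
f = Dataset('text3.nc')
for name, var in f.variables.items():
    print(name, var.chunking())
f.close()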