
Re: [netcdfgroup] Unexpectedly large netCDF4 files from python

Oh, and I've enclosed my code (appended below) -- yours didn't actually
run -- missing imports?




On Tue, Apr 5, 2016 at 12:52 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:

>
>
> On Tue, Apr 5, 2016 at 12:13 PM, Ted Mansell <ted.mansell@xxxxxxxx> wrote:
>
>> You might check the ChunkSizes attribute with 'ncdump -hs'. The newer
>> netcdf sets larger default chunks than it used to. I had this issue with
>> 1-d variables that used an unlimited dimension. Even if the dimension only
>> had a small number, the default chunk made it much bigger.
>
>
> I had the same issue -- a 1-d variable had a chunksize of 1, which was
> really, really bad!
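>
> FWIW, netCDF4-python lets you override the default by passing chunksizes
> to createVariable -- something like this (untested; the 1024 is picked
> arbitrarily):
>
> v = f.createVariable('timestamp', 'd', ('timestamp_dim',),
>                      chunksizes=(1024,))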
>
> But that doesn't seem to be the issue here -- I ran the same code, got
> the same results, and here is the dump:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     variable_data_t data(data_dim) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 4194304 ;
>         data:_NoFill = "true" ;
>
> // global attributes:
>         :_Format = "netCDF-4" ;
> }
>
> If I read that right, those are nice big chunks.
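>
> You can also check that from netCDF4-python directly -- something like
> this (untested) should print each variable's chunk shape, or
> 'contiguous':
>
> for name, var in f.variables.items():
>     print(name, var.chunking())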
>
> Note that if I don't use a VLType variable, I still get a 4MB file --
> though that could be the netcdf4 overhead:
>
> netcdf text3 {
> types:
>   ubyte(*) variable_data_t ;
> dimensions:
>     timestamp_dim = UNLIMITED ; // (1 currently)
>     data_dim = UNLIMITED ; // (1 currently)
>     item_len = 100 ;
> variables:
>     double timestamp(timestamp_dim) ;
>         timestamp:_Storage = "chunked" ;
>         timestamp:_ChunkSizes = 524288 ;
>     ubyte data(data_dim, item_len) ;
>         data:_Storage = "chunked" ;
>         data:_ChunkSizes = 1, 100 ;
>
> // global attributes:
>         :_Format = "netCDF-4" ;
> }
>
> something is up with the VLen.....
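>
> One thing worth trying (untested) is forcing a small chunk on the vlen
> variable when it's created, instead of the 4194304-element default shown
> in the dump above:
>
> data = f.createVariable('data', data_t, 'data_dim', chunksizes=(1,))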
>
> -CHB
>
>
>
>
>
>> (Assuming the variable is not compressed.)
>>
>> -- Ted
>>
>> __________________________________________________________
>> | Edward Mansell <ted.mansell@xxxxxxxx>
>> | National Severe Storms Laboratory
>> |--------------------------------------------------------------
>> | "The contents of this message are mine personally and
>> | do not reflect any position of the U.S. Government or NOAA."
>> |--------------------------------------------------------------
>>
>> On Apr 5, 2016, at 1:44 PM, Val Schmidt <vschmidt@xxxxxxxxxxxx> wrote:
>>
>> > Hello netcdf folks,
>> >
>> > I’m testing some python code for writing sets of timestamps and
>> variable length binary blobs to a netcdf file and the resulting file size
>> is perplexing to me.
>> >
>> > The following segment of python code creates a file with just two
>> variables, “timestamp” and “data”, and populates the first entry of the
>> timestamp variable with a float and the corresponding first entry of the
>> data variable with an array of 100 unsigned 8-bit integers. The total
>> amount of data is 108 bytes.
>> >
>> > But the resulting file is over 73 MB in size. Does anyone know why this
>> might be so large and what I might be doing to cause it?
>> >
>> > Thanks,
>> >
>> > Val
>> >
>> >
>> > from netCDF4 import Dataset
>> > import numpy
>> >
>> > f = Dataset('scratch/text3.nc','w')
>> >
>> > dim = f.createDimension('timestamp_dim',None)
>> > data_dim = f.createDimension('data_dim',None)
>> >
>> > data_t = f.createVLType('u1','variable_data_t')
>> >
>> > timestamp = f.createVariable('timestamp','d','timestamp_dim')
>> > data = f.createVariable('data',data_t,'data_dim')
>> >
>> > timestamp[0] = time.time()
>> > data[0] = uint8( numpy.ones(1,100))
>> >
>> > f.close()
>> >
>> > ------------------------------------------------------
>> > Val Schmidt
>> > CCOM/JHC
>> > University of New Hampshire
>> > Chase Ocean Engineering Lab
>> > 24 Colovos Road
>> > Durham, NH 03824
>> > e: vschmidt [AT] ccom.unh.edu
>> > m: 614.286.3726
>> >
>> >
>>
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx

from netCDF4 import Dataset
import numpy as np
import time

f = Dataset('text3.nc', 'w')

dim = f.createDimension('timestamp_dim', None)
data_dim = f.createDimension('data_dim', None)
item_len = f.createDimension('item_len', 100)

data_t = f.createVLType('u1', 'variable_data_t')

timestamp = f.createVariable('timestamp', 'd', 'timestamp_dim')
# vlen version -- this is the one that produces the huge (~73 MB) file:
# data = f.createVariable('data', data_t, 'data_dim')
# fixed-width version -- the file comes out around 4 MB instead:
data = f.createVariable('data', np.uint8, ('data_dim', 'item_len'))

timestamp[0] = time.time()
data[0] = np.ones((100,), dtype=np.uint8)

f.close()
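
A quick sanity check on the result -- a minimal sketch, assuming text3.nc
is in the current directory:

import os
from netCDF4 import Dataset

print(os.path.getsize('text3.nc'), 'bytes')
f = Dataset('text3.nc')
for name, var in f.variables.items():
    print(name, var.chunking())
f.close()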