Re: [netcdfgroup] nf90_char size

There it is.

> DATASET "BSE_RESONANT_COMPRESSED1_DONE" {
>       DATATYPE  H5T_STRING {
>          STRSIZE 1;
>          STRPAD H5T_STR_NULLTERM;
>          CSET H5T_CSET_UTF8;
>          CTYPE H5T_C_S1;

Your char arrays are being stored as strings, not 1-byte characters.  This
incurs overhead for each character.  I am not familiar with the exact
details of physical storage of HDF5 strings, but it doesn't matter.
This scheme is inefficient, and you should find something better.
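
To see where the bytes actually go, it can help to tally the SIZE fields
from the h5dump listing quoted below.  Here is a minimal Python sketch; the
figures are copied by hand from that listing, and the table layout and the
name tally are my own illustration, not part of any netCDF or HDF5 tool.

```python
# Figures copied by hand from the "h5dump -Hp" listing quoted in this thread:
# dataset name -> (HDF5 datatype, number of elements, SIZE in bytes)
datasets = {
    "BSE_RESONANT_COMPRESSED1":      ("H5T_IEEE_F32LE", 24776792 * 2, 198214336),
    "BSE_RESONANT_COMPRESSED1_DONE": ("H5T_STRING", 2025000000, 2025000000),
    "BSE_RESONANT_COMPRESSED2_DONE": ("H5T_STRING", 2025000000, 2025000000),
    "BSE_RESONANT_COMPRESSED3_DONE": ("H5T_STRING", 781887360, 781887360),
    "BS_K_compressed1":              ("H5T_IEEE_F32BE", 24776792, 99107168),
    "BS_K_linearized1":              ("H5T_IEEE_F32BE", 2025000000, 8100000000),
    "BS_K_linearized2":              ("H5T_IEEE_F32BE", 781887360, 3127549440),
    "complex":                       ("H5T_IEEE_F32BE", 2, 8),
}


def tally(table):
    """Return (total bytes, bytes grouped by HDF5 datatype)."""
    by_type = {}
    for dtype, _count, size in table.values():
        by_type[dtype] = by_type.get(dtype, 0) + size
    return sum(by_type.values()), by_type


total, by_type = tally(datasets)

# Storage cost per element for every dataset.
for name, (dtype, count, size) in datasets.items():
    print(f"{name:32s} {dtype:15s} {size / count:4.1f} bytes/element")

# Contribution of each datatype to the raw data.
for dtype, size in sorted(by_type.items()):
    print(f"{dtype:15s} {size:>14d} bytes")
print(f"{'total':15s} {total:>14d} bytes")
```

The total comes to 16356758312 bytes, which agrees with the "Raw data"
figure in the h5stat output quoted below, so the listing accounts for all
of the raw data in the file.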

I vaguely recall some changes in netcdf-4 character storage in recent
years.  Since you are using an older version of the netcdf library
(4.4.1.1), first try the latest version.  Otherwise, as I said elsewhere,
switch to the 64-bit offset or CDF5 format.
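
If you want to confirm which on-disk format a given file really uses (the
ncvalidator message quoted below complains about an HDF signature), the
first four bytes are enough.  A rough Python sketch follows; the function
names are hypothetical, and it assumes the HDF5 superblock sits at offset 0,
which need not hold for files written with a user block.

```python
# Leading four bytes of the classic netCDF variants and of HDF5.
SIGNATURES = {
    b"CDF\x01": "CDF1 (classic)",
    b"CDF\x02": "CDF2 (64-bit offset)",
    b"CDF\x05": "CDF5 (64-bit data)",
    b"\x89HDF": "HDF5 (container used by netCDF-4)",
}


def classify(header: bytes) -> str:
    """Name the format whose signature matches the first four bytes."""
    return SIGNATURES.get(header[:4], "unknown")


def file_format(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify(f.read(4))
```

Calling file_format("ndb.BS_COMPRESS0.005000_Q1") on the file discussed here
should report HDF5, consistent with the ncdump -hs output quoted below.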

If netcdf-4 is important to you for some reason, you might also consider
encoding your char data into signed or unsigned bytes.
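
As an illustration of the byte encoding, here is a small Python sketch.  It
assumes the flags only ever take the values "t" and "n", which I am guessing
from the sample string in your first message; the function names are made
up for this example.  In the Fortran writer the variable would then be
declared with a byte type such as nf90_byte instead of nf90_char.

```python
# Assumption: the character data holds only the flags "t" and "n".
FLAG_TO_BYTE = {"t": 1, "n": 0}


def encode_flags(flags: str) -> bytes:
    """Map 't' -> 1 and 'n' -> 0, one unsigned byte per flag.

    Raises KeyError if any other character appears.
    """
    return bytes(FLAG_TO_BYTE[c] for c in flags)


def decode_flags(data: bytes) -> str:
    """Inverse of encode_flags: nonzero bytes become 't', zero becomes 'n'."""
    return "".join("t" if b else "n" for b in data)


sample = "tnnnntnnnnt"
encoded = encode_flags(sample)
assert decode_flags(encoded) == sample  # round trip preserves the flags
```

This keeps one byte per flag, so the variable stays the same size as the
char version; only the declared type changes.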


On Sat, May 2, 2020 at 10:38 AM Davide Sangalli <davide.sangalli@xxxxxx>
wrote:

> h5stat -Ss ndb.BS_COMPRESS0.005000_Q1
>
> Filename: ndb.BS_COMPRESS0.005000_Q1
> Free-space section threshold: 1 bytes
> Small size free-space sections (< 10 bytes):
>         Total # of small size sections: 0
> Free-space section bins:
>         Total # of sections: 0
> File space management strategy: H5F_FILE_SPACE_ALL
> Summary of file space information:
>   File metadata: 4355 bytes
>   Raw data: 16356758312 bytes
>   Amount/Percent of tracked free space: 0 bytes/0.0%
>   Unaccounted space: 6216 bytes
> Total space: 16356768883 bytes
>
>
>
> On Sat, May 2, 2020 at 6:28 PM +0200, "Davide Sangalli" <
> davide.sangalli@xxxxxx> wrote:
>
> h5dump -Hp ndb.BS_COMPRESS0.005000_Q1
>> HDF5 "ndb.BS_COMPRESS0.005000_Q1" {
>> GROUP "/" {
>>    ATTRIBUTE "_NCProperties" {
>>       DATATYPE  H5T_STRING {
>>          STRSIZE 57;
>>          STRPAD H5T_STR_NULLTERM;
>>          CSET H5T_CSET_ASCII;
>>          CTYPE H5T_C_S1;
>>       }
>>       DATASPACE  SCALAR
>>    }
>>    DATASET "BSE_RESONANT_COMPRESSED1" {
>>       DATATYPE  H5T_IEEE_F32LE
>>       DATASPACE  SIMPLE { ( 24776792, 2 ) / ( 24776792, 2 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 198214336
>>          OFFSET 16158554547
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  9.96921e+36
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "DIMENSION_LIST" {
>>          DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
>>          DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
>>       }
>>    }
>>    DATASET "BSE_RESONANT_COMPRESSED1_DONE" {
>>       DATATYPE  H5T_STRING {
>>          STRSIZE 1;
>>          STRPAD H5T_STR_NULLTERM;
>>          CSET H5T_CSET_UTF8;
>>          CTYPE H5T_C_S1;
>>       }
>>       DATASPACE  SIMPLE { ( 2025000000 ) / ( 2025000000 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 2025000000
>>          OFFSET 8100002379
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  ""
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "DIMENSION_LIST" {
>>          DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>>    DATASET "BSE_RESONANT_COMPRESSED2_DONE" {
>>       DATATYPE  H5T_STRING {
>>          STRSIZE 1;
>>          STRPAD H5T_STR_NULLTERM;
>>          CSET H5T_CSET_UTF8;
>>          CTYPE H5T_C_S1;
>>       }
>>       DATASPACE  SIMPLE { ( 2025000000 ) / ( 2025000000 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 2025000000
>>          OFFSET 10125006475
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  ""
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "DIMENSION_LIST" {
>>          DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>>    DATASET "BSE_RESONANT_COMPRESSED3_DONE" {
>>       DATATYPE  H5T_STRING {
>>          STRSIZE 1;
>>          STRPAD H5T_STR_NULLTERM;
>>          CSET H5T_CSET_UTF8;
>>          CTYPE H5T_C_S1;
>>       }
>>       DATASPACE  SIMPLE { ( 781887360 ) / ( 781887360 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 781887360
>>          OFFSET 15277557963
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  ""
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "DIMENSION_LIST" {
>>          DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>>    DATASET "BS_K_compressed1" {
>>       DATATYPE  H5T_IEEE_F32BE
>>       DATASPACE  SIMPLE { ( 24776792 ) / ( 24776792 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 99107168
>>          OFFSET 16059447379
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  H5D_FILL_VALUE_DEFAULT
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "CLASS" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 16;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "NAME" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 64;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "REFERENCE_LIST" {
>>          DATATYPE  H5T_COMPOUND {
>>             H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
>>             H5T_STD_I32LE "dimension";
>>          }
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>>    DATASET "BS_K_linearized1" {
>>       DATATYPE  H5T_IEEE_F32BE
>>       DATASPACE  SIMPLE { ( 2025000000 ) / ( 2025000000 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 8100000000
>>          OFFSET 2379
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  H5D_FILL_VALUE_DEFAULT
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "CLASS" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 16;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "NAME" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 64;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "REFERENCE_LIST" {
>>          DATATYPE  H5T_COMPOUND {
>>             H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
>>             H5T_STD_I32LE "dimension";
>>          }
>>          DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
>>       }
>>    }
>>    DATASET "BS_K_linearized2" {
>>       DATATYPE  H5T_IEEE_F32BE
>>       DATASPACE  SIMPLE { ( 781887360 ) / ( 781887360 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 3127549440
>>          OFFSET 12150006475
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  H5D_FILL_VALUE_DEFAULT
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "CLASS" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 16;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "NAME" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 64;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "REFERENCE_LIST" {
>>          DATATYPE  H5T_COMPOUND {
>>             H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
>>             H5T_STD_I32LE "dimension";
>>          }
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>>    DATASET "complex" {
>>       DATATYPE  H5T_IEEE_F32BE
>>       DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
>>       STORAGE_LAYOUT {
>>          CONTIGUOUS
>>          SIZE 8
>>          OFFSET 16059447371
>>       }
>>       FILTERS {
>>          NONE
>>       }
>>       FILLVALUE {
>>          FILL_TIME H5D_FILL_TIME_IFSET
>>          VALUE  H5D_FILL_VALUE_DEFAULT
>>       }
>>       ALLOCATION_TIME {
>>          H5D_ALLOC_TIME_EARLY
>>       }
>>       ATTRIBUTE "CLASS" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 16;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "NAME" {
>>          DATATYPE  H5T_STRING {
>>             STRSIZE 64;
>>             STRPAD H5T_STR_NULLTERM;
>>             CSET H5T_CSET_ASCII;
>>             CTYPE H5T_C_S1;
>>          }
>>          DATASPACE  SCALAR
>>       }
>>       ATTRIBUTE "REFERENCE_LIST" {
>>          DATATYPE  H5T_COMPOUND {
>>             H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
>>             H5T_STD_I32LE "dimension";
>>          }
>>          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
>>       }
>>    }
>> }
>> }
>>
>>
>>
>> On Sat, May 2, 2020 at 5:55 PM +0200, "Wei-Keng Liao" <
>> wkliao@xxxxxxxxxxxxxxxx> wrote:
>>
>> For HDF5 files, command “h5dump -Hp ndb.BS_COMPRESS0.005000_Q1” shows
>>> the data chunk settings used by all datasets in the file.
>>>
>>> Command “h5stat -Ss ndb.BS_COMPRESS0.005000_Q1” shows information about
>>> free space, metadata, raw data, etc.
>>>
>>> They may reveal why your file is abnormally big.
>>> Most likely it is the chunk setting you used.
>>>
>>> Wei-keng
>>>
>>> > On May 1, 2020, at 6:40 PM, Davide Sangalli  wrote:
>>> >
>>> > I also add
>>> >
>>> > ncvalidator ndb.BS_COMPRESS0.005000_Q1
>>> > Error: Unknow file signature
>>> >     Expecting "CDF1", "CDF2", or "CDF5", but got "\211HDF"
>>> > File "ndb.BS_COMPRESS0.005000_Q1" fails to conform with CDF file format 
>>> > specifications
>>> >
>>> > Best,
>>> > D.
>>> >
>>> > On 02/05/20 01:26, Davide Sangalli wrote:
>>> >> Output of ncdump -hs
>>> >>
>>> >> D.
>>> >>
>>> >> ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
>>> >>
>>> >> netcdf ndb.BS_COMPRESS0 {
>>> >> dimensions:
>>> >>         BS_K_linearized1 = 2025000000 ;
>>> >>         BS_K_linearized2 = 781887360 ;
>>> >>         complex = 2 ;
>>> >>         BS_K_compressed1 = 24776792 ;
>>> >> variables:
>>> >>         char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>> >>                 BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
>>> >>         char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>> >>                 BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
>>> >>         char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>> >>                 BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
>>> >>         float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
>>> >>                 BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
>>> >>                 BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
>>> >> // global attributes:
>>> >>                 :_NCProperties = 
>>> >> "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
>>> >>                 :_SuperblockVersion = 0 ;
>>> >>                 :_IsNetcdf4 = 1 ;
>>> >>                 :_Format = "netCDF-4" ;
>>> >>
>>> >>
>>> >>
>>> >> On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate"  
>>> >> wrote:
>>> >>
>>> >> I agree that you should expect the file size to be about 1 byte per 
>>> >> stored character.  IMO the most likely explanation is that you have a 
>>> >> netcdf-4 file with inappropriately small chunk size.  Another 
>>> >> possibility is a 64-bit offset file with crazy huge padding between file 
>>> >> sections.  This is very unlikely, but I do not know what is inside
>>> >> your writer code.
>>> >>
>>> >> Diagnose, please.  Ncdump -hs.  If it is 64-bit offset, I think 
>>> >> ncvalidator can display the hidden pad sizes.
>>> >>
>>> >>
>>> >> On Fri, May 1, 2020 at 3:37 PM Davide Sangalli  wrote:
>>> >> Dear all,
>>> >> I'm a developer of a Fortran code that uses netCDF for I/O.
>>> >>
>>> >> In one of my runs I created a file with some huge array of characters.
>>> >> The header of the file is the following:
>>> >> netcdf ndb.BS_COMPRESS0 {
>>> >> dimensions:
>>> >>     BS_K_linearized1 = 2025000000 ;
>>> >>     BS_K_linearized2 = 781887360 ;
>>> >> variables:
>>> >>     char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>> >>     char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>> >>     char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>> >> }
>>> >>
>>> >> The variable is declared as nf90_char which, according to the 
>>> >> documentation should be 1 byte per element.
>>> >> Thus I would expect the total size of the file to be 1 
>>> >> byte*(2*2025000000+781887360) ~ 4.5 GB
>>> >> Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB 
>>> >> more and a factor 3.33 bigger
>>> >>
>>> >> This happens consistently if I consider the file
>>> >> netcdf ndb {
>>> >> dimensions:
>>> >>     complex = 2 ;
>>> >>     BS_K_linearized1 = 2025000000 ;
>>> >>     BS_K_linearized2 = 781887360 ;
>>> >> variables:
>>> >>     float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
>>> >>     char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
>>> >>     float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
>>> >>     char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
>>> >>     float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
>>> >>     char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
>>> >> }
>>> >> The float component should weigh ~36 GB while the char component should
>>> >> be identical to before, i.e. 4.5 GB, for a total of 40.5 GB.
>>> >> The file is instead ~50.96 GB, i.e. again 10.46 GB bigger than
>>> >> expected.
>>> >>
>>> >> Why ?
>>> >>
>>> >> My character variables are something like
>>> >> "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."
>>> >> but the file size is already like that just after the file creation, 
>>> >> i.e. before filling it.
>>> >>
>>> >> Some information about the library, compiled against HDF5
>>> >> (hdf5-1.8.18) with parallel I/O support:
>>> >> Name: netcdf
>>> >> Description: NetCDF Client Library for C
>>> >> URL: http://www.unidata.ucar.edu/netcdf
>>> >> Version: 4.4.1.1
>>> >> Libs: -L${libdir}  -lnetcdf -ldl -lm 
>>> >> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a
>>> >>  
>>> >> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a
>>> >>  
>>> >> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a
>>> >>  
>>> >> /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a
>>> >>  -lz -lm -ldl -lcurl
>>> >> Cflags: -I${includedir}
>>> >>
>>> >> Name: netcdf-fortran
>>> >> Description: NetCDF Client Library for Fortran
>>> >> URL: http://www.unidata.ucar.edu/netcdf
>>> >> Version: 4.4.4
>>> >> Requires.private: netcdf > 4.1.1
>>> >> Libs: -L${libdir} -lnetcdff
>>> >> Libs.private: -L${libdir} -lnetcdff -lnetcdf
>>> >> Cflags: -I${includedir}
>>> >>
>>> >> Best,
>>> >> D.
>>> >> --
>>> >> Davide Sangalli, PhD
>>> >> CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX 
>>> >> Centre
>>> >> Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy
>>> >> http://www.ism.cnr.it/en/davide-sangalli-cv/
>>> >> http://www.max-centre.eu/
>>>
>>>