Wei-keng, it cannot be the chunk settings, because the storage type of all arrays is contiguous, not chunked. See the ncdump -hs output below. Therefore this file does not contain any chunks in the formal HDF5 sense.

Thank you for the information about extracting storage details with h5dump and h5stat. I was unfamiliar with how to get these details for HDF5/netCDF-4.

On Sat, May 2, 2020 at 9:55 AM Wei-Keng Liao <wkliao@xxxxxxxxxxxxxxxx> wrote:
> For HDF5 files, the command "h5dump -Hp ndb.BS_COMPRESS0.005000_Q1" shows
> the data chunk settings used by all datasets in the file.
>
> The command "h5stat -Ss ndb.BS_COMPRESS0.005000_Q1" shows information about
> free space, metadata, raw data, etc.
>
> They may reveal why your file is abnormally big.
> Most likely it is the chunk setting you used.
>
> Wei-keng
>
>> On May 1, 2020, at 6:40 PM, Davide Sangalli <davide.sangalli@xxxxxx> wrote:
>>
>> I also add:
>>
>> ncvalidator ndb.BS_COMPRESS0.005000_Q1
>> Error: Unknow file signature
>> Expecting "CDF1", "CDF2", or "CDF5", but got "�HDF"
>> File "ndb.BS_COMPRESS0.005000_Q1" fails to conform with CDF file format specifications
>>
>> Best,
>> D.
>>
>> On 02/05/20 01:26, Davide Sangalli wrote:
>>> Output of ncdump -hs
>>>
>>> D.
>>>
>>> ncdump -hs BSK_2-5B_X59RL-50B_SP_bse-io/ndb.BS_COMPRESS0.005000_Q1
>>>
>>> netcdf ndb.BS_COMPRESS0 {
>>> dimensions:
>>>         BS_K_linearized1 = 2025000000 ;
>>>         BS_K_linearized2 = 781887360 ;
>>>         complex = 2 ;
>>>         BS_K_compressed1 = 24776792 ;
>>> variables:
>>>         char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>>                 BSE_RESONANT_COMPRESSED1_DONE:_Storage = "contiguous" ;
>>>         char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>>                 BSE_RESONANT_COMPRESSED2_DONE:_Storage = "contiguous" ;
>>>         char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>>                 BSE_RESONANT_COMPRESSED3_DONE:_Storage = "contiguous" ;
>>>         float BSE_RESONANT_COMPRESSED1(BS_K_compressed1, complex) ;
>>>                 BSE_RESONANT_COMPRESSED1:_Storage = "contiguous" ;
>>>                 BSE_RESONANT_COMPRESSED1:_Endianness = "little" ;
>>>
>>> // global attributes:
>>>                 :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
>>>                 :_SuperblockVersion = 0 ;
>>>                 :_IsNetcdf4 = 1 ;
>>>                 :_Format = "netCDF-4" ;
>>> }
>>>
>>> On Sat, May 2, 2020 at 12:24 AM +0200, "Dave Allured - NOAA Affiliate" <dave.allured@xxxxxxxx> wrote:
>>>
>>> I agree that you should expect the file size to be about 1 byte per stored character. IMO the most likely explanation is that you have a netCDF-4 file with an inappropriately small chunk size. Another possibility is a 64-bit offset file with crazy huge padding between file sections. This is very unlikely, but I do not know what is inside your writer code.
>>>
>>> Diagnose, please: ncdump -hs. If it is 64-bit offset, I think ncvalidator can display the hidden pad sizes.
>>>
>>> On Fri, May 1, 2020 at 3:37 PM Davide Sangalli <davide.sangalli@xxxxxx> wrote:
>>>
>>> Dear all,
>>> I'm a developer of a Fortran code which uses netCDF for I/O.
>>>
>>> In one of my runs I created a file with some huge arrays of characters.
>>> The header of the file is the following:
>>>
>>> netcdf ndb.BS_COMPRESS0 {
>>> dimensions:
>>>         BS_K_linearized1 = 2025000000 ;
>>>         BS_K_linearized2 = 781887360 ;
>>> variables:
>>>         char BSE_RESONANT_COMPRESSED1_DONE(BS_K_linearized1) ;
>>>         char BSE_RESONANT_COMPRESSED2_DONE(BS_K_linearized1) ;
>>>         char BSE_RESONANT_COMPRESSED3_DONE(BS_K_linearized2) ;
>>> }
>>>
>>> The variables are declared as nf90_char which, according to the documentation, should be 1 byte per element.
>>> Thus I would expect the total size of the file to be about 1 byte * (2*2025000000 + 781887360) ~ 4.5 GB.
>>> Instead the file size is 16059445323 bytes ~ 14.96 GB, i.e. 10.46 GB more and a factor of 3.33 bigger.
>>>
>>> This happens consistently if I consider the file
>>>
>>> netcdf ndb {
>>> dimensions:
>>>         complex = 2 ;
>>>         BS_K_linearized1 = 2025000000 ;
>>>         BS_K_linearized2 = 781887360 ;
>>> variables:
>>>         float BSE_RESONANT_LINEARIZED1(BS_K_linearized1, complex) ;
>>>         char BSE_RESONANT_LINEARIZED1_DONE(BS_K_linearized1) ;
>>>         float BSE_RESONANT_LINEARIZED2(BS_K_linearized1, complex) ;
>>>         char BSE_RESONANT_LINEARIZED2_DONE(BS_K_linearized1) ;
>>>         float BSE_RESONANT_LINEARIZED3(BS_K_linearized2, complex) ;
>>>         char BSE_RESONANT_LINEARIZED3_DONE(BS_K_linearized2) ;
>>> }
>>>
>>> The float component should weigh ~36 GB while the char component should be identical to before, i.e. 4.5 GB, for a total of 40.5 GB.
>>> The file is instead ~50.96 GB, i.e. again 10.46 GB bigger than expected.
>>>
>>> Why?
>>>
>>> My character variables are something like
>>> "tnnnntnnnntnnnnnnnntnnnnnttnnnnnnnnnnnnnnnnt..."
>>> but the file size is already like that just after the file creation, i.e. before filling it.
>>>
>>> Some info about the library, compiled linking to HDF5 (hdf5-1.8.18), with parallel I/O support:
>>>
>>> Name: netcdf
>>> Description: NetCDF Client Library for C
>>> URL: http://www.unidata.ucar.edu/netcdf
>>> Version: 4.4.1.1
>>> Libs: -L${libdir} -lnetcdf -ldl -lm /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5hl_fortran.a /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_fortran.a /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5_hl.a /nfs/data/bin/Yambo/gcc-8.1.0/openmpi-3.1.0/yambo_ext_libs/gfortran/mpifort/v4/parallel/lib/libhdf5.a -lz -lm -ldl -lcurl
>>> Cflags: -I${includedir}
>>>
>>> Name: netcdf-fortran
>>> Description: NetCDF Client Library for Fortran
>>> URL: http://www.unidata.ucar.edu/netcdf
>>> Version: 4.4.4
>>> Requires.private: netcdf > 4.1.1
>>> Libs: -L${libdir} -lnetcdff
>>> Libs.private: -L${libdir} -lnetcdff -lnetcdf
>>> Cflags: -I${includedir}
>>>
>>> Best,
>>> D.
>>> --
>>> Davide Sangalli, PhD
>>> CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
>>> Area della Ricerca di Roma 1, 00016 Monterotondo Scalo, Italy
>>> http://www.ism.cnr.it/en/davide-sangalli-cv/
>>> http://www.max-centre.eu/
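For reference, a minimal netCDF-Fortran sketch of the kind of definition being discussed. This is not the poster's Yambo writer code: the file name, the single variable, the omitted error checking, and the example chunk size are all illustrative assumptions. It only shows how a large nf90_char variable is defined and how the storage layout that "ncdump -hs" reported could be controlled explicitly with nf90_def_var_chunking.

  program define_char_vars
    use netcdf
    implicit none
    integer :: ncid, dim1, dim2, varid, ierr

    ! Create a netCDF-4 (HDF5-based) file, as in the thread.
    ! (Return codes are ignored here for brevity.)
    ierr = nf90_create("ndb_example.nc", NF90_NETCDF4, ncid)

    ! Dimension sizes copied from the ncdump headers above.
    ierr = nf90_def_dim(ncid, "BS_K_linearized1", 2025000000, dim1)
    ierr = nf90_def_dim(ncid, "BS_K_linearized2", 781887360, dim2)

    ! nf90_char stores 1 byte per element, so this variable should
    ! contribute roughly 2025000000 bytes of raw data.
    ierr = nf90_def_var(ncid, "BSE_RESONANT_COMPRESSED1_DONE", NF90_CHAR, &
                        (/ dim1 /), varid)

    ! A fixed-size netCDF-4 variable defined this way normally gets
    ! contiguous storage (the layout reported by "ncdump -hs" above).
    ! Chunked storage with an explicit chunk size can be requested
    ! instead, e.g. 1 MiB chunks along the single dimension (the size
    ! here is an arbitrary example, not a recommendation):
    ierr = nf90_def_var_chunking(ncid, varid, NF90_CHUNKED, (/ 1048576 /))

    ierr = nf90_enddef(ncid)
    ierr = nf90_close(ncid)
  end program define_char_vars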