Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?

  • To: Quincey Koziol <koziol@xxxxxxxxxxxx>
  • Subject: Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?
  • From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
  • Date: Tue, 21 Aug 2007 14:00:30 -0600
Quincey Koziol <koziol@xxxxxxxxxxxx> writes:


>       I do think it's better to force the user to give you a chunk
> size.  Definitely _don't_ use a chunk size of one; the B-tree to
> locate the chunks will be insanely huge.  :-(

The user may specify a chunk size in netCDF-4. With a 1 MB chunk size,
wow, it's a whole lot faster! Now the write takes less than a second.

Also, the output file is only about 4 MB. Is that expected? I presume
that's because only one 1 MB chunk is written for each of the 4
variables, since each variable gets just a single value: 4 x 1 MB,
plus a little metadata. Neat!

Here's the netCDF code to do chunking. (Note the nc_def_var_chunking
call after the nc_def_var call.)

       /* Chunks hold 1 MB of doubles. */
       chunksize[0] = MEGABYTE/DOUBLE_SIZE;

       /* Define each variable, then set its chunk sizes. */
       for (i = 0; i < NUMVARS; i++)
       {
          if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS,
                         dimids, &varid[i])) ERR;
          if (nc_def_var_chunking(ncid, varid[i], NULL, chunksize, NULL)) ERR;
       }
       if (nc_enddef(ncid)) ERR;

       /* Write a single value to each variable. */
       for (i = 0; i < NUMVARS; i++)
          if (nc_put_var1_double(ncid, varid[i], index, &pi)) ERR;

bash-3.2$ time ./tst_large

*** Testing really large files in netCDF-4/HDF5 format, quickly.
*** Testing create of simple, but large, file...ok.
*** Tests successful!

real    0m0.042s
user    0m0.014s
sys     0m0.028s
bash-3.2$ ls -l tst_large.nc
-rw-r--r-- 1 ed ustaff 4208887 2007-08-21 13:52 tst_large.nc

>       However, if you are going to attempt to create a heuristic for
> picking a chunk size, here are my best current thoughts on it: try to
> get a chunk of a reasonable size (1 MB, say) (but make certain that it
> will contain at least one element, in the case of _really_ big
> compound datatypes :-), then try to make the chunk as "square" as
> possible (i.e. try to get the chunk size in all dimensions to be
> equal).  That should give you something reasonable, at least... ;-)
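
Here's a rough sketch of how that heuristic might look in C, just to
check my understanding; the function name pick_chunksizes and the
DEFAULT_CHUNK_BYTES constant are made up for illustration, not
anything in netCDF or HDF5:

#include <math.h>
#include <stddef.h>

#define DEFAULT_CHUNK_BYTES (1024 * 1024)

/* Sketch of the heuristic described above: aim for roughly 1 MB per
   chunk, never less than one element, and keep the chunk "square" by
   giving every dimension the same extent.  Hypothetical code, not
   part of any library. */
static void
pick_chunksizes(int ndims, size_t type_size, const size_t *dimlen,
                size_t *chunksize)
{
   size_t target_elems, edge;
   int d;

   if (ndims < 1)
      return;

   /* How many elements fit in ~1 MB?  At least one, even for huge
      compound types. */
   target_elems = DEFAULT_CHUNK_BYTES / type_size;
   if (target_elems < 1)
      target_elems = 1;

   /* "Square" chunk: the ndims-th root of the element count. */
   edge = (size_t)floor(pow((double)target_elems, 1.0 / ndims));
   if (edge < 1)
      edge = 1;

   for (d = 0; d < ndims; d++)
   {
      chunksize[d] = edge;
      /* Don't let a chunk exceed a fixed dimension's length. */
      if (dimlen[d] > 0 && chunksize[d] > dimlen[d])
         chunksize[d] = dimlen[d];
   }
}

For doubles that's 131072 elements per chunk, so a 3-D variable would
get chunks of about 50 x 50 x 50.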

Thanks!

Ed

-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx


