Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?

  • To: Quincey Koziol <koziol@xxxxxxxxxxxx>
  • Subject: Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?
  • From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
  • Date: Mon, 20 Aug 2007 16:02:01 -0600

Quincey Koziol <koziol@xxxxxxxxxxxx> writes:

>       The problem is in your computation of the chunk size for the
> dataset, in libsrc4/nc4hdf.c, around lines 1059-1084.  The current
> computations end up with a chunk of size equal to the dimension size
> (2147483644/4 in the code below), i.e. a single 4GB chunk for the
> entire dataset.  This is not going to work well, since HDF5 always
> reads an entire chunk into memory, updates it and then writes the
> entire chunk back out to disk. ;-)
>
>       That section of code looks like it has the beginning of some
> heuristics for automatically tuning the chunk size, but it would
> probably be better to let the application set a particular chunk
> size, if possible.
>
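
(For context, the "let the application set a particular chunk size" route mentioned above corresponds to calling H5Pset_chunk on a dataset creation property list before creating the dataset. Below is a minimal sketch using the HDF5 1.6-style API directly; the file name, dataset name, and the 1M-element chunk size are invented purely for illustration and are not what libsrc4/nc4hdf.c currently does.)

    /* Sketch: a huge 1-D integer dataset chunked in 1M-element pieces
     * instead of one chunk covering the whole extent. */
    #include <hdf5.h>

    int
    main(void)
    {
        hsize_t dims[1]  = {2147483644 / 4};  /* extent from the test case */
        hsize_t chunk[1] = {1048576};         /* 1M ints = 4 MiB per chunk */
        hid_t   file, space, dcpl, dset;

        file  = H5Fcreate("chunk_sketch.h5", H5F_ACC_TRUNC,
                          H5P_DEFAULT, H5P_DEFAULT);
        space = H5Screate_simple(1, dims, NULL);

        dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk);         /* application-chosen chunking */

        /* 1.6-style H5Dcreate; 1.8+ adds link/access property list args. */
        dset = H5Dcreate(file, "var", H5T_NATIVE_INT, space, dcpl);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }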

Ah ha! Well, that's not going to work!

What would be a good chunksize for this (admittedly weird) test case:
writing one value at a time to a huge array? Would a chunksize of one
be crazy? Or the right size?
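
(Rough scale, taking the ~4 GB figure quoted above and 4-byte values: that is on the order of a billion values, so a one-element chunk size would mean on the order of a billion chunks, each with its own chunk-index entry, while the current single chunk forces the whole multi-gigabyte chunk to be read, updated, and rewritten for every value stored.)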

Thanks!

Ed
-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx


