Re: HDF5 chunking questions...

At 01:17 PM 12/16/2003, John Caron wrote:

perhaps we should have 3 modes of chunking that the user can choose from:

1) preserve the record-oriented nature of our current unlimited dimension to optimize sequential reading of the array.
2) choose an optimal chunk size (some small multiple of disk block size: 8K, 16K, 32K?) and subdivide the dimensions evenly to optimize over all types of subsetting.
3) full user spec of chunk size and chunk dimension size.
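
As a rough illustration only, the three modes above could be sketched as three ways of computing a chunk shape. Nothing here comes from the netCDF or HDF5 libraries: the function names, the 16K default target, and the "halve the largest dimension" rule for mode 2 are all assumptions made for this sketch, and "subdivide the dimensions evenly" could reasonably be read other ways.

```python
from math import prod


def record_chunks(shape):
    """Mode 1 (sketch): one record per chunk along the leading,
    unlimited dimension; all other dimensions are kept whole."""
    return (1,) + tuple(max(1, n) for n in shape[1:])


def even_chunks(shape, itemsize, target_bytes=16 * 1024):
    """Mode 2 (sketch): repeatedly halve the largest chunk dimension
    until the chunk fits in target_bytes. The halving rule is an
    assumption, not something specified in the thread."""
    # Treat a still-empty unlimited dimension (length 0) as length 1.
    chunks = [max(1, n) for n in shape]
    while prod(chunks) * itemsize > target_bytes:
        i = max(range(len(chunks)), key=lambda k: chunks[k])
        if chunks[i] == 1:
            break  # cannot subdivide any further
        chunks[i] = (chunks[i] + 1) // 2
    return tuple(chunks)


def user_chunks(shape, chunks):
    """Mode 3 (sketch): accept the user's chunk shape after checking
    that it has the right rank and positive extents."""
    if len(chunks) != len(shape) or any(c < 1 for c in chunks):
        raise ValueError("chunk shape must match rank and be positive")
    return tuple(chunks)
```

For example, `record_chunks((0, 180, 360))` gives one 180 x 360 record per chunk, while `even_chunks((100, 180, 360), itemsize=4)` gives a roughly cubical chunk of at most 16K bytes.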

Nice.  We'd want to do some experimentation, but this looks good.

Then, somebody mentioned having a "chunk performance benchmark suite"; people could use this to test those options on their particular system.

Mike


--
Mike Folk, Scientific Data Tech (HDF)   http://hdf.ncsa.uiuc.edu
NCSA/U of Illinois at Urbana-Champaign          217-244-0647 voice
605 E. Springfield Ave., Champaign IL 61820 217-244-1987 fax
Date: 16 Dec 2003 14:05:18 -0700
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
To: netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: another chunking question - when should chunking NOT be used at all?
Organization: UCAR/Unidata

Howdy all!

Another question relating to chunking - if we don't need it (i.e. for
a dataset with no unlimited dimensions), do we still chunk it?

Or is it better to leave it contiguous?

(With the mental reservation that only chunked datasets will be able
to take advantage of compression, when we get to that feature.)
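
The two constraints in play here can be written down as a small rule. This is only a sketch: `choose_layout` and its arguments are hypothetical names, and it encodes just the hard requirements (HDF5 needs a chunked layout for extendible dimensions and for compression filters). It does not settle the open question of whether chunking a fixed-size dataset is otherwise worthwhile.

```python
def choose_layout(has_unlimited_dim, wants_compression):
    """Return "chunked" when HDF5 requires it, else "contiguous".

    Hypothetical helper for illustration; the contiguous default for
    the remaining case is an assumption, not a recommendation.
    """
    if has_unlimited_dim or wants_compression:
        return "chunked"
    return "contiguous"
```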

Thanks!

Ed