[netcdfgroup] some advice on setting chunk sizes for netCDF-4 data...

To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: [netcdfgroup] some advice on setting chunk sizes for netCDF-4 data...
From: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
Date: Tue, 08 Dec 2009 13:54:52 -0700

Howdy all!

Here in (normally) sunny Boulder, Colorado, we have been having some
very cold weather. As we huddled around the iron stove in the rough-hewn
log cabin that houses the netCDF programming team (wishing we had more
coal for our fire), we fell to talking about how to set chunk sizes for
netCDF-4/HDF5 data.

Good chunk sizes depend on how the data will be read, but they must be
chosen when the data are written.

For those out there who are also interested in increasing performance
with good chunk sizes in netCDF-4/HDF5 files, I can offer some
information.

New Documentation:
------------------

I have added a section on chunking to the NetCDF Users Guide. The latest
version can be found here:
http://www.unidata.ucar.edu/software/netcdf/docs_snapshot/netcdf.html#Chunking

Use the Chunk Cache:
--------------------

Chunked data are read through the chunk cache, so its size matters for
performance. The default is 1 MB for netCDF-4.0, and it was increased to
32 MB for netCDF-4.0.1. (The chunk cache can also be set at run-time
with the nc_set_chunk_cache function; the default can be set at
configure time.)

You must set the chunk cache to be larger than one chunk, obviously. How
much larger depends on your access pattern. Note that this is the one
aspect of chunking that can be controlled by the data reader. 
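
To make "how much larger" a bit more concrete, here is a rough
back-of-the-envelope sketch in plain Python (no netCDF library is
involved). The helper names, the example shapes, and the assumption that
a read starts on a chunk boundary are mine and are not part of netCDF.
It only counts the chunks a read touches and the bytes needed to keep
all of them in the cache at once, which matters when the same chunks
will be revisited.

```python
import math

def chunk_bytes(chunk_shape, value_size):
    """Uncompressed size in bytes of one chunk (value_size = bytes per value)."""
    return math.prod(chunk_shape) * value_size

def chunks_touched(read_shape, chunk_shape):
    """Number of chunks overlapped by a read that starts on a chunk boundary."""
    return math.prod(math.ceil(r / c) for r, c in zip(read_shape, chunk_shape))

def cache_bytes_to_hold_read(read_shape, chunk_shape, value_size):
    """Bytes needed to keep every chunk touched by the read resident at once."""
    return chunks_touched(read_shape, chunk_shape) * chunk_bytes(chunk_shape, value_size)
```

For a hypothetical (time, lat, lon) variable of 4-byte floats chunked as
(1, 180, 360), one chunk is 259200 bytes. Reading one record touches a
single chunk, but reading a 1000-step time series at one grid point
touches 1000 chunks, and holding them all would take 259200000 bytes,
far more than either default cache size mentioned above.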

Test Performance with the bm_file Program:
------------------------------------------

There is a program called "bm_file" which comes with the netCDF
distribution (you must configure with --enable-benchmarks), and can be
used to test different chunk/deflation/shuffle settings (with or without
parallel I/O) to guide your selections. It is described in the new
section of the manual.

Default NetCDF-4 Chunking:
--------------------------

By default, netCDF-4 assigns a chunk size of 1 along unlimited
dimensions, and a chunk size matching the full dimension length along
fixed dimensions, unless those fixed dimensions are very large. This
works well for small data sets, or data sets which will be read one
"record" at a time.

A complete discussion of the default chunking is in the Users Guide. I
am certainly very open to suggestions as to better default chunk size
choices.
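
As an illustration only, the default rule described in this section can
be sketched in a few lines of plain Python. The function name is
hypothetical, and the sketch deliberately leaves out the exception for
very large fixed dimensions, since the threshold is not given in this
message; see the Users Guide for the real behavior.

```python
def default_chunk_shape(dim_lengths, unlimited):
    """Chunk shape under the simple rule: 1 along each unlimited dimension,
    the full length along each fixed dimension.

    Assumption: the special handling of very large fixed dimensions is
    NOT modeled here.
    """
    return tuple(
        1 if is_unlimited else length
        for length, is_unlimited in zip(dim_lengths, unlimited)
    )
```

For example, a (time, lat, lon) variable with an unlimited time
dimension and fixed lengths 180 and 360 would get a chunk shape of
(1, 180, 360), that is, one "record" per chunk.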

Thanks!

Ed

-- 
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
