Quincey Koziol <koziol@xxxxxxxxxxxx> writes:
> I do think it's better to force the user to give you a chunk
> size. Definitely _don't_ use a chunk size of one, the B-tree to
> locate the chunks will be insanely huge. :-(
The user may specify a chunk size in netCDF-4. With a 1 MB chunk size
it's a whole lot faster: now the test takes less than a second. The
output file is also only 4 MB. Is that expected? I presume it's because
no more than 1 MB gets written for each of the 4 variables. Neat!
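(If that's right, the arithmetic roughly fits: 4 variables, one ~1 MB
chunk each, is about 4 MB of data, and the 4208887-byte file shown
below is that plus a bit of HDF5 overhead.)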
Here's the netCDF code to do the chunking. (Note the
nc_def_var_chunking call after the nc_def_var call.)
   chunksize[0] = MEGABYTE / DOUBLE_SIZE;
   for (i = 0; i < NUMVARS; i++)
   {
      if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS,
                     dimids, &varid[i])) ERR;
      if (nc_def_var_chunking(ncid, i, NULL, chunksize, NULL)) ERR;
   }
   if (nc_enddef(ncid)) ERR;
   for (i = 0; i < NUMVARS; i++)
      if (nc_put_var1_double(ncid, i, index, &pi)) ERR;
bash-3.2$ time ./tst_large
*** Testing really large files in netCDF-4/HDF5 format, quickly.
*** Testing create of simple, but large, file...ok.
*** Tests successful!
real 0m0.042s
user 0m0.014s
sys 0m0.028s
bash-3.2$ ls -l tst_large.nc
-rw-r--r-- 1 ed ustaff 4208887 2007-08-21 13:52 tst_large.nc
> However, if you are going to attempt to create a heuristic for
> picking a chunk size, here's my best current thoughts on it: try to
> get a chunk of a reasonable size (1MB, say) (but make certain that it
> will contain at least one element, in the case of _really_ big
> compound datatypes :-), then try to make the chunk as "square" as
> possible (i.e. try to get the chunk size in all dimensions to be
> equal). That should give you something reasonable, at least... ;-)
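Here's a rough sketch of how I might code that heuristic. The function
name pick_chunk_sizes and its parameters are just for illustration,
nothing in the library, and it doesn't try to redistribute the budget
when one dimension is much smaller than the others:

   #include <math.h>
   #include <stddef.h>

   #define TARGET_CHUNK_BYTES (1024 * 1024)

   /* Aim for a ~1 MB chunk, as "square" as possible, with at least one
    * element per dimension and no more than the dimension length. */
   static void
   pick_chunk_sizes(int ndims, const size_t *dimlen, size_t type_size,
                    size_t *chunksizes)
   {
      size_t target_vals = TARGET_CHUNK_BYTES / type_size;
      double per_dim;
      int d;

      if (target_vals < 1)
         target_vals = 1;   /* really big compound types: keep one element */

      /* "Square" chunk: ndims-th root of the target number of values. */
      per_dim = pow((double)target_vals, 1.0 / ndims);

      for (d = 0; d < ndims; d++)
      {
         size_t c = (size_t)per_dim;
         if (c < 1)
            c = 1;
         if (c > dimlen[d])
            c = dimlen[d];   /* never exceed the dimension itself */
         chunksizes[d] = c;
      }
   }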
Thanks!
Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx