Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?


  • To: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdf-hdf] a question about HDF5 and large file - why so long to write one value?
  • From: ymuqun@xxxxxxxxxxxx
  • Date: Mon, 20 Aug 2007 08:53:26 -0500
Hi Ed,

I don't think HDF5 writes only the last value, because you are asking HDF5 to
create a dataset of that size. It will write 17179869152 bytes plus overhead to
disk, so depending on your system it may take minutes.

Quincey may be able to give you a more technical explanation. I don't know
whether chunking will help you much here. However, I think this is a good case
for applying a compression filter, since the data will compress very well and
that should more than make up for the I/O time.

Kent
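
To illustrate Kent's suggestion, here is a minimal sketch (not from the original
test program; chunk length and deflate level are arbitrary illustrative choices)
of how chunked storage and a deflate filter could be requested through the
netCDF-4 C API before nc_enddef():

    /* Sketch only: define a 1-D double variable with chunked storage and
       deflate compression.  CHUNK_LEN is an arbitrary illustrative value. */
    #include <netcdf.h>

    #define CHUNK_LEN 1048576

    static int define_compressed_var(int ncid, const char *name, int dimid,
                                     int *varidp)
    {
       size_t chunklen[1] = {CHUNK_LEN};
       int status;

       if ((status = nc_def_var(ncid, name, NC_DOUBLE, 1, &dimid, varidp)))
          return status;
       /* Switch from contiguous to chunked storage. */
       if ((status = nc_def_var_chunking(ncid, *varidp, NC_CHUNKED, chunklen)))
          return status;
       /* No shuffle filter, deflate enabled, compression level 1. */
       return nc_def_var_deflate(ncid, *varidp, 0, 1, 1);
    }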

Quoting Ed Hartnett <ed@xxxxxxxxxxxxxxxx>:

> Howdy all!
> 
> I am writing a test program which writes large files (well over 2
> GB). I have some questions about HDF5 and very large files. I need to
> check out whether netCDF-4 has been correctly implemented for best
> performance.
> 
> In the program below, I create 4 datasets of type double. They are
> one-dimensional, with length 2147483644/4 elements each. (That is
> 17179869152 bytes of data in total.)
> 
> Then I write the last value only in each dataset.
> 
> Took a really long time - minutes. Is this expected? What is HDF5
> doing in the background here? Is there something I can do with
> chunking here to improve the speed of this program?
> 
> I am not setting a fill value, so what is being written here? I
> naively expected that HDF5 would not write all the data I am skipping,
> but would find a way to write data only around the value that I am
> actually writing...
> 
> The file that this program creates is 17179883735 bytes, which is
> 14583 bytes of HDF5 overhead. Is that about what is expected?
> 
> Any comments welcome...
> 
> Thanks,
> 
> Ed
> 
> /*
>  Copyright 2007, UCAR/Unidata
>  See COPYRIGHT file for copying and redistribution conditions.
> 
>  This program (quickly, but not thoroughly) tests the large file
>  features of netCDF-4.
> 
>  $Id: tst_large.c,v 1.3 2007/08/18 12:26:38 ed Exp $
> */
> #include <config.h>
> #include <nc_tests.h>
> #include <netcdf.h>
> #include <stdio.h>
> #include <string.h>
> 
> /* This is the magic number for classic format limits: 2 GiB - 4
>    bytes. */
> #define MAX_CLASSIC_BYTES 2147483644
> 
> /* This is the magic number for 64-bit offset format limits: 4 GiB - 4
>    bytes. */
> #define MAX_64OFFSET_BYTES 4294967292
> 
> /* Handy for constructing tests. */
> #define QTR_CLASSIC_MAX (MAX_CLASSIC_BYTES/4)
> 
> /* We will create this file. */
> #define FILE_NAME "tst_large.nc"
> 
> int
> main(int argc, char **argv)
> {
> 
>     printf("\n*** Testing really large files in netCDF-4/HDF5 format, quickly.\n");
> 
>     printf("\n*** Testing create of simple, but large, file...");
>     {
> #define DIM_NAME "Time_in_nanoseconds"
> #define NUMDIMS 1
> #define NUMVARS 4
> 
>        int ncid, dimids[NUMDIMS], varid[NUMVARS];
>        char var_name[NUMVARS][NC_MAX_NAME + 1] = {"England", "Scotland",
>                                                    "Ireland", "Wales"};
>        size_t index[2] = {QTR_CLASSIC_MAX-1, 0};
>        int ndims, nvars, natts, unlimdimid;
>        nc_type xtype;
>        char name_in[NC_MAX_NAME + 1];
>        size_t len;
>        double pi = 3.1459, pi_in;
>        int i; 
> 
>        /* Create a netCDF netCDF-4/HDF5 format file, with 4 vars. */
>        if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
>        if (nc_set_fill(ncid, NC_NOFILL, NULL)) ERR;
>        if (nc_def_dim(ncid, DIM_NAME, QTR_CLASSIC_MAX, dimids)) ERR;
>        for (i = 0; i < NUMVARS; i++)
>        {
>         if (nc_def_var(ncid, var_name[i], NC_DOUBLE, NUMDIMS, 
>                        dimids, &varid[i])) ERR;
>        }
>        if (nc_enddef(ncid)) ERR;
>        for (i = 0; i < NUMVARS; i++)
>         if (nc_put_var1_double(ncid, i, index, &pi)) ERR;
>        if (nc_close(ncid)) ERR;
>        
>        /* Reopen and check the file. */
>        if (nc_open(FILE_NAME, 0, &ncid)) ERR;
>        if (nc_inq(ncid, &ndims, &nvars, &natts, &unlimdimid)) ERR;
>        if (ndims != NUMDIMS || nvars != NUMVARS || natts != 0 ||
>            unlimdimid != -1) ERR;
>        if (nc_inq_dimids(ncid, &ndims, dimids, 1)) ERR;
>        if (ndims != 1 || dimids[0] != 0) ERR;
>        if (nc_inq_dim(ncid, 0, name_in, &len)) ERR;
>        if (strcmp(name_in, DIM_NAME) || len != QTR_CLASSIC_MAX) ERR;
>        for (i = 0; i < NUMVARS; i++)
>        {
>         if (nc_inq_var(ncid, i, name_in, &xtype, &ndims, dimids, &natts)) ERR;
>         if (strcmp(name_in, var_name[i]) || xtype != NC_DOUBLE || ndims != 1 ||
>             dimids[0] != 0 || natts != 0) ERR;
>         if (nc_get_var1_double(ncid, i, index, &pi_in)) ERR;
>         if (pi_in != pi) ERR;
>        }
>        if (nc_close(ncid)) ERR;
>     }
> 
>     SUMMARIZE_ERR;
>     FINAL_RESULTS;
> }
> 
> 
> -- 
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
> 
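
As background for Ed's question about what gets written when no fill value is
set: at the HDF5 layer, fill writing and space allocation are controlled through
the dataset creation property list. A minimal sketch of the relevant calls
(standalone illustration, not taken from the test program above):

    /* Sketch only: build a dataset creation property list that never writes
       fill values and defers space allocation until data is first written.
       The returned id would then be passed to H5Dcreate(). */
    #include <hdf5.h>

    static hid_t make_nofill_dcpl(void)
    {
       hid_t dcpl_id = H5Pcreate(H5P_DATASET_CREATE);

       H5Pset_fill_time(dcpl_id, H5D_FILL_TIME_NEVER);   /* no fill values */
       H5Pset_alloc_time(dcpl_id, H5D_ALLOC_TIME_LATE);  /* allocate at first write */
       return dcpl_id;
    }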
