
Re: [netcdfgroup] File with large number of variables

Setting the cache to 0 has solved the problem with defining the
file. Thanks a lot.
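
For anyone hitting the same thing, here is a minimal sketch of what the fix
looks like in isolation. The per-variable nc_set_var_chunk_cache() call is the
one from Ed's reply below; the commented-out nc_set_chunk_cache() line is only
an alternative I have not tried (it sets the library-wide default and, as I
read the docs, has to be called before the file is created):

#include <stdio.h>
#include <netcdf.h>

/* Sketch only: define many variables while keeping the HDF5 chunk cache
 * tiny, so memory stays flat during definition. */
int create_many_vars(const char *path, int nvars)
{
    int ncid, dimid, varid, stat;
    char name[32];

    /* Untested alternative: lower the library-wide default cache
     * (size, nelems, preemption) before creating/opening the file. */
    /* nc_set_chunk_cache(0, 0, 0.75); */

    if ((stat = nc_create(path, NC_CLOBBER | NC_NETCDF4, &ncid))) return stat;
    if ((stat = nc_def_dim(ncid, "udim", NC_UNLIMITED, &dimid))) return stat;

    for (int v = 0; v < nvars; v++) {
        sprintf(name, "var_%d", v);
        if ((stat = nc_def_var(ncid, name, NC_INT, 1, &dimid, &varid))) return stat;
        /* Per-variable cache of 0 bytes, as in Ed's tst_vars3.c test. */
        if ((stat = nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75))) return stat;
    }
    return nc_close(ncid);
}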

Unfortunately, I'm still not able to write efficiently to the file I
just created. It looks like every call to nc_put_vara takes memory
that is not released.

I attach a code snippet to illustrate this. The effect is very clear when
executing with num_var = 100 (which makes the test faster),
num_elements_var = 10000 and buffer_size = 1.
If I increase buffer_size the problem is less obvious, but it's still
there (set buffer_size = 10 and increase num_elements_var to 100000).
This time it does not seem to be related to num_var, but rather to the
number of times nc_put_vara is called.
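
For reference, a rough, Linux-only sketch of how the growth can be watched
from inside the process; the helper and its name are illustrative only, not
part of the attached snippet or of netCDF:

#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the process's resident set size (VmRSS)
 * by scanning /proc/self/status.  Linux-only. */
static void print_rss(const char *label)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            printf("%s: %s", label, line);   /* line already ends in '\n' */
            break;
        }
    }
    fclose(f);
}

Calling something like this every few thousand nc_put_vara calls should show
whether the resident size keeps climbing or eventually levels off.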

Any ideas?

Thanks in advance,

Dani


On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netcdf file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have to use netCDF-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm doing some prototyping on my development machine (Linux, 2 GB RAM)
>> and have found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) When I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long) while.
>>
>> If I do the same but close and reopen the file after every 10 or 100
>> new definitions, it works fine. I can bypass this by creating the
>> file once (with ncgen) and using a copy of it for every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
>      for (v = 0; v < NUM_VARS; v++)
>      {
>         sprintf(var_name, "var_%d", v);
>         if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
>         if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
>      }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to nc_set_var_chunk_cache(), setting the chunk
> cache to zero, there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <netcdf.h>

/* Abort on any failing netCDF call (the snippet assumes such a macro exists). */
#define NCERR do { fprintf(stderr, "netCDF call failed\n"); exit(1); } while (0)

static clock_t c1;   /* time of the previous tick() call */

  /* Print the time elapsed since the previous tick(). */
  void tick(const char* data) {
    clock_t c2 = clock();
    double millis = 1000.0 * (double)(c2 - c1) / CLOCKS_PER_SEC;
    printf("%s - elapsed %f ms \n", data, millis);
    c1 = c2;
  }

  void testNetCDFLimits() {

    int num_var = 100;
    int num_elements_var = 100000;
    size_t buffer_size = 1;

    int ncid, udim;
    int varids[num_var];
    size_t start;
    int buffer[buffer_size];
    char varname[10];
    char filename[100];

    sprintf(filename, "%d-test.nc4", num_var);

    // create the file //

    if ( nc_create(filename, NC_CLOBBER | NC_NETCDF4, &ncid) ) NCERR;
    tick("created");

    if ( nc_def_dim(ncid, "udim1", 0, &udim) ) NCERR;
    if ( nc_def_dim(ncid, "udim2", 0, &udim) ) NCERR;
    tick("dimensions defined");


    for (int j = 0; j < num_var; j++) {
        sprintf(varname,"var-%d",j);
        if ( nc_def_var(ncid, varname, NC_INT, 1, &udim, &varids[j]) ) NCERR;
        if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("variables defined");

    if( nc_enddef(ncid) ) NCERR;
    tick("endef");
    if ( nc_close(ncid) ) NCERR;
    tick("closed");


    // open for writing //
    if(nc_open(filename, NC_WRITE, &ncid)) NCERR;

    tick("opened");

    for (int j = 0; j < num_var; j++) {
      sprintf(varname,"var-%d",j);
      if ( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
      if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");

    // iterate to write the vars: on every loop, buffer_size elements are written to all variables //
    char debug[100];
    for (int k = 0; k < num_elements_var; k = k + buffer_size) {

      for (int j = 0; j < num_var; j++) {

        for (int l = 0; l < buffer_size; l++) {
          buffer[l] = l * j;
        }
        start = k;
        if ( nc_put_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
      }
      sprintf(debug, "%d", k);
      tick( debug );
    }
    tick("variables written");

    if( nc_close(ncid) ) NCERR;
    tick("closed");



    // open for reading //
    if ( nc_open(filename, NC_NOWRITE, &ncid) ) NCERR;
    tick("open for reading");
    
    for (int j = 0; j < num_var; j++) {
      sprintf(varname,"var-%d",j);
      if( nc_inq_varid(ncid, varname, &varids[j]) ) NCERR;
      if ( nc_set_var_chunk_cache(ncid, varids[j], 0, 0, 0.75) ) NCERR;
    }
    tick("inquired variables");

    for (int k = 0; k < num_elements_var; k = k + buffer_size) {
      for (int j = 0; j < num_var; j++) {
        start = k;
        if ( nc_get_vara(ncid, varids[j], &start, &buffer_size, buffer) ) NCERR;
      }
      sprintf(debug, "%d", k);
      tick( debug );
    }
    
    tick("variables read");
    if( nc_close(ncid) ) NCERR;
    tick("closed");

}