Re: [netcdfgroup] File with large number of variables

Hi Dani,

If you are really interested in program efficiency, then the HDF5 API should be
considered.
The NetCDF-4 API is actually a wrapper on top of the HDF5 API that provides an
interface familiar to NetCDF users.
NetCDF-4 offers a "simpler" interface in one respect: the user does not have to
worry about closing objects opened or created earlier in the program. But this
comes at a price: the NetCDF API must keep the whole file structure in memory.
That is why the NetCDF API is much slower (and takes much more memory) than the
HDF5 API on files with a complex structure.
I have replaced the NetCDF code with HDF5 in your example. The resulting code is
shorter and should run much faster: please try it.
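(Sergei's modified example is not preserved in the archive, so the following is
only a minimal sketch of the kind of direct-HDF5 replacement he describes; the
file name, chunk size and NUM_VARS here are assumptions, not his actual code.
The key point is that each dataset is closed as soon as it is created, so
nothing accumulates in memory.)

    #include <stdio.h>
    #include <hdf5.h>

    #define NUM_VARS  100
    #define CHUNK_LEN 1024

    int main(void)
    {
       hid_t file_id, space_id, dcpl_id, dset_id;
       hsize_t dims[1]    = {0};
       hsize_t maxdims[1] = {H5S_UNLIMITED};
       hsize_t chunk[1]   = {CHUNK_LEN};
       char name[32];
       int v;

       /* Create (truncate) the file. */
       if ((file_id = H5Fcreate("testlimits.h5", H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT)) < 0) return 1;

       /* One extendible 1-D dataspace shared by all datasets. */
       space_id = H5Screate_simple(1, dims, maxdims);

       /* Unlimited dimensions require chunked storage. */
       dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
       H5Pset_chunk(dcpl_id, 1, chunk);

       for (v = 0; v < NUM_VARS; v++)
       {
          sprintf(name, "var_%d", v);
          if ((dset_id = H5Dcreate2(file_id, name, H5T_NATIVE_INT, space_id,
                                    H5P_DEFAULT, dcpl_id, H5P_DEFAULT)) < 0)
             return 1;
          /* Close each dataset right away so nothing accumulates in memory. */
          H5Dclose(dset_id);
       }

       H5Pclose(dcpl_id);
       H5Sclose(space_id);
       H5Fclose(file_id);
       return 0;
    }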

Regards,
Sergei

-----Original Message-----
From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx 
[mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Dani
Sent: 03 May 2010 10:41
To: Ed Hartnett
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: [netcdfgroup] File with large number of variables


Setting the cache to 0 has solved the problem with defining the
file. Thanks a lot.

Unfortunately, I'm still not able to write efficiently to the file I
just created. It looks like every call to nc_put_vara allocates memory
that is not released.

I attach a code snippet to illustrate this. The problem is very clear when
executing with num_var = 100 (which makes the test faster),
num_elements_var = 10000 and buffer_size = 1.
If I increase buffer_size the problem is less obvious, but it is still
there (set buffer_size = 10 and increase num_elements_var to 100000).
This time it does not seem to be related to num_var, but to the number
of times nc_put_vara is called.
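
(The attached testlimits.c is not reproduced in the archive; the following is
only a rough sketch of the write pattern described above, so the dimension and
variable names and the exact parameter handling are assumptions.)

    #include <stdio.h>
    #include <netcdf.h>

    #define NUM_VAR          100
    #define NUM_ELEMENTS_VAR 10000
    #define BUFFER_SIZE      1

    int main(void)
    {
       int ncid, dimid, varid[NUM_VAR];
       int buffer[BUFFER_SIZE] = {0};
       size_t start[1], count[1] = {BUFFER_SIZE};
       char name[32];
       int retval, v;
       size_t i;

       if ((retval = nc_create("testlimits.nc", NC_NETCDF4, &ncid))) return retval;
       if ((retval = nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid))) return retval;

       for (v = 0; v < NUM_VAR; v++)
       {
          sprintf(name, "var_%d", v);
          if ((retval = nc_def_var(ncid, name, NC_INT, 1, &dimid, &varid[v])))
             return retval;
          /* Per-variable chunk cache set to zero, which fixed the define phase. */
          if ((retval = nc_set_var_chunk_cache(ncid, varid[v], 0, 0, 0.75)))
             return retval;
       }
       if ((retval = nc_enddef(ncid))) return retval;

       /* Append BUFFER_SIZE values at a time along the unlimited dimension;
          memory use appears to grow with the number of nc_put_vara calls. */
       for (v = 0; v < NUM_VAR; v++)
          for (i = 0; i < NUM_ELEMENTS_VAR; i += BUFFER_SIZE)
          {
             start[0] = i;
             if ((retval = nc_put_vara_int(ncid, varid[v], start, count, buffer)))
                return retval;
          }

       return nc_close(ncid);
    }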

Any ideas?

Thanks in advance,

Dani


On Fri, Apr 30, 2010 at 8:26 PM, Ed Hartnett <ed@xxxxxxxxxxxxxxxx> wrote:
> Dani <pressec@xxxxxxxxx> writes:
>
>> Hi,
>> I have to write and read data to/from a netcdf file that has 750
>> variables, all of them using unlimited dimensions (only one per
>> variable, some dimensions shared) and 10 fixed dimensions.
>>
>> I have to use netcdf-4 (because of the multiple unlimited dimensions
>> requirement) and the C API.
>>
>> I'm making some prototyping on my development machine (Linux 2GB RAM)
>> and found several performance issues that I hope someone can help me
>> fix/understand:
>>
>> (1) when I create a file and try to define 1000 variables (all int)
>> and a single shared unlimited dimension, the process takes all
>> available RAM (swap included) and fails with "Error (data:def closed)
>> -- HDF error" after a (long) while.
>>
>> If I do the same but close and reopen the file every 10 or 100 new
>> definitions, it works fine. I can bypass this by creating the
>> file once (ncgen) and using a copy of it on every new file, but I
>> would prefer not to. Why does creating the variables take that much
>> memory?
>
> When you create a netCDF variable, HDF5 allocates a buffer for that
> variable. The default size of the buffer is 1 MB.
>
> I have reproduced your problem, but it can be solved by explicitly
> setting the buffer size for each variable to a lower value. I have
> checked in my tests in libsrc4/tst_vars3.c, but here's the part with the
> cache setting:
>
>      for (v = 0; v < NUM_VARS; v++)
>      {
>         sprintf(var_name, "var_%d", v);
>         if (nc_def_var(ncid, var_name, NC_INT, 1, &dimid, &varid)) ERR_RET;
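>         /* Per-variable chunk cache: size 0 bytes, 0 slots, preemption 0.75. */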
>         if (nc_set_var_chunk_cache(ncid, varid, 0, 0, 0.75)) ERR_RET;
>      }
>
> Note the call to nc_set_var_chunk_cache(), right after the call to
> nc_def_var.
>
> When I take this line out, I get a serious slowdown around 4000
> variables. (I have more memory available than you do.)
>
> But when I add the call to set_var_chunk_cache(), setting the chunk
> cache to zero, then there is no slowdown, even for 10,000 variables.
>
> Thanks,
>
> Ed
> --
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx
>



Attachment: testlimits.c
Description: testlimits.c
