Hi,

I have to write and read data to/from a netCDF file that has 750 variables, all of them using unlimited dimensions (only one per variable, some dimensions shared) and 10 fixed dimensions. I have to use netCDF-4 (because of the multiple-unlimited-dimensions requirement) and the C API. I'm doing some prototyping on my development machine (Linux, 2 GB RAM) and have found several performance issues that I hope someone can help me fix or understand:

(1) When I create a file and try to define 1000 variables (all int) sharing a single unlimited dimension, the process consumes all available RAM (swap included) and fails with "Error (data:def closed) -- HDF error" after a (long) while. If I instead close and reopen the file every 10 or 100 new definitions, it works fine. I can work around this by creating the file once (with ncgen) and using a copy of it for every new file, but I would prefer not to. Why does defining the variables take that much memory?

(2) When writing and reading variable data, there is a huge performance difference between writing/reading one record at a time and writing/reading several records at a time (buffering). To keep the logic of my program simple, my first approach was to write records one by one (since that is how the program works: it reads one record from each variable, processes it, and writes it out) and to play with the chunk size and chunk cache, but so far that hasn't helped much. Should I build a custom "buffering" layer, or can the chunk cache help here? Or should I simply get more RAM :)?

(3) Even when buffering, I see performance degradation (free memory drops fast, and processing time increases) as the number of records processed per variable (written or read) increases.

I could really use some "expert" advice on the best way to address these issues.

Thanks in advance.

Dani
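
For reference, a minimal sketch of the define pattern described in issue (1): 1000 int variables sharing one unlimited dimension, with the close/reopen-every-N-definitions workaround mentioned in the post. The file name, variable names, and reopen interval are illustrative assumptions, not taken from the original program.

/* Sketch, assuming a netCDF-4 file and the standard C API. */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define NVARS 1000
#define REOPEN_EVERY 100          /* workaround interval; tune as needed */

static void check(int status)
{
    if (status != NC_NOERR) {
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(status));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int ncid, dimid, varid;
    char name[NC_MAX_NAME + 1];

    check(nc_create("many_vars.nc", NC_NETCDF4 | NC_CLOBBER, &ncid));
    check(nc_def_dim(ncid, "rec", NC_UNLIMITED, &dimid));

    for (int i = 0; i < NVARS; i++) {
        snprintf(name, sizeof(name), "var%04d", i);
        check(nc_def_var(ncid, name, NC_INT, 1, &dimid, &varid));

        /* Workaround from the post: flush the accumulated definitions
         * to disk periodically instead of holding them all in memory.
         * netCDF-4 re-enters define mode automatically on nc_def_var. */
        if ((i + 1) % REOPEN_EVERY == 0) {
            check(nc_close(ncid));
            check(nc_open("many_vars.nc", NC_WRITE, &ncid));
            check(nc_inq_dimid(ncid, "rec", &dimid));
        }
    }
    check(nc_close(ncid));
    return 0;
}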
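
Likewise, a sketch of the two write patterns compared in issue (2), along with the chunking and chunk-cache knobs mentioned. The buffer length, chunk size, and cache settings here are example values chosen for illustration, not recommendations from the post.

/* Sketch, assuming a 1-D int variable along the unlimited dimension. */
#include <stddef.h>
#include <netcdf.h>

#define NBUF 1024                 /* records buffered per variable */

/* One record at a time: one nc_put_var1_int call per record. */
static int write_one_by_one(int ncid, int varid, size_t rec, int value)
{
    size_t index[1] = { rec };
    return nc_put_var1_int(ncid, varid, index, &value);
}

/* Buffered: write NBUF consecutive records along the unlimited
 * dimension in a single nc_put_vara_int call. */
static int write_buffered(int ncid, int varid, size_t first_rec,
                          const int buf[NBUF])
{
    size_t start[1] = { first_rec };
    size_t count[1] = { NBUF };
    return nc_put_vara_int(ncid, varid, start, count, buf);
}

/* Chunking and per-variable chunk cache, set after the variable is
 * defined: 1024-record chunks and a ~1 MiB cache are just examples. */
static int tune_variable(int ncid, int varid)
{
    size_t chunklen[1] = { 1024 };
    int status = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunklen);
    if (status != NC_NOERR)
        return status;
    return nc_set_var_chunk_cache(ncid, varid, 1 << 20 /* bytes */,
                                  521 /* slots */, 0.75f /* preemption */);
}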