
RE: Performance problem with large files

I think what's happening is this:

As data is read in from disk, it is placed into memory, where the OS
typically caches it in an OS-dependent way. If you are lucky (or smart),
a second access will find the data already in cache and will save
another disk read.

Now, your 1.6 GB file is too large to fit in memory, so it is getting
essentially no cache hits; previously you may have been getting
significant use of the cache. That is why the time has increased more
than linearly with file size. Periodically the OS swaps to disk,
bringing everyone else on the system to a halt. It's unlikely that the
swapping is helping anything, though, since there are likely no cache
hits.
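
To put rough numbers on this, here is a small Python sketch using the
sizes from the original message (8000 x 3 x 16000 single-precision
values, 512 MB of RAM). It ignores the file header and any memory used
by other programs, so treat it as a back-of-the-envelope estimate only.

```python
# Sizes quoted in the original message; header overhead is ignored.
shape = (8000, 3, 16000)        # array dimensions
bytes_per_value = 4             # single-precision float
ram_bytes = 512 * 1024**2       # 512 MB of RAM

data_bytes = shape[0] * shape[1] * shape[2] * bytes_per_value
small_bytes = data_bytes // 100  # the file "a hundredth the size"

print(data_bytes)                        # 1536000000
print(round(data_bytes / ram_bytes, 2))  # 2.86
print(small_bytes < ram_bytes)           # True
```

So the data is close to three times the size of RAM, while the smaller
file fits in memory many times over.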

If you are doing subsetting across the record variable, you are probably
reading the complete file once for each subset.  If you are doing this
8000 times, it's a bit much.
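
As a rough illustration, the sketch below lists where one 1 x 3 x 16000
subset would sit on disk. It assumes a classic-format layout in which
the record (unlimited) dimension varies slowest and the 3-sized
dimension varies fastest within each record. The function name
subset_runs and that axis order are my assumptions, not something taken
from your file.

```python
def subset_runs(i, n_outer=8000, n_inner=3, n_records=16000, item_bytes=4):
    """Return (offset, length) byte runs for the subset at index i.

    Offsets are relative to the start of the record data. Assumed layout:
    records stored one after another, each holding n_outer x n_inner values.
    """
    record_bytes = n_outer * n_inner * item_bytes   # one full record
    run_bytes = n_inner * item_bytes                # the part wanted per record
    return [(r * record_bytes + i * run_bytes, run_bytes)
            for r in range(n_records)]

runs = subset_runs(0)
print(len(runs))                          # 16000 separate reads
print(runs[1][0] - runs[0][0])            # 96000 bytes between reads
print(sum(length for _, length in runs))  # 192000 useful bytes
last_offset, last_length = runs[-1]
print(last_offset + last_length)          # 1535904012, nearly the whole file
```

Under these assumptions each subset needs only 192000 bytes, but they
are scattered in 12-byte pieces from one end of the file to the other.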

So what to do? The best thing is to rearrange your file to keep
your data access local. In other words, optimize for reading rather than
writing. Write another version of the file with the 8000 dimension as
the record dimension, and use that when you want to read the 1 x 3 x
16000 subset. The original file will be much better when you want to
read the 8000 x 3 x 1 subset.
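
The effect of the two layouts can be sketched on a scaled-down array
(8 x 3 x 16 standing in for 8000 x 3 x 16000). The axis names "A", "B"
and "C" and the helper count_runs are hypothetical, and the on-disk
order is assumed to be plain row-major with the record dimension
slowest. The sketch only counts how many contiguous pieces each subset
breaks into; it does not touch a real netCDF file.

```python
from itertools import product

def count_runs(order, sizes, fixed):
    """Count the contiguous runs needed to read a subset.

    order: axis names from slowest- to fastest-varying on disk
    sizes: dict mapping axis name -> length
    fixed: dict mapping axis name -> index held constant in the subset
    """
    # Row-major strides (in elements) for the given on-disk order.
    strides, step = {}, 1
    for axis in reversed(order):
        strides[axis] = step
        step *= sizes[axis]
    ranges = [[fixed[a]] if a in fixed else range(sizes[a]) for a in order]
    offsets = sorted(sum(strides[a] * i for a, i in zip(order, idx))
                     for idx in product(*ranges))
    # A new run starts wherever consecutive offsets are not adjacent.
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)

sizes = {"A": 8, "B": 3, "C": 16}   # stand-ins for 8000, 3, 16000
original = ("C", "A", "B")          # C (16000) is the record dimension
rewritten = ("A", "C", "B")         # A (8000) is the record dimension

print(count_runs(original, sizes, {"A": 2}))   # 16: one piece per record
print(count_runs(rewritten, sizes, {"A": 2}))  # 1: a single contiguous read
print(count_runs(original, sizes, {"C": 5}))   # 1
print(count_runs(rewritten, sizes, {"C": 5}))  # 8
```

Each layout turns one kind of subset into a single contiguous read and
scatters the other, which is why keeping both versions of the file can
pay off if you need both access patterns.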


> -----Original Message-----
> From: owner-netcdfgroup@xxxxxxxxxxxxxxxx
> [mailto:owner-netcdfgroup@xxxxxxxxxxxxxxxx]On Behalf Of
> hinsen@xxxxxxxxxxxxxxxxxxxxx
> Sent: Friday, June 25, 1999 11:07 AM
> To: netcdfgroup@xxxxxxxxxxxxxxxx
> Subject: Performance problem with large files
>
>
> I have been using netCDF for quite a while now, but this week I worked
> for the first time with really big files: I am reading from one 1.6 GB
> file and writing to another one. The data in the files is essentially
> one single-precision float array of dimensions 8000 x 3 x 16000, the
> last dimension being declared as "unlimited". I read and write
> subarrays of shape 1 x 3 x 16000. My computer is a Pentium II
> biprocessor machine at 450 MHz and with 512 MB of RAM, running Linux.
>
> My problem is that this is not only extremely slow (slower by a factor
> 2000 than doing the same on a file of a hundredth the size), but
> periodically blocks my computer in that all programs wanting to do
> some disk access have to wait for about five seconds until some
> operation is finished. And my office neighbour is complaining about
> the never-ending noise from the disk.
>
> Is there anything I can do to improve the performance of such
> operations? The blocked disk access makes me think that the critical
> operation happens in the Linux kernel, but I am not sure. I'd
> appreciate any advice from people who are more experienced with huge
> data files.
> --
> --------------------------------------------------------------
> -----------------
> Konrad Hinsen                            | E-Mail:
> hinsen@xxxxxxxxxxxxxxx
> Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
> Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
> 45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
> France                                   | Nederlands/Francais
> --------------------------------------------------------------
> -----------------
>