Re: performance degrades with filesize

Konrad,

> I think I found what slows it down. In my Python interface, every
> write operation to an array with an unlimited dimension is followed by
> a call to nc_inq_dimlen() in order to keep the internal dimension
> count up to date. I had assumed that this call would be cheap and
> independent of the file size, since it ought to be no more than an
> access to some data structure that the netCDF library should keep in
> memory.

The implementation of nc_inq_dimlen() is a bit more complicated, since
changes were added to support multiprocessing on Cray T3E systems
(September 1999).  However, it still should take constant time,
independent of the file size.  So I was puzzled by your findings and
tried to reproduce them.

Since the Python, C++, and Fortran netCDF interfaces all use the C
interface to do the I/O, I tried to duplicate the reported performance
problem in the C interface.  Translating the Python example to C and
running it still shows no apparent performance problem, even when I
add a call to nc_inq_dimlen() after each write operation.

Specifically, the following C program:

  http://www.unidata.ucar.edu/packages/netcdf/jg1.c

accepts a single command-line argument for how many records to write,
creates a netCDF file with the same structure as John Galbraith's
Python script, and prints the time required to append each additional
batch of 5000 records to the initially empty file.
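
For readers who cannot fetch the program, here is a minimal sketch of the kind of per-batch timing loop described above. It is not jg1.c itself. The names time_batches, append_fn, append_record, file_ctx, and RECORD_DOUBLES are hypothetical, and the stand-in appender uses plain stdio rather than netCDF so the sketch builds without the library. In a real test the callback would wrap a netCDF write followed by nc_inq_dimlen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: append record number `rec`, return 0 on success. */
typedef int (*append_fn)(size_t rec, void *ctx);

/*
 * Append `nrecs` records through `append`, printing the CPU time used by
 * each complete batch of `batch` records and storing up to `max_batches`
 * of those times in `secs`.  Returns the number of times stored, or -1 if
 * an append fails or `batch` is zero.
 */
long time_batches(append_fn append, void *ctx, size_t nrecs, size_t batch,
                  double *secs, size_t max_batches)
{
    size_t stored = 0;
    clock_t start;

    assert(secs != NULL || max_batches == 0);
    if (batch == 0)
        return -1;

    start = clock();
    for (size_t rec = 0; rec < nrecs; rec++) {
        if (append(rec, ctx) != 0)
            return -1;
        if ((rec + 1) % batch == 0) {
            clock_t now = clock();
            double elapsed = (double)(now - start) / CLOCKS_PER_SEC;
            printf("record %lu: %10.3f secs\n",
                   (unsigned long)(rec + 1), elapsed);
            if (stored < max_batches)
                secs[stored++] = elapsed;
            start = now;   /* time the next batch on its own */
        }
    }
    return (long)stored;
}

/* Hypothetical record size, chosen only for illustration. */
#define RECORD_DOUBLES 512

/* State for the stdio stand-in: the open file and the last queried count. */
struct file_ctx {
    FILE *fp;
    long nrecs;
};

/*
 * Stand-in appender (stdio, not netCDF): write one fixed-size record, then
 * ask how long the file is now, mimicking a write followed by a length
 * inquiry after every record.
 */
int append_record(size_t rec, void *ctx)
{
    struct file_ctx *fc = ctx;
    double buf[RECORD_DOUBLES];
    long pos;

    for (size_t i = 0; i < RECORD_DOUBLES; i++)
        buf[i] = (double)rec;
    if (fwrite(buf, sizeof buf, 1, fc->fp) != 1)
        return -1;
    pos = ftell(fc->fp);
    if (pos < 0)
        return -1;
    fc->nrecs = pos / (long)sizeof buf;
    return 0;
}
```

A caller would open a file, fill in a struct file_ctx, and pass append_record to time_batches with a batch size of 5000. If the per-record cost is independent of file size, the printed batch times should stay roughly flat as the file grows.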

The times are nearly constant even when the file has grown to 1.2
Gbytes:

  $ ./jg1 300000
  record 5000:      1.010 secs
  record 10000:      0.950 secs
  record 15000:      0.940 secs
  record 20000:      1.020 secs
  record 25000:      0.930 secs
  record 30000:      1.030 secs
  record 35000:      1.080 secs
  record 40000:      1.060 secs
  record 45000:      1.040 secs
  record 50000:      0.940 secs
   ...
  record 240000:      1.160 secs
  record 245000:      1.150 secs
  record 250000:      1.130 secs
  record 255000:      1.120 secs
  record 260000:      1.180 secs
  record 265000:      1.160 secs
  record 270000:      1.140 secs
  line 107 of jg1.c: No space left on device

Could those reporting the problem please compile and run this C
program on their systems and report the circumstances under which
performance degrades with file size, or modify the program so that it
demonstrates the problem?  That would help us determine the cause and
fix it.

--Russ

P.S.  I'm trying to continue doing my job in the face of the terrible
tragedies that occurred earlier today.  It has been very difficult to
focus on work, but not working feels like giving in to those who would
demand that we instead focus on their acts of terror.