Hi Quinn,

> I typically write out a couple of meg's of data at the
> beginning of the numerical computation, and this
> normally takes about a minute or so if the code is
> executing on the same cpu as the disk that is being written
> to. But if the cpu and disk are different machines,
> then it takes over 30 minutes to perform the same writing.

I suspect you are appending a new record and using an ncsync() call after
every write. In that case these anomalously large times are the result of
NFS behavior on synchronous writes and the fact that each write is updating
at least two disk blocks (one containing the number of records and one
containing the data), so caching isn't working well. If the ncsync() calls
are not necessary (and it sounds like they aren't), you could see
significant performance improvements by removing them, or perhaps removing
all but the last one.

For example, I just wrote a program to write 2000 floating-point numbers,
one at a time, to a record variable in a netCDF file. If I don't do an
ncsync() call after each write, the time to write all 2000 records to the
file on a remote disk is only about twice what it is to write to a local
disk:

                        local file     remote file
    no ncsync() call:     0.23 sec        0.42 sec

If I add an ncsync() call after each write, the time for writing to the
local file increases, but the time for writing the remote file increases
much more dramatically:

                        local file     remote file
    ncsync() call:        5.84 sec      196.87 sec

These example timings were run on a SPARCstation under Solaris 2.3 with the
remote disk being on a SPARCserver under SunOS 4.1.4.

> Can you tell me if its normal for netcdf to take a long
> time to write across the ethernet, and if there are any
> common pitfalls that people run into when trying to
> do so? I realize that this might be dependent on
> my code, so if it would help you to see some of the
> netcdf call's that I'm using, please let me know.

From the above example, it's clear that the use of ncsync() can be costly,
especially when updating record variables on an NFS-mounted file. The NetCDF
User's Guide advises:

    It can be expensive in computer resources to always synchronize to disk
    after every write of variable data or change of an attribute value.
    There are two reasons you might want to synchronize after writes:

      * To minimize data loss in case of abnormal termination, or

      * To make data available to other processes for reading immediately
        after it is written. But note that a process that already had the
        file open for reading would not see the number of records increase
        when the writing process calls ncsync; to accomplish this, the
        reading process must call ncsync.

    Data is automatically synchronized to disk when a netCDF file is
    closed, or whenever you leave define mode.

Data is also synchronized to disk whenever the buffer used in the netCDF
implementation fills, so you should rarely need to use ncsync() unless for
one of the above reasons.

--Russ
______________________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                            P.O. Box 3000
http://www.unidata.ucar.edu/                     Boulder, CO 80307-3000
______________________________________________________________________________
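The sync-after-every-write pattern discussed in the message can be sketched
in C without netCDF at all. The sketch below is an analogy only: it uses
plain POSIX write() and fsync() as stand-ins for a record write and an
ncsync() call, it is not the program used for the timings in the message,
and the helper names write_records and timed_write are hypothetical,
introduced here purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Write nrec floats to path, one write() per value.
 * If sync_each is nonzero, fsync() follows every write (the analogue of
 * calling ncsync() after every record); otherwise the data is flushed
 * only once, at the end.
 * Returns 0 on success, -1 on any I/O error. */
int write_records(const char *path, int nrec, int sync_each)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nrec; i++) {
        float value = (float)i;
        if (write(fd, &value, sizeof value) != (ssize_t)sizeof value) {
            close(fd);
            return -1;
        }
        if (sync_each && fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }

    /* One final flush so both variants leave the data on disk. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Wall-clock seconds taken by write_records(), or -1.0 on failure. */
double timed_write(const char *path, int nrec, int sync_each)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (write_records(path, nrec, sync_each) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To repeat the comparison, one might call timed_write(path, 2000, 0) and
timed_write(path, 2000, 1) once with a path on a local disk and once with a
path on an NFS-mounted disk. The absolute numbers will depend on the
hardware, the operating system, and the NFS mount options, so they should
not be expected to match the figures quoted in the message.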