On 05/07/2014 09:20 AM, Alexis Praga wrote:
> On Wed, May 07, 2014 at 09:02:23AM -0500, Rob Latham wrote:
>> I'm not entirely sure what you're asking here. Most parallel I/O libraries carry out I/O to different regions of the file simultaneously (in parallel), and thereby extract more aggregate performance out of the storage system. For any application using any I/O library, the trickiest part is how to decompose your domain over N parallel processes and how to describe that decomposition.
>
> To clarify: the way I see it, you can do parallel I/O in three different ways. The first is to reserve a process that deals only with I/O, with the other processes exchanging the data to be read or written with it. The second is to have each process read and write independently. The third is to aggregate the I/O of several processes to improve performance. So my question was: in practice, which approach does parallel netCDF use?
You can use any of the I/O libraries (netCDF-4, Parallel-NetCDF, HDF5) with any of those three models, but the third approach you describe is the use case for which all of these libraries were designed.
>> In strict performance terms -- which in the end is not really the be-all and end-all -- Argonne-Northwestern Parallel-NetCDF will be hard to beat, unless you are working with record variables.
>
> Do you speak from personal experience? I would be very interested in seeing some data or benchmarks about it.
Second hand: Babak Behzad spent a summer at NCAR working with John Dennis doing I/O workload experiments in support of the CESM climate simulation project. I don't know if the results ended up in some kind of paper or other presentation.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA