- To: <netcdfgroup@xxxxxxxxxxxxxxxx>
- Subject: [netcdfgroup] Advice on parallel netcdf
- From: Ben <Benjamin.M.Auer@xxxxxxxx>
- Date: Tue, 15 Jan 2013 15:55:44 -0500
I was looking for general advice on using parallel netcdf/hdf5. I'm
working on the development of an atmospheric model at NASA.
The model of course is distributed with MPI (each process essentially
works on one section of the grid describing the world), but much of the
file IO is serial: the arrays to be written are gathered on the root
process, and the root process then does the reading/writing of the
netcdf file. In an attempt to improve the overall IO performance I've
been experimenting with parallel netcdf/hdf5, where the file is opened
for parallel access on all processes and each process reads/writes the
data for the piece of the world it is working on directly to/from the
netcdf file.
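For context, here is a minimal sketch of the current gather-to-root
write path. The subroutine and variable names are hypothetical, and it
assumes an equal-size 1-D decomposition along the last (latitude)
dimension so a plain MPI_Gather is enough; the real code goes through
ESMF and is more general.

  ! Serial write path: gather the field onto root, root writes the file.
  subroutine write_field_serial(comm, field_local, nlon, nlat_local, fname)
    use mpi
    use netcdf
    implicit none
    integer,          intent(in) :: comm, nlon, nlat_local
    real,             intent(in) :: field_local(nlon, nlat_local)
    character(len=*), intent(in) :: fname
    real, allocatable :: field_global(:,:)
    integer :: rank, npes, ierr, status, ncid, dimids(2), varid

    call MPI_Comm_rank(comm, rank, ierr)
    call MPI_Comm_size(comm, npes, ierr)

    if (rank == 0) then
       allocate(field_global(nlon, nlat_local*npes))
    else
       allocate(field_global(1,1))   ! dummy receive buffer on non-root ranks
    end if

    ! Contiguous slabs along the last dimension concatenate correctly in
    ! Fortran (column-major) order; a general decomposition would need
    ! MPI_Gatherv and/or derived datatypes.
    call MPI_Gather(field_local,  nlon*nlat_local, MPI_REAL, &
                    field_global, nlon*nlat_local, MPI_REAL, 0, comm, ierr)

    if (rank == 0) then
       status = nf90_create(fname, NF90_CLOBBER, ncid)
       status = nf90_def_dim(ncid, "lon", nlon, dimids(1))
       status = nf90_def_dim(ncid, "lat", nlat_local*npes, dimids(2))
       status = nf90_def_var(ncid, "field", NF90_FLOAT, dimids, varid)
       status = nf90_enddef(ncid)
       status = nf90_put_var(ncid, varid, field_global)
       status = nf90_close(ncid)
    end if
  end subroutine write_field_serial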
Here is an outline of what I am doing in the code with a few actual code
snippets:
set some mpi info ...

  call MPI_Info_create(info, STATUS)
  call MPI_Info_set(info, "romio_cb_read",  "enable", STATUS)
  call MPI_Info_set(info, "romio_cb_write", "enable", STATUS)
  call ESMF_ConfigGetAttribute(CF, cb_buffer_size, Label='cb_buffer_size:', __RC__)
  call MPI_Info_set(info, "cb_buffer_size", "16777216", STATUS)

  status = nf90_create("file_parallel.nc4", IOR(IOR(NF90_CLOBBER, NF90_HDF5), NF90_MPIIO), &
                       ncid, comm=comm, info=info)

define dimensions ...
define vars ...
set access to collective for each variable ...

  status = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

determine start and cnt for process ...
read or write ...

(A fuller, self-contained version of this sequence is sketched after the
observations below.)

Here are a few general observations:
- In general the IO does not scale with the number of processors and I'm seeing about the same write time for 1 or hundreds of mpi tasks.
- Gathering to root and having root write (and the converse for reading) was generally almost as fast as, or only marginally slower (roughly 2x) than, parallel IO, regardless of the number of mpi tasks.
- Setting the access of each variable to collective was crucial to write performance. If the access was set to independent, the writing was horribly slow: 10 to 20 times longer than the gather-to-root/root-write method.
- In general, playing with the buffer size had no appreciable effect on the performance.
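For reference, here is the outline above fleshed out into a single
self-contained subroutine. The decomposition, variable names, and the
assumption of equal latitude slabs per rank are hypothetical, and error
checking is omitted, but the MPI_Info and nf90_* calls are the same ones
shown in the snippets.

  ! Parallel write path: every rank opens the file and writes its own slab.
  subroutine write_field_parallel(comm, field_local, nlon, nlat_local, fname)
    use mpi
    use netcdf
    implicit none
    integer,          intent(in) :: comm, nlon, nlat_local
    real,             intent(in) :: field_local(nlon, nlat_local)
    character(len=*), intent(in) :: fname
    integer :: rank, npes, ierr, info, status
    integer :: ncid, dimids(2), varid, start(2), cnt(2)

    call MPI_Comm_rank(comm, rank, ierr)
    call MPI_Comm_size(comm, npes, ierr)

    ! Collective-buffering hints, as in the snippet above.
    call MPI_Info_create(info, ierr)
    call MPI_Info_set(info, "romio_cb_read",  "enable",   ierr)
    call MPI_Info_set(info, "romio_cb_write", "enable",   ierr)
    call MPI_Info_set(info, "cb_buffer_size", "16777216", ierr)

    ! Create the file for parallel access on every rank.
    status = nf90_create(fname, IOR(IOR(NF90_CLOBBER, NF90_HDF5), NF90_MPIIO), &
                         ncid, comm=comm, info=info)

    ! Define the global dimensions and the variable (identical on all ranks).
    status = nf90_def_dim(ncid, "lon", nlon, dimids(1))
    status = nf90_def_dim(ncid, "lat", nlat_local*npes, dimids(2))
    status = nf90_def_var(ncid, "field", NF90_FLOAT, dimids, varid)
    status = nf90_enddef(ncid)

    ! Collective access for the variable (this was crucial for performance).
    status = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

    ! Each rank writes only the latitude slab it owns.
    start = (/ 1, rank*nlat_local + 1 /)
    cnt   = (/ nlon, nlat_local /)
    status = nf90_put_var(ncid, varid, field_local, start=start, count=cnt)

    status = nf90_close(ncid)
    call MPI_Info_free(info, ierr)
  end subroutine write_field_parallel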
Does anyone have any tricks I haven't thought of, or has anyone seen the same thing with parallel IO performance? There really aren't that many things one can play with other than setting the MPI hints or changing the access type for variables (collective or independent). So far I have been using Intel 11 and Intel MPI 3 on a GPFS file system, but I plan to try this with newer Intel compiler versions, different MPI stacks, and on Lustre instead of GPFS.
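On the hints front, the only other knobs I'm aware of are the remaining MPI-IO/ROMIO hints, which can be set through the same MPI_Info object before nf90_create/nf90_open. Whether they help (or are even honored) depends on the MPI library and file system, and the values below are just placeholders:

  call MPI_Info_create(info, ierr)
  ! Lustre striping, applied by ROMIO when the file is created:
  call MPI_Info_set(info, "striping_factor", "16",      ierr)  ! number of stripes/OSTs (placeholder)
  call MPI_Info_set(info, "striping_unit",   "1048576", ierr)  ! stripe size in bytes (placeholder)
  ! ROMIO data sieving for independent IO:
  call MPI_Info_set(info, "romio_ds_read",  "disable", ierr)
  call MPI_Info_set(info, "romio_ds_write", "disable", ierr)
  ! Number of collective-buffering aggregator nodes:
  call MPI_Info_set(info, "cb_nodes", "8", ierr)               ! placeholder
  ! ... then pass info to nf90_create as in the snippet above.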
--
Ben Auer, PhD                SSAI, Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-286-9176          Fax: 301-614-6246
- Follow-Ups:
- Re: [netcdfgroup] Advice on parallel netcdf
- From: Kokron, Daniel S. (GSFC-610.1)[Computer Sciences Corporation]
- Re: [netcdfgroup] Advice on parallel netcdf
- From: Rob Latham
- Re: [netcdfgroup] Advice on parallel netcdf