- To: <netcdfgroup@xxxxxxxxxxxxxxxx>
- Subject: [netcdfgroup] Advice on parallel netcdf
- From: Ben <Benjamin.M.Auer@xxxxxxxx>
- Date: Tue, 15 Jan 2013 15:55:44 -0500
I was looking for general advice on using parallel netcdf/hdf5. I'm
working on the development of an atmospheric model at NASA.
The model of course is distributed with MPI (each process essentially
works on one section of the grid describing the world), but much of the
file IO is serial: the arrays to be written are gathered on the root
process, and the root process then does the reading/writing of the
netcdf file. In an attempt to improve the overall IO performance I've
been experimenting with parallel netcdf/hdf5, where the file is opened
for parallel access on all processes and each process reads/writes the
data for the piece of the world it is working on directly to/from the
netcdf file.
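For context, here is a minimal sketch of the current gather-to-root
write path. The subroutine and variable names are hypothetical, and it
assumes an equal-size 1-D decomposition along the last (latitude)
dimension so a plain MPI_Gather is enough; the real code goes through
ESMF and is more general.

  ! Serial write path: gather the field onto root, root writes the file.
  subroutine write_field_serial(comm, field_local, nlon, nlat_local, fname)
    use mpi
    use netcdf
    implicit none
    integer,          intent(in) :: comm, nlon, nlat_local
    real,             intent(in) :: field_local(nlon, nlat_local)
    character(len=*), intent(in) :: fname
    real, allocatable :: field_global(:,:)
    integer :: rank, npes, ierr, status, ncid, dimids(2), varid

    call MPI_Comm_rank(comm, rank, ierr)
    call MPI_Comm_size(comm, npes, ierr)

    if (rank == 0) then
       allocate(field_global(nlon, nlat_local*npes))
    else
       allocate(field_global(1,1))   ! dummy receive buffer on non-root ranks
    end if

    ! Contiguous slabs along the last dimension concatenate correctly in
    ! Fortran (column-major) order; a general decomposition would need
    ! MPI_Gatherv and/or derived datatypes.
    call MPI_Gather(field_local,  nlon*nlat_local, MPI_REAL, &
                    field_global, nlon*nlat_local, MPI_REAL, 0, comm, ierr)

    if (rank == 0) then
       status = nf90_create(fname, NF90_CLOBBER, ncid)
       status = nf90_def_dim(ncid, "lon", nlon, dimids(1))
       status = nf90_def_dim(ncid, "lat", nlat_local*npes, dimids(2))
       status = nf90_def_var(ncid, "field", NF90_FLOAT, dimids, varid)
       status = nf90_enddef(ncid)
       status = nf90_put_var(ncid, varid, field_global)
       status = nf90_close(ncid)
    end if
  end subroutine write_field_serial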
Here is an outline of what I am doing in the code with a few actual code
snippets:
set some mpi info ...

  call MPI_Info_create(info, STATUS)
  call MPI_Info_set(info, "romio_cb_read",  "enable", STATUS)
  call MPI_Info_set(info, "romio_cb_write", "enable", STATUS)
  call ESMF_ConfigGetAttribute(CF, cb_buffer_size, Label='cb_buffer_size:', __RC__)
  call MPI_Info_set(info, "cb_buffer_size", "16777216", STATUS)

  status = nf90_create("file_parallel.nc4", IOR(IOR(NF90_CLOBBER, NF90_HDF5), NF90_MPIIO), &
                       ncid, comm=comm, info=info)

define dimensions ...
define vars ...
set access to collective for each variable ...

  status = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

determine start and cnt for process ...
read or write ...

(A fuller, self-contained version of this sequence is sketched after the
observations below.)

Here are a few general observations:
- In general the IO does not scale with the number of processors and I'm seeing about the same write time for 1 or hundreds of mpi tasks.
- Gathering to root and having root write (and the converse for reading) was generally almost as fast as, or only marginally slower (roughly 2x) than, parallel IO, regardless of the number of mpi tasks.
- Setting the access of each variable to collective was crucial to write performance. If the access was set to independent, the writing was horribly slow: 10 to 20 times longer than the gather-to-root/root-write method.
- In general, playing with the buffer size had no appreciable effect on the performance.
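For reference, here is the outline above fleshed out into a single
self-contained subroutine. The decomposition, variable names, and the
assumption of equal latitude slabs per rank are hypothetical, and error
checking is omitted, but the MPI_Info and nf90_* calls are the same ones
shown in the snippets.

  ! Parallel write path: every rank opens the file and writes its own slab.
  subroutine write_field_parallel(comm, field_local, nlon, nlat_local, fname)
    use mpi
    use netcdf
    implicit none
    integer,          intent(in) :: comm, nlon, nlat_local
    real,             intent(in) :: field_local(nlon, nlat_local)
    character(len=*), intent(in) :: fname
    integer :: rank, npes, ierr, info, status
    integer :: ncid, dimids(2), varid, start(2), cnt(2)

    call MPI_Comm_rank(comm, rank, ierr)
    call MPI_Comm_size(comm, npes, ierr)

    ! Collective-buffering hints, as in the snippet above.
    call MPI_Info_create(info, ierr)
    call MPI_Info_set(info, "romio_cb_read",  "enable",   ierr)
    call MPI_Info_set(info, "romio_cb_write", "enable",   ierr)
    call MPI_Info_set(info, "cb_buffer_size", "16777216", ierr)

    ! Create the file for parallel access on every rank.
    status = nf90_create(fname, IOR(IOR(NF90_CLOBBER, NF90_HDF5), NF90_MPIIO), &
                         ncid, comm=comm, info=info)

    ! Define the global dimensions and the variable (identical on all ranks).
    status = nf90_def_dim(ncid, "lon", nlon, dimids(1))
    status = nf90_def_dim(ncid, "lat", nlat_local*npes, dimids(2))
    status = nf90_def_var(ncid, "field", NF90_FLOAT, dimids, varid)
    status = nf90_enddef(ncid)

    ! Collective access for the variable (this was crucial for performance).
    status = nf90_var_par_access(ncid, varid, NF90_COLLECTIVE)

    ! Each rank writes only the latitude slab it owns.
    start = (/ 1, rank*nlat_local + 1 /)
    cnt   = (/ nlon, nlat_local /)
    status = nf90_put_var(ncid, varid, field_local, start=start, count=cnt)

    status = nf90_close(ncid)
    call MPI_Info_free(info, ierr)
  end subroutine write_field_parallel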
Does anyone have any tricks I haven't thought of, or has anyone seen the same thing with parallel IO performance? There really aren't that many things one can play with other than setting the MPI hints or changing the access type for variables (collective or independent). So far I have been using Intel 11 and Intel MPI 3 on a GPFS file system, but I plan to try this with newer Intel compiler versions, different MPI stacks, and on Lustre instead of GPFS.
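On the hints front, the only other knobs I'm aware of are the remaining MPI-IO/ROMIO hints, which can be set through the same MPI_Info object before nf90_create/nf90_open. Whether they help (or are even honored) depends on the MPI library and file system, and the values below are just placeholders:

  call MPI_Info_create(info, ierr)
  ! Lustre striping, applied by ROMIO when the file is created:
  call MPI_Info_set(info, "striping_factor", "16",      ierr)  ! number of stripes/OSTs (placeholder)
  call MPI_Info_set(info, "striping_unit",   "1048576", ierr)  ! stripe size in bytes (placeholder)
  ! ROMIO data sieving for independent IO:
  call MPI_Info_set(info, "romio_ds_read",  "disable", ierr)
  call MPI_Info_set(info, "romio_ds_write", "disable", ierr)
  ! Number of collective-buffering aggregator nodes:
  call MPI_Info_set(info, "cb_nodes", "8", ierr)               ! placeholder
  ! ... then pass info to nf90_create as in the snippet above.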
--
Ben Auer, PhD                SSAI, Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-286-9176          Fax: 301-614-6246
- Follow-Ups:
- Re: [netcdfgroup] Advice on parallel netcdf
- From: Kokron, Daniel S. (GSFC-610.1)[Computer Sciences Corporation]
- Re: [netcdfgroup] Advice on parallel netcdf
- From: Rob Latham
- Re: [netcdfgroup] Advice on parallel netcdf