Re: Possibly of interest: parallel netcdf study

To: netcdf-hdf@xxxxxxxxxxxxxxxx, netcdf-hdf@xxxxxxxxxxxxxxxx
Subject: Re: Possibly of interest: parallel netcdf study
From: "Muqun (Kent) Yang" <ymuqun@xxxxxxxxxxxxx>
Date: Wed, 05 May 2004 08:35:36 -0500

Russ:

Thanks, this was interesting.  I think you want to change the table
heading in Table 1 from "NCAR IBM P690" to "NCSA IBM P690".


Thanks for the correction. We will fix it.

I wonder if the results that show less wall clock time for 6 time
steps than for 4 time steps and similarly for 10 time steps less than
for 8 time steps with pnetCDF on the NCSA P690 might be an indication
of a discretization error in the timing.  Or maybe something else was
consuming enough of the machine that the results are unreliable.

I am not sure whether a discretization error in the timing is the reason. It is possible that the machine was busy during some runs. The reason we show this figure is to demonstrate that Parallel NetCDF is worse than sequential NetCDF with small writes.

But things like the parallel file system, the type of platform, the number of processors, and the file layout of the model output, as well as MPI-IO and GPFS, will also affect the performance.
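
One rough way to tell a timer discretization effect apart from a busy machine is to look at the timer resolution and to repeat each measurement several times. The following is only a sketch under assumptions: the helper names and the toy workload are hypothetical stand-ins, not the timing code used in the study.

```python
import statistics
import time


def timer_resolution():
    """Return the resolution (seconds) the platform reports for perf_counter."""
    return time.get_clock_info("perf_counter").resolution


def repeat_timing(work, repeats=5):
    """Run work() several times; return (min, median, max) wall clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples), max(samples)


def toy_write(n_steps):
    """Hypothetical stand-in for writing n_steps time steps of model output."""
    return lambda: sum(i * i for i in range(n_steps * 10000))


if __name__ == "__main__":
    print("timer resolution (s):", timer_resolution())
    for steps in (4, 6, 8, 10):
        lo, mid, hi = repeat_timing(toy_write(steps))
        print("%d steps: min=%.6f median=%.6f max=%.6f" % (steps, lo, mid, hi))
```

If the spread between the minimum and maximum of repeated runs is larger than the difference between, say, 4 and 6 time steps, the ordering in a single run probably says more about machine load than about the I/O library.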

I'm also curious why the pnetCDF appears to be so much slower than
serial netCDF for small writes.  Do you know what the nature of the
MPI-IO overhead is that could explain what appears to be a 10:1
slowdown for using pnetCDF with 4 time steps on the NCSA P690?  I
could understand maybe a 2:1 slowdown, but 10:1 seems surprisingly
large ...

Thanks for pointing this out. As a matter of fact, we may add more content to explain this.


I can think of the following factors that may affect the performance:

the MPI-IO library, the Parallel NetCDF implementation, the parallel file system, the type of platform, the number of processors, the file layout of the model, and the domain decomposition of the model. We will write another report solely on the performance of ROMS with Parallel NetCDF. In that report we may talk more about these factors.


One important reason I can think of:

As the paper mentioned, there are about 20 one-element NetCDF variables inside ROMS. All of these variables are written in independent IO mode. There are no corresponding collective IO functions in Parallel NetCDF. One strength of Parallel NetCDF is collective IO with a good "set file view". So using independent IO to write a single element into the NetCDF file does not benefit from any of Parallel NetCDF's optimizations. That will, I think, tremendously degrade the performance.
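
To illustrate why many tiny writes can be so costly, here is a back-of-the-envelope sketch using a simple "fixed cost per call plus bytes over bandwidth" model. It is not a model of MPI-IO or Parallel NetCDF internals, and the latency and bandwidth values are made-up placeholders, not measurements from the study.

```python
def write_time(n_calls, bytes_per_call, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds for n_calls writes: each pays a fixed latency plus transfer time."""
    return n_calls * (latency_s + bytes_per_call / bandwidth_bytes_per_s)


def effective_bandwidth(n_calls, bytes_per_call, latency_s, bandwidth_bytes_per_s):
    """Bytes actually delivered per second once per-call latency is included."""
    total_bytes = n_calls * bytes_per_call
    return total_bytes / write_time(
        n_calls, bytes_per_call, latency_s, bandwidth_bytes_per_s
    )


# Placeholder values, assumed for illustration only.
LATENCY = 1e-3      # fixed cost per I/O call, seconds
BANDWIDTH = 100e6   # sustained bandwidth, bytes per second

if __name__ == "__main__":
    # About 20 one-element (8-byte) variables, one independent call each.
    scalars = effective_bandwidth(20, 8, LATENCY, BANDWIDTH)
    # A single call writing one million 8-byte values, for comparison.
    array = effective_bandwidth(1, 8 * 1000000, LATENCY, BANDWIDTH)
    print("one-element writes: %.0f bytes/s" % scalars)
    print("single large write: %.0f bytes/s" % array)
```

Under this model the one-element writes are almost entirely per-call overhead, so anything that raises the fixed cost of a call (for example, a path that bypasses collective optimization) shows up directly in the wall clock time.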

We may do another study to investigate whether the performance improves when we stop writing those variables into the NetCDF file.


Kent

--Russ

References:
- Possibly of interest: parallel netcdf study
  - From: Robert E. McGrath
- Re: Possibly of interest: parallel netcdf study
  - From: Russ Rew