Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

Jim,

I am using the GPFS filesystem, but did not set any MPI-IO hints.
I did not do processor binding, but I guess binding could help if
fewer processors are used per node.
I am actually using NC_MPIPOSIX rather than NC_MPIIO, as the latter gives
even worse timing.

The 5 GB file has 170 variables, some of which have dimensions
[ 1 <time | unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ]
and use chunk size (1, 1, 192, 288).
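
As a rough illustration of what that chunking implies, the sketch below counts the chunks in one record of such a variable and their size in bytes. The shapes come from the numbers above; the 4-byte value size is an assumption (the variable type is not stated here), and the helper names are made up for this example.

```python
import math

# Shapes quoted above: one record of a (time, ilev, lat, lon) variable.
var_shape = (1, 27, 768, 1152)
chunk_shape = (1, 1, 192, 288)
BYTES_PER_VALUE = 4  # assumption: 4-byte floats; the message does not say


def chunks_per_record(shape, chunk):
    """Number of chunks needed to cover `shape`, rounding up per dimension."""
    return math.prod(-(-s // c) for s, c in zip(shape, chunk))


def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with the given shape."""
    return math.prod(shape) * itemsize


n_chunks = chunks_per_record(var_shape, chunk_shape)
chunk_bytes = nbytes(chunk_shape, BYTES_PER_VALUE)
record_bytes = nbytes(var_shape, BYTES_PER_VALUE)

print(f"chunks per record : {n_chunks}")
print(f"bytes per chunk   : {chunk_bytes}")
print(f"bytes per record  : {record_bytes}")
```

Under that assumption, each record of one such variable is split into several hundred chunks of a few hundred kilobytes each, so every time step turns into many small I/O requests per variable.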

The last part sounds more like work for the netCDF developers.

Thanks,

Wei

huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924





On Sep 19, 2011, at 10:48 AM, Jim Edwards wrote:

> Hi Wei,
> 
> 
> Are you using the gpfs filesystem and are you setting any MPI-IO hints for 
> that filesystem?
> 
> Are you using any processor binding technique?   Have you experimented with 
> other settings?
> 
> You stated that the file is 5G but what is the size of a single field and how 
> is it distributed?  In other words is it already aggregated into a nice 
> blocksize or are you expecting netcdf/MPI-IO to handle that?
> 
> I think that in order to really get a good idea of where the performance 
> problem might be, you need to start by writing and timing a binary file of 
> roughly equivalent size, then write an hdf5 file, then write a netcdf4 file.  
>   My guess is that you will find that the performance problem is lower on the 
> tree...
> 
> - Jim
> 
> On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
> Hi, netcdfgroup,
> 
> Currently, we are trying to use parallel-enabled NetCDF4. We started by 
> reading and writing a 5 GB file with some computation, and got the following 
> wall-clock timings on an IBM Power machine:
> Number of Processors   Total (s)   Read (s)   Write (s)   Computation (s)
> seq                       89.137     28.206      48.327            11.717
> 1                        178.953     44.837     121.17             11.644
> 2                        167.25      46.571     113.343             5.648
> 4                        168.138     44.043     118.968             2.729
> 8                        137.74      25.161     108.986             1.064
> 16                       113.354     16.359      93.253             0.494
> 32                       439.481    122.201     311.215             0.274
> 64                       831.896    277.363     588.653             0.203
> 
> The first thing we can see is that running the parallel-enabled code on one 
> processor doubles the total wall-clock time.
> We also did not see any scaling as more processors were added.
> 
> Would anyone like to share their experience?
> 
> Thanks,
> 
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
> 
> 
> 
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> http://www.unidata.ucar.edu/mailing_lists/
> 
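
The timings in the quoted table can be turned into speedups relative to the sequential run, which makes it easier to see which phase stops scaling. The sketch below is only a reading aid: the numbers are copied from the table, while the `speedup` helper and the choice of the sequential run as baseline are assumptions of this example, not part of the original test.

```python
# Wall-clock seconds copied from the quoted table:
# (total, read, write, computation)
SEQ = (89.137, 28.206, 48.327, 11.717)
PARALLEL = {
    1: (178.953, 44.837, 121.17, 11.644),
    2: (167.25, 46.571, 113.343, 5.648),
    4: (168.138, 44.043, 118.968, 2.729),
    8: (137.74, 25.161, 108.986, 1.064),
    16: (113.354, 16.359, 93.253, 0.494),
    32: (439.481, 122.201, 311.215, 0.274),
    64: (831.896, 277.363, 588.653, 0.203),
}
PHASES = ("total", "read", "write", "compute")


def speedup(seq_time, par_time):
    """Speedup relative to the sequential run (values below 1 mean slower)."""
    return seq_time / par_time


print("nproc " + " ".join(f"{name:>8}" for name in PHASES))
for nproc, times in PARALLEL.items():
    row = " ".join(f"{speedup(s, p):8.2f}" for s, p in zip(SEQ, times))
    print(f"{nproc:5d} {row}")
```

Read this way, the computation column keeps speeding up as processors are added, while the read and write columns stay below the sequential baseline for most processor counts, so the scaling problem sits in the I/O phases.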
