Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

Hi Wei,


Are you using the GPFS filesystem, and are you setting any MPI-IO hints
for that filesystem?
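For reference, here is a minimal sketch (in C, against the netCDF-4
parallel API) of how hints get passed in. The hint names and values below
are only examples, not a recommendation; which ones actually help depends
on your MPI implementation and filesystem, so check the ROMIO or IBM MPI
documentation for your system:

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Sketch: pass MPI-IO hints through a parallel netCDF-4 open.
     * All hint names/values are illustrative. */
    int open_with_hints(const char *path, int *ncidp)
    {
        MPI_Info info;
        MPI_Info_create(&info);

        /* ROMIO hints: force collective buffering on writes and
         * enlarge the aggregation buffer (values are examples). */
        MPI_Info_set(info, "romio_cb_write", "enable");
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB */

        /* Honored by IBM MPI on GPFS, ignored elsewhere. */
        MPI_Info_set(info, "IBM_largeblock_io", "true");

        int status = nc_open_par(path, NC_NOWRITE | NC_MPIIO,
                                 MPI_COMM_WORLD, info, ncidp);
        MPI_Info_free(&info);
        return status;
    }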

Are you using any processor-binding technique? Have you experimented with
other binding settings?

You stated that the file is 5 GB, but what is the size of a single field,
and how is it distributed across processes? In other words, is each
rank's piece already aggregated into a nice blocksize, or are you
expecting netCDF/MPI-IO to handle that?
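For example, a common row decomposition looks something like this sketch
(the function and variable names are hypothetical, not taken from your
code); whether each rank's slab lines up with the filesystem blocksize is
exactly the aggregation question:

    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Write this rank's band of rows of an ny x nx double field.
     * ncid/varid are assumed to come from an earlier nc_create_par /
     * nc_def_var; all names here are illustrative. */
    int write_my_slab(int ncid, int varid, int rank, int nranks,
                      size_t ny, size_t nx, const double *data)
    {
        size_t rows = ny / nranks;            /* assume nranks divides ny */
        size_t start[2] = { rank * rows, 0 }; /* this rank's first row    */
        size_t count[2] = { rows, nx };       /* this rank's slab         */

        /* Collective access lets MPI-IO aggregate the per-rank slabs
         * into large, aligned file writes; independent access cannot. */
        nc_var_par_access(ncid, varid, NC_COLLECTIVE);
        return nc_put_vara_double(ncid, varid, start, count, data);
    }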

I think that to get a good idea of where the performance problem really
lies, you need to start by writing and timing a raw binary file of
roughly equivalent size, then an HDF5 file, then a netCDF4 file. My guess
is that you will find the problem is lower in the stack...
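A raw MPI-IO baseline along these lines would do for the first step (the
file name and sizes are placeholders; the timing brackets the write plus
the close, since the close is what flushes):

    #include <mpi.h>
    #include <stdio.h>

    /* Time a raw MPI-IO collective write of count doubles per rank. */
    void time_raw_write(const double *buf, int count)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "baseline.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);

        MPI_Barrier(MPI_COMM_WORLD);          /* start everyone together */
        double t0 = MPI_Wtime();
        MPI_File_write_at_all(fh, offset, buf, count,
                              MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);                  /* close forces the flush  */
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("raw MPI-IO write: %.3f s\n", t1 - t0);
    }

If the raw numbers already collapse at 32 and 64 processes the way your
netCDF4 numbers do, the problem is below netCDF entirely.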

- Jim

On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:

> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled NetCDF4. We started by
> reading and writing a 5 GB file with some computation, and we got the
> following timing (wall-clock) on an IBM Power machine:
> Number of Processors   Total (s)   Read (s)   Write (s)   Computation (s)
> seq                       89.137     28.206      48.327           11.717
> 1                        178.953     44.837     121.170           11.644
> 2                        167.250     46.571     113.343            5.648
> 4                        168.138     44.043     118.968            2.729
> 8                        137.740     25.161     108.986            1.064
> 16                       113.354     16.359      93.253            0.494
> 32                       439.481    122.201     311.215            0.274
> 64                       831.896    277.363     588.653            0.203
>
> The first thing we can see is that when running the parallel-enabled
> code on one processor, the total wall-clock time doubled.
> We also did not see any scaling as more processors were added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
>
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>