Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

Hi Wei,


Are you using the GPFS filesystem, and are you setting any MPI-IO hints
for that filesystem?
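For reference, here is a minimal sketch (in C, against the netCDF-4
parallel API) of how hints get passed in. The hint names and values below
are only examples, not a recommendation; which ones actually help depends
on your MPI implementation and filesystem, so check the ROMIO or IBM MPI
documentation for your system:

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Sketch: pass MPI-IO hints through a parallel netCDF-4 open.
     * All hint names/values are illustrative. */
    int open_with_hints(const char *path, int *ncidp)
    {
        MPI_Info info;
        MPI_Info_create(&info);

        /* ROMIO hints: force collective buffering on writes and
         * enlarge the aggregation buffer (values are examples). */
        MPI_Info_set(info, "romio_cb_write", "enable");
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB */

        /* Honored by IBM MPI on GPFS, ignored elsewhere. */
        MPI_Info_set(info, "IBM_largeblock_io", "true");

        int status = nc_open_par(path, NC_NOWRITE | NC_MPIIO,
                                 MPI_COMM_WORLD, info, ncidp);
        MPI_Info_free(&info);
        return status;
    }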

Are you using any processor-binding technique? Have you experimented with
other binding settings?

You stated that the file is 5 GB, but what is the size of a single field,
and how is it distributed across processes? In other words, is each
rank's piece already aggregated into a nice blocksize, or are you
expecting netCDF/MPI-IO to handle that?
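For example, a common row decomposition looks something like this sketch
(the function and variable names are hypothetical, not taken from your
code); whether each rank's slab lines up with the filesystem blocksize is
exactly the aggregation question:

    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Write this rank's band of rows of an ny x nx double field.
     * ncid/varid are assumed to come from an earlier nc_create_par /
     * nc_def_var; all names here are illustrative. */
    int write_my_slab(int ncid, int varid, int rank, int nranks,
                      size_t ny, size_t nx, const double *data)
    {
        size_t rows = ny / nranks;            /* assume nranks divides ny */
        size_t start[2] = { rank * rows, 0 }; /* this rank's first row    */
        size_t count[2] = { rows, nx };       /* this rank's slab         */

        /* Collective access lets MPI-IO aggregate the per-rank slabs
         * into large, aligned file writes; independent access cannot. */
        nc_var_par_access(ncid, varid, NC_COLLECTIVE);
        return nc_put_vara_double(ncid, varid, start, count, data);
    }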

I think that to get a good idea of where the performance problem really
lies, you need to start by writing and timing a raw binary file of
roughly equivalent size, then an HDF5 file, then a netCDF4 file. My guess
is that you will find the problem is lower in the stack...
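A raw MPI-IO baseline along these lines would do for the first step (the
file name and sizes are placeholders; the timing brackets the write plus
the close, since the close is what flushes):

    #include <mpi.h>
    #include <stdio.h>

    /* Time a raw MPI-IO collective write of count doubles per rank. */
    void time_raw_write(const double *buf, int count)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "baseline.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);

        MPI_Barrier(MPI_COMM_WORLD);          /* start everyone together */
        double t0 = MPI_Wtime();
        MPI_File_write_at_all(fh, offset, buf, count,
                              MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);                  /* close forces the flush  */
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("raw MPI-IO write: %.3f s\n", t1 - t0);
    }

If the raw numbers already collapse at 32 and 64 processes the way your
netCDF4 numbers do, the problem is below netCDF entirely.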

- Jim

On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:

> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled NetCDF4. We started by
> reading and writing a 5 GB file with some computation, and we got the
> following timing (wall-clock) on an IBM Power machine:
> Number of Processors   Total (s)   Read (s)   Write (s)   Computation (s)
> seq                       89.137     28.206      48.327           11.717
> 1                        178.953     44.837     121.170           11.644
> 2                        167.250     46.571     113.343            5.648
> 4                        168.138     44.043     118.968            2.729
> 8                        137.740     25.161     108.986            1.064
> 16                       113.354     16.359      93.253            0.494
> 32                       439.481    122.201     311.215            0.274
> 64                       831.896    277.363     588.653            0.203
>
> The first thing we can see is that when running the parallel-enabled
> code on one processor, the total wall-clock time doubled.
> We also did not see any scaling as more processors were added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
>
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>