Jim,

I am using the GPFS filesystem, but did not set any MPI-IO hints.
I did not do processor binding, but I guess binding could help if
fewer processors are used on a node.
I am actually using NC_MPIPOSIX rather than NC_MPIIO, as the latter
gives even worse timing.

The 5G file has 170 variables, some of which have size
[ 1 <time | unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ]
and use chunk size (1, 1, 192, 288).

The last part is more like netcdf developers' work.

Thanks,

Wei

huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924

On Sep 19, 2011, at 10:48 AM, Jim Edwards wrote:

> Hi Wei,
>
> Are you using the gpfs filesystem and are you setting any MPI-IO hints for
> that filesystem?
>
> Are you using any processor binding technique? Have you experimented with
> other settings?
>
> You stated that the file is 5G, but what is the size of a single field and how
> is it distributed? In other words, is it already aggregated into a nice
> blocksize or are you expecting netcdf/MPI-IO to handle that?
>
> I think that in order to really get a good idea of where the performance
> problem might be, you need to start by writing and timing a binary file of
> roughly equivalent size, then write an hdf5 file, then write a netcdf4 file.
> My guess is that you will find that the performance problem is lower on the
> tree...
>
> - Jim
>
> On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled NetCDF4. We started with
> reading/writing a 5G file plus some computation, and we got the following
> timing (in wall-clock seconds) on an IBM Power machine:
>
> Number of Processors   Total(seconds)   Read(seconds)   Write(seconds)   Computation(seconds)
> seq                     89.137           28.206           48.327          11.717
> 1                      178.953           44.837          121.17           11.644
> 2                      167.25            46.571          113.343           5.648
> 4                      168.138           44.043          118.968           2.729
> 8                      137.74            25.161          108.986           1.064
> 16                     113.354           16.359           93.253           0.494
> 32                     439.481          122.201          311.215           0.274
> 64                     831.896          277.363          588.653           0.203
>
> The first thing we can see is that when the parallel-enabled code is run on
> one processor, the total wall-clock time doubles.
> Then we do not see scaling when more processors are added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
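
For readers following the thread, the numbers quoted above can be worked through with a small sketch. It is not part of either message. It takes the variable shape, chunk size, and timing table as given in the thread, and it assumes 4-byte values (the thread does not state the variable type), so the byte counts are illustrative only.

```python
from math import ceil, prod

# Shape and chunk size quoted in the thread: [time, ilev, lat, lon]
shape = (1, 27, 768, 1152)
chunk = (1, 1, 192, 288)

# Assumption: 4-byte values; the variable type is not stated in the thread.
BYTES_PER_VALUE = 4

# Number of chunks along each dimension, and in total, for one time record
chunks_per_dim = tuple(ceil(s / c) for s, c in zip(shape, chunk))
n_chunks = prod(chunks_per_dim)
chunk_bytes = prod(chunk) * BYTES_PER_VALUE
record_bytes = prod(shape) * BYTES_PER_VALUE

print("chunks per dimension:", chunks_per_dim)
print("chunks per time record:", n_chunks)
print("bytes per chunk:", chunk_bytes)
print("bytes per time record:", record_bytes)

# Total wall-clock seconds from the timing table in the thread
seq_total = 89.137
parallel_total = {
    1: 178.953, 2: 167.25, 4: 168.138, 8: 137.74,
    16: 113.354, 32: 439.481, 64: 831.896,
}

# Ratio of each parallel run to the sequential run (values above 1 are slower)
slowdown = {n: t / seq_total for n, t in parallel_total.items()}
best_n = min(parallel_total, key=parallel_total.get)

for n, r in slowdown.items():
    print(f"{n:>2} processors: {r:.2f}x the sequential total")
print("fastest parallel run:", best_n, "processors")
```

Under the 4-byte assumption, each chunk is a few hundred kilobytes and one time record of such a variable is split into several hundred chunks. The ratios show that every parallel run in the table, including the fastest one, took longer in total than the sequential run.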