Hi Ingo,
A file is physically written (or read) as a sequence of bytes,
clusters, or sectors (depending on the storage device); that is, serially.
The only function of a "parallel-enabled" file API is to create the
illusion of parallel access. It makes sense only if you are not concerned
about performance.
Much better performance can be achieved when the serialization is
implemented in the program itself (which runs in parallel), taking its
internal logic into account; the file is then accessed serially, which is
obviously faster.
And if you really care about performance, use the native HDF API: it is
much faster.
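A minimal sketch of what "serialization implemented in the program" could look like, using only the Python standard library. It is not the netCDF or HDF API: threads stand in for the parallel processes, and the names `compute_chunk` and `write_serialized` are hypothetical, chosen only for illustration. The workers compute in parallel, and then a single writer puts the results into the file one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_chunk(rank: int, size: int) -> bytes:
    """Hypothetical per-worker computation: 'size' bytes tagged with the rank."""
    return bytes([rank % 256]) * size


def write_serialized(path: str, n_workers: int, chunk_size: int) -> int:
    """Compute chunks in parallel, then write them serially from one writer.

    Returns the total number of bytes written.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in submission order, so chunks stay in rank order.
        chunks = list(pool.map(lambda r: compute_chunk(r, chunk_size),
                               range(n_workers)))

    written = 0
    # Only this single writer touches the file, so access is strictly sequential.
    with open(path, "wb") as f:
        for chunk in chunks:
            written += f.write(chunk)
    return written
```

In an MPI program the same pattern would mean gathering the data on one rank and letting only that rank write; whether this beats a parallel-enabled API depends on the file system and the data size.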
Regards,
Sergei
-----Original Message-----
From: ingo.bethke@xxxxxx [mailto:ingo.bethke@xxxxxx] 
Sent: 19 September 2011 20:25
To: Shibaev, Sergei
Cc: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue
Hi Sergei,
Could you please elaborate on what you wrote? Do you think the problem
is a file (or record) locking issue? If so, would that explain the
different results for read and write?
I've tried to test parallel I/O on our Cray XT4 but didn't manage to
get even close to the serial I/O performance...
Thanks, Ingo
Quoting "Shibaev, Sergei" <Sergei.Shibaev@xxxxxxxxxx>:
> Hi Wei,
>
> Your result is quite obvious because the file itself is a serial device,
> so "parallel" read/write means serialising the parallel requests. Of
> course, it is at least two times slower than serial requests from a single
> process.
> If you can serialise file access in your program, it could be much
> faster than the common parallel-enabled API.
>
> Regards,
> Sergei Shibaev
>
> -----Original Message-----
> From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx
> [mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Wei Huang
> Sent: 19 September 2011 17:28
> To: netcdfgroup@xxxxxxxxxxxxxxxx
> Subject: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue
>
>
> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled NetCDF4. We started
> by reading and writing a 5 GB file with some computation, and we got the
> following timings (wall-clock) on an IBM Power machine:
> Number of Processors  Total (seconds)  Read (seconds)  Write (seconds)  Computation (seconds)
> seq                   89.137           28.206          48.327           11.717
> 1                     178.953          44.837          121.17           11.644
> 2                     167.25           46.571          113.343          5.648
> 4                     168.138          44.043          118.968          2.729
> 8                     137.74           25.161          108.986          1.064
> 16                    113.354          16.359          93.253           0.494
> 32                    439.481          122.201         311.215          0.274
> 64                    831.896          277.363         588.653          0.203
>
> The first thing we can see is that when we run the parallel-enabled code
> on one processor, the total wall-clock time doubles. We also did not see
> any scaling when more processors were added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
>
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/
>