
Re: [netcdfgroup] Strided reads slow

On Mon, Aug 12, 2013 at 10:02:35AM -0600, Dennis Heimbigner wrote:
> In thinking about this, the only partial solution
> I can think of at the moment is to do internally in the library
> that which you appear to already be doing, namely
> reading chunks and doing your own striding.
> This would work for small strides (say < 8?)
> but would require allocating a chunk of memory
> internally of size 8*element-size*n, where n is
> the number of stride elements to get at one time.
> It might keep the external interface simple while
> providing a speed up of some amount, but does not
> really solve the underlying problem that reading
> individual elements from a netcdf-3 or netcdf-4/HDF5 file
> is slow.

I guess you are talking about a single process doing I/O?

If more than one process is available, do the I/O in parallel
and enable collective I/O.

==rob

> =Dennis Heimbigner
>  Unidata
> 
> 
> Peglar, Patrick wrote:
> >Hi
> >
> >I just thought I'd ask the world in general whether other people are having 
> >trouble with this.
> >
> >I was contacted about an internal support issue by someone getting very slow 
> >read performance from large NetCDF-4 files.
> >He was doing "strided" access to a variable (i.e. reading one of every N 
> >points).
> >I produced a simple C-API testcase, which reads all of a 1M-float array in 
> >about 2 ms, but takes nearly 4 seconds to load every other point 
> >(stride=2).
> >
> >This has already been discussed with the dev team, who replied variously...
> >   -----Original Message-----
> >   From: Unidata netCDF Support [mailto:support-netcdf@xxxxxxxxxxxxxxxx]
> >   Sent: 09 August 2013 21:57
> >   To: Peglar, Patrick
> >   Cc: support-netcdf@xxxxxxxxxxxxxxxx
> >   Subject: [netCDF #ZFB-587742]: Reading variable with strides very slow
> >
> >   Patrick,
> >
> >   This turns out to be a known problem with HDF5 performance:
> >
> >     
> > http://mail.lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2012-November/006195.html
> >
> >   --Russ
> >
> >(from older discussions ..)
> >   > > > Patrick-
> >   > > >
> >   > > > The vars interface in netCDF is inherently slow
> >   > > > (when stride > 1) because it cannot
> >   > > > easily make use of bulk read operations,
> >   > > > so the library must read element by element
> >   > > > from the underlying disk storage. This has
> >   > > > a noticeable effect on performance. It is not
> >   > > > easy to fix because the read must use only
> >   > > > the memory passed in by the client.
> >   > > >
> >   > > > For netcdf versions before 4.3.0 (including 4.1.3)
> >   > > > there was an additional factor. For historical
> >   > > > reasons, vars was implemented in terms of varm
> >   > > > so there was some additional overhead.
> >   > > >
> >   > > > If you upgrade to 4.3.0, you will see some performance
> >   > > > improvement but not, probably, enough to solve your problem.
> >   > > >
> >   > > > Sorry I do not have better news.
> >   > > > =Dennis Heimbigner
> >   > > >  Unidata
> >   > >
> >   > > On the netcdf-3 vs netcdf-4 issue I can at the moment
> >   > > only speculate. As a rule, reading small quantities of data
> >   > > with netcdf-4 is always slower than netcdf-3 because the
> >   > > underlying HDF5 file format is based on b-trees rather than the
> >   > > linear disk layout of netcdf-3. Since vars reads a single
> >   > > element at a time, that overhead can, I suspect, be significant.
> >   > > I am, however, surprised that it is as large as you show.
> >   > >
> >   > > =Dennis Heimbigner
> >   > >  Unidata
> >   > >
> >   > In this case, no b-trees are involved, because the data storage is
> >   > contiguous, not chunked (according to ncdump -h -s).  So I'm
> >   > surprised how slow the strided netCDF access is, and suspect there
> >   > might be a performance bug in how netCDF-4 uses the HDF5 API for
> >   > strided access.
> >
> >   Russ Rew                                         UCAR Unidata Program
> >   russ@xxxxxxxxxxxxxxxx                      http://www.unidata.ucar.edu
> >
> >
> >Our original use case is constrained by memory limitations.
> >Workarounds are obviously possible, but they are all a bit awkward.
> >
> >It is not yet clear that the HDF5 issue alone explains the magnitude of 
> >the slowdown, so I think there is still more to learn about this.
> >
> >The question is: does this really need addressing?
> >Is anyone else having serious problems with this?
> >
> >Regards
> >Patrick
> >--
> >Patrick Peglar  AVD Team Software Engineer
> >Analysis, Visualisation and Data Team  http://www-avd/
> >Tel: +44 (0)1392 88 5748
> >Email: 
> >patrick.peglar@xxxxxxxxxxxxxxxx<mailto:patrick.peglar@xxxxxxxxxxxxxxxx>
> >Met Office  Fitzroy Road  Exeter  EX1 3PB  
> >web:www.metoffice.gov.uk<http://www.metoffice.gov.uk>
> >
> >
> >
> >
> >
> >------------------------------------------------------------------------
> >
> >_______________________________________________
> >netcdfgroup mailing list
> >netcdfgroup@xxxxxxxxxxxxxxxx
> >For list information or to unsubscribe,  visit:
> >http://www.unidata.ucar.edu/mailing_lists/
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


