Thanks John and Joe. Yes, I do know that disk I/O is the limiting factor, but optimising it isn't easy because of all the buffers and disk caches (as you and Joe have pointed out).

Interestingly, I can "see" these caches. When I read random chunks of data from a file, a read sometimes takes ~1 ms, sometimes ~5 ms and sometimes ~10 ms, with not much in between these values (a trimodal distribution). I think these must be three levels of caching. Also, if I run the same test multiple times on the same file, the number of 10 ms reads drops off and the number of 1 ms reads increases. (I'm on a Windows XP laptop with a 5400 rpm hard drive.) I guess the only way to bypass the caches would be to cycle through a large set of data files whose total size exceeds the disk caches. (I'm trying to simulate a busy server environment.)

By the way, I've been digging into the IOSPs and the ucar RandomAccessFile class. The ucar RAF seems to be the same as java.io.RandomAccessFile, except that it implements an 8 KB buffer which is supposed to increase performance. But the code of N3raf (which extends N3iosp, and which I assume is the default class used for data reading) uses raf.readToByteChannel(), which bypasses the 8 KB buffer. So could a java.io.RandomAccessFile have been used in this case?

To expand a little on my use case: in general, to create a low-resolution map of data for a WMS, one has to read only a small fraction of the available data in the file. So I'm looking for an efficient way to read sparse, unevenly-spaced clouds of data. Reading point-by-point is not efficient, but neither is reading lots of data, converting it to new types, then throwing most of it away.

Cheers, Jon

From: John Caron [mailto:caron@xxxxxxxxxxxxxxxx]
Sent: 15 July 2010 20:00
To: Joe Sirott
Cc: Jon Blower; netcdf-java@xxxxxxxxxxxxxxxx
Subject: Re: [netcdf-java] Reading contiguous data in NetCDF files

Thanks Joe, I agree with your analysis. It's very hard to time I/O accurately, because there are disk and OS caches etc.
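Jon's trimodal latency measurement can be reproduced with a sketch along these lines. This is an editorial illustration, not the benchmark he actually ran: the `measure` helper and the scratch-file setup are invented for the example, and the absolute timings (and how many modes appear) depend entirely on the OS and hardware.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class ReadLatency {
    /** Time n random 8 KB reads from f; returns latencies in microseconds. */
    static long[] measure(File f, int n) throws IOException {
        byte[] buf = new byte[8192];
        long[] micros = new long[n];
        Random rnd = new Random(42);
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            long span = raf.length() - buf.length;
            for (int i = 0; i < n; i++) {
                long off = (long) (rnd.nextDouble() * span);
                long t0 = System.nanoTime();
                raf.seek(off);
                raf.readFully(buf);
                micros[i] = (System.nanoTime() - t0) / 1000;
            }
        }
        return micros;
    }

    public static void main(String[] args) throws IOException {
        // Scratch file so the sketch is self-contained; in Jon's test this
        // would be a set of real data files larger than the caches.
        File f = File.createTempFile("latency", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(16 * 1024 * 1024); // 16 MB of zeros
        }
        // Plotting a histogram of these values is what exposes the
        // multi-modal distribution Jon describes.
        for (long us : measure(f, 20)) {
            System.out.println(us + " us");
        }
    }
}
```

Running it twice in a row on the same file should also show the second run skewing toward the fast mode, as the OS page cache warms up.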
Netcdf-Java also caches small variable data. The netcdf-4 format is an order of magnitude more complicated, with chunking, compression, and non-deterministic (perhaps order-dependent is a better term) data placement. The most useful optimisation is to try to make the commonly wanted subset fit inside a single chunk (or a small number of chunks). Jon, have you profiled your code, and are you sure that disk reading is the bottleneck?

On 7/15/2010 11:39 AM, Joe Sirott wrote:

Hi Jon,

Benchmarks like these can be quite tricky, because of the interaction between the application and the OS. Unless you purge the OS page cache each time you run your benchmark, your application (after the first test) isn't reading data from disk but is instead copying data from the disk page cache into local buffers; the benchmark will then likely be CPU-bound, and execution time will be dominated by the type conversion from raw buffered data arrays into Java types. That would account for the strange results you are seeing when reading 4 KB rather than 8 KB data chunks. Also, for more information on netcdf-4 chunking/compression, Unidata has a nice introduction at http://hdfeos.org/workshops/ws13/presentations/day1/HDF5-EOSXIII-Advanced-Chunking.ppt

Cheers, Joe

Jon Blower wrote:

Hi John, thanks for this.

> The netcdf-3 IOSP uses a buffered RandomAccessFile implementation with a default 8096-byte buffer, which always reads 8096 bytes at a time. The only useful optimisation is to change the buffer size.

Good to know, thanks. I would have thought that this would mean there's no point in reading less than 8096 bytes of data. But in my tests I see that even below this value there's a linear relationship between the size of the data being read and the time taken to read it (i.e. it's quicker to read 4 KB than 8 KB). I don't quite understand this.

Are there any specs for the NetCDF-4 format that I could read? I'd like to know more about how the data are compressed, and how much data actually needs to be read from disk to get a subset.
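John's advice to "make the commonly wanted subset fit inside a small number of chunks" can be made concrete with a small sketch. This is an editorial illustration, not netcdf-java code: `chunksTouched` is an invented helper that counts how many HDF5-style chunks a 2-D subset intersects. Every intersected chunk must be read (and, if compressed, decompressed) in full, so fewer intersected chunks means less I/O per subset.

```java
// Editor's sketch (not part of netcdf-java): count how many chunks a
// 2-D subset [j0..j1] x [i0..i1] intersects, given the chunk shape
// (chunkJ x chunkI). Indices are inclusive, 0-based.
public class ChunkCount {
    static long chunksTouched(int j0, int j1, int i0, int i1,
                              int chunkJ, int chunkI) {
        long nj = (j1 / chunkJ) - (j0 / chunkJ) + 1; // chunk rows touched
        long ni = (i1 / chunkI) - (i0 / chunkI) + 1; // chunk cols touched
        return nj * ni;
    }

    public static void main(String[] args) {
        // A 100x100 subset straddling the chunk grid at (50, 50)
        // touches 4 chunks of shape 100x100...
        System.out.println(chunksTouched(50, 149, 50, 149, 100, 100)); // 4
        // ...but the same-sized subset aligned to the grid touches only 1.
        System.out.println(chunksTouched(0, 99, 0, 99, 100, 100));     // 1
    }
}
```

This is why choosing the chunk shape to match the typical access pattern (here, a map-sized tile) matters more than any read-side tuning.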
Cheers, Jon

-----Original Message-----
From: netcdf-java-bounces@xxxxxxxxxxxxxxxx [mailto:netcdf-java-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John Caron
Sent: 15 July 2010 00:26
To: netcdf-java@xxxxxxxxxxxxxxxx
Subject: Re: [netcdf-java] Reading contiguous data in NetCDF files

Hi Jon:

On 7/14/2010 2:51 PM, Jon Blower wrote:

> Hi, I don't know anything about how data in NetCDF files are organized, but intuitively I would think that, for a general 2D array, the data at points [j,i] and [j,i+1] would be contiguous on disk. Is this right? (i is the fastest-varying dimension.)

Yes, for variables in netcdf-3 files.

> I might also suppose that, for an array of size [nj,ni], the data at points [j,ni-1] and [j+1,0] would also be contiguous. Is this true?

Yes, for variables in netcdf-3 files that don't use the unlimited dimension.

> If so, is there a method in Java-NetCDF that would allow me to read these two points (and only these two points) in a single operation?

The netcdf-3 IOSP uses a buffered RandomAccessFile implementation with a default 8096-byte buffer, which always reads 8096 bytes at a time. The only useful optimisation is to change the buffer size.

> (Background: I'm trying to improve the performance of ncWMS by optimising how data is read from disk. This seems to involve striking a balance between the number of individual read operations and the size of each read operation.)
>
> Thanks, Jon

--
Dr Jon Blower
Technical Director, Reading e-Science Centre
Environmental Systems Science Centre
University of Reading
Harry Pitt Building, 3 Earley Gate
Reading RG6 6AL
UK
Tel: +44 (0)118 378 5213
Fax: +44 (0)118 378 6413
j.d.blower@xxxxxxxxxxxxx
http://www.nerc-essc.ac.uk/People/Staff/Blower_J.htm

_______________________________________________
netcdf-java mailing list
netcdf-java@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/
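The contiguity questions Jon asks in the thread above follow from row-major flat indexing, which a short sketch can demonstrate. This is an editorial footnote with an invented `offset` helper, not netcdf-java code: for a row-major [nj, ni] array stored contiguously (as netcdf-3 stores variables without an unlimited dimension), element [j, i] sits at flat offset j*ni + i, so [j, i] and [j, i+1] are adjacent, and so are [j, ni-1] and [j+1, 0] at the row boundary.

```java
// Editor's sketch: row-major flat offset of element [j, i] in an
// [nj, ni] array. Multiply by the element size in bytes to get the
// byte position within the variable's data block.
public class RowMajor {
    static long offset(long j, long i, long ni) {
        return j * ni + i;
    }

    public static void main(String[] args) {
        long ni = 360; // hypothetical longitude dimension
        // Neighbours within a row are adjacent...
        System.out.println(offset(5, 100, ni) + 1 == offset(5, 101, ni));
        // ...and the end of one row is adjacent to the start of the next,
        // which is why [j, ni-1] and [j+1, 0] can be fetched in one read.
        System.out.println(offset(5, ni - 1, ni) + 1 == offset(6, 0, ni));
    }
}
```

This also shows why the buffered reads John describes help: one 8096-byte read starting at [j, ni-1] naturally covers the following elements of row j+1 as well.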