
Re: [netcdfgroup] abysmal performance

  • To: Burlen Loring <bloring@xxxxxxx>
  • Subject: Re: [netcdfgroup] abysmal performance
  • From: Dave Allured - NOAA Affiliate <dave.allured@xxxxxxxx>
  • Date: Thu, 2 Jun 2016 14:35:55 -0600
Um, on more careful reading, I like Wei-keng's answer better than mine.  I
missed that Burlen's time dimension size is only 8.  This suggests that even
in a worst-case scenario (one seek per record, since record variables are
interleaved in classic netCDF files), reading the time coordinate should be
only about 8 times slower than reading the lat or lon coordinate.

So I think Wei-keng is right, and the problem is somewhere in the file
caching.

Burlen, you said that reading the lon array with 1152 values takes only
7.7e-5 sec.  I wonder if you measured this on a repeated read of the same
file, which may have been served from the system file cache at the time you
took this measurement.  Did you try measuring the average read time for lon
across a large sample of files which had not been recently opened?
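Something like the following sketch would do it, using the netCDF C API; it
times only the nc_get_var call for lon, averaged over whatever files are
passed on the command line.  The variable name and its length of 1152 are
taken from your numbers, and the error handling is minimal.  Compile with
-lnetcdf.

    /* Sketch: average read time for the "lon" coordinate across many
     * files.  Times only the nc_get_var call, mirroring the probe's
     * read_x_axis measurement. */
    #include <netcdf.h>
    #include <stdio.h>
    #include <time.h>

    static double now(void)
    {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        return t.tv_sec + 1e-9 * t.tv_nsec;
    }

    int main(int argc, char **argv)
    {
        double total = 0.0;
        int n = 0;

        for (int i = 1; i < argc; i++) {      /* one .nc file per argument */
            int ncid, varid;
            double lon[1152];                 /* 1152 values, per Burlen */

            if (nc_open(argv[i], NC_NOWRITE, &ncid)) continue;
            if (nc_inq_varid(ncid, "lon", &varid) == NC_NOERR) {
                double t0 = now();
                if (nc_get_var_double(ncid, varid, lon) == NC_NOERR) {
                    total += now() - t0;
                    n++;
                }
            }
            nc_close(ncid);
        }
        if (n > 0)
            printf("mean lon read over %d files: %g sec\n", n, total / n);
        return 0;
    }

Run it once over a batch of files that have not been touched recently to get
a cold-cache average, then a second time to see the effect of the system
cache.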

--Dave


On Thu, Jun 2, 2016 at 1:57 PM, Wei-keng Liao <wkliao@xxxxxxxxxxxxxxxxxxxxx>
wrote:

>
> Reading 64 bytes at the application level does not mean only 64 bytes
> were read at the system level. Most modern file systems perform
> client-side file caching and read-ahead. The amount of read-ahead
> depends on the file system settings, usually a multiple of the page size.
> You can try this by reading a few data elements of a netCDF variable,
> followed by a few successive elements. I bet the second read will
> cost almost nothing, as that data is already cached in memory.
>
> Wei-keng
>
> On Jun 2, 2016, at 2:41 PM, Burlen Loring wrote:
>
> > Hi Tom,
> >
> > That's not an option, and it has its own issues: for example, if the
> > file size exceeds the size of a tape drive we can't archive it. Besides,
> > it doesn't seem like a Lustre metadata issue; open is relatively fast,
> > around 0.096 sec, and that wouldn't explain why reading the time
> > dimension with only 8 values takes on the order of 1 sec while reading
> > the lon dimension with 1152 values takes on the order of 1e-4 sec.
> >
> > Burlen
> >
> > On 06/02/2016 12:16 PM, Tom Fogal wrote:
> >> Hi Burlen!
> >>
> >> Can you switch things to a single file?  I'm not up on NetCDF details,
> >> but I would expect the many-file case to be very slow, *especially*
> >> with Lustre, even with advanced POSIX API programming.
> >>
> >> -tom
> >>
> >> On 06/02/2016 12:12 PM, Burlen Loring wrote:
> >>> I am working on a climate analysis app. It must scan the dataset,
> >>> which comprises many netCDF files, to determine the available time
> >>> steps. I'm finding that performance is really bad when reading the
> >>> time axis from each file. In one example the dataset is 10k files;
> >>> each file has 8 time values, each value is 8 bytes, so 64 bytes are
> >>> read per file. The time taken to read these arrays is between 0.3 and
> >>> 2.2 seconds, which puts the measured performance between 213 bytes
> >>> per second and 29 bytes per second! I've had dial-up modems faster
> >>> than that!
> >>>
> >>> Reading other, much larger arrays is much faster; e.g., reading the
> >>> lon array with 1152 values takes only 7.7e-5 sec. I don't get it.
> >>>
> >>> I'd like some advice on this. On my workstation the read times for
> >>> the time values on the same dataset range between 1.9e-5 and 4.7e-5
> >>> sec, so something seems very wrong with the performance on the Cray.
> >>> BTW, this is on the Lustre scratch2 file system; the files are 433 MB
> >>> each and striped across 24 OSTs with a stripe size of 1 MB.
> >>>
> >>> Burlen
> >>>
> >>> Here is the raw data gathered from Edison:
> >>>
> >>> $/usr/common/graphics/teca/builds/TECA/bin/bin/teca_metadata_probe
> >>> --input_regex
> >>> /scratch2/scratchdirs/prabhat/TCHero/data/cam5_1_amip_run2'\.cam2\.h2.2005-09-[0-9][0-9]-10800.nc$'
> >>> scan_files = 1.1513
> >>> open_file = 0.0967366
> >>> x_axis_metadata = 7.4109e-05
> >>> y_axis_metadata = 7.732e-06
> >>> z_axis_metadata = 3.84e-07
> >>> t_axis_metadata = 5.418e-06
> >>> variables = 0.000824452
> >>> read_x_axis = 5.9976e-05
> >>> read_y_axis = 1.0444e-05
> >>> read_z_axis = 4.294e-06
> >>> read_t_axis = 0.368079
> >>> open_file = 0.122236
> >>> t_axis_metadata = 4.906e-05
> >>> read_t_axis = 0.431482
> >>> open_file = 0.0953903
> >>> t_axis_metadata = 3.4205e-05
> >>> read_t_axis = 0.3815
> >>> open_file = 0.0853393
> >>> t_axis_metadata = 2.9607e-05
> >>> read_t_axis = 0.396472
> >>> open_file = 0.0664037
> >>> t_axis_metadata = 2.8239e-05
> >>> read_t_axis = 0.351707
> >>> open_file = 0.748844
> >>> t_axis_metadata = 3.3047e-05
> >>> read_t_axis = 1.03634
> >>> open_file = 0.161732
> >>> t_axis_metadata = 2.6955e-05
> >>> read_t_axis = 0.377919
> >>> open_file = 0.0820469
> >>> t_axis_metadata = 3.1613e-05
> >>> read_t_axis = 0.363014
> >>> open_file = 0.0903407
> >>> t_axis_metadata = 3.1844e-05
> >>> read_t_axis = 0.370521
> >>> open_file = 0.092586
> >>> t_axis_metadata = 2.9547e-05
> >>> read_t_axis = 0.37146
> >>> open_file = 0.083997
> >>> t_axis_metadata = 4.0095e-05
> >>> read_t_axis = 0.396375
> >>> open_file = 0.0799897
> >>> t_axis_metadata = 2.9833e-05
> >>> read_t_axis = 0.386237
> >>> open_file = 0.105688
> >>> t_axis_metadata = 0.000124456
> >>> read_t_axis = 0.481453
> >>> open_file = 0.0816038
> >>> t_axis_metadata = 3.0969e-05
> >>> read_t_axis = 0.355051
> >>> open_file = 0.101204
> >>> t_axis_metadata = 2.5927e-05
> >>> read_t_axis = 0.408186
> >>> open_file = 0.108662
> >>> t_axis_metadata = 2.6313e-05
> >>> read_t_axis = 0.314209
> >>> open_file = 0.0916682
> >>> t_axis_metadata = 2.9697e-05
> >>> read_t_axis = 0.319686
> >>> open_file = 0.0812744
> >>> t_axis_metadata = 2.7813e-05
> >>> read_t_axis = 0.391357
> >>> open_file = 0.101898
> >>> t_axis_metadata = 2.8607e-05
> >>> read_t_axis = 0.305515
> >>> open_file = 0.0748274
> >>> t_axis_metadata = 1.7247e-05
> >>> read_t_axis = 0.69522
> >>> open_file = 0.100151
> >>> t_axis_metadata = 1.7887e-05
> >>> read_t_axis = 0.511695
> >>> open_file = 0.0700686
> >>> t_axis_metadata = 1.8225e-05
> >>> read_t_axis = 2.27155
> >>> open_file = 0.121193
> >>> t_axis_metadata = 1.9918e-05
> >>> read_t_axis = 0.497648
> >>> open_file = 0.0734922
> >>> t_axis_metadata = 2.1134e-05
> >>> read_t_axis = 1.70601
> >>> open_file = 0.312652
> >>> t_axis_metadata = 1.888e-05
> >>> read_t_axis = 0.468489
> >>> open_file = 0.0906366
> >>> t_axis_metadata = 1.9165e-05
> >>> read_t_axis = 0.348474
> >>> open_file = 0.0776435
> >>> t_axis_metadata = 2.3417e-05
> >>> read_t_axis = 0.387956
> >>> open_file = 0.157729
> >>> t_axis_metadata = 4.9765e-05
> >>> read_t_axis = 0.344205
> >>> open_file = 0.114566
> >>> t_axis_metadata = 2.8431e-05
> >>> read_t_axis = 0.374615
> >>> open_file = 0.0788776
> >>> t_axis_metadata = 2.7865e-05
> >>> read_t_axis = 0.381856
> >>> collect_t_axis = 19.47
> >>> total = 20.72
> >>> A total of 240 steps available in 30 files. Using the noleap
> >>> calendar. Times are specified in units of days since 1979-01-01
> >>> 00:00:00. The available times range from 2005-9-1 3:0:0 (9733.12)
> >>> to 2005-10-1 0:0:0 (9763).
>
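For reference, the read-ahead experiment Wei-keng describes can be scripted
along these lines.  This is only a sketch: the file name is one plausible
match for the regex above, "time" is assumed to be the 8-element coordinate
variable, and error checking is omitted.  If the file system does
client-side read-ahead, the second read should be nearly free.

    /* Read a few elements of a variable, then the next few, and compare
     * timings; the second read should come from the client-side cache. */
    #include <netcdf.h>
    #include <stdio.h>
    #include <time.h>

    static double now(void)
    {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        return t.tv_sec + 1e-9 * t.tv_nsec;
    }

    int main(void)
    {
        int ncid, varid;
        double vals[4], t0;
        size_t start = 0, count = 4;

        nc_open("cam5_1_amip_run2.cam2.h2.2005-09-01-10800.nc",
                NC_NOWRITE, &ncid);
        nc_inq_varid(ncid, "time", &varid);

        t0 = now();                               /* first small read */
        nc_get_vara_double(ncid, varid, &start, &count, vals);
        printf("first read:  %g sec\n", now() - t0);

        start = 4;                                /* next few elements */
        t0 = now();
        nc_get_vara_double(ncid, varid, &start, &count, vals);
        printf("second read: %g sec\n", now() - t0);

        nc_close(ncid);
        return 0;
    }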