Rob Latham wrote:
Greg S. found something noteworthy on the serial netcdf list. We do something similar (not surprising: I'm sure our NC_finddim and NC_findvar functions are 99% unchanged from serial netcdf).

In NC_finddim we have a call to strlen as part of the condition of a for loop. If there are a lot of dimensions, as in Greg's case, then yeah, we too would call strlen a lot.

http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/src/lib/dim.c#L135

Our ncmpii_NC_findvar calls strlen inside a loop for each variable in a dataset.

http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/src/lib/var.c#L317

How common are datasets with thousands of dimensions and thousands of variables?

In a followup message, Greg found at least one case where "size" was not the same as strlen(name) for one of these NC_dim types, so it looks like the easy optimization won't work out after all.

The status quo isn't awful if you've got a small number of dimensions and variables. If anybody else has a dataset like Greg's, though, reply to this email and we'll put optimizing this workload on the todo list.

thanks
==rob

----- Forwarded message from Greg Sjaardema <gdsjaar@xxxxxxxxxx> -----

Sender: netcdfgroup-bounces@xxxxxxxxxxxxxxxx
From: Greg Sjaardema <gdsjaar@xxxxxxxxxx>
Subject: [netcdfgroup] strlen calls in NC_finddim and NC_findvar
Date: Thu, 3 Dec 2009 15:41:49 -0700
Message-ID: <4B183EAD.20808@xxxxxxxxxx>
User-Agent: Thunderbird 2.0.0.23 (X11/20090812)
To: "netcdfgroup@xxxxxxxxxxxxxxxx" <netcdfgroup@xxxxxxxxxxxxxxxx>
X-Spam-Status: No, score=-2.599 tagged_above=-10 required=6.6 tests=[BAYES_00=-2.599]
Delivered-To: netcdfgroup@xxxxxxxxxxxxxxxxxxxxxxxxxx
Delivered-To: netcdfgroup@xxxxxxxxxxxxxxxx

I have a monstrous file with several thousand dimensions and variables which is running slower than it should. I investigated the runtime and found that strlen was the major time user in the NC_finddim and NC_findvar calls.
The obvious optimization was to cache the length of the name instead of calling strlen each time. However, when I went to do this, I discovered that the length is already cached as the nchars field in the NC_string struct. I did some checks in the code and also added some assertions to the code and verified that, as far as I can tell, nchars is the correct length of the string.

Is there a reason that it isn't used and strlen() is called instead? Switching the code to use nchars dropped my execution time from 20 units to 6 units. I would like to make the switch, but wondered if there was some strange corner case where the nchars value is incorrect and will cause problems.

Thanks,
--Greg

_______________________________________________
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/

----- End forwarded message -----
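
As a rough illustration of the change Greg describes, the sketch below contrasts a lookup loop that calls strlen on every stored name with one that compares a cached length first. The name_entry type and all function names here are hypothetical; this is not the actual netCDF or parallel-netcdf code. It also assumes the cached length always equals strlen of the stored name, which, per Rob's note above, may not hold in every case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a name record that caches its length,
 * loosely modeled on the nchars field described in the message. */
typedef struct {
    size_t nchars;   /* cached length, ASSUMED equal to strlen(cp) */
    const char *cp;  /* NUL-terminated name */
} name_entry;

/* Build an entry, computing the length once at creation time. */
name_entry make_entry(const char *text)
{
    assert(text != NULL);
    name_entry e = { strlen(text), text };
    return e;
}

/* Baseline: rescans every stored name with strlen during the search. */
int find_with_strlen(const name_entry *tab, size_t n, const char *name)
{
    size_t slen = strlen(name);  /* length of the search key, computed once */
    for (size_t i = 0; i < n; i++) {
        if (strlen(tab[i].cp) == slen && memcmp(tab[i].cp, name, slen) == 0)
            return (int)i;
    }
    return -1;  /* not found */
}

/* Variant: compares the cached length instead of calling strlen per entry. */
int find_with_cached_len(const name_entry *tab, size_t n, const char *name)
{
    size_t slen = strlen(name);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].nchars == slen && memcmp(tab[i].cp, name, slen) == 0)
            return (int)i;
    }
    return -1;  /* not found */
}
```

Both functions return the index of the first matching entry, or -1 if there is none. The variant saves one pass over each stored name per lookup, which only matters when the table holds thousands of entries, and it is only safe if the cached length can be trusted.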