Dear Roy et al.,

Sorry for coming late to the party ... Roy asked for some feedback from GDS administrators on how server-side analysis is being used.

On Jul 1, 2012, at 4:13 PM, Roy Mendelssohn wrote:
> ... That is why I would like to hear more from people who are running F-TDS
> and GDS - how many requests do they get for server side functions,

I did a quick 'grep' on our GDS log files (100 individual months) and calculated an average of 5585 server-side analysis requests per month, which is < 1% of the total number of data requests to the server. Many months had 0; the maximum was 247811. Most of these were for the real-time GFS forecast data; we are not serving a whole lot of climate data on our GDS. The complexity of the analysis expressions is pretty broad -- some examples are basic subsets (which I would describe as the user misunderstanding the purpose of server-side analysis), simple expressions to get the wind speed and direction from vector components, SLP differences at two grid points, time series of area averages, ensemble averages, and variances of ensemble averages (this last one uses the cached result from the ensemble average calculation).

> what is the usual response time and download for these requests,

It would take some clever parsing of the log entries to get an average time, but a cursory glance suggests most are less than 10 seconds.

> how large are the usual expressions?

If by 'large' you mean 'lots of characters in the expression', here are some examples (1 short, 2 long):

_expr_{gfs2/gfs.2010062800}{mag(u10m,v10m)}{8.45:8.45,56.0:56.0,1000:1000,00Z28jun2010:12Z02jul2010}

_expr_{/gfsens/gfsens.2008052300,_exprcache_12118899183320}{tloop(ave(sqrt(pow(t2m-result.2,2)),e=1,e=21))}{-77:-77,39:39,1000:1000,23may2008:28may2008,c00:c00}

_expr_{ssta,z5a}{tmave(const(maskout(aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10),aave(ssta.1,lon=-180,lon=-90,lat=-10,lat=10)-1.0),1),z5a.2(lev=500),t=1,t=600)}{0:360,0:90,500:500,jan1950:jan1950}

The size of a request in terms of data volume can be constrained by server configuration. The third example above is from the GDS documentation, and a lot of users try it out and then modify it to suit their needs. It's more of a climate-analysis kind of expression: it calculates the mean 500mb height anomaly associated with warm tropical SST anomalies.

> ... I would welcome people who are using some of these other approaches to
> describe what they have done, the benefits of doing things that way, and what
> it means for a client.

I would say server-side analysis (of the kind employed by our GDS users) is useful on a small scale -- individuals who want forecast information at their particular location. For hard-core climate research that requires the analysis of BIG data, we haven't yet been able to exploit the power of server-side analysis (moving the analysis to the data). At COLA, we generate a lot of data at remote supercomputer centers (e.g. NCAR), but then we move a lot of it back to our own disks to analyze it with our favorite tools, or else we log in with accounts at the remote locations where our data reside and use the analysis servers set up there for users to access their data. For CMIP5, it is just not practical to try to automate remote analysis of data that are so widely distributed, with subtle differences between each data server and a data structure that is highly granular. Nobody at COLA is interested in using a browser to do any data analysis; it must be programmable to be useful.

--Jennifer

--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
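
[Editor's note: a minimal sketch of how a per-month tally like the one described above could be made. The directory layout, the one-log-file-per-month naming pattern, and the assumption that every analysis request line contains the literal string "_expr_" are hypothetical, not a description of COLA's actual setup.]

    # count_expr_requests.py -- tally server-side analysis requests per month
    # Assumes one GDS log file per month, e.g. logs/gds-201206.log
    # (hypothetical layout), and that each analysis request line
    # contains the literal string "_expr_".
    import glob
    import os

    counts = {}
    for path in sorted(glob.glob("logs/gds-*.log")):
        month = os.path.basename(path)[4:10]   # e.g. "201206"
        with open(path, errors="replace") as f:
            counts[month] = sum(1 for line in f if "_expr_" in line)

    if counts:
        print("average per month: %.0f" % (sum(counts.values()) / len(counts)))
        print("max: %d, months with zero: %d" % (
            max(counts.values()),
            sum(1 for c in counts.values() if c == 0)))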
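[Editor's note: as the three examples in the message suggest, the brace-delimited fields of a GDS analysis request name, in order, the source dataset(s), the analysis expression, and the lon:lon,lat:lat,lev:lev,time:time (and optionally ensemble) bounds over which it is evaluated. Below is a rough sketch of composing such a request from a script; the host name is a placeholder, and the percent-encoding choice is an assumption (some GDS installations accept literal braces in the URL).]

    # build_expr_url.py -- compose a GDS server-side analysis request,
    # following the _expr_{datasets}{expression}{bounds} layout of the
    # examples in the message. HOST is a placeholder, not a real server.
    from urllib.parse import quote

    HOST = "http://gds.example.edu:9090/dods"

    def expr_url(datasets, expression, bounds):
        request = "_expr_{%s}{%s}{%s}" % (datasets, expression, bounds)
        # Encode everything except characters commonly tolerated by
        # OPeNDAP servers; adjust "safe" to taste for a given server.
        return "%s/%s" % (HOST, quote(request, safe="{}(),=:.-_/"))

    # 10m wind speed at a single point over a five-day forecast,
    # mirroring the first example above:
    url = expr_url("gfs2/gfs.2010062800",
                   "mag(u10m,v10m)",
                   "8.45:8.45,56.0:56.0,1000:1000,00Z28jun2010:12Z02jul2010")
    print(url)
    # Fetching url + ".dds" (the standard OPeNDAP structure response)
    # would describe the derived dataset; any OPeNDAP-aware client can
    # then read the result like any other dataset on the server.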
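[Editor's note: since wind speed from vector components is the most common nontrivial request mentioned in the message, here is a client-side equivalent of mag(u10m,v10m) for comparison, with a direction formula added. The meteorological direction convention used (degrees clockwise from north, pointing to where the wind blows from) is an assumed choice, not something the GDS expression encodes.]

    # wind_from_components.py -- client-side equivalent of mag(u10m,v10m)
    import math

    def wind_speed(u, v):
        # same quantity mag() computes: sqrt(u^2 + v^2)
        return math.hypot(u, v)

    def wind_direction(u, v):
        # meteorological convention: 0 = wind from the north,
        # 90 = wind from the east (an assumed convention)
        return (270.0 - math.degrees(math.atan2(v, u))) % 360.0

    u, v = -3.2, 4.1   # example 10m components in m/s
    print("speed: %.2f m/s, direction: %.0f deg"
          % (wind_speed(u, v), wind_direction(u, v)))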