SWAMP 0.1 and the netCDF operators NCO version 3.9.3 are ready.

http://nco.sf.net (Homepage)
http://dust.ess.uci.edu/nco (Homepage "mirror")

This NCO release coincides with a user-friendly version of SWAMP, the
Script Workflow Analysis for MultiProcessing. SWAMP efficiently
schedules and executes NCO scripts on remote data servers:

http://swamp.googlecode.com

SWAMP can work with any command-line operator analysis scripts, not
just NCO. If you must transfer lots of data from a server to your
client before you can analyze it, then SWAMP may speed things up. The
full SWAMP release announcement is below, under the signature. Give it
a try and give us some feedback.

A. SWAMP released! Visit Daniel on Monday at Fall AGU poster IN11B-0469

B. Use ncecat -u ulm_nm to specify the new record dimension name:
   http://nco.sf.net/nco.html#ncecat

C. Fix for autoconf builds using GCC on AIX

D. ncap2 supports OpenMP on most independent branches (Henry Butowsky).
   Demonstration scripts: ~/nco/data/[bin_cnt.nco,psd_wrf.nco]

E. Support for GCC 4.2 OpenMP (GOMP) in bld/Makefile (not yet in configure):
   http://nco.sf.net/nco.html#openmp

F. Pre-built, up-to-date Debian Sid & Ubuntu binaries are available:
   http://nco.sf.net#debian

G. Pre-built, up-to-date RPM packages are available:
   http://nco.sf.net#rpm

H. The NCO scaling paper is out in IJHPCA:
   http://dust.ess.uci.edu/ppr/ppr_ZeM07_ijhpca.pdf

I. Reminder: NCO support for netCDF4 features is tracked at
   http://nco.sf.net/nco.html#nco4
   NCO currently supports netCDF4 atomic data types and compression.
   NCO 3.9.3 with netCDF4 support should work with HDF5 1.8 beta2 and
   netCDF4 snapshot20070822 and newer.

   export NETCDF4_ROOT=/usr/local/netcdf4  # Set netCDF4 location
   cd ~/nco; ./configure --enable-netcdf4  # Configure mechanism
   -or-
   cd ~/nco/bld; make NETCDF4=Y allinone   # Old Makefile mechanism

Enjoy,
Charlie

--
Charlie Zender, Department of Earth System Science, UC Irvine
Sab. at CNRS/LGGE-Grenoble until 20080715 :) 011+33+476+824236
Laboratoire de Glaciologie et Géophysique de l'Environnement
54 rue Molière BP 96, 38402 Saint Martin d'Hères Cedex, France

SWAMP: A System for Server-side Geoscience Data Analysis
http://code.google.com/p/swamp

We are pleased to announce the beta release of the SWAMP system. SWAMP
augments data servers with a data analysis service. Current data
access services focus on providing managed and highly-available access
to data. Many have realized the value of providing computational
services as well. This combined data+computation approach is natural
in the geosciences, where datasets are often too large and bulky to
exploit standard computational grids. SWAMP answers this need by
providing a way to specify a rich set of analysis operations to a
server and merely download the results. It leverages scientists'
knowledge of shell scripts. SWAMP can process almost any
POSIX-compliant data analysis script whose commands it "understands"
(i.e., can parse). SWAMP currently understands the full suite of NCO
(netCDF Operator) commands, and it can be made to understand any
command-line operators (CLOs). Using SWAMP is easy once it understands
your CLO syntax. In most cases a slight modification makes the same
data analysis script run remotely via a SWAMP service (i.e., the
server with the data does the analysis) or locally on the scientist's
own machine (i.e., the traditional method).

Request for testers:

We are currently looking for both scientists (clients) and data center
administrators (servers) to test SWAMP. We have a server you can try.
Our server contains a small subset of IPCC AR4 climate simulation data
from ~17 different climate models. You can use our demo script or
write your own to test remote analysis of these large datasets. Our
demo computes a time series of the predicted ensemble-average
21st-century temperature change. Everything needed to test our SWAMP
service is provided in the swamp-client package.
Installing the client and running the IPCC test on our server takes
only five commands:

wget http://swamp.googlecode.com/files/swamp-client-0.1.tar.gz
tar xzvf swamp-client-0.1.tar.gz
cd swamp-client-0.1
export SWAMPURL='http://pbs.ess.uci.edu:8080/SOAP'
python swamp_client.py ipcctest.swamp

Server administration, while relatively straightforward, may require
our help initially. Download the swamp-server package and let us know
if we can be of assistance. We are actively looking for large test
sites.

Announcements:

This beta announcement is being made on many lists (netcdf, opendap,
nco, swamp, esg, fxm) that potential SWAMP users may read. Future
announcements will be restricted to the swamp and nco lists. Sign up
for one of these to stay apprised of SWAMP development:

For releases and other major announcements:
http://groups.google.com/group/swamp-announce

For discussion, help, bug reports, comments, and test server status:
http://groups.google.com/group/swamp-users

Interesting features:

- Supports most common shell syntax, including for-loops,
  if-branching, and variables.
- Detects dependencies and output files in your script -- no need to
  specify them manually.
- Detects intermediate files in your workflow -- no time or space is
  wasted transferring or storing them.
- Exploits parallelism on systems with multiple cores, multiple CPUs,
  and compute clusters.
- Supports NCO-based data processing and reduction.
- Saves bandwidth: transfers only output data, which is a few times to
  tens of thousands of times smaller than the input.
- Simple logging: know what sorts of analyses your users are
  interested in.
- Overall speedup ranges from 1X to 1000X. In rare cases, SWAMP may
  slow things down, by at most 10%.

Coming soon:

- Integration with Grid Engine: dynamic, on-demand allocation of
  compute nodes in response to changing computational load.
- Better performance for complex scripts through coarser-grained work
  distribution.
- Support for workflows operating on data at multiple sites.
- Support for a "standalone" mode: take advantage of SWAMP's
  parallelism and optimization on a single workstation.

Known issues:

- Not all shell syntax is supported. SWAMP implements the most
  commonly used syntax, but every user has her own style. Let us know
  if you think something is missing.
- Log files are messy. SWAMP is under constant development, and a
  little mess and sawdust is inevitable. Let us know if there is
  information you would like logged, or if you have specific ideas on
  how to reduce the clutter.
- Only NCO binaries and a few common shell programming helpers (e.g.,
  printf, seq) are supported. While this already provides a rich set
  of data reduction and analysis functionality, we understand that
  other tools are desired. Let us know which ones you use. Please keep
  in mind that SWAMP is focused on programs that work with large
  datasets, and that, for security reasons, not all programs/binaries
  should be supported.
- General beta-release roughness.

Learning more:

Visit the homepage to learn more about SWAMP, including how it works:
http://code.google.com/p/swamp

Learn more at the 2007 AGU Fall Meeting, Monday, at our SWAMP poster,
IN11B-0469.
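As a concrete illustration, here is a minimal sketch of the kind of
script SWAMP is designed to parse: shell variables, a for-loop, the
supported helpers (seq, printf), and standard NCO operators. The file
names and the particular averaging steps are illustrative assumptions,
not part of this release; only the operator names (ncra, ncrcat, ncwa)
come from stock NCO.

```shell
#!/bin/sh
# Hypothetical SWAMP-style analysis script (file names are assumptions).
# Average each year's input along the record (time) dimension,
# concatenate the annual means, then average away the spatial dimensions.
for yr in $(seq 2000 2002); do
  ncra "in_${yr}.nc" "ann_${yr}.nc"        # ncra: record (time) average
done
ncrcat ann_2000.nc ann_2001.nc ann_2002.nc ann_all.nc  # concatenate along time
ncwa -a lat,lon ann_all.nc gbl_avg.nc      # ncwa: average over lat and lon
printf 'Result written to gbl_avg.nc\n'
```

Run locally, a script like this executes with stock NCO; per the
announcement above, a slight modification lets the same script run
server-side through swamp_client.py, so only the small final output
need be transferred.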