NOTICE: This version of the NSF Unidata web site (archive.unidata.ucar.edu) is no longer being updated.
Current content can be found at unidata.ucar.edu.
To learn about what's going on, see About the Archive Site.
A while back I polled the users of Splus, HDF and netCDF for programs that
interface Splus to a common data format. I'd like to thank everyone for their
timely and helpful responses. Below, I have compiled the comments from the
various newsgroups. Thanks again,

Andrea Wood
Pacific Northwest Laboratory

-------------------------------------------------------------------
***From phil@xxxxxxxxxxx Wed Mar 16 14:05:18 1994***

Why don't you take a look (anonymous ftp) at the code in
leghorn.geog.ubc.ca:/pub/splus? We'd be interested in hearing from anyone
else who may contact you on this, as well. Our interface is currently under
development, and it crashes occasionally due to a bug (we believe) in Splus,
but we use it daily to get random access to large netCDF arrays. (Note that
CDFchange isn't working yet.) I've appended the file CDF_SPLUS.doc, and the
letter we have written to someone at Statsci about the bug.

<phil@coot: pub/splus> cat CDF_SPLUS.doc
# $Id: CDF_SPLUS.doc,v 1.2 1993/09/09 06:27:49 mintha Exp mintha $
#
# Documentation for the CDF->Splus->CDF interface
#
# $Log: CDF_SPLUS.doc,v $
# Revision 1.2  1993/09/09 06:27:49  mintha
# Prelim SPLUS->CDF docs & clean up
#
# Revision 1.1  1993/08/13 22:50:06  mintha
# Initial revision

CDF->Splus: To read files into Splus.

User functions:

CDFgetinfo(filename, close=T)
    Reads in all the information about the given CDF file and returns a
    structure with all the global attributes and a component for each
    variable, with all its dimensions and attributes. If 'close' is FALSE
    then the file is not closed (used internally).

CDFget(filename, varname, start=NULL, count=NULL)
    Reads in the data for the given variable (a variable id can be
    substituted for the name). The dimnames are attached and any attributes
    are added as Splus attributes. If 'start' is NULL (the default) the
    origin (1,1,...) is used. If 'count' is NULL then the rest of the
    variable is used.
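The start/count convention just described for CDFget can be illustrated with
a small Python sketch. This is a toy re-derivation, not part of Phil's
package: 'data' is a flat row-major list, indices are 1-based as in S, and
the -1 count rule from the CDF_SPLUS.doc text is handled as well.

```python
def hyperslab(data, shape, start=None, count=None):
    """Toy model of the CDFget start/count convention.
    start=None -> origin (1,1,...); count=None -> rest of the variable;
    a count entry of -1 -> remaining extent of that dimension."""
    rank = len(shape)
    if start is None:
        start = [1] * rank                          # 1-based origin, as in S
    if count is None:
        count = [shape[i] - start[i] + 1 for i in range(rank)]
    count = [shape[i] - start[i] + 1 if c == -1 else c
             for i, c in enumerate(count)]

    def strides(shape):
        # Row-major strides for a flat list.
        s, acc = [], 1
        for extent in reversed(shape):
            s.insert(0, acc)
            acc *= extent
        return s

    st = strides(shape)
    out = []

    def walk(dim, offset):
        # Recursively visit every index inside the hyperslab.
        if dim == rank:
            out.append(data[offset])
            return
        for i in range(count[dim]):
            walk(dim + 1, offset + (start[dim] - 1 + i) * st[dim])

    walk(0, 0)
    return out, count

# A 2x3 variable stored row-major as [0..5]:
data = list(range(6))
print(hyperslab(data, [2, 3], start=[1, 2], count=[2, -1]))
# -> ([1, 2, 4, 5], [2, 2])
```

With no start/count the whole variable comes back, matching the NULL
defaults in the documentation above.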
    If any dimension in 'count' is -1 then it is replaced by the remaining
    extent of that dimension.

Internal functions:

CDFgetdata(info, var, start=NULL, count=NULL)
    Does the actual work of reading in the data and attaching the dimnames
    and attributes. 'info' is the structure returned by CDFgetinfo, 'var'
    is the variable name or ID. 'start' and 'count' are as above.

CDFginfo(cdfp)
    Reads in all the global information about a CDF file, including
    dimension names, values, and labels, as well as global attributes.
    'cdfp' is the structure returned from the cdf_open call.

CDFvinfo(ginfo, var)
    Reads in all the information for the given variable ID ('var'),
    including dimensions, labels, and attributes. 'ginfo' is the structure
    returned by CDFginfo.

CDFgetinfo returns the following structure (note this is a fictitious
example):

$global:
  $ncid      23
  $numvars   4
  $dims      (5, 10, 20)
  $dnames    ("Time", "Droplets", "Ozone")
  $dimlabs   (NULL, "rx ry rz", "r1 r2 r3 r4 dydt dzdt")
  $unlimdim  4
  $history   "Processed by ..."
  $doc       "Ozone samples"
  $comment   "Created by Jim"
$variable1
  $name      "variable1"
  $id        5
  $type      5
  $dimids    (1, 2)
  $dims      (20, 10)
  $labels    ("r1 r2 r3 r4 dydt", "rx ry rz")
  $title     "Just a variable"
$variable2
  $name      "variable2"
  $id        1
  $type      6
  $dimids    (2, 3)
  $dims      (5, 10)
  $labels    (NULL, "rx ry rz")
  $title     "Bond, James Bond"
  $FillValue -999

CDFget returns something like this:

        rx  ry  rz
  dy     2   3   4
  dz     5  11   9
  dydz  13  12  11
  attr(,"title"):
  [1] "Hello"
  attr(,"file"):
  [1] "test2.nc"

SPLUS <- CDF: To write files out of Splus.

CDFappend(filename, varname, data, coord=NULL)
    Adds data to a variable in a netCDF file. The variable must already
    exist and one of its dimensions must be unlimited. If 'coord' is not
    NULL the data is appended to the coordinate variable which has the same
    name as the unlimited dimension (see the documentation on ncdump).

CDFchange(filename, varname, data, start=NULL, count=NULL)
    Changes an existing netCDF file. If the named variable does not exist
    then it is added to the file.
    If it does exist then the given data replaces that in the file. When
    adding a new variable the 'dnames' attribute must be present. In both
    cases the dimensions must match. A hyperslab can be specified with
    'start' and 'count' as above.

CDFput(filename, varname, data, clobber=F, unlim=NULL)
    Creates a new netCDF file named 'filename'. One variable 'varname' is
    created and filled with 'data'. 'data' must have the 'dnames'
    attribute. The named dimensions are created and given the sizes of the
    array unless an attribute 'dims' is defined. If the variable has
    dimnames these are added as labels for the dimensions; this can also be
    overridden with the 'labels' attribute. Unless 'clobber' is (T)rue this
    function will not overwrite an existing file. If 'unlim' is not NULL
    the given dimension is made unlimited.

Cheers, Phil
---------------------------------------------------------------------
***From tam@xxxxxxxxxxx Thu Mar 17 12:49:20 1994***

A couple of years ago I was asked to interface Splus and HDF so that users
could retrieve and put data into HDF files much like they did in Fortran.
The code is still here, although I don't know whether it is still being
used. It should still work. The package consists of:

C code
    Glue code that was called by Splus and in turn calls HDF routines after
    fiddling with the parameters so they'll work right. This code was
    compiled into a module (S2hdf.o) that was dynamically loaded into Splus
    via the dyn.load() command.

Splus code
    A set of Splus routines that made working with HDF more palatable
    (instead of trying to do the .C() call yourself and having to mangle
    the output into the usual Splus convention).

Here's the README file (which, upon rereading it myself, is not really
friendly and looks half-baked). I hope you can get the gist of the
operations that this package offers.
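The dyn.load()/.C() pattern Tam describes, compiling C glue into a module
and loading it into the interpreter at run time, has a rough analogue in
Python's ctypes. This sketch loads the system math library and calls sqrt()
from it (the original S2hdf.o glue is not available here, so libm stands in
for it):

```python
import ctypes
import ctypes.util

# Locate and load a shared library at run time, as dyn.load("S2hdf.o")
# did in Splus. find_library may return None on some platforms; on
# Linux, CDLL(None) falls back to symbols already linked into the process.
libm_path = ctypes.util.find_library("m")       # e.g. "libm.so.6" on Linux
lib = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# As with the .C() interface, argument and result types must be declared
# explicitly; nothing is inferred across the language boundary.
lib.sqrt.restype = ctypes.c_double
lib.sqrt.argtypes = [ctypes.c_double]

print(lib.sqrt(9.0))                            # 3.0
```

The "Splus code" layer of Tam's package corresponds to the few lines here
that declare types and tidy the call: a thin wrapper that spares users the
raw foreign-function interface.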
--------------------------------------------------------------------------------
Notes on the interface to use the HDF library for input-output of
floating-point arrays in the Splus data analysis package.

noctiluca:/usr/local/src/hdf/splus/README   laplante@xxxxxxxxxxx   90may09

HDFload() makes the HDF library routines available by dynamically loading
/usr/local/lib/S2hdf.o. NB: the UBC fixes to the HDF library are necessary
for correct operation in cases where rank > 2.

HDFget() reads an entire HDF Scientific Data Set into an S "object"
(array). By specifying some of the parameters 'start' and 'end' (or
'axis') the user can get only a part of the SDS. For example
'start=c(1,1,1), end=c(7,5,1)' would get a 2-dimensional array. The form
'axis=2' would get a vector along the second dimension from the specified
start. The resulting array is collapsed to a lower rank if some dimensions
are 1. All parameters have reasonable defaults:

    file=HDFcurrentfile  (the last file loaded by this user)
    ref=1                (select one of several SDS in file)
    start=c(1,1,...)     (start at beginning)
    end=c(...)           (end at end)

HDFadddata() appends the specified S array as an SDS to the named HDF file.

Error messages should be self-explanatory ;-) .

The functions normally reside in /usr/local/splus/s/.Functions/HDF* .

=====================================================================
1) EXAMPLE OF USE:

File "753.hdf" is a 7*5*3 test matrix containing 111.0 112.0 ... 752.0 753.0

% Splus
S-PLUS : Copyright (c) 1988, 1989, 1990 Statistical Sciences, Inc.
S : Copyright AT&T.
Version 2.3 Release 1 for Sun4 : 1990
> HDFload()
S - HDF interface library loaded
> ls()
[1] "HDFadddata"     "HDFcurrentfile" "HDFget"         "HDFgetarray"
[5] "HDFgetdata"     "HDFgetdims"     "HDFgetslice"    "HDFload"
[9] "last.dump"
> HDFget(file="753.hdf",ref=1)
HDFget: start= 1 1 1
        slicedims= 7 5 3
, , 1
     [,1] [,2] [,3] [,4] [,5]
[1,]  111  121  131  141  151
[2,]  211  221  231  241  251
[3,]  311  321  331  341  351
[4,]  411  421  431  441  451
[5,]  511  521  531  541  551
[6,]  611  621  631  641  651
[7,]  711  721  731  741  751
, , 2
     [,1] [,2] [,3] [,4] [,5]
[1,]  112  122  132  142  152
[2,]  212  222  232  242  252
[3,]  312  322  332  342  352
[4,]  412  422  432  442  452
[5,]  512  522  532  542  552
[6,]  612  622  632  642  652
[7,]  712  722  732  742  752
, , 3
     [,1] [,2] [,3] [,4] [,5]
[1,]  113  123  133  143  153
[2,]  213  223  233  243  253
[3,]  313  323  333  343  353
[4,]  413  423  433  443  453
[5,]  513  523  533  543  553
[6,]  613  623  633  643  653
[7,]  713  723  733  743  753
> HDFget(end=c(7,5,1))
HDFget: using file 753.hdf
HDFget: start= 1 1 1
        slicedims= 7 5 1
     [,1] [,2] [,3] [,4] [,5]
[1,]  111  121  131  141  151
[2,]  211  221  231  241  251
[3,]  311  321  331  341  351
[4,]  411  421  431  441  451
[5,]  511  521  531  541  551
[6,]  611  621  631  641  651
[7,]  711  721  731  741  751
> HDFget(start=c(3,4,1),axis=3)
HDFget: using file 753.hdf
HDFget: start= 3 4 1
        slicedims= 1 1 3
[,1] [,2] [,3]
 341  342  343

=====================================================================
2) FUNCTION LISTINGS

> HDFload
function()
{
        dyn.load("S2hdf.o")
        cat("S - HDF interface library loaded\n")
        invisible()
}

> HDFgetdims
function(file = HDFcurrentfile, maxrank = 10)
{
        if(missing(file))
                cat("Using file: ", file, "\n")
        else HDFcurrentfile <<- file
        z <- .C("HDFgetdims",
                file,
                rank = integer(1),
                shape = integer(maxrank),
                as.integer(maxrank))
        z$shape[1:z$rank]
}

> HDFget
function(file = HDFcurrentfile, ref = 1, start = NULL, end = NULL,
        axis = NULL)
{
        # HDFget(file,ref,start,end,axis)
        # file  (default last name, saved in global string HDFcurrentfile)
        # ref   (serial number of data sets in the file)
        # start (vector: coordinates of starting point)
        # end   (vector: slicedims = end - start + 1)
        #  -or-
        # axis  (dimension number for extracting a vector from array)
        if(missing(file))
                cat("HDFget: using file ", file, "\n")
        else HDFcurrentfile <<- file
        .C("DFSDrestart")       # rewind
        for(i in 1:ref)
                filedims <- HDFgetdims(file)    # step through data sets
        filerank <- length(filedims)
        if(missing(start))
                start <- rep(1, filerank)
        else if(filerank != length(start)) {
                cat("HDFget dimension mismatch: start=", start, "but",
                        file, "has", filedims, "\n")
                return(NULL)
        }
        if(missing(end)) {
                if(missing(axis)) {
                        end <- filedims
                }
                else {
                        end <- start    # End = start except along axis.
                        end[axis] <- filedims[axis]
                }
        }
        else if(filerank != length(end)) {
                cat("HDFget: end=", end, "but", file, "has", filedims, "\n")
                return(NULL)
        }
        slicedims <- end - start + 1
        cat("HDFget: start=", start, "\n")
        cat("\tslicedims=", slicedims, "\n")
        z <- .C("HDFgetslice",
                file,
                as.integer(start),
                as.integer(slicedims),
                dat = single(prod(slicedims)),
                as.integer(slicedims))
        array(z$dat, slicedims[slicedims != 1])
}

> HDFadddata
function(file, data, append = F)
{
        if(missing(file))
                cat("File must be supplied\n")
        else if(missing(data))
                cat("Missing data parameter\n")
        else {
                if(append)
                        cat("Appending data to ", file, " ...\n")
                else cat("Replacing data in ", file, " ...\n")
                d <- dim(data)
                .C("HDFadddata",
                        file,
                        as.integer(length(d)),
                        as.integer(d),
                        as.single(data),
                        as.integer(append))
        }
        invisible()
}
--------------------------------------------------------------------------------

Still interested?
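The shape arithmetic in the HDFget listing above (slicedims = end - start
+ 1, the end-equals-start-except-along-axis rule, and the collapse of
size-1 dimensions via slicedims[slicedims != 1]) can be re-derived in a
short Python sketch. This is a hypothetical helper for illustration, not
part of the package; 'axis' is 1-based as in S:

```python
def slice_shape(filedims, start=None, end=None, axis=None):
    """Reproduce HDFget's shape bookkeeping.
    Returns (slicedims, collapsed): the raw extent of the requested
    slice, and that extent with size-1 dimensions dropped, as HDFget
    does when building the result array."""
    rank = len(filedims)
    if start is None:
        start = [1] * rank                      # start at the origin
    if end is None:
        if axis is None:
            end = list(filedims)                # whole data set
        else:
            end = list(start)                   # end = start except along axis
            end[axis - 1] = filedims[axis - 1]  # 'axis' is 1-based, as in S
    slicedims = [e - s + 1 for s, e in zip(start, end)]
    collapsed = [d for d in slicedims if d != 1]
    return slicedims, collapsed

# Mirror the README transcript on the 7*5*3 file "753.hdf":
print(slice_shape((7, 5, 3), start=(3, 4, 1), axis=3))  # ([1, 1, 3], [3])
print(slice_shape((7, 5, 3), end=(7, 5, 1)))            # ([7, 5, 1], [7, 5])
```

Both results match the slicedims lines printed in the example session: the
axis=3 call yields a length-3 vector, and the end=c(7,5,1) call yields a
7x5 matrix.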
Joseph Tam <tam@xxxxxxxxxxx>
Department of Oceanography
University of British Columbia
Vancouver, BC, Canada  V6T 1Z4
Phone: (604) 822-3911
Fax:   (604) 822-6091
---------------------------------------------------------------
***From russ@xxxxxxxxxxxxxxxx Wed Mar 16 12:36:02 1994***

I got an evaluation copy of Splus and looked into the feasibility of
building a netCDF interface. I also evaluated the possibility of using
netCDF for the external representation of Splus datasets. I liked the
powerful expression language, the fully worked-out semantics for "NA"
missing values, the C and Fortran interfaces, and much else about Splus.

I decided netCDF was not powerful enough to represent some kinds of Splus
recursive data structures, e.g. lists of lists or nested lists with named
elements. This meant it would be impractical to try to export arbitrary
Splus data structures into netCDF files. However, an interface that
allowed the import of netCDF data into Splus seemed quite practical.
That's about as far as I got during the evaluation.
__________________________________________________________________________
Russ Rew                               UCAR Unidata Program
russ@xxxxxxxxxxxxxxxx                  P.O. Box 3000
(303)497-8645                          Boulder, Colorado 80307-3000
---------------------------------------------------------------
***From maa017@xxxxxxxxxxxxxxxxx Wed Mar 16 13:21:50 1994***

I wrote some code to send HDF SDS data to the DTM outport so that I could
use XDataslice to visualise 3-dimensional arrays. It works quite nicely:

    Splus> arr_array(runif(50*50*50),dim=c(50,50,50))
    Splus> sendarraydtm(arr)

and bingo, up it comes in Data Slice for animation, slicing, whatever. My
needs didn't go past SDS data, so I haven't coded raster images, palettes
or anything else! My one worry is that the HDF/DTM library uses malloc and
so eventually gobbles up all the memory.
Anyway, if you want to see my S/C code for it, I'll mail it to you!

Barry Rowlingson
Research Associate, Maths/Stats Dept
Lancaster University
----------------------------------------------------------------
***From maclean@xxxxxxxxxxxxxxxxxx Wed Mar 16 12:51:35 1994***

We have a function, fun.readnetcdf(), which will read a scalar or
multidimensioned netCDF variable into S. This function calls C functions
which in turn make the netCDF calls. Right now it must read an entire
variable - you cannot give the indices of the netCDF "hyperslab" that you
want to read. However, it should give you the basic functionality and
would be easy to upgrade for your needs. Let me know if you want it and I
can send the C and S code to you.

Gordon Maclean
Surface and Sounding Systems Facility
Atmospheric Tech. Div.
National Center for Atmospheric Research
P.O. Box 3000, Boulder CO 80307-3000
email: maclean@xxxxxxxxxxxxx
phone: (303) 497-8794   FAX: (303) 497-8770
-----------------------------------------------------------------
***From Jaason.Haapakoski@xxxxxx Wed Mar 16 22:47:44 1994***

I have built a read-only interface from Splus to netCDF files. If that's
enough, you can get the C routines, Splus functions and the help pages on
request.

The interface uses generic Splus database functions, so you can attach to
a netCDF file and then read the variables in the file just by referring to
them the same way you refer to ordinary Splus objects in a directory. Any
netCDF attributes are read and assigned to the resulting object. NetCDF
dimensions are translated to the dim attribute of Splus. Missing values
are translated to NAs.

Jaason Haapakoski

I forgot to mention in my previous message that our interface supports
only variables of fixed dimensions (it uses the function ncvarget), not
records (it does not use the function ncrecget). We have had no use for
record-type netCDF variables yet.
I think it would be fairly simple to add the missing routines, but I have
no time for that now. By contrast, adding a write interface from Splus to
netCDF is not as simple, because there are more types of data structures
in Splus than in netCDF. I think it would be wise to restrict oneself to
Splus arrays, for example (with a data frame as a special case).

Jaason Haapakoski
ATBC Cancer Prevention Study
The National Institute of Public Health in Finland
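The attach-style access Haapakoski describes, where variables in an
attached netCDF file resolve by name like ordinary Splus objects, can be
mimicked in Python with attribute lookup. This is a toy sketch: a plain
dict stands in for the real netCDF file, and a fill value plays the role
of the missing-value code that his interface translates to NA.

```python
class NetCDFAttach:
    """Toy model of attaching a netCDF file as a database: variables
    resolve by name, and the fill value is translated to None, as the
    interface maps netCDF missing values to Splus NA."""

    def __init__(self, variables, fill=-999):
        # 'variables' maps variable names to flat lists of values.
        self._variables = variables
        self._fill = fill

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so variable
        # names behave like ordinary attributes of the object.
        try:
            values = self._variables[name]
        except KeyError:
            raise AttributeError(name)
        return [None if v == self._fill else v for v in values]

nc = NetCDFAttach({"ozone": [1.5, -999, 2.0]})
print(nc.ozone)     # [1.5, None, 2.0]
```

Reading nc.ozone here corresponds to simply naming a variable after
attaching the file in Splus; no explicit read call is needed.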