Jeff Whitaker wrote:
John Caron wrote:
Jeff Whitaker wrote:
John Caron wrote:
Jeff Whitaker wrote:
Ed Hartnett wrote:
Jeff Whitaker <jswhit@xxxxxxxxxxx> writes:
 
In netcdf-4.0, I don't see how to create variables which are arrays of
strings with length > 1.  I see how to create arrays of
single characters, and arrays of variable-length strings, but not
strings of a specified length.
Am I missing something, or is this not supported by HDF5?
    
Howdy Jeff!
Strings are variable length by their nature.
How about a two dimensional array of NC_CHAR?
Thanks,
Ed
  
Ed:  I use arrays of fixed length strings, padded with spaces, 
quite a bit.  This simplifies the memory management issues 
associated with arrays of variable length strings (which have no 
counterpart in Fortran 90/95, although they are allowed in Fortran 
2003).  Below is an excerpt from my previous reply which explains 
why I don't like using 2-D arrays of characters to represent 1-D 
arrays of fixed-length strings:
Russ: I realize you can use an array of shape ndim,8 to store an 
array of ndim 8-character strings.  That's the way I've done it with 
netcdf-3 - it just feels clunky.  A typical use case for me is 
station data, where you want to store the name of the station. I 
end up with an array of characters shaped (nstations,nchars).  
In Fortran I read it into an (nstations,nchars) character(len=1) 
array (after first finding out what both nstations and nchars are), 
then reshape it into a (nstations) character(len=nchars) array.  
I'd rather just read it into a character(len=nchars) array straight 
off.  Not a show stopper for sure, but it would be more 
convenient.  I realize that specifying the data type would be 
tricky: instead of NC_CHAR, would you have a bunch of new types 
NC_CHAR1, NC_CHAR2, ... NC_CHAR120?  Or a new function datatype = 
nc_set_chartype(nchars)?  However, I bet it would get used a lot 
more than the esoteric datatypes you have in netcdf-4 already 
(enums and opaque for example).
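To make the (nstations,nchars) layout described above concrete, here is a minimal sketch in plain C of the padding and un-padding step. It uses only the standard library and no netCDF calls. The helper names pad_fixed and unpad_fixed are hypothetical, made up for this illustration, and the buffer is assumed to be a flat row-major block of nstations * nchars bytes with no NUL terminators, which is roughly what a 2-D NC_CHAR variable would hand back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write `name` into a fixed-width field of `nchars`
 * bytes. Names that are too long are truncated; short ones are padded on
 * the right with spaces. The field is NOT NUL-terminated, in the style of
 * a Fortran character(len=nchars) value. */
void pad_fixed(char *field, size_t nchars, const char *name)
{
    size_t n = strlen(name);

    assert(field != NULL && name != NULL);
    if (n > nchars)
        n = nchars;                      /* truncate over-long names */
    memcpy(field, name, n);
    memset(field + n, ' ', nchars - n);  /* pad the remainder with spaces */
}

/* Hypothetical helper: copy row `i` of a flat (nstations x nchars) table
 * into `out`, drop trailing spaces, and NUL-terminate. `out` must have
 * room for nchars + 1 bytes. Returns the trimmed length. */
size_t unpad_fixed(char *out, const char *table, size_t nchars, size_t i)
{
    const char *row = table + i * nchars;
    size_t n = nchars;

    while (n > 0 && row[n - 1] == ' ')
        n--;                             /* strip the space padding */
    memcpy(out, row, n);
    out[n] = '\0';
    return n;
}
```

A caller would fill row i with pad_fixed(table + i * nchars, nchars, name) before writing the table, and recover an ordinary C string after reading with unpad_fixed(out, table, nchars, i). Note that nchars has to be known separately from the data, which is the extra bookkeeping being objected to here.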
Hi Jeff:
Suppose we stuck with fixed length char arrays for this case, but 
added a convenience method in the API that did the work for you. 
What would that convenience method look like?
John
John:  A convenience method won't really help much - the thing I'd 
most like to avoid is defining another dimension to hold the number 
of characters in each string.  Essentially, I'd like to have that 
information transferred to the datatype.
Hi Jeff:
It seems unlikely that we'd want to add a multitude of datatypes for 
this purpose. The extra dimension seems the right thing if you really 
want to specify that all the Strings are of the same length.
John
John:  OK.  Since HDF5 defines a fixed-length string datatype like this:
   tid = H5Tcopy(H5T_C_S1);
   H5Tset_size(tid, <string length>);
I thought you might be able to create a datatype on the fly in netcdf 
with something like
stringtype = nc_set_stringtype(<string length>)
and then use 'stringtype' instead of NC_CHAR when defining the variable.
I guess I'm just used to thinking in Fortran, where the length of the 
string is a property of the datatype (defined when the variable is 
declared).  It seems like dimensions should hold information about the 
underlying physical grid, not implementation details of how the variable 
is stored in memory.  I guess it depends on whether you think of a 
string as an array of characters, or as a datatype unto itself.
-Jeff
Hi Jeff:
It's possible that the C API could offer such a feature; that would be up to 
Russ and Ed. But in terms of the underlying data model, I think that Strings 
will be considered variable length. One could consider Fortran's handling of 
Strings to be, uh, clumsy. I don't know of any modern languages that follow 
that, and I vaguely remember Pascal being criticized for making array 
lengths part of the type.
It's a good point about what info a Dimension should capture. They really start off as 
indicating storage size and layout. Shared use with coordinate variables gives them a much 
deeper significance, describing the "underlying physical grid", as you say. 
Surprisingly, the HDF5 and OPeNDAP data models didn't make them separate objects that could be 
shared, though both have added other constructs to get similar semantics (dimension 
scales and grid maps, respectively).
As always, these things are a tradeoff.
I appreciate hearing your thoughts about these kinds of things.
John