Hi Russ,
> > I'll write some tests that check for proper insertion of non-ASCII strings
> > as object & attribute names and let you know what I find out.
> >
> > Note that Unicode strings as elements of a dataset is harder and probably
> > won't work correctly currently.
>
> Right. For data, multiple encodings would have to be supported. What
> we're considering is an "_Encoding" attribute that would identify the
> character encoding for a string, e.g.
>
> String Address;
> Address:_Encoding = "UTF-8";
>
> For backward compatibility, we would have to assume no encoding when
> this attribute is not specified. With this implementation of Unicode
> strings and the ability to store arbitrary arrays of bytes, there
> might not be any implications for the HDF5 library.
This is OK, but perhaps we should add a new character set type,
H5T_CSET_UTF8, instead, so that the information about the string's encoding is
recorded directly in the file format?
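
For reference, the reader-side fallback that the "_Encoding" attribute
proposal implies could be sketched roughly as follows. This is only an
illustration in Python: read_string() is a hypothetical helper, and a plain
dict stands in for a variable's attributes; neither is part of any library.

```python
from typing import Mapping, Union


def read_string(raw: bytes, attrs: Mapping[str, str]) -> Union[str, bytes]:
    """Decode raw bytes according to a hypothetical "_Encoding" attribute.

    When the attribute is absent, no encoding is assumed and the bytes are
    returned unchanged (the backward-compatible case).
    """
    encoding = attrs.get("_Encoding")
    if encoding is None:
        return raw  # no encoding declared: leave the bytes alone
    return raw.decode(encoding)


address = "Zürich".encode("utf-8")
with_attr = read_string(address, {"_Encoding": "UTF-8"})  # decoded text
without_attr = read_string(address, {})                   # raw bytes
```

With a character set recorded in the datatype instead, the same decision
would be driven by the file format rather than by an attribute lookup.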
Quincey
P.S. - This reminds me that I will need to add an "encoding" attribute to the
object names in groups so that UTF-8 names can be distinguished from ASCII
names. :-)
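
One way to picture the distinction between ASCII and UTF-8 names is the
rough Python sketch below. name_cset() and the "ASCII"/"UTF-8" labels are
placeholders made up for illustration, not anything in the library or the
file format.

```python
def name_cset(name: bytes) -> str:
    """Return a hypothetical character-set label for an object name."""
    # Pure 7-bit names stay labelled ASCII, so existing names are unaffected.
    if all(b < 0x80 for b in name):
        return "ASCII"
    # Anything else must at least be well-formed UTF-8; decode() raises
    # UnicodeDecodeError if it is not.
    name.decode("utf-8")
    return "UTF-8"


plain = name_cset(b"temperature")
accented = name_cset("température".encode("utf-8"))
```

Since ASCII is a subset of UTF-8, a name labelled "ASCII" here would also
decode correctly as UTF-8; the label only records which names actually need
the wider character set.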