
RE: 19990929: Write/Read of 2D pages

> >So you set   origin = {pageno, 0, 0}
> >             shape  = {1, nx, ny}
> >
> >and you get a 2D MultiArray of dimension nx by ny.
> >At this point, the data is memory resident, and you could access the
> >data using
> >     data.getDouble(index);
> >which would be my recommendation, since it avoids further data
> >copying.
> >
> >However, if you need to pass a double array to some other routine, you
> >can use toArray() to get a Java array, but it is a 1D, not a 2D,
> >array.  I think in fact that it should construct the 2D array, but
> >currently it doesn't.
> >
> >So you would unfortunately have to transfer this into a 2D array
> >yourself, in which case you might as well loop over the
> >data.getDouble(index) call rather than use toArray().
>
> Thanks for this clarification.  If I use the data.getDouble()
> approach, I still need to loop to fill the desired 2D array.  However,
> I don't think this is a big deal; see my comments below.
>
> I agree wholeheartedly with your comment about toArray(); to me, it
> should be able to send back a multi-dimensional array.  Can you
> suggest this?

Yes, I have, and I will.
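
In the meantime, reshaping the 1D result of toArray() by hand only
takes a few lines.  A minimal sketch, assuming toArray() hands back the
page values in row-major (C) order, which is how netCDF lays them out
(ma, nx, and ny are the names from the discussion above):

        double[] vals = (double[]) ma.toArray();   // 1D, length nx*ny
        double[][] page = new double[nx][ny];
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                page[i][j] = vals[i*ny + j];       // row-major: row i, column j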

>
> I haven't tested the speed of looping with getDouble() versus parsing
> the contents of the vector returned from toArray().  I would think it
> really doesn't matter; to get a 2D array, I will need two loops no
> matter which method I use.  As long as I am getting the data from the
> file into memory, after that point the I/O speed is not an issue.

It turns out that in the current implementation getDouble() may be
slower than you expect; I'd time it if you are doing a lot of data
movement.  (So I withdraw my unqualified recommendation of getDouble()
over toArray().)
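
If you do want to time it, something along these lines would do.  This
is only a rough sketch using the calls discussed above, with exception
handling omitted and page[][] as in the earlier reshape sketch:

        long t0 = System.currentTimeMillis();
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                page[i][j] = ma.getDouble(new int[] {0, i, j});
        long t1 = System.currentTimeMillis();

        double[] vals = (double[]) ma.toArray();   // then copy vals into page
        long t2 = System.currentTimeMillis();

        System.out.println("getDouble loop: " + (t1 - t0) + " ms, "
                           + "toArray: " + (t2 - t1) + " ms");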

>
> I got the 3D block solution for writing "pages" of a 2D array running
> over the weekend.  Remember that I said the single-element approach
> originally coded took around 10 minutes or so to write the data?  With
> the full 3D approach of reading all the pages into a 3D array, and
> then writing that array to the disk file, it takes around 2 seconds.
> Just blows it out.  How's that for time savings?

I like it.
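
For the record, the one-shot version you describe might look roughly
like this; just a sketch, assuming a 3D netCDF variable of shape
{npages, nx, ny} and row-major packing:

        double[] all = new double[npages * nx * ny];
        // ... fill all[]: page p goes at offset p*nx*ny, row-major ...
        MultiArray ma = new MultiArrayImpl(new int[] {npages, nx, ny}, all);
        netcdfVar.copyin(new int[] {0, 0, 0}, ma);   // one bulk write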

>
> However, I won't be able to implement the full 3D approach, because in
> some instances the number of "pages" could be quite large, and I don't
> want to create a gigantic 3D array just to write the data out.  Here's
> what I plan to do for the production code:
>
> 1. The 2D "pages" are actually a derived type called a Matrix (see
> http://math.nist.gov/javanumerics/jama/).  These are stored in a
> "scratch" netCDF file on a page-by-page basis.  The JAMA package has a
> utility method to provide a copy or clone of the Matrix in type
> double.
>
> 2. When I want to write the "save" file, I will loop on each "page",
> get the Matrix from the scratch file, and copy it to a 2D double
> array.  (Right now, I'm also taking the 2D array and putting it into
> the 3D array at a specific "page".)
>
> 3. Having the 2D array in memory, I will then write each block to the
> "save" netCDF file.  The positioning is controlled by the first
> element in the origin array.  I have tested this scheme and it works
> fine.  For instance, if nrows is the first dimension of the 2D array,
> and ipage is the "page" position, then the origin array would be set
> to {ipage*nrows, 0}, or origin[0] = ipage*nrows.
>
> I'm hoping the disk access to read a page from the scratch file, and
> then write it back to the save file, will not be too much of a time
> waster.  If it is, then I will probably try to internally "buffer" the
> process by setting up a temporary array several pages long, read in
> several 2D blocks, and then write out the long 2D array.  I can also
> get the 2D arrays as vectors, so I could do something with this
> approach as well.

I looked quickly at the Jama docs. The approach that occurs to me (that
you are probably referring to) is:

   reading from netcdf:
        MultiArray ma = netcdfVar.copyout(new int[] {pageno, 0, 0},
                                          new int[] {1, nx, ny});
        double[] vals = (double[]) ma.toArray();
        Matrix matrix = new Matrix(vals, nx);   // nx or ny ??

   writing to netcdf:
        double[] vals = matrix.getColumnPackedCopy();
        MultiArray ma = new MultiArrayImpl(new int[] {1, nx, ny}, vals);
        netcdfVar.copyin(new int[] {pageno, 0, 0}, ma);

This might be significantly faster.
Note that I'm not sure whether Matrix wants nx or ny; part of the
confusion is that it wants a Fortran column-major 1D array.  In any
case, this shouldn't be a problem - just switch the meaning of nx and
ny if need be.
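
To make the ordering concrete, here is a small example; this is my
reading of the Jama docs, so check it against your data:

        // a 2 x 3 matrix:
        //    1  2  3
        //    4  5  6
        double[][] a = { {1, 2, 3}, {4, 5, 6} };
        Matrix m = new Matrix(a);
        double[] packed = m.getColumnPackedCopy();   // {1, 4, 2, 5, 3, 6}
        Matrix back = new Matrix(packed, 2);         // 2 = number of rows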

In terms of netCDF efficiency, just remember that disk reading is done
when you copyin/copyout; your cost is in units of disk blocks
(typically 4K), and contiguous data will reduce the number of disk
blocks.  So if nx * ny is reasonably large, this page-oriented access
seems not too bad.
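
Putting the pieces together, your production loop (steps 2 and 3 above)
might look roughly like this.  Only a sketch: it assumes a 3D scratch
variable of shape {npages, nx, ny}, a 2D save variable with pages
stacked along the first dimension as you describe (with nrows = nx),
and made-up variable names:

        for (int ipage = 0; ipage < npages; ipage++) {
            // read one page from the scratch file
            MultiArray ma = scratchVar.copyout(new int[] {ipage, 0, 0},
                                               new int[] {1, nx, ny});
            double[] vals = (double[]) ma.toArray();
            // ... build the Jama Matrix here if you need to compute on it ...
            // write the page into the 2D save variable at row ipage*nx
            MultiArray out = new MultiArrayImpl(new int[] {nx, ny}, vals);
            saveVar.copyin(new int[] {ipage * nx, 0}, out);
        }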

>
> Thanks for all your help.  The netCDF package is pretty intimidating
> when you first get into it, but after you mess with it a little, it's
> not so bad.  Are you planning on putting out a manual for Java?  The C
> manual was helpful, but its function calls were obviously not very
> useful for Java.

You're welcome; when the dust settles, we will put out a user manual.

