
RE: 19990929: Write/Read of 2D pages

> >So you set   origin = {pageno, 0, 0}
> >             shape  = {1, nx, ny}
> >
> >and you get a 2D MultiArray of dimension nx by ny.
> >At this point, the data is memory resident, and you could access the
> >data using
> >     data.getDouble(index);
> >which would be my recommendation, since it avoids further data
> >copying.
> >
> >However, if you need to pass a double array to some other routine, you
> >can use toArray() to get a Java array, but it is a 1D, not a 2D,
> >array.  I think in fact that it should construct the 2D array, but
> >currently it doesn't.
> >
> >So you would unfortunately have to transfer this into a 2D array
> >yourself, in which case you might as well loop over the
> >data.getDouble(index) call rather than use toArray().
>
> Thanks for this clarification.  If I use the data.getDouble()
> approach, I still need to loop to fill the desired 2D array.  However,
> I don't think this is a big deal; see my comments below.
>
> I agree wholeheartedly with your comment about toArray(); to me, it
> should be able to send back a multi-dimensional array.  Can you
> suggest this?

Yes, I have, and I will.
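
In the meantime, reshaping the 1D result of toArray() by hand only
takes a few lines.  A minimal sketch, assuming toArray() hands back the
page values in row-major (C) order, which is how netCDF lays them out
(ma, nx, and ny are the names from the discussion above):

        double[] vals = (double[]) ma.toArray();   // 1D, length nx*ny
        double[][] page = new double[nx][ny];
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                page[i][j] = vals[i*ny + j];       // row-major: row i, column j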

>
> I haven't tested the speed of looping with getDouble() versus parsing
> the contents of the vector returned from toArray().  I would think it
> really doesn't matter; to get a 2D array, I will need two loops no
> matter which method I use.  As long as I am getting the data from the
> file into memory, after that point the I/O speed is not an issue.

It turns out that in the current implementation getDouble() may be
slower than you expect; I'd time it if you are doing a lot of data
movement.  (So I withdraw my unqualified recommendation of getDouble()
over toArray().)
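
If you do want to time it, something along these lines would do.  This
is only a rough sketch using the calls discussed above, with exception
handling omitted and page[][] as in the earlier reshape sketch:

        long t0 = System.currentTimeMillis();
        for (int i = 0; i < nx; i++)
            for (int j = 0; j < ny; j++)
                page[i][j] = ma.getDouble(new int[] {0, i, j});
        long t1 = System.currentTimeMillis();

        double[] vals = (double[]) ma.toArray();   // then copy vals into page
        long t2 = System.currentTimeMillis();

        System.out.println("getDouble loop: " + (t1 - t0) + " ms, "
                           + "toArray: " + (t2 - t1) + " ms");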

>
> I got the 3D block solution for writing "pages" of a 2D array running
> over the weekend.  Remember that I said the single-element approach
> originally coded took around 10 minutes or so to write the data?  With
> the full 3D approach of reading all the pages into a 3D array, and
> then writing that array to the disk file, it takes around 2 seconds.
> Just blows it out.  How's that for time savings?

I like it.
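
For the record, the one-shot version you describe might look roughly
like this; just a sketch, assuming a 3D netCDF variable of shape
{npages, nx, ny} and row-major packing:

        double[] all = new double[npages * nx * ny];
        // ... fill all[]: page p goes at offset p*nx*ny, row-major ...
        MultiArray ma = new MultiArrayImpl(new int[] {npages, nx, ny}, all);
        netcdfVar.copyin(new int[] {0, 0, 0}, ma);   // one bulk write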

>
> However, I won't be able to implement the full 3D approach, because in
> some instances the number of "pages" could be quite large, and I don't
> want to create a gigantic 3D array just to write the data out.  Here's
> what I plan to do for the production code:
>
> 1. The 2D "pages" are actually a derived type called a Matrix (see
> http://math.nist.gov/javanumerics/jama/).  These are stored in a
> "scratch" netCDF file on a page-by-page basis.  The JAMA package has a
> utility method to provide a copy or clone of the Matrix in type
> double.
>
> 2. When I want to write the "save" file, I will loop on each "page",
> get the Matrix from the scratch file, and copy it to a 2D double
> array.  (Right now, I'm also taking the 2D array and putting it into
> the 3D array at a specific "page".)
>
> 3. Having the 2D array in memory, I will then write each block to the
> "save" netCDF file.  The positioning is controlled by the first
> element in the origin array.  I have tested this scheme and it works
> fine.  For instance, if nrows is the first dimension of the 2D array,
> and ipage is the "page" position, then the origin array would be set
> to {ipage*nrows, 0}, or origin[0] = ipage*nrows.
>
> I'm hoping the disk access to read a page from the scratch file, and
> then write it back to the save file, will not be too much of a time
> waster.  If it is, then I will probably try to internally "buffer" the
> process by setting up a temporary array several pages long, read in
> several 2D blocks, and then write out the long 2D array.  I can also
> get the 2D arrays as vectors, so I could do something with this
> approach as well.

I looked quickly at the Jama docs. The approach that occurs to me (that
you are probably referring to) is:

   reading from netcdf:
        MultiArray ma = netcdfVar.copyout(new int[] {pageno, 0, 0},
                                          new int[] {1, nx, ny});
        double[] vals = (double[]) ma.toArray();
        Matrix matrix = new Matrix(vals, nx);   // nx or ny ??

   writing to netcdf:
        double[] vals = matrix.getColumnPackedCopy();
        MultiArray ma = new MultiArrayImpl(new int[] {1, nx, ny}, vals);
        netcdfVar.copyin(new int[] {pageno, 0, 0}, ma);

This might be significantly faster.
Note that I'm not sure whether Matrix wants nx or ny; part of the
confusion is that it wants a Fortran column-major 1D array.  In any
case, this shouldn't be a problem - just switch the meaning of nx and
ny if need be.
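
To make the ordering concrete, here is a small example; this is my
reading of the Jama docs, so check it against your data:

        // a 2 x 3 matrix:
        //    1  2  3
        //    4  5  6
        double[][] a = { {1, 2, 3}, {4, 5, 6} };
        Matrix m = new Matrix(a);
        double[] packed = m.getColumnPackedCopy();   // {1, 4, 2, 5, 3, 6}
        Matrix back = new Matrix(packed, 2);         // 2 = number of rows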

In terms of netCDF efficiency, just remember that disk reading is done
when you copyin/copyout; your cost is in units of disk blocks
(typically 4K), and contiguous data will reduce the number of disk
blocks.  So if nx * ny is reasonably large, this page-oriented access
seems not too bad.
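
Putting the pieces together, your production loop (steps 2 and 3 above)
might look roughly like this.  Only a sketch: it assumes a 3D scratch
variable of shape {npages, nx, ny}, a 2D save variable with pages
stacked along the first dimension as you describe (with nrows = nx),
and made-up variable names:

        for (int ipage = 0; ipage < npages; ipage++) {
            // read one page from the scratch file
            MultiArray ma = scratchVar.copyout(new int[] {ipage, 0, 0},
                                               new int[] {1, nx, ny});
            double[] vals = (double[]) ma.toArray();
            // ... build the Jama Matrix here if you need to compute on it ...
            // write the page into the 2D save variable at row ipage*nx
            MultiArray out = new MultiArrayImpl(new int[] {nx, ny}, vals);
            saveVar.copyin(new int[] {ipage * nx, 0}, out);
        }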

>
> Thanks for all your help.  The netCDF package is pretty intimidating
> when you first get into it, but after you mess with it a little, it's
> not so bad.  Are you planning on putting out a manual for Java?  The C
> manual was helpful, but its function calls were obviously not very
> useful for Java.

You're welcome; when the dust settles, we will put out a user manual.

