Re: [netcdf-java] Performance Issues and Buffering

Hi Robin,

> the processing I’m doing is writing to several different
> NetCDF files, multiple variables a row at a time.

So if I understand you, your write pattern looks something like:

Write row 0 for varA
Write row 0 for varB
Write row 0 for varC
Write row 1 for varA
Write row 1 for varB
Write row 1 for varC
etc...

Is that correct? If so, you are writing the contents of a file
*non-sequentially*, because a variable's data is laid out contiguously in
netcdf (unless it's chunked). Non-sequential (aka "random") I/O is always
going to be slower than sequential I/O, at least if you're writing to
spinning disks.
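
To make the cost concrete, here is a rough Java sketch that adds up how far the file position has to jump under the two write orders. It is a simplified model, not the library's actual I/O path: it assumes every variable is stored contiguously, one after the other, and it ignores the file header. The class and method names are made up for illustration. With the dimensions from your spec, `rowBytes` would be 512 * 2 for `short` data and `rows` would be 45000.

```java
// Simplified model: variables stored back to back, each one contiguous,
// header ignored. All names here are hypothetical.
final class WriteOrderDemo {

    /** Byte offset of one row of variable number v in the assumed layout. */
    static long rowOffset(int v, long row, long rows, long rowBytes) {
        return v * rows * rowBytes + row * rowBytes;
    }

    /** Bytes skipped between writes when each row is written for every variable in turn. */
    static long interleavedSeekBytes(int numVars, long rows, long rowBytes) {
        long pos = 0;
        long seek = 0;
        for (long r = 0; r < rows; r++) {
            for (int v = 0; v < numVars; v++) {
                long off = rowOffset(v, r, rows, rowBytes);
                seek += Math.abs(off - pos);  // distance from the end of the previous write
                pos = off + rowBytes;         // position after this write
            }
        }
        return seek;
    }

    /** Bytes skipped between writes when each variable is written completely before the next. */
    static long sequentialSeekBytes(int numVars, long rows, long rowBytes) {
        long pos = 0;
        long seek = 0;
        for (int v = 0; v < numVars; v++) {
            for (long r = 0; r < rows; r++) {
                long off = rowOffset(v, r, rows, rowBytes);
                seek += Math.abs(off - pos);
                pos = off + rowBytes;
            }
        }
        return seek;
    }
}
```

In this model the variable-at-a-time order never has to skip, while the interleaved order jumps across nearly a whole variable on every write.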

Do sequential I/O if you can. If that's not possible, the C library offers
a way to map a dataset into memory [1]. That'll make those random writes
much, much faster. Unfortunately, we don't currently provide a way to
access that feature from NetCDF-Java.
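
If fully sequential writes are not an option, buffering rows (as you already considered) at least turns many small writes into fewer, larger contiguous ones. Below is a minimal sketch of that idea for a fixed-size rows x columns variable of type `short`. `BlockSink` and `RowBuffer` are hypothetical names, not part of NetCDF-Java; the sink stands in for whatever call actually writes a block at a given origin, such as your existing `writeVariable` wrapper.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the real write call (origin and shape are {row, column}). */
interface BlockSink {
    void write(int[] origin, int[] shape, short[] values);
}

/** Collects rows in memory and hands them to the sink as one contiguous block. */
final class RowBuffer {
    private final int columns;
    private final int capacityRows;
    private final short[] buffer;
    private final BlockSink sink;
    private int firstRow = 0;      // row index in the file of the first buffered row
    private int bufferedRows = 0;  // number of rows currently held

    RowBuffer(int columns, int capacityRows, BlockSink sink) {
        this.columns = columns;
        this.capacityRows = capacityRows;
        this.buffer = new short[columns * capacityRows];
        this.sink = sink;
    }

    /** Appends one row; flushes automatically once the buffer is full. */
    void addRow(short[] row) {
        if (row.length != columns) {
            throw new IllegalArgumentException("expected " + columns + " values per row");
        }
        System.arraycopy(row, 0, buffer, bufferedRows * columns, columns);
        bufferedRows++;
        if (bufferedRows == capacityRows) {
            flush();
        }
    }

    /** Writes any buffered rows as a single block; call once more at the end of the run. */
    void flush() {
        if (bufferedRows == 0) {
            return;
        }
        sink.write(new int[] {firstRow, 0},
                   new int[] {bufferedRows, columns},
                   Arrays.copyOf(buffer, bufferedRows * columns));
        firstRow += bufferedRows;
        bufferedRows = 0;
    }
}
```

One buffer per variable keeps each variable's rows together, so every flush covers a single contiguous region of that variable. The memory cost is roughly columns x capacityRows x 2 bytes per buffered `short` variable.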

Cheers,
Christian

[1]
http://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga427f5a0b24f1d426a99bcc37b8a39cac
(look for "NC_DISKLESS")

On Wed, Jun 29, 2016 at 1:45 AM, Robin Moss <robin.moss@xxxxxxxxxxxxxx>
wrote:

> Sorry, let me add some additional information.
>
>
>
> I have been given a product specification with several different files,
> but the overall gist of each file is as follows (the data type varies from
> byte to long):
>
>   <dimension name="columns" length="512"/>
>   <dimension name="rows" length="45000"/>
>   <dimension name="orphan_pixels" length="1" isUnlimited="true"/>
>
>   <variable name="var1" shape="rows columns" type="short">
>   <variable name="var1_orphan" shape="orphan_pixels" type="short">
>   <variable name="var2" shape="rows columns" type="short">
>   <variable name="var2_orphan" shape="orphan_pixels" type="short">
>
>
>
> For instance, at most we are writing to 18 files sequentially (currently
> only 13). Note that I did try to use the NetCDF-Java library with multiple
> threads, but it causes a segfault (
> https://github.com/Unidata/thredds/issues/577). At its fastest we’ve been
> seeing data reach the writers every couple of milliseconds; we then
> convert the data arrays (stored in lists) into NetCDF-Java Arrays and
> go on to write them:
>
>
>
>   public <T> void writeData(String internalName, List<T> data, int[] shape,
>           int[] origin, Class<T> type) {
>         // move data into netcdf data shape
>         Array rawData = Array.factory(type, shape);
>
>         for (int i = 0; i < data.size(); i++) {
>             rawData.setObject(i, data.get(i));
>         }
>         this.writeData(internalName, rawData, origin);
>     }
>
>
>
>    public void writeVariable(String name, int[] origin, Array values)
>            throws IOException, InvalidRangeException {
>         LOG.trace("Writing Variable {} to netcdf file", name);
>
>         Variable var = netcdfFileWriter.findVariable(name);
>         Objects.requireNonNull(var,
>                 String.format("Variable with name: %s cannot be found", name));
>         this.netcdfFileWriter.write(var, origin, values);
>     }
>
>
>
> What we then see when sampling with VisualVM (I'm working on another
> profiler so I can get average call times) is that
> `ucar.nc2.jni.netcdf.Nc4Iosp.writeData()` takes a lot of time to run.
>
>
>
> Hope this helps clarify my situation
>
>
>
> *From:* Bob Simons - NOAA Federal [mailto:bob.simons@xxxxxxxx]
> *Sent:* 28 June 2016 16:32
> *To:* Robin Moss
> *Cc:* netcdf-java@xxxxxxxxxxxxxxxx
> *Subject:* Re: [netcdf-java] Performance Issues and Buffering
>
>
>
> You don't say *how* you are writing the data, other than "a row at a
> time".
>
>
>
> Is the row dimension an unlimited dimension? (That is what I would
> recommend trying.)
>
>
>
> Or have you pre-allocated space in the variables and are now writing data
> into that space?
>
>
>
> Or are you reading the entire file, adding one row of data, then writing
> the entire file? (That is bound to be slow when the number of rows gets
> larger.)
>
>
>
>
>
>
>
>
>
> On Tue, Jun 28, 2016 at 1:26 AM, Robin Moss <robin.moss@xxxxxxxxxxxxxx>
> wrote:
>
> Hello,
>
>
>
> I’m hoping I can get some pointers to improve the way I’m using the NetCDF
> library.
>
>
>
> At the moment the processing I’m doing is writing to several different
> NetCDF files, multiple variables a row at a time. These are not currently
> multi-threaded.
>
>
>
> When the processed data is small (100s of rows) I don’t see any issues;
> however, when I start running a bigger chain (10s of thousands of rows) I
> see the performance of NetCDF-Java plummet. A quick look at what’s happening
> with VisualVM shows that most of my application’s time (~60%) is spent in
> `Nc4Iosp.writeData()`.
>
>
>
> Which leads me to believe I’m using the library wrong :). My initial
> thought, having worked with the C library directly before, was to adjust the
> write buffer, but I don’t see any support for that in the Java lib, and
> considering it would likely affect the C lib I’m not sure it would help
> with the write data call.
>
>
>
> I had briefly looked into just buffering my rows so I write every 10-100
> rows to see what effect that would have on performance and memory usage,
> however I hit a bit of an issue with the variables that have an unlimited
> dimension of columns (most variables I have are row x column), in that I
> was unable to figure out how to create an Array that supported unlimited
> dimensions.
>
>
>
> We currently use the NetcdfFileWriter to write data to the underlying
> NetCDF 4 files. I know the API suggests using FileWriter2, but I
> couldn’t see a way to use it that also allowed us to ‘stream’ data into
> the underlying files.
>
>
>
> Any suggestions would be greatly appreciated.
>
>
>
> Thanks,
>
> Robin
>
>
>
>
>
>
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web.  Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdf-java mailing list
> netcdf-java@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
>
>
>
>
> --
>
> Sincerely,
>
> Bob Simons
> IT Specialist
> Environmental Research Division
> NOAA Southwest Fisheries Science Center
> 99 Pacific St., Suite 255A      (New!)
> Monterey, CA 93940               (New!)
> Phone: (831)333-9878            (New!)
>
> Fax:   (831)648-8440
> Email: bob.simons@xxxxxxxx
>
> The contents of this message are mine personally and
> do not necessarily reflect any position of the
> Government or the National Oceanic and Atmospheric Administration.
> <>< <>< <>< <>< <>< <>< <>< <>< <><