Re: [netcdf-java] newbie question on NetCDF file overhead

Hi Jeff,

It looks like you are writing netCDF-4 files. Given that, I believe the
problem is the default chunking used when an unlimited dimension is
involved. An old conversation on the netcdfgroup email list [1], which
references a netCDF-C chunking document [2], suggests that each record
along the unlimited dimension is stored as a single chunk. Since each of
your records is so small, the per-chunk overhead adds up and is likely
the culprit behind the larger-than-expected files you are seeing.
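
As a rough check of that explanation, the figures reported in the test
program below can be turned into a per-chunk overhead estimate. This is
only a sketch: it assumes one chunk per record for each of the four
variables (the guess above, not a confirmed fact), and the class and
method names are made up for illustration.

```java
public class ChunkOverheadEstimate {

    /** Average non-data bytes per chunk, given file size, raw payload size, and chunk count. */
    public static double overheadPerChunk(long fileBytes, long rawBytes, long chunkCount) {
        if (chunkCount <= 0) {
            throw new IllegalArgumentException("chunkCount must be positive");
        }
        return (double) (fileBytes - rawBytes) / chunkCount;
    }

    public static void main(String[] args) {
        long records = 10_000;                    // records written by the test program
        long variables = 4;                       // time, bx, by, bz
        long rawBytes = records * (24 + 3 * 2);   // 24-char timestamp + three 2-byte values
        long fileBytes = 2_420_656;               // file size reported in the original message

        // Assumption: every record of every variable ends up in its own chunk.
        long chunks = records * variables;

        System.out.println("raw bytes          : " + rawBytes);
        System.out.println("file / raw ratio   : " + (double) fileBytes / rawBytes);
        System.out.println("overhead per chunk : " + overheadPerChunk(fileBytes, rawBytes, chunks));
    }
}
```

Under that assumption the estimate works out to roughly 53 bytes of
overhead per chunk, compared with 2 bytes of payload for each short
value.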

In your code below, try the version of NetcdfFileWriter.createNew()
that takes an Nc4Chunking parameter. The easiest approach looks to be
the "standard" strategy

> Nc4ChunkingStrategyImpl.factory(Nc4Chunking.Strategy.standard, 0, false)

and add "_ChunkSize" attributes to your variables, each with a single
integer value since you only have one dimension. I'm not a chunking
expert, but maybe start with a value of 2000. I've included links to a
few blog posts on chunking and compression ([3], [4], and [5]) that
discuss choosing chunk sizes.
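
To get a feel for how much a larger chunk length cuts the number of
chunks, here is a small stand-alone calculation. It does not touch the
netCDF-Java API; the class and method names are hypothetical, and the
candidate chunk lengths other than 2000 are arbitrary.

```java
public class ChunkCount {

    /** Number of chunks needed to hold `records` values at `chunkLength` values per chunk. */
    public static long chunksNeeded(long records, long chunkLength) {
        if (records < 0 || chunkLength <= 0) {
            throw new IllegalArgumentException("records must be >= 0 and chunkLength > 0");
        }
        // Ceiling division: a partly filled last chunk still counts as a chunk.
        return (records + chunkLength - 1) / chunkLength;
    }

    public static void main(String[] args) {
        long records = 10_000; // record count from the test program
        for (long chunkLength : new long[] {1, 100, 2_000, 10_000}) {
            System.out.println("chunk length " + chunkLength + " -> "
                    + chunksNeeded(records, chunkLength) + " chunks per variable");
        }
    }
}
```

For 10000 records, a chunk length of 2000 means 5 chunks per variable
instead of 10000, so any fixed per-chunk cost is paid far less often.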

Hope that helps.

Ethan


[1] https://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2012/msg00005.html

[2] http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/nc_005fdef_005fvar_005fchunking.html#nc_005fdef_005fvar_005fchunking

[3] http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters

[4] http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes

[5] http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression

On 4/30/2014 5:09 PM, Jeff Johnson - NOAA Affiliate wrote:
> Sorry, correction - raw data = 300000 bytes, so NetCDF is 8x larger.
> 
> jeff
> 
> 
> On Wed, Apr 30, 2014 at 4:37 PM, Jeff Johnson - NOAA Affiliate
> <jeff.m.johnson@xxxxxxxx> wrote:
> 
>     Hi all-
> 
>     I'm working on generating my first NetCDF files and have a question.
>     The files I'm creating seem to be far larger than I would have
>     thought necessary to hold the given data. I'm wondering if there is
>     something I can do to trim this down a bit.
> 
>     Our data is simple time-series data (one unlimited dimension). Below
>     is a simple Java test program that generates a file with 10000
>     records, each of which contains a 24-character timestamp string and
>     three 2-byte values. This gives a raw data requirement of 30000
>     bytes. The generated NetCDF file is 2420656 bytes, or 80x larger. Is
>     this what is expected?  In my development with real data I'm seeing
>     7MB of data creating an 86MB NetCDF file, etc. It seems to settle
>     out at about 12x as the data sets grow, which is still pretty
>     onerous. Any insights or suggestions appreciated.
> 
>     package gov.noaa.swpc.solarwind;
> 
>     import org.joda.time.DateTime;
>     import ucar.ma2.ArrayShort;
>     import ucar.ma2.ArrayString;
>     import ucar.ma2.DataType;
>     import ucar.ma2.InvalidRangeException;
>     import ucar.nc2.*;
> 
>     import java.io.IOException;
>     import java.nio.file.FileSystems;
>     import java.nio.file.Files;
>     import java.nio.file.Path;
>     import java.util.ArrayList;
>     import java.util.List;
> 
>     public class TestGenFile {
>       public static void main(String[] args) {
>         DateTime startDate = new DateTime();
>         DateTime endDate = startDate.plusDays(1);
> 
>         NetcdfFileWriter dataFile = null;
> 
>         try {
>           try {
> 
>             // define the file
>             String filePathName = "output.nc";
> 
>             // delete the file if it already exists
>             Path path = FileSystems.getDefault().getPath(filePathName);
>             Files.deleteIfExists(path);
> 
>             // enter definition mode for this NetCDF-4 file
>             dataFile =
>     NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf4,
>     filePathName);
> 
>             // create the root group
>             Group rootGroup = dataFile.addGroup(null, null);
> 
>             // define the global attributes
>             dataFile.addGroupAttribute(rootGroup, new
>     Attribute("startDate", startDate.toString()));
>             dataFile.addGroupAttribute(rootGroup, new
>     Attribute("endDate", endDate.toString()));
> 
>             // define dimensions, in this case only one: time
>             Dimension timeDim = dataFile.addUnlimitedDimension("time");
>             List<Dimension> dimList = new ArrayList<>();
>             dimList.add(timeDim);
> 
>             // define variables
>             Variable time = dataFile.addVariable(rootGroup, "time",
>     DataType.STRING, dimList);
>             dataFile.addVariableAttribute(time, new
>     Attribute("standard_name", "time"));
> 
>             Variable bx = dataFile.addVariable(rootGroup, "bx",
>     DataType.SHORT, dimList);
>             dataFile.addVariableAttribute(bx, new Attribute("long_name",
>     "IMF Bx"));
>             dataFile.addVariableAttribute(bx, new Attribute("units",
>     "raw counts"));
> 
>             Variable by = dataFile.addVariable(rootGroup, "by",
>     DataType.SHORT, dimList);
>             dataFile.addVariableAttribute(by, new Attribute("long_name",
>     "IMF By"));
>             dataFile.addVariableAttribute(by, new Attribute("units",
>     "raw counts"));
> 
>             Variable bz = dataFile.addVariable(rootGroup, "bz",
>     DataType.SHORT, dimList);
>             dataFile.addVariableAttribute(bz, new Attribute("long_name",
>     "IMF Bz"));
>             dataFile.addVariableAttribute(bz, new Attribute("units",
>     "raw counts"));
> 
>             // create the file
>             dataFile.create();
> 
>             // create 1-D arrays to hold data values (time is the dimension)
>             ArrayString timeArray = new ArrayString.D1(1);
>             ArrayShort.D1 bxArray = new ArrayShort.D1(1);
>             ArrayShort.D1 byArray = new ArrayShort.D1(1);
>             ArrayShort.D1 bzArray = new ArrayShort.D1(1);
> 
>             int[] origin = new int[]{0};
> 
>             // write the records to the file
>             for (int i = 0; i < 10000; i++) {
>               // load data into array variables
>               timeArray.setObject(timeArray.getIndex(), new
>     DateTime().toString());
>               bxArray.set(0, (short) i);
>               byArray.set(0, (short) (i * 2));
>               bzArray.set(0, (short) (i * 3));
> 
>               origin[0] = i;
> 
>               // write a record
>               dataFile.write(time, origin, timeArray);
>               dataFile.write(bx, origin, bxArray);
>               dataFile.write(by, origin, byArray);
>               dataFile.write(bz, origin, bzArray);
>             }
>           } finally {
>             if (null != dataFile) {
>               // close the file
>               dataFile.close();
>             }
>           }
>         } catch (IOException | InvalidRangeException e) {
>           e.printStackTrace();
>         }
>       }
>     }
> 
>     thanks,
>     jeff
> 
>     -- 
>     Jeff Johnson
>     DSCOVR Ground System Development
>     Space Weather Prediction Center
>     jeff.m.johnson@xxxxxxxx
> 
> 
> 
> 
> -- 
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx
> 303-497-6260
> 
> 
> _______________________________________________
> netcdf-java mailing list
> netcdf-java@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit: 
> http://www.unidata.ucar.edu/mailing_lists/ 
> 

-- 
Ethan Davis                                       UCAR Unidata Program
edavis@xxxxxxxxxxxxxxxx                    http://www.unidata.ucar.edu


