[netcdf-java] newbie question on NetCDF file overhead (revisited)

Hi again-

I got some feedback that the size issue may either be related to the use of
String variables or the layout of the schema. So, I tried simplifying even
further to take both of those factors out of the equation. The test program
below has only one variable, the time dimension as a LONG.

Here is the truncated ncdump of the output file:

netcdf output2 {
dimensions:
time = UNLIMITED ; // (10000 currently)
variables:
int64 time(time) ;
time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
data:

 time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
    1398978611136, 1398978611137, 1398978611138, 1398978611139,
    1398978611140, 1398978611141, 1398978611142, 1398978611143,
    1398978611144, 1398978611145, 1398978611146, 1398978611147,
    1398978611148, 1398978611149, 1398978611150, 1398978611151,
    1398978611152, 1398978611153, 1398978611154, 1398978611155,
...
    <thousands of lines removed>
...
    1398978621104, 1398978621105, 1398978621106, 1398978621107,
    1398978621108, 1398978621109, 1398978621110, 1398978621111,
    1398978621112, 1398978621113, 1398978621114, 1398978621115,
    1398978621116, 1398978621117, 1398978621118, 1398978621119,
    1398978621120, 1398978621121, 1398978621122, 1398978621123,
    1398978621124, 1398978621125, 1398978621126, 1398978621127,
    1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
}

The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the
NetCDF-4 file created is 537872 bytes. That's 6.7x larger, or 85%
overhead. :(  Hoping that the NetCDF format overhead just stands out
with small datasets, I did an additional run with 1M records. The output
file was 53.4MB, also 6.7x larger.
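Just to spell out the arithmetic behind those numbers (a plain calculation, nothing NetCDF-specific):

```java
// Sanity check on the sizes quoted above: 10000 int64 records vs. the
// observed netCDF-4 file size.
public class OverheadCheck {
  public static void main(String[] args) {
    long rawBytes = 8L * 10000;    // 8 bytes per int64 * 10000 records
    long fileBytes = 537872L;      // observed size of output2.nc
    double ratio = (double) fileBytes / rawBytes;
    double overheadPct = 100.0 * (fileBytes - rawBytes) / fileBytes;
    // prints: ratio=6.7x overhead=85%
    System.out.printf("ratio=%.1fx overhead=%.0f%%%n", ratio, overheadPct);
  }
}
```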

I'm at a loss as to what the issue might be, unless this is just a fact of
life for NetCDF files? Any suggestions or insights appreciated!

jeff

===
import ucar.ma2.ArrayLong;
import ucar.ma2.DataType;
import ucar.ma2.InvalidRangeException;
import ucar.nc2.*;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TestGenFile2 {
  public static void main(String[] args) {
    NetcdfFileWriter dataFile = null;

    try {
      try {

        // define the file
        String filePathName = "output2.nc";

        // delete the file if it already exists
        Path path = FileSystems.getDefault().getPath(filePathName);
        Files.deleteIfExists(path);

        // enter definition mode for this NetCDF-4 file
        dataFile = NetcdfFileWriter.createNew(
            NetcdfFileWriter.Version.netcdf4, filePathName);

        // create the root group
        Group rootGroup = dataFile.addGroup(null, null);

        // define dimensions, in this case only one: time
        Dimension timeDim = dataFile.addUnlimitedDimension("time");
        List<Dimension> dimList = new ArrayList<>();
        dimList.add(timeDim);

        // define variables
        Variable time = dataFile.addVariable(rootGroup, "time",
            DataType.LONG, dimList);
        dataFile.addVariableAttribute(time, new Attribute("units",
            "milliseconds since 1970-01-01T00:00:00Z"));

        // create the file
        dataFile.create();

        // create 1-D arrays to hold data values (time is the dimension)
        ArrayLong timeArray = new ArrayLong.D1(1);

        int[] origin = new int[]{0};
        long startTime = 1398978611132L;

        // write the records to the file
        for (int i = 0; i < 10000; i++) {
          // load data into array variables
          timeArray.set(timeArray.getIndex(), startTime++);

          origin[0] = i;

          // write a record
          dataFile.write(time, origin, timeArray);
        }
      } finally {
        if (null != dataFile) {
          // close the file
          dataFile.close();
        }
      }
    } catch (IOException | InvalidRangeException e) {
      e.printStackTrace();
    }
  }
}
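
In case anyone wants to poke at this, here are two variations worth trying (untested sketches, not verified results — `Nc4Chunking`, `Nc4ChunkingStrategy`, and the three-argument `createNew` overload are my reading of the netcdf-java API docs, so treat those names as assumptions). The fragment refers to `dataFile`, `filePathName`, and `time` from the program above:

```java
// Untested sketch: variations on the program above, to separate
// write-pattern effects from chunking effects.

// (1) Pass an explicit chunking strategy when creating the file, instead
//     of whatever default applies to an unlimited dimension:
Nc4Chunking chunker =
    Nc4ChunkingStrategy.factory(Nc4Chunking.Strategy.standard, 0, false);
dataFile = NetcdfFileWriter.createNew(
    NetcdfFileWriter.Version.netcdf4, filePathName, chunker);

// (2) Buffer all 10000 values and write them in a single call, replacing
//     the 10000 one-record writes in the loop above:
ArrayLong.D1 allTimes = new ArrayLong.D1(10000);
long t = 1398978611132L;
for (int i = 0; i < 10000; i++) {
  allTimes.set(i, t++);
}
dataFile.write(time, new int[]{0}, allTimes);
```

Even if neither changes the file size, the comparison should at least narrow down whether the overhead comes from the write pattern or from the chunking defaults.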

-- 
Jeff Johnson
DSCOVR Ground System Development
Space Weather Prediction Center
jeff.m.johnson@xxxxxxxx