
Re: [netcdf-java] newbie question on NetCDF file overhead (revisited)

Hi Jeff:

The latest development version of netcdf-java (4.5.0) does chunking and 
compression by default, using a more reasonable algorithm. So if you try your 
code below with that version, you should see much better results.
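This is a rough way to see why the chunking default matters for a file like this. A 1-D int64 variable is stored as a series of chunks. Each chunk is assumed here (not measured in this thread) to carry a fixed HDF5 indexing and bookkeeping cost on top of its payload. The plain-Java sketch below is independent of netcdf-java. It only counts chunks for 10000 records at a few hypothetical chunk lengths (1, 1024 and 10000), and it does not reproduce the chunking algorithm of any netcdf-java version.

```java
// Illustrative arithmetic only: how many chunks a 1-D int64 variable with
// `records` values is split into for a given chunk length along time.
// The chunk lengths in main() are hypothetical examples, not values that
// any netcdf-java version is known to pick.
class ChunkCount {
    static final long BYTES_PER_VALUE = 8; // one int64 (Java long) per record

    // Number of chunks needed: ceil(records / chunkLength)
    static long chunks(long records, long chunkLength) {
        return (records + chunkLength - 1) / chunkLength;
    }

    // Uncompressed payload carried by one full chunk
    static long bytesPerChunk(long chunkLength) {
        return chunkLength * BYTES_PER_VALUE;
    }

    public static void main(String[] args) {
        long records = 10_000;
        for (long len : new long[]{1, 1024, 10_000}) {
            System.out.printf("chunk length %5d -> %5d chunks of %6d bytes%n",
                    len, chunks(records, len), bytesPerChunk(len));
        }
    }
}
```

With a chunk length of 1, every 8-byte value becomes its own chunk, so any fixed per-chunk cost is paid 10000 times.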

Details are here:

http://www.unidata.ucar.edu/software/thredds/v4.5/netcdf-java/reference/netcdf4Clibrary.html#writing

John

On 5/1/2014 3:31 PM, Jeff Johnson - NOAA Affiliate wrote:
> Hi again-
> 
> I got some feedback that the size issue may be related either to the use 
> of String variables or to the layout of the schema. So I simplified the 
> test even further to take both of those factors out of the equation. The 
> test program below has only one variable, time, stored as a LONG along 
> the unlimited time dimension.
> 
> Here is the truncated ncdump of the output file:
> 
> netcdf output2 {
> dimensions:
>     time = UNLIMITED ; // (10000 currently)
> variables:
>     int64 time(time) ;
>         time:units = "milliseconds since 1970-01-01T00:00:00Z" ;
> data:
> 
>   time = 1398978611132, 1398978611133, 1398978611134, 1398978611135,
>      1398978611136, 1398978611137, 1398978611138, 1398978611139,
>      1398978611140, 1398978611141, 1398978611142, 1398978611143,
>      1398978611144, 1398978611145, 1398978611146, 1398978611147,
>      1398978611148, 1398978611149, 1398978611150, 1398978611151,
>      1398978611152, 1398978611153, 1398978611154, 1398978611155,
> ...
>      <thousands of lines removed>
> ...
>      1398978621104, 1398978621105, 1398978621106, 1398978621107,
>      1398978621108, 1398978621109, 1398978621110, 1398978621111,
>      1398978621112, 1398978621113, 1398978621114, 1398978621115,
>      1398978621116, 1398978621117, 1398978621118, 1398978621119,
>      1398978621120, 1398978621121, 1398978621122, 1398978621123,
>      1398978621124, 1398978621125, 1398978621126, 1398978621127,
>      1398978621128, 1398978621129, 1398978621130, 1398978621131 ;
> }
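The units attribute in the ncdump output says the int64 values are milliseconds since the Unix epoch, which java.time can decode directly. The minimal sketch below decodes the first and last values shown and confirms that they span 10000 consecutive milliseconds. The class name DecodeTime is illustrative.

```java
import java.time.Instant;

// Decodes values of the "time" variable using its units attribute,
// "milliseconds since 1970-01-01T00:00:00Z", i.e. Unix epoch milliseconds.
class DecodeTime {
    static Instant decode(long millisSinceEpoch) {
        return Instant.ofEpochMilli(millisSinceEpoch);
    }

    public static void main(String[] args) {
        long first = 1398978611132L; // first value in the ncdump output
        long last = 1398978621131L;  // last value in the ncdump output
        System.out.println(decode(first));    // 2014-05-01T21:10:11.132Z
        System.out.println(decode(last));     // 2014-05-01T21:10:21.131Z
        System.out.println(last - first + 1); // 10000 values, one per millisecond
    }
}
```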
> 
> The raw data is 8 bytes * 10000 records, or 80000 bytes. However, the 
> NetCDF-4 file created is 537872 bytes. This is 6.7x larger; overhead 
> makes up 85% of the file. :(  Hoping that the NetCDF format overhead just 
> stands out with small datasets, I did an additional run of 1M records. 
> The output file was 53.4MB, also 6.7x larger.
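The size figures above can be checked with a few lines of plain Java. In this sketch the class name is illustrative. "53.4MB" is assumed to mean 53.4 × 10^6 bytes, which is the reading consistent with the stated 6.7x. "85% overhead" is the share of the file that is not payload, (537872 - 80000) / 537872. Measured against the raw data instead, the file is about 572% larger.

```java
// Reproduces the size arithmetic from the message: raw payload versus
// the reported file sizes.
class OverheadMath {
    static final long BYTES_PER_VALUE = 8; // one int64 per record

    static long rawBytes(long records) {
        return records * BYTES_PER_VALUE;
    }

    // How many times larger the file is than its raw payload
    static double ratio(long fileBytes, long rawBytes) {
        return (double) fileBytes / rawBytes;
    }

    // Fraction of the file that is not raw payload
    static double overheadFraction(long fileBytes, long rawBytes) {
        return (double) (fileBytes - rawBytes) / fileBytes;
    }

    public static void main(String[] args) {
        long raw = rawBytes(10_000); // 80000
        long file = 537_872;         // reported size of output2.nc
        System.out.printf("raw=%d ratio=%.2f overhead=%.1f%%%n",
                raw, ratio(file, raw), 100 * overheadFraction(file, raw));
        // 1M-record run: 53.4MB reported, assumed here to mean 53.4e6 bytes
        System.out.printf("1M records ratio=%.2f%n",
                ratio(53_400_000L, rawBytes(1_000_000)));
    }
}
```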
> 
> I'm at a loss as to what the issue might be, unless this is just a fact 
> of life for NetCDF files? Any suggestions or insights appreciated!
> 
> jeff
> 
> ===
> import ucar.ma2.ArrayLong;
> import ucar.ma2.DataType;
> import ucar.ma2.InvalidRangeException;
> import ucar.nc2.*;
> 
> import java.io.IOException;
> import java.nio.file.FileSystems;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.List;
> 
> public class TestGenFile2 {
>    public static void main(String[] args) {
>      NetcdfFileWriter dataFile = null;
> 
>      try {
>        try {
> 
>          // define the file
>          String filePathName = "output2.nc";
> 
>          // delete the file if it already exists
>          Path path = FileSystems.getDefault().getPath(filePathName);
>          Files.deleteIfExists(path);
> 
>          // enter definition mode for this NetCDF-4 file
>          dataFile = NetcdfFileWriter.createNew(
>              NetcdfFileWriter.Version.netcdf4, filePathName);
> 
>          // get the root group (addGroup returns it when the parent is null)
>          Group rootGroup = dataFile.addGroup(null, null);
> 
>          // define dimensions, in this case only one: time
>          Dimension timeDim = dataFile.addUnlimitedDimension("time");
>          List<Dimension> dimList = new ArrayList<>();
>          dimList.add(timeDim);
> 
>          // define variables
>          Variable time = dataFile.addVariable(rootGroup, "time",
>              DataType.LONG, dimList);
>          dataFile.addVariableAttribute(time, new Attribute("units",
>              "milliseconds since 1970-01-01T00:00:00Z"));
> 
>          // create the file
>          dataFile.create();
> 
>          // create a 1-element, 1-D array to hold one time value per write
>          ArrayLong timeArray = new ArrayLong.D1(1);
> 
>          int[] origin = new int[]{0};
>          long startTime = 1398978611132L;
> 
>          // write the records to the file
>          for (int i = 0; i < 10000; i++) {
>            // load data into array variables
>            timeArray.set(timeArray.getIndex(), startTime++);
> 
>            origin[0] = i;
> 
>            // write a record
>            dataFile.write(time, origin, timeArray);
>          }
>        } finally {
>          if (null != dataFile) {
>            // close the file
>            dataFile.close();
>          }
>        }
>      } catch (IOException | InvalidRangeException e) {
>        e.printStackTrace();
>      }
>    }
> }
> 
> -- 
> Jeff Johnson
> DSCOVR Ground System Development
> Space Weather Prediction Center
> jeff.m.johnson@xxxxxxxx
> 
> 
> 
> _______________________________________________
> netcdf-java mailing list
> netcdf-java@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit: 
> http://www.unidata.ucar.edu/mailing_lists/
> 


