Hello,

My name is Mark Harfouche. I'm a researcher and engineer focusing on productizing new computational optics tools for biology.

I have a question about data alignment within netcdf4 files. Is it possible to specify the alignment boundary for each chunk of data? Let's say we had an array of bytes, and we wanted that array to be aligned to boundaries of 128, 512, or even 4096 bytes. Is this possible in netcdf4?

It seemed like this might be possible through calls to nc__enddef,
https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
but I tried adjusting its parameters and they didn't seem to have the desired effect.

There seemed to be a post from 2014 discussing this, but I can't find the issue it references:
https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html

From my research, it seems like it should be possible through H5Pset_alignment/H5Pget_alignment:
https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html

I typically use netcdf4 through xarray, which in turn uses the netcdf4-python backend. The Python code below illustrates a typical problem we face, where the data ends up at an offset 6 bytes past a 64- or 128-byte boundary, which is not ideal in the many circumstances where performance is desired.
Thank you very much for your help,

Best,

Mark

```
from pathlib import Path

import h5py
import netCDF4
import numpy as np
import xarray as xr

basic_filename = "basic_file_netcdf4.nc"

# Start from a clean file on every run.
if Path(basic_filename).exists():
    Path(basic_filename).unlink()

dataset = xr.DataArray(
    np.zeros((3072, 3072), dtype='uint8'),
    dims=("y", "x"),
    coords={
        "y": np.arange(3072, dtype=int),
        "x": np.arange(3072, dtype=int),
    },
    name='images').to_dataset()
dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")

# Reopen the file with h5py to find where the raw data starts.
h5file = h5py.File(basic_filename, "r")
h5dataset = h5file.get("images")
offset = h5dataset.id.get_offset()

# Remainder of the offset with respect to each candidate boundary.
print(offset % 4096)
print(offset % 2048)
print(offset % 1024)
print(offset % 512)
print(offset % 128)
print(offset % 64)
"""
3206
1158
134
134
6
6
"""
```
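To make the modulo checks above easier to read, here is a minimal sketch of the same arithmetic using only the standard library. The helper names (`misalignment`, `padding_to_align`, `report`) are hypothetical, and `EXAMPLE_OFFSET` is not the real file offset: it is an assumed value picked only because it reproduces the remainders printed above.

```python
# Assumed offset: NOT read from a real file, only chosen to be consistent
# with the remainders reported in the output above.
EXAMPLE_OFFSET = 3206

# Boundaries (in bytes) checked in the original script.
BOUNDARIES = (4096, 2048, 1024, 512, 128, 64)


def misalignment(offset: int, boundary: int) -> int:
    """Number of bytes the offset sits past the previous multiple of boundary."""
    return offset % boundary


def padding_to_align(offset: int, boundary: int) -> int:
    """Number of bytes to skip to reach the next multiple of boundary (0 if aligned)."""
    return -offset % boundary


def report(offset: int, boundaries=BOUNDARIES) -> dict:
    """Map each boundary to a (misalignment, padding needed) pair."""
    return {
        b: (misalignment(offset, b), padding_to_align(offset, b))
        for b in boundaries
    }


if __name__ == "__main__":
    for boundary, (over, pad) in report(EXAMPLE_OFFSET).items():
        print(f"boundary {boundary:5d}: {over:4d} bytes past, {pad:4d} bytes of padding needed")
```

A dataset is aligned to a given boundary exactly when its misalignment is 0, so what I am looking for is a way to make every one of these remainders zero.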