Re: [netcdfgroup] Hello and a question on data alignment

Howdy Mark!

As you suggest, adding control over alignment would be great. Can you
submit a PR with changes to the library to support it?

Ed Hartnett

On Sat, Jan 8, 2022 at 5:30 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
wrote:

> Hello,
>
> My name is Mark Harfouche, I'm a researcher and engineer focusing on
> productizing new computational optics tools for biology.
>
> I had a question about data-alignment within netcdf4 files.
>
> Is it possible to specify the alignment boundary for each chunk of data?
> Let's say we had an array of bytes, and we wanted that array to be
> aligned to boundaries of 128, 512, or even 4096 bytes.
> Is this possible in netcdf4?
> It seemed like this might be possible through calls to nc__enddef,
> https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
> but I tried to adjust those and they didn't seem to have the desired effect.
>
> There seemed to be a post from 2014 discussing this, but I can't find the
> referenced issue
> https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html
>
> From my research, it seems like it should be possible to do through
> H5Pset/get_alignment
> https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html
>
> I typically use netcdf4 through xarray, which in turn uses the
> netcdf4-python backend.
>
> The Python code below illustrates a typical problem we face, where the data
> ends up 6 bytes past a 64- or 128-byte boundary, which is far from ideal in
> the many circumstances where performance matters.
>
> Thank you very much for your help,
>
> Best,
>
> Mark
>
> ```
> import xarray as xr
> import numpy as np
> import netCDF4
> from pathlib import Path
>
> basic_filename = "basic_file_netcdf4.nc"
> if Path(basic_filename).exists():
>     Path(basic_filename).unlink()
>
> dataset = xr.DataArray(
>     np.zeros((3072, 3072), dtype='uint8'),
>     dims=("y", "x"),
>     coords={
>         "y": np.arange(3072, dtype=int),
>         "x": np.arange(3072, dtype=int),
>     },
>     name='images').to_dataset()
>
> dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")
>
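> import h5py
>
> # Reopen the file with h5py to read the byte offset of the raw data.
> # get_offset() returns None for chunked, compact, or empty datasets.
> with h5py.File(basic_filename, "r") as h5file:
>     offset = h5file["images"].id.get_offset()
>
> # Remainder of the offset at each candidate alignment boundary
> for boundary in (4096, 2048, 1024, 512, 128, 64):
>     print(offset % boundary)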
>
> """
> 3206
> 1158
> 134
> 134
> 6
> 6
> """
> ```
>
>
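
The remainders printed in the quoted message can be cross-checked with a little modular arithmetic. The sketch below is only an illustration: the helper names `misalignment` and `padding_to_boundary` are hypothetical, and the value 3206 stands in for the real file offset, which the message does not give (only its remainder modulo 4096 is known). Because every other boundary in the list divides 4096, that stand-in reproduces the same remainders.

```python
# Hypothetical helpers for reasoning about alignment; not part of netCDF or HDF5.

def misalignment(offset: int, boundary: int) -> int:
    """Bytes by which `offset` overshoots the previous multiple of `boundary`."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    return offset % boundary


def padding_to_boundary(offset: int, boundary: int) -> int:
    """Bytes of padding needed to move `offset` up to the next multiple of `boundary`."""
    if boundary <= 0:
        raise ValueError("boundary must be positive")
    return (-offset) % boundary


# Assumption: 3206 is the reported remainder modulo 4096, used here as a
# stand-in for the true offset.
offset = 3206
BOUNDARIES = (4096, 2048, 1024, 512, 128, 64)

report = {
    b: (misalignment(offset, b), padding_to_boundary(offset, b))
    for b in BOUNDARIES
}

for boundary, (remainder, padding) in report.items():
    print(f"{boundary:5d}: remainder {remainder:4d}, padding needed {padding:4d}")
```

The remainders match the six values quoted in the message, and the padding column shows how far the data would have to move to land on each boundary.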