Re: [netcdfgroup] Hello and a question on data alignment

To: Mark Harfouche <mark.harfouche@xxxxxxxxx>
Subject: Re: [netcdfgroup] Hello and a question on data alignment
From: Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
Date: Sat, 8 Jan 2022 06:18:32 -0700

Howdy Mark!

As you suggest, adding control over alignment would be great. Can you
submit a PR with changes to the library to support it?

Ed Hartnett

On Sat, Jan 8, 2022 at 5:30 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
wrote:

> Hello,
>
> My name is Mark Harfouche. I'm a researcher and engineer focusing on
> productizing new computational optics tools for biology.
>
> I had a question about data alignment within netcdf4 files.
>
> Is it possible to specify the alignment boundary for each chunk of data?
> Let's say we had an array of bytes and wanted that array to be aligned to
> a boundary of 128, 512, or even 4096 bytes.
> Is this possible in netcdf4?
> It seemed like this might be possible through calls to nc__enddef,
> https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
> but I tried adjusting its parameters and they did not seem to have the
> desired effect.
>
> There seemed to be a post from 2014 discussing this, but I can't find the
> referenced issue
> https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html
>
> From my research, it seems this should be possible through HDF5's
> H5Pset_alignment/H5Pget_alignment:
> https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html
>
> I typically use netcdf4 through xarray, which in turn uses the
> netcdf4-python backend.
>
> The Python code below illustrates a typical problem we face: the data
> ends up 6 bytes past a 128-byte boundary, which is far from ideal in the
> many circumstances where performance matters.
>
> Thank you very much for your help,
>
> Best,
>
> Mark
>
> ```
> import xarray as xr
> import numpy as np
> import netCDF4
> from pathlib import Path
>
> basic_filename = "basic_file_netcdf4.nc"
> if Path(basic_filename).exists():
>     Path(basic_filename).unlink()
>
> dataset = xr.DataArray(
>     np.zeros((3072, 3072), dtype='uint8'),
>     dims=("y", "x"),
>     coords={
>         "y": np.arange(3072, dtype=int),
>         "x": np.arange(3072, dtype=int),
>     },
>     name='images').to_dataset()
>
> dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")
>
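> import h5py
>
> # Read the file back with h5py to find where the dataset's raw data starts.
> with h5py.File(basic_filename, "r") as h5file:
>     h5dataset = h5file["images"]
>     # Byte offset of the dataset's raw data within the file
>     offset = h5dataset.id.get_offset()
>
> # Remainder of the offset with respect to several alignment boundaries
> for boundary in (4096, 2048, 1024, 512, 128, 64):
>     print(offset % boundary)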
>
> """
> 3206
> 1158
> 134
> 134
> 6
> 6
> """
> ```
>
>
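
The remainders printed in the quoted output can be reproduced with plain integer arithmetic, which also shows how much padding each boundary would require. This is a minimal sketch, not part of the original message: the helper names are hypothetical, and the offset 3206 is an assumed value chosen only because it is the smallest offset consistent with the quoted remainders (the real file offset is not given in the thread).

```python
def misalignment(offset: int, boundary: int) -> int:
    """Bytes by which `offset` overshoots the previous multiple of `boundary`."""
    return offset % boundary


def padding_to_align(offset: int, boundary: int) -> int:
    """Bytes of padding needed to move `offset` up to the next multiple."""
    return (-offset) % boundary


def align_up(offset: int, boundary: int) -> int:
    """Smallest multiple of `boundary` that is >= `offset`."""
    return offset + padding_to_align(offset, boundary)


# Assumed offset (hypothetical): smallest value matching the quoted remainders.
example_offset = 3206

for boundary in (4096, 2048, 1024, 512, 128, 64):
    print(
        boundary,
        misalignment(example_offset, boundary),
        align_up(example_offset, boundary),
    )
```

In HDF5, `H5Pset_alignment` takes a threshold and an alignment value, and objects at least as large as the threshold are placed at addresses that are multiples of the alignment. In effect the library would perform the `align_up` step above when allocating space for the dataset.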
