[netcdfgroup] Hello and a question on data alignment

To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: [netcdfgroup] Hello and a question on data alignment
From: Mark Harfouche <mark.harfouche@xxxxxxxxx>
Date: Sat, 8 Jan 2022 07:29:53 -0500

Hello,

My name is Mark Harfouche. I'm a researcher and engineer focused on
productizing new computational optics tools for biology.

I had a question about data-alignment within netcdf4 files.

Is it possible to specify the alignment boundary for each chunk of data?
Let's say we had an array of bytes and wanted that array to be
aligned to boundaries of 128, 512, or even 4096 bytes.
Is this possible in netcdf4?
It seemed like this might be possible through calls to nc__enddef,
https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
but I tried to adjust those and they didn't seem to have the desired effect.

There seemed to be a post from 2014 discussing this, but I can't find the
referenced issue
https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html

From my research, it seems like it should be possible to do through
H5Pset/get_alignment
https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html

I typically use netcdf4 through xarray, which in turn uses the
netcdf4-python backend.

The Python code below illustrates a typical problem we face: the data ends
up at a file offset that is 6 bytes past a 128-byte boundary, which is far
from ideal in the many circumstances where performance matters.

Thank you very much for your help,

Best,

Mark

```
import xarray as xr
import numpy as np
import netCDF4
from pathlib import Path

basic_filename = "basic_file_netcdf4.nc"
if Path(basic_filename).exists():
    Path(basic_filename).unlink()

dataset = xr.DataArray(
    np.zeros((3072, 3072), dtype='uint8'),
    dims=("y", "x"),
    coords={
        "y": np.arange(3072, dtype=int),
        "x": np.arange(3072, dtype=int),
    },
    name='images').to_dataset()

dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")

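import h5py

# Inspect the file through HDF5 directly: get_offset() returns the byte
# address at which the dataset's contiguous data starts within the file.
with h5py.File(basic_filename, "r") as h5file:
    offset = h5file["images"].id.get_offset()

# Print the remainder of the offset for each candidate alignment boundary.
for boundary in (4096, 2048, 1024, 512, 128, 64):
    print(offset % boundary)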

"""
3206
1158
134
134
6
6
"""
```
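
To put the remainders printed above in terms of wasted alignment, the short
sketch below computes how many padding bytes would move the data to the next
boundary. The helper name padding_to_boundary is hypothetical. Because only
the remainder modulo 4096 is known from the output, it is used as a stand-in
for the real offset, which gives the same result for every boundary that
divides 4096.

```python
def padding_to_boundary(offset, boundary):
    """Bytes to skip so that `offset` lands on a multiple of `boundary`."""
    return (-offset) % boundary


# 3206 is the remainder modulo 4096 reported by the script above; it stands
# in for the true offset, which the output does not show.
residue = 3206
for boundary in (4096, 2048, 1024, 512, 128, 64):
    print(boundary, padding_to_boundary(residue, boundary))
```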

Follow-Ups:
- Re: [netcdfgroup] Hello and a question on data alignment
  - From: Ed Hartnett
