Re: [netcdfgroup] Hello and a question on data alignment

Mark,

Yes, I think that is the correct place to make the change.

Is this change always on? Or does the user turn it on and off?

Yes, please open an issue on netcdf-c and continue the discussion there.
That is the appropriate place, so that a record can be kept...

Ed

On Sat, Jan 8, 2022 at 8:12 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
wrote:

> Hi Ed,
>
> Thank you for the confirmation that the feature does not exist yet and the
> quick reply.
>
> I was able to achieve the desired alignment using h5netcdf as a demo. I
> want to keep feature parity between the netcdf4-python backend and the
> h5netcdf backend. One needs to change the File Access Property List when
> opening the HDF5 file.
>
> Can you confirm that this is the correct location to make a patch to?
>
> https://github.com/Unidata/netcdf-c/blob/988e771a9ed99619c2e3261aea81f127dd7fa3d8/libhdf5/hdf5open.c#L772
>
> If so, I might be able to make a pull request in the coming months.
>
> Would it be appropriate to open a GitHub issue with this info? Or is the
> mailing list the appropriate location for this information?
>
> Best,
>
> Mark
>
>
> On Sat, Jan 8, 2022 at 8:18 AM Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
> wrote:
>
>> Howdy Mark!
>>
>> As you suggest, adding control over alignment would be great. Can you
>> submit a PR with changes to the library to support it?
>>
>> Ed Hartnett
>>
>> On Sat, Jan 8, 2022 at 5:30 AM Mark Harfouche <mark.harfouche@xxxxxxxxx>
>> wrote:
>>
>>> Hello,
>>>
>>> My name is Mark Harfouche, I'm a researcher and engineer focusing on
>>> productizing new computational optics tools for biology.
>>>
>>> I had a question about data-alignment within netcdf4 files.
>>>
>>> Is it possible to specify the alignment boundary for each chunk of data?
>>> Let's say we had an array of bytes and wanted that array to be
>>> aligned to boundaries of 128, 512, or even 4096 bytes.
>>> Is this possible in netcdf4?
>>> It seemed like this might be possible through calls to nc__enddef,
>>> https://www.unidata.ucar.edu/software/netcdf/docs/group__datasets.html#ga5fe4a3fcd6db18d0583ac47f04f7ac60
>>> but I tried adjusting its arguments and they didn't seem to have the
>>> desired effect.
>>>
>>> There seemed to be a post from 2014 discussing this, but I can't find
>>> the referenced issue
>>> https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg12328.html
>>>
>>> From my research, it seems like it should be possible through
>>> H5Pset_alignment / H5Pget_alignment
>>> https://support.hdfgroup.org/HDF5/doc/UG/FmSource/08_TheFile_favicon_test.html
>>>
>>> I typically use netcdf4 through xarray, which in turn uses the
>>> netcdf4-python backend.
>>>
>>> The Python code below illustrates a typical problem we face, where the
>>> data ends up aligned to an offset of 6 bytes, which is far from ideal in
>>> many circumstances where performance is desired.
>>>
>>> Thank you very much for your help,
>>>
>>> Best,
>>>
>>> Mark
>>>
>>> ```
>>> import xarray as xr
>>> import numpy as np
>>> import netCDF4
>>> from pathlib import Path
>>>
>>> basic_filename = "basic_file_netcdf4.nc"
>>> if Path(basic_filename).exists():
>>>     Path(basic_filename).unlink()
>>>
>>> dataset = xr.DataArray(
>>>     np.zeros((3072, 3072), dtype='uint8'),
>>>     dims=("y", "x"),
>>>     coords={
>>>         "y": np.arange(3072, dtype=int),
>>>         "x": np.arange(3072, dtype=int),
>>>     },
>>>     name='images').to_dataset()
>>>
>>> dataset.to_netcdf(basic_filename, format="NETCDF4", engine="netcdf4")
>>>
>>> import h5py
>>>
>>> # Ask HDF5 where the raw data of the "images" variable starts in the file.
>>> with h5py.File(basic_filename, "r") as h5file:
>>>     offset = h5file["images"].id.get_offset()
>>>
>>> # A remainder of 0 would mean the data starts on that boundary.
>>> for boundary in (4096, 2048, 1024, 512, 128, 64):
>>>     print(offset % boundary)
>>>
>>> """
>>> 3206
>>> 1158
>>> 134
>>> 134
>>> 6
>>> 6
>>> """
>>> ```
>>