[netcdfgroup] retrieving data from corrupted/truncated netcdf file

To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: [netcdfgroup] retrieving data from corrupted/truncated netcdf file
From: Ramakrishnan N <ram.n.krishnan@xxxxxxxxx>
Date: Mon, 16 May 2022 13:14:08 -0400
I use the NetCDF format to store molecular dynamics trajectories generated by
OpenMM with the AMBER force field. Recently, one of the servers running the
simulations had an unidentified problem, and none of the NetCDF files (each
~8 GB) generated on that server can be read by any of the netCDF utilities.
The trajectories are nearly 750 ns long and typically take ~2 months to run,
so I am looking for help or advice on retrieving as much data as possible
from the corrupted files. Below is the information requested in a couple of
earlier threads (very old
<https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg00201.html>
and more recent
<https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg14595.html>)
on this mailing list.


   - *ncinfo gives the following error message:*


(openmm)  >> ncinfo prod.nc
Traceback (most recent call last):
  File "/opt/anaconda3/envs/openmm/bin/ncinfo", line 11, in <module>
    sys.exit(ncinfo())
  File "/opt/anaconda3/envs/openmm/lib/python3.9/site-packages/netCDF4/utils.py", line 550, in ncinfo
    f = Dataset(filename)
  File "src/netCDF4/_netCDF4.pyx", line 2307, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1925, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'prod.nc'



   - *An octal dump shows that the file is of type netCDF-3 and contains data
   (as noted here
   <https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg14595.html>):*


(openmm)  >> od -c prod.nc | head -n 30
0000000    C   D   F 002  \0  \0   x   I  \0  \0  \0  \n  \0  \0  \0 006
0000020   \0  \0  \0 005   f   r   a   m   e  \0  \0  \0  \0  \0  \0  \0
0000040   \0  \0  \0  \a   s   p   a   t   i   a   l  \0  \0  \0  \0 003
0000060   \0  \0  \0 004   a   t   o   m  \0  \0   O 271  \0  \0  \0  \f
0000100    c   e   l   l   _   s   p   a   t   i   a   l  \0  \0  \0 003
0000120   \0  \0  \0  \f   c   e   l   l   _   a   n   g   u   l   a   r
0000140   \0  \0  \0 003  \0  \0  \0 005   l   a   b   e   l  \0  \0  \0
0000160   \0  \0  \0 005  \0  \0  \0  \f  \0  \0  \0 006  \0  \0  \0 005
0000200    t   i   t   l   e  \0  \0  \0  \0  \0  \0 002  \0  \0  \0   4
0000220    C   R   E   A   T   E   D       a   t       2   0   2   2   -
0000240    0   2   -   2   3       1   6   :   3   1   :   2   0   .   8
0000260    0   6   5   9   3       o   n       u   s   a   m   -   a   m
0000300    b   e   r   1  \0  \0  \0  \v   a   p   p   l   i   c   a   t
0000320    i   o   n  \0  \0  \0  \0 002  \0  \0  \0 005   O   m   n   i
0000340    a  \0  \0  \0  \0  \0  \0  \a   p   r   o   g   r   a   m  \0
0000360   \0  \0  \0 002  \0  \0  \0 006   M   D   T   r   a   j  \0  \0
0000400   \0  \0  \0 016   p   r   o   g   r   a   m   V   e   r   s   i
0000420    o   n  \0  \0  \0  \0  \0 002  \0  \0  \0 005   1   .   9   .
0000440    5  \0  \0  \0  \0  \0  \0  \v   C   o   n   v   e   n   t   i
0000460    o   n   s  \0  \0  \0  \0 002  \0  \0  \0 005   A   M   B   E
0000500    R  \0  \0  \0  \0  \0  \0 021   C   o   n   v   e   n   t   i
0000520    o   n   V   e   r   s   i   o   n  \0  \0  \0  \0  \0  \0 002
0000540   \0  \0  \0 003   1   .   0  \0  \0  \0  \0  \v  \0  \0  \0  \a
0000560   \0  \0  \0  \f   c   e   l   l   _   a   n   g   u   l   a   r
0000600   \0  \0  \0 002  \0  \0  \0 003  \0  \0  \0 005  \0  \0  \0  \0
0000620   \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0 020  \0  \0  \0  \0
0000640   \0  \0 003   <  \0  \0  \0  \f   c   e   l   l   _   s   p   a
0000660    t   i   a   l  \0  \0  \0 001  \0  \0  \0 003  \0  \0  \0  \0
0000700   \0  \0  \0  \0  \0  \0  \0 002  \0  \0  \0 004  \0  \0  \0  \0
0000720   \0  \0 003   L  \0  \0  \0  \a   s   p   a   t   i   a   l  \0
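
To make the first line of the dump easier to read, here is a minimal Python sketch that decodes the magic number and the record count from the first eight bytes of a classic-format header. It follows the published classic file format layout (`CDF` + version byte, then a 4-byte big-endian record count). The function name `read_magic_and_numrecs` is made up for illustration, and the sketch deliberately handles only version bytes 1 and 2. For the bytes shown above it reports the 64-bit offset variant and a stored record count of 30793. Whether that count can be trusted for a damaged file is a separate question.

```python
import struct

# Version byte -> format name for the classic-family formats handled here.
FORMATS = {1: "classic (CDF-1)", 2: "64-bit offset (CDF-2)"}

STREAMING = 0xFFFFFFFF  # special value meaning "record count not written"


def read_magic_and_numrecs(header: bytes):
    """Return (version, numrecs) from the first 8 bytes of a classic header.

    numrecs is None when the header holds the STREAMING marker.
    Hypothetical helper for illustration; CDF-5 is not handled.
    """
    if len(header) < 8 or header[:3] != b"CDF":
        raise ValueError("not a classic-format netCDF header")
    version = header[3]
    if version not in FORMATS:
        raise ValueError(f"unsupported version byte: {version}")
    # The record count is a 4-byte big-endian unsigned integer.
    (numrecs,) = struct.unpack(">I", header[4:8])
    return version, (None if numrecs == STREAMING else numrecs)


# First eight bytes from the od output above: C D F 002 \0 \0 x I
sample = b"CDF\x02\x00\x00xI"
version, numrecs = read_magic_and_numrecs(sample)
print(FORMATS[version], numrecs)
```

To run it against a real file, pass the result of `open("prod.nc", "rb").read(8)` instead of `sample`.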


   - *I know the structure of my NetCDF file from an identical file that is
   readable:*


(openmm)  >> ncdump -h prod.nc
netcdf prod {
dimensions:
    frame = UNLIMITED ; // (578 currently)
    spatial = 3 ;
    atom = 20504 ;
variables:
    char spatial(spatial) ;
    float time(frame) ;
        time:units = "picosecond" ;
    float coordinates(frame, atom, spatial) ;
        coordinates:units = "angstrom" ;

// global attributes:
        :Conventions = "AMBER" ;
        :ConventionVersion = "1.0" ;
        :application = "AmberTools" ;
        :program = "ParmEd" ;
        :programVersion = "3.4.1" ;
        :title = "ParmEd-created trajectory" ;
}
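
A structure like this is enough to estimate how many whole frames a truncated classic-format file could still hold, because record variables are stored interleaved, one record after another. The sketch below is only an illustration under stated assumptions: the helper names `record_size` and `complete_records` are hypothetical, the variable list is taken from the readable file above, and `data_start` (the byte offset of the first record) and the file size are placeholder numbers that would have to come from the damaged file itself. The header dump of the damaged file also names `cell_spatial`, `cell_angular` and `label`, so its real list of record variables may differ from the one used here.

```python
from math import prod


def record_size(record_vars):
    """Bytes per record for a list of (itemsize, per_record_shape) pairs.

    Each variable's per-record slab is rounded up to a multiple of 4 bytes,
    as in the classic format (the single-record-variable special case is
    ignored here; it makes no difference for 4-byte floats).
    """
    total = 0
    for itemsize, shape in record_vars:
        nbytes = itemsize * prod(shape)  # prod(()) == 1 for a scalar per record
        total += (nbytes + 3) // 4 * 4
    return total


def complete_records(file_size, data_start, recsize):
    """Number of whole records that fit between data_start and end of file."""
    return max(0, (file_size - data_start) // recsize)


# Record variables of the readable file: time(frame), coordinates(frame, atom, spatial)
rec = record_size([(4, ()), (4, (20504, 3))])
print(rec)  # 246052 bytes per frame

# Placeholder values, not measured from any real file.
assumed_file_size = 8_000_000_000
assumed_data_start = 1024
print(complete_records(assumed_file_size, assumed_data_start, rec))
```

Comparing this estimate with the record count stored in the header gives a rough idea of whether the file was cut short or whether the header count was simply never updated.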


Given this information, could you please suggest ways to retrieve my data?
Any help in this regard will be greatly appreciated.

Best
Ram