Re: md5 checksum or gpg signature

I borrowed Tim's ideas for the general approach, and wrote the attached scripts for binary mode checksums. They seem to be fairly efficient. I also added more file diagnostics. See what you think of them.

I favor the idea of keeping the checksum inside the file. This seems like a natural fit for any self-describing file format, and it eliminates a record-keeping problem.
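
As a rough illustration of the in-file approach, the sketch below applies the same blank-placeholder idea to an ordinary text file, using only md5sum, sed and awk. The file layout, the "checksum_md5 = ..." field syntax, and the helper names (set_field, get_field, add_checksum, verify_checksum) are assumptions made for this example. It is not the attached scripts, which use ncatted on real NetCDF files.

```shell
#!/bin/sh
# Sketch only: the blank-placeholder checksum technique on a plain text file.
# File layout and helper names are hypothetical, chosen for illustration.

# 32 x's: the same width as an MD5 hex digest, so the file size never changes.
blank=$(printf '%032d' 0 | tr 0 x)

# Overwrite the 32-character value of the checksum field in file $1 with $2.
# The field is assumed to exist already, at the start of a line.
set_field() {
    sed "s/^checksum_md5 = \"[0-9a-fx]\{32\}\"/checksum_md5 = \"$2\"/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Print the value currently stored in the checksum field of file $1.
get_field() {
    sed -n 's/^checksum_md5 = "\([0-9a-fx]\{32\}\)".*/\1/p' "$1"
}

# Blank the field, hash the whole file, then store the digest in the field.
add_checksum() {
    set_field "$1" "$blank" &&
        set_field "$1" "$(md5sum "$1" | awk '{print $1}')"
}

# Blank the field in a copy, re-hash the copy, compare with the stored value.
verify_checksum() {
    stored=$(get_field "$1")
    cp "$1" "$1.chk" && set_field "$1.chk" "$blank"
    actual=$(md5sum "$1.chk" | awk '{print $1}')
    rm -f "$1.chk"
    [ "$stored" = "$actual" ]
}

# Demo on a scratch file (mktemp is assumed to be available).
demo=$(mktemp)
printf 'checksum_md5 = "%s"\ntemperature: 271.3 274.8 280.1\n' "$blank" > "$demo"
add_checksum "$demo"
verify_checksum "$demo" && echo "Good MD5 checksum: $(get_field "$demo")"
rm -f "$demo"
```

The key point is that the stored digest is always computed over the file with the field set to the blank value, so verification only has to restore that blank value in a copy before hashing again.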

Thanks for your original script, Tim. It was very impressive as a one-day turnaround, and I borrowed considerably from it.

--Dave A.
CDC/NOAA/CIRES
Boulder, Colorado USA

t.hume@xxxxxxxxxx wrote:
Hi,

I like your idea. I threw together a quick pdksh script to implement
something like you suggest. It assumes you have ncdump and the NetCDF
operators (in particular the ncatted program). Basically, I ncdump the
file, and calculate the MD5 sum. I then create a global attribute called
md5sum. To check the file, I ncdump it again, being careful not to
include the line containing the md5sum global attribute. If you look at
the attached script you'll get the idea.

The script seems to work OK on my Linux box, but I guess it is slow and
inefficient, especially on large NetCDF files. Perhaps someone has a
better solution, or might refine the script a bit?

Tim Hume
Bureau of Meteorology Research Centre
Melbourne
Australia

Script follows:

#!/bin/ksh
#
# A quick and dirty hack to incorporate a MD5 sum in a NetCDF file.
#
# Tim Hume.
# 4 February 2005.

export PATH=/bin:/usr/bin:/usr/local/bin:/arm/tph/bin

#
# Defaults.
#
action=checkmd5
ncfile=""

while [[ $# -ge 1 ]]
do
        case "${1}" in
                ( "-C" | "-c" )
                action="checkmd5"
                shift
                ;;
                ( "-S" | "-s" )
                action="makemd5"
                shift
                ;;
                ( "-H" | "-h" )
                echo "Usage: sign_netcdf [ -C ] [ -S ] file.nc"
                exit 0
                ;;
                ( * )
                ncfile="${1}"
                shift
                ;;
        esac
done

if [[ ! -f "${ncfile}" ]]
then
        echo "E: No such file: ${ncfile}"
        exit 1
fi

#
# Now check an existing MD5 sum, or create a new one.
#
md5sum=$(ncdump "${ncfile}" | grep -E -v -e '[[:space:]]+:md5sum =' | md5sum | awk '{print $1}')

if [[ "${action}" == "makemd5" ]]
then
        ncatted -h -a md5sum,global,o,c,"${md5sum}" "${ncfile}"
else
        md5sum_att=$(ncdump -h "${ncfile}" | grep -E -e '[[:space:]]+:md5sum =' | awk -F\" '{print $2}')
        if [[ "${md5sum}" == "${md5sum_att}" ]]
        then
                echo "Good MD5 sum: ${md5sum}"
        else
                echo "Bad MD5 sum. Actual sum is:  ${md5sum}"
                echo "Attribute says it should be: ${md5sum_att}"
                exit 1
        fi
fi


On Thu, 3 Feb 2005 23:38:16 +0100
Reimar Bauer <R.Bauer@xxxxxxxxxxxxx> wrote:


Dear all

One of my colleagues had the idea of including a checksum or a
signature in each netCDF file, to indicate whether the file has been
changed by some kind of damage (a virus or a hardware failure).

We need this information because the files are getting larger and
larger, and you can't guarantee that a file that comes from a backup
or from a file copy is identical to the original.

We had a very interesting hardware failure on a hard disk on one of
our systems. It interfered with the copying of a netCDF file in such a
way that only one parameter of the file was not readable. It took a
long time to understand why this file was read correctly on another
machine.
If a kind of self-diagnostic were included automatically, it would be
much easier to find out why some data does not look as expected. And
you would know immediately that something is wrong!

I think that if there are ideas for how to implement this feature, it
should be done. It is very important for all of us!


cheers
Reimar

--
Forschungszentrum Juelich
email: R.Bauer@xxxxxxxxxxxxx
http://www.fz-juelich.de/icg/icg-i/
=================================================================
a IDL library at ForschungsZentrum Juelich
http://www.fz-juelich.de/icg/icg-i/idl_icglib/idl_lib_intro.html




#!/bin/csh
#
# nc_add_checksum -- Add an MD5 file checksum to a NetCDF file.
#
# 2005-feb-04   by Dave Allured, NOAA/CIRES Climate Diagnostics Center
#               (CDC), Boulder Colorado USA.
#
# This version computes an MD5 binary mode checksum for the whole NetCDF
# file, and writes it into the same file as a global attribute.
#
# Acknowledgement goes to Tim Hume, Bureau of Meteorology Research Centre,
# Melbourne, Australia for the first working script, which uses ncdump and
# md5sum to add and verify a text mode checksum.
#
# This is the complement to nc_verify_checksum, which confirms whole
# NetCDF files using the MD5 checksum that was added with this script.
#
# Usage:   nc_add_checksum file.nc
# Exit:    $status > 0 if error occurred
#
# Notes:  This version overwrites unused header space in the input file
# with the MD5 checksum and history information.  The entire file will
# be invisibly rewritten, costing extra time and temp file space, if
# sufficient extra header space was not allocated when the file was
# originally created (or last expanded).  See section 4.1 of the NetCDF
# Users Guide for a discussion of this issue.
#
#   http://my.unidata.ucar.edu/content/software/netcdf/docs/netcdf
#   /Parts-of-a-NetCDF-File.html
#
# The history attribute is also updated when the checksum attribute is
# first created.  This is desirable for good audit, but the behavior may
# be disabled by adding the "-h" option to the ncatted command below.
#
# This method depends on the NCO operator "ncatted" to reliably
# overwrite a previous character string attribute of identical length,
# while leaving the rest of the NetCDF file completely unchanged.

set ncfile   = "${1}"
set att_name = checksum_md5                  # standard global attribute name
set x32 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # standard blank attribute value

if ( ! -f "${ncfile}" ) then
   echo "E: No such file: ${ncfile}"
   exit 1
endif

# Add global checksum attribute to file.  Value is "standard blank" for now.
# If a previous checksum exists, it is overwritten.

ncatted -a ${att_name},global,o,c,${x32} "${ncfile}"
                 # No "-h" switch here, to record history of attrib creation.

set ncstatus = ${status}
if ( ${ncstatus} != 0 ) then
   echo "E: ncatted error ${ncstatus} while processing file:"
   echo "E: ${ncfile}"
   exit ${ncstatus}
endif

# Compute checksum while attribute is "standard blank".
# Then overwrite the checksum into file.

set cksum = `md5sum -b "${ncfile}" | awk '{print $1}'`
ncatted -h -a ${att_name},global,o,c,${cksum} "${ncfile}"  # "-h" required now

# All done.

#!/bin/csh
#
# nc_verify_checksum -- Verify a NetCDF file containing an MD5 file checksum.
#
# 2005-feb-05   by Dave Allured, NOAA/CIRES Climate Diagnostics Center
#               (CDC), Boulder Colorado USA.
#
# This version verifies an entire NetCDF file by recomputing the MD5
# binary mode file checksum, and comparing it with the checksum_md5
# global attribute.
#
# Acknowledgement goes to Tim Hume, Bureau of Meteorology Research Centre,
# Melbourne, Australia for the first working script, which uses ncdump and
# md5sum to add and verify a text mode checksum.
#
# This is the complement to nc_add_checksum, which inserts the original MD5
# checksums into NetCDF files.
#
# Usage:   nc_verify_checksum file.nc
# Exit:    $status > 0 if verification failed or other error occurred
#
# Notes:  This version keeps the original file intact, by making a
# temporary file copy.  For very large NetCDF files, the user may need
# to change the location of the temporary file below.
#
# This method depends on the NCO operator "ncatted" to reliably
# overwrite a previous character string attribute of identical length,
# while leaving the rest of the NetCDF file completely unchanged.

set ncfile   = "${1}"
set att_name = checksum_md5                  # standard global attribute name
set x32 = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # standard blank attribute value
set tempdir  = "/tmp/${USER}/nc_verify_checksum/$$"
set tempfile = "${tempdir}/temp1.nc"

# Confirm that the file is NetCDF readable.

if ( ! -f "${ncfile}" ) then
   echo "E: No such file: ${ncfile}"
   exit 1
endif

ncdump -h "${ncfile}" > /dev/null              # trial read of file header
set ndstatus = ${status}

if ( ${ndstatus} != 0 ) then
   echo "E: File is not NetCDF readable, ncdump error ${ndstatus}:"
   echo "E: ${ncfile}"
   exit ${ndstatus}
endif

# Get the recorded value in the checksum attribute.

set att_line = `ncdump -h "${ncfile}" | grep ":${att_name} "`
set count1 = `echo "${att_line}" | grep -c "^:${att_name} "`

if ( ${count1} != 1 ) then
   echo "E: Global attribute ${att_name} is missing"
   exit 1
endif

set cksum_att = `echo "${att_line}" | awk -F\" '{print $2}'`

# Make a temporary copy of the NetCDF file.

mkdir -p "${tempdir}"
set mdstatus = ${status}

if ( ${mdstatus} != 0 ) then
   echo "E: Cannot create temp directory: ${tempdir}"
   exit ${mdstatus}
endif

cp "${ncfile}" "${tempfile}"
set cpstatus = ${status}

if ( ${cpstatus} != 0 ) then
   echo "E: Cannot copy to temp file: ${tempfile}"
   exit ${cpstatus}
endif

# Blank out the global checksum attribute.

# The new value must be "standard blank" to make the file identical
# to its original pre-checksum condition.

chmod u+w "${tempfile}"                        # must be write enabled
ncatted -h -a ${att_name},global,o,c,${x32} "${tempfile}"  # "-h" required here

set ncstatus = ${status}
if ( ${ncstatus} != 0 ) then
   echo "E: ncatted error ${ncstatus} while processing temp file copy:"
   echo "E: ${ncfile}"
   exit ${ncstatus}
endif

# Recompute the checksum of the modified, "blanked" file.

set cksum = `md5sum -b "${tempfile}" | awk '{print $1}'`

if ( "${cksum}" == "${cksum_att}" ) then
   echo "Good MD5 checksum: ${cksum}"            # will exit with 0 status (pass)
else
   echo "Bad MD5 checksum. Actual sum is: ${cksum}"
   echo "Attribute says it should be:     ${cksum_att}"
   rm "${tempfile}"
   exit 1
endif

rm "${tempfile}"                            # clean up the temp file

# All done.
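
Because nc_verify_checksum reports failure through its exit status, checking a whole set of files (for example, everything restored from a backup) could be wrapped in a small loop. The following is a minimal sketch under that assumption. The function name verify_all and the VERIFY variable are hypothetical and are not part of the attached scripts.

```shell
#!/bin/sh
# verify_all (hypothetical name): run a checksum verifier over many files.
# VERIFY defaults to nc_verify_checksum from this thread. Any command that
# takes one file name and exits with status > 0 on failure can be used.

verify_all() {
    verifier=${VERIFY:-nc_verify_checksum}
    failed=0
    for f in "$@"
    do
        # Discard the verifier's own messages; report one line per file.
        if "$verifier" "$f" > /dev/null 2>&1
        then
            echo "OK   $f"
        else
            echo "FAIL $f"
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $# file(s) failed verification"
    [ "$failed" -eq 0 ]       # overall status: 0 only if every file passed
}
```

A typical call would be something like "verify_all /some/backup/dir/*.nc", with the directory standing in for wherever the restored files live. The function returns a non-zero status if any file fails, so it can itself be used in a larger script.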