> PACKING is a particular kind of compression that reduces the number of
> bits per symbol by relaxing the requirement that symbols be aligned with
> storage unit boundaries. An example would be packing 3 10-bit AVHRR
> samples into 1 32-bit word, as opposed to using 3 16-bit words. Packing
> may be either lossless or lossy, although when used without
> qualification it usually means lossless packing (I can't imagine a
> situation where lossy packing would be the most appropriate compression
> scheme.)

I wouldn't go so far as to say lossy packing is never appropriate. The
GRIB format's packing is an example of "lossy packing" that is very
appropriate. I have to assume that, since this is the netCDF group, we're
talking about scientific data, i.e., floating point numbers. IEEE floating
point, for example, is not exact, yet we represent numbers with it all the
time.

IEEE floating point actually stores two separate values, a mantissa and an
exponent, which are combined to represent a floating point number using
the following formula:

    F = m * 2 ** e

In single precision the mantissa is 23 bits, the exponent is 8 bits, and
there is a sign bit. For simplicity I'm not going to talk about the hidden
bit. Another way to look at this is that there are (2**23) possible values
for the mantissa, (2**8) for the exponent, and (2**1) for the sign bit.
This yields (2**23)*(2**8)*(2**1) total *discrete* values that IEEE (minus
the hidden bit) can represent, spread over the widest possible range of
values. The reason I'm discussing this is to make the following point: any
range of numbers can be represented by a specific discrete set of integers
to within a certain number of significant digits.

Now consider the following example of a GRIB-like packing. What GRIB does
is take advantage of the fact that a specific grid of values will not span
the entire range of floating point values; in fact the max and min of the
data will differ by at most a few orders of magnitude. Let's say we want
to pack a floating point field into 8-bit values. This will allow us to
represent the data to 2-3 significant digits in base ten, which can be
enough for graphics or simple analysis without a problem.

First find the min and max:

    var_min = min(var)
    var_max = max(var)

Second, subtract the min from the entire set:

    var_normalized = var - var_min
    add_offset = var_min

The max value of var_normalized now defines the scale:

    max_normalized = max(var_normalized)

Now the next step is the complicated one. We need to scale var_normalized
into values in the range 0 <= x < (2**8). This is done with the following
operation:

    scale_factor = (2**7)/(2**ceil(log(max_normalized)/log(2.0)))

The quantity log(max_normalized)/log(2.0) is just the base-2 logarithm of
max_normalized; taking its ceiling gives the maximum order of magnitude of
the data in base two. The ratio 2**7/(2**...), when multiplied by
var_normalized, yields values in the range 0 <= x < (2**8):

    output_values = var_normalized * scale_factor

Now you just have to coerce the output_values array to an integer type and
mask off all but the low-order 8 bits. In the output file you write the
bytes and add the attributes add_offset and scale_factor. To unpack the
data you need only multiply by 1.0/scale_factor and add the add_offset.

This is a lossy packing algorithm, but it can be very useful. How many
significant digits do you need to make a contour plot of data or do
elementary analysis (averages, standard deviations, differences, etc.)?
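To make the steps above concrete, here is a minimal sketch of the packing
and unpacking in Python/NumPy. It implements only the scheme described in
this message, not actual GRIB encoding; the function names, the 8-bit
target, and the sample field values are just illustrative assumptions.

    import numpy as np

    def pack_8bit(var):
        # Lossy GRIB-like packing into 8-bit integers, as described above.
        var = np.asarray(var, dtype=np.float64)

        # Step 1: the min of the field becomes the additive offset.
        add_offset = var.min()
        var_normalized = var - add_offset

        # Step 2: scale so the packed values fall in 0 <= x < 2**8.
        max_normalized = var_normalized.max()
        scale_factor = (2.0 ** 7) / (
            2.0 ** np.ceil(np.log(max_normalized) / np.log(2.0)))

        # Step 3: coerce to integers; each value now fits in one byte.
        packed = (var_normalized * scale_factor).astype(np.uint8)
        return packed, add_offset, scale_factor

    def unpack_8bit(packed, add_offset, scale_factor):
        # Invert the packing: multiply by 1/scale_factor, add the offset.
        return packed * (1.0 / scale_factor) + add_offset

    # Illustrative field spanning only a narrow range of values.
    field = np.array([271.3, 275.9, 283.4, 290.1, 301.7])
    packed, off, scale = pack_8bit(field)
    restored = unpack_8bit(packed, off, scale)
    # restored agrees with field to roughly 2-3 significant digits.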
Some data is made by objective analysis of irregular data, like station
observations. I doubt such data is ever valid to more than 2 significant
digits, so why store 7?

-ethan

Then you subtract the min value from each element of the array, divide by
the scale of the valid range, coerce the data to integers, and mask off
the number of bits you want. This gives you the ability to unpack the data
by multiplying the integer component by a scale_factor and adding an
additive offset, a straightforward linear-time operation (a short netCDF
sketch of this attribute convention follows the quoted signature below).

> Regards,
> /Frew
>
> #======================================================================
> # James Frew   frew@xxxxxxxxxxxxx   http://www.bren.ucsb.edu/~frew/
> # School of Environmental Science and Management   +1.805.893.7356 vox
> # University of California, Santa Barbara, CA 93106-5131     .7612 fax
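For completeness, here is a hedged sketch of how the packed bytes and the
scale_factor/add_offset attributes might be written and read back with the
netCDF4-python module. The variable name "temp", the file name, and the
reuse of pack_8bit() and field from the earlier sketch are assumptions for
illustration only. Note that under the usual netCDF attribute convention a
reader computes unpacked = packed * scale_factor + add_offset, so the
stored scale_factor is the reciprocal of the packing multiplier computed
above.

    from netCDF4 import Dataset
    import numpy as np

    packed, off, scale = pack_8bit(field)

    ds = Dataset("packed.nc", "w")
    ds.createDimension("x", len(field))
    v = ds.createVariable("temp", "u1", ("x",))

    # Store the unpacking multiplier and offset as attributes.
    v.scale_factor = 1.0 / scale
    v.add_offset = off

    # Write the raw bytes; disable netCDF4-python's automatic scaling so
    # it does not try to pack the already-packed values a second time.
    v.set_auto_scale(False)
    v[:] = packed
    ds.close()

    # A reader with auto-scaling enabled (the default) gets floats back:
    with Dataset("packed.nc") as ds:
        restored = ds.variables["temp"][:]   # unpacked to floating point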