Re: [thredds] How are compressed netcdf4 files handled in TDS

On Apr 25, 2011, at 3:42 PM, John Caron wrote:

> On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
>> Yes, internal compression. All the files were made from netcdf3 files using 
>> NCO with the options:
>> 
>> ncks -4 -L 1
>> 
>> The results so far show files shrinking to between 40% and 1/100th of their 
>> original size. If requests for internally compressed data are cached 
>> differently from requests to netcdf3 files, we want to take that into 
>> account when we do the tests, so that we do not just see the effect of 
>> differential caching.
>> 
>> When we have done tests on just local files, the reads were about 8 times 
>> slower from a compressed file. But Rich Signell has found that the 
>> combination of serialization and bandwidth is the bottleneck, and you hardly 
>> notice the difference in a remote access situation. That is what we want to 
>> find out, because we run on very little money, and with compression as 
>> mentioned above our RAIDs would go a lot farther, as long as the hit to 
>> access time is not too great.
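Rich Signell's observation can be framed as a simple additive cost model. The sketch below is a toy, not a measurement: it assumes a remote request costs local read time plus network transfer time, and that the server ships decompressed values so the transfer cost is the same for both file types. Only the roughly 8x local slowdown comes from the message above; the timings `local_read` and `network_transfer` are hypothetical.

```python
def remote_time(read_s: float, transfer_s: float, read_slowdown: float = 1.0) -> float:
    """Toy model: remote request time = (possibly slowed) local read + network transfer."""
    return read_s * read_slowdown + transfer_s


def relative_slowdown(read_s: float, transfer_s: float, read_slowdown: float) -> float:
    """Ratio of compressed to uncompressed remote request time under the toy model."""
    return remote_time(read_s, transfer_s, read_slowdown) / remote_time(read_s, transfer_s)


# Hypothetical timings in seconds: a fast local read and a slow network transfer.
local_read = 0.05
network_transfer = 2.0

# The ~8x local penalty reported for compressed reads shrinks to a modest
# end-to-end penalty when transfer time dominates.
print(f"{relative_slowdown(local_read, network_transfer, 8.0):.2f}x")  # prints "1.17x"
```

When transfer time is negligible (purely local access), the same ratio approaches the full 8x, which is consistent with the local tests; the real question for the TDS tests is where actual requests fall between those extremes.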
>> 
>> Thanks,
>> 
>> -Roy
> 
> In netcdf4/hdf5, compression is tied to the chunking. Each chunk is 
> individually compressed, and must be completely decompressed to retrieve even 
> one value from that chunk. So the trick is to make your chunks correspond to 
> your "common cases" of data access. If that's possible, you should find that 
> compressed access is faster than non-compressed access, because the IO is 
> smaller. But the result will depend heavily on how well the chunks match 
> the access pattern.
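To make the chunking point concrete, here is a small simulation using only Python's standard library. It is a sketch under stated assumptions, not how HDF5 or the TDS is implemented: a hypothetical (time, cell) grid of 4-byte integers is split into chunks, each chunk is compressed independently with `zlib` (DEFLATE, the algorithm behind `ncks -L`), and reading one cell's full time series must inflate every chunk it touches. The grid size, the synthetic `value` function, and the two chunk shapes are arbitrary choices for illustration.

```python
import math
import struct
import zlib

N_TIME, N_CELL = 120, 100  # hypothetical grid: 120 time steps x 100 spatial cells


def value(t: int, c: int) -> int:
    """Deterministic synthetic data standing in for a real variable."""
    return (t * 7 + c) % 50


def build_chunks(chunk_t: int, chunk_c: int) -> dict:
    """Compress each chunk_t x chunk_c block independently, keyed by chunk index."""
    chunks = {}
    for t0 in range(0, N_TIME, chunk_t):
        for c0 in range(0, N_CELL, chunk_c):
            vals = [value(t, c)
                    for t in range(t0, min(t0 + chunk_t, N_TIME))
                    for c in range(c0, min(c0 + chunk_c, N_CELL))]
            raw = struct.pack(f"<{len(vals)}i", *vals)  # row-major, time varies slowest
            chunks[(t0 // chunk_t, c0 // chunk_c)] = zlib.compress(raw, 1)
    return chunks


def read_time_series(chunks: dict, chunk_t: int, chunk_c: int, cell: int):
    """Read every time step for one cell; return (values, chunks touched, bytes inflated)."""
    ci = cell // chunk_c
    width = min(chunk_c, N_CELL - ci * chunk_c)  # cells stored per row in this chunk column
    offset = cell - ci * chunk_c                 # position of the cell within a chunk row
    series, touched, inflated = [], 0, 0
    for ti in range(math.ceil(N_TIME / chunk_t)):
        raw = zlib.decompress(chunks[(ti, ci)])  # the whole chunk must be inflated
        touched += 1
        inflated += len(raw)
        vals = struct.unpack(f"<{len(raw) // 4}i", raw)
        series.extend(vals[r * width + offset] for r in range(len(vals) // width))
    return series, touched, inflated


# Compare "one map per chunk" with "long time series per chunk" for a time-series read.
for shape in [(1, N_CELL), (N_TIME, 10)]:
    chunks = build_chunks(*shape)
    series, touched, inflated = read_time_series(chunks, *shape, cell=42)
    assert series == [value(t, 42) for t in range(N_TIME)]
    print(shape, "chunks touched:", touched, "bytes inflated:", inflated)
```

For a time-series read, map-shaped chunks touch 120 chunks and inflate 48,000 bytes, while time-series-shaped chunks touch 1 chunk and inflate 4,800 bytes. The trade-off runs the other way for a single-time-step map: the (120, 10) layout would inflate ten 4,800-byte chunks instead of one 400-byte chunk, which is why the chunk shape should follow the most common requests.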

John, is there a loss of efficiency when compressing chunks compared to 
compressing the entire file? I vaguely recall that for some compression 
algorithms, compression efficiency is a function of the volume of data 
compressed.
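One way to explore this question empirically is to compress the same bytes once as a single stream and again as many independent chunks, then compare total sizes. The sketch below does that with `zlib` at level 1 (matching `-L 1`) on hypothetical synthetic data: one pseudo-random 4 KiB pattern repeated to 800,000 bytes. It only illustrates the two effects usually involved. Each independent `zlib.compress` call adds its own header and checksum, and a chunk cannot reference repeated patterns that sit in a different chunk. Real savings depend on the data and the chunk sizes, so measurements on actual files are what count.

```python
import random
import zlib

# Hypothetical data: one pseudo-random 4 KiB pattern repeated to 800,000 bytes.
PATTERN = bytes(random.Random(0).getrandbits(8) for _ in range(4096))
DATA = (PATTERN * 196)[:800_000]


def whole_size(data: bytes, level: int = 1) -> int:
    """Compressed size when the data are deflated as one stream."""
    return len(zlib.compress(data, level))


def chunked_size(data: bytes, chunk_bytes: int, level: int = 1) -> int:
    """Total compressed size when each chunk is deflated as an independent stream."""
    return sum(len(zlib.compress(data[i:i + chunk_bytes], level))
               for i in range(0, len(data), chunk_bytes))


print("raw:", len(DATA), "whole:", whole_size(DATA))
for chunk in (1_024, 16_384, 262_144):
    print(f"chunks of {chunk} B:", chunked_size(DATA, chunk))
```

In this sketch, 1 KiB chunks never contain a full repeat of the pattern, so almost nothing compresses, while larger chunks recover much of the single-stream result. Since DEFLATE only looks back 32 KiB, redundancy at scales longer than that is lost even in a single stream, so chunk sizes well above 32 KiB mainly pay the per-chunk overhead.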

Peter

> 
>> 
>> 
>> 
>> On Apr 25, 2011, at 12:28 PM, John Caron wrote:
>> 
>>> On 4/25/2011 11:30 AM, Roy Mendelssohn wrote:
>>>> Hi All:
>>>> 
>>>> We just converted one of our larger datasets (larger in terms of the 
>>>> number of files that are aggregated) into compressed netCDF4. There is a 
>>>> substantial savings in storage, but we wanted to do a series of tests to 
>>>> see what hit in access time we would take, if any, since many of our 
>>>> users will make requests involving a lot of time periods.
>>>> 
>>>> In order to design these tests properly, we need to get a better 
>>>> understanding of how the TDS handles netcdf4 datasets that have 
>>>> compression. Are the decompressed data cached, or, more precisely, cached 
>>>> any differently from data read from an uncompressed series of netcdf3 
>>>> files? Or, since the decompression is handled automatically on the read, 
>>>> is everything handled the same after that?
>>>> 
>>>> We would also be interested in other people's experience with compressed 
>>>> netcdf4 files in TDS, in particular when the extracts are not synoptic 
>>>> but cover a lot of time periods in a region, or make a lot of very small 
>>>> calls to a large number of time periods, such as we need to do for 
>>>> tagging data.
>>>> 
>>>> Thanks for any info,
>>>> 
>>>> -Roy
>>> Hi Roy:
>>> 
>>> I assume you mean internally compressed, not externally (like zipping up a 
>>> file)?
>>> 
>>> _______________________________________________
>>> thredds mailing list
>>> thredds@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit: 
>>> http://www.unidata.ucar.edu/mailing_lists/
>> **********************
>> "The contents of this message do not reflect any position of the U.S. 
>> Government or NOAA."
>> **********************
>> Roy Mendelssohn
>> Supervisory Operations Research Analyst
>> NOAA/NMFS
>> Environmental Research Division
>> Southwest Fisheries Science Center
>> 1352 Lighthouse Avenue
>> Pacific Grove, CA 93950-2097
>> 
>> e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
>> voice: (831)-648-9029
>> fax: (831)-648-8440
>> www: http://www.pfeg.noaa.gov/
>> 
>> "Old age and treachery will overcome youth and skill."
>> "From those who have been given much, much will be expected"

--
Peter Cornillon
  215 South Ferry Road                     Telephone: (401) 874-6283
  Graduate School of Oceanography          Fax: (401) 874-6283
  University of Rhode Island               Internet: pcornillon@xxxxxxxxxxx
  Narragansett, RI 02882   USA

