Re: [thredds] TDS as a big data platform

  • To: Antonio S. Cofiño <cofinoa@xxxxxxxxx>
  • Subject: Re: [thredds] TDS as a big data platform
  • From: Guan Wang <gwang@xxxxxxx>
  • Date: Fri, 26 Feb 2016 10:13:16 -0500 (EST)
Hi Antonio,

Thank you for sharing! Do you mind also sharing the server configuration (CPU, 
RAM, etc.) that runs the TDS?

Guan

----- Original Message -----
From: "Antonio S. Cofiño" <cofinoa@xxxxxxxxx>
To: thredds@xxxxxxxxxxxxxxxx
Cc: "y kudo" <y_kudo@xxxxxxxxxxxx>
Sent: Friday, February 26, 2016 8:52:33 AM
Subject: Re: [thredds] TDS as a big data platform


Yoshi,

Below is my experience with TDS (v4.3 and v4.6).



On 19/02/2016 at 7:26, Yoshiyuki Kudo wrote:
> Hi,
>
> I am in a project where a group of EO data researchers will use data 
> access services in an attempt to create new data products out of the wealth 
> of the data pool.  The data will be EO data (coverage data) in netCDF, some 
> GBytes per data granule, amounting to over 120 TB and 0.3 million data 
> files in total (1 year's worth of collection).
>
> I feel TDS or Hyrax can be a good candidate for this platform, but would like 
> to hear your advice before further estimation of work and hardware purchase.  
> I very much appreciate your expertise on this.
>
> 1) I see some historical threads about how aggregation of large volumes of 
> data can be problematic.  I will need to consider aggregation as well, 
> but is a 100 TB+ aggregation possible, both technically and 
> performance-wise?

We have an operational service which aggregates collections of datasets. 
One of the aggregations consists of 135k files in GRIB1 format and 13 TB 
of data. Another collection is based on 300k+ files but only 8 TB in size. 
Each of these collections is aggregated into a single NetCDF entity using 
an NcML file. A 100 TB+ aggregation will be possible, but the limit will 
be performance, because of the sheer number of files.
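For reference, a static aggregation of the kind described above is written in NcML roughly as follows. This is an illustrative sketch, not Antonio's actual configuration: the dimension name, file paths, and `joinExisting` aggregation type are assumptions for the example.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Join all granules along their existing (assumed) time dimension,
       presenting the whole collection as one logical dataset. -->
  <aggregation type="joinExisting" dimName="time">
    <netcdf location="/data/collection/file_000001.nc"/>
    <netcdf location="/data/collection/file_000002.nc"/>
    <!-- ... one <netcdf> entry per file; for a 135k-file collection
         this list is generated by a script, not written by hand ... -->
  </aggregation>
</netcdf>
```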





>
> 2) Is there any HW restriction for the TDS set up I should have in mind 
> before preparing the HW?  Do I need to have a single disk drive (partition) 
> for the 100+TB data management in TDS?
No, you don't need to have just one partition. In our case we have 
400 TB of disk based on ZFS (OpenIndiana), using a pool of 150 desktop 
HDDs in a raidz2 vdev configuration (10+2 disks). For the TDS 
services we use a load-balanced configuration, with TDS instances 
running in a cluster.
>   
> 3) Could you share any success story you know of, about handling large 
> volumes of data in a TDS ?
https://rd-alliance.org/sites/default/files/attachment/20150924_Day2_1330_End-userGatewayForClimateServicesAndDataInitiatives_Cofino.pdf


>
> 4) Any other recommendation or things I need to keep in mind ?
We considered, at the beginning, dynamic aggregations based on the scan 
directory facilities provided by TDS, but in the end they didn't perform 
well, so what we do now is generate static NcML aggregations.
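Generating those static NcML files can itself be scripted. The following is a minimal sketch of that idea, not Antonio's actual tooling: the `build_ncml` helper, the file paths, and the `time` aggregation dimension are all assumptions for illustration.

```python
# Hypothetical helper: emit a static NcML joinExisting aggregation that
# lists every netCDF file explicitly, instead of relying on a <scan>
# element that the TDS must re-evaluate at request time.
def build_ncml(paths, dim="time"):
    header = (
        '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">\n'
        '  <aggregation type="joinExisting" dimName="%s">\n' % dim
    )
    # Sort so granules appear in a stable, coordinate-consistent order.
    body = "".join(
        '    <netcdf location="%s"/>\n' % p for p in sorted(paths)
    )
    footer = "  </aggregation>\n</netcdf>\n"
    return header + body + footer

if __name__ == "__main__":
    # Example: two granules joined along the (assumed) time dimension.
    files = ["/data/eo/granule_002.nc", "/data/eo/granule_001.nc"]
    print(build_ncml(files))
```

In practice the path list would come from a directory walk, and the resulting file would be referenced from the TDS catalog; the point is that the expensive file discovery happens once, offline, rather than on every request.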
>
> Thank you so much for your support.

Please feel free to ask.

Regards

Antonio

--
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
         Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es



> Yoshi
>

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/


