Re: [thredds] TDS as a big data platform

  • To: Antonio S. Cofiño <cofinoa@xxxxxxxxx>
  • Subject: Re: [thredds] TDS as a big data platform
  • From: Guan Wang <gwang@xxxxxxx>
  • Date: Fri, 26 Feb 2016 10:13:16 -0500 (EST)
Hi Antonio,

Thank you for sharing! Do you mind also sharing the server configuration (CPU, 
RAM, etc.) that runs the TDS?

Guan

----- Original Message -----
From: "Antonio S. Cofiño" <cofinoa@xxxxxxxxx>
To: thredds@xxxxxxxxxxxxxxxx
Cc: "y kudo" <y_kudo@xxxxxxxxxxxx>
Sent: Friday, February 26, 2016 8:52:33 AM
Subject: Re: [thredds] TDS as a big data platform


Yoshi,

Below is my experience with TDS (v4.3 and v4.6).



On 19/02/2016 at 7:26, Yoshiyuki Kudo wrote:
> Hi,
>
> I am in a project where a group of EO data researchers will use data 
> access services in an attempt to create new data products out of the wealth 
> of the data pool.  The data will be EO data (coverage data) in netCDF, some 
> GBytes per data granule, amounting to over 120 TB and 0.3 million data 
> files in total (1 year's worth of collection).
>
> I feel TDS or Hyrax can be a good candidate for this platform, but would like 
> to hear your advice before further estimation of work and hardware purchase.  
> I very much appreciate your expertise on this.
>
> 1) I see some historical threads about how aggregation of large volumes of 
> data can be problematic.  I will need to consider aggregation as well, 
> but is a 100 TB+ aggregation possible, both technically and 
> performance-wise?

We have an operational service which aggregates collections of datasets. 
One of the aggregations consists of 135k files in GRIB1 format and 13 TB 
of data. Another collection is based on 300k+ files but only 8 TB in size. 
Each of these collections is aggregated into a single NetCDF entity using 
an NcML file. A 100 TB+ aggregation will be possible, but the limit will 
be performance, because of the sheer number of files.
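For reference, a static aggregation of the kind described above is written in NcML roughly as follows. This is an illustrative sketch, not Antonio's actual configuration: the dimension name, file paths, and `joinExisting` aggregation type are assumptions for the example.

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Join all granules along their existing (assumed) time dimension,
       presenting the whole collection as one logical dataset. -->
  <aggregation type="joinExisting" dimName="time">
    <netcdf location="/data/collection/file_000001.nc"/>
    <netcdf location="/data/collection/file_000002.nc"/>
    <!-- ... one <netcdf> entry per file; for a 135k-file collection
         this list is generated by a script, not written by hand ... -->
  </aggregation>
</netcdf>
```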





>
> 2) Is there any HW restriction for the TDS set up I should have in mind 
> before preparing the HW?  Do I need to have a single disk drive (partition) 
> for the 100+TB data management in TDS?
No, you don't need to have just one partition. In our case we have 
400 TB of disk based on ZFS (OpenIndiana), using a pool of 150 desktop 
HDDs in a raidz2 vdev configuration (10+2 disks). For the TDS 
services we use a load-balanced configuration, with TDS instances 
running in a cluster.
>   
> 3) Could you share any success story you know of, about handling large 
> volumes of data in a TDS ?
https://rd-alliance.org/sites/default/files/attachment/20150924_Day2_1330_End-userGatewayForClimateServicesAndDataInitiatives_Cofino.pdf


>
> 4) Any other recommendation or things I need to keep in mind ?
We considered, at the beginning, dynamic aggregations based on the scan 
directory facilities provided by TDS, but in the end they didn't perform 
well, so what we do now is generate static NcML aggregations.
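Generating those static NcML files can itself be scripted. The following is a minimal sketch of that idea, not Antonio's actual tooling: the `build_ncml` helper, the file paths, and the `time` aggregation dimension are all assumptions for illustration.

```python
# Hypothetical helper: emit a static NcML joinExisting aggregation that
# lists every netCDF file explicitly, instead of relying on a <scan>
# element that the TDS must re-evaluate at request time.
def build_ncml(paths, dim="time"):
    header = (
        '<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">\n'
        '  <aggregation type="joinExisting" dimName="%s">\n' % dim
    )
    # Sort so granules appear in a stable, coordinate-consistent order.
    body = "".join(
        '    <netcdf location="%s"/>\n' % p for p in sorted(paths)
    )
    footer = "  </aggregation>\n</netcdf>\n"
    return header + body + footer

if __name__ == "__main__":
    # Example: two granules joined along the (assumed) time dimension.
    files = ["/data/eo/granule_002.nc", "/data/eo/granule_001.nc"]
    print(build_ncml(files))
```

In practice the path list would come from a directory walk, and the resulting file would be referenced from the TDS catalog; the point is that the expensive file discovery happens once, offline, rather than on every request.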
>
> Thank you so much for your support.

Please feel free to ask.

Regards

Antonio

--
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
         Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es



> Yoshi
>

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit: 
http://www.unidata.ucar.edu/mailing_lists/


