Hi,

I am working on a project in which a group of EO (Earth observation) data researchers will use data access services to create new data products from a large data pool. The data are EO coverage data in netCDF, a few GB per granule, amounting to over 120 TB in roughly 0.3 million files (one year's worth of collection). I feel that TDS or Hyrax could be a good candidate for this platform, but I would like to hear your advice before estimating the work involved and purchasing hardware. I very much appreciate your expertise on this.

1) I have seen some historical threads about how aggregation of large data volumes can be problematic. I will need aggregation as well; is a 100 TB+ aggregation feasible, both technically and performance-wise? (A sketch of the kind of aggregation I have in mind follows below.)

2) Are there any hardware restrictions I should keep in mind before preparing the hardware for a TDS setup? Do I need a single disk drive (partition) to manage the 100+ TB of data in TDS?

3) Could you share any success stories you know of about handling large data volumes in a TDS?

4) Are there any other recommendations or things I should keep in mind?

Thank you so much for your support.

Yoshi
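For reference, a minimal sketch of the kind of TDS catalog entry this question concerns: a joinExisting NcML aggregation along the time dimension over a scanned directory tree. The dataset name, urlPath, service name, and the /data/eo location are hypothetical placeholders, not details from the original post.

  <dataset name="EO collection, 1-year time aggregation" ID="eo-agg"
           urlPath="eo/aggregation">
    <serviceName>odap</serviceName>
    <!-- NcML aggregation: concatenate all granules along "time" -->
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting" recheckEvery="15 min">
        <!-- scan may point at any mounted path; the granules need not
             live on a single disk partition -->
        <scan location="/data/eo/" suffix=".nc" subdirs="true"/>
      </aggregation>
    </netcdf>
  </dataset>

One relevant design point: a joinExisting aggregation must open each granule at least once to read its time coordinate values, which is costly for ~0.3 million files, though TDS can persist those coordinates in its aggregation cache (configured via the AggregationCache element in threddsConfig.xml), so that cost is mainly paid on the first scan.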