Re: [thredds] improved performances through GPFS

To: Robert Casey <rob@xxxxxxxxxxxxxxxxxxx>, John Caron <jcaron1129@xxxxxxxxx>
Subject: Re: [thredds] improved performances through GPFS
From: Thomas LOUBRIEU <thomas.loubrieu@xxxxxxxxxx>
Date: Wed, 2 Mar 2016 09:19:18 +0100

Hi John, Robert,

Thanks very much for your replies.

From what I understand now, TDS would take advantage of GPFS to optimize request processing between several users (in different threads reading files in parallel) if GPFS is configured to work like Hadoop (using IBM's FPO). For a single user request, performance would be roughly equivalent (if files are read sequentially by TDS).

We can test this. We'll let you know.

In addition, we'll investigate reading a long list of netCDF files in parallel in different threads (5, 10, more?) and see. We can do it in a standalone benchmark or in one of our application servers (oceanotron). We'll let you know about this as well.
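
A minimal sketch of such a standalone benchmark follows. It is an illustration under assumptions, not code from this thread: the class and method names (ParallelReadBenchmark, readAll, readFully) are hypothetical, and it reads raw bytes with java.nio as a stand-in for opening each file with the netCDF-Java library. The thread counts simply echo the "5, 10, more" mentioned above. Note that the operating system's page cache can make repeated runs over the same files look faster than the file system really is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReadBenchmark {

    /** Reads one file to the end and returns the number of bytes read. */
    static long readFully(Path file) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads all files with a fixed pool of worker threads; returns the total bytes read. */
    static long readAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path f : files) {
                // Assumption: a real test would open the file with netCDF-Java here.
                Callable<Long> task = () -> readFully(f);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get(); // rethrows any read failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Each command-line argument is the path of one file to read.
        List<Path> files = new ArrayList<>();
        for (String a : args) {
            files.add(Path.of(a));
        }
        for (int threads : new int[] {1, 5, 10, 20}) {
            long start = System.nanoTime();
            long bytes = readAll(files, threads);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("%d threads: %d bytes in %.1f ms%n", threads, bytes, ms);
        }
    }
}
```

Comparing the timings for 1 thread against 5, 10 and 20 on the same file list gives a first indication of whether parallel reads help on a given mount.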


Thomas





On 03/01/2016 06:36 PM, Robert Casey wrote:

Hi Thomas and John-
From what I have been able to gather, GPFS is a parallel cluster that behaves according to POSIX standards and looks to an OS just like any other file mount. You should be able to use all of the same file I/O commands you already use. I'm not aware of any specialized enhancements. All of the I/O libraries for GPFS appear to be very low level. It's optimized for fast parallel reads and writes and parallelizes the metadata servers to each disk node as well, which is much more capable than even Parallel NFS. Looks like it is a good alternative to using HDFS based on this article.
http://www.datanami.com/2014/02/18/what_can_gpfs_on_hadoop_do_for_you_/
As they suggest, you can get Hadoop-like behavior on GPFS by using IBM's File Placement Optimization (FPO), mapping compute cycles to each of the data nodes in parallel.
-Rob
On Mar 1, 2016, at 8:57 AM, John Caron <jcaron1129@xxxxxxxxx> wrote:
Hi Thomas:
TDS uses standard Java interfaces to the filesystem, so it wouldn't be taking advantage of anything that needs special commands. Both the netCDF library and TDS are thread-safe, so they can scale up to a large number of simultaneous requests, and it seems likely that a clustered Tomcat environment would work well.
Perhaps by distributing data correctly over data nodes, significant improvements might be possible. So much depends on access patterns that a good way to proceed would be to create a synthetic load (e.g. script a bunch of requests to the TDS) that mimics what you expect users to need, and measure performance as you modify your system.
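
One way to script such a synthetic load is sketched below. It is only an illustration under assumptions: the class name SyntheticLoad and its methods are hypothetical, the request URLs are supplied by the caller (none are taken from this thread), and it relies on the java.net.http.HttpClient shipped with Java 11 and later. It replays a list of URLs with a fixed number of concurrent workers and reports per-request latency.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyntheticLoad {

    /** Issues one GET, discards the body, and returns the elapsed time in milliseconds. */
    static double timedGet(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        long start = System.nanoTime();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        double ms = (System.nanoTime() - start) / 1e6;
        if (response.statusCode() != 200) {
            throw new IllegalStateException(uri + " returned " + response.statusCode());
        }
        return ms;
    }

    /** Replays the URIs with the given number of concurrent workers; returns sorted latencies in ms. */
    static List<Double> run(List<URI> uris, int users) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (URI uri : uris) {
                Callable<Double> task = () -> timedGet(client, uri);
                pending.add(pool.submit(task));
            }
            List<Double> latencies = new ArrayList<>();
            for (Future<Double> f : pending) {
                latencies.add(f.get()); // rethrows any request failure
            }
            Collections.sort(latencies);
            return latencies;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: number of simulated users; remaining args: request URLs to replay.
        if (args.length < 2) {
            System.err.println("usage: SyntheticLoad <users> <url> [<url> ...]");
            return;
        }
        int users = Integer.parseInt(args[0]);
        List<URI> uris = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            uris.add(URI.create(args[i]));
        }
        List<Double> latencies = run(uris, users);
        System.out.printf("requests=%d median=%.1f ms max=%.1f ms%n",
                latencies.size(),
                latencies.get(latencies.size() / 2),
                latencies.get(latencies.size() - 1));
    }
}
```

Running the same URL list before and after each change to the storage or server configuration keeps the comparison consistent. The URL list should be built from the access patterns actually expected from users.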
I don't know enough about GPFS to know what features could be used to go beyond what you get from the POSIX API. Anyone else?
John
On Thu, Feb 25, 2016 at 2:27 AM, Thomas LOUBRIEU <thomas.loubrieu@xxxxxxxxxx> wrote:
    Dear all,

    In our data center, the new high-performance clustered file
    system we're going to use is GPFS (General Parallel File System).
    I am wondering whether the java-netcdf library or the THREDDS Data
    Server can benefit from this high-performance file system if the
    netCDF files are stored on it.

    Are you aware of work being done or systems working with GPFS or
    with similar high-performance systems (HDFS, MooseFS, ...)? I am
    definitely not an expert, and any information regarding reading
    netCDF in Java on these clustered file systems (preferably GPFS)
    would help us very much.

    Thanks,

    Thomas

    _______________________________________________
    thredds mailing list
    thredds@xxxxxxxxxxxxxxxx <mailto:thredds@xxxxxxxxxxxxxxxx>
    For list information or to unsubscribe,  visit:
http://www.unidata.ucar.edu/mailing_lists/
