Hi John, Robert,

Thanks very much for your replies.

From what I understand now, TDS would take advantage of GPFS to optimize request processing across several users (different threads reading files in parallel) if GPFS is configured to work like Hadoop (using IBM's FPO). For a single user request, performance would be roughly equivalent (if TDS reads the files sequentially).
We can test this and will let you know.

In addition, we'll investigate reading a long list of netCDF files in parallel in different threads (5, 10, more?) and see how it performs. We can do this in a standalone benchmark or in one of our application servers (oceanotron). We'll let you know about this as well.
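A standalone benchmark of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: it reads raw bytes rather than decoding netCDF, so it measures file system throughput only, and the directory `/gpfs/data/netcdf` and the thread counts are hypothetical placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path, chunk_size=1 << 20):
    """Read one file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def benchmark(paths, n_threads):
    """Read all files using n_threads workers; return (bytes_read, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        total = sum(pool.map(read_file, paths))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical directory on the GPFS mount; replace with a real path.
    files = sorted(Path("/gpfs/data/netcdf").glob("*.nc"))
    for n in (1, 5, 10, 20):
        nbytes, secs = benchmark(files, n)
        print(f"{n:>2} threads: {nbytes / 1e6:.1f} MB in {secs:.2f} s")
```

Repeated runs over the same files will probably be served from the operating system's page cache, so each thread count should be measured on files that have not been read recently.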
Thomas

On 03/01/2016 06:36 PM, Robert Casey wrote:
Hi Thomas and John-

From what I have been able to gather, GPFS is a parallel cluster file system that behaves according to POSIX standards and looks to an OS just like any other file mount. You should be able to use all of the same file I/O commands you already use. I am not aware of any specialized enhancements. All of the I/O libraries for GPFS appear to be very low level. It's optimized for fast parallel reads and writes and parallelizes the metadata servers to each disk node as well, which is much more capable than even Parallel NFS. It looks like a good alternative to using HDFS, based on this article:

http://www.datanami.com/2014/02/18/what_can_gpfs_on_hadoop_do_for_you_/

As they suggest, you can get Hadoop-like behavior on GPFS by using IBM's File Placement Optimization (FPO), mapping compute cycles to each of the data nodes in parallel.

-Rob

On Mar 1, 2016, at 8:57 AM, John Caron <jcaron1129@xxxxxxxxx> wrote:

Hi Thomas:

TDS uses standard Java interfaces to the filesystem, so it wouldn't be taking advantage of anything that needed special commands. Both the netcdf library and TDS are thread-safe, so they can scale up to a large number of simultaneous requests, and it seems likely that a clustered Tomcat environment would work well.

Perhaps by distributing data correctly over data nodes, significant improvements might be possible. So much depends on access patterns, so a good way to proceed would be to create a synthetic load (e.g. script a bunch of requests to the TDS) that mimics what you expect users to need, and measure performance as you modify your system.

I don't know enough about GPFS to know what features could be used to go beyond what you get from the POSIX API. Anyone else?

John

On Thu, Feb 25, 2016 at 2:27 AM, Thomas LOUBRIEU <thomas.loubrieu@xxxxxxxxxx> wrote:

Dear all,

In our data center, the new high-performance clustered file system we're going to use is GPFS (General Parallel File System).
I am wondering whether the java-netcdf library or the THREDDS Data Server can benefit from this high-performance file system if the netCDF files are stored on it. Are you aware of work being done or systems working with GPFS, or otherwise with similar high-performance systems (HDFS, MooseFS, ...)? I am definitely not an expert, and any information regarding reading netCDF in Java on these clustered file systems (preferably GPFS) would help us very much.

Thanks,
Thomas

_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/
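The synthetic load suggested earlier in the thread (scripting a bunch of requests to the TDS and measuring performance) might be sketched as follows. This is a minimal illustration, not a recommended tool: the host `tds.example.org`, the dataset paths, and the client counts are hypothetical, and the function names `fetch` and `run_load` are introduced here only for the example.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=60):
    """Download one URL and return (bytes_received, elapsed_seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        size = len(resp.read())
    return size, time.perf_counter() - start


def run_load(urls, n_clients, fetch_fn=fetch):
    """Issue all requests from n_clients concurrent workers and summarize."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        results = list(pool.map(fetch_fn, urls))
    wall = time.perf_counter() - start
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "bytes": sum(size for size, _ in results),
        "wall_seconds": wall,
        "median_latency": latencies[len(latencies) // 2] if latencies else None,
        "max_latency": latencies[-1] if latencies else None,
    }


if __name__ == "__main__":
    # Hypothetical server and dataset paths; substitute requests that mimic
    # what real users are expected to ask for.
    base = "http://tds.example.org/thredds/fileServer/testdata"
    urls = [f"{base}/file_{i:03d}.nc" for i in range(100)]
    for clients in (1, 5, 10):
        print(clients, run_load(urls, clients))
```

Running the same request list at several client counts, before and after each change to the storage layout, gives a simple baseline for comparing configurations.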