Robert --

Thanks for your specifications. We're planning an upgrade for our EDEX server with a single 1.9TB SAS "Mixed Use" SSD, which uses MLC NAND flash technology and has a reliability/endurance rating of 3 DWPD (Drive Writes Per Day). Unfortunately, our server doesn't support an NVMe interface. The atop utility reports that our current 10K SAS drive reaches 100% busy even when no CAVE clients are running. In general, the highest disk occupation percentage varies among the ldmd, java, and httpd processes (mostly in write operations for all of them). I'm looking forward to (hopefully) being able to report improved drive performance here with the new SSD!

Michael --

As a workaround while we wait for this new drive to arrive, we attempted to have eight students simultaneously connect to edex-cloud.unidata.ucar.edu for the first time today, April 24, 2019, between about 14:05-14:20Z (I'd removed all of their ~/caveData directories prior to today, so everyone was starting from scratch). It took an excruciatingly long time to get past the initial splash screen, on the order of several minutes. CAVE eventually said it was "not responding" and asked if we wanted to force quit or wait. Students ended up needing to work in two groups in order to use CAVE successfully. Once CAVE actually launched, loading data was relatively responsive. I didn't have an opportunity to try turning off data caching. It may be helpful, when looking at log files on the cloud EDEX server, to know that the hostnames of the computers the students used today begin with l-dl-asac315, immediately followed by two additional numbers.

-Jason

_________________________________________
Jason N. T. Kaiser
Atmospheric Sciences Data Systems Administrator
Northern Vermont University-Lyndon
http://atmos.NorthernVermont.edu

-----Original Message-----
From: Haley, Robert E <haley787@xxxxxxxx>
Sent: Monday, April 8, 2019 11:55 AM
To: Kaiser, Jason N. <jason.kaiser@xxxxxxxxxxxxxxxxxxx>; Michael James <mjames@xxxxxxxx>
Cc: awips2-users@xxxxxxxxxxxxxxxx
Subject: RE: [EXTERNAL] Re: [awips2-users] EDEX to CAVE latency with multiple simultaneous users

Hello Jason,

Currently we're running two Samsung 970 EVO NVMe M.2 1TB solid state drives in RAID 1 on the PCIe card. We wanted to try a mainstream, off-the-shelf drive to see how it performed before looking into something fancier, and didn't really compare the NAND architecture of different drives. The sales pitch for V-NAND seemed good enough.

According to the RAID management utility, we're writing 1.38 terabytes of data per day, and the 970 EVO has a write endurance of 600 terabytes. So we're expecting only about a year of service before the drives have to be replaced. At the very least I'd recommend the 970 Pro, which has twice the write endurance (1,200 terabytes) for "only" a 50% price increase. At the enterprise level there are "write intensive" SSDs with ten times the write endurance, but they start at more than $4,000. That's a pretty tough bill to swallow.

We figure replacing the drives every one or two years, in addition to saving money, gives us the opportunity to replace old drives with higher-performance, higher-capacity options as they come out, improving the capability of our AWIPS 2 server over time.

Robert Haley
Weather Systems Administrator
Applied Aviation Sciences, College of Aviation
600 S. Clyde Morris Blvd.
Daytona Beach, FL 32114
386.323.8033
haley787@xxxxxxxx
Embry-Riddle Aeronautical University
Florida | Arizona | Worldwide
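For anyone running the same arithmetic on their own hardware, here is a minimal Python sketch of the endurance math discussed above. The 1.38 TB/day write rate, the 600 TB and 1,200 TB TBW ratings, and the 1.9 TB / 3 DWPD drive are the figures quoted in this thread; treating RAID 1 as "every byte lands on each member drive" is an assumption, not something stated above.

# Back-of-the-envelope SSD endurance check using the figures quoted in this
# thread. Assumption: in RAID 1, each member drive absorbs the full daily
# write volume reported by the RAID management utility.

def years_until_tbw_exhausted(tbw_rating_tb, writes_tb_per_day):
    """Years of service before the drive's rated Terabytes Written is reached."""
    return tbw_rating_tb / writes_tb_per_day / 365.0

def dwpd_needed(capacity_tb, writes_tb_per_day):
    """Drive Writes Per Day required to absorb the given daily write volume."""
    return writes_tb_per_day / capacity_tb

daily_writes = 1.38  # TB/day, as reported by the RAID management utility

print(f"970 EVO 1TB (600 TBW):   ~{years_until_tbw_exhausted(600, daily_writes):.1f} years")
print(f"970 Pro 1TB (1,200 TBW): ~{years_until_tbw_exhausted(1200, daily_writes):.1f} years")
print(f"DWPD needed on a 1.9 TB drive: ~{dwpd_needed(1.9, daily_writes):.2f}"
      " (well under a 3 DWPD 'Mixed Use' rating)")

With the quoted figures this works out to roughly 1.2 years for the 970 EVO and 2.4 years for the 970 Pro, consistent with Robert's estimate of about a year of service.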
-----Original Message-----
From: Kaiser, Jason N. <jason.kaiser@xxxxxxxxxxxxxxxxxxx>
Sent: Friday, April 5, 2019 2:50 PM
To: Haley, Robert E <haley787@xxxxxxxx>; Michael James <mjames@xxxxxxxx>
Cc: awips2-users@xxxxxxxxxxxxxxxx
Subject: RE: [EXTERNAL] Re: [awips2-users] EDEX to CAVE latency with multiple simultaneous users

Robert,

Thank you for sharing your informative experience. Out of curiosity, would you mind sharing the brand and model number of the NVMe SSDs you use? With EDEX constantly performing significant amounts of disk writing, I've read that for SSDs the underlying NAND flash type may be an important consideration when determining long-term SSD reliability/endurance (i.e., Drive Writes Per Day).

Jason N. T. Kaiser
Atmospheric Sciences Data Systems Administrator
Northern Vermont University-Lyndon

-----Original Message-----
From: Haley, Robert E <haley787@xxxxxxxx>
Sent: Thursday, April 4, 2019 1:46 PM
To: Kaiser, Jason N. <jason.kaiser@xxxxxxxxxxxxxxxxxxx>; Michael James <mjames@xxxxxxxx>
Cc: awips2-users@xxxxxxxxxxxxxxxx
Subject: RE: [EXTERNAL] Re: [awips2-users] EDEX to CAVE latency with multiple simultaneous users

Jason,

We experienced an issue similar to what you're describing, and for us the culprit was insufficient disk I/O on the EDEX server, even with an array of eight 10K RPM 12G SAS hard disks. When a class started launching CAVE and loading data, not only would their clients slow down (even menus would take time to populate), but we also saw data processing latency on EDEX start climbing until the class was done loading their initial data sets. In a few cases the EDEX server could not catch up with processing and we had to stop the LDM to give EDEX a chance to clear the backlog. Monitoring with top, we saw I/O waits typically between 5% and 10%, with instances as high as 20%.

It's worth noting we originally had possibly the most inefficient disk setup imaginable: EDEX was running on a VM with virtual storage, so the hypervisor was dealing with two layers of file systems on a RAID 5 array. That was a lot of extra work...

We replaced the hard disk array with a PCIe RAID card holding two NVMe SSDs, attached the SSD array directly to the VM, and the difference was MIND BLOWING. I/O wait stays below 1% and the data processing latency messages have disappeared entirely, even when a class of 30 students is using CAVE. We even saw CPU usage drop significantly, probably because very little time is now wasted waiting for read/write operations.

Robert Haley
Weather Systems Administrator
Applied Aviation Sciences, College of Aviation
600 S. Clyde Morris Blvd.
Daytona Beach, FL 32114
386.323.8033
haley787@xxxxxxxx
Embry-Riddle Aeronautical University
Florida | Arizona | Worldwide

-----Original Message-----
From: awips2-users-bounces@xxxxxxxxxxxxxxxx <awips2-users-bounces@xxxxxxxxxxxxxxxx> On Behalf Of Kaiser, Jason N.
Sent: Wednesday, April 3, 2019 12:36 PM
To: Michael James <mjames@xxxxxxxx>
Cc: awips2-users@xxxxxxxxxxxxxxxx
Subject: [EXTERNAL] Re: [awips2-users] EDEX to CAVE latency with multiple simultaneous users

Hi Michael,

/awips2/cave/ is locally mounted on each SSD. Only the home directories are NFS-mounted. The multiple sessions of CAVE are run as different users (i.e., students are each logged in to Linux with their own user account), meaning that you're correct: no two users should be reading/writing to the same ~/caveData directory at the same time. I will try turning off data caching and see if that alleviates the problem.

-Jason
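If it helps anyone double-checking the same thing on their own lab machines, here is a small Python sketch (Linux-only, reading /proc/mounts) that reports which filesystem a directory actually lives on. The two paths checked are simply the ones named in this thread; nothing about this is specific to CAVE.

import os

def mount_info(path):
    """Return (mount_point, fs_type) for the filesystem that holds 'path'."""
    path = os.path.realpath(os.path.expanduser(path))
    best = ("/", "unknown")
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _dev, mnt, fstype = line.split()[:3]
            # Keep the longest mount point that is a prefix of the path.
            if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) and len(mnt) >= len(best[0]):
                best = (mnt, fstype)
    return best

for directory in ("~/caveData", "/awips2/cave"):
    mnt, fstype = mount_info(directory)
    kind = "NFS-mounted" if fstype.startswith("nfs") else "local"
    print(f"{directory}: on {mnt} ({fstype}, {kind})")

On the setup Jason describes (local /awips2/cave, NFS home directories) this would print an nfs/nfs4 type for ~/caveData and a local filesystem type (e.g., xfs or ext4) for /awips2/cave.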
From: Michael James <mjames@xxxxxxxx>
Sent: Wednesday, April 3, 2019 10:52 AM
To: Kaiser, Jason N. <jason.kaiser@xxxxxxxxxxxxxxxxxxx>
Cc: awips2-users@xxxxxxxxxxxxxxxx
Subject: Re: [awips2-users] EDEX to CAVE latency with multiple simultaneous users

Hi Jason,

I don't believe that CAVE using an NFS-mounted user home directory should result in the performance issues you are experiencing, but I wonder if multiple users running the same CAVE executable over NFS could cause this... is that how the application is being used (meaning /awips2/cave/ is on an NFS mount and each user is running the app from that mount)? In our classrooms we have seen no issues with multiple CAVE clients connecting to a single server, and I have not seen network latency caused by multiple clients connecting at the same time.

Can we confirm that the multiple sessions of CAVE are run as different users, meaning no two users would be reading/writing the same ~/caveData directory at the same time? Perhaps turning off data caching (CAVE > Preferences > Cache) would reduce the latency to an acceptable level?
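As a postscript for anyone trying to reproduce the disk measurements mentioned earlier in this thread (atop reporting the SAS drive 100% busy, top showing 5-20% I/O wait), here is a minimal Python sketch that samples the same two quantities directly from /proc. The device name "sda" and the 5-second interval are placeholders, not values from the thread; atop and iostat -x remain the better tools, and this only shows where those percentages come from.

# Minimal disk-busy / iowait sampler for a Linux EDEX host.
# Approximates atop's per-device "busy" percentage and top's "wa" figure.

import time

DEVICE = "sda"      # example device name; substitute your EDEX data drive
INTERVAL = 5.0      # seconds between the two samples

def read_disk_io_ms(device):
    """Milliseconds the device spent doing I/O (io_ticks column of /proc/diskstats)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

def read_cpu_times():
    """(iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        parts = f.readline().split()
    values = list(map(int, parts[1:]))
    return values[4], sum(values)   # iowait is the 5th value after 'cpu'

io_ms_0, (wait_0, total_0) = read_disk_io_ms(DEVICE), read_cpu_times()
time.sleep(INTERVAL)
io_ms_1, (wait_1, total_1) = read_disk_io_ms(DEVICE), read_cpu_times()

busy_pct = 100.0 * (io_ms_1 - io_ms_0) / (INTERVAL * 1000.0)
iowait_pct = 100.0 * (wait_1 - wait_0) / max(total_1 - total_0, 1)
print(f"{DEVICE}: ~{busy_pct:.0f}% busy, system iowait ~{iowait_pct:.1f}% over {INTERVAL:.0f}s")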
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are recorded in the Unidata inquiry tracking system and made publicly available through the web. Users who post to any of the lists we maintain are reminded to remove any personal information that they do not want to be made public.

awips2-users mailing list
awips2-users@xxxxxxxxxxxxxxxx
For list information, to unsubscribe, or change your membership options, visit:
http://www.unidata.ucar.edu/mailing_lists/