Thanks Art, I've checked out the hardware and it seems fine. We don't have anything NFS mounted, so that's not the problem. I've checked all the system logs, and none of them show anything. I've done a little logging now, and it appears that a GEMPAK "gf" program that runs at about the same time as my cleanup script "runs away": it runs for several hours, taking up 99% of one of our CPUs, then CRASH!!!
On Fri, 6 May 2005, Arthur A. Person wrote:
Gabe,

On Thu, 5 May 2005, Marcus Christie wrote:

Gabe Langbauer wrote:

Hello All,

I'm running a Dell Xeon processor with Red Hat Enterprise Linux v. 3 installed. This system is used only for retrieving data from LDM (6.1.0), creating images with GEMPAK (5.7.3), and displaying these images over the web via Apache (also 1 java script). Recently, the computer has been locking up. The only symptom is that the computer becomes unreachable. Even locally, if I go to the computer, it does not respond. I am then forced to cycle the power to get the system to respond again. Checking the system log files hasn't shown anything (at least to me) except that when this lockup occurs, cron does not run (usually). If anyone has any ideas as to what may be occurring, it would be GREATLY appreciated.

--Gabe Langbauer

In case you haven't fixed the above problem yet, I'll start down another possible path for you...

We have a Dell server, now our main decoder/file serving system, that used to do what you've described "every so often" (maybe once a month???). It doesn't do it now, and the bad news is that I don't know for sure what fixed it. I do know that the LDM was not the problem, since I extensively profiled it and could find no cause for the freeze-up (including system resource problems or LDM problems). The one thing I did notice was that when I rebooted another, older machine running RH9 with which it shared mounted file systems (both ways), the reboot of the older system seemed to trigger an imminent (say, within 24 hours) freeze-up of the new system. I can't explain this; I only know that there was a pretty strong correlation. After detangling these two systems (i.e., getting rid of all NFS mounts of the old system onto the new), I don't believe I've seen this problem again.

So, if you still haven't figured this out, you might try creating some "ps -eaf" and "free" (or other) logs from a script and check them after your next freeze-up. If you can't find any obvious abnormalities, analyze your system for NFS tangledness, try some detangling, and see if that helps.

Of course, there could always be a hardware problem too (which is what I thought our problems were initially)... have you checked your system "/var/log/messages" file for errors? You might also try running vendor hardware diagnostics on the system, although I've rarely found these to be very useful.

Art.

Arthur A. Person
Research Assistant, System Administrator
Penn State Department of Meteorology
email: person@xxxxxxxxxxxxx, phone: 814-863-1563
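
The "ps -eaf" and "free" logging suggested above could be sketched roughly as follows. This is only a minimal illustration, not the script anyone in this thread actually used: the log directory, the file naming, and the function name `snapshot` are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical log location (an assumption for this sketch); adjust locally.
LOG_DIR="${LOG_DIR:-/tmp/freeze-logs}"

# Append one timestamped snapshot of the process table and memory usage
# to a per-day log file.
snapshot() {
    mkdir -p "$LOG_DIR" || return 1
    log="$LOG_DIR/snapshot-$(date +%Y%m%d).log"
    {
        echo "==== $(date) ===="
        # Keep going even if one of the commands is unavailable or fails.
        ps -eaf || true
        free || true
        echo
    } >> "$log" 2>&1
    # Ask the kernel to flush buffers so recent entries are more likely
    # to survive a hard power cycle.
    sync
}

snapshot
```

Run from cron every few minutes (the interval is a judgment call), the last snapshot written before a freeze shows which processes were running and how much memory was free shortly beforehand.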