Some things are coming back to me now. Back when Sun was being idiotic about Solaris x86 in 2002, I switched one LDM server over to RH 8. That same box had run Solaris with an uptime of 300 days, and I immediately started having random lockups like this, where you could not ssh or telnet into the machine. In that case all I could do was hard power-cycle it. This happened about once a week.
I never did figure out what was causing it. There was nothing in the logs that showed anything. The only thing I could connect it to was that it always happened when the box was under its heaviest load. I switched the box back to Solaris a year later and the problem went away, so it definitely was not hardware.

If the box is running an X server: I also had problems with the Nvidia binary drivers and Red Hat that caused the same symptom, except this was on workstations. X would lock up and the box would become unresponsive to telnet or ssh. That problem was solved by switching to the XFree86 NV driver.

You might also check that you have the latest patches for your NICs. A buggy NIC driver could do this. I have a Solaris box that uses a Broadcom driver (written by Broadcom). Under heavy load the NIC driver causes a hang or sometimes a kernel panic. That problem was resolved by using an Intel card instead, with the built-in Solaris 10 driver.

I would also double-check NFS.

Robert
-----Original Message-----
Cc: ldm-users@xxxxxxxxxxxxxxxx; gembud@xxxxxxxxxxxxxxxx
Sent: 5/11/2005 10:02 AM

During these periods ssh is completely unable to connect to the machine. I do log ps -eaf and free, although I think those have been overwritten since this most recent crash. The free command shows that most of the memory is used; however, according to some Google searches, this is because at boot the kernel "takes" the memory and allocates it as necessary. ps didn't show me anything. I've checked all the /var/log files and nothing jumps out there either.
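
Since the logged ps and free output was overwritten after the crash, one option is to append timestamped snapshots instead of rewriting the file. The following is only a rough sketch under some assumptions: the function names and the log line format are made up for illustration, and it reads /proc/meminfo-style text rather than parsing the output of free. It also adds free memory, buffers, and page cache together, since on Linux the cache is what makes free report most of the memory as "used".

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def reclaimable_free_kb(info):
    """Free memory plus buffers and page cache, which the kernel can reclaim."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def append_snapshot(log_path, meminfo_text, load1, now=None):
    """Append one timestamped line; the file is never truncated."""
    info = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = "%s load1=%.2f memfree_kb=%d free_plus_cache_kb=%d\n" % (
        stamp,
        load1,
        info.get("MemFree", 0),
        reclaimable_free_kb(info),
    )
    # Open in append mode and fsync, so the last snapshot written before
    # a lockup has a better chance of surviving a hard power cycle.
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())
    return line
```

To use it, a small script run every minute from cron could read /proc/meminfo, take the first value of os.getloadavg() as load1, and call append_snapshot with a log path of your choosing. After a lockup, the last few lines show what load and memory looked like just before the box stopped responding.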