All - seeing the issue with history files, we thought we should alert you to a known "bug" with the history command and history file in RHEL that caused a major slowdown and eventual lockup of our system. It is a known problem in RHEL 2.1, 3, 4, 5, and 6. We only ran into it on RHEL 6.2, but the remedy was to add "set history=0" to our .cshrc. We also added an alias, h50 ("set history=50"), so that when you open a terminal window and type h50 it will start remembering your commands. Clunky, yes, but at least our workstations don't lock up.

I had to log on to the RHEL customer portal and search for "42739" to get this printout, which I cut and pasted into this email. I hope it translates well in your email viewer. See below - although I'm not sure it will address your original issue.

Pete

-------------------------------------------

A system slows down due to a .history file
Article ID: 42739 - Created on: Oct 5, 2010 10:54 PM - Last Modified: Jan 23, 2011 9:09 PM

Issue
The data in the ".history" file becomes malformed and its file size gets larger and larger. Each command in the history should be recorded in the file as a timestamp line followed by a command entry line ending with an EOL (end of line).

Example of a normal .history file:

  $ cat -n .history
  1 #+1289787344
  2 set

In the malformed file, timestamp and command entries no longer alternate: a timestamp is recorded where a command entry should be (see line 4), and some entries are merged together (see lines 6 and 7 below).

Example of a malformed .history file:

  $ cat -n .history
  1 #+1289787344
  2 test
  3 #+1289787367
  4 1289787366
  5 #+1289787367
  6 128978#+12897#+1289#+12897test
  7 #+1289#+12897testls##+1289787401l12#+1289787402
  8 st

The system slows down because csh uses a lot of memory to read such a large .history file.

Environment
Red Hat Enterprise Linux 2.1, 3, 4, 5, and 6

Resolution
This issue will be addressed in "Bug 648592 - .history file gets corrupted if several scripts run at once". As a temporary workaround, disable the csh history by setting one of the following lines in ~/.cshrc or on the command line:

  unset savehist

OR

  set savehist=

Root Cause
The main issue to be fixed is that tcsh does not access the ~/.history file exclusively. The "merge" option makes further unexpected behaviour possible, as warned in the man page for the "-S" form of the history built-in command:

  history [-hTr] [n]
  history -S|-L|-M [filename] (+)
  history -c (+)
  ...
  With -S, the second form saves the history list to filename. If the
  first word of the savehist shell variable is set to a number, at most
  that many lines are saved. If the second word of savehist is set to
  'merge', the history list is merged with the existing history file
  instead of replacing it (if there is one) and sorted by time stamp. (+)
  Merging is intended for an environment like the X Window System with
  several shells in simultaneous use. Currently it succeeds only when
  the shells quit nicely one after another.

Additionally, savehist is set to both a value greater than 0 and the "merge" option by default in Red Hat Enterprise Linux 5.4 and later:

  $ set | grep savehist
  savehist        (1024 merge)
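For reference, a minimal ~/.cshrc sketch combining Pete's settings with the article's savehist workaround might look like the following; the h50 alias name is just Pete's local convention, and the numbers are a matter of taste:

  # Work around the corrupted ~/.history problem: keep no in-memory
  # history and never write a history file.
  set history = 0
  unset savehist    # or: set savehist=

  # Optional (Pete's convention): type "h50" in a terminal to turn a
  # short in-memory history back on for that shell.
  alias h50 'set history=50'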
From: Greg Stossmeister <gstoss@xxxxxxxx>
To: gembud@xxxxxxxxxxxxxxxx
Date: 04/30/2012 03:48 PM
Subject: Re: [gembud] Generating NEXRAD radar imagery with gpmap_gf
Sent by: gembud-bounces@xxxxxxxxxxxxxxxx

Daryl,
I noticed .history was bizarre last week - it had a ton of stuff in it that didn't look like the normal history entries I was used to seeing. We deleted it this morning and I set my .cshrc to only save the last 40 commands.

Greg

On Apr 30, 2012, at 1:40 PM, daryl herzmann wrote:
>
> Offline... How large is your ~/.history file?
>
> daryl
>
> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>
>> Daryl,
>> No, each process creates a temporary working subdirectory to run in, based on the process id.
>>
>> Greg
>>
>> On Apr 30, 2012, at 1:23 PM, daryl herzmann wrote:
>>
>>> Greg,
>>>
>>> Are all these processes running out of the same CWD (directory)? Try creating temp directories for each process and run the code from those directories.
>>>
>>> daryl
>>>
>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>>
>>>> Daryl,
>>>> I have several shell scripts that I'm running out of cron every 5 minutes. Each shell script runs 10 gpmap_gf processes in sequence. I've tried running 1 - 6 scripts at a time. This typically works fine during the day, with one of these scripts completing in about 2 minutes. As evening comes on they take longer and longer to run, and they seem to take more and more memory. From the "top" command the scripts often use 500-800 MB of memory, but in the evening this seems to mushroom to > 3 GB per script. The load on the machine at night from these scripts alone jumps to > 30, and by morning the machine usually dies with out-of-memory errors even though I'm automatically killing the scripts when they run longer than 2 minutes.
>>>>
>>>> Looking at /var/log/debug.log I'm seeing segfault errors:
>>>>
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2164]: segfault at 0 ip 000000392692ff7f sp 00007fff8d981128 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic abrt[2179]: saved core dump of pid 2164 (/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf) to /var/spool/abrt/ccpp-2012-04-26-17:07:25-2164.new/coredump (827392 bytes)
>>>> Apr 26 17:07:25 sferic abrtd: Directory 'ccpp-2012-04-26-17:07:25-2164' creation detected
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2239]: segfault at 0 ip 000000392692ff7f sp 00007ffffd173658 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2242]: segfault at 0 ip 000000392692ff7f sp 00007fff8f4df6f8 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2247]: segfault at 0 ip 000000392692ff7f sp 00007fff73574d18 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2261]: segfault at 0 ip 000000392692ff7f sp 00007fff8bda1358 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2245]: segfault at 0 ip 000000392692ff7f sp 00007fff71495a28 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2245(gpmap_gf) over core_pipe_limit
>>>> Apr 26 17:07:25 sferic kernel: Skipping core dump
>>>> Apr 26 17:07:25 sferic abrt[2260]: not dumping repeating crash in '/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic abrt[2279]: not dumping repeating crash in '/export/ldm/home/gempak/GEMPAK6.4.0/os/linux64/bin/gpmap_gf'
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2289]: segfault at 0 ip 000000392692ff7f sp 00007fffca7118a8 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2286]: segfault at 0 ip 000000392692ff7f sp 00007fffef00ac98 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: gpmap_gf[2303]: segfault at 0 ip 000000392692ff7f sp 00007fff92019618 error 4 in libc-2.12.so[3926800000+186000]
>>>> Apr 26 17:07:25 sferic kernel: Pid 2303(gpmap_gf) over core_pipe_limit
>>>>
>>>> Greg
>>>>
>>>> On Apr 30, 2012, at 12:10 PM, daryl herzmann wrote:
>>>>
>>>>> On Mon, 30 Apr 2012, Greg Stossmeister wrote:
>>>>>
>>>>>> Does anyone generate a lot of individual NEXRAD level III products with gpmap_gf? I'm trying to generate real-time plots of NOQ Reflectivity and NOU Velocity from 30 radars in the midwest, and it's crashing my server after a few hours, even when I only run 3 plots at a time. I'm running GEMPAK 6.4.0 on a RHEL 6 machine with 64 GB of memory. I'm wondering what I'm doing wrong and whether someone has a better way of doing this.
>>>>>
>>>>> Crashing your server, how? Exhausting memory? Kernel panic? Are the processes not going away once you run them? How are you running them?
>>>>>
>>>>> daryl
>>>>>
>>>>> --
>>>>> /**
>>>>> * Daryl Herzmann
>>>>> * Assistant Scientist -- Iowa Environmental Mesonet
>>>>> * http://mesonet.agron.iastate.edu
>>>>> */
>>>> _______________________________________________
>>>> gembud mailing list
>>>> gembud@xxxxxxxxxxxxxxxx
>>>> For list information or to unsubscribe, visit: http://www.unidata.ucar.edu/mailing_lists/
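As an aside, a minimal csh sketch of the per-process working-directory approach daryl suggests earlier in the thread might look like the following; the scratch location and the gpmap_gf step are placeholders, not taken from Greg's actual scripts:

  #!/bin/csh -f
  # Hypothetical wrapper: give each run its own scratch directory,
  # keyed on this script's process id ($$), so concurrent gpmap_gf
  # jobs never share a current working directory.
  set workdir = /tmp/gpmap.$$
  mkdir -p $workdir
  cd $workdir

  # ... run the usual gpmap_gf sequence from here ...

  # Remove the scratch directory once the run is finished.
  cd /
  rm -rf $workdir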