Hi Daryl,

> Annoying me again. Previously, I bugged you about slow pipes not
> reporting what process it was:
>
> http://www.unidata.ucar.edu/support/help/MailArchives/ldm/msg04879.html
>
> Thanks for implementing this, hopefully others found it useful.
>
> Now, I am trying to figure out which of my buggy decoders is exiting
> badly. As my logs are filling with this:
>
> Feb 04 16:21:36 mesonet pqact[1938] NOTE: child 2155 exited with status 1
> Feb 04 16:26:16 mesonet pqact[1938] NOTE: child 8102 exited with status 1
> Feb 04 16:35:39 mesonet pqact[1938] NOTE: child 18758 exited with status 1
> Feb 04 16:36:58 mesonet pqact[1938] NOTE: child 20265 exited with status 1
>
> So I do the -USR2 to pqact, but the logs I get are not intuitive as to
> which product going to which processor is actually erroring out. The
> child PIDs are not included in the logs, unless I am missing something?
> For example:
>
> Feb 04 14:57:41 mesonet pqact[32073] INFO: 115 20080204145112.042 IDS|DDPLUS 119265941 SPCN46 CWAO 041446
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr -b 9 -m 72 -s /mesonet/TABLES/awos.stns -d logs/dcmetr_awos.log -a 0 /mesonet/data/gempak/awos/YYMMDD_awos.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr -b 9 -m 72 -s /mesonet/TABLES/mesonet4.stns -d logs/dcmetr_meso1.log -a 0 /mesonet/data/gempak/meso/YYMMDD_meso.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO: pipe: dcmetr -b 9 -m 72 -s /mesonet/TABLES/asos.stns -d logs/dcmetr_asos.log -a 0 /mesonet/data/gempak/asos/YYMMDD_asos.gem
> Feb 04 14:57:41 mesonet pqact[32073] NOTE: child 27014 exited with status 1
>
> Looking at the source (at least trying to), I see a case where child
> exiting with some status may not print out the process name. I tried to
> diagnose how this happens, but only confused myself.
>
> Any comments on this?
Because "pqact" printed no command line for that child, the process was started either by an EXEC entry in the "pqact" configuration file or by a PIPE entry whose pipe "pqact" had already closed. When "pqact" needs a file descriptor for a new process, it closes the pipe that has gone the longest without being written to. Closing a pipe removes the associated entry from an internal list, so the command line is lost.

Can you have your decoders write a "Starting up" message to the LDM log file? That would allow you to match up the PIDs.

> thanks!
>   daryl

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WDX-973084
Department: Support LDM
Priority: Normal
Status: On Hold
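
The "Starting up" suggestion above could be sketched roughly as follows, for a decoder that cannot easily be changed itself. This is only an illustration, not part of the LDM: the wrapper function, the exact message text, the use of the local0 syslog facility, and the log-line layout matched by START_RE are all assumptions. The wrapper logs its PID and then replaces itself with the decoder via exec, so the PID that "pqact" later reports is the one that was logged. A second helper pairs "child N exited" notes with those start-up messages.

```python
import os
import re

def startup_message(argv):
    """Build the (hypothetical) start-up text for a decoder command line."""
    return "Starting up: " + " ".join(argv)

def log_startup_and_exec(argv):
    """Log a start-up message tagged with this PID, then become the decoder.

    exec keeps the PID, so the logged PID is the one pqact reports on exit.
    Assumption: LDM messages reach syslog through the local0 facility.
    """
    import syslog  # Unix only, imported here so the rest works anywhere
    syslog.openlog(ident=os.path.basename(argv[0]),
                   logoption=syslog.LOG_PID,
                   facility=syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_NOTICE, startup_message(argv))
    os.execvp(argv[0], argv)  # does not return on success

# Assumed layout of a start-up line: "<host> prog[pid]: Starting up: <cmd>",
# optionally with a level such as "NOTE:" after the PID.
START_RE = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:?\s+(?:[A-Z]+:\s+)?"
    r"Starting up(?::\s*(?P<cmd>.*))?$")
# Layout of the exit notes quoted in the message above.
EXIT_RE = re.compile(
    r"pqact\[\d+\]\s+NOTE: child (?P<pid>\d+) exited with status (?P<status>\d+)")

def match_exits(lines):
    """Pair each exit note with the latest start-up message for that PID.

    Returns (pid, status, program, command) tuples; program and command
    are None when no start-up message was seen for the PID.
    """
    started = {}
    results = []
    for line in lines:
        m = START_RE.search(line)
        if m:
            started[int(m.group("pid"))] = (m.group("prog"), m.group("cmd") or "")
            continue
        m = EXIT_RE.search(line)
        if m:
            pid = int(m.group("pid"))
            prog, cmd = started.pop(pid, (None, None))
            results.append((pid, int(m.group("status")), prog, cmd))
    return results
```

To use the wrapper, a small script would call log_startup_and_exec(sys.argv[1:]) and the configuration-file entry would run that script with the decoder's command line as its arguments. Feeding the log file's lines to match_exits() then shows which decoder each failing child PID belonged to.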