
[LDM #WDX-973084]: Exit status 1 of what? :)



Hi Daryl,

> Annoying me again.  Previously, I bugged you about slow pipes not
> reporting what process it was:
> 
> http://www.unidata.ucar.edu/support/help/MailArchives/ldm/msg04879.html
> 
> Thanks for implementing this, hopefully others found it useful.
> 
> Now, I am trying to figure out which of my buggy decoders is exiting
> badly, as my logs are filling with this:
> 
> Feb 04 16:21:36 mesonet pqact[1938] NOTE: child 2155 exited with status 1
> Feb 04 16:26:16 mesonet pqact[1938] NOTE: child 8102 exited with status 1
> Feb 04 16:35:39 mesonet pqact[1938] NOTE: child 18758 exited with status 1
> Feb 04 16:36:58 mesonet pqact[1938] NOTE: child 20265 exited with status 1
> 
> So I send -USR2 to pqact, but the logs I get are not intuitive as to
> which product going to which processor is actually erroring out.  The
> child PIDs are not included in the logs, unless I am missing something?
> For example:
> 
> Feb 04 14:57:41 mesonet pqact[32073] INFO:      115 20080204145112.042 IDS|DDPLUS 119265941  SPCN46 CWAO 041446
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr -b  9 -m 72 -s /mesonet/TABLES/awos.stns  -d logs/dcmetr_awos.log -a 0 /mesonet/data/gempak/awos/YYMMDD_awos.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr -b 9 -m 72 -s /mesonet/TABLES/mesonet4.stns      -d logs/dcmetr_meso1.log -a 0        /mesonet/data/gempak/meso/YYMMDD_meso.gem
> Feb 04 14:57:41 mesonet pqact[32073] INFO:                pipe: dcmetr -b 9 -m 72 -s /mesonet/TABLES/asos.stns  -d logs/dcmetr_asos.log -a 0 /mesonet/data/gempak/asos/YYMMDD_asos.gem
> Feb 04 14:57:41 mesonet pqact[32073] NOTE: child 27014 exited with status 1
> 
> 
> Looking at the source (at least trying to), I see a case where a child
> exiting with some status may not have its process name printed.  I tried
> to diagnose how this happens, but only confused myself.
> 
> Any comments on this?

Because "pqact" printed no command line, the child process was started
either by an EXEC entry in the "pqact" configuration file or by a PIPE
entry whose pipe "pqact" had already closed.  "pqact" closes a pipe
when it needs a file descriptor for a new process, choosing the pipe
that has gone the longest without being written to.  Closing a pipe
removes the associated entry from an internal list, and the command
line is lost along with it.
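
As a rough illustration of the PIPE case, here is a toy Python model.
It is not the "pqact" source: the class name, the method names, the
"max_open" limit, and the format of the message with a command line
appended are all made up for the example.  It only shows how evicting
the least recently written pipe can leave an exit message with nothing
but a PID.

```python
from collections import OrderedDict


class PipeTable:
    """Toy model of a list of open pipes (hypothetical, not pqact code).

    Maps child PID -> command line, ordered from least recently
    written to most recently written.
    """

    def __init__(self, max_open):
        self.max_open = max_open          # assumed file-descriptor budget
        self.open_pipes = OrderedDict()

    def write(self, pid, command):
        """Record a write to the pipe feeding child `pid`."""
        if pid in self.open_pipes:
            # Existing pipe: mark it as most recently used.
            self.open_pipes.move_to_end(pid)
            return
        if len(self.open_pipes) >= self.max_open:
            # No descriptor left: close the pipe idle the longest.
            # Its command line is forgotten at this point.
            self.open_pipes.popitem(last=False)
        self.open_pipes[pid] = command

    def describe_exit(self, pid, status):
        """Build the exit message; the command is known only if the
        pipe entry still exists."""
        message = f"child {pid} exited with status {status}"
        command = self.open_pipes.pop(pid, None)
        if command is None:
            return message
        return f"{message}: {command}"
```

With a budget of two pipes, writing to children 100, 200, 100 and then
300 evicts the entry for 200, so a later exit of child 200 can only be
reported by PID.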

Can you have your decoders write a "Starting up" message to the LDM
log file?  This would allow you to match up the PIDs.
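
A minimal Python sketch of that idea follows, under some assumptions:
the decoder (or a wrapper around it) can call syslog, the LDM log is
fed from the local0 facility, and the resulting entry looks like
"dcmetr[2155]: Starting up: ...".  The function names and both regular
expressions are hypothetical; adjust them to whatever your log lines
actually look like.

```python
import os
import re
import syslog


def startup_message(argv):
    """Build the one-line message a decoder logs when it starts."""
    return "Starting up: " + " ".join(argv)


def log_startup(argv):
    """Write the startup message to syslog, tagged with this PID."""
    # LOG_PID makes syslog include the PID of the calling process.
    # LOG_LOCAL0 is an assumption; use the facility your LDM logs to.
    syslog.openlog(os.path.basename(argv[0]), syslog.LOG_PID,
                   syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_NOTICE, startup_message(argv))


# Assumed line shapes: "name[pid]: Starting up ..." for decoders and
# "child <pid> exited with status <n>" for pqact.
STARTED = re.compile(r"(\S+)\[(\d+)\]:?\s.*Starting up")
EXITED = re.compile(r"child (\d+) exited with status (\d+)")


def match_exits(lines):
    """Pair each 'child N exited' line with the decoder that logged
    a startup message under the same PID, if any."""
    names = {}      # PID -> program name seen in a startup message
    results = []    # (pid, status, name) for every exit line
    for line in lines:
        started = STARTED.search(line)
        if started:
            names[int(started.group(2))] = started.group(1)
            continue
        exited = EXITED.search(line)
        if exited:
            pid, status = int(exited.group(1)), int(exited.group(2))
            results.append((pid, status, names.get(pid, "unknown")))
    return results
```

Calling log_startup(sys.argv) at the top of a decoder and then running
match_exits over the log lines gives a list of (PID, status, name)
tuples; any exit whose PID never logged a startup message is reported
as "unknown".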

> thanks!
> daryl

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WDX-973084
Department: Support LDM
Priority: Normal
Status: On Hold