problem with ldmadmin queuecheck

To: ldm-users@xxxxxxxxxxxxxxxx
Subject: problem with ldmadmin queuecheck
From: Jim Cowie <cowie@xxxxxxxx>
Date: Mon, 23 Sep 2002 10:34:05 -0600


I'm trying to come up with a foolproof way of knowing whether the
LDM product queue is intact after a system reboot. The ldmadmin
Perl script includes a function called "queuecheck", which runs
pqcat to read through the queue and returns an exit status of 1 if it
detects a problem. This is great, but it doesn't always seem to work.
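
For reference, a boot-time script could test that exit status roughly as follows. This is only a minimal sketch, assuming "ldmadmin" is on the PATH of the user running it; the helper names check_exit_status and demo_command are hypothetical and not part of LDM. An exit status of 0 only means the check itself found nothing wrong.

```python
import subprocess
import sys


def check_exit_status(command):
    """Run a command and return True if it exits with status 0.

    For the real check, pass ["ldmadmin", "queuecheck"]; per the
    description above, a status of 1 means a problem was detected.
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def demo_command(status):
    """Hypothetical stand-in for ldmadmin on machines without LDM.

    Builds a command that simply exits with the given status.
    """
    return [sys.executable, "-c", f"raise SystemExit({status})"]
```

On an LDM host the call would be check_exit_status(["ldmadmin", "queuecheck"]); demo_command exists only so the sketch can be tried without LDM installed.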

For example, I currently have a corrupted queue because of a sudden
reboot. My ldm log says:

Sep 23 16:17:16 vis rpc.ldmd[17335]: Starting Up (built: Jun 21 2002 10:16:41)
Sep 23 16:17:16 vis ofour[17337]: run_requester: Starting Up: ofour.rap.ucar.edu
Sep 23 16:17:16 vis pqact[17336]: Starting Up
Sep 23 16:17:16 vis front[17338]: run_requester: Starting Up: front.rap.ucar.edu
Sep 23 16:17:18 vis localhost[17346]: Connection from localhost
Sep 23 16:17:18 vis localhost[17346]: Connection reset by peer
Sep 23 16:17:18 vis localhost[17346]: Exiting
Sep 23 16:17:21 vis front[17338]: run_requester: 20020923151716.313 TS_ENDT {{WMO,  ".*"}}
Sep 23 16:17:21 vis ofour[17337]: run_requester: 20020923151716.239 TS_ENDT {{WMO,  ".*"}}
Sep 23 16:17:21 vis ofour[17337]: FEEDME(ofour.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: FEEDME(front.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:23 vis ofour[17337]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17337 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: Interrupt
Sep 23 16:17:29 vis rpc.ldmd[17335]: Exiting
Sep 23 16:17:29 vis pqact[17336]: Interrupt
Sep 23 16:17:29 vis pqact[17336]: Exiting
Sep 23 16:17:29 vis rpc.ldmd[17335]: Terminating process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17338 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group

Clearly, there is a problem with the queue. But when I run ldmadmin
queuecheck, I get a 0 exit status indicating that the queue is OK.
I could use brute force and always create a new queue, but I'd like
to be able to determine if it is corrupted or not.
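
One possible stopgap, sketched below, is to scan the LDM log for the symptoms shown above after a trial start. This is a heuristic under stated assumptions, not something ldmadmin does: the patterns are taken only from the log excerpt in this message, the function name find_queue_errors is hypothetical, and each log entry is assumed to sit on a single line in the file. Signal 6 is SIGABRT, which is what a failed assertion raises, but it is not specific to the queue.

```python
import re

# Patterns taken from the log excerpt above; both are heuristics.
SUSPECT_PATTERNS = (
    # An assertion failure reported from the product-queue source file.
    re.compile(r'failed: file "pq\.c"'),
    # A child process killed by signal 6 (SIGABRT).
    re.compile(r"child \d+ terminated by signal 6\b"),
)


def find_queue_errors(log_lines):
    """Return the log lines that match any suspect pattern."""
    return [
        line
        for line in log_lines
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS)
    ]
```

A wrapper could treat a non-empty result as a reason to rebuild the queue even when queuecheck reports 0.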

Does anybody know if this is *supposed* to work? I am running LDM 5.1.4
on Solaris 7 (x86-intel).



-- 
Jim Cowie
NCAR/RAP
cowie@xxxxxxxx
303-497-2831

Follow-Ups:
- Re: problem with ldmadmin queuecheck
  - From: Anne Wilson
