problem with ldmadmin queuecheck


I'm trying to come up with a foolproof way of knowing whether the
LDM product queue is intact after a system reboots. The ldmadmin
Perl script includes a function called "queuecheck", which runs
pqcat to read through the queue and returns an exit status of 1 if it
detects a problem. This is great, but it doesn't always seem to work.
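
To make the intended check concrete, here is a minimal sketch in Python of how a boot script might act on that exit status. The function name queue_looks_ok is hypothetical, and the only assumption about ldmadmin is the behavior described above (exit status 0 for an intact queue, nonzero otherwise).

```python
import subprocess


def queue_looks_ok(cmd=("ldmadmin", "queuecheck")):
    """Return True only if the check command runs and exits with status 0.

    Hypothetical helper: it assumes nothing about ldmadmin beyond the
    exit-status convention described in this message.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The command could not be started at all (for example, it is not
        # on PATH), so the queue cannot be vouched for.
        return False
    return result.returncode == 0
```

A boot script could call queue_looks_ok() before starting the LDM and fall back to rebuilding the queue when it returns False. It is only as reliable as the underlying check, which is the problem described below.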

For example, I currently have a corrupted queue because of a sudden
reboot. My ldm log says:

Sep 23 16:17:16 vis rpc.ldmd[17335]: Starting Up (built: Jun 21 2002 10:16:41)
Sep 23 16:17:16 vis ofour[17337]: run_requester: Starting Up: ofour.rap.ucar.edu
Sep 23 16:17:16 vis pqact[17336]: Starting Up
Sep 23 16:17:16 vis front[17338]: run_requester: Starting Up: front.rap.ucar.edu
Sep 23 16:17:18 vis localhost[17346]: Connection from localhost
Sep 23 16:17:18 vis localhost[17346]: Connection reset by peer
Sep 23 16:17:18 vis localhost[17346]: Exiting
Sep 23 16:17:21 vis front[17338]: run_requester: 20020923151716.313 TS_ENDT {{WMO,  ".*"}}
Sep 23 16:17:21 vis ofour[17337]: run_requester: 20020923151716.239 TS_ENDT {{WMO,  ".*"}}
Sep 23 16:17:21 vis ofour[17337]: FEEDME(ofour.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: FEEDME(front.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:23 vis ofour[17337]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17337 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: Interrupt
Sep 23 16:17:29 vis rpc.ldmd[17335]: Exiting
Sep 23 16:17:29 vis pqact[17336]: Interrupt
Sep 23 16:17:29 vis pqact[17336]: Exiting
Sep 23 16:17:29 vis rpc.ldmd[17335]: Terminating process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17338 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group

Clearly, there is a problem with the queue. But when I run ldmadmin
queuecheck, I get a 0 exit status indicating that the queue is OK.
I could use brute force and always create a new queue, but I'd like
to be able to determine if it is corrupted or not.
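
As a stopgap, the symptoms visible in the log above could be detected after the fact. The following is a rough sketch, not an LDM feature: the function name and the two patterns are assumptions taken only from the log excerpt in this message, and other kinds of corruption may not leave these traces.

```python
import re

# Patterns taken from the log excerpt in this message; they are assumptions
# about what a corrupt queue looks like, not an exhaustive list.
ASSERTION_RE = re.compile(r'failed: file "pq\.c"')
ABORT_RE = re.compile(r"terminated by signal 6\b")


def queue_corruption_hints(log_lines):
    """Return the log lines that suggest the product queue is corrupt."""
    hints = []
    for line in log_lines:
        if ASSERTION_RE.search(line) or ABORT_RE.search(line):
            hints.append(line.rstrip("\n"))
    return hints
```

This only helps once the LDM has already started and aborted, so it complements a pre-start check rather than replacing one.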

Does anybody know if this is *supposed* to work? I am running LDM 5.1.4
on Solaris 7 (x86).



-- 
Jim Cowie
NCAR/RAP
cowie@xxxxxxxx
303-497-2831
