I'm trying to come up with a fool-proof way of knowing whether the LDM product queue is intact after a system reboots. The ldmadmin perl script includes a function called "queuecheck" which runs pqcat to read through the queue and returns a 1 status if it detects a problem. This is great, but doesn't always seem to work. For example, I currently have a corrupted queue because of a sudden reboot. My ldm log says:

Sep 23 16:17:16 vis rpc.ldmd[17335]: Starting Up (built: Jun 21 2002 10:16:41)
Sep 23 16:17:16 vis ofour[17337]: run_requester: Starting Up: ofour.rap.ucar.edu
Sep 23 16:17:16 vis pqact[17336]: Starting Up
Sep 23 16:17:16 vis front[17338]: run_requester: Starting Up: front.rap.ucar.edu
Sep 23 16:17:18 vis localhost[17346]: Connection from localhost
Sep 23 16:17:18 vis localhost[17346]: Connection reset by peer
Sep 23 16:17:18 vis localhost[17346]: Exiting
Sep 23 16:17:21 vis front[17338]: run_requester: 20020923151716.313 TS_ENDT {{WMO, ".*"}}
Sep 23 16:17:21 vis ofour[17337]: run_requester: 20020923151716.239 TS_ENDT {{WMO, ".*"}}
Sep 23 16:17:21 vis ofour[17337]: FEEDME(ofour.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: FEEDME(front.rap.ucar.edu): OK
Sep 23 16:17:22 vis front[17338]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:23 vis ofour[17337]: assertion "rl->nelems + rl->nfree + rl->nempty == rl->nalloc" failed: file "pq.c", line 1993
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17337 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: Interrupt
Sep 23 16:17:29 vis rpc.ldmd[17335]: Exiting
Sep 23 16:17:29 vis pqact[17336]: Interrupt
Sep 23 16:17:29 vis pqact[17336]: Exiting
Sep 23 16:17:29 vis rpc.ldmd[17335]: Terminating process group
Sep 23 16:17:29 vis rpc.ldmd[17335]: child 17338 terminated by signal 6
Sep 23 16:17:29 vis rpc.ldmd[17335]: Killing (SIGINT) process group

Clearly, there is a problem with the queue. But when I run ldmadmin queuecheck, I get a 0 exit status indicating that the queue is OK. I could use brute force and always create a new queue, but I'd like to be able to determine if it is corrupted or not. Does anybody know if this is *supposed* to work? I am running ldm 5.1.4 on Solaris 7 (x86-intel).

--
Jim Cowie
NCAR/RAP
cowie@xxxxxxxx
303-497-2831
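
Since queuecheck can return 0 even when the log shows pq.c assertion failures, one possible complement is to scan the LDM log text for the same symptoms seen in this message. The Python sketch below is only a heuristic under stated assumptions: the two patterns are copied from the log lines quoted in this post (signal 6 is SIGABRT, which is what a failed assertion raises), the function names are hypothetical, and nothing here is part of LDM or ldmadmin. Other LDM versions may word these messages differently.

```python
import re

# Patterns taken from the log excerpt in this message. They are a heuristic,
# not an official LDM interface; other versions may log differently.
CORRUPTION_PATTERNS = [
    re.compile(r'assertion ".*" failed: file "pq\.c"'),
    re.compile(r"child \d+ terminated by signal 6\b"),
]


def suspicious_lines(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in CORRUPTION_PATTERNS)
    ]


def queue_looks_corrupt(log_text):
    """True if the log text contains at least one suspicious line."""
    return bool(suspicious_lines(log_text))
```

A startup script might treat the queue as suspect if either the queuecheck exit status is nonzero or queue_looks_corrupt() returns True for the log lines written since the last LDM start. A clean log does not prove the queue is intact, so this narrows the gap rather than closing it.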