[LDM #DCN-100393]: Writer-Counter Error



Robert,

> I have encountered 5 or so instances in the past several years
> where I have attempted to manually restart LDM and received the "The
> writer-counter of the product-queue isn't zero..." message, which left
> LDM in a stopped state.  I always resolved the situation by rebuilding
> the queue.  In any case, I am somewhat hesitant to restart LDM during
> times when I am "pqinsert-ing" large files into the queue (for instance
> GRIB files during the model cycles) as I feel that would leave the queue
> most vulnerable.  That said, I realized recently that the 'ldmadmin check'
> (which I run each hour) will induce an automatic restart if it needs
> to reconcile the queue (in my case I have a static 4G queue size and
> choose to decrease max latency).  Getting to my question... are there
> any safeguards built into the 'ldmadmin check' that might prevent the
> aforementioned error from occurring if it needs to restart the service?
> The last thing I would want is for the LDM service to stop during a
> self-induced restart.  If there is no guarantee the service will always
> restart, is it better to set reconciliation to "do nothing" and manually
> reconcile the queue's max latency?  Mind you I have never had such an
> auto-restart ever fail to restart, but I have had manual restarts result
> in the writer-counter error.

There are safeguards to ensure that the LDM product-queue doesn't get 
corrupted. For example, the product-queue library blocks most signals 
(including SIGTERM) while the queue is being accessed. That being said, there 
is no guarantee that the LDM code is bug-free.
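
To illustrate the general technique of blocking signals around a critical section, here is a minimal sketch in Python. It is not the LDM's actual code: the names `signals_blocked` and `demo` are hypothetical, and the "queue update" is only a stand-in list append. It assumes a POSIX system and that `demo()` is called from the main thread.

```python
import signal
import threading
from contextlib import contextmanager


@contextmanager
def signals_blocked(signums=(signal.SIGTERM, signal.SIGINT)):
    """Defer delivery of the given signals until the block exits (hypothetical helper)."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signums)
    try:
        yield
    finally:
        # Restore the previous mask; a signal that arrived meanwhile is delivered now.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo():
    """Send SIGTERM to ourselves in the middle of a pretend queue update."""
    events = []
    previous = signal.signal(signal.SIGTERM, lambda signum, frame: events.append("handler"))
    try:
        with signals_blocked():
            events.append("write-start")  # stand-in for the start of a queue update
            # The signal arrives mid-update but stays pending while blocked.
            signal.pthread_kill(threading.get_ident(), signal.SIGTERM)
            events.append("write-end")  # the update completes uninterrupted
    finally:
        signal.signal(signal.SIGTERM, previous)  # put the original handler back
    return events


if __name__ == "__main__":
    print(demo())
```

The handler only runs once the mask is restored, so the pretend update is never cut off halfway. Blocking signals this way does not help against SIGKILL or a power loss, which cannot be blocked.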

I have no qualms about having an active reconciliation mode if the 
product-queue is close to its equilibrium size. The only problems I've seen 
occur when the queue is far too small for the reconciliation algorithm to make 
a good guess.

If the LDM doesn't restart after a reconciliation, then you likely have bigger 
problems (disk partition full, for example).

> Best Regards,
> Bob

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: DCN-100393
Department: Support LDM
Priority: Normal
Status: Closed