
Re: 20010531: ldm 5.1.3 with RH 7.1 thrashing



Hi Art,

"Arthur A. Person" wrote:
> 
> On Thu, 31 May 2001, anne wrote:
> 
> 
> Actually, it's just a Pentium III 833 MHz with embedded SCSI and NIC.  It's
> fairly vanilla as PCs go...  I just loaded RedHat on it with the default
> full install and started running it (security hardened, of course).  I
> think any similar Pentium III PC you might have should behave similarly
> unless SCSI was an issue.  So, I would be running 32-bit, right?  I'm not
> familiar with 32- vs. 64-bit installs.
> 

Yes, you would be running a 32-bit version.  But this makes things
easier.

> > We can build 64-bit versions of the LDM on our SPARCv9 or IRIX64
> > machines.  If I hear from you that you're running a 64-bit version, I
> > will do this and request WSI data and see what happens.
> 
> It does seem that when I kill off the wsi rpc's the system becomes
> more responsive, but it still thrashes.  I just stopped and restarted the
> ldm without rebooting and it actually worked, and the data seem to be
> slowly catching up, except NEXRAD seems to be lagging still.  I also made
> a new queue of 600 MB to hopefully prevent the problem overnight.
> 

When you say "it still thrashes", do you mean that products aren't being
received in a timely manner?  Right now products on ldm.meteo appear to
be arriving pretty quickly.  Also, 'top' is showing a low load average,
the machine appears to be responsive, and there's a reasonable number of
rpc.ldmd processes...  Is this all with your 600 MB queue?
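
For keeping an eye on the same two indicators over time (load average and
the number of rpc.ldmd processes), here is a minimal Python sketch.  The
helper names count_named and snapshot are made up for illustration and are
not part of the LDM; it assumes a POSIX 'ps' that accepts "-e -o comm=".

```python
import os
import subprocess


def count_named(ps_lines, name="rpc.ldmd"):
    """Count entries of `ps -e -o comm=` output whose command basename is `name`."""
    # Some systems print a full path in the comm column, so compare basenames.
    return sum(1 for line in ps_lines if os.path.basename(line.strip()) == name)


def snapshot(name="rpc.ldmd"):
    """Return the current process count for `name` and the system load averages."""
    # Assumption: `ps -e -o comm=` lists one command name per line, no header.
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return {
        "count": count_named(out.splitlines(), name),
        "load": os.getloadavg(),  # 1-, 5-, and 15-minute load averages
    }
```

Calling snapshot() every minute or so and logging the result would show
whether the rpc.ldmd count keeps growing while the load climbs, which is
the pattern that would suggest leftover processes.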



> I guess I'm not sure yet where to point the finger at this problem...
> maybe it's not the wsi connection, maybe the wsi connection is a symptom
> of slowness and it times out and reconnects.
> 

But it shouldn't leave processes lying around.  I don't yet know where
to point the finger either.

At least a few sites are running 7.1 without any apparent problems.  I
know at least one site to ask - Gilbert's running 7.1 and I think he's
also getting data from WSI, but I'm not positive...

> I think maybe I'd better run this with a small (600 MB) queue for now and
> see if the problem recurs since I leave for vacation next Friday and I
> don't want to leave an unstable system behind.  Will you do any testing on
> this or will we wait until I return mid-month?
> 
>                                   Art.

Let me know how it goes with your 600 MB queue - I'll be really
interested to know whether it makes a difference.

I will try to do some testing.  I will ask WSI to feed our RH 7.1 PC for
a while for debugging purposes.  I don't know if they'll agree or
not...  For that matter, I don't know how quickly they will even
respond.

I will also be away for a week starting next Wednesday.  I hope you can
find a way to get by - this looks like it may take a while.   If I can't
duplicate the problem...  it could take a LONG while.

I'll keep you posted.

Anne 
-- 
***************************************************
Anne Wilson                     UCAR Unidata Program            
address@hidden                 P.O. Box 3000
                                  Boulder, CO  80307
----------------------------------------------------
Unidata WWW server       http://www.unidata.ucar.edu/
****************************************************