On 06/14/2011 03:29 PM, Gilbert Sebenste wrote:
> On Tue, 14 Jun 2011, Jeff Lake wrote:
>
>> I have been using the plotMetrics of LDM for a few months now ...
>> http://ldm01.michiganwxsystem.net/vnstat/index.php
>> I'm a bit lost as to what it's telling me..
>> Is my machine healthy??
>> Is there any place I can dummy these up?
>
> "Dummy these up"? Hmmmm. Not sure what you mean by that..."dummy up"
> means to "shut up", and I don't think you mean that. Anyway...
>
Agreed...not sure what you mean by dummy...

> I was waiting for an explanation as well from UNIDATA, but right now Steve
> Emmerson is real busy (so busy he didn't even send out the announcement
> from Friday that LDM 6.9.8 is available, and fixes some significant bugs
> for Solaris/Redhat RHEL/CentOS users...but the announcement is available
> on UNIDATA's web site at http://www.unidata.ucar.edu/software/ldm .
> Go get you some!).
>
Good to know about this...I'll have to take a look to see what's new.

> Anyhoo, I'd like to know more about the various parameters as well. Load
> average is obvious, and as long as the incoming data amount stays
> roughly the same every day, things look good...but beyond that...I don't
> know.
>
I'll just go chart by chart.

Number of Bytes Use in Queue vs. Time
Umm...the number of bytes used in the queue. This one is kind of deceiving: it should always be very close to full. I can tell you have a 6G queue, and it's usually full, which means you have a stable queue. In theory, if you're not using your full queue you could reduce its size, and I think that's the idea behind this chart. The inherent problem is that LDM only clears space when it's needed...at least that's how I understand it. So this has good intentions to help you know if you can get away with a smaller queue, but I don't think it provides the greatest analysis...I like the Age vs Time much better...

Space vs Time
Memory consumption. Pretty obvious. I won't elaborate.

Number of Products in Queue vs. Time
OK, this is my #2. It tells you how many products you currently hold in the queue. If this graph levels off (flat lines), that means you've stopped receiving products. It should reset to zero when you remake the queue, but NOT necessarily when you restart. Only after a delqueue command (because you've deleted the queue and thus all items in it!).

CPU Context Switch Rate vs. Time
Context switching means you're doing a lot of things. To ME, this looks like a pretty busy box based on this. More of a machine-relevant item than LDM/data related.

Age of Oldest Product in Queue vs. Time
This is my #1 most important graph. This is your recovery window. It is the age in hours of your oldest product. So let's say you have a crash downstream and lose a host. To get it back up without losing any data, you have to know how long data is held in the LDM queue upstream. The critical points are the low ones, because at those times you have the shortest window. So based on your chart I would say you have 3.5 hours to get the host back online without losing any information. To utilize this, you would tune the downstream max_latency and offset parameters to look back far enough and ensure you request everything from upstream.

Also, as with some of our development tasks, we make small queues and need to make sure the processing finishes before the items are deleted from the queue. If a process takes a long time, as can be the case during development, you might miss products along the way. This is mainly because we've found LDM doesn't exactly multitask; it seems to take a product and traverse the pqact entries one after another. So if one entry takes an hour to process and the product is deleted from the queue in the meantime, the next pqact entry will never execute for that product, because it's already gone. This is observation and experimentation; maybe someone else knows better, but it's what I've seen.

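As a rough illustration of the recovery-window reasoning above, here is a minimal Python sketch. The sample ages, the function names, and the 10% safety margin are all made up for illustration; none of them come from LDM itself. It only takes the lowest point on the age graph and converts it to a number of seconds one might keep in mind when choosing the downstream max_latency and offset values.

```python
def recovery_window_hours(oldest_age_samples):
    """Return the smallest observed age (in hours) of the oldest product.

    The low points on the age graph are the critical ones, so the
    minimum over the samples is the worst-case recovery window.
    """
    if not oldest_age_samples:
        raise ValueError("need at least one age sample")
    return min(oldest_age_samples)


def lookback_seconds(window_hours, safety_fraction=0.9):
    """Convert the window to seconds, shrunk by a safety margin.

    safety_fraction is a hypothetical knob, not an LDM setting.
    """
    return round(window_hours * 3600 * safety_fraction)


# Hypothetical readings of "age of oldest product" taken off the graph, in hours.
samples = [5.2, 4.1, 3.5, 4.8, 6.0]

window = recovery_window_hours(samples)
print("worst-case recovery window (hours):", window)
print("look-back to consider (seconds):", lookback_seconds(window))
```

With these made-up samples the worst-case window is the 3.5 hour low point, the same figure read off the chart above.
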
The age graph has been priceless for tweaking different use cases. If you stop receiving products, the chart should do a 45 degree, Bernie Madoff style growth line....it will also do that whenever you remake the queue, until you max out the size and start replacing items.

LDM Connections vs. Time
The number of LDM connections the host has over time. My only complaint with this is that under perfect circumstances the lines sometimes end up matching the top and bottom exactly, so it looks empty.

CPU-Modes vs Time & Load-Average vs Time
Typical CPU information. Looks pretty busy but healthy.

* Disclaimer - These are my interpretations. I was in the LDM training when Steve introduced these for the first time. If I got something wrong I hope Steve or someone else will correct me.

--
Eric M. Hudish
174 Faith Circle
Boalsburg, PA 16827
Cell: +1.724.977.3314
Goog: +1.814.689.9148
"Duh! To make room for Tuna!"
<http://www.google.com/profiles/eric.hudish>
Search <http://keyserver.pgp.com>