A couple of issues I ran into this week with vCOPs are worth mentioning. One has to do with the infamous grey question mark, which replaces a badge when data collection is broken. The other has to do with pruning data collection in an environment where you are licensed for fewer VMs than a single vCenter instance manages. Yes, a bit long-winded, but let me explain…
First off, the most noticeable tip-off that vCOPs is not collecting data is the grey question mark. In times of normalcy, each badge should be colored and display a number, indicating both visually and numerically that vCOPs is reading and making sense of the data. If it's not, there are a couple of things that could contribute to this. First, lack of disk space on the Analytics VM breaks ActiveMQ. VMware's disk sizing guidelines are conservative, but keep in mind you can add disk space to the Analytics VM at any time, albeit with a shutdown.
A bit of background: ActiveMQ is an open-source message broker that writes to a database called KahaDB, where it stores messages and forwards them on to other services. If a service can't accept messages, ActiveMQ writes them to this database. If the disk runs out of space, ActiveMQ is unable to write a complete file, potentially corrupting the database. To resolve this you can follow the steps in VMware KB 2013266. The process is pretty straightforward: you rename the kahadb directory and create a new one. However, the KB doesn't state which default account to use to do so. I mistakenly used root (as opposed to admin) to make the changes, which prevented the ActiveMQ service (which runs as admin) from writing to the kahadb. Simply changing the permissions on the database (chown -R admin:admin /data/activemq/kahadb/) resolved the issue.
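Putting those steps together, here is a minimal sketch of the recovery described in KB 2013266, wrapped in a function so the path can be overridden. The /data/activemq path and the admin account come from my environment above; verify both against your own appliance (and check the KB) before running anything.

```shell
# Sketch of the KB 2013266 kahadb recovery; adjust paths for your appliance.
reset_kahadb() {
  data_dir="${1:-/data/activemq}"               # default path from the article
  mv "$data_dir/kahadb" "$data_dir/kahadb.old"  # preserve the corrupt database
  mkdir "$data_dir/kahadb"                      # ActiveMQ repopulates it on restart
  # Hand ownership to admin (the ActiveMQ service account), NOT root.
  # Failure is ignored so the sketch also runs where no admin user exists:
  chown -R admin:admin "$data_dir/kahadb" 2>/dev/null || true
}
```

Stop the vCOPs services before touching the directory, and keep kahadb.old around until you've confirmed the badges come back.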
Secondly, unexpected shutdowns. As with any vApp, make sure to shut down or restart the VMs within the vApp at the vApp level. This will save you a lot of trouble down the line and keep that kahadb in working order.
Finally, data collection pruning is something I have heard about numerous times from customers. How do I report on only 100 VMs (because that is all I am licensed for) even though my vCenter instance sees 300? Very simple. vCOPs collects data from vCenter using the vCenter Enterprise adapter. Using the vCOPs admin UI, you establish this pull communication by registering vCOPs with your vCenter instance(s). Within this setup it asks for the IP address, the display name, and the registration user for the vCenter plug-in. An optional setting is the collection user: this is the account vCOPs uses to pull metrics, based on that account's visibility within vCenter. By default it will use the registration user. If you want to collect on only one datacenter or one cluster, for example, set up your collection user with visibility to just that datacenter or cluster within vCenter. Perhaps in future versions this will be a little more streamlined and doable within the vSphere UI, but for now it's the recommended way of pruning collections. Visit KB 1036195 for more information.
Understanding data collection in vCOPs tends to be a mind-bender. There are multiple associated terms used to illustrate the process. Terms like attributes, metrics, super metrics, thresholds, and KPIs are enough to make you question your own self-worth. But with a few helpful explanations, maybe we can break new ground together…
First off, we all know what resources are in a data center environment. These are items such as datastores, VMs, vSSs, vDSs, vCPUs, pCPUs, etc. They are items we consume or that have value. Data ingested by vCOPs has one or more associated attributes for each resource. By default, vCOPs groups multiple attributes for a resource into an attribute package. By doing so, you are telling vCOPs to collect only those attributes for that resource.
A metric is a point-in-time instance of an attribute. If a single metric doesn't clearly convey what you are looking for, then perhaps a super metric would. A super metric takes this metric, that metric, and that other metric, and applies a mathematical operation (a formula) to give you a broader, more useful metric. For instance, if you wanted to capture the average CPU utilization across 50 web servers, you would use a super metric to do so.
Thresholds mark the boundary between what's normal and what's abnormal for a single metric or super metric. When either boundary is crossed, an anomaly is logged. Both hard and dynamic thresholds exist, with the latter being the default. Dynamic thresholds are new to vCOPs and are based on incoming and historical data: vCOPs formulates a pattern of normalcy, thereby decreasing the number of alerts generated. Hard thresholds model themselves after vCenter thresholds: static in nature, changing only when you change them.
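A crude way to picture a dynamic threshold: derive "normal" from history and flag anything well outside it. This sketch flags a sample that falls more than two standard deviations from the mean of past readings; vCOPs' actual analytics are far more sophisticated, so treat this purely as an analogy.

```shell
# Toy dynamic threshold: anomaly if sample is outside mean +/- 2 std devs
# of the historical readings. First argument is the new sample, the rest
# are history.
is_anomaly() {
  sample=$1; shift
  printf '%s\n' "$@" | awk -v x="$sample" '
    { sum += $1; sumsq += $1 * $1 }
    END {
      mean = sum / NR
      sd   = sqrt(sumsq / NR - mean * mean)
      if (x < mean - 2 * sd || x > mean + 2 * sd) print "anomaly"
      else print "normal"
    }'
}
is_anomaly 95 40 42 41 39 43 38 42 41   # a 95% spike against ~40% history
```

The point of learning the boundary from data instead of hard-coding it is exactly what the text describes: readings that hover around their usual range stay quiet, so you see fewer noise alerts than with a static vCenter-style threshold.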
Finally, KPIs, or Key Performance Indicators, are attributes you deem important. By defining an upper and/or lower threshold violation on an attribute or super metric package, you are in essence establishing a KPI for those attributes. Once an attribute is defined as a KPI, new rules are set in motion. For instance, alerting is treated differently than for non-KPI attributes. But more importantly, a feature called predictive alerting is utilized. Once a violation occurs on a KPI for an application or tier, vCOPs examines the events prior to the violation and marks them as a fingerprint. If it finds similar events in the future, it can alert you before the threshold is broken. How sick is that!
Here is an evolving graphic to illustrate the relationships. It's not pretty, sorry… More to come…