How to Create Contribution KPIs in IT Service Intelligence (ITSI)

Transaction processing often involves transactions flowing from multiple sources. In that case, you can use Splunk ITSI to drill down to issues with specific transaction endpoints.

One of the most important KPIs to track is transaction error rates. These can indicate problems with application health, with remote dependencies, or with the transaction requests themselves. 

For our purposes, we’ll assume you are handling transactions from both customers and suppliers. In that case, these KPIs can be split by customer and supplier entities. We found that calculating error rates per entity and then averaging them at the aggregate level often led to either excessive alert noise or failure to detect problems. The reason is simple: different customers or suppliers have different transaction volumes, but a plain average weights every entity equally. The following example makes the problem clear. Here, one entity has a low transaction volume and a high error rate:

| Customer | Total Transactions | Total Errors | Error Rate |
| --- | --- | --- | --- |
| Customer 1 | 999 | 1 | 0.10% |
| Customer 2 | 1 | 1 | 100.00% |
| | 1000 | Average Error % | 50.05% |
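The arithmetic behind this table can be reproduced in a few lines. The sketch below is illustrative Python, not ITSI configuration; the `entities` dictionary and the `error_rate` helper are hypothetical names, and the counts are the ones from the table. It compares the average of per-entity error rates with the rate computed from the pooled totals.

```python
# Hypothetical per-entity counts, copied from the table above.
entities = {
    "Customer 1": {"transactions": 999, "errors": 1},
    "Customer 2": {"transactions": 1, "errors": 1},
}


def error_rate(errors, transactions):
    """Return the error rate as a percentage (0.0 if there were no transactions)."""
    return 100.0 * errors / transactions if transactions else 0.0


# Per-entity error rates, then a plain (unweighted) average of those rates.
per_entity = {
    name: error_rate(c["errors"], c["transactions"]) for name, c in entities.items()
}
average_of_rates = sum(per_entity.values()) / len(per_entity)

# Error rate computed from the pooled totals instead.
total_errors = sum(c["errors"] for c in entities.values())
total_transactions = sum(c["transactions"] for c in entities.values())
overall_rate = error_rate(total_errors, total_transactions)

print(f"Average of per-entity rates: {average_of_rates:.2f}%")
print(f"Rate from pooled totals:     {overall_rate:.2f}%")
```

A single failed transaction from a one-transaction customer pulls the averaged figure to roughly 50%, even though only 2 of 1000 transactions failed.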

How do we track the overall error rate when low-volume entities (customers or suppliers) are common? In that case, we can implement a KPI for error rate contribution per entity. We count errors at the entity level, then divide each entity’s error count by the total transaction count across all entities. Summing these contributions at the aggregate level gives the overall error rate.

In the base search configuration, we can then take the appropriate measure at the entity level and sum it at the aggregate level. Using the same entities and numbers from the example above, you get the following results:

| Customer | Total Transactions | Total Errors | Error Rate Contribution |
| --- | --- | --- | --- |
| Customer 1 | 999 | 1 | 0.10% |
| Customer 2 | 1 | 1 | 0.10% |
| | 1000 | Overall Error % | 0.20% |
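The contribution figures in this table follow from dividing each entity's error count by the transaction total across all entities. The following is a minimal Python sketch of that calculation, not the ITSI base search itself; the function name `error_contributions` and the `counts` structure are assumptions made for illustration.

```python
def error_contributions(counts):
    """Return each entity's errors as a percentage of ALL transactions.

    `counts` maps an entity name to a dict with "transactions" and "errors".
    """
    total_transactions = sum(c["transactions"] for c in counts.values())
    if total_transactions == 0:
        # No traffic at all: every contribution is zero.
        return {name: 0.0 for name in counts}
    return {
        name: 100.0 * c["errors"] / total_transactions
        for name, c in counts.items()
    }


# Hypothetical counts, copied from the table above.
counts = {
    "Customer 1": {"transactions": 999, "errors": 1},
    "Customer 2": {"transactions": 1, "errors": 1},
}

contributions = error_contributions(counts)
# The aggregate value is the sum of the entity-level contributions.
overall_error_pct = sum(contributions.values())

for name, pct in contributions.items():
    print(f"{name}: {pct:.2f}%")
print(f"Overall Error %: {overall_error_pct:.2f}%")
```

Because every entity shares the same denominator, a low-volume entity can no longer dominate the aggregate, while a real spike in errors from any entity still raises both its own contribution and the overall figure.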

The same basic approach can be adapted to several common ITSI use cases. For example, you could take a distinct count of sessions per web server, calculate each server’s percentage of the total, and use this to detect load-balancing problems. It is especially useful when you need entity-level anomaly detection over many entities, such as Docker containers, and your entity count exceeds the limits of ITSI’s out-of-the-box Entity Cohesion anomaly detection.
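The web server example can be sketched the same way. This is illustrative Python under stated assumptions, not an ITSI feature: the server names, session IDs, the `tolerance_pct` threshold, and both function names are hypothetical. It also assumes each session is pinned to one server, so that the per-server distinct counts can be summed into a meaningful total.

```python
from collections import defaultdict


def session_share(events):
    """Map each server to its percentage of the summed distinct session counts.

    `events` is an iterable of (server, session_id) pairs.
    """
    sessions = defaultdict(set)
    for server, session_id in events:
        sessions[server].add(session_id)  # the set keeps the count distinct
    total = sum(len(ids) for ids in sessions.values())
    if total == 0:
        return {}
    return {server: 100.0 * len(ids) / total for server, ids in sessions.items()}


def skewed_servers(shares, tolerance_pct=15.0):
    """List servers whose share differs from an even split by more than the tolerance."""
    if not shares:
        return []
    expected = 100.0 / len(shares)
    return sorted(s for s, pct in shares.items() if abs(pct - expected) > tolerance_pct)


# Hypothetical events: web-01 receives far more distinct sessions than its peers.
events = [
    ("web-01", "s1"), ("web-01", "s2"), ("web-01", "s2"), ("web-01", "s3"),
    ("web-01", "s4"), ("web-01", "s5"), ("web-01", "s6"),
    ("web-02", "s7"), ("web-02", "s8"),
    ("web-03", "s9"), ("web-03", "s10"),
]

shares = session_share(events)
print(shares)
print(skewed_servers(shares))
```

Comparing each share against an even split is only one possible check. Weighted load balancing would need per-server expected shares instead of `100 / number_of_servers`.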

Chris Selvig