Benchmarking vs. Baselining

ITSM Dashboards
11 Dec, 2019

This is a pretty big nut I’m cracking today. There are entire books dedicated to the topic.  So, as always, I’d like to start with some simple terminology that keeps us on the same page without requiring a Ph.D. in data science.  If you disagree with the terminology, let’s talk.  I always appreciate discussions in the thread below.  Of course, you’re just falling into a marketing trap.

Now, on to the terminology:

  • Baseline – Take a measure or Key Performance Indicator and compare it internally, against your own past performance.
  • Benchmark – Take a measure or Key Performance Indicator and compare it externally.

Ok, so what do I think in general about benchmarking and baselining?  Well, baselining is always necessary for performance management and benchmarking is sometimes helpful, but only when you know the limitations.

So, for you inductive thinkers out there (like me), I have an example that I think is pretty helpful in the world of IT Management:

  • Good Measure (and KPI, more on that later) for benchmarking in the world of IT management — % of Incidents Resolved by Tier 1
  • Poor Measure (and KPI) for benchmarking  — % of Incidents Meeting Service Level Agreement

Why is % of Incidents Meeting SLAs Poor?  And what makes certain measures poor for benchmarking in general?

I’d like to use this to introduce the top 3 issues, in my opinion, associated with benchmarking:

  • Variability
  • Aggregation
  • Data Availability

So, the answer to the previous question highlights issue #1 with benchmarking: variability across organizations. Service Level Agreements are generally agreed upon based on the needs of the business users balanced against the cost of delivering service, so SLAs should be, and usually are, different from one organization to the next. As an aside, we’ve seen that most (but not all) large organizations have priority-based SLAs, each with its own response time targets, resolution time targets, business hour definitions and so forth. Meanwhile, a smaller but growing percentage of Global 2000 organizations are moving toward Service Level Agreements that are truly based on business and technical services. That shift is fueled by cloud services, and it improves visibility and communication up the chain to executive management.

Now, with % of Incidents Resolved by Tier 1 (the front line), most organizations do have a 1st tier support organization. That is fairly universal. And, it is also generally understood that an Incident resolved at the 1st tier is less expensive than one resolved in deeper tiers (such as engineering or development). Because the measure means roughly the same thing everywhere, it holds up far better in a cross-organization comparison. So, hopefully, this will help you consider the viability of certain measures for IT Performance Improvement based on variability.
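To make the variability point concrete, here is a minimal Python sketch. The resolution times and the two SLA targets are made-up numbers, not data from any real organization. All it tries to show is that identical performance can produce very different "% of Incidents Meeting SLA" figures once the targets differ.

```python
# Hypothetical resolution times (in hours) for the same set of incidents.
resolution_hours = [2, 5, 7, 9, 12, 20, 30, 48]


def pct_meeting_sla(hours, target_hours):
    """Return the percentage of incidents resolved within the SLA target."""
    met = sum(1 for h in hours if h <= target_hours)
    return 100.0 * met / len(hours)


# Two organizations with different (assumed) resolution targets.
org_a = pct_meeting_sla(resolution_hours, target_hours=8)
org_b = pct_meeting_sla(resolution_hours, target_hours=24)

print(f"Org A (8h target):  {org_a:.1f}%")
print(f"Org B (24h target): {org_b:.1f}%")
```

Same incidents, same resolution times, but the two percentages are not comparable because the yardstick moved.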

The next issue with benchmarking (issue #2) is aggregation. What we typically find is that benchmarking solutions present you with aggregate-level information, with no ability to drill down to the grain (due to anonymity requirements) and see the level of detail that your specific, comparative analytical use cases require.

Example: Average resolution time (from our database) is roughly 100 hours for organizations with > 150 people in IT. But, when you drill through to the next level of detail, you will see that the average resolution time might be 10 hours for desktop support incidents and 200+ hours for Tier 3 incidents. To drill even further, you might see that average resolution time for all Cisco core router incidents is 360 hours (yipes, but maybe not yipes?). SAP Application issues might be in the 280 hour neighborhood (depending on the SAP module). Finally, if you drill down to the individual Incidents, you will see that some Incidents are 1,000+ hours (thus skewing the average, which is why we prefer the median in many cases). Now, let’s say we were to bundle hundreds of different customers together and categorize them by vertical, segment and IT organization size. All of that detail collapses into a handful of averages. So, the issue is that aggregation often hides the detail that is needed for benchmarking.
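Here is a small Python sketch of that drill-down, using only the standard library. The categories and hours are invented for illustration and are not the figures from our database. It shows how one overall average hides very different groups, and how a single long-running Incident drags the mean away from the median.

```python
from statistics import mean, median

# Hypothetical incident resolution times (hours), grouped by category.
incidents = {
    "desktop support": [4, 6, 8, 10, 12, 20],
    "tier 3": [90, 150, 210, 1200],  # one 1,000+ hour outlier
}

# The aggregate view: everything lumped together.
all_hours = [h for hours in incidents.values() for h in hours]
overall_mean = mean(all_hours)
overall_median = median(all_hours)
print(f"overall: mean={overall_mean:.1f}h, median={overall_median:.1f}h")

# The drill-down view: the same data, one level deeper.
for category, hours in incidents.items():
    print(f"{category}: mean={mean(hours):.1f}h, median={median(hours):.1f}h")
```

The overall mean lands somewhere no actual group lives, which is exactly what an aggregate-only benchmark would hand you.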

The third issue, and by no means the final one, is data availability. This issue is particularly acute with benchmarking solutions that rely on survey samples alone (which is most of them) or that pull data from only one type of application or data source. For example, to get accurate costing data for an IT staff member, you would potentially need to pull data from the financial system in addition to the ITSM platform, and from multiple ITSM tools to avoid data bias. Or, how about the total cost of an IT asset?

Stepping outside of IT for a bit: in my personal experience, tools like QuickBooks Online (QBO) offer a benchmarking & analytics product called Fathom (or whatever it’s called now) that allows you to compare your performance against your industry peers. So, at Northcraft, we were able to see that our revenue growth, margins, and cash flow operating ratio were in the top quartile of all companies in the “software” category. Yay for us. And, this was a pretty solid use of benchmarking because these measures are generally accepted and measured in a similar way across most organizations due to GAAP (ITIL for accountants).

However, if I want to measure cost per marketing qualified lead, which helps us make necessary tactical decisions around our marketing budget, I’ve found that the Fathom numbers are completely inaccurate and of no use. Why? Because I need to combine data from our marketing system, sales system, QBO and Google Analytics. Would Fathom take me across all of those data sets? No. And, it never will. Embedded BI is different. This is where a true BI platform becomes necessary for evidence-based improvement. The data might be available, but it kind of isn’t… without a multi-dimensional combination of the different sources.
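As a rough illustration of why the combination matters, here is a Python sketch that joins two extracts by month. The dictionaries, the month keys, and the cost_per_mql helper are all hypothetical stand-ins; in practice the spend would come from the accounting system and the lead counts from the marketing system, and a real version would bring in the sales and web analytics data too.

```python
# Hypothetical monthly extracts, keyed by "YYYY-MM".
marketing_spend = {"2019-10": 12000.0, "2019-11": 15000.0}  # assumed: from accounting
mql_counts = {"2019-10": 80, "2019-11": 120}                # assumed: from marketing


def cost_per_mql(spend_by_month, mqls_by_month):
    """Join the two sources on month and divide spend by qualified leads."""
    result = {}
    # Only months present in BOTH sources can be computed.
    for month in sorted(spend_by_month.keys() & mqls_by_month.keys()):
        leads = mqls_by_month[month]
        if leads > 0:  # skip months with no leads to avoid dividing by zero
            result[month] = spend_by_month[month] / leads
    return result


by_month = cost_per_mql(marketing_spend, mql_counts)
for month, cost in by_month.items():
    print(f"{month}: ${cost:.2f} per MQL")
```

Neither source can answer the question alone; the measure only exists once the two are joined on a shared dimension (here, month).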

So, here’s where the rubber meets the road.  For your Key Performance Indicator Goals, should you baseline or benchmark?  The default solution is really to baseline your own performance and look to improve from there.  Improvement is improvement regardless of what your peers are doing.  That being said, it probably wouldn’t hurt to take a few common industry measures from offerings like MetricNet as long as the cost of the information isn’t too high!
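If you go the baseline route, the mechanics can be as simple as the Python sketch below. The monthly values are invented, and using the median of the trailing months as the baseline is just one reasonable choice (an assumption on my part, picked because it resists outliers), not a prescribed method.

```python
from statistics import median

# Hypothetical history of "% of Incidents Resolved by Tier 1", one value per month.
history = [61.0, 63.5, 62.0, 64.0, 66.5, 65.0]
current = 68.0  # hypothetical value for the month being evaluated

# Baseline = median of your own trailing performance.
baseline = median(history)
change = current - baseline

print(f"baseline: {baseline:.2f}%")
print(f"current:  {current:.2f}%")
print(f"change:   {change:+.2f} points vs. your own baseline")
```

No peer data required: the comparison is against yourself, so the definitions stay consistent from month to month.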

Happy analyzing!