How do we evaluate the design? Typically, the first ones to approve it are the designers themselves. After that, other decision-makers give their opinion. Then, some users test it and give their feedback.
Having a few levels of approval is great, but all those opinions are rather subjective. As a design agency, we always promote a data-driven approach. And when it comes to measuring user experience, usability metrics are that essential data.
What are usability metrics?
Usability metrics are a system of measurement of the effectiveness, efficiency, and satisfaction of users working with a product.
To put it simply, such metrics are used to measure how easy and effective the product is for users.
Most usability metrics are calculated based on the data collected during usability testing. Users are asked to complete a task while researchers observe the user behavior and take notes. A task can be something like “Find the price of delivery to Japan”, or “Register on the website”.
The minimum number of users for measuring usability is 5. Jakob Nielsen, co-founder of Nielsen Norman Group, recommends running quantitative usability testing with 20 users.
To analyze every user action, researchers might record the testing and watch it a few times. All of it is necessary to calculate the metrics.
Let’s take a closer look at the most used usability metrics. We’ll start with the metrics for effectiveness measurement.
However long your list of usability metrics is, success score will probably be at the top of it. Before we go into the details of usability, we have to find out if the design works. Success, or completion, means that a user managed to complete the task they were given.
The basic formula for the success score is:
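Put as a calculation, success score is commonly computed as the number of successfully completed tasks divided by the total number of attempts. A minimal sketch in Python; the function name and the sample numbers are illustrative assumptions:

```python
def success_score(completed: int, attempts: int) -> float:
    """Share of task attempts that ended in full success (0.0 to 1.0)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= completed <= attempts:
        raise ValueError("completed must be between 0 and attempts")
    return completed / attempts


# Hypothetical session: 10 of 20 attempts were fully successful.
print(f"{success_score(10, 20):.0%}")  # prints 50%
```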
The success score falls somewhere between 0 and 1 (or 0% and 100%). Behind it is a binary system: each attempt is scored 1 if the task was completed successfully and 0 if it was not. All other situations are overlooked, so partial task success is considered a failure.
To get a more nuanced picture, UX researchers can put tasks performed with errors in a separate group. Say the task is to purchase a pair of yellow shoes. “Partial success” could then mean buying a pair of shoes of the wrong size, not being able to pay with a credit card, or entering wrong data.
Let’s say there were 20 users: 10 successfully bought the right shoes, 5 chose the wrong type of delivery, 2 entered their address incorrectly, and 3 were unable to make the purchase. If we were counting just 0 or 1, we would have a rather low 50% success score. By counting all kinds of “partially successful” tasks, we get a whole spectrum.
Note! Avoid counting “wrong address” as 0.5 of a success and adding it to the overall average, as it distorts the results.
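The 20-user example can be tallied per outcome group rather than folded into one average. A minimal sketch in Python; the group labels and the function name are illustrative assumptions, while the counts come from the example:

```python
from collections import Counter

# Outcome of each of the 20 attempts from the shoe-purchase example.
outcomes = (
    ["success"] * 10
    + ["wrong delivery type"] * 5
    + ["wrong address"] * 2
    + ["failure"] * 3
)


def outcome_shares(results):
    """Return each outcome's share of all attempts, keeping groups separate."""
    counts = Counter(results)
    total = len(results)
    return {outcome: count / total for outcome, count in counts.items()}


shares = outcome_shares(outcomes)
for outcome, share in shares.items():
    print(f"{outcome}: {share:.0%}")
```

Each group keeps its own share (50%, 25%, 10%, and 15% here), so no partial outcome is converted into a fraction of a success.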
Each of the “partially successful” groups can tell us more than a general success score: using these groups, we can understand where the problem lies. This is something that we expect more often from qualitative UX research, while quantitative just gives us a precise but narrow-focused set of data.
A product doesn’t need a 100% success score to be considered usable. The average score is around 78%.
Now, let’s move to the second of the most common usability testing metrics:
Number of errors
In user testing, an error is any wrong action performed while completing a task. There are two types of errors: slips and mistakes.
Slips are those errors that are made with the right goal (for example, a typo when entering the date of birth), and mistakes are errors made with the wrong goal (for instance, entering today’s date instead of birth date).
There are two ways of measuring errors: measuring all of them (error rate) or focusing on one error (error occurrence rate).
To find the error occurrence rate, we count every time that error occurred and divide the total by the number of attempts. It is recommended to count every occurrence, even the repetitive ones. For example, if a user tried to click an unclickable zone more than once, count each click.
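As a rough illustration of that division, here is a minimal Python sketch. The function name and the sample data are hypothetical; the calculation follows the description above (total occurrences divided by attempts):

```python
def error_occurrence_rate(errors_per_attempt):
    """Average number of times one specific error occurred per attempt."""
    if not errors_per_attempt:
        raise ValueError("need at least one attempt")
    return sum(errors_per_attempt) / len(errors_per_attempt)


# Hypothetical data: clicks on an unclickable zone in each of 5 attempts.
# Repeated occurrences within one attempt are all counted.
clicks_on_dead_zone = [0, 2, 1, 0, 3]
print(error_occurrence_rate(clicks_on_dead_zone))  # prints 1.2
```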
Error rate counts all possible errors. To calculate it, we need to define the number of error opportunities, all possible slips and mistakes. This number can be bigger or smaller depending on the complexity of the task. After that, we apply this simple formula:
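One common way to write the error rate, assumed here because conventions vary between sources, is the total number of errors observed divided by the total number of error opportunities, where the total is the opportunities per task multiplied by the number of attempts. A minimal Python sketch with hypothetical names and numbers:

```python
def error_rate(total_errors: int, opportunities_per_task: int, attempts: int) -> float:
    """Errors observed as a share of all error opportunities (assumed formula)."""
    total_opportunities = opportunities_per_task * attempts
    if total_opportunities <= 0:
        raise ValueError("there must be at least one error opportunity")
    return total_errors / total_opportunities


# Hypothetical test: 20 attempts at a task with 10 error opportunities,
# during which 14 errors were observed in total.
print(f"{error_rate(14, 10, 20):.1%}")  # prints 7.0%
```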
Can there be a perfect user interface that prevents people from making typos? Unlikely. That is why the error rate almost never equals zero. Making mistakes is human nature, so having errors in usability testing is totally fine.
As Jeff Sauro states in “A Practical Guide to Measuring Usability”, only about 10% of tasks are completed without any errors, and the average number of errors per task is 0.7.
Success score and error rate measure the effectiveness of the product. The following metrics are used to measure efficiency.
Good usability typically means that users can perform their tasks successfully and fast. The concept of the task time metric is simple, yet there are some tricks to using it to best effect.
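In its simplest form, the metric is the average of the times users needed to complete a task. A minimal Python sketch, assuming a plain arithmetic mean and hypothetical timings:

```python
from statistics import mean


def average_task_time(times_in_seconds):
    """Arithmetic mean of the observed task times, in seconds."""
    if not times_in_seconds:
        raise ValueError("need at least one observed time")
    return mean(times_in_seconds)


# Hypothetical times (in seconds) for five users completing the same task.
observed = [42.0, 55.5, 38.0, 61.0, 47.5]
print(f"{average_task_time(observed):.1f} s")  # prints 48.8 s
```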
Once we have the average time, how do we know if the result is good or bad? For other metrics, there are some industry standards, but for task time there can’t be any, because the time depends entirely on the task itself.
Still, you can find an “ideal” task time: the result an experienced user would achieve. To do this, you have to add up the average time for each little action, like “pointing with the mouse” and “clicking”, using KLM (the Keystroke-Level Model). This model allows us to calculate this time quite precisely.
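A KLM estimate is a sum of per-action times. The sketch below uses operator times commonly cited in the Keystroke-Level Model literature; treat them as approximations. The task sequence and the function name are hypothetical:

```python
# Commonly cited operator times in seconds; approximations, not measurements
# of your own users.
KLM_SECONDS = {
    "K": 0.2,   # keystroke by an average skilled typist
    "P": 1.1,   # point at a target with the mouse
    "B": 0.1,   # press or release a mouse button
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_estimate(operators: str) -> float:
    """Sum the operator times for a sequence such as 'MPBB'."""
    return sum(KLM_SECONDS[op] for op in operators)


# Hypothetical task: think, point at a search field, click (press + release),
# move hands to the keyboard, type a 5-letter query, press Enter.
sequence = "M" + "P" + "BB" + "H" + "K" * 6
print(f"{klm_estimate(sequence):.2f} s")  # prints 4.25 s
```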
Most often, the task time metric is measured to compare the results with older versions of the design or with competitors.
Many times the difference in time will be tiny, but caring about task time is not just perfectionism. Remember, we are living in a world where the majority of people leave a website if it hasn’t loaded within 3 seconds. Saving those few seconds for users can impact their user experience quite a lot.
There are many ways of measuring efficiency. One of the most basic is called time-based efficiency, and it combines both task time and success score.
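In its commonly published form, time-based efficiency is the sum of n_ij / t_ij over all users and tasks, divided by N × R. Here n_ij is 1 if user j completed task i and 0 otherwise, t_ij is the time that user spent on the task, N is the number of tasks, and R is the number of users. The result is expressed in goals per second. A minimal Python sketch with a hypothetical data layout and hypothetical timings:

```python
def time_based_efficiency(results):
    """Average goals achieved per second.

    results[j][i] is a (completed, seconds) pair for user j on task i.
    Every user is assumed to have attempted the same tasks.
    """
    num_users = len(results)
    num_tasks = len(results[0])
    if any(len(user) != num_tasks for user in results):
        raise ValueError("every user needs a result for every task")
    total = 0.0
    for user in results:
        for completed, seconds in user:
            # A failed attempt contributes 0, however long it took.
            total += (1 if completed else 0) / seconds
    return total / (num_tasks * num_users)


# Hypothetical test with 2 users and 2 tasks.
results = [
    [(True, 10.0), (True, 20.0)],   # user 1
    [(False, 25.0), (True, 40.0)],  # user 2
]
print(f"{time_based_efficiency(results):.5f} goals/sec")  # prints 0.04375 goals/sec
```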
Doesn’t look basic, right? Not all formulas are easy to grasp. It would take another article to explain this one in detail.
Now that we have figured out how to measure both effectiveness and efficiency, we get to measuring satisfaction, the key to user experience studies.
There are many satisfaction metrics, but we’ll cover the two that we consider the most effective. For these metrics, the data is collected during usability testing by asking the users to fill in a questionnaire.
Single Ease Question (SEQ)
This is one of those easy and genius solutions that every UX researcher loves. Compared to all those complex formulas, this one is as simple as it gets: a single question is asked after the task.
While most task-based usability metrics are aiming at finding objective parameters, SEQ is tapping into the essence of user experience: its subjectivity. Maybe the task took a user longer to complete, but they had no such impression.
What if the user just reacts more slowly? Or what if they were distracted for a bit? A user’s subjective evaluation of difficulty is no less important than the number of errors they made.
On average, users rate task difficulty at 4.8 on the 7-point SEQ scale. Aim for results at or above that.
System Usability Scale (SUS)
For those who don’t trust the single-question solution, there is a list of 10 questions, known as the System Usability Scale. Based on the answers, the product gets a score on a scale from 0 to 100 (each question contributes up to 10 points).
This scale comes in handy when you want to compare your product with the others: the average SUS is 68 points. Results over 80 are considered excellent.
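The standard SUS scoring procedure works as follows. Each answer is given on a 1-to-5 agreement scale. Odd-numbered (positively worded) items contribute the answer minus 1, even-numbered (negatively worded) items contribute 5 minus the answer, and the sum is multiplied by 2.5. A minimal Python sketch; the function name and the sample answers are hypothetical:

```python
def sus_score(responses):
    """Score one respondent's 10 SUS answers, each on a 1-5 agreement scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            total += response - 1  # items 1, 3, 5, 7, 9: positively worded
        else:
            total += 5 - response  # items 2, 4, 6, 8, 10: negatively worded
    return total * 2.5


# Hypothetical answers from one respondent, in questionnaire order.
answers = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(answers))  # prints 85.0
```

The product’s SUS is then the average of the individual respondents’ scores.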
Why care about usability metrics?
The basic rule of user research states that conducting about three user interviews gives us a big chunk of info about the product usability, as well as usability problems. But why bother measuring quantitative UX metrics?
Well, if this is the first time you run user tests, you should stick to qualitative tests, of course. The priority is to get to know the users well. However, when a company gets serious about user research, quantitative data comes into play.
What is the difference between qualitative and quantitative tests then? The first gives us valuable insights like “users find the navigation of the website confusing”, and the second gives data in precise numbers, like “our redesign makes users do their tasks 61.5% faster than the old design”.
The latter insight does not tell you what exactly makes the new design work faster than the old one, nor how it can be further improved. However, when you have to justify a redesign to the CEO, solid data looks more convincing than excerpts from user interviews.
The same metrics can be a good basis for the assessment of a design team’s success and defining design KPIs. It helps with an old problem of UI/UX designers: when a good interface is barely noticeable. Few people understand how much work lies behind these seemingly “simple and obvious” solutions.
In an urge to make their changes more visible, designers are sometimes tempted to make small but noticeable adjustments like switching colors, replacing buttons, and so on. These are the things that annoy users so much every time their favorite app changes. This is what happened to Twitter, by the way. We wrote about the scandal around the Twitter redesign recently.
How do metrics help with it? When designers know that their objective is to improve the metrics, they won’t be just changing visuals and reshaping logos to make the results of their work more “noticeable”. Their management knows the KPIs and can easily see the impact.
All in all, tracking usability metrics is a sign of a company with a certain level of UX maturity. Once you decide to invest in that, you’ll find out that usability metrics can be as valuable as CAC, MRR, AARRR, and others.
To sum up
Usability metrics are not the easiest part of a UX researcher’s work. There is a lot more to do than just counting the numbers. You have to recruit many users, create tasks, organize the testing, observe and collect the data, and after that, you can finally apply the formulas and get the results.
For those who dare to go all this long way, we have to say that they will be rewarded with a clear data-driven system of UX evaluation.
Curious to find out what lies beyond usability testing? Read our article about other crucial UX research methods.