
Measuring the Intangible. Usability Metrics


How do we evaluate the design? Typically, the first ones to approve it are the designers themselves. After that, other decision-makers give their opinion. Then, some users test it and give their feedback.

Having a few levels of approval is great, but all those opinions are rather subjective. As a design agency, we always promote a data-driven approach, and when it comes to measuring user experience, usability metrics are that essential data.

What are usability metrics?

Usability metrics are a way of measuring the effectiveness, efficiency, and satisfaction of users working with a product.

To put it simply, such metrics are used to measure how easy and effective the product is for users.

Most usability metrics are calculated based on the data collected during usability testing. Users are asked to complete a task while researchers observe the user behavior and take notes. A task can be something like “Find the price of delivery to Japan”, or “Register on the website”.

The minimum number of users for measuring usability is 5, though Jakob Nielsen, co-founder of the Nielsen Norman Group, recommends testing with 20 users when the goal is to collect quantitative metrics.

To analyze every user action, researchers might record the testing and watch it a few times. All of it is necessary to calculate the metrics. 

Let’s take a closer look at the most used usability metrics. We’ll start with the metrics for effectiveness measurement.

Success score

However long your list of usability metrics is, the success score will probably sit at the top. Before we go into the details of usability, we have to find out whether the design works at all. Success, or completion, means that a user managed to complete the task they were given.

The basic formula for the success score is:

Success score = (number of successfully completed tasks / total number of attempts) × 100%

The success score falls somewhere between 0 and 1 (or 0 and 100%). Each individual attempt, however, is scored in a binary way: the task was either completed successfully (1) or not (0). Anything in between is overlooked, and partial task success is counted as a failure.

To get a more nuanced picture, UX researchers can put tasks performed with errors into a separate group. Let's say the task is to purchase a pair of yellow shoes. Partial success could mean buying shoes of the wrong size, not being able to pay with a credit card, or entering wrong data.

Let's say there were 20 users: 10 successfully bought the right shoes, 5 chose the wrong type of delivery, 2 entered their address incorrectly, and 3 were unable to make the purchase at all. If we counted just 0 or 1, we would get a rather low 50% success score. By counting all kinds of "partially successful" tasks, we get a whole spectrum.

Note! Avoid counting "wrong address" as 0.5 of a success and adding it to the overall average, as it distorts the results.
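To make the arithmetic concrete, here is a minimal Python sketch that scores the hypothetical shoe-shopping test above; the group names and counts are just the example numbers, not data from a real study.

# Hypothetical results of a usability test with 20 participants (numbers from the example above)
results = {
    "bought the right shoes": 10,    # full success
    "chose the wrong delivery": 5,   # partial success
    "entered a wrong address": 2,    # partial success
    "could not purchase at all": 3,  # failure
}

attempts = sum(results.values())

# Strict binary success score: only full successes count as 1
strict_success = results["bought the right shoes"] / attempts
print(f"Strict success score: {strict_success:.0%}")  # 50%

# Instead of averaging partial successes as 0.5, report each group separately
for group, count in results.items():
    print(f"{group}: {count / attempts:.0%} of attempts")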

Each of the "partially successful" groups can tell us more than a general success score: they show where exactly the problem lies. That is the kind of insight we usually expect from qualitative UX research, while quantitative research gives us precise but narrowly focused data.

For a product to be considered usable, the success score doesn't have to be 100%. The average score across usability studies is around 78%.

Now, let’s move to the second of the most common usability testing metrics:

Number of errors

In user testing, an error is any wrong action performed while completing a task. There are two types of errors: slips and mistakes.

Slips are those errors that are made with the right goal (for example, a typo when entering the date of birth), and mistakes are errors made with the wrong goal (for instance, entering today’s date instead of birth date).

There are two common ways of measuring errors: relating them to the number of task attempts (the error occurrence rate) or to the number of error opportunities (the error rate).

To find the error occurrence rate, we have to calculate the total number of errors and divide it by the number of attempts. It is recommended to count every error, even the repetitive ones. For example, if a user tried to click an unclickable zone more than once, count each one.

Error occurrence rate = total number of errors / number of attempts

The error rate relates errors to all possible error opportunities. To calculate it, we first need to define the number of error opportunities, that is, all the possible slips and mistakes the task allows. This number can be bigger or smaller depending on the complexity of the task. After that, we apply a simple formula:

Error rate = total number of errors / total number of error opportunities
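As an illustration of how the two metrics differ, here is a short Python sketch with made-up numbers, assuming the error opportunities are defined per attempt.

# Made-up numbers for a single task
total_errors = 14              # every observed slip and mistake, including repeated ones
attempts = 20                  # one attempt per user
opportunities_per_attempt = 6  # possible errors defined for this task

error_occurrence_rate = total_errors / attempts
error_rate = total_errors / (opportunities_per_attempt * attempts)

print(f"Error occurrence rate: {error_occurrence_rate:.2f} errors per attempt")
print(f"Error rate: {error_rate:.1%} of error opportunities led to an error")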

Can there be a perfect user interface that prevents people from making typos? Unlikely. That is why the error rate almost never equals zero. Making mistakes is human nature, so having errors in usability testing is totally fine.

As Jeff Sauro states in his book "A Practical Guide to Measuring Usability", only about 10% of tasks are completed without any mistakes, and the average number of errors per task is 0.7.


Success score and error rate measure the effectiveness of the product. The following metrics are used to measure efficiency.

Task time

Good usability typically means that users can perform their tasks successfully and fast. The concept of the task time metric is simple, yet there are some tricks to using it most effectively.

Average task time = (task time of user 1 + task time of user 2 + … + task time of user N) / N

Once we have the average time, how do we know whether the result is good or bad? For other metrics there are industry benchmarks, but task time depends entirely on the task, so there can't be a universal standard.

Still, you can estimate an "ideal" task time, the result of an experienced user. To do this, you add up the average time for each small action, like "pointing with the mouse" and "clicking", using the Keystroke-Level Model (KLM). This model allows us to estimate the time quite precisely.
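To give a rough idea of how such an estimate works, here is a small Python sketch using commonly cited approximate KLM operator times; the task breakdown itself is invented for illustration.

# Approximate, commonly cited KLM operator times in seconds
KLM = {
    "K": 0.28,  # press a key or button
    "P": 1.10,  # point with the mouse to a target
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}

# Invented breakdown of a short form-filling task for an experienced user:
# think, point to the field, click, switch to the keyboard, type 20 characters,
# switch back to the mouse, think, point to "Submit", click
task = ["M", "P", "K", "H"] + ["K"] * 20 + ["H", "M", "P", "K"]

ideal_time = sum(KLM[op] for op in task)
print(f"Ideal task time: {ideal_time:.1f} seconds")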

Most often, the task time metric is measured to compare the results with older versions of the design or with competitors. 

Often the difference in time will be tiny, but caring about task time is not just perfectionism. Remember, we are living in a world where the majority of people leave a website if it takes more than 3 seconds to load. Saving those few seconds can improve the user experience quite a lot.

Efficiency

There are many ways of measuring efficiency; one of the most basic, time-based efficiency, combines task time and the success score.

Time-based efficiency = (Σ n_ij / t_ij, summed over all tasks i and users j) / (N × R), where N is the number of tasks, R is the number of users, n_ij equals 1 if user j completed task i successfully (0 otherwise), and t_ij is the time user j spent on task i

Doesn't look basic, right? Not all formulas are easy to grasp, and it would take another article to explain this one in detail.
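Still, the calculation itself is mechanical. Here is a short Python sketch of time-based efficiency as defined above, using invented results for two tasks and three users.

# results[task][user] = (completed_successfully, time_in_seconds); invented data
results = {
    "find the delivery price": {"u1": (True, 40), "u2": (True, 55), "u3": (False, 90)},
    "register on the website": {"u1": (True, 120), "u2": (False, 150), "u3": (True, 100)},
}

n_tasks = len(results)   # N
n_users = 3              # R

# Sum n_ij / t_ij over all tasks and users, then divide by N * R
total = sum((1 if completed else 0) / time
            for task in results.values()
            for completed, time in task.values())

efficiency = total / (n_tasks * n_users)
print(f"Time-based efficiency: {efficiency:.3f} goals per second")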

Tracking metrics is a whole science. If you want to dive deeper into it, check out our list of the best books about metrics (or leave it to professional UX designers).


Now that we have figured out how to measure both effectiveness and efficiency, we get to measuring satisfaction, the core of user experience studies.

There are many satisfaction metrics, but we'll cover two that we consider the most useful. For these metrics, the data is collected during usability testing by asking users to fill in a questionnaire.

Single Ease Question (SEQ)

This is one of those simple yet ingenious solutions that every UX researcher loves. Compared to all those complex formulas, it is as simple as it gets: a single question asked right after the task.

Single Ease Question: "Overall, how difficult or easy was the task to complete?", rated on a seven-point scale from "Very difficult" to "Very easy"

While most task-based usability metrics aim at objective parameters, the SEQ taps into the essence of user experience: its subjectivity. Maybe the task took a user longer to complete, but they didn't have that impression.

What if the user simply reacts more slowly? Or was distracted for a moment? A user's subjective evaluation of difficulty is no less important than the number of errors they made.

On average, users rate task ease at around 4.8 out of 7. Aim for results no lower than that.
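Scoring the SEQ is as simple as averaging the answers. A tiny Python sketch with made-up responses:

# Made-up SEQ answers on a 7-point scale (1 = very difficult, 7 = very easy)
seq_responses = [6, 7, 5, 4, 6, 7, 3, 6, 5, 6]

average_seq = sum(seq_responses) / len(seq_responses)
print(f"Average SEQ: {average_seq:.1f} out of 7")  # compare against the ~4.8 average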

System Usability Scale (SUS)

For those who don't trust a single-question solution, there is a list of 10 statements known as the System Usability Scale. Based on the answers, the product gets a score on a scale from 0 to 100 (each question contributes up to 10 points).

System Usability Scale: ten statements, alternating positive and negative, each rated on a five-point scale from "Strongly disagree" to "Strongly agree"

This scale comes in handy when you want to compare your product with the others: the average SUS is 68 points. Results over 80 are considered excellent.
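The standard SUS scoring is easy to automate: odd-numbered (positively worded) items contribute their rating minus 1, even-numbered (negatively worded) items contribute 5 minus their rating, and the sum is multiplied by 2.5. A short Python sketch with made-up answers:

# Made-up answers to the 10 SUS statements, each rated from 1 to 5
answers = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]

raw = 0
for i, rating in enumerate(answers, start=1):
    if i % 2 == 1:          # odd items are positively worded
        raw += rating - 1
    else:                   # even items are negatively worded
        raw += 5 - rating

sus = raw * 2.5             # scale the 0-40 raw score to 0-100
print(f"SUS score: {sus}")  # compare against the 68-point average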

Why care about usability metrics?

A common rule of user research says that conducting about three user interviews already gives us a big chunk of information about a product's usability and its problems. So why bother measuring quantitative UX metrics at all?

Well, if this is the first time you are running user tests, you should stick to qualitative testing, of course; the first priority is to get to know the users well. However, once a company gets serious about user research, quantitative data comes into play.

What is the difference between qualitative and quantitative tests, then? The former gives us valuable insights like "users find the navigation of the website confusing", while the latter gives us precise numbers, like "our redesign lets users complete their tasks 61.5% faster than the old design".

The latter insight doesn't tell you what exactly makes the new design faster than the old one, or how it could be improved further. However, when you have to justify a redesign to the CEO, solid data looks far more convincing than excerpts from user interviews.

The same metrics can be a good basis for assessing a design team's success and defining design KPIs. They help with an old problem of UI/UX designers: a good interface is barely noticeable, and few people understand how much work lies behind these seemingly "simple and obvious" solutions.

Driven by the urge to make their work more visible, designers are sometimes tempted to make small but noticeable adjustments: switching colors, replacing buttons, and so on. These are exactly the things that annoy users every time their favorite app changes. This is what happened to Twitter, by the way; we wrote about the scandal around the Twitter redesign recently.

How do metrics help? When designers know that their objective is to improve the metrics, they won't just change visuals and reshape logos to make the results of their work more "noticeable". Management knows the KPIs and can easily see the impact.

All in all, tracking usability metrics is a sign of a company with a certain level of UX maturity. Once you decide to invest in that, you’ll find out that usability metrics can be as valuable as CAC, MRR, AARRR, and others.

To sum up

Usability metrics are not the easiest part of a UX researcher’s work. There is a lot more to do than just counting the numbers. You have to recruit many users, create tasks, organize the testing, observe and collect the data, and after that, you can finally apply the formulas and get the results.

Those who dare to go all this way will be rewarded with a clear, data-driven system of UX evaluation.

Curious to find out what is there beyond usability testing? Read our article about other crucial UX research methods.

Masha Panchenko

Author


Top Stories


What’s Wrong With the Recent Twitter Redesign?

People don't like changes, which is why they rarely like redesigns. The only change the Twitter community would have appreciated was a button for editing tweets, but Twitter didn't add one.

The changes Twitter made were far more controversial, and they were framed as an accessibility improvement:

  • Twitter replaced the system font with a new custom one.
  • They switched to a high-contrast color scheme.
  • They changed the color of the buttons.

The redesign probably helped those with low vision or color blindness. But it made Twitter less accessible for people with astigmatism and dyslexia (because of the new font) and for people with contrast sensitivity or photosensitive migraines (because of the new color scheme). Even people without any visual impairments complain of headaches after scrolling through the updated interface.

All of these novelties are the default. Users can't change the font or reduce the contrast; they can only express their outrage, and they do so with fierce and unruly force.

Users hate Twitter's new font.

The new, opinion-dividing font is called Chirp (Twitter, Chirp, you get it). In early August the company replaced the previous system font; for some people the change went unnoticed, while tons of others found the new font barely readable.


Complaints are no surprise. Font readability is largely a matter of habit; after all, medieval Europeans found gothic typefaces perfectly readable. That brings us back to the point about people hating change. Naturally, a standard system typeface feels better than a font you've never seen before. Most social media platforms know that and use the operating systems' default fonts. Twitter knows it too: in the update announcement, they say the changes "might feel weird at first."

We can trust Twitter when they say we'll get used to the new font. But we may doubt them when they present a harder-to-read font as an update for better accessibility. When the company first showed Chirp to the public half a year ago, they said the font was about "having a holistic brand," and that reason makes much more sense.

We’ve recently written about how brands balance on the dizzying path between unique and usable. In short, most apps sacrifice their distinguishing features because they want to be more convenient for users. Design uniformity helps us to switch between dozens of apps effortlessly.

It seems Twitter decided to sacrifice usability for branding purposes while advertising the change as if it were about caring for users. Part of the users' disappointment was probably caused by this contradiction.

Yet there are plenty of other reasons driving users mad. Chirp has visible rendering issues on Windows, which Twitter promises to fix soon.


Another issue is multilingual support. For some reason, Chirp doesn't include Greek and Cyrillic character sets, or those sets don't work as they should. Lots of people say their non-English posts look noticeably different from Latin text.


People would probably accept the problematic default font if only they could switch to a preferred one, but such functionality is not available on Twitter.

It looks like I got a headache while working on this article, and the text is turning out too critical of Twitter. To correct the balance, I'd like you to meet Maksym, a UI/UX designer at Eleken. He likes Twitter's redesign, and here's why:

Apart from the much-discussed shortcomings, Twitter's design team made a bunch of invisible improvements. They removed minor inconsistencies, cleared the interface of visual clutter, and simply made design elements work better together.

For instance, they changed the color of navigation elements. Now navigation is black and action buttons are blue, which makes much more sense than having everything in blue.

Users hate the new contrast mode.

"The guys went too far with the contrast," Maksym says, on the other hand. He believes the contrast change became the most problematic aspect of Twitter's redesign, even though most people blame the new font.

Low contrast in design causes problems for people with low vision or color blindness. The Web Content Accessibility Guidelines (WCAG) require a contrast ratio of at least 4.5:1 for normal text, but they don't set an upper limit.
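For reference, the contrast ratio WCAG refers to can be computed from the relative luminance of two colors. A small Python sketch following the WCAG formula, checked here on pure black text over a pure white background:

def linearize(channel):
    # Convert an 8-bit sRGB channel to linear light, as defined by WCAG
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(r, g, b):
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(color1, color2):
    lighter, darker = sorted((luminance(*color1), luminance(*color2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Pure black on pure white gives the maximum possible ratio of 21:1
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0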

Twitter's example showed that contrast far above the specified minimum can also be a bad idea, especially for users who suffer from chronic migraines.


A significant part of the complaints about headaches and eyestrain was probably caused not by the font itself but by the increased contrast between the text color and the background color.

The good news is that Twitter acknowledged the contrast problem and promised to fix it.


The bad news is that Twitter didn’t promise to expand the contrast customization capabilities so that users could change the contrast to whatever works best for them.

Users hate new buttons (but probably shouldn't).

Another hotly debated change is the color inversion of the "Follow" and "Unfollow" buttons. Users habitually hit the unfilled button to follow people… and accidentally unfollow them instead. Pretty annoying, but there's an explanation for this upside-down move.

Dann Petty says the follow button was broken before and now it is fixed. The app shows two buttons simultaneously: one for tweeting and one for following people. Those buttons used to be blue and white, respectively, to keep them distinct and avoid user confusion.

Follow buttons

Now, what happened after you followed somebody?

The unfollow button turned blue, just like the tweet button, which brought back the visual hierarchy problem.

The recent redesign has solved the problem by making the second-priority following button gray. As you follow someone, the button turns white and stops drawing your attention. It’s just like YouTube’s "Subscribe" button goes from red to grey when activated, and Instagram’s "Follow" button goes from blue to white.

So even if the new buttons annoy us until we learn to live with them, they were redesigned for the greater usability good.

What can we learn from Twitter’s mistakes?

Twitter's redesign showed what may happen when users wake up to changes they didn't ask for. That's an important lesson for any SaaS company: automatic updates are as much a liability as a strength.

Another lesson is that there's no one-size-fits-all accessibility. Involve people with a wide range of disabilities in testing before your design goes live.

And finally, make changes an option, not a default. Let people customize the design to whatever works best for them.


Product-Market Fit: How to Interview Users [Questions List Included]

Startup founders spend lots of time preparing a good pitch, hoping it will be the magic wand for their product.

But what if I told you it can be no less important to talk to potential customers without even mentioning your idea? And this situation is far more likely to happen than the dreamed-of elevator ride with an angel investor. The moment comes when you have an idea and want to find out whether it will reach product-market fit. At this point, asking questions matters more than telling your story.

As a SaaS design agency, we often work with startups on their way to product-market fit. This initial period is a key moment when little things can make or break the whole business.

But how do you assess product-market fit? The answer is metrics.

Product-market fit metrics

When startup founders ask themselves how to find product-market fit (or anything else, really), the safest answer is "by measuring relevant metrics". These include the churn rate and a number of specific surveys.

One of the most popular surveys related to product-market fit is known as the Sean Ellis test. It goes with just one question:

How would you feel if you could no longer use this product? (Very disappointed / Somewhat disappointed / Not disappointed / N/A)
Image credit: pmfsurvey.com
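The test is usually read through a single number: the share of respondents who answer "very disappointed", with 40% commonly cited as the benchmark that signals product-market fit. A quick Python sketch with made-up survey answers:

from collections import Counter

# Made-up answers to the Sean Ellis question
answers = (["very disappointed"] * 34 + ["somewhat disappointed"] * 41
           + ["not disappointed"] * 19 + ["n/a"] * 6)

share_very = Counter(answers)["very disappointed"] / len(answers)
print(f"Very disappointed: {share_very:.0%}")  # 34%, just below the 40% benchmark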

If you want to know some other product-market fit survey questions, read our guide on product-market fit. It offers a few metrics to choose from, but all of them apply to an existing product that already has customers. What about products that are yet to be launched?

Smart founders start thinking about product-market fit before the product even gets to the market. One of the best ways to do this is by talking to users directly. We will get to that later in the article and share some tips on how to prepare product-market fit interview questions. But first, let's discuss why asking the wrong questions might cost you a fortune.

The cost of the wrong question

Asking the wrong question can cost millions. That is not an exaggeration but a real story. In 2009, Walmart decided on a redesign to de-clutter their shops. To find out whether customers would like it, they ran user research.

The question they asked their customers was “Would you like Walmart aisles to be less cluttered?”. Of course, most people said yes. So, Walmart invested millions in the redesign, convinced that it would make customers happier. The result was a large decrease in sales (maybe customers actually became happier, but that’s not what you care about when the sales plunge). Walmart lost over one billion dollars.

Looking back at the situation, it seems obvious that the question invited biased answers. So how do you pose the right question when you want to verify an idea? There are some rules you can rely on to check the quality of your questions.

Mom Test

When you come to your mom with an exciting new idea and ask what she thinks about it, what will her answer be? "Great idea, darling, I'd love to see it!" Is she lying to you? No. Should you base your business decisions on it? Again, no.

That doesn't mean your mother can't give you good business advice. The secret is to shape the conversation so that it isn't about praising your idea.

This is the key idea behind the Mom Test, invented by Rob Fitzpatrick, who wrote a whole book on it. Thanks to this test, you can turn any conversation into a useful source of information on your way to product-market fit.

Imagine you have a genius idea: make an app that would connect dog groomers with clients. You come to your mom and ask her if she likes the idea. She says “of course, what a great idea, darling”.

But the conversation doesn't end there. Next, you ask if she would use this app to find a groomer for her terrier. And if the answer is yes, you ask her if she would buy a monthly subscription. Of course, your mom will be willing to pay a good price for her kid's app! That doesn't mean, though, that other people will.

Naturally, moms want to support their kids, so they compliment their ideas. The challenge is to lead the conversation in a way that yields genuinely useful information, not just compliments.

The Mom Test teaches us rules for making interview questions unbiased. Here are the main ones:

1. Talk about the problem, not the suggested solution

Shift the focus from your product to the customer. It is a bit counterintuitive, because we really want to know whether our idea is good and whether people are willing to pay for it. Asking questions about the problems your customers face is nowhere near as easy.

Still, asking about the solution is OK when it is not put in a hypothetical way ("Do you think a new productivity app would solve your procrastination issues?") but refers to real experience ("Have you tried any methods to fight procrastination?"). That brings us to the next rule.

2. Ask about the past, not the future

Here is a real-life story. When people who had just bought a treadmill were asked how often they planned to use it, they said three times a week. After one month, it turned out they used it about once a week. Asked again, they now said "five times a week", hoping to recover the lost time.

Humans tend to imagine the future far more optimistically than it turns out to be. These rose-tinted glasses help us survive in a not-so-optimistic world, but that is why you can't rely on what people predict about their future behavior. They are not lying; they are just being optimistic.

3. Avoid compliments

When you hear something like "your product is a great solution", it's time to move the conversation in another direction. Compliments are a symptom of the “mom bias”. The interviewer has to carefully return to talking about customer experience instead.

Coming back to our example of a dog grooming app, how could you shape your questions? First of all, you would start by asking about her experience instead of pitching the idea, for example: "When did you last take Lucky to the groomer, and how did it go?"

This question asks about the past. Then you can find out where she got the information about her current groomer: did she google it, or was it another dog owner's recommendation?

Further on, you might find out that Lucky hates car rides and that the only reason your mom goes to this particular groomer is that they are on the next street. That's when it becomes clear your mom wouldn't use a dog grooming app.

That's it, your mom just ruined your idea… Now you won't rush into it, and you'll do more research before investing in development. You can also look at other kinds of customers: for example, people attending dog shows, who are very serious about grooming.

4. Listen more, speak less

Getting interviewees to talk is hard, but talking as little as possible can be even harder. It is commonly believed that 90% interviewee / 10% interviewer is a good speaking distribution. How do you get there? Don't interrupt, even if they start talking about something that is not very relevant. Ask open-ended questions rather than yes/no ones.

5. Pay attention to emotions

When interviewees show excitement or annoyance about the things they are talking about, ask more questions about their emotions. At the same time, you should be sensitive to topics they don't want to talk about and not push in that direction. Interviews must not be uncomfortable.

6. Don’t ask about prices

This goes back to the rules mentioned above: people can’t predict their own behavior and they want to compliment you. In an ideal world, they would pay a good price for useful products. In reality, most people are not willing to pay more than the minimum price.

What if the top managers of Louis Vuitton asked clients, "How much are you willing to pay for a bag?" They would never have gotten to where they are. We are very far from luxury bags here, but you get the idea: don't base pricing decisions on what users say.

7. Organize the process

Don't think of it as "just a talk about our product". Now that you see how careful you have to be with every question, you understand that the best way is to write all the questions down properly and try to follow the script.

Use whatever instruments can help you: recording devices, note-taking, and so on. Make an exception only for those occasional situations when you meet a customer at a conference or try to get some valuable information out of your mom at a Christmas dinner.

For more tips on organizing user interviews, read our article “How to talk to users”.

Good questions

Now that you know all the rules, preparing questions becomes both easier and harder. Here are some examples of questions that can be included in the mom test. Take this list as an inspiration and make your own.

  • What does your typical day at work look like? 
  • Tell me about the last time you faced this [problem]?
  • What have you tried to do to solve [this problem]?
  • What is good/bad about the solution you are using now?
  • How did you find this solution?
  • Have you looked for alternatives to this solution?
  • How much did you pay to solve this problem?

To make the most of the interview, finish it with a plan for the future (yes, at this point you can talk about your product and about the future).

  • Would you like to be on the list of beta testers when we launch?
  • Could we meet next week to hear your opinion on our product?

Bad questions

If you already have a draft interview script, do a quick check that your questions don't look like these:

  • How often do you fail to do [a task]?

NO — this question makes the interviewee feel guilty

  • Would this [product] be useful to solve this [problem]?

NO — this question is begging for an affirmation and a compliment

  • Would you be willing to pay a bit more than you are paying now to get a much faster solution?

NO — the answer will never predict the future.

How do you know that the interview was unbiased?

You never know for sure. That is why, whenever possible, you should use different types of user research to prove your hypothesis right or wrong. For example, in the case of Walmart, an A/B test would do the job.

Curious to find out what is out there beyond user interviews? Read our article about UX research methods.
