The CUQ is a new questionnaire, developed by an interdisciplinary team at Ulster University, designed specifically to measure the usability of chatbots.
Whilst the CUQ measures the UX and usability of chatbots, other tools exist, such as the ALMA Chatbot Test Tool (Martín et al., 2017), which measures factors similar to those covered by the CUQ, including:
- Chatbot Personality
- User Experience
- Error Handling
Why should I use the CUQ?
Other usability metrics are available; perhaps the most popular and well-known is the System Usability Scale (SUS), developed by John Brooke in 1986.
SUS is a ten-item scale that provides a general assessment of system usability (Brooke, 1996), which may be compared to the SUS benchmark of 68.0 (Sauro, 2018).
While SUS is a “quick and dirty” tool for assessing system usability (Brooke, 1996), it was primarily designed for measuring the usability of conventional computer systems.
As conversation-driven systems, chatbots do not conform to conventional design and testing principles. Chatbot usability testing may therefore require a different approach, and SUS on its own may not be the best option.
Can I use other metrics (such as SUS) alongside the CUQ?
The CUQ is designed to be comparable to SUS and may be freely used alongside it, or in combination with other usability metrics.
Recent research has suggested that multiple metrics give a more comprehensive assessment of chatbot usability (Kocaballi et al., 2018).
How do I use the CUQ?
The CUQ may be used during the post-test evaluation phase of chatbot usability tests.
It may be administered in either paper form or digitised and administered electronically using web-based tools such as Qualtrics.
CUQ scores are calculated in a similar manner to SUS scores.
Positive aspects of chatbot usability are assessed by odd-numbered questions, and negative aspects are assessed by even-numbered questions.
Scores are calculated out of 100 (to be comparable to SUS).
The CUQ Calculation Tool is a Microsoft Excel spreadsheet that may be used as a quick and easy means of calculating CUQ scores.
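The scoring described above can be sketched in Python. This sketch assumes the 16-item, five-point Likert structure of the published CUQ; the item count, response scale, and 100/64 scaling factor are assumptions not stated in this document, and the official CUQ Calculation Tool spreadsheet remains the authoritative implementation.

```python
def cuq_score(responses):
    """Compute a CUQ score (0-100) from 16 Likert responses (1-5).

    Odd-numbered items (1, 3, ..., 15) are positively worded and
    contribute (response - 1); even-numbered items are negatively
    worded and contribute (5 - response). The raw total (maximum
    8 * 4 + 8 * 4 = 64) is then scaled to 100.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be on a 1-5 scale")
    raw = 0
    for i, r in enumerate(responses, start=1):
        raw += (r - 1) if i % 2 == 1 else (5 - r)
    return raw * 100 / 64

# Example: "strongly agree" (5) on every positive item and
# "strongly disagree" (1) on every negative item gives the maximum.
print(cuq_score([5, 1] * 8))  # -> 100.0
```

A neutral response (3) to every item yields a score of 50.0, which matches the intuition that the scale is centred like SUS.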
Has the CUQ been properly validated?
The CUQ was validated as part of a research PhD at Ulster University in August 2019.
Twenty-six participants used the tool to evaluate three chatbots (classed as good, average and poor quality) and results suggested that the questionnaire demonstrated construct validity and test-retest reliability.
Findings from this study will be submitted for publication soon.
Please note the following limitations to the validation study:
- Construct validity was assessed against a consensus ranking by an expert panel, using three chatbots selected for this purpose. It would be useful to compare scores for more than three chatbots, and for different types of chatbot (e.g. health, financial, or booking systems).
- The validation was conducted using a relatively small number of participants (n=26)
What research has been published relating to the CUQ?
The CUQ was first used during chatbot usability tests conducted as part of a PhD at Ulster University in Northern Ireland.
Findings from these tests were presented at the 2019 European Conference on Cognitive Ergonomics.
Proceedings from this conference have now been published and the paper is available to download.
Please cite this paper when you use the CUQ as part of your own research or testing.
Samuel Holmes, Anne Moorhead, Raymond Bond, Huiru Zheng, Vivien Coates, and Michael McTear. 2019. Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces? In Proceedings of the 31st European Conference on Cognitive Ergonomics (ECCE 2019), Maurice Mulvenna and Raymond Bond (Eds.). ACM, New York, NY, USA, 207-214.
If you have any questions about the CUQ, please contact one of the following:
- Samuel Holmes (firstname.lastname@example.org)
- Raymond Bond (email@example.com)