Comparison of Result Presentation Methods in Spoken Dialogue Search - Focusing on Surrogate Elements
Koyo Nagano (AY 2018)
In recent years, the market for smart speakers has expanded. Smart speakers are attracting attention as devices that enable conversational information retrieval using natural language. A system that enables such conversational information retrieval is called a conversational search system, and one that communicates with the user by voice alone is called a conversational voice search system. As devices equipped with conversational voice search systems proliferate, the number of people using voice for information retrieval is expected to grow.
Previous research has shown that the method of presenting search results is an important element of an information retrieval system and affects the user's retrieval performance. In this study, we therefore focused on the method of presenting search results in spoken dialogue search.
The presentation of search engine result pages (SERPs) using text and images has been studied in a variety of fields, and knowledge has accumulated about the elements that make up search results and how they are arranged. However, there has been little research on presenting search results by speech. The goal of this study was therefore to clarify effective methods for presenting search results in spoken dialogue search systems. In particular, we focused on a surrogate presentation method that uses three representative elements of a retrieved Web page, namely the title, the URL, and the snippet (a summary of the main text), which are the main components of SERPs, and investigated in laboratory experiments how the order in which these three elements are presented affects the accuracy of the user's match judgments and the time taken to reach them. The experiment was conducted with 24 students at the University of Tsukuba. The system for reading search results aloud was built with Amazon Alexa and Amazon Echo, and the search topics describing the situational settings for voice search were created with reference to a dataset provided by NTCIR.
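The abstract does not include implementation details of the Alexa-based system. As a rough illustration only, the sketch below shows how a surrogate consisting of a title, URL, and snippet might be concatenated in a configurable order for speech output, here as SSML such as an Alexa skill could return. All names (Surrogate, build_spoken_surrogate), the spoken labels, and the pause length are assumptions for the sketch, not details taken from the study's system.

```python
# Minimal sketch: reading a search-result surrogate aloud in a configurable
# element order. Hypothetical code, not the study's actual implementation.
from dataclasses import dataclass

@dataclass
class Surrogate:
    title: str
    url: str
    snippet: str

def build_spoken_surrogate(result: Surrogate, order: list[str]) -> str:
    """Concatenate the three surrogate elements in the given order,
    separated by short SSML pauses, for text-to-speech output."""
    parts = {
        "title": f"Title: {result.title}",
        "url": f"Address: {result.url}",
        "snippet": f"Summary: {result.snippet}",
    }
    body = ' <break time="500ms"/> '.join(parts[element] for element in order)
    return f"<speak>{body}</speak>"

if __name__ == "__main__":
    result = Surrogate(
        title="University of Tsukuba - Admissions",
        url="www example com",  # a URL would likely be simplified for speech
        snippet="Information on undergraduate and graduate admissions.",
    )
    # One of the six possible presentation orders; here the title is read last.
    print(build_spoken_surrogate(result, ["snippet", "url", "title"]))
```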
The main result of the experiment was that when the title was the last element output, the accuracy of the match judgment was higher than in the other conditions. Since accuracy was highest in the condition where the title was presumably the hardest element to refer to, this result can be interpreted as indicating that the title contributes little to judgment accuracy. It suggests that the function of a title, namely to express the content of a document concisely at its beginning, may not work effectively in speech output, where the ways of referring back to surrogate information are more restricted than in text output. On the other hand, participants stated in the post-experiment questionnaire that they used the combination of title and snippet as a factor in their match judgments. We therefore conclude that the title in search result presentation for a spoken dialogue search system should be changed to a format better suited to spoken output.
We believe that the results of this study will be useful for designing search result presentation methods that improve the performance of spoken dialogue search systems. A remaining task is to use the recordings made during the experiment to objectively clarify which surrogate elements participants actually used in their match judgments.
(Translated by DeepL)