A Typology of Potential Information Needs in Collaborative Interactions
Sosuke Shiga (AY 2016)
Spoken dialogue systems have boomed in recent years (Kawahara, 2013). A 2016 report showed that 20% of queries in apps and on Android are voice-based searches (Greg, 2016), which suggests that spoken dialogue systems are becoming increasingly important in information retrieval. To date, however, spoken dialogue systems have handled only predefined questions and commands, and have not dealt well with ambiguously expressed information needs. In the field of information retrieval, meanwhile, Taylor (1968) proposed a model that classifies information needs into four levels of indefiniteness. We reasoned that examining how this model behaves on real data would contribute to the realization of a spoken information retrieval system that can handle ambiguously expressed information needs.
In this study, three RQs were formulated: RQ1: How can information needs in dialogue be categorized? RQ2: What characteristics do the categorized information needs have? RQ3: What characteristics are useful for identifying the categorized information needs? For RQ1, starting from Taylor's model, we applied Jarvelin et al.'s (1995) model to build a model of interactive information needs, defining 10 categories grouped under ambiguous and explicit information needs. We then used crowdsourcing to annotate these 10 categories on a corpus of 32,950 sentences from a paired travel planning task. For RQ2, the analysis showed that information needs were present in about 16% of the sentences in the dialogues and tended to decrease as the task progressed. The same decreasing trend was observed for both ambiguous and explicit information needs. The analysis of state transition probabilities suggested that the transition from ambiguous to explicit information needs was not direct, but most likely passed through other interactions. For RQ3, the results indicated that, in addition to semantic features, time-series, linguistic, statistical, and interactive features are also useful for identifying information needs. Future prospects include the automatic generation and expansion of queries from dialogues that contain ambiguous information needs.
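The state transition analysis mentioned above can be illustrated with a minimal sketch. It assumes each dialogue is reduced to a sequence of per-sentence labels and estimates first-order transition probabilities by counting adjacent pairs. The label names ("ambiguous", "explicit", "other"), the function name, and the toy sequences are hypothetical and are not taken from the study's corpus or its actual procedure.

```python
from collections import Counter, defaultdict


def transition_probabilities(dialogues):
    """Estimate first-order transition probabilities between sentence labels.

    `dialogues` is a list of label sequences, one sequence per dialogue.
    Returns a nested dict: probs[previous_label][next_label] -> probability.
    """
    counts = defaultdict(Counter)
    for labels in dialogues:
        # Count each adjacent (previous, next) label pair within a dialogue.
        for prev, nxt in zip(labels, labels[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }


# Toy label sequences (made up for illustration, not real annotations).
dialogues = [
    ["ambiguous", "other", "explicit", "other"],
    ["ambiguous", "other", "other", "explicit"],
]
probs = transition_probabilities(dialogues)

# Probability of moving directly from an ambiguous to an explicit need;
# a missing entry means the transition was never observed.
direct = probs["ambiguous"].get("explicit", 0.0)
```

In this toy data every ambiguous sentence is followed by an "other" sentence, so the estimated probability of a direct ambiguous-to-explicit transition is zero, which mirrors the kind of pattern the abstract describes.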
(Translated by DeepL)