Semantic Modelling of Document Focus Time for Information Retrieval Tasks

張 麗蓉 (completed in academic year 2021)

Effective modeling of a document’s temporal attributes is crucial for a range of applications such as temporal information retrieval, summarization (and clustering), and question answering. However, current approaches suffer from the sparsity of temporal expressions and entity names appearing in documents. To address this limitation, we proposed a new method that estimates the focus time of unstructured text using a sequence classification model. We also developed a new reranking method in which the application of document focus time is optimised for different temporal classes such as past, recency, and future. Finally, we introduced a new way of calculating query trend time, using Google Trends to extract the period in which the query was popular.
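To make the reranking idea concrete, the following is a minimal sketch and not the thesis's actual formulation. It assumes a linear interpolation between a textual relevance score and a temporal score derived from the gap between a document's estimated focus time and the query time, with a weight chosen per temporal class. The names `Doc`, `temporal_score`, `rerank`, the weight values, the decay function, and the "atemporal" class are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-class weights on the temporal score.
# The values are illustrative only and are not taken from the thesis.
TEMPORAL_WEIGHT = {"past": 0.5, "recency": 0.3, "future": 0.4, "atemporal": 0.0}


@dataclass
class Doc:
    doc_id: str
    text_score: float  # textual relevance from a first-stage ranker, assumed in [0, 1]
    focus_year: float  # estimated focus time (assumed here to be a single year)


def temporal_score(focus_year: float, query_year: float, scale: float = 5.0) -> float:
    """Return 1.0 when focus time equals query time, decaying toward 0 as they diverge."""
    return 1.0 / (1.0 + abs(focus_year - query_year) / scale)


def rerank(docs: list[Doc], query_year: float, temporal_class: str) -> list[Doc]:
    """Sort documents by a class-weighted mix of textual and temporal scores."""
    w = TEMPORAL_WEIGHT[temporal_class]

    def combined(d: Doc) -> float:
        return (1.0 - w) * d.text_score + w * temporal_score(d.focus_year, query_year)

    return sorted(docs, key=combined, reverse=True)


docs = [Doc("a", 0.8, 1990), Doc("b", 0.7, 2012)]
past_order = [d.doc_id for d in rerank(docs, query_year=2012, temporal_class="past")]
atemporal_order = [d.doc_id for d in rerank(docs, query_year=2012, temporal_class="atemporal")]
```

In this sketch, a document whose focus time matches the query time can overtake a document with a slightly higher textual score when the temporal class gives weight to time, while a weight of zero falls back to the purely textual order. The query time passed in could be either the time the query was issued or a trend time derived from query popularity.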

Evaluation with the NTCIR Temporalia test collections demonstrates that the proposed method outperforms the baseline approaches in different retrieval tasks, and that it works effectively even when documents contain few temporal expressions and entity names. The effectiveness of the calculated focus time in estimating temporal relevance was also comparable to that of the article’s own publication time. This suggests that our focus time method can be used as an alternative or supplement to the publication date. It can help retrieve articles that either have no publication date or whose publication date does not match the actual content. The results further demonstrate that semantic information can be used to predict the temporal tendency of documents. The significance of this study is that it overcomes the lack of entity names or temporal expressions, on which traditional techniques for calculating document focus time commonly rely. For the temporal retrieval task, we find that the trend time works better than the issue time when calculating the temporal relevance between query and document. In future work, we plan to apply learning-to-rank models to adjust the weight of textual relevance. We also intend to incorporate entity-based techniques to increase prediction accuracy by feeding the models with entities and their relations.

