
Assigning and Classifying Activity and Situation Labels in Lifelog Images

Iwao Yamagata (AY 2017)

In recent years, the market for wearable devices has been expanding. Wearable devices are worn on the body and use various built-in sensors to collect information. The practice of using such devices to collect, store, and organize data about one's daily activities and to build an archive of personal data is called lifelogging. As the wearable device market grows, the number of people engaged in lifelogging is expected to continue to increase.

One of the main use cases of lifelogging is looking back on the past. Previous studies have shown that, among the five senses, visual information is particularly useful for such retrospection. This study therefore focused on lifelog images collected with wearable cameras.

To use the collected lifelog images for looking back, labels that meet lifeloggers' needs, such as daily activities and activity locations (situations), must be assigned to the images. However, there has been little research on assigning labels to lifelog images at a relatively abstract level such as activity and situation. In this study, we therefore manually assigned a total of 16 labels (11 activity labels and 5 situation labels), defined from lifeloggers' needs, to approximately 19,000 images (from 2 lifeloggers, covering a total of 3 weeks) in the dataset provided by the NTCIR-13 Lifelog Task. The main objectives of the research were 1) to clarify annotation issues through the manual assignment of the 16 labels, and 2) to measure classification accuracy by machine learning on the annotated image dataset and to identify which features are effective for classification.
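To illustrate the kind of record this annotation produces, the sketch below shows one possible way to store the activity and situation labels assigned to a single image. This is not the study's actual data format: only the label names that appear elsewhere in this abstract are listed, and the image id is hypothetical.

```python
# A minimal sketch of one annotated lifelog image record (not the study's
# actual format). Only label names mentioned in this abstract are listed;
# the remaining labels of the 16-label vocabulary are omitted.
from dataclasses import dataclass, field

ACTIVITY_LABELS = {
    "Eating Food", "Drinking Drinks", "Socializing/Casual Conversation",
    "Reading a Book/Paper", "Watching (TV)", "Walking",
    "Preparing Meals", "Other Sports/Exercise",
    # ... remaining activity labels (11 in total)
}
SITUATION_LABELS = {
    "Street", "Traveling",
    # ... remaining situation labels (5 in total)
}

@dataclass
class AnnotatedImage:
    image_id: str
    activities: set = field(default_factory=set)  # subset of ACTIVITY_LABELS
    situations: set = field(default_factory=set)  # subset of SITUATION_LABELS

example = AnnotatedImage(
    image_id="user1_20160815_081530",  # hypothetical id
    activities={"Walking"},
    situations={"Street"},
)
print(example)
```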

The annotation results showed that seven labels (Eating Food, Drinking Drinks, Socializing/Casual Conversation, Reading a Book/Paper, Watching (TV), Walking, and Street) could be annotated more accurately than the other labels, while six labels (Drinking Drinks, Socializing/Casual Conversation, Reading a Book/Paper, Watching (TV), Walking, and Street) were more difficult to annotate than the others. The classification experiments showed that Other Sports/Exercise, Street, and Traveling were relatively easy to classify, while Preparing Meals and Reading a Book/Paper were more difficult. The feature analysis showed that, although image object features are effective for classification overall, there are labels for which sensor data is useful and labels whose accuracy can be improved by combining the two.
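A minimal sketch of the kind of feature comparison described above is shown below, assuming image-object features and sensor readings are already available as one numeric vector per image. The feature dimensions, the random-forest classifier, and the F1 metric are illustrative assumptions, not the study's actual setup; the random inputs stand in for real features.

```python
# Sketch: compare image-object features, sensor features, and their
# combination for one label, using synthetic data as a placeholder.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_images = 1000
img_obj_feats = rng.random((n_images, 80))    # e.g. detected-object scores per image (assumed)
sensor_feats = rng.random((n_images, 10))     # e.g. time of day, step count, location cues (assumed)
labels = rng.integers(0, 2, n_images)         # 1 = image carries the label (e.g. "Walking")

def evaluate(features, labels):
    """Train a simple classifier and report F1 on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.3, random_state=0, stratify=labels)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))

print("image objects only:", evaluate(img_obj_feats, labels))
print("sensors only:      ", evaluate(sensor_feats, labels))
print("combined:          ", evaluate(np.hstack([img_obj_feats, sensor_feats]), labels))
```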

These findings will be useful in constructing the training data needed for the future development of automatic lifelog labeling technology. Future work includes 1) evaluating classification accuracy with a larger number of image samples and 2) verifying the reproducibility of the label assignment rules developed in this study.

(Translated by DeepL)
