Progress 06/01/23 to 05/31/24
Outputs Target Audience: The target audiences for this project are producers, animal scientists, academia, and animal industry professionals. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Dr. Lalman has hired a postdoctoral fellow, who is working with the sensors to be used in this research. How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? We are starting the animal experiments in October. In addition, we will start working on objective 2.
Impacts What was accomplished under these goals?
We started working on objective 1a. Development of algorithms for multimodal data is in progress. Multimodal data, such as sensor, video, and audio data collected from farm animals, provide a rich representation of the various factors that affect stress levels in animals. However, while these data provide a comprehensive view of an animal's behavior, they can contain noisy observations and, at times, irrelevant information that does not contribute to inferring the animal's stress level. As a first step in AI-based livestock monitoring, we worked on two aspects of multimodal livestock activity monitoring and evaluated them on publicly available datasets. First, we explored how time series data, such as sensor data (ECG, EEG, IMU, etc.), are temporally structured and how these structures can be exploited to learn robust representations using deep learning frameworks. Second, we explored how individual behaviors evolve within a social dynamic, i.e., how livestock behavior changes as animals move and interact with other individuals in a group setting. This will allow us to model stress levels as a function of social experience. We briefly describe the two works below.

In the first work, we addressed the problem of Social Activity Recognition (SAR), a critical component of real-world tasks such as livestock activity surveillance and stress monitoring. This work is currently under review at an international conference focused on machine learning and pattern recognition and provides one of the first approaches to tackle SAR in an unsupervised manner from streaming videos, i.e., without any labeled data and without storing the data locally or in the cloud. Unlike traditional event understanding approaches, SAR requires modeling individual actors' appearance and motion and contextualizing them within their social interactions. Traditional action localization methods fall short because of their single-actor, single-action assumption, and previous SAR research has relied heavily on densely annotated data, whose use in real-world settings is limited by privacy concerns. We therefore propose a self-supervised approach based on multi-actor predictive learning for SAR in streaming videos. Using a visual-semantic graph structure, we model social interactions, enabling relational reasoning for robust performance with minimal labeled data. This work makes three specific contributions toward action understanding from videos. First, we are the first to tackle the problem of self-supervised social activity detection in streaming videos. Second, we show that relational reasoning over the proposed visual-semantic graph structure, through spatial and temporal graph smoothing, can learn the social structure of cluttered scenes in a self-supervised manner while requiring only a single pass through the training data. Third, we show that the framework generalizes to arbitrary action localization without bells and whistles. The proposed framework achieves competitive performance on standard human group activity recognition benchmarks, and evaluation on three publicly available human action localization benchmarks demonstrates its generalizability to arbitrary action localization.
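To illustrate the flavor of the spatial and temporal graph-smoothing step described above, the minimal Python sketch below applies a generic form of smoothing to per-actor feature vectors; the function names (affinity, smooth_spatial, smooth_temporal), the Gaussian-kernel affinity, and the exponential temporal averaging are illustrative assumptions for this sketch only and do not reproduce the model under review.

```python
# Illustrative sketch only: generic spatial/temporal smoothing over per-actor
# features; all names and choices here are hypothetical, not the method under review.
import numpy as np

def affinity(features, sigma=1.0):
    """Pairwise actor-to-actor affinity (Gaussian kernel on feature distance)."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def smooth_spatial(features, alpha=0.5, steps=2):
    """Within one frame, pull each actor's representation toward its neighbors'."""
    A = affinity(features)
    D_inv = 1.0 / np.clip(A.sum(1, keepdims=True), 1e-8, None)
    X = features.copy()
    for _ in range(steps):
        X = alpha * X + (1 - alpha) * (D_inv * A) @ X
    return X

def smooth_temporal(frame_features, beta=0.7):
    """Exponential smoothing of per-actor features across frames (same actors tracked)."""
    out, prev = [], None
    for X in frame_features:
        prev = X if prev is None else beta * prev + (1 - beta) * X
        out.append(prev)
    return out

# Toy usage: 5 actors with 16-dimensional features over 10 frames.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(5, 16)) for _ in range(10)]
smoothed = smooth_temporal([smooth_spatial(X) for X in frames])
```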
In the second work, we tackle the problem of time series classification, i.e., the task of categorizing sequential data. This work is also under review at an international conference focused on machine learning and pattern recognition. Analyzing sequential data is crucial for deriving actionable outcomes from data collected under the Internet of Things paradigm. Machine learning approaches demonstrate remarkable performance on public benchmark datasets; however, progress has primarily been in designing architectures that learn representations from raw data at fixed (or ideal) time scales, which can fail to generalize to longer sequences. This work introduces a compositional representation learning approach trained on statistically coherent components extracted from sequential data. Based on a multi-scale change space, an unsupervised approach is proposed to segment the sequential data into chunks with similar statistical properties. A sequence-based encoder model is then trained in a multi-task setting to learn compositional representations from these temporal components for time series classification. This work makes four specific contributions to multimodal understanding. First, we are, to the best of our knowledge, the first to introduce a multi-scale change space for time series data to segment it into statistically atomic components. Second, we introduce the notion of compositional feature learning from temporally segmented components in time series data rather than modeling the raw data points. Third, we show that the temporal components detected by the algorithm are highly correlated with natural boundaries in time series data: on the time series segmentation task, the approach achieves state-of-the-art performance among non-learning-based approaches. Finally, we establish a strong baseline that is competitive with state-of-the-art approaches on benchmark datasets for both time series classification and segmentation, with limited training requirements and without explicit handcrafting. We demonstrate its effectiveness through extensive experiments on publicly available time series classification benchmarks, and evaluation of the coherence of the segmented components shows competitive performance on the unsupervised segmentation task.
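For illustration only, the Python sketch below shows one generic way to segment a series into statistically coherent chunks using change scores computed at several window sizes, and to summarize each chunk; the names (change_score, segment, chunk_features) and the simple mean/standard-deviation statistics are assumptions made for this sketch and do not reproduce the method under review.

```python
# Illustrative sketch only: a simple multi-scale change-point heuristic in the
# spirit of the segmentation step described above; all names are hypothetical.
import numpy as np

def change_score(x, window):
    """Score each index by the shift in mean and std between the windows before and after it."""
    scores = np.zeros(len(x))
    for t in range(window, len(x) - window):
        left, right = x[t - window:t], x[t:t + window]
        scores[t] = abs(left.mean() - right.mean()) + abs(left.std() - right.std())
    return scores

def segment(x, windows=(20, 50, 100), threshold=1.0, min_gap=50):
    """Average change scores across scales and keep well-separated peaks as boundaries."""
    total = sum(change_score(x, w) for w in windows) / len(windows)
    boundaries = [0]
    for t in np.argsort(-total):  # strongest candidates first
        if total[t] > threshold and all(abs(t - b) >= min_gap for b in boundaries):
            boundaries.append(int(t))
    boundaries = sorted(boundaries) + [len(x)]
    return list(zip(boundaries[:-1], boundaries[1:]))

def chunk_features(x, spans):
    """Per-chunk summary statistics, standing in for the learned compositional encoder."""
    return np.array([[x[a:b].mean(), x[a:b].std(), b - a] for a, b in spans])

# Toy usage: a signal whose mean shifts twice.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 300), rng.normal(-2, 1, 300)])
spans = segment(x)
feats = chunk_features(x, spans)
```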
Publications