Learning from Visual Experience

Visual learning has made significant progress on single-subject, iconic images, but learning useful representations from long-form egocentric videos remains challenging. These videos provide a naturalistic, embodied sensory experience, offering spatiotemporal grounding in the real world. Leveraging multimodality, object motion, temporal structure, and consistency can improve performance and data efficiency, yet learning is also hindered by continual distribution shift. To address these challenges, we have developed methods for incremental recognition in open-world environments, unsupervised continual representation learning, and video representation learning. Our vision is to build efficient and adaptable algorithms for on-device visual learning from streaming embodied experience, with the learned representations serving downstream tasks such as planning and visual assistance.

Research in This Area


Replay Can Provably Increase Forgetting

We provide a theoretical analysis of sample replay in over-parameterized continual linear regression, showing that replay can provably increase forgetting in the worst case, even though the model has the capacity to memorize all tasks.

Published: 2025-06-04

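The setting is concrete enough to sketch. Below is a minimal numpy illustration (our construction, not the paper's) of over-parameterized continual linear regression: each task is fit exactly with a minimum-norm update, and forgetting is the task-1 loss after training on task 2, with and without replayed task-1 samples. On random instances replay usually helps; the paper's result is that adversarial task sequences exist where it provably hurts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5                       # over-parameterized: dimension > samples per task
w_star = rng.normal(size=d)        # shared ground-truth regressor (realizable case)

def make_task(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_star

X1, y1 = make_task(n)
X2, y2 = make_task(n)

def fit_from(w0, X, y):
    """Minimum-norm interpolating update: the solution gradient descent
    initialized at w0 converges to, i.e. the projection of w0 onto {w: Xw=y}."""
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)

def task1_loss(w):
    return np.mean((X1 @ w - y1) ** 2)

w1 = fit_from(np.zeros(d), X1, y1)           # train on task 1 (zero task-1 loss)
w_seq = fit_from(w1, X2, y2)                 # then task 2, no replay

k = 2                                        # replay k stored task-1 samples
X2r = np.vstack([X2, X1[:k]])
y2r = np.concatenate([y2, y1[:k]])
w_rep = fit_from(w1, X2r, y2r)               # task 2 with replay

print("forgetting, no replay:  ", task1_loss(w_seq))
print("forgetting, with replay:", task1_loss(w_rep))
```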

Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos

Memory Storyboard groups recent past frames into temporal segments, providing an effective summarization of the past visual stream for memory replay.

Published: 2025-01-21

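A minimal sketch of the storyboard idea, under simplifying assumptions: segment boundaries are detected here with a feature-distance threshold and each segment is summarized by its middle frame, neither of which is the paper's exact segmentation or summarization rule.

```python
import numpy as np

class StoryboardBuffer:
    """Toy storyboard: split the incoming stream into temporal segments by a
    feature-change threshold and keep one representative frame per segment
    for replay. Threshold rule and middle-frame summary are illustrative."""

    def __init__(self, threshold, capacity=100):
        self.threshold = threshold
        self.capacity = capacity
        self.segments = []           # one summary feature per closed segment
        self.current = []            # frames of the still-open segment

    def add(self, feat):
        if self.current and np.linalg.norm(feat - self.current[0]) > self.threshold:
            # scene changed: close the segment, keep its middle frame
            self.segments.append(self.current[len(self.current) // 2])
            self.segments = self.segments[-self.capacity:]
            self.current = []
        self.current.append(feat)

    def replay_batch(self, k, rng):
        if not self.segments:
            return []
        idx = rng.choice(len(self.segments), size=min(k, len(self.segments)),
                         replace=False)
        return [self.segments[i] for i in idx]

rng = np.random.default_rng(0)
scenes = np.repeat(rng.normal(size=(10, 8)), 30, axis=0)   # 10 scenes x 30 frames
buf = StoryboardBuffer(threshold=2.0)
for f in scenes + 0.05 * rng.normal(size=scenes.shape):    # within-scene jitter
    buf.add(f)
print(len(buf.segments), "closed segments;", len(buf.replay_batch(4, rng)), "replayed")
```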

PooDLe: Pooled and Dense Self-Supervised Learning from Naturalistic Videos

We propose PooDLe, a self-supervised learning method that combines an invariance-based objective on pooled representations with a dense SSL objective that enforces equivariance to optical flow warping.

Published: 2024-08-20

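The two-objective structure can be sketched in PyTorch. The loss forms below (negative cosine for the pooled term, MSE between flow-warped and target feature maps for the dense term) are simplified stand-ins, not PooDLe's exact formulation.

```python
import torch
import torch.nn.functional as F

def pooled_invariance(z1, z2):
    """Invariance term on globally pooled embeddings (negative cosine)."""
    return -F.cosine_similarity(z1, z2, dim=-1).mean()

def dense_equivariance(f1, f2, flow):
    """Dense term: features of frame 1, warped by optical flow, should match
    features of frame 2 at corresponding locations."""
    B, C, H, W = f1.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)
    # flow holds per-pixel (dx, dy) in pixels; convert to normalized coords
    norm_flow = torch.stack([flow[:, 0] * 2 / (W - 1),
                             flow[:, 1] * 2 / (H - 1)], dim=-1)
    warped = F.grid_sample(f1, grid + norm_flow, align_corners=True)
    return F.mse_loss(warped, f2)

# random tensors standing in for backbone feature maps of two frames
B, C, H, W = 2, 16, 8, 8
f1, f2 = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
flow = torch.zeros(B, 2, H, W)                  # zero flow for the toy example
z1, z2 = f1.mean(dim=(2, 3)), f2.mean(dim=(2, 3))
loss = pooled_invariance(z1, z2) + dense_equivariance(f1, f2, flow)
```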

Integrating Present and Past in Unsupervised Continual Learning

We formulate Osiris, a unifying framework for unsupervised continual learning (UCL) that disentangles learning objectives for stability, plasticity, and cross-task consolidation.

Published: 2024-04-29

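One way to picture the disentanglement is as three separate loss terms over current-task and memory embeddings. The specific objectives below (negative cosine for invariance, a repulsion term for consolidation) and the unit weighting are illustrative assumptions, not the Osiris objectives themselves.

```python
import torch
import torch.nn.functional as F

def invariance(z1, z2):
    """View-invariance term (negative cosine between two views' embeddings);
    a simplified stand-in for the actual SSL objectives used in Osiris."""
    return -F.cosine_similarity(z1, z2, dim=-1).mean()

def cross_task(z_cur, z_mem, tau=0.1):
    """Toy consolidation term: repel current-task embeddings from memory
    embeddings so representations stay discriminative across tasks."""
    sim = F.normalize(z_cur, dim=-1) @ F.normalize(z_mem, dim=-1).t() / tau
    return torch.logsumexp(sim, dim=1).mean()

# embeddings standing in for encoder outputs on two augmented views each
B, M, D = 8, 16, 32
zc1, zc2 = torch.randn(B, D), torch.randn(B, D)   # current-task batch
zm1, zm2 = torch.randn(M, D), torch.randn(M, D)   # replayed memory samples

loss = (invariance(zc1, zc2)      # plasticity: learn the present
        + invariance(zm1, zm2)    # stability: preserve the past
        + cross_task(zc1, zm1))   # consolidation: relate present to past
```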

Self-Supervised Learning of Video Representations from a Child's Perspective

We train self-supervised video models on longitudinal, egocentric headcam recordings collected from a child over a two-year period of early development.

Published: 2024-02-01


LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

LifelongMemory is a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval.

Published: 2023-12-07

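The overall pipeline shape (caption clips into a textual memory log, retrieve candidates for a query, let an LLM answer from that evidence) can be sketched briefly. `caption_model` and `llm` below are assumed callables, and the word-overlap retriever is a toy stand-in, not the paper's components.

```python
from dataclasses import dataclass

@dataclass
class ClipEntry:
    start_s: float
    end_s: float
    caption: str

def build_memory(clips, caption_model):
    """Turn a long egocentric video into a searchable textual log.
    Assumes clips expose start_s/end_s and caption_model returns a string."""
    return [ClipEntry(c.start_s, c.end_s, caption_model(c)) for c in clips]

def answer(query, memory, llm, k=10):
    """Retrieve the k captions with the most query-word overlap (toy
    retriever), then ask the LLM to localize and answer from that evidence."""
    q = set(query.lower().split())
    scored = sorted(memory,
                    key=lambda e: -len(q & set(e.caption.lower().split())))
    context = "\n".join(f"[{e.start_s:.0f}-{e.end_s:.0f}s] {e.caption}"
                        for e in scored[:k])
    prompt = (f"Video log:\n{context}\n\nQuestion: {query}\n"
              "Answer with the relevant time interval and a short response.")
    return llm(prompt)
```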