Event details
Apr 1
Multimodal AI - Video and temporal understanding
Vision-language models can process video and image sequences. They are also trained to ground their responses in time, allowing them to indicate when a particular event or shift occurs in a video. This session will explore the use of vision-language models for analyzing and interpreting moving images.
Image: Elise Racine & Digit / Woven Dialogues / Licensed by CC-BY 4.0
University programs and activities are open to all eligible participants without regard to identity or other protected characteristics. Sponsorship of an event does not constitute institutional endorsement of external speakers or views presented.
View physical accessibility information for campus buildings and find accessible routes using the Princeton Campus Map app.