Ishan Misra: General purpose visual recognition across modalities with limited supervision
Author: Learning with Limited and Imperfect Data
Uploaded: 2022-10-19
Views: 881
Abstract: Modern computer vision models are good at specialized tasks. Given the right architecture and the right supervision, supervised learning can yield great specialist models. However, specialist models also have severe limitations: they can only do what they are trained for, and they require copious amounts of pristine supervision to do it. In this talk, I'll focus on two limitations: specialist models cannot work on tasks beyond those they saw training labels for, or on new types of visual data. I'll present our recent efforts to design better architectures, training paradigms, and loss functions to address these issues.
Our first work, called Omnivore, presents a single model that can operate on images, videos, and single-view 3D data. Omnivore learns shared representations across visual modalities without using paired input data, and it can also be trained in a self-supervised manner. I'll conclude the talk with Detic, a simple way to train large-vocabulary detectors using image-level labels, which leads to a 20,000+ class detector.
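For illustration only, here is a minimal sketch, not Omnivore's actual architecture or code, of the general idea the abstract describes: each visual modality is turned into the same kind of patch tokens and processed by a single shared trunk, so all modalities land in one representation space. All class, parameter, and modality names below are hypothetical.

import torch
import torch.nn as nn

class ModalityAgnosticEncoder(nn.Module):
    """Hypothetical sketch: per-modality patch embeddings + one shared transformer trunk."""
    def __init__(self, embed_dim=256, num_layers=4, num_heads=8, patch=16):
        super().__init__()
        # Per-modality patch embeddings map each input type into a shared token space.
        self.image_embed = nn.Conv2d(3, embed_dim, kernel_size=patch, stride=patch)   # RGB images
        self.rgbd_embed = nn.Conv2d(4, embed_dim, kernel_size=patch, stride=patch)    # single-view 3D (RGB + depth channel)
        self.video_embed = nn.Conv3d(3, embed_dim, kernel_size=(2, patch, patch),
                                     stride=(2, patch, patch))                        # video clips (spatio-temporal patches)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers)                         # shared across modalities

    def forward(self, x, modality):
        if modality == "image":
            tokens = self.image_embed(x).flatten(2).transpose(1, 2)   # (B, N, D)
        elif modality == "rgbd":
            tokens = self.rgbd_embed(x).flatten(2).transpose(1, 2)
        elif modality == "video":
            tokens = self.video_embed(x).flatten(2).transpose(1, 2)
        else:
            raise ValueError(f"unknown modality: {modality}")
        # Mean-pool the tokens into one embedding; all modalities share this space.
        return self.trunk(tokens).mean(dim=1)

model = ModalityAgnosticEncoder()
img_feat = model(torch.randn(2, 3, 224, 224), "image")       # (2, 256)
vid_feat = model(torch.randn(2, 3, 8, 224, 224), "video")    # (2, 256)
rgbd_feat = model(torch.randn(2, 4, 224, 224), "rgbd")       # (2, 256)

Note that nothing here requires paired examples across modalities: each batch can come from a single modality, yet every batch updates the same shared trunk.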
Bio: Ishan Misra is a Research Scientist at FAIR, Meta AI, where he works on computer vision. His interests are primarily in learning visual representations with limited supervision, using self-supervised and weakly supervised learning. For his work in self-supervised learning, Ishan was featured in MIT Tech Review's list of 35 Innovators Under 35, compiled globally across all areas of technology. You can hear about his work at length on Lex Fridman's podcast.