
UW ECE Research Colloquium: June 5, 2025 - Alexander Waibel, Carnegie Mellon University

Author: UWECEmedia

Uploaded: 2025-06-06

Views: 141

Description:

AI, back when we were not allowed to be deep, neural, or learning

Abstract

Motivated by a dream of building systems that could interpret speech in any language and provide international communication, I started my career at a time when human interfaces and speech, language, and vision processing were primitive, rule-based, and heuristic. Programming linguistic rules into machines was fun and adequate for building the first successful speech synthesis systems (my first “AI” project as a student at MIT in 1978). But this approach to AI proved woefully inadequate for handling the real-world ambiguities that successful machine perception, translation, and other important AI tasks must confront. How could we possibly encode all the facts and knowledge in the world by introspection and programming? At Carnegie Mellon (where I did my PhD on speech recognition), I was naturally drawn toward early machine learning. Perceptrons, HMMs, stochastic models, and other methods offered solutions, but they were still static classifiers that had to be trained on carefully labeled data and fed explicit knowledge. Neural nets, and particularly backpropagation, were simple, yet could learn complex non-linear classifiers. They also offered the fascinating ability to develop hidden knowledge as part of their training. But they were still static pattern classifiers and had to be trained on well-labeled, pre-segmented data, a requirement I knew was unrealistic and problematic, as segmentation and sequencing were problems in themselves. To make neural nets practical for speech and vision, we needed independence from segmentation; we needed shift-invariance and sequencing. While at ATR in Japan, I set out to develop a shift-invariant neural network, which we called the Time-Delay Neural Network. It was surprisingly successful: the TDNN delivered great performance, classified patterns shift-invariantly (without segmentation), and, as a by-product, learned acoustic-phonetic features that researchers had previously attempted to discover by introspection and program laboriously into AI systems. The first “convolutional neural network” was born.
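To make the shift-invariance idea concrete, here is a minimal sketch of a TDNN-style classifier, written in PyTorch purely as an illustration (the library, layer sizes, and three-class output are assumptions for this example, not the original 1987 design). The same small weight kernels are applied at every time delay, so the network produces per-frame class evidence wherever a pattern occurs, and the output integrates that evidence over time, removing the need for pre-segmented input:

    import torch
    import torch.nn as nn

    class TDNN(nn.Module):
        """TDNN-style classifier: 1-D convolutions over time, integrated at the output."""
        def __init__(self, n_features=16, n_classes=3):
            super().__init__()
            self.net = nn.Sequential(
                # Conv1d ties the same weights to every time shift: shift-invariance.
                nn.Conv1d(n_features, 8, kernel_size=3),   # 3-frame time-delay window
                nn.Tanh(),
                nn.Conv1d(8, n_classes, kernel_size=5),    # wider temporal context
                nn.Tanh(),
            )

        def forward(self, x):              # x: (batch, n_features, time)
            evidence = self.net(x)         # per-time-step class evidence
            return evidence.mean(dim=-1)   # integrate over time: no segmentation needed

    model = TDNN()
    frames = torch.randn(1, 16, 40)        # e.g. 40 frames of 16 filterbank features
    print(model(frames).shape)             # torch.Size([1, 3])

Because the weights are shared across time, a phonetic event is detected identically whether it occurs early or late in the input window, which is exactly the property that later became the hallmark of convolutional networks.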

In 1987, however, despite our early excitement, TDNNs, a.k.a. “CNNs,” did not find broad adoption for practical AI. Alternative approaches (e.g., HMMs), given appropriate tricks and design, could offer equivalent performance at much lower computational cost, and thus NNs were broadly derided by the research community during the 1990s as something akin to a cult. Still, the NNs' benefit of learning implicit knowledge automatically and merging it with such hidden knowledge from other tasks kept us going, and we proposed early NN-based large-vocabulary speech recognizers, face recognition and tracking, lipreading, handwriting recognition, multimodal fusion, cross-modal repair, machine translators, and many more. They led us to develop practical, successful AI systems and to build more than 10 startups.

In this talk, I will review our early neural systems, early insights, and lessons learned for science in a practical world. I will also discuss our current research and the way forward.

Bio

Alexander Waibel is Professor of Computer Science at Carnegie Mellon University (USA) and at the Karlsruhe Institute of Technology (Germany). He is the director of the International Center for Advanced Communication Technologies. Waibel is known for his work in AI, machine learning, multimodal interfaces, and speech translation systems. He developed the first consecutive and simultaneous speech translation systems, in 1991 and 2005, respectively. Waibel proposed early neural network learning methods, including the TDNN, the first shift-invariant (“convolutional”) neural net (1987), and many multimodal interaction systems. Waibel founded or co-founded more than 10 startups, including Jibbigo, the first speech translator on a phone (acquired by Facebook in 2013), and Kites, a simultaneous translation service (acquired by Zoom in 2021). Waibel is a member of the National Academy of Sciences of Germany, a Fellow of the IEEE and of ISCA, and a Research Fellow at Zoom. He holds BS, MS, and PhD degrees from MIT and CMU.


Related videos

Accelerating Scientific Discovery with AI - lecture by Sir Demis Hassabis

LLM and GPT: how do large language models work? A visual introduction to transformers

Lytle Lecture 2025-2026: Anima Anandkumar, Caltech

AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference

How are microchips made? 🖥️🛠️ Stages of processor manufacturing

Did God create DNA? The latest scientific data on its structure, and how information works for living organisms

MIPT: how do they teach GENIUSES? A full-length film

Why do prime numbers form these spirals? | Dirichlet's theorem and pi approximations

Refraction and the “slowing” of light | Based on a lecture by Richard Feynman

UWEE Research Colloquium: April 5, 2016 - Reid Harrison, Intan Technologies

Prince of Persia: dissecting the code of a brilliant game while wiping away tears of joy

How to cure myopia, hyperopia, astigmatism, and strabismus WITHOUT surgery. Prof. Zhdanov's exercises

How LLMs can store facts | Chapter 7, Deep Learning

What's the future for generative AI? - The Turing Lectures with Mike Wooldridge

The Turing Lectures: The future of generative AI

UW ECE Research Colloquium: May 29, 2025 - Les Atlas, UW Electrical & Computer Engineering

RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

Why are “Transformers” replacing CNNs?

A brief explanation of large language models

Scientists have rewritten the history of cat domestication
