Bringing Multi-Modal LLMs to Autonomous Driving
Author: Ghost Autonomy
Uploaded: 2023-11-08
Large Language Models (LLMs) are advancing rapidly and expanding into new applications almost daily, disrupting established computing architectures across industries. At Ghost we believe that LLMs will have a profound impact on the autonomy software stack, and the addition of multi-modal capabilities to LLMs (accepting image and video inputs alongside text) only accelerates their applicability to the autonomy use case.
Multi-modal LLMs (MLLMs) have the potential to reason about driving scenes holistically, combining perception and planning to give autonomous vehicles deeper scene understanding and guidance on the correct driving maneuver by considering the scene in its totality. In this video we share examples of out-of-the-box commercial MLLMs analyzing driving scenes to provide scene understanding and maneuver guidance.
While traditional in-car AI systems can identify objects they have been trained on or read signs with OCR, MLLMs can go beyond detection and reason about appropriate actions and outcomes from the totality of information in the scene.
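To make this concrete, below is a minimal sketch of how one might query an off-the-shelf commercial MLLM with a single camera frame and a text prompt to get back a scene description and a suggested maneuver. This is not Ghost's implementation; the model name, prompt wording, and use of the OpenAI Python SDK are illustrative assumptions.

```python
import base64
from openai import OpenAI  # assumes the `openai` Python SDK (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_driving_scene(image_path: str) -> str:
    """Send one camera frame to a vision-capable MLLM and ask for scene
    understanding plus a recommended maneuver (illustrative sketch only)."""
    with open(image_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable chat model would do
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "You are assisting an autonomous driving system. "
                            "Describe this driving scene, note any hazards, "
                            "and recommend the appropriate next maneuver."
                        ),
                    },
                    {
                        # The frame is passed inline as a base64 data URL.
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
                    },
                ],
            }
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content

# Example usage (hypothetical frame from a front-facing camera):
# print(describe_driving_scene("front_camera_frame.jpg"))
```

The key point of the sketch is that the model receives the whole scene at once, so its answer can combine perception ("a cyclist is merging from the bike lane") with planning ("slow and yield before changing lanes") in a single response, rather than emitting isolated detections.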
Ghost’s autonomy platform is evolving quickly to include capabilities powered by large multi-modal vision-language models. If you are an automaker who would like to explore adding MLLM-based reasoning and intelligence to your ADAS or AV system, we’d love to collaborate.
Disclaimer: Ghost Autonomy’s MLLM-based capabilities are currently in development. These video and image examples show MLLM-based analysis of driving scenes captured from Ghost vehicles driving in both autonomous and conventional mode. MLLM-based reasoning is not yet being returned to the car to impact actual driving maneuvers.