AI Visual Assistant: Build Multimodal (Image + Text) App with Python & Gemini 2.0 Flash Model
Author: Sandip's Technology Channel
Uploaded: 25 Mar 2025
Views: 567
In this project, an AI Visual Assistant, a multimodal (image + text) app, is built with Python (using the PIL, pyautogui, pygetwindow and streamlit libraries, among others) and Google's Gemini 2.0 Flash model (with a free API key).

pyautogui is a Python library that automates mouse and keyboard actions on a computer; in this application it is used to take screenshots. pygetwindow is a Python module for interacting with and managing application windows: it can list, manipulate and resize open windows programmatically.

In the app, the user can either upload an image or capture a screenshot of any window on their system by clicking the "Capture Screenshot Image" button. The only requirement is that the target window must be the last window visited before clicking that button. As soon as the image is uploaded or the screenshot is taken, it is displayed in the app. The user then writes a question about the image in the Query text box and clicks the "Analyze Image" button. The AI Visual Assistant analyzes the image content with the Google Gemini 2.0 Flash model (a multimodal LLM) and answers the query.
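The capture-then-ask flow described above can be sketched roughly as follows. This is not the author's code: it assumes that pressing Alt+Tab (via `pyautogui.hotkey`) returns focus to the previously visited window, and that the `google-generativeai` SDK is called with the model name `"gemini-2.0-flash"` and an API key in the `GOOGLE_API_KEY` environment variable. The helper names `clamp_region`, `build_prompt`, `capture_last_window` and `analyze_image`, and the prompt wording, are hypothetical. Third-party imports sit inside the functions so the pure helpers can be tried without a display or network access.

```python
import os
import time
from typing import Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height)


def clamp_region(left: int, top: int, width: int, height: int,
                 screen_w: int, screen_h: int) -> Region:
    """Clip a window rectangle to the visible screen.

    Maximized windows on Windows can report slightly negative coordinates
    (e.g. -8), so the rectangle is trimmed to keep the screenshot on screen.
    """
    x0, y0 = max(left, 0), max(top, 0)
    x1 = min(left + width, screen_w)
    y1 = min(top + height, screen_h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("window is entirely off screen")
    return (x0, y0, x1 - x0, y1 - y0)


def build_prompt(query: str) -> str:
    """Wrap the user's query in a short instruction (hypothetical wording)."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")
    return f"You are a visual assistant. Answer using the image.\nQuestion: {query}"


def capture_last_window():
    """Switch to the previously visited window and screenshot only that window."""
    import pyautogui
    import pygetwindow as gw

    pyautogui.hotkey("alt", "tab")  # assumption: focuses the last visited window
    time.sleep(0.5)                 # give the OS time to bring it to the front
    win = gw.getActiveWindow()
    if win is None:
        raise RuntimeError("no active window found")
    screen_w, screen_h = pyautogui.size()
    region = clamp_region(win.left, win.top, win.width, win.height,
                          screen_w, screen_h)
    return pyautogui.screenshot(region=region)  # a PIL.Image.Image


def analyze_image(image, query: str) -> str:
    """Send the image and the query to Gemini 2.0 Flash; return the text answer."""
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content([build_prompt(query), image])
    return response.text
```

In a Streamlit front end, `capture_last_window()` would sit behind `st.button("Capture Screenshot Image")` and `analyze_image()` behind `st.button("Analyze Image")`, with `st.file_uploader` plus `PIL.Image.open` as the upload path and `st.image` to show the picture. Because pyautogui acts on the machine running the Streamlit server, the screenshot route only makes sense when the app runs locally.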
GitHub Link: https://github.com/dharsandip/ai_visu...
LinkedIn: / sandip-dhar-40145546
#multimodalai #gemini2 #pyautogui #streamlitlibrary #aiapplication #gemini2flashmodel #multimodal #googlegeminimodel #python #automationwithpython
