Optimising Open Source LLM Deployment on Cloud Run
Author: PracticalGCP
Uploaded: 2025-02-20
Views: 541
🚀 Deep Dive: Ollama vs vLLM vs HuggingFace TGI – Performance Comparison for Open-Source LLMs on Google Cloud Run
I’ve just released a follow-up to my first video, “When Cloud Run Meets DeepSeek”! This new instalment is a detailed performance comparison of three deployment methods for open-source LLMs: Ollama, vLLM, and HuggingFace TGI. If you’re aiming for speed, concurrency, or cost-efficiency on Google Cloud Run, here’s a closer look!
🔑 Key Insights:
• Why Open-Source LLMs? Enjoy security, flexibility for fine-tuning, and cost control—excellent for enterprise scenarios.
• Why Cloud Run? Take advantage of serverless scaling (from 0 to 1,000 instances!), GPU support in preview, and scale-to-zero to keep costs down.
⚙️ Performance Deep Dive:
• Ollama: Straightforward to deploy and well-suited for moderate concurrency.
• vLLM: Excels at concise outputs, making it ideal for shorter or mid-length responses.
• HuggingFace TGI: Handles 60+ concurrent requests and outputs of 2,000+ tokens without degradation.
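As a rough sketch of what hitting a TGI service deployed on Cloud Run looks like, the snippet below builds a request body for TGI's `/generate` route. The service URL is hypothetical — substitute your own Cloud Run endpoint:

```python
import json

# Hypothetical Cloud Run endpoint; replace with your deployed service URL.
SERVICE_URL = "https://tgi-service-example.a.run.app"

def build_generate_payload(prompt: str, max_new_tokens: int = 256) -> dict:
    """Build the JSON body for TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

payload = build_generate_payload("Explain Cloud Run scale-to-zero in one sentence.")
print(json.dumps(payload))

# To actually send it (requires a live service and the `requests` package):
# import requests
# resp = requests.post(f"{SERVICE_URL}/generate", json=payload, timeout=60)
# print(resp.json()["generated_text"])
```

Load-testing tools in the video drive many of these requests concurrently to measure throughput and latency.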
✨ Distilled Models (e.g., DeepSeek R1-7B): Compact, cost-effective, and surprisingly powerful for niche use cases.
💷 Cost Analysis: Combining Cloud Run with TGI can bring costs down to roughly 2.6p per user-hour at scale.
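The headline figure follows from dividing hourly instance cost by concurrent users served. A quick sanity check — the £1.56/hour instance price below is an assumption for illustration, not a number from the video; only the 60-user concurrency comes from the TGI result above:

```python
# Assumed Cloud Run GPU instance price (hypothetical, for illustration only).
instance_cost_per_hour_gbp = 1.56
# Concurrency level TGI sustained in the benchmarks.
concurrent_users = 60

# Cost per user-hour, in pence.
pence_per_user_hour = instance_cost_per_hour_gbp * 100 / concurrent_users
print(f"{pence_per_user_hour:.1f}p per user-hour")  # -> 2.6p
```

Scale-to-zero means this rate only applies while instances are actually serving traffic, which is what makes the at-scale economics attractive.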
📈 Future Trends: Distilled models and innovations like NVIDIA’s Project Digits are leading to smaller, more efficient solutions with sharper performance.
⏱️ Jump to Key Sections:
• 01:17 - Why Open-Source LLMs Matter
• 03:04 - Why Cloud Run?
• 05:06 - Ollama vs vLLM vs HuggingFace TGI
• 07:26 - What’s a Distilled Model?
• 10:34 - Ollama Performance
• 12:36 - vLLM Performance
• 15:20 - TGI Performance
• 18:30 - Side-by-Side Comparison
• 22:37 - Cloud Run Cost Breakdown
• 23:46 - Live Demo
• 37:12 - The Future of Open-Source LLMs
👉 Watch the full video for GPU utilisation stats, latency benchmarks, and live demos. If you’re exploring LLM deployments or cloud optimisation, I’d love to hear your insights!
Source Code:
TGI: https://github.com/richardhe-fundamen...
vLLM: https://github.com/richardhe-fundamen...
Ollama: https://github.com/richardhe-fundamen...
#OpenSourceAI #LLM #GoogleCloud #CloudRun #AIOptimisation #TechInsights #MachineLearning #DeepSeek #DeepSeekR1
