Optimising Open Source LLM Deployment on Cloud Run
Author: PracticalGCP
Uploaded: 2025-02-20
Views: 541
🚀 Deep Dive: Ollama vs vLLM vs HuggingFace TGI – Performance Comparison for Open-Source LLMs on Google Cloud Run
I’ve just released a follow-up to my first video, “When Cloud Run Meets DeepSeek”! This new instalment is a detailed performance comparison of three deployment methods for open-source LLMs: Ollama, vLLM, and HuggingFace TGI. If you’re aiming for speed, concurrency, or cost-efficiency on Google Cloud Run, here’s a closer look!
🔑 Key Insights:
• Why Open-Source LLMs? Enjoy security, flexibility for fine-tuning, and cost control—excellent for enterprise scenarios.
• Why Cloud Run? Take advantage of serverless scaling (from 0 to 1,000 instances!), GPU support in preview, and scale-to-zero to keep costs down.
⚙️ Performance Deep Dive:
• Ollama: Straightforward to deploy and well-suited for moderate concurrency.
• vLLM: Excels at concise outputs, making it ideal for shorter or mid-length responses.
• HuggingFace TGI: Handles 60+ concurrent requests and outputs of 2,000+ tokens without degradation.
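As a rough sketch of what hitting a TGI service deployed on Cloud Run looks like, the snippet below builds a request body for TGI's `/generate` route. The service URL is hypothetical — substitute your own Cloud Run endpoint:

```python
import json

# Hypothetical Cloud Run endpoint; replace with your deployed service URL.
SERVICE_URL = "https://tgi-service-example.a.run.app"

def build_generate_payload(prompt: str, max_new_tokens: int = 256) -> dict:
    """Build the JSON body for TGI's /generate route."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

payload = build_generate_payload("Explain Cloud Run scale-to-zero in one sentence.")
print(json.dumps(payload))

# To actually send it (requires a live service and the `requests` package):
# import requests
# resp = requests.post(f"{SERVICE_URL}/generate", json=payload, timeout=60)
# print(resp.json()["generated_text"])
```

Load-testing tools in the video drive many of these requests concurrently to measure throughput and latency.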
✨ Distilled Models (e.g., DeepSeek R1-7B): Compact, cost-effective, and surprisingly powerful for niche use cases.
💷 Cost Analysis: Combining Cloud Run with TGI can bring costs down to roughly 2.6p per user-hour at scale.
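The headline figure follows from dividing hourly instance cost by concurrent users served. A quick sanity check — the £1.56/hour instance price below is an assumption for illustration, not a number from the video; only the 60-user concurrency comes from the TGI result above:

```python
# Assumed Cloud Run GPU instance price (hypothetical, for illustration only).
instance_cost_per_hour_gbp = 1.56
# Concurrency level TGI sustained in the benchmarks.
concurrent_users = 60

# Cost per user-hour, in pence.
pence_per_user_hour = instance_cost_per_hour_gbp * 100 / concurrent_users
print(f"{pence_per_user_hour:.1f}p per user-hour")  # -> 2.6p
```

Scale-to-zero means this rate only applies while instances are actually serving traffic, which is what makes the at-scale economics attractive.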
📈 Future Trends: Distilled models and innovations like NVIDIA’s Project Digits are leading to smaller, more efficient solutions with sharper performance.
⏱️ Jump to Key Sections:
• 01:17 - Why Open-Source LLMs Matter
• 03:04 - Why Cloud Run?
• 05:06 - Ollama vs vLLM vs HuggingFace TGI
• 07:26 - What’s a Distilled Model?
• 10:34 - Ollama Performance
• 12:36 - vLLM Performance
• 15:20 - TGI Performance
• 18:30 - Side-by-Side Comparison
• 22:37 - Cloud Run Cost Breakdown
• 23:46 - Live Demo
• 37:12 - The Future of Open-Source LLMs
👉 Watch the full video for GPU utilisation stats, latency benchmarks, and live demos. If you’re exploring LLM deployments or cloud optimisation, I’d love to hear your insights!
Source Code:
TGI: https://github.com/richardhe-fundamen...
vLLM: https://github.com/richardhe-fundamen...
Ollama: https://github.com/richardhe-fundamen...
#OpenSourceAI #LLM #GoogleCloud #CloudRun #AIOptimisation #TechInsights #MachineLearning #DeepSeek #DeepSeekR1
