GLM-4.7 vs Opus 4.5 vs GPT-5.2: One-Shot Build Test (Very Different Results)
Author: Snapper AI
Uploaded: 2026-01-13
Views: 1012
In this video, I run a controlled one-shot AI coding benchmark comparing GLM-4.7, Opus 4.5, and GPT-5.2 on the same F1 Dashboard PRD, using the same Cursor agent environment and identical constraints.
Each model is given a single prompt, no follow-ups, and no human edits.
The builds are evaluated using blind reviews, a structured rubric, and dev-mode validation to understand how each model behaves under the same workflow — not just which one “wins”.
This isn’t about absolute capability.
It’s about model behavior under constraints, and what that means for real-world AI coding workflows.
⏱️ TIMESTAMPS
00:00 Introduction & test goal
00:48 PRD walkthrough & prompt overview
01:53 Test setup, constraints & scoring rubric
03:32 Full score matrix (overall results)
04:22 Tech stack choices & why they matter
05:19 Build review breakdown
07:04 Why Opus and GPT-5.2 score so differently
07:53 Build time & estimated cost comparison
09:05 GLM-4.7 dashboard — dev mode validation
10:02 Opus 4.5 dashboard — dev mode validation
11:11 GPT-5.2 dashboard — dev mode validation
11:45 Key takeaways & workflow optimization
🔍 WHAT THIS TEST SHOWS
◆ How different models define “quality” under identical constraints
◆ Why some models prioritize correctness, while others favor completeness or polish
◆ How tech stack decisions influence review outcomes
◆ Why workflow composition matters more than picking a single “best” model
🧪 IMPORTANT CONTEXT
This benchmark is based on a single-agent Cursor environment with one-shot builds and no iteration.
In different setups — such as Claude Code with extended prompting, task decomposition, or longer refinement loops — these models may behave very differently.
This video is a snapshot under controlled conditions, not a statement about theoretical maximum performance.
💬 WHAT SHOULD I TEST NEXT?
If you have ideas for:
◆ Different constraints (TDD, acceptance tests, iteration loops)
◆ Other models or environments
◆ Specific build types or PRDs
Drop them in the comments!
▶️ WATCH NEXT
→ How the Creator of Claude Code Sets Up His Workflow (Full Setup Tutorial)
→ Claude Code Advanced Workflow Tutorial (Slash Commands, Subagents & Hooks)
→ GPT-5.2-Codex vs Opus 4.5 — Tetris Build Test (Specialist vs Generalist)
🔔 SUBSCRIBE
Subscribe for real-world AI coding benchmarks, workflow breakdowns, and hands-on tooling reviews.
🌐 Newsletter & free templates: https://snapperai.io
🐦 Updates on X: https://x.com/SnapperAI
🧑‍💻 Follow me on GitHub: https://github.com/snapper-ai