ChatGPT 4.1 vs Gemini 2.5: analysis of both test results and actual usage
Author: AI News & Strategy Daily | Nate B Jones
Uploaded: Apr 15, 2025
Views: 9,367
OpenAI model report: https://openai.com/index/gpt-4-1/
My site: https://natebjones.com/
My links: https://linktr.ee/natebjones
My substack: https://natesnewsletter.substack.com/
Takeaways
1. 4.1 Isn’t State-of-the-Art: Despite the name and rollout, GPT-4.1 doesn’t beat the current leaders. Gemini 2.5 scores 64% on SWE-Bench, while GPT-4.1 sits at 55%. It’s OpenAI’s best public coding model to date—but still behind.
2. 4.5 Quietly Deprecated: OpenAI announced they’re sunsetting GPT-4.5, stating 4.1 is better. But 4.5 was only recently introduced as a research preview, and now it’s gone—an unusual and somewhat confusing move.
3. ChatGPT vs. API Gap: OpenAI continues to ship stronger models inside ChatGPT (like “Deep Research”) that are not available in the API. This creates a frustrating mismatch for developers trying to build with parity (see the sketch after this list).
4. OpenAI’s Ecosystem Strategy Is Narrowing: By keeping their best models in the app, OpenAI’s ecosystem strategy leans more toward consumer lock-in than enabling external infrastructure and innovation.
5. Gemini 2.5 Feels More Polished: In side-by-side usage, Gemini 2.5 isn’t just better on benchmarks; it feels better in practice. It’s more confident, cleaner, and gives a better experience in tools like IDEs.
6. 4.1 Is a Patch, Not a Leap: It’s a necessary release; GPT-4o wasn’t holding up, especially for devs. But this isn’t a game-changer. It fixes obvious flaws. It doesn’t push things forward meaningfully.
7. The Bigger Picture Is Competitive Pressure: For once, OpenAI isn’t leading. Gemini is ahead on core dev metrics, and Claude 3.5 is in the wings. The pressure is real, and 4.1 doesn’t change that.
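
As a concrete illustration of the API gap in takeaway 3, here is a minimal sketch of how a developer might check which models their key can actually reach. It assumes the official openai Python client and an OPENAI_API_KEY in the environment; the model IDs probed are illustrative, not a confirmed catalog of what OpenAI exposes.

    # Sketch: list the model IDs this API key can access, then probe a few
    # of the models discussed in the video. Assumes the official `openai`
    # Python client; the IDs below are illustrative, not confirmed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # models.list() returns every model available to this key.
    available = {model.id for model in client.models.list()}

    for name in ["gpt-4.1", "gpt-4.5-preview", "gpt-4o"]:
        status = "exposed via API" if name in available else "not exposed (ChatGPT-only or deprecated)"
        print(f"{name}: {status}")

Anything that comes back as not exposed, such as a ChatGPT-only feature like Deep Research, is exactly the parity mismatch described above.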
Quotes:
“We are used to thinking of OpenAI as always releasing a state-of-the-art model. It’s not.”
“It feels really weird to me to release 4.5 as a research preview and then yank it back.”
“4.1 was probably a necessary release, but it’s not a sufficient release.”
Summary:
GPT-4.1 dropped, and while it brings some improvements, it’s not a state-of-the-art model. It performs better than GPT-4o, but still trails behind Gemini 2.5, which leads on SWE-Bench and feels better in real-world usage. OpenAI has deprecated GPT-4.5, citing 4.1’s improvements, but continues to withhold some of its best models, like Deep Research, from API access. This raises questions about the company’s broader strategy and its commitment to supporting the developer ecosystem. 4.1 is a patch, not a breakthrough, and the competitive gap is starting to show.
Keywords:
OpenAI, GPT-4.1, GPT-4.5, Gemini 2.5, SWE-Bench, Deep Research, ChatGPT, API access, developer tools, artificial intelligence, AI ecosystem, Google, Claude 3, model deprecation, coding benchmarks, IDE integration, agent infrastructure, developer frustration, AI release strategy, language models, competition in AI
