We put GPT-5.5 to the test!

Two stories dominated this week. Anthropic and SpaceX announced a compute deal - the company Musk once said "hates Western civilisation" is now renting his servers. Don’t laugh.

SoftBank's earnings revealed that 98% of its Vision Fund gains came from a single bet: OpenAI, now valued at $852 billion. Nice for those who can invest in private companies...

-MV

What’s in store:

GPT-5.5 put to the test!
Your Weekly AI Roundup!
Poll of the week: What did AI ruin for you?
Get 22 agents you can deploy in ChatGPT right now!

Read Time: 5 minutes

AI TESTING

GPT-5.5 is great - but can it live up to Claude?

OpenAI recently shipped GPT-5.5, and the honest verdict after a day of real-world testing is: this is a meaningful upgrade, not a marketing increment. The gap between 5.3, 5.4, and now 5.5 is noticeable in a way that the previous iterations simply weren't.

But "meaningfully better than its predecessor" and "best model available" are two different claims, and only one of them holds up.

The agentic shift is real

The most important thing about 5.5 isn't any single capability. It's the orientation of the model. OpenAI is explicitly positioning this as an agentic workhorse: something designed to move across tasks, connect to tools, and keep working until a job is done rather than handing the baton back to you after every response.

The workflow agents platform that shipped alongside 5.5 is the clearest expression of this. These aren't glorified chat prompts. They're sequences of actions the model orchestrates on your behalf, pulling in data, operating software, and producing outputs without you babysitting the process.

The 5.5 model powering this is noticeably better at holding complex instructions across multiple steps, which is the actual bottleneck for agentic work.

For coding specifically, the thinking mode settings matter more than most reviews bother to explain. Standard thinking gets you fast results that sometimes miss the mark on complex builds.

Switching to extended thinking takes longer but fixes problems the model would otherwise get stuck on. If you're using 5.5 for anything non-trivial and you're on a Pro, Business, or Enterprise plan, extended thinking should be your default, not a fallback.

What it actually does well

In practical terms, the model is excellent at turning rough, disorganized inputs into polished outputs quickly. Feed it messy raw data and ask for an executive dashboard, and you get something genuinely presentable back.

Ask it to handle a stack of tasks from a single prompt and it'll work through a surprising number of them in sequence without needing hand-holding.

The design sensibility in its HTML and code outputs has also improved. There are still recurring layout quirks, particularly around spacing and text density in more complex UIs, but the baseline aesthetic is cleaner and more considered than previous versions.

For someone who needs a functional prototype or a shareable dashboard without a dedicated front-end developer, 5.5 is a serious tool.

Where it hits a ceiling is on the kind of complex, multi-file knowledge work where you need outputs in proper formats rather than everything wrapped inside a single HTML file.

Ask 5.5 to spin up a complete business-in-a-box and you'll get landing pages, financial projections, and slideshows that are genuinely impressive from one prompt. But they'll all arrive as HTML artefacts rather than actual PowerPoints, spreadsheets, or PDFs.

Claude Cowork handles that kind of structured file output better, and if your workflow depends on deliverables that need to live in specific formats, that gap matters.

Where the honest comparison lands

For most everyday AI tasks, 5.5 is absolutely capable of being your primary tool. The improvements in agentic capability, instruction-following, and design quality are real. If ChatGPT is already your daily driver, this update is worth taking seriously rather than dismissing as incremental.

For coding, Claude is still the more reliable first choice overall, even though OpenAI's Codex is closing the gap fast enough that a comparison in a few months could look different.

For deep knowledge work and complex file-based outputs, Claude Cowork still has the edge. And 5.5 is more expensive per token than 5.4, though OpenAI claims improved token efficiency offsets the price increase in practice. That's worth watching rather than taking on faith.

The most useful framing is this: 5.5 is the best version of ChatGPT that has ever shipped, and for agentic, multi-step work in particular, it's a genuine step toward AI that does things rather than just says things. That's worth paying attention to.

Thanks to the team at Futurepedia for the breakdown!

If you want to get the most out of GPT-5.5, get 22 agents you can deploy instantly!

Get The 22 Agents →

POLL OF THE WEEK

What did AI ruin for you?

Googling (too slow now)

Reading full articles (just summarise it)

My own writing voice (is this even me anymore)

Other - tell us!

Full results will be at the bottom of tomorrow’s newsletter!

THIS WEEK IN AI

Old enemies, new deals, and AirPods that are watching you now

Anthropic shook hands with Elon Musk. Apple put cameras in your ears. And Princeton lasted 133 years on the honour system before a browser tab ended the whole experiment. Genuinely unhinged week.

Here's what mattered:

Anthropic struck a compute deal with SpaceXAI, giving Claude users access to Musk's Memphis supercomputer — months after he called the company "misanthropic" and meant it as an insult.
Apple is finalising camera-equipped AirPods that let Siri see your surroundings in real time, suggest recipes from your fridge contents, and quietly raise every privacy concern you forgot to have about earbuds.
OpenAI launched three new voice API tools including GPT-Realtime-2, a live translation model supporting 70+ languages, and real-time transcription — officially making your next holiday less terrifying.
Researchers at EPFL developed Synthegy, an AI framework that lets chemists guide molecule-planning tools in plain English, matching expert judgement 71% of the time across nearly 400 evaluations.
SoftBank is expected to report a $1.5 billion quarterly profit largely thanks to its OpenAI stake — now worth roughly $80 billion — while analysts quietly compare the risk profile to WeWork.
AI and smart sensors are transforming woodworking, with blade-stopping safety tech, dust-free workshops, and robot microfactories that can build timber panels for a full home in a single day.
WhatsApp rolled out a private mode for Meta AI chats where nothing is stored or monitored — giving users medical and financial conversations with no paper trail, and cybersecurity experts one new headache.
Princeton voted to bring back supervised exams for the first time since 1893, after AI and open tabs made the 133-year-old honour system quietly impossible to enforce.
The UK Government's Sovereign AI Fund invested in Isomorphic Labs, the London-based drug discovery company founded by DeepMind's Demis Hassabis and built on the AlphaFold breakthrough.
Anthropic launched Claude for Small Business, connecting Claude directly to QuickBooks, PayPal, HubSpot, Canva, and more — with 15 ready-made workflows for the admin that usually piles up after hours.

AI Art

Our Image of the Week

Our favourite image this week was submitted by Dana: “A LEGO grim reaper”

Daily Image Prompt

Melancholic portrait of a human warrior from Lord of the Rings universe

Submit your artwork to Mindstream →

#1082
When will it end? Never!

Stop scrolling! Give us your feedback:

I loved it today. 😍 | It was okay today. 😐 | It wasn't good today. 🤬

Business as unusual: The Hustle

Grow your business: Starter Story

Expert insights: Marketing Against the Grain

Sell better: The Science of Scaling

View our Privacy Policy.

Update your email preferences or unsubscribe here

2 Canal Park
Boston, Massachusetts MA 02141, United States