Vibe Check: Opus 4.8—Anthropic Should’ve Rounded Up to 5

Opus 4.8 tops both our Senior Engineer benchmark and our writing tests. It’s the most complete model we’ve tested. We just wish it had an app to match.

Anthropic is back.

After a year of riding Claude Code into the rest of knowledge work, the lab hit a rough patch: Opus 4.7 was hard to love, and OpenAI’s Codex desktop app pulled even devoted Claude users on our team over to GPT models. Opus 4.8, out today, has us running back, for the model if not the app around it. It tops our Senior Engineer Benchmark and our writing tests at once, and it’s the first Anthropic release in a year we’d reach for across coding, prose, and everyday work.

The big insights from our testing:

Best on senior-engineer coding. At extra-high effort, Opus 4.8 scored 63 on our Senior Engineer Benchmark, versus 62 for GPT-5.5 and 33.5 for Opus 4.7. At lower effort settings, the score drops significantly.
The strongest writing model we’ve tested. Opus 4.8 at high effort scored 79.6, ahead of Sonnet 4.6 (74.5), GPT-5.5 (73), and Opus 4.7 (63), with fewer AI tells than any non-Claude model.
Best one-shot PowerPoint we’ve seen. On our Every Consulting Benchmark, Opus 4.8 produced a well-designed deck that told a clear story—something most models still can’t do.
The model is stronger than the app. Opus 4.8 is good enough to make us want to live in Claude, but the split between Chat, Code, and Cowork keeps Codex as the better daily harness.

The full Vibe Check has the benchmark results, Reach Test ratings, pricing, screenshots, and advice on when to reach for Opus 4.8 versus GPT-5.5.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn.

Katie Parrott is a staff writer at Every. You can read more of her work in her newsletter.
