Dear friends,
I’m thrilled to announce my latest course: Agentic AI! This course will get you up to speed building cutting-edge agentic workflows. It is available from DeepLearning.AI here. The only prerequisite is familiarity with Python, though knowing a bit about LLMs helps too.
More important, you’ll also learn best practices for building effective agents.
Having worked with many teams on many agents, I’ve found that the single biggest predictor of whether someone can build effectively is whether they know how to drive a disciplined process for evals and error analysis. Teams that don’t know how to do this can spend months tweaking agents with little progress to show for it. I’ve seen teams that spent months tuning prompts, building tools for an agent to use, etc., only to hit a performance ceiling they could not break through.
But if you understand how to put in evals and how to monitor an agent’s actions at each step (traces) to see when part of its workflow is breaking, you’ll be able to efficiently home in on which components to focus on improving. Instead of guessing what to work on, you'll let evals data guide you.
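The kind of error analysis described above can be as simple as tallying where an agent's workflow first breaks. A minimal sketch (the trace format and step names here are hypothetical, for illustration only):

```python
from collections import Counter

# Hypothetical traces: each records the first step of an agentic
# workflow that failed on a given task (None means the run succeeded).
traces = [
    {"task": "t1", "failed_step": "web_search"},
    {"task": "t2", "failed_step": None},
    {"task": "t3", "failed_step": "summarize"},
    {"task": "t4", "failed_step": "web_search"},
]

# Error analysis: tally failures by step to see which component
# to focus on improving, instead of guessing.
failures = Counter(t["failed_step"] for t in traces if t["failed_step"])
for step, count in failures.most_common():
    print(f"{step}: {count} failures")
```

Even a tally this crude turns "the agent isn't working" into "the search step fails twice as often as summarization," which tells you where to spend your next iteration.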
Keep building, Andrew
A MESSAGE FROM DEEPLEARNING.AI
AI Dev 25, hosted by Andrew Ng and DeepLearning.AI, heads to New York City! On November 14, join 1,200+ AI developers for a day full of technical keynotes, hands-on workshops, live demos, and a new Fintech Track. Secure your tickets here!
News
Claude Levels Up
Anthropic updated its mid-size Claude Sonnet model, making it the first member of the Claude family to reach version 4.5. It also enhanced the Claude Code agentic coding tool with long-desired features.
Claude Sonnet 4.5: The new model offers a substantial increase in performance as well as a variable budget for reasoning tokens.
Results: In Anthropic’s tests, Claude Sonnet 4.5’s coding metrics stood out, but it performed well on broader assessments, too.
Claude Code: Anthropic’s agentic coding tool got a design overhaul that adds a number of fresh capabilities. Notably, it comes with a software development kit — based on the same software infrastructure, toolkit, orchestration logic, and memory management that underpins Claude Code — for building other agentic tools.
Behind the news: Founded by ex-OpenAI employees, Anthropic markets itself as an alternative to that company: safer, more humane, and more tasteful. Although it hasn’t stopped touting those values, its emphases have grown simpler: coding and workplace productivity. While ChatGPT may be synonymous with AI among consumers, Anthropic is focusing on software developers and businesses.
Why it matters: The coupling of Claude Sonnet 4.5 with the enhanced Claude Code reflects Anthropic’s emphasis on workplace productivity. This focus speaks to some of the business world’s anxieties: When will AI pay off for my workforce? When will it transform what they do? For now, coding (via Claude Code or a competitor) is one obvious answer.
We’re thinking: The Claude Agent SDK is a significant release that will enable many developers to build powerful agentic apps. We look forward to an explosion of Claude-based progeny!
OpenAI, Meta Diversify AI Product Lines
OpenAI and Meta, which have been content to offer standalone chatbots or tuck them into existing products, introduced dueling social video networks and other initiatives designed to boost revenue and engagement.
What’s new: OpenAI’s Sora 2 is a TikTok-style app that lets users share 10-second clips, while Meta’s Vibes enables Facebook users to generate new videos or remix existing ones. In addition, OpenAI launched ChatGPT Pulse, which creates personal briefings based on recent chats and data from connected apps like calendars, and Instant Checkout, which allows ChatGPT users to shop as they chat.
How it works: The new initiatives take advantage of existing AI capabilities to boost engagement and bring in revenue.
Behind the news: For revenue, OpenAI so far has relied on chatbot subscriptions, which account for roughly 80 percent of its income. However, only a tiny fraction of ChatGPT’s 700 million weekly active users subscribe. Tactics such as imposing rate limits persuade some to sign up, but personal productivity, shopping commissions, and advertising offer ways to earn money from the rest.
Why it matters: Products based on generative AI are already well established, but they’re still in their infancy, and an infinite variety of AI-powered consumer products and services remains to be invented. OpenAI’s ChatGPT Pulse is a genuinely fresh idea, using agentic capabilities to deliver timely, personalized information and perspective in any domain. Both OpenAI and Meta are experimenting with social video, giving users new ways to entertain friends and express themselves. And, of course, melding large language models with digital commerce may come to feel natural as people increasingly turn to chatbots for purchasing advice.
We’re thinking: The financial success of such AI-driven products is bound to have a powerful impact on future directions of AI research and development.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered DeepSeek’s new sparse attention model that cuts long-context inference costs by up to 50% and OpenAI’s launch of third-party apps inside ChatGPT with its new Apps SDK. Subscribe today!
Qwen3 Goes Big (and Smaller)
Alibaba rounded out the Qwen3 family with its biggest large language model to date as well as smaller models that process text, images, video, and/or audio.
What’s new: The closed-weights Qwen3-Max gives Alibaba a foothold among the biggest large language models. Qwen3-VL-235B-A22B is an open-weights model that processes text, images, and video at the top of its size class and beyond. Qwen3-Omni, also open-weights, adds audio to the mix with outstanding results.
Qwen3-Max encompasses 1 trillion parameters trained on 36 trillion tokens. It’s available in base and instruction-tuned versions, with a reasoning version to come. Like Alibaba’s other Max models (but unlike most of the Qwen series), its weights are not available.
Qwen3-VL-235B-A22B, a vision-language variant of Qwen3-235B-A22B, is designed to drive agentic interactions that require understanding of images and videos. It comes in base, instruction-tuned, and reasoning versions.
Qwen3-Omni-30B-A3B was pretrained on text, images, video, and audio, so it translates between them directly. It comes in instruction-tuned and reasoning versions as well as a specialized audio/video captioner model.
Behind the news: Alibaba recently released Qwen3-Next, which accelerates performance by alternating attention and Gated DeltaNet layers. The new models don’t use this architecture, but it remains a potential path for future models in the Qwen family.
Why it matters: While Qwen3-Max falls short of competitors, the new open-weights multimodal models offer opportunities for developers. Qwen3-VL-235B-A22B offers low cost, versatility, and customizability, and Qwen3-Omni-30B-A3B provides a welcome option for voice applications. Alibaba has been a consistent, versatile experimenter that has put open releases first, and its new releases cover a wide range of needs.
We’re thinking: We love to see open-weights models turning in world-beating results! With their prowess in multimedia understanding, reasoning, and tool use, Qwen3-VL and Qwen3-Omni put a wide range of agentic applications within reach of all developers.
LoRA Adapters On Tap
The approach known as LoRA streamlines fine-tuning by training a small adapter that modifies a pretrained model’s weights at inference. Researchers built a model that generates such adapters directly.
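The core LoRA idea can be sketched in a few lines. Instead of updating a large d × d weight matrix W during fine-tuning, you train two low-rank factors B (d × r) and A (r × d) with r much smaller than d, and add their product to W at inference. The dimensions below are illustrative, not from any specific model:

```python
import numpy as np

# Minimal sketch of the LoRA idea: train a small low-rank adapter
# (B @ A) rather than the full weight matrix W. Shapes are illustrative.
d, r = 1024, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))        # frozen pretrained weights
A = rng.standard_normal((r, d)) * 0.01  # trained low-rank factor
B = np.zeros((d, r))                    # starts at zero, so W is unchanged at init

W_adapted = W + B @ A                   # adapter applied at inference

full_params = d * d                     # parameters to fine-tune the full matrix
lora_params = 2 * d * r                 # parameters in the adapter
print(f"full: {full_params:,}  lora: {lora_params:,}")
```

Here the adapter holds roughly 1.6 percent of the full matrix's parameters, which is why training (or, as below, generating) adapters is so much cheaper than fine-tuning the whole model.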
What’s new: Rujikorn Charakorn and colleagues at the Tokyo-based startup Sakana AI introduced Text-to-LoRA, a model that produces task-specific LoRA adapters based on natural language descriptions of tasks to be performed by a separate large language model.
Key insight: Typically, a LoRA adapter is trained for a particular task. However, a model can learn, given a description of a task, to generate a suitable adapter for tasks it may not have encountered in its training.
How it works: The authors trained a vanilla neural network, given text that describes a task, to produce a task-specific LoRA adapter for the large language model Mistral-7B-Instruct.
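Conceptually, such a hypernetwork maps a task-description embedding to the flattened parameters of a rank-r adapter. The sketch below is an assumption-laden illustration of that mapping, not the authors' exact architecture (layer sizes, the embedding step, and the single target matrix are all simplifications):

```python
import numpy as np

# Illustrative Text-to-LoRA-style hypernetwork: a small feed-forward
# net maps a task-description embedding to the flattened A and B
# factors of a rank-r LoRA adapter. All sizes are hypothetical.
d_model, r, d_embed, d_hidden = 256, 4, 64, 128
rng = np.random.default_rng(1)

W1 = rng.standard_normal((d_embed, d_hidden)) * 0.02
W2 = rng.standard_normal((d_hidden, 2 * d_model * r)) * 0.02

def generate_adapter(task_embedding):
    h = np.tanh(task_embedding @ W1)       # hidden layer
    flat = h @ W2                          # flattened adapter weights
    A = flat[: d_model * r].reshape(r, d_model)
    B = flat[d_model * r :].reshape(d_model, r)
    return A, B                            # low-rank update is B @ A

emb = rng.standard_normal(d_embed)         # stand-in for an embedded task description
A, B = generate_adapter(emb)
```

One forward pass through the hypernetwork replaces an entire fine-tuning run: change the task description, and you get a different adapter.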
Results: The authors evaluated Mistral-7B-Instruct with Text-to-LoRA on 10 reasoning benchmarks (such as BoolQ, Hellaswag, and WinoGrande). They compared the results to Mistral-7B-Instruct (i) with conventional task-specific adapters, (ii) with a single adapter trained on all 479 training tasks simultaneously, (iii) unadapted but with the task description prepended to the prompt, and (iv) unadapted but with a plain prompt.
Why it matters: The demands placed on a model often change over time, and training new LoRA adapters to match is cumbersome. In effect, Text-to-LoRA compresses a library of LoRA adapters into a parameter-efficient hypernetwork that generalizes to arbitrary tasks. Because it generates them based on text descriptions, different descriptive phrasing can produce different styles of adaptation to emphasize, say, reasoning, format, or other constraints. In this way, Text-to-LoRA makes it easy, quick, and inexpensive to produce new adapters for idiosyncratic or shifting tasks.
We’re thinking: Training LoRA adapters typically involves a tradeoff between specialization and generalization, and ensembles or mixtures of adapters can improve generalization. This approach offers an efficient, low-cost way to produce LoRA ensembles, which typically are expensive to train and maintain.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. To keep our newsletter out of your spam folder, add our email address to your contacts list.