Dear friends,

I hope we can empower everyone to build with AI. Starting from K-12, we should teach every student AI enabled coding, since this will enable them to become more productive and more empowered adults. But there is a huge shortage of computer science (CS) teachers. I recently spoke with high school basketball coach Kyle Creasy, who graduated with a B.A. in Physical Education in 2023. Until two years ago, he had never written a line of Python. Now — with help from AI — he not only writes code, he also teaches CS. I found Kyle’s story inspiring as a model for scaling up CS education in the primary- and secondary-school levels.

Kyle’s success has been with the support of Kira Learning (an AI Fund portfolio company), whose founders Andrea Pasinetti and Jagriti Agrawal have created a compelling vision for CS education. In K-12 classrooms, teachers play a huge social-emotional support role, for example, encouraging students and helping them when they stumble. In addition, they are expected to be subject-matter experts who can deliver the content needed for their subject. Kira Learning uses digital content delivery — educational videos, autograded quizzes, and AI-enabled chatbots to answer students' questions but without giving away homework answers — so the teacher can focus on social-emotional support. While these are still early days, it appears to be working!

A key to making this possible is the hyperpersonalization that is now possible with AI (in contrast to the older idea of the flipped classroom, which had limited adoption). For example, when assigned a problem in an online coding environment, if a student writes this buggy line of Python code

best_$alty_snack = 'potato chips'

Kira Learning’s AI system can spot the problem and directly tell the teacher that $ is an invalid character in a variable name. It can also suggest a specific question for the teacher to ask the student to help get them unstuck, like “Can you identify what characters are allowed in variable names?” Whereas AI can directly deliver personalized advice to students, the fact that it is now helping teachers also deliver personalized support will really help in K-12.

Basketball 3-point shooting chart with AI-generated student portraits and stats for volume and accuracy.

Additionally, agentic workflows can automate a lot of teachers’ repetitive tasks. For example, when designing a curriculum, it’s time-consuming to align the content to educational standards (such as the Common Core in the United States, or the AP CS standard for many CS classes). Having an AI system carry out tasks like these is already proving helpful for teachers.

Since learning to code, Kyle has built many pieces of software. He proudly showed me an analysis he generated in matplotlib of his basketball players’ attempts to shoot three-pointers (shown above), which in turn is affecting the team’s strategy on the court. One lesson is clear: When a basketball coach learns to code, they become a better basketball coach!

I talked about Kyle (and other topics) at the ASU+GSV Summit on education. You can see a video here.

In the future, people who know how to code and build with AI will be much more productive than people who don’t. I’m excited about how AI will lead to new models for K-12 education. By delivering CS education to everyone, I hope that in the future, everyone will be able to build with AI.

Keep learning!

Andrew

A MESSAGE FROM DEEPLEARNING.AI

Promo banner for: "LLMs as Operating Systems: Agent Memory"

In “LLMs as Operating Systems: Agent Memory,” you’ll learn to build agents that manage their own memory using the MemGPT approach. This newly updated short course includes cloud-based deployment and real-time, step-by-step output, so you can see how your agents reason as they respond. Join in today!

News

GIF showing GPT Image 1 generating AI images: emotions, surreal scenes, satire, fantasy, and photo-realistic edits.

New Image Generator for OpenAI API

ChatGPT’s image generator is available via API.

What’s new: GPT Image 1, which produces images from text or other images, has proven enormously popular among ChatGPT users. The OpenAI Images API enables developers to incorporate OpenAI’s most sophisticated image generator into their own software tools and platforms.

Input/output: Text and images in, images out
Architecture: Autoregressive (details undisclosed)
Performance: Currently tops Artificial Analysis’ Image Arena leaderboard.
Price: $5 per 1 million tokens of text input, $10 per 1 million tokens of image input, $40 per 1 million tokens of image output (roughly $0.02, $0.07, and $0.19 per generated image for low, medium, and high-quality square images, respectively)
Undisclosed: Architecture details, parameter count, training data, training methods

How it works: GPT Image 1 generates and modifies images in a wide range of styles, performs image editing and other alterations, renders text, and follows detailed instructions. Shortly after its debut, the version of GPT-4o equipped with GPT Image 1 quickly soared to the No. 1 spot on the Artificial Analysis Image Arena leaderboard.

The model employs an autoregressive design rather than the more typical diffusion architecture (like Open AI’s DALL·E 3), using generated parts of an image to predict the next part.
Its pricing structure differs from rivals, charging by input/output tokens rather than per image generated.
The model’s output is watermarked unobtrusively with C2PA data that identifies it as AI-generated.
The model may struggle to process non-English text, small type, rotated type, varying colors and styles, counting, and localization in space such as positions of pieces on a game board.

Behind the news: In March, OpenAI attracted huge public interest when it deployed the model, then unnamed, in ChatGPT. Within the first week, 130 million users used it to create more than 700 million images.

Why it matters: Adding GPT Image 1 to the API enables developers to use OpenAI’s most sophisticated image generator in a wide variety of automated workflows. OpenAI’s initial API partners include design companies (Adobe and Canva), marketers (HubSpot), and web designers (GoDaddy), all of which are using GPT Image 1.

We’re thinking: GPT Image 1 is part of an exciting trend toward unification of multimodal architectures. Researchers have progressed from text-in, text-out to text/images-in, text-out and increasingly text/images/audio-in, text/images/audio-out. This paints a beautiful picture of where multimodal models can go!

AI music generation interface showing waveform and text prompts like deep house, djembe, and saxophone.

Music Generation for Pros

Google refreshed its experimental tools for composers and producers.

What’s new: Google announced updates of two music-generation apps and the models they're based on. Music AI Sandbox, an app that generates and modifies music according to text prompts, now accepts lyrics to generate songs as well as instrumental music. You can join a waitlist here. MusicFX DJ generates a continuous stream of music that users can modify as it plays. Try it out here.

How it works: The apps generate 48kHz audio suitable for professional productions. Users can specify key, tempo in beats per minute, instrumentation, style, mood, and other details.

Music AI Sandbox is based on the updated Lyria 2 music generator. It lets users generate new clips, roughly 30 seconds long, according to prompts. Users can enter lyrics, extend existing clips, and rearrange segments with generated transitions, introductions, and endings.
MusicFX DJ, which is based on a different model called Lyria RealTime, lets users control streaming music via prompts and other settings. Users can change or combine genres, add or subtract instruments, change key, and speed up or slow down without interrupting the stream.

Behind the news: Google launched Lyria 1 and Music AI Sandbox in 2023 as part of an experiment with YouTube, which made them available to composers, producers, and musicians. Since then, the company has developed them with help from music stars including Jacob Collier, Donald “Childish Gambino” Glover, and Wyclef Jean. Lyria 1 recently became available via the Vertex API to developers who are preapproved by Google.

Why it matters: While music generators like Suno and Udio appeal to casual musicians, Music AI Sandbox, with its digital audio workstation-style user interface, aims to address the needs of professionals. This approach puts AI directly into the hands of talented, experienced artists, similar to the way Adobe has empowered videographers and Runway has partnered with movie producers.

We’re thinking: API access to Lyria 2 would be music to our ears!

Shoppers in a grocery store aisle selecting boxed AI assistant devices from shelves, smiling and examining the products.

Learn More About AI With Data Points!

AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six very brief news stories. This week, we covered Zhipu AI’s new open models challenging DeepSeek and how Rowboat’s IDE makes building multi-agent systems easier. Subscribe today!

CB Insights AI 100 2025 infographic showing top AI startups across sectors like healthcare, robotics, and infrastructure.

Up-and-Coming Startups

AI agents and infrastructure made a strong showing on CB Insights’s latest list of the top 100 AI startups.

What’s new: CB Insights, which tracks tech startups and venture capital, selected companies in the AI 100 based on their market traction, talent, finances, and partnerships. The list purports to highlight the next wave of winners, shedding light on the key executives, investors, fundraising, and valuations behind up-and-coming AI ventures.

How it works: The analysts evaluated 17,000 early-stage, private AI companies that had raised funds within the last year and continue to seek further investment.

CB Insights evaluated the startups according to its own Mosaic Score, a proprietary system designed to assess the health and growth potential of private companies. The score takes into account a startup’s market momentum (traction and growth rate), market size, financial health, and management team.
The analysts divided their choices into three broad categories: (i) horizontal (providing business products or services common to multiple industries), (ii) vertical (serving a single industry or business function), or (iii) providers of AI hardware or software infrastructure.
They further divided the horizontal companies by business function (customer service, cybersecurity, software development, and so on), the vertical companies into industries (healthcare, automotive, aerospace, manufacturing, finance, energy, and the like), and the infrastructure providers into segments (hardware, monitoring, data, and development and training).

Where the action is: This year’s AI 100 companies are based in 14 countries, around two-thirds of them in the United States. 10 are based in the United Kingdom, five in France, and four in Germany, with one each in India (Bioptimus), Norway (Braintrust), Singapore (Bria), Spain (Cartwheel), Sweden (Chainguard), and Switzerland (Clarium).

More than 20 percent of this year’s AI 100 build AI agents or support them, including Texas-based Apptronik (valued at $423 million) and Canada’s 1X ($134 million, the second-most highly valued agent specialist).
The report also notes the rapid growth of companies that monitor AI performance and reliability, such as California-based Arize (valued at $131 million) and the French startup Bioptimus ($76 million).
Opportunity may be rising for AI companies that cater to specific industries. This year, the vertical companies pulled in the most total funding, just over $1 billion. These included the Texas aerospace specialist Saronic (valued at $4 billion) and the California software development and training provider Together.AI ($3.3 billion).
The AI infrastructure category raised the second-highest total funding, a leading indicator of need for infrastructure as businesses take advantage of the technology. Infrastructure companies on the list were led by Munich’s defense startup Helsing (valued at $5.37 billion), California robot maker Figure ($2.77 billion) and Washington-state cybersecurity provider Chainguard ($1.12 billion).

Why it matters: This year’s AI 100 offers a snapshot of AI becoming more central to businesses of all kinds. Most of the startups listed here offer practical products and services that are poised to deliver a timely return, rather than moonshots with long development cycles and risky payoffs. In addition, they mostly target corporate customers rather than consumers.

We’re thinking: The falling cost of access to AI models and increasingly capable open-weights models make this the perfect time to build applications. What kind? The report singles out health care (8 companies) and life sciences (6 companies) as growing areas, but it also documents opportunities in defense, gaming, and finance.

Diagram of LLM-based preference approximation and multimodal sequential recommendation for personalized product suggestions.

Inferring Customer Preferences

Large language models can improve systems that recommend items to purchase by inferring customer preferences.

What’s new: Fabian Paischer and colleagues at Johannes Kepler University Linz, University of Wisconsin, and Meta introduced Multimodal Preference Discerner (Mender), a recommender that integrates a large language model (LLM).

Key insight: Text that attracts customers, such as product descriptions, and text they write, such as product reviews, may contain information that indicates their preferences, such as the craft projects that required a particular power tool. But it also may include irrelevant information, such as a complaint that the tool was delivered late, which can throw recommendation systems off track. An LLM can derive preferences from text, providing a clearer signal of what a customer wants.

How it works: Mender comprises an LLM (Llama 3 70B-Instruct), an encoder (Flan-T5 pretrained on a wide variety of text and frozen) that embeds customer data, and a decoder (a transformer trained from scratch) that predicts the next item a customer will buy. The system learned to predict the next item based on descriptions of items a customer purchased, the customer’s ratings and reviews of those products (drawn from datasets of Steam reviews of video games and Amazon reviews of items related to beauty, toys-and-games, and sports-and-outdoors), and customer preferences inferred by the LLM from the foregoing data.

The authors started with a list of products a given customer had purchased and reviewed. Given an item’s description and all reviews up to that point, the LLM inferred five customer preferences in the form of instructions such as, “Look for products with vibrant, bold colors.”
The authors built a dataset in which each example included a sequence of items a customer had purchased and on inferred preference that matched the next purchase. To choose the matching preference, they separately embedded all prior preferences and item descriptions using a pretrained Sentence-T5 embedding model. They chose the preference whose embedding was most similar to that of the next purchase.
The encoder embedded the list of purchases and the selected preference. Given the embeddings, the decoder learned to predict the next purchase.

Results: The authors compared Mender to TIGER, a recommender that also takes a purchase history and predicts the next purchase, on the Steam and Amazon datasets. They scored the results using recall @5, a measure of how often the correct item is within the model’s top five most likely predictions.

Mender produced the best recommendations for all datasets.
On Steam, TIGER was close. Mender achieved 16.8 percent recall @5, while TIGER achieved 16.3 percent.
The difference was most pronounced on the Amazon toys-and-games dataset. Mender achieved 5.3 percent recall @5, while TIGER achieved 3.75 percentrecall @5.

Why it matters: Drawing inferences from text information like customer reviews and item descriptions boosts a recommender’s signal, making it clearer what a given customer is likely to want. Previous systems used customer reviews or item descriptions directly; Mender uses customer preferences extracted from that information.

We’re thinking: Be on the lookout for innovative ways to use LLMs. We recommend it!

Work With Andrew Ng

Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.

Subscribe and view previous issues here.

Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.