Dear friends,
I hope we can empower everyone to build with AI. Starting from K-12, we should teach every student AI enabled coding, since this will enable them to become more productive and more empowered adults. But there is a huge shortage of computer science (CS) teachers. I recently spoke with high school basketball coach Kyle Creasy, who graduated with a B.A. in Physical Education in 2023. Until two years ago, he had never written a line of Python. Now — with help from AI — he not only writes code, he also teaches CS. I found Kyle’s story inspiring as a model for scaling up CS education in the primary- and secondary-school levels.
Additionally, agentic workflows can automate a lot of teachers’ repetitive tasks. For example, when designing a curriculum, it’s time-consuming to align the content to educational standards (such as the Common Core in the United States, or the AP CS standard for many CS classes). Having an AI system carry out tasks like these is already proving helpful for teachers.
Keep learning! Andrew
A MESSAGE FROM DEEPLEARNING.AI
In “LLMs as Operating Systems: Agent Memory,” you’ll learn to build agents that manage their own memory using the MemGPT approach. This newly updated short course includes cloud-based deployment and real-time, step-by-step output, so you can see how your agents reason as they respond. Join in today!
News
New Image Generator for OpenAI API
ChatGPT’s image generator is available via API.
What’s new: GPT Image 1, which produces images from text or other images, has proven enormously popular among ChatGPT users. The OpenAI Images API enables developers to incorporate OpenAI’s most sophisticated image generator into their own software tools and platforms.
How it works: GPT Image 1 generates and modifies images in a wide range of styles, performs image editing and other alterations, renders text, and follows detailed instructions. Shortly after its debut, the version of GPT-4o equipped with GPT Image 1 quickly soared to the No. 1 spot on the Artificial Analysis Image Arena leaderboard.
Behind the news: In March, OpenAI attracted huge public interest when it deployed the model, then unnamed, in ChatGPT. Within the first week, 130 million users used it to create more than 700 million images.
Why it matters: Adding GPT Image 1 to the API enables developers to use OpenAI’s most sophisticated image generator in a wide variety of automated workflows. OpenAI’s initial API partners include design companies (Adobe and Canva), marketers (HubSpot), and web designers (GoDaddy), all of which are using GPT Image 1.
We’re thinking: GPT Image 1 is part of an exciting trend toward unification of multimodal architectures. Researchers have progressed from text-in, text-out to text/images-in, text-out and increasingly text/images/audio-in, text/images/audio-out. This paints a beautiful picture of where multimodal models can go!
Music Generation for Pros
Google refreshed its experimental tools for composers and producers.
What’s new: Google announced updates of two music-generation apps and the models they're based on. Music AI Sandbox, an app that generates and modifies music according to text prompts, now accepts lyrics to generate songs as well as instrumental music. You can join a waitlist here. MusicFX DJ generates a continuous stream of music that users can modify as it plays. Try it out here.
How it works: The apps generate 48kHz audio suitable for professional productions. Users can specify key, tempo in beats per minute, instrumentation, style, mood, and other details.
Behind the news: Google launched Lyria 1 and Music AI Sandbox in 2023 as part of an experiment with YouTube, which made them available to composers, producers, and musicians. Since then, the company has developed them with help from music stars including Jacob Collier, Donald “Childish Gambino” Glover, and Wyclef Jean. Lyria 1 recently became available via the Vertex API to developers who are preapproved by Google.
Why it matters: While music generators like Suno and Udio appeal to casual musicians, Music AI Sandbox, with its digital audio workstation-style user interface, aims to address the needs of professionals. This approach puts AI directly into the hands of talented, experienced artists, similar to the way Adobe has empowered videographers and Runway has partnered with movie producers.
We’re thinking: API access to Lyria 2 would be music to our ears!
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six very brief news stories. This week, we covered Zhipu AI’s new open models challenging DeepSeek and how Rowboat’s IDE makes building multi-agent systems easier. Subscribe today!
Up-and-Coming Startups
AI agents and infrastructure made a strong showing on CB Insights’s latest list of the top 100 AI startups.
What’s new: CB Insights, which tracks tech startups and venture capital, selected companies in the AI 100 based on their market traction, talent, finances, and partnerships. The list purports to highlight the next wave of winners, shedding light on the key executives, investors, fundraising, and valuations behind up-and-coming AI ventures.
How it works: The analysts evaluated 17,000 early-stage, private AI companies that had raised funds within the last year and continue to seek further investment.
Where the action is: This year’s AI 100 companies are based in 14 countries, around two-thirds of them in the United States. 10 are based in the United Kingdom, five in France, and four in Germany, with one each in India (Bioptimus), Norway (Braintrust), Singapore (Bria), Spain (Cartwheel), Sweden (Chainguard), and Switzerland (Clarium).
Why it matters: This year’s AI 100 offers a snapshot of AI becoming more central to businesses of all kinds. Most of the startups listed here offer practical products and services that are poised to deliver a timely return, rather than moonshots with long development cycles and risky payoffs. In addition, they mostly target corporate customers rather than consumers.
We’re thinking: The falling cost of access to AI models and increasingly capable open-weights models make this the perfect time to build applications. What kind? The report singles out health care (8 companies) and life sciences (6 companies) as growing areas, but it also documents opportunities in defense, gaming, and finance.
Inferring Customer Preferences
Large language models can improve systems that recommend items to purchase by inferring customer preferences.
What’s new: Fabian Paischer and colleagues at Johannes Kepler University Linz, University of Wisconsin, and Meta introduced Multimodal Preference Discerner (Mender), a recommender that integrates a large language model (LLM).
Key insight: Text that attracts customers, such as product descriptions, and text they write, such as product reviews, may contain information that indicates their preferences, such as the craft projects that required a particular power tool. But it also may include irrelevant information, such as a complaint that the tool was delivered late, which can throw recommendation systems off track. An LLM can derive preferences from text, providing a clearer signal of what a customer wants.
How it works: Mender comprises an LLM (Llama 3 70B-Instruct), an encoder (Flan-T5 pretrained on a wide variety of text and frozen) that embeds customer data, and a decoder (a transformer trained from scratch) that predicts the next item a customer will buy. The system learned to predict the next item based on descriptions of items a customer purchased, the customer’s ratings and reviews of those products (drawn from datasets of Steam reviews of video games and Amazon reviews of items related to beauty, toys-and-games, and sports-and-outdoors), and customer preferences inferred by the LLM from the foregoing data.
Results: The authors compared Mender to TIGER, a recommender that also takes a purchase history and predicts the next purchase, on the Steam and Amazon datasets. They scored the results using recall @5, a measure of how often the correct item is within the model’s top five most likely predictions.
Why it matters: Drawing inferences from text information like customer reviews and item descriptions boosts a recommender’s signal, making it clearer what a given customer is likely to want. Previous systems used customer reviews or item descriptions directly; Mender uses customer preferences extracted from that information.
We’re thinking: Be on the lookout for innovative ways to use LLMs. We recommend it!
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.
|