Dear friends,
Even though I’m a much better Python developer than JavaScript developer, with AI assistance I’ve been writing a lot of JavaScript code recently. AI-assisted coding is making the choice of programming language less important, even though learning at least one language well is still helpful to make sure you understand the key concepts. This is helping many of us write code in languages we’re not familiar with, which lets us get code working in many more contexts!
Different programming languages reflect different views of how to organize computation, and understanding the concepts is still important. For example, someone who does not understand arrays, dictionaries, caches, and memory will be less effective at getting an LLM to write code in most languages.
Similarly, a Python developer who moves toward doing more front-end programming with JavaScript would benefit from learning the concepts behind front-end systems. For example, if you want an LLM to build a front end using the React framework, it will benefit you to understand how React breaks front ends into reusable UI components, and how it updates the DOM data structure that determines what web pages look like. This lets you prompt the LLM much more precisely, and helps you understand how to fix issues if something goes wrong. Similarly, if you want an LLM to help you write code in CUDA or ROCm, it helps to understand how GPUs organize compute and memory.
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI

Learn to build agents that write and run code to complete complex tasks. “Building Code Agents with Hugging Face smolagents,” made in collaboration with Hugging Face, teaches you how to build code agents, execute their code safely, and set up evals for production-ready multi-agent systems, using the smolagents framework. Enroll for free
News
OpenAI Launches Cost-Effective Alternatives

OpenAI refreshed its roster of models and scheduled the largest, most costly one for removal.

What’s new: OpenAI introduced five new models that accept text and image inputs and generate text output. Their parameter counts, architectures, training datasets, and training methods are undisclosed. The general-purpose GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano are available via API only. The reasoning models o3 and o4-mini are available via API to qualified developers as well as users of ChatGPT Plus, Pro, and Team, and soon ChatGPT Enterprise and ChatGPT Education. The company will terminate GPT-4.5 — which it introduced as a research preview in late February — in July.

GPT-4.1 family: In an odd turn of version numbers, the GPT-4.1 models are intended to be cost-effective equivalents to GPT-4.5 and updates to GPT-4o. They accept inputs of up to 1 million tokens (compared to GPT-4.5’s and GPT-4o’s 128,000 tokens).
o3 and o4-mini: These models update o1 and o3-mini, respectively. They have input limits of 200,000 tokens and can be set to low-, medium-, or high-effort modes to process varying numbers of reasoning tokens, which are hidden from users. Unlike their predecessors, they were fine-tuned to decide when and how to use the tools, including web search, code generation and execution, and image editing.
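For developers calling these models through the API, the effort level is set per request. Below is a minimal sketch using the OpenAI Python SDK; it assumes the reasoning_effort parameter accepted by earlier o-series models also applies to o4-mini, and the model name and prompt are purely illustrative.

```python
# Minimal sketch: requesting different reasoning-effort levels from an o-series
# model via the OpenAI Python SDK. Assumes the `reasoning_effort` parameter
# (introduced for earlier o-series models) also applies here; prompt is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for effort in ["low", "medium", "high"]:
    response = client.chat.completions.create(
        model="o4-mini",
        reasoning_effort=effort,  # controls how many hidden reasoning tokens the model may spend
        messages=[{"role": "user", "content": "Summarize the trade-offs of 4-bit quantization for LLMs."}],
    )
    print(effort, "->", response.choices[0].message.content[:80])
```

Higher effort generally trades latency and cost for more thorough reasoning; the intermediate reasoning tokens themselves remain hidden from the user.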
Behind the news: Late last year, OpenAI introduced o1, the first commercial model trained via reinforcement learning to generate chains of thought. Within a few months, DeepSeek, Google, and Anthropic launched their respective reasoning models DeepSeek-R1, Gemini 2.5 Pro, and Claude 3.7 Sonnet. OpenAI has promised to integrate its general-purpose GPT-series models and o-series reasoning models, but they remain separate for the time being.

Why it matters: GPT-4.5 was an exercise in scale, and it showed that continuing to increase parameter counts and training data would yield ongoing performance gains. But it wasn’t widely practical on a cost-per-token basis. The new models, including those that use chains of thought and tools, deliver high performance at lower prices.

We’re thinking: Anthropic is one of OpenAI’s key competitors, and a large fraction of the tokens it generates (via API) are for writing code, a skill in which it is particularly strong. OpenAI’s emphasis on models that are good at coding could boost the competition in this area!
Hugging Face Rolls Out Open Robot

Hugging Face has made a name for itself by providing open AI models. Now it’s providing an open robot.

What’s new: Hugging Face acquired the French company Pollen Robotics for an undisclosed price. It plans to offer Pollen’s Reachy 2, a robot that runs on code that’s freely available under an Apache 2.0 license, for $70,000.

How it works: Reachy 2 has two arms, gripper hands, and an optional wheeled base. It’s designed primarily for education and research in human-robot interaction in real-world settings.
Behind the news: Last year, Remi Cadene, who worked on Tesla’s Optimus, joined Hugging Face to lead robotics projects. In May, he and his team rolled out LeRobot, an open-source robotics code library that provides pretrained models, datasets, and simulators for reinforcement learning and imitation learning. In November, Nvidia announced a collaboration with Hugging Face to accelerate LeRobot’s data collection, training, and verification.

Why it matters: Hugging Face’s acquisition of Pollen reflects an industry-wide investment in robots, notably humanoid robots, whose prices have been falling. Nvidia CEO Jensen Huang has called AI-enabled robotics a “multi-trillion dollar” opportunity.

We’re thinking: AI-enabled robots are marching slowly toward what we hope will be breakthrough applications. Open-source systems are an important part of the trend!
Learn More About AI With Data Points!

AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six very brief news stories. This week, we covered Wikimedia’s push to build AI for the commons and a new leaderboard that ranks the best AI search engines based on human preferences. Subscribe today!
U.S. Tightens Grip on AI Chips

The U.S. government escalated its long-running effort to block China’s access to cutting-edge AI hardware.

What’s new: The White House announced that future shipments of Nvidia H20s, AMD MI308s, or equivalent chips to China would require a license. Concurrently, the United States Congress launched an investigation into whether chip vendor Nvidia violated earlier export rules.

How it works: Nvidia launched the H20 in late 2023 to comply with a 2022 U.S. ban on China-bound shipments of Nvidia’s H100 and H200 processors. The H20 uses the same architecture as the H200, but it’s an order of magnitude slower, with less memory and memory bandwidth.
Behind the news: The U.S. government’s many moves to restrict shipments of advanced processors to China have sought to protect the nation’s lead in AI, but they have not prevented Chinese developers from closing the gap. In 2020, the U.S. required chip makers that use U.S. technology — which includes both domestic chip designers like Nvidia and makers of advanced fabrication equipment like the Netherlands’ ASML — to seek permission before doing business with Chinese tech giant Huawei. Last December, the U.S. published sweeping limits on sales of processors that involve U.S. technology, as well as the technology itself, to Chinese businesses.

Yes, but: Export restrictions may have slowed China’s production of advanced chips, but they have also incentivized China to invest in establishing leadership in AI. In January, the Chinese AI developer DeepSeek surprised U.S. policymakers and AI leaders with the release of DeepSeek-R1, which performs comparably to OpenAI’s o1 but whose weights are freely available and which was trained using less computation.

Why it matters: The first wave of restrictions on sales of advanced chips to China did little harm to U.S. chipmakers, largely because demand outstripped supply. But later restrictions have had a greater impact on their sales. The new limits could cost Nvidia and AMD significant revenue and likely will degrade their competitiveness abroad while bolstering China’s homegrown chip-making industry.

We’re thinking: The AI community’s international scope is one of its greatest strengths. While individual countries must attend to their national security, progress in AI benefits all nations. Even in this era of rising protectionism, we hope members of the global AI community continue to support one another and encourage the free flow of ideas.
Text-Only LLM Goes Multimodal

Large language models excel at processing text but can’t interpret images, video, or audio directly without further training on those media types. Researchers devised a way to overcome this limitation.
Results: The authors evaluated MILS on captioning images, videos, and audio clips. They measured performance according to Metric for Evaluation of Translation with Explicit ORdering (METEOR), which checks for synonyms, words that share the same root, and word order to determine whether a generated caption matches a ground-truth caption (higher is better). Overall, MILS outperformed models that underwent task-specific training.
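For readers curious how such a score is computed, here is a minimal sketch using NLTK’s METEOR implementation (not the authors’ evaluation code); the candidate and reference captions are invented for illustration.

```python
# Minimal sketch: scoring a generated caption against a ground-truth caption
# with METEOR via NLTK. The captions below are illustrative, not from the paper.
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")   # METEOR uses WordNet to credit synonyms
nltk.download("omw-1.4")   # additional WordNet data required by some NLTK versions

reference = "a dog leaps over a wooden fence".split()
candidate = "a dog jumps over the fence".split()

# meteor_score takes a list of tokenized references and one tokenized hypothesis;
# it rewards exact matches, stemmed matches, synonyms, and similar word order.
score = meteor_score([reference], candidate)
print(f"METEOR: {score:.3f}")  # higher is better, maximum 1.0
```

Because the metric credits synonyms (“leaps” vs. “jumps”) and shared stems rather than only exact string overlap, it tolerates reasonable paraphrases of a ground-truth caption.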
Why it matters: Zero-shot captioning models like Aya Vision and Pixtral require training on paired captions and media. The authors’ approach takes advantage of pretrained multimodal models to enable an LLM to compose multimedia captions without further training.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.