Dear friends,
On Monday, a United States District Court ruled that training LLMs on copyrighted books constitutes fair use. A number of authors had filed suit against Anthropic for training its models on their books without permission. Just as we allow people to read books and learn from them to become better writers, so long as they don't regurgitate copyrighted text verbatim, the judge concluded that it is likewise fair use for AI models to learn from copyrighted books.
Access to high-quality data is important. Even though the mass media tend to emphasize the importance of building large data centers and scaling up models, when I speak with friends at companies that train foundation models, many say data preparation accounts for a large share of their daily work. Specifically, a significant fraction of their day-to-day effort follows the usual data-centric AI practices of identifying high-quality data (books are one important source), cleaning data (the ruling describes Anthropic taking steps like removing book pages' headers, footers, and page numbers), carrying out error analyses to figure out what types of data to acquire more of, and inventing new ways to generate synthetic data.
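The cleaning step described above, stripping running headers, footers, and bare page numbers from digitized book pages, can be sketched roughly as follows. This is a minimal illustration of the general idea; the regex heuristics and function name are my own assumptions, not Anthropic's actual pipeline:

```python
import re

def clean_book_page(page_text: str) -> str:
    """Strip common print artifacts (running headers, bare page
    numbers) from one page of digitized book text. Heuristic sketch,
    not a production pipeline."""
    lines = page_text.splitlines()
    # Drop a running header on the first line, approximated here as a
    # short line in all caps (e.g., a book or chapter title).
    if lines and re.fullmatch(r"[A-Z0-9 .,'-]{1,60}", lines[0]):
        lines = lines[1:]
    # Drop a bare page number on the last line.
    if lines and re.fullmatch(r"\d{1,4}", lines[-1].strip()):
        lines = lines[:-1]
    # Remove any remaining lines that contain only a page number.
    body = "\n".join(lines)
    body = re.sub(r"^\s*\d{1,4}\s*$", "", body, flags=re.MULTILINE)
    return body.strip()
```

Real pipelines would also handle OCR noise, hyphenation across pages, and footnotes, but the shape of the work is the same: detect layout artifacts and keep only the running text.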
I am glad that a major risk to data access just decreased. Appropriately, the ruling further said that Anthropic’s conversion of books from paper format to digital — a step that’s needed to enable training — also was fair use. However, in a loss for Anthropic, the judge indicated that, while training on data that was acquired legitimately is fine, using pirated materials (such as texts downloaded from pirate websites) is not fair use. Thus, Anthropic still may be liable on this point. Other LLM providers, too, will now likely have to revisit their practices if they use datasets that may contain pirated works.
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Learn how to use Agent Communication Protocol (ACP) to connect AI agents built using different agentic frameworks! In this short course, you’ll build ACP-compliant agents and connect them in hierarchical and sequential workflows for seamless collaboration. Sign up now!
News
Meta Befriends Scale AI
Meta hired the leadership of Scale AI and put billions into the data-labeling startup to accelerate its AI efforts.
What’s new: Meta recruited Scale AI founder and CEO Alexandr Wang along with members of his team and pumped $14.3 billion into the startup in a new deal. The agreement, which was inked as the United States Federal Trade Commission investigates Meta over its acquisitions of Instagram and WhatsApp, could avoid the government scrutiny that acquiring Scale AI outright would have invited.
How it works: The agreement between Meta and Scale AI gives Meta an infusion of high-profile talent and priority access to Scale AI’s large-scale data operations. It doubles the valuation of Scale AI, which was valued at $13.8 billion last year, and provides funding to fuel growth and reward shareholders. The terms echo similar deals last year between Microsoft and Inflection AI, Amazon and Adept AI, and Google and Character.AI.
Behind the news: Wang and his team could help fulfill Meta’s need for top AI talent.
Why it matters: Meta is racing with other Silicon Valley giants to establish and maintain a decisive lead in AI, and that requires making big bets. In this deal, it gains a star AI entrepreneur as well as closer access to Scale AI’s pipeline of high-quality training data. For Scale AI, Meta’s enormous resources and know-how could come in handy as it contends with competitors and extends its business into new areas. For the AI community, Meta’s willingness to spend such an immense sum for top talent could boost engineers’ salaries and block less-moneyed competitors.
We’re thinking: Meta has made valuable contributions to open-weights models, including Llama 4, and it has played an important role in making open models competitive with their closed counterparts. We look forward to seeing what the new team will accomplish!
A Research Agent for All Biology
An agent designed for broad biological research could accelerate the work of scientists in specialties from anatomy to zoology.
What’s new: Kexin Huang and colleagues at Stanford, Princeton, the University of Washington, Arc Institute, and Genentech introduced Biomni, an agent that performs tasks in genomics, immunology, microbiology, neuroscience, pathology, and more. You can join a waitlist to get access. The authors intend to release the system as open source.
How it works: The authors assembled a collection of tools, software packages, and databases. Then they built an agent based on Claude 4 Sonnet that draws upon those resources to answer questions, propose hypotheses, design processes, analyze datasets, generate graphs, and so on.
Results: Biomni outperformed Claude 4 Sonnet alone, as well as the same model with access to research literature, on LAB-Bench, on a biomedical subset of Humanity’s Last Exam, and on eight other datasets, as well as in three practical case studies.
Behind the news: While Biomni is designed to apply to biology broadly, most previous work on agents focused on narrower areas. For instance, just two days after the release of Biomni, a separate team at Stanford released CellVoyager, an agent that generates hypotheses about datasets of single-cell RNA sequences. Other examples include CRISPR-GPT, which designs gene-editing experiments, and SpatialAgent, which analyzes and hypothesizes about how cells interact within organisms.
Why it matters: While agents conversant in biology typically focus on narrow specialties, Biomni’s knowledge and skills span the entire domain, offering expert assistance to biologists across many specialties. Its reasoning capabilities can improve by substituting more capable LLMs as they become available, and its library of resources can be updated to keep up with changes in the field and extend its knowledge to new areas.
We’re thinking: Like biology, many sciences are so deep and broad that most scientists have deep expertise only within their areas of specialty. Yet agents can pull together resources from disparate areas to reach novel conclusions. In this way, Biomni demonstrates the potential of AI to augment human expertise in meaningful ways.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Harvard’s release of nearly one million historic books, now available for free to train AI models. Subscribe today!
CEOs Look to AI to Replace Workers
Leaders at some of the biggest U.S. corporations say they’re preparing for AI to eliminate many jobs within their organizations.
What’s new: Amazon CEO Andy Jassy wrote in a memo to employees that, within the next few years, generative AI and AI agents would enable the company to reduce its corporate workforce. (Disclosure: Andrew Ng is a member of Amazon’s board of directors.) Similarly, the CEOs of Bank of America, IBM, Shopify, and Williams-Sonoma have said they are embracing AI and expect to hire fewer workers as a result. Worldwide, around 40 percent of employers expect to downsize their workforce, largely due to the rise of AI, according to a survey by the World Economic Forum.
How it works: Many business leaders skirt the topic of job losses when they describe the impact of AI on their companies, but these executives put the technology front and center in their plans to downsize.
Yes, but: Several studies in recent years have suggested that AI is likely to increase, not reduce, the number of jobs.
Why it matters: Technological advances typically create more jobs than they destroy; an estimated 60 percent of U.S. jobs in 2018 did not exist in 1940. An explosion of machine learning engineers, AI application engineers, and data engineers is highly likely! In the short term, though, AI is poised to impinge on a wide variety of roles including many that once were considered immune to automation: knowledge workers, creative people, and so on. Executives have a responsibility to prepare their companies for a coming wave of AI-driven applications, and many expect to hire fewer employees.
We’re thinking: When executives speak, it’s hard to differentiate those who are sincerely trying to navigate change from those whose primary aim is to reassure shareholders, drum up publicity, attract talent, or what have you. Regardless, professionals of all kinds who embrace AI will be much more productive and significantly outperform those who don’t. Jassy said it well in his message to Amazon employees: “As we go through this transformation together, be curious about AI, educate yourself, attend workshops and take trainings, use and experiment with AI whenever you can, participate in your team’s brainstorms to figure out how to invent for our customers more quickly and expansively, and how to get more done with scrappier teams.”
Low Precision, High Performance
Reducing the number of bits used to represent each parameter in a neural network from, say, 16 bits to 8 bits shrinks the network’s size and boosts its speed. Researchers took this approach to an extreme: They built a competitive large language model whose weights are limited to three values.
What’s new: Shuming Ma, Hongyu Wang, and colleagues at Microsoft, the University of Chinese Academy of Sciences, and Tsinghua University updated their earlier BitNet b1.58, in which most weight values are limited to -1, 0, or +1, so that it competes with top full-precision models of up to 2 billion parameters. The weights are free to download for both noncommercial and commercial use under an MIT license.
Key insight: Linear layers have a big impact on a transformer’s overall speed. They make up large parts of attention layers and fully connected layers, so they account for most computations. The authors’ 2023 work on BitNet showed that using 1-bit weights — whose values are limited to -1 and +1 — makes multiplications very fast (because multiplying by -1 simply flips the sign and multiplying by +1 changes nothing), but performance suffers. They improved on the idea the following year with BitNet b1.58, which allowed weights to be -1, 0, or +1. (Implemented perfectly, this approach allocates approximately 1.58 bits per parameter, since the number of bits needed to represent 3 values is log₂(3) ≈ 1.58.) In this case, multiplying by -1 or +1 still just flips or keeps the sign, and multiplying by 0 zeroes out the value. This ternary setup retains the original BitNet’s low memory requirements, fast training, and fast inference. With careful attention to hyperparameters, it also improves performance.
How it works: The authors pretrained the 2-billion parameter BitNet b1.58, which has an architecture similar to LLaMA, on a dataset of 4 trillion tokens that included web data plus synthetic math problems. To strengthen its reasoning abilities, they fine-tuned it on chat data, instruction-following data, and synthetic instruction-following data. Finally, they fine-tuned the model via DPO to better match human preferences.
Results: Across 16 popular benchmarks for language understanding, mathematical reasoning, and coding, BitNet b1.58 was faster and used less memory than competitors, including Alibaba’s Qwen2.5-1.5B, Google’s Gemma-3 1B, Hugging Face’s SmolLM2 1.7B, Meta’s Llama 3.2 1B, and ModelBest’s MiniCPM 2B. It achieved better performance than all except Qwen2.5-1.5B.
Why it matters: Quantizing an LLM to a few bits is not as simple as applying the current best practices for full-precision models. It demands rethinking LLM training, down to hyperparameter details like learning rate and weight decay. Even these seemingly small changes can have a large impact on final performance. By delving into these nuances, the authors provide a guide for how to ensure good performance from low-precision models.
We’re thinking: This work makes more than a bit of progress!
A MESSAGE FROM RAPIDFIRE AI
Tired of waiting for hours to learn that your experiment didn’t work? RapidFire AI helps you explore, tune, and train more ideas in less time, with intelligent early stopping, cloning, and modification in real time and parallel configuration runs. Try it free on our cloud or bring your own model.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.