Dear friends,
On Monday, a United States District Court ruled that training LLMs on copyrighted books constitutes fair use. A number of authors had filed suit against Anthropic for training its models on their books without permission. Just as we allow people to read books and learn from them to become better writers, so long as they don't regurgitate copyrighted text verbatim, the judge concluded that it is likewise fair use for AI models to learn from copyrighted books.
Access to high-quality data is important. Even though the mass media tend to emphasize the importance of building large data centers and scaling up models, when I speak with friends at companies that train foundation models, many say data preparation accounts for a large share of their daily work. Specifically, a significant fraction of their day-to-day effort follows the usual data-centric AI practices of identifying high-quality data (books are one important source), cleaning data (the ruling describes Anthropic taking steps like removing book pages' headers, footers, and page numbers), carrying out error analyses to figure out what types of data to acquire more of, and inventing new ways to generate synthetic data.
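The cleaning step described above, stripping running headers, footers, and bare page numbers from digitized book pages, can be sketched roughly as follows. This is a minimal illustration of the general idea; the regex heuristics and function name are my own assumptions, not Anthropic's actual pipeline:

```python
import re

def clean_book_page(page_text: str) -> str:
    """Strip common print artifacts (running headers, bare page
    numbers) from one page of digitized book text. Heuristic sketch,
    not a production pipeline."""
    lines = page_text.splitlines()
    # Drop a running header on the first line, approximated here as a
    # short line in all caps (e.g., a book or chapter title).
    if lines and re.fullmatch(r"[A-Z0-9 .,'-]{1,60}", lines[0]):
        lines = lines[1:]
    # Drop a bare page number on the last line.
    if lines and re.fullmatch(r"\d{1,4}", lines[-1].strip()):
        lines = lines[:-1]
    # Remove any remaining lines that contain only a page number.
    body = "\n".join(lines)
    body = re.sub(r"^\s*\d{1,4}\s*$", "", body, flags=re.MULTILINE)
    return body.strip()
```

Real pipelines would also handle OCR noise, hyphenation across pages, and footnotes, but the shape of the work is the same: detect layout artifacts and keep only the running text.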
I am glad that a major risk to data access just decreased. Appropriately, the ruling further said that Anthropic’s conversion of books from paper format to digital — a step that’s needed to enable training — also was fair use. However, in a loss for Anthropic, the judge indicated that, while training on data that was acquired legitimately is fine, using pirated materials (such as texts downloaded from pirate websites) is not fair use. Thus, Anthropic still may be liable on this point. Other LLM providers, too, will now likely have to revisit their practices if they use datasets that may contain pirated works.
Keep building! Andrew
A MESSAGE FROM DEEPLEARNING.AI
Learn how to use Agent Communication Protocol (ACP) to connect AI agents built using different agentic frameworks! In this short course, you’ll build ACP-compliant agents and connect them in hierarchical and sequential workflows for seamless collaboration. Sign up now!
News
Meta Befriends Scale AI
Meta hired the leadership of Scale AI and put billions into the data-labeling startup to accelerate its AI efforts.
What’s new: Meta recruited Scale AI founder and CEO Alexandr Wang along with members of his team and pumped $14.3 billion into the startup in a new deal. The agreement, which was inked as the United States Federal Trade Commission investigates Meta over its acquisitions of Instagram and WhatsApp, could avoid the government scrutiny that acquiring Scale AI outright would have invited.
How it works: The agreement between Meta and Scale AI gives Meta an infusion of high-profile talent and priority access to Scale AI’s large-scale data operations. It doubles the valuation of Scale AI, which was valued at $13.8 billion last year, and provides funding to fuel growth and reward shareholders. The terms echo similar deals last year between Microsoft and Inflection AI, Amazon and Adept AI, and Google and Character.AI.
Behind the news: Wang and his team could help fulfill Meta’s need for top AI talent.
Why it matters: Meta is racing with other Silicon Valley giants to establish and maintain a decisive lead in AI, and that requires making big bets. In this deal, it gains a star AI entrepreneur as well as closer access to Scale AI’s pipeline of high-quality training data. For Scale AI, Meta’s enormous resources and know-how could come in handy as it contends with competitors and extends its business into new areas. For the AI community, Meta’s willingness to spend such an immense sum for top talent could boost engineers’ salaries and block less-moneyed competitors.
We’re thinking: Meta has made valuable contributions to open-weights models, including Llama 4, and it has played an important role in making open models competitive with their closed counterparts. We look forward to seeing what the new team will accomplish!
A Research Agent for All Biology
An agent designed for broad biological research could accelerate the work of scientists in specialties from anatomy to zoology.
What’s new: Kexin Huang and colleagues at Stanford, Princeton, the University of Washington, Arc Institute, and Genentech introduced Biomni, an agent that performs tasks in genomics, immunology, microbiology, neuroscience, pathology, and more. You can join a waitlist to get access. The authors intend to release the system as open source.
How it works: The authors assembled a collection of tools, software packages, and databases. Then they built an agent based on Claude 4 Sonnet that draws upon those resources to answer questions, propose hypotheses, design processes, analyze datasets, generate graphs, and so on.
Results: Biomni outperformed Claude 4 Sonnet alone, as well as the same model with access to research literature, on LAB-Bench, on a biomedical subset of Humanity’s Last Exam, and on eight other datasets, as well as in three practical case studies.
Behind the news: While Biomni is designed to apply to biology broadly, most previous work on agents focused on narrower areas. For instance, just two days after the release of Biomni, a separate team at Stanford released CellVoyager, an agent that generates hypotheses about datasets of single-cell RNA sequences. Other examples include CRISPR-GPT, which designs gene-editing experiments, and SpatialAgent, which analyzes and hypothesizes about how cells interact within organisms.
Why it matters: While agents conversant in biology typically focus on narrow specialties, Biomni’s knowledge and skills span the entire domain, offering expert assistance to biologists across many specialties. Its reasoning capabilities can improve by substituting more capable LLMs as they become available, and its library of resources can be updated to keep up with changes in the field and extend its knowledge to new areas.
We’re thinking: Like biology, many sciences are so deep and broad that most scientists have deep expertise only within their areas of specialty. Yet agents can pull together resources from disparate areas to reach novel conclusions. In this way, Biomni demonstrates the potential of AI to augment human expertise in meaningful ways.
Learn More About AI With Data Points!
AI is moving faster than ever. Data Points helps you make sense of it just as fast. Data Points arrives in your inbox twice a week with six brief news stories. This week, we covered Harvard’s release of nearly one million historic books, now available for free to train AI models. Subscribe today!
CEOs Look to AI to Replace Workers
Leaders at some of the biggest U.S. corporations say they’re preparing for AI to eliminate many jobs within their organizations.
What’s new: Amazon CEO Andy Jassy wrote in a memo to employees that, within the next few years, generative AI and AI agents would enable the company to reduce its corporate workforce. (Disclosure: Andrew Ng is a member of Amazon’s board of directors.) Similarly, the CEOs of Bank of America, IBM, Shopify, and Williams-Sonoma have said they are embracing AI and expect to hire fewer workers as a result. Worldwide, around 40 percent of employers expect to downsize their workforce, largely due to the rise of AI, according to a survey by the World Economic Forum.
How it works: Many business leaders skirt the topic of job losses when they describe the impact of AI on their companies, but these executives put the technology front and center in their plans to downsize.
Yes, but: Several studies in recent years have suggested that AI is likely to increase, not reduce, the number of jobs.
Why it matters: Technological advances typically create more jobs than they destroy; an estimated 60 percent of U.S. jobs in 2018 did not exist in 1940. An explosion of machine learning engineers, AI application engineers, and data engineers is highly likely! In the short term, though, AI is poised to impinge on a wide variety of roles including many that once were considered immune to automation: knowledge workers, creative people, and so on. Executives have a responsibility to prepare their companies for a coming wave of AI-driven applications, and many expect to hire fewer employees.
We’re thinking: When executives speak, it’s hard to differentiate those who are sincerely trying to navigate change from those whose primary aim is to reassure shareholders, drum up publicity, attract talent, or what have you. Regardless, professionals of all kinds who embrace AI will be much more productive and significantly outperform those who don’t. Jassy said it well in his message to Amazon employees: “As we go through this transformation together, be curious about AI, educate yourself, attend workshops and take trainings, use and experiment with AI whenever you can, participate in your team’s brainstorms to figure out how to invent for our customers more quickly and expansively, and how to get more done with scrappier teams.”
Low Precision, High Performance
Reducing the number of bits used to represent each parameter in a neural network from, say, 16 bits to 8 bits shrinks the network’s size and boosts its speed. Researchers took this approach to an extreme: They built a competitive large language model whose weights are limited to three values.
What’s new: Shuming Ma, Hongyu Wang, and colleagues at Microsoft, the University of Chinese Academy of Sciences, and Tsinghua University updated their earlier BitNet b1.58, in which most weight values are limited to -1, 0, or +1, so that it competes with top full-precision models of up to 2 billion parameters. The weights are free to download for both noncommercial and commercial use under an MIT license.
Key insight: Linear layers have a big impact on a transformer’s overall speed. They make up large parts of attention layers and fully connected layers, so they account for most computations. The authors’ 2023 work on BitNet showed that using 1-bit weights — whose values are limited to -1 and +1 — makes multiplications very fast (because multiplying by -1 simply flips the sign and multiplying by +1 changes nothing), but performance suffers. They improved on the idea the following year with BitNet b1.58, which allowed weights to be -1, 0, or +1. (Implemented perfectly, this approach allocates approximately 1.58 bits per parameter, since the number of bits needed to represent 3 values is log₂(3) ≈ 1.58.) In this case, multiplying by -1 or +1 still just flips or keeps the sign, and multiplying by 0 zeroes out the value. This ternary setup retains the original BitNet’s low memory requirements, fast training, and fast inference. With careful attention to hyperparameters, it also improves performance.
How it works: The authors pretrained the 2-billion parameter BitNet b1.58, which has an architecture similar to LLaMA, on a dataset of 4 trillion tokens that included web data plus synthetic math problems. To strengthen its reasoning abilities, they fine-tuned it on chat data, instruction-following data, and synthetic instruction-following data. Finally, they fine-tuned the model via DPO to better match human preferences.
Results: Across 16 popular benchmarks for language understanding, mathematical reasoning, and coding, BitNet b1.58 was faster and used less memory than competitors, including Alibaba’s Qwen2.5-1.5B, Google’s Gemma-3 1B, Hugging Face’s SmolLM2 1.7B, Meta’s Llama 3.2 1B, and ModelBest’s MiniCPM 2B. It achieved better performance than all except Qwen2.5-1.5B.
Why it matters: Quantizing an LLM to a few bits is not as simple as applying the current best practices for full-precision models. It demands rethinking LLM training, down to hyperparameter details like learning rate and weight decay. Even these seemingly small changes can have a large impact on final performance. By delving into these nuances, the authors provide a guide for how to ensure good performance from low-precision models.
We’re thinking: This work makes more than a bit of progress!
A MESSAGE FROM RAPIDFIRE AI
Tired of waiting for hours to learn that your experiment didn’t work? RapidFire AI helps you explore, tune, and train more ideas in less time, with intelligent early stopping, cloning, and modification in real time and parallel configuration runs. Try it free on our cloud or bring your own model.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.