I’m writing a book about the worldview I’ve developed by writing, coding, and living with AI. Last week I published the first piece from it, about the differences between the old (pre-GPT-3) worldview and the new. Here’s the second, about how AI will impact science.—Dan Shipper
I’m writing this to you from Bocas del Toro, Panama. I’m living in a little cabin in the jungle at the top of a hill, five minutes from the Caribbean Sea. When I walk down to the water, the ocean’s flat plane stretches out in front of me like an endless bed, its surface raised here and there with ridges of foam-tipped waves—a bedsheet not quite pulled taut. On this clear evening, I have a front-row seat to our cosmic neighbors’ celestial ballet. The red sun dips slowly behind the horizon line and the moon loiters ominously above my head; the stars wander in circles.
The ocean looks flat to me, but I know—really—it is curved. The sky looks to be teeming with movement but really, I know, I am the one that is moving. This is intuitive to you and me because it’s how we’ve grown up. We know that the way things appear is different from the way they are—we trust the science.
To people in Copernicus or Galileo’s day, these ideas were preposterous. If the earth was in motion, why weren’t the trees blowing in the wind generated by its orbit? Why would an object dropped from a height fall straight down if the planet was shifting underneath it? If the earth was round instead of flat, why didn’t we fall off it?
Their world was organized quite differently from ours. That we see the world the way we do is a testament to how powerfully ideas shape what we see, and our capacity, through cultural transmission, to know things that we have not directly experienced.
And this is a good thing. This way of seeing the world, as I’ve said over and over again, is responsible for rockets and vaccines, computers and smartphones. It has changed everything about our culture:
When we talk about “forces” shaping society or “momentum” in markets, we’re borrowing concepts from physics. If you’ve ever described having “chemistry” with another person, or the people in a charismatic politician’s “orbit,” or needing to get “leverage” in a deal, or the “friction” between two co-workers, you’re using ideas and words from Newton’s time. If you’ve ever read Stephen Pressfield’s The War of Art and encountered his concept of “resistance,” or Robert Greene’s The 48 Laws of Power, which tries to identify simple, general, universal laws of human behavior, you’re seeing how deeply rationalist Enlightenment thinking has penetrated our understanding of life itself.
But this way of thinking has also begun to reach its limits. Just as it did with machine learning research, the tendency to reduce, to break apart, to make explicit, and to generalize has failed, hard, in many places.
The limits of science
Psychology and the replication crisis
Psychology is an easy example. Most psychology research is built on linear regression—a statistical tool that assumes straightforward, predictable relationships between variables. When we use linear regression, though, we’re also imposing a specific view of how the world works. We’re saying: “If we change X by this much, Y will change by that much, plus or minus some random noise.”
It’s the standard scientific approach, inspired by Newtonian physics, applied to the complex domain of the mind. But human behavior rarely follows such neat, predictable patterns. Our actions and reactions are deeply contextual, interconnected, and often nonlinear. Small changes in circumstances can lead to dramatic shifts in behavior, while major interventions sometimes produce no effect at all.
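The failure mode described above can be made concrete with a toy sketch of my own (not from the essay). Fit an ordinary least-squares line to a relationship that is perfectly predictable but nonlinear, and the model reports essentially no effect at all:

```python
# Ordinary least squares assumes a straight-line relationship. Fit it to a
# strongly nonlinear one and it concludes "no effect," even though y is
# completely determined by x.

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

xs = [x / 10 for x in range(-50, 51)]  # contexts spread symmetrically around zero
ys = [x ** 2 for x in xs]              # outcome is fully determined, but U-shaped

print(round(ols_slope(xs, ys), 6))     # ~0.0: the linear model sees "no relationship"
```

The slope comes out near zero not because x and y are unrelated, but because the tool can only see straight lines; everything else gets filed under “noise.”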
The result is a replication crisis: Scientists use tools built for physics to make their intuitive theories fit into statistical models, only to find that their results can’t be reproduced. Usually, the replication crisis is portrayed as malfeasance: academics under pressure to “publish or perish” who “p-hack” their way to implausible but statistically significant results.
But the deeper issue, per the psychologist and philosopher of science Tal Yarkoni, is not necessarily a crisis of replication but one of generalizability. We tend to overstate how universal what we’ve found is; if we were better at replicating the context of the original study—the lab environment, the researchers, the subject population, the specific interventions, and countless other variables—we’d probably replicate the results.
What’s really going on here?
The usual picture of science is that it progresses through a steady accumulation of knowledge, with each discovery building upon previous findings. Scientists construct careful experiments designed to falsify their hypotheses—to truly put their explanations at risk. If a theory survives rigorous attempts to prove it wrong, we can be more confident it captures something true about reality.
The currency of science is what the physicist David Deutsch calls “hard-to-vary” explanations: explanations that act as precisely engineered machines where every part serves an essential purpose.
Think, again, of a mechanical clock—you can't randomly change or remove any gear without breaking the whole system. A good scientific explanation works the same way: If you changed any part of the explanation, it would no longer accurately predict real-world phenomena.
The idea of “hard-to-vary explanations” contrasts with explanations that bend to accommodate any evidence. Take astrology: If a horoscope prediction fails, astrologers can always adjust their interpretation of planetary positions or add new variables without changing their core claims. It's like having a theory made of rubber—you can stretch and twist it to explain anything, which ironically means it explains nothing.
This picture of science is beautiful, and it works in some domains of knowledge. But it’s not really how science works in practice, and perhaps more importantly, it’s not a good way to work with or understand complex parts of reality.
Paradigms and paradigm shifts
In the 1950s, scientist and philosopher Michael Polanyi made a crucial observation about how science actually works. In his 1958 book Personal Knowledge, he pointed out that scientists constantly make judgment calls that can't be reduced to explicit rules or procedures. When running experiments, they must decide which measurements matter and which are mere artifacts. They must determine whether unexpected results represent genuine discoveries or experimental errors. They must choose which data points to investigate further and which to set aside.
These decisions, Polanyi argued, rely on what he called "tacit knowledge"—a form of knowing that can't be fully articulated or written down in a manual. Just as we can recognize a face without being able to explain exactly how we do it, scientists develop an intuitive feel for their subject matter that guides these crucial judgments. This expertise is passed down through direct apprenticeship rather than just textbooks; young scientists learn by working alongside experienced researchers who demonstrate these unwritten skills.
In 1962 Thomas Kuhn published The Structure of Scientific Revolutions, which took this idea further. Kuhn argued that science proceeds under paradigms: comprehensive worldviews composed of assumptions, methods, and exemplary solutions that guide research in a field.
Most science, he argued, is the work of researchers who are working under a particular paradigm and attempting to extend its methods to explain more and more phenomena—sort of like solving a jigsaw puzzle. You assume it can be completed with the pieces you have; it just requires hard work and ingenuity. When scientists can’t explain something they usually ignore it as noise, or push it aside as something to be solved later. This is not a moral or ethical failing, just—as Polanyi pointed out—a necessary part of making any progress at all.
But occasionally, anomalies accumulate that the current paradigm can’t explain. The jigsaw puzzle of reality can’t be solved with the provided pieces. This contradiction leads to a crisis and eventually a “paradigm shift”—a new worldview with new tools that solve problems that were previously unsolvable under the old paradigm.
Crucially, new paradigms often take hold because of new tools. Copernicus’s theory of heliocentricity wasn’t taken seriously in his lifetime. Instead, it only began to gain acceptance decades later when Galileo first pointed a telescope at the night sky and observed the phases of Venus, which couldn't be explained by the geocentric model. The telescope provided compelling evidence that forced astronomers to reconsider their fundamental assumptions about the cosmos. Similarly, the microscope revealed an entirely new world of microorganisms, revolutionizing biology and medicine. These technological breakthroughs fundamentally altered how scientists understood and interpreted the natural world.
This brings us, of course, to the new tool at our disposal today—the language model—and the new worldview it is bringing into focus.
Seeing science like a language model
By now it should be obvious why areas of science like psychology seem to be stuck, and also why there is now a possibility for progress where before it seemed hopeless.
Psychology, economics, social science—even areas of biology—have been using the tools of physics, based on a Western rationalist paradigm, to try to understand the universe as if it worked like a clock or a computer. Sometimes it does—sometimes the project of abstraction and generalization yields simple, powerful mathematical theories that predict and explain much of the world.
But, as we’ve already seen, much of the world doesn’t work that way. Psychology—the nearly infinite collection of variables that causes a person to behave, think, or feel a certain way—is not amenable to abstraction. You can’t treat context as noise in order to abstract; context is king in domains like this. Psychology is also not amenable to hard-to-vary explanations: Any given outcome could be preceded by a nearly infinite set of causes, just as any given next word in a sequence can follow from a nearly infinite set of preceding words. The fact that a psychological explanation can be changed and still account for the same outcome is a property of a good psychological explanation—just as any given neuron in a neural net can be changed or eliminated without breaking the entire network.
We’ve been blind to this for centuries because the prevailing physics-based paradigm is so powerful and because we haven’t had anything with which to replace it. So we’ve shoved the anomalies and complications to the side, and doubled down on even more careful experiments and statistical analyses, hoping that with enough rigor we could force psychological phenomena into the physics paradigm. This has had predictably mediocre results.
But language models offer a new way of thinking about these complex systems. They show us that meaningful patterns can emerge from vast networks of relationships, without requiring reduction to simple mathematical laws. They offer the prospect of a fundamental paradigm shift in the way science is done.
Predictions over explanations
In this new paradigm, the focus shifts from seeking simple, general, universal explanations to making accurate predictions.
Rather than trying to isolate single causes, we embrace the complex web of interconnected factors that influence outcomes. Language models demonstrate that highly accurate predictions can emerge from statistical patterns across vast datasets, even when we can't trace exact causal chains. This approach may be particularly valuable in fields like psychology, where traditional reductionist methods have often fallen short.
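Here is a small sketch of my own construction (not from the essay) of that contrast. When an outcome depends on the interaction of contextual variables, a single-cause rule does no better than chance, while a prediction-first method that matches on whole contexts gets it right:

```python
# Outcome depends on an XOR-style interaction between two context variables,
# so neither variable predicts it alone. A "single cause" rule fails; a
# nearest-neighbor lookup over full contexts succeeds.
import random

random.seed(1)

def outcome(ctx):
    # interaction effect: the outcome flips depending on the other variable
    return ctx[0] ^ ctx[1]

# a "dataset" of past cases with their observed outcomes
train = []
for _ in range(200):
    ctx = (random.randint(0, 1), random.randint(0, 1))
    train.append((ctx, outcome(ctx)))

def predict_single_cause(ctx):
    # the reductionist rule "X causes Y," read off variable 0 alone
    return ctx[0]

def predict_by_context(ctx):
    # nearest neighbor over the full context: reuse the most similar past case
    _, label = min(train, key=lambda e: sum(a != b for a, b in zip(e[0], ctx)))
    return label

cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
acc_single = sum(predict_single_cause(c) == outcome(c) for c in cases) / len(cases)
acc_context = sum(predict_by_context(c) == outcome(c) for c in cases) / len(cases)
print(acc_single, acc_context)  # 0.5 vs 1.0
```

The context-matching predictor never extracts a simple causal law; it just predicts accurately from accumulated cases, which is the trade the new paradigm makes.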
Aggregating data instead of aggregating conclusions
Normal science proceeds through publishing papers: Science is conceived as a group repository of facts and conclusions that researchers contribute to over time. Most of science consists of individual scientists gathering small datasets of 10 or 15 people, conducting an experiment, and hoping for funding for larger-scale follow-ups.
In this new paradigm, the way we collect and use scientific data fundamentally changes. Rather than running small studies to test narrow hypotheses, scientists would prioritize gathering rich, high-quality datasets that capture more of the complexity of the phenomena they study. These datasets become valuable as training data for many different models that can discover many more different patterns and relationships than any one individual scientist.
This shift would transform how scientific institutions operate. Instead of thousands of researchers conducting small independent studies, we could create collaborative data trusts—shared repositories of high-quality, ethically collected data. Large companies, which already possess some of the richest behavioral and biological datasets ever assembled, could contribute their data (properly anonymized and protected) to these trusts. Universities could pool their research data. Healthcare systems could share patient outcomes. Each contribution would enhance our collective ability to model and understand complex phenomena.
The goal isn't just to amass data, but to create the kind of comprehensive, nuanced datasets that allow AI models to capture the intricate web of relationships that characterize real-world systems. With access to data trusts, academic and independent researchers could train models that solve real world problems through more accurate predictions.
Empiricism of the contextual instead of the general
Science is empirical, but it values empiricism because it allows us to abstract away from specific situations. This new worldview prizes specific situations. It encourages scientists—and everyone else—to become connoisseurs of the particular, to seek out valuable examples and patterns that enrich our understanding.
In this world, much psychological research may look a lot like the early work of psychoanalysts or of novelists and philosophers. These practitioners deeply study individual cases, recognizing that each person's unique context and narrative hold valuable insights. Rather than dismissing these methods as unscientific, we might see them as early attempts to grapple with complexity in ways that traditional scientific methods can’t capture.
The key difference now is that we have tools—like large language models—that can systematically analyze and learn from vast collections of such detailed, contextual observations.
What comes next
Because many other parts of culture attempt to mimic science in how they operate—startups prove “hypotheses,” for example—the same kinds of changes that are coming for science are coming for them. In the coming weeks we’ll explore how business and creativity will change when we see them like a language model.
Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.