The Vatican last week hosted an unusual tableau: Pope Leo XIV, delivering his first encyclical on artificial intelligence, shared the stage with Chris Olah, a self-declared atheist and cofounder of Anthropic, one of the world’s leading AI companies. Olah’s presence, which he himself acknowledged as peculiar, underscored a growing recognition that the rapid advancement of AI demands oversight from beyond the tech industry’s internal mechanisms. He argued that even when intentions are sincere, incentives within the industry can skew ethical judgment, which makes external scrutiny from bodies such as the Catholic Church, academic institutions, and governments necessary.
Olah’s journey to this prominent position is as unconventional as his appearance alongside the Pontiff. A native of Toronto, Canada, he moved from devout evangelical Christianity to atheism by age fifteen. After a brief stint studying mathematics at the University of Toronto, he dropped out, and in 2012 he received a $100,000 Thiel Fellowship. The program, started by PayPal cofounder Peter Thiel, supports young people in pursuing their passions outside traditional higher education. At the time, Olah expressed an interest in mathematical visualization, often making use of 3D printers.
His professional trajectory soon led him into the nascent field of AI research. From 2015 to 2018, Olah worked at Google Brain, which later merged with DeepMind to form Google DeepMind. Starting as an intern, he rose to become a research scientist, contributing to the development of tools designed to visualize the internal workings of neural networks. This area of study, known as “mechanistic interpretability,” was not widely popular at the time, as most researchers were focused on enhancing AI capabilities rather than on understanding how the models worked internally. Nevertheless, his work, including a seminal paper titled “The Building Blocks of Interpretability,” began to shed light on how neural networks process complex information.
Olah’s pioneering efforts in interpretability eventually caught the attention of OpenAI, the company behind ChatGPT. From 2018 to 2020, he led OpenAI’s interpretability team, where he spearheaded two significant research projects. The “Circuits project” aimed to demonstrate that neural networks contain discernible, human-readable information organized into structured patterns of neurons. His team also discovered multimodal neurons within CLIP, OpenAI’s model for connecting text and images. These neurons activate in response to the same concept, such as “Spider-Man,” regardless of whether it appears as a photograph, a drawing, or text, a behavior that suggests a functional similarity to the human brain.
In 2020, Olah and six other OpenAI employees, including Dario Amodei, now Anthropic’s CEO, left the company over concerns regarding AI safety. The group subsequently cofounded Anthropic, a company that is now valued at $965 billion following a recent funding round and that confidentially filed for an initial public offering this week. Olah’s personal net worth is estimated at just under $8 billion, according to the Bloomberg Billionaires Index. His current work at Anthropic continues to advance mechanistic interpretability, focusing on reverse-engineering AI models to understand how specific clusters of artificial neurons influence outputs.
Olah’s call for external supervision of the AI industry contrasts sharply with views expressed by other prominent tech figures, such as Marc Andreessen. In his 2023 Techno-Optimist Manifesto, Andreessen dismissed concepts like “trust and safety” and “tech ethics” as part of a campaign against technological progress. Olah’s perspective, however, aligns with Anthropic’s core mission, which prioritizes AI safety and openly addresses the potential risks of the technology. It also resonates with Pope Leo XIV’s encyclical, *Magnifica Humanitas*, which proposes a moral framework for AI development, urging a “measured and vigilant approach” and emphasizing the primacy of human well-being over machines. In recognition of his influence, Time magazine included Olah in its TIME100 AI list for 2024, highlighting his ongoing efforts to ensure that AI systems are not merely apparently safe but demonstrably so, through a deeper understanding of how they work.


