OpenAI is paying researchers to stop superintelligent AI from going rogue
OpenAI is offering $10 million in grants to support technical research on how to control artificial intelligence systems that are far smarter than humans.
The company hopes the results of the grants, dubbed Superalignment Fast Grants, will shed light on how strong models generalize from weak supervision.
The AI lab also hopes to understand how AI systems can be used to evaluate the outputs of newer AI systems, and how to build an AI lie detector.
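To make the weak-to-strong idea concrete, here is a toy sketch in Python using scikit-learn; the models and dataset are stand-ins for illustration, not anything OpenAI has published. A small "weak" model labels a pool of data, a larger "strong" model is trained on those imperfect labels, and its accuracy is compared against a version trained on ground truth:

```python
# Illustrative weak-to-strong generalization experiment (not OpenAI's code).
# A small "weak" model labels data; a larger "strong" model is trained on
# those imperfect labels. We then check how close the strong student gets
# to a strong model trained directly on ground truth.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20, n_informative=8, random_state=0)
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, train_size=200, random_state=0)  # tiny labeled set
X_pool, X_test, y_pool, y_test = train_test_split(X_rest, y_rest, test_size=1000, random_state=0)

weak = LogisticRegression(max_iter=1000).fit(X_sup, y_sup)   # weak supervisor
weak_labels = weak.predict(X_pool)                           # imperfect pseudo-labels

student = GradientBoostingClassifier(random_state=0).fit(X_pool, weak_labels)  # strong student
ceiling = GradientBoostingClassifier(random_state=0).fit(X_pool, y_pool)       # ground-truth ceiling

print("weak supervisor:", accuracy_score(y_test, weak.predict(X_test)))
print("strong-on-weak: ", accuracy_score(y_test, student.predict(X_test)))
print("strong ceiling: ", accuracy_score(y_test, ceiling.predict(X_test)))
```

The open question the grants target is, roughly, how to close the gap between the "strong-on-weak" score and the ceiling.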
Hard to properly evaluate millions of lines of code
OpenAI claims that superhuman AI systems will be too complex for humans to fully understand. If a model generates a million lines of complicated code, we won’t be able to reliably evaluate whether the code is safe or dangerous to run.
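One commonly discussed mitigation is triage: rather than reading every line, a trusted evaluator scores chunks of generated code and surfaces only the risky ones for human review. The sketch below is purely illustrative; the regex "evaluator" is a crude stand-in for a learned safety classifier, and none of the names come from an actual OpenAI tool:

```python
# Toy sketch of scalable oversight for generated code (hypothetical design):
# a trusted evaluator scores chunks of a huge codebase so humans only
# review the fraction flagged as risky.
import re

# Stand-in for a trusted evaluator model; in practice this would be a
# learned safety classifier, not a handful of regexes.
RISKY_PATTERNS = [r"\bexec\(", r"\beval\(", r"subprocess", r"os\.system", r"pickle\.loads"]

def risk_score(chunk: str) -> float:
    """Return a crude 0..1 risk score for one chunk of generated code."""
    hits = sum(bool(re.search(p, chunk)) for p in RISKY_PATTERNS)
    return min(1.0, hits / 2)

def triage(code: str, chunk_lines: int = 50, threshold: float = 0.5) -> list[tuple[int, float]]:
    """Split code into chunks and return (start_line, score) for risky ones."""
    lines = code.splitlines()
    flagged = []
    for start in range(0, len(lines), chunk_lines):
        chunk = "\n".join(lines[start:start + chunk_lines])
        score = risk_score(chunk)
        if score >= threshold:
            flagged.append((start + 1, score))
    return flagged  # humans review only these chunks, not millions of lines

generated = "import os\n" * 49 + "os.system('rm -rf /tmp/cache')\n"
print(triage(generated))
```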
In a post on X, OpenAI wrote: "Figuring out how to ensure future superhuman AI systems are aligned and safe is one of the most important unsolved technical problems in the world. But we think it is a solvable problem. There is lots of low-hanging fruit, and new researchers can make enormous contributions!"
So while current systems rely on human supervision, in the future human brainpower alone may no longer be enough. OpenAI therefore wants a head start on finding ways for humans to remain effectively in charge.
Who can apply for the OpenAI grants?
The grants are mainly intended for academic labs, non-profits, and individual researchers. OpenAI is also sponsoring a one-year $150K OpenAI Superalignment Fellowship for graduate students.
The company’s own research identified seven practices for keeping AI systems safe and accountable, and it wants to fund further work on the open questions that emerged from that effort.
As part of the Superalignment program, it also aims to award Agentic AI Research Grants of between $10,000 and $100,000 for researchers to investigate the impacts of agentic AI systems and the practices that make them safe.
What is an agentic AI system?
OpenAI refers to this class of AI as agentic AI systems: systems capable of a wide range of actions and reliable enough that, in certain circumstances, a user could trust them to autonomously act on complex goals on their behalf.
You could, for instance, ask an agentic personal assistant for help baking a cake, and it would not only produce a recipe but also make sure all the necessary ingredients are ordered and delivered to your home in time.
OpenAI researchers said society will only be able to harness the full benefits of agentic AI systems if it can make them safe by mitigating their failures, vulnerabilities, and abuses.
Here the company is particularly interested in how to evaluate whether an agentic AI system is appropriate for a given use case, when actions should require explicit human approval, and how to design such systems so that we can inspect their internal reasoning.
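A simple way to picture the human-approval question is a gate in front of the agent's actions. The following sketch is hypothetical; the action names, risk list, and logging scheme are invented for illustration. Low-risk actions run autonomously, high-risk ones pause for explicit sign-off, and the agent's stated reasoning is logged so it stays inspectable:

```python
# Hypothetical approval gate for an agentic system (illustrative only).
from dataclasses import dataclass
from typing import Callable

HIGH_RISK = {"send_payment", "delete_data", "send_email"}  # invented risk list

@dataclass
class Action:
    name: str
    args: dict
    reasoning: str  # the agent's own explanation, logged for inspection

def execute(action: Action, approve: Callable[[Action], bool], audit_log: list) -> str:
    """Run low-risk actions autonomously; gate high-risk ones on a human."""
    if action.name in HIGH_RISK and not approve(action):
        audit_log.append((action, "rejected"))
        return f"{action.name}: rejected by human"
    audit_log.append((action, "executed"))
    return f"{action.name}: executed"

# Example: a console prompt stands in for a real approval UI.
ask_human = lambda a: input(f"Approve {a.name} {a.args}? [y/N] ").strip().lower() == "y"

log: list = []
print(execute(Action("fetch_recipe", {"dish": "cake"}, "user asked for a recipe"), ask_human, log))
print(execute(Action("send_payment", {"amount": 42}, "ordering ingredients"), ask_human, log))
```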
What it means for the future
The AI tools currently available to the public are certainly impressive, but they are not yet at the level of superintelligence OpenAI is referring to: AI that is vastly smarter than humans.
However, thanks to CEO Sam Altman we know the company is already working on its new GPT-5 model, which he says could show hints of superintelligence.
OpenAI anticipates that superintelligence could be developed within the next ten years. How worried we should be may well depend on the results of these latest research grants.