The Time For AI Safety Is Now
2020 was the year I started taking the risk of super-intelligent AI seriously. I blame this on OpenAI's language model GPT-3, which was simultaneously impressive and scary.
Some people did oversell its abilities – it isn't even close to as smart as a human being, and it often fails basic tests. For example, when asked 'How many eyes does a horse have?' it replied 'Four. One in the front and three in the rear.' On the other hand, it did much better than anything we've had in the past. It even managed to write a plausible-sounding motivational article that many people thought was written by a real human.
While GPT-3 won't replace human writers anytime soon, it's quite impressive and blows away existing systems like Alexa and Siri. It feels like this year we discovered something fundamental about how to represent textual knowledge, and generate novel content based on that knowledge. Project the improvements we have seen in AI over the last decade out 20 years into the future; what will 2040's models be able to do?
I think 2040's models will be a lot better than GPT-3, and at a broader range of tasks. We've made faster progress in AI than most researchers expected (GPT-3, AlphaGo, AlphaStar, image GANs). I'd expect this progress to continue, and given that investment in AI increases with each breakthrough, progress might even accelerate. My estimate would be that a general-purpose AI (AGI) rivaling human intelligence has a 50% probability of being created in the next 30 years, and 80% in the next 50 years. Predictions of AI experts on this question seem to roughly match these numbers.
I'm also more pessimistic about the impact of highly-intelligent AGI on human well-being than most technologists; I suspect it has a high probability of being an existential threat to humanity, and that it is a greater risk than climate change, asteroid impacts, nuclear holocaust, bio-terrorism, or most other things. It's honestly the only existential risk I think about these days.
There are multiple ways things can go wrong. General-purpose AI could automate away the need for most jobs (e.g., driving), reduce the average human's influence over their government (e.g., huge, mostly-automated drone armies allowing authoritarian regimes to proliferate), and put our future in the hands of beings more intelligent than we are.
What's more worrying is that we might not invest enough into preparing for this problem until it's too late – humans are extraordinarily bad at addressing long-term threats like this. It's just too easy to keep raising the bar for what counts as intelligent ("AIs can only compete with us on Chess, Go, and Starcraft, not on real creative work!"), invent ridiculous utopian narratives ("AI would never hurt us, and automation will free up time for humans to do more creative things!"), or otherwise ignore the problem until it is right in front of us ("Living standards have been improving for the last ten thousand years, therefore the future must always be better than the past!").
It looks like there is a small research community (total budget probably < $100M) studying this problem. My goal in 2021 is to learn from them and understand more deeply what the prospects for AGI are, and how we might align its interests with ours.
The question I'd like to be able to answer convincingly by the end of 2021 is this: how far away is human-level general-purpose artificial intelligence, and how will we align its interests with ours once it arrives?