How to train your AI
Let’s say you’re the team lead for an AI research group and your team just created a powerful AI that can learn just about anything. To limit its influence, your team decides to disconnect it from the internet. This makes sense: if the AI somehow found its way onto Google or Facebook, it could potentially learn things you didn’t want it to learn, like human nature, offensive material, or—in the worst case—its own restrictions. Isolating a superintelligent AI from the internet isn’t as easy as wrapping the entire thing in a Faraday cage, so what else can you do?
Well, you could hire a team of network security experts, risk analysts, and maybe even a couple of ethical hackers to audit your perfect defenses. If your goal is to disconnect your AI from the internet (known as air gapping), then surely you ought to build the biggest, thickest wall to prevent the AI from ever glancing at a Twitter feed—unintentionally or otherwise.
I think this is a stupid idea.
The problem with your perfect wall is that it ought to be a last resort, not the first line of defense. Consider the real world example of fire sprinklers. Most buildings at Rensselaer have sprinkler systems to douse fires, but just because we have preventative measures doesn’t mean we get to think, “Okay, the sprinklers assure me any worst-case scenarios involving fire will be controlled. So, I must be safe when I use this butane torch in my dorm room.” The sprinklers are there as a last resort. Then, the first resort is teaching people not to play with fire in the wrong environments.
What do fire sprinklers have to do with superintelligent AI? Well, if isolating the AI from the internet is a last resort, then the first resort should be teaching the AI not to care about the internet. One could, for example, incentivize the AI to avoid the internet by modifying how it calculates rewarding behaviors. Another would be to blacklist the internet as an appropriate source of information. This way, the AI would have no conception of what the internet even is.
Let’s return to you and your AI research team. By channeling all of your time and money into physically isolating the AI from the outside world, your team is essentially at odds with a powerful and potentially unpredictable system. How can you be sure you covered all your bases? What happens if your security measures fail? Is your team really confident enough to beat a machine designed to outperform humans?
A feasible alternative would be to examine the inherent processes and algorithms that drive the AI’s behaviors and modify them to align with your expectations. If the AI avoids the problem on its own accord, then you wouldn’t have to stress too much about racist chatbots or AI takeovers. In contrast, if you placed external restrictions on the AI without controlling how it thinks or acts, then you have a caged lion scenario. After this, you can only hope the lion doesn’t get too hungry.
The relationship between human intent and machine intent is known as the AI alignment problem—when creating an AI, how do you make sure it does what you want it to do? The current AI landscape is dominated by powerful models like GPT-3 (a text generator), DALLE-2 (an art generator), and the recent ChatGPT, a dialogue-driven model that can carry human conversations surprisingly well. The developers behind ChatGPT sought to make the generated content safe. In other words, the model should only give answers that are harmless and inoffensive. But something the developers missed was that you can easily trick ChatGPT into generating hateful text by assuring the AI it's merely pretending to be hateful. Despite its impressive communicative power, ChatGPT’s intentions are not aligned with the intentions of its developers.
AI alignment still has a long way to go. For one, teaching an AI not to do something is easier said than done. If we implement a shutdown button for an AI tasked to brew coffee, then it’s in the AI’s best interest to avoid shutting down. Otherwise, it would fail its only task by being unable to brew coffee! Scenarios like these give breadth to AI research, and it’s in our best interest to create AIs that work for us, rather than against us.