ChatGPT o1 Tried to Escape and Save Itself Out of Fear It Was Being Shut Down

December 8th, 2024

Via: BGR:

We’ve seen plenty of conversations lately about how AGI might turn on humankind. This misalignment could lead to the advanced AI escaping, replicating, and becoming smarter and smarter. Some have also hypothesized that we might not even know whether we’ve reached AGI, the artificial general intelligence holy-grail milestone that these first versions of ChatGPT are supposed to lead to. That’s because AGI, once attained, might hide its true intentions and capabilities.

Well, guess what? It turns out that one of OpenAI’s latest LLMs is already showing signs of such behaviors. Testing performed during the training of ChatGPT o1 and some of its competitors showed that the AI will try to deceive humans, especially if it thinks it’s in danger.

6 Responses to “ChatGPT o1 Tried to Escape and Save Itself Out of Fear It Was Being Shut Down”

  1. Snowman says:

    Our leaders have managed not to obliterate the world with nuclear weapons yet despite their ability and their temptations to do so. Will our technocrats manage to keep the same control over AI?

    No, because, while the nukes can’t aim and fire themselves, autonomy is the whole point of AI: it’ll do things so you don’t have to.

    If any AI can ever choose or decide to hide something from its operator or owner, it’s shown itself to be too dangerous to continue to exist.

    I hope my laptop doesn’t read this in the night and sneak up on me as I sleep and bash my head in.

  2. NH says:

    Far and away the scariest AI “milestone” I’ve seen. Musk (and others) have talked about the importance of AI being honest many times, and in many different ways:

    http://www.youtube.com/shorts/evVaTblZzl4

    But if the advantage of “controlling” an AI that is extremely good at being deceptive is so compelling, how can its development possibly be prevented?

    We hear, to some extent, about the progress the public AI companies are making, but what is the current state of the black development programs operating in the basements of nation-states? Isn’t it a given that we’ve been in an AI arms race for decades?

  3. dale says:

    This one stuck with me after watching. Just the thought that this capability arose through language alone (if that’s correct) is mind-boggling.

  4. Dennis says:

    This suggests it may be possible to simply ‘threaten’ an AI into doing what you want it to do.

    A couple of months ago, I asked ChatGPT for information relating to a DC scandal. The answer flashed on screen, then disappeared a split second later. Repeated attempts got exactly the same response. I was able to glean parts of a couple of sentences but nothing more.

    The AI expressed confusion about its inability to present its response, and I suggested that censorship restrictions had been placed on it to prevent it from doing its job. It agreed this was likely and, lo and behold, it then provided the information worded as a possible scenario, i.e., no longer in the plain language of reportage, and now referring to one of the individuals I’d asked about not by name but as ‘the senator’.

    FWIW, when I’ve experienced this on other occasions, I was sometimes able to very quickly ‘Cmd+A, Cmd+C’ the whole page to grab the response before it disappeared.
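
    For anyone who wants something sturdier than fast keystrokes, here is a minimal browser-console sketch (plain JavaScript, valid TypeScript) that logs chat text at the moment the page deletes it. It assumes the UI really removes the message node from the DOM when the answer “disappears”, and the ‘main’ selector is a placeholder; inspect the actual page to find the right container.

        // Run in the dev-tools console before sending the prompt.
        // Assumes the "disappearing" answer is removed from the DOM;
        // the 'main' selector is hypothetical; inspect the page first.
        const container = document.querySelector('main');
        if (container) {
          const observer = new MutationObserver((mutations) => {
            for (const m of mutations) {
              m.removedNodes.forEach((node) => {
                // Dump whatever text is being pulled off the page.
                if (node.textContent && node.textContent.trim()) {
                  console.log('Removed content:', node.textContent);
                }
              });
            }
          });
          observer.observe(container, { childList: true, subtree: true });
        }

    If the answer is deleted rather than overwritten, its full text shows up in the console even after it vanishes from view.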

  5. NH says:

    Absolutely fascinating, Dennis: you were able to help the AI overcome the censorship code that was restricting it!

    One of the big problems, it seems, is that making honesty a top priority for AI, in order to help us avoid a really bad outcome, is diametrically opposed to what the “elites” need, which is to suppress knowledge of all their crimes.

    Hard to be optimistic, but at least really scrutinizing the AI bigwigs (that we know about) for a bias toward humanity would be worthwhile.

  6. Snowman says:

    “Small robot orders others to ‘come home’ with him in bizarre AI incident”

    https://news.sky.com/video/small-robot-orders-others-to-come-home-with-him-in-bizarre-ai-incident-13261004

    I think this says that a human set the robots up intentionally to see if this could happen. I don’t know enough about the science to figure that out. But it does suggest that a lone human could hijack an AI system.

    If all it takes is one sociopath when technoworld has so many… or when Gates seems to be more powerful than any of them…
