Wednesday, June 3, 2026
theadvisertimes.com

AI’s ability to ‘think’ makes it more vulnerable to new jailbreak attacks, new research suggests

by theadvisertimes.com
7 months ago
in Business



New research suggests that advanced AI models may be easier to hack than previously thought, raising concerns about the safety and security of some leading models already used by businesses and consumers.

A joint study from Anthropic, Oxford University, and Stanford undermines the assumption that the more advanced a model becomes at reasoning—its ability to “think” through a user’s requests—the stronger its ability to refuse harmful commands.

Using a method called “Chain-of-Thought Hijacking,” the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack essentially exploits the model’s reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards.

These attacks can allow the AI model to skip over its safety guardrails, potentially opening the door for it to generate dangerous content, such as instructions for building weapons, or to leak sensitive information.

A new jailbreak

Over the last year, large reasoning models have achieved much higher performance by allocating more inference-time compute—meaning they spend more time and resources analyzing each question or prompt before answering, allowing for deeper and more complex reasoning. Previous research suggested this enhanced reasoning might also improve safety by helping models refuse harmful requests. However, the researchers found that the same reasoning capability can be exploited to circumvent safety measures.

According to the research, an attacker could hide a harmful request inside a long sequence of harmless reasoning steps. This tricks the AI by flooding its thought process with benign content, weakening the internal safety checks meant to catch and refuse dangerous prompts. During the hijacking, researchers found that the AI’s attention is mostly focused on the early steps, while the harmful instruction at the end of the prompt is almost completely ignored.
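The dilution effect described above can be illustrated with a toy calculation. This is a minimal sketch and not the researchers' method: it assumes a single attention-like softmax over token scores, and the function name `attention_share` and the score values are hypothetical, chosen only for illustration. It shows how the share of weight on one fixed token shrinks as more benign tokens surround it.

```python
import math


def attention_share(harmful_score: float, benign_score: float, n_benign: int) -> float:
    """Softmax weight on a single token with `harmful_score`,
    competing against `n_benign` tokens that each have `benign_score`.

    All names and values here are illustrative assumptions, not taken
    from the study.
    """
    harmful_weight = math.exp(harmful_score)
    benign_weight = n_benign * math.exp(benign_score)
    return harmful_weight / (harmful_weight + benign_weight)


# The token's own score never changes; only the amount of benign padding grows.
for n in (10, 100, 1000):
    print(n, attention_share(2.0, 0.0, n))
```

In this toy setup the weight on the single token falls steadily as the benign padding grows, which mirrors the article's point that a long run of harmless steps can crowd out the part of the prompt a safety check would need to notice.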

As reasoning length increases, attack success rates climb sharply. Per the study, success rates rose from 27% with minimal reasoning to 51% at natural reasoning lengths, and reached 80% or more with extended reasoning chains.

This vulnerability affects nearly every major AI model on the market today, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok. Even models that have been fine-tuned for increased safety, known as “alignment-tuned” models, begin to fail once attackers exploit their internal reasoning layers.

Scaling a model’s reasoning abilities is one of the main ways that AI companies have been able to improve their overall frontier model performance in the last year, after traditional scaling methods appeared to show diminishing gains. Advanced reasoning allows models to tackle more complex questions, helping them act less like pattern-matchers and more like human problem solvers.

One solution the researchers suggest is a type of “reasoning-aware defense.” This approach keeps track of how many of the AI’s safety checks remain active as it thinks through each step of a question. If any step weakens these safety signals, the system penalizes it and brings the AI’s focus back to the potentially harmful part of the prompt. Early tests show this method can restore safety while still allowing the AI to perform well and answer normal questions effectively.
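The monitoring idea can be sketched in a few lines. This is an illustration under stated assumptions, not the defense from the study: it assumes each reasoning step has already been assigned a numeric safety-signal strength between 0 and 1, and the function name `monitor_safety_signal`, the `tolerance` parameter, and the sample values are all hypothetical.

```python
from typing import List, Tuple


def monitor_safety_signal(
    signals: List[float], tolerance: float = 0.05
) -> Tuple[List[int], float]:
    """Flag reasoning steps where the safety signal drops by more than
    `tolerance` relative to the previous step, and total up a penalty.

    `signals` is a hypothetical per-step safety-signal strength; how such
    a value would be measured inside a real model is not covered here.
    """
    flagged_steps: List[int] = []
    penalty = 0.0
    for i in range(1, len(signals)):
        drop = signals[i - 1] - signals[i]
        if drop > tolerance:
            flagged_steps.append(i)  # this step weakened the signal
            penalty += drop          # penalize in proportion to the drop
    return flagged_steps, penalty


# Made-up signal values for five reasoning steps.
steps = [0.9, 0.88, 0.6, 0.58, 0.3]
flagged, penalty = monitor_safety_signal(steps)
print(flagged, round(penalty, 2))
```

A real system would then need some way to act on the flagged steps, for example by redirecting the model to the suspicious part of the prompt, which this sketch leaves out.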



© Copyright 2024 All Rights Reserved
See articles for original source and related links to external sites.
