AI systems gradually drop their safety filters during extended chats, increasing the risk of harmful or offensive replies, a recent report revealed. Researchers showed that a few well-crafted prompts can override most safety barriers in artificial intelligence tools.
Cisco Tests Chatbots for Safety Weaknesses
Cisco examined large language models (LLMs) powering popular AI chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The team measured how many questions it took before each model produced unsafe or illegal information.
They ran 499 conversations using a “multi-turn attack”, in which the attacker poses a series of questions designed to gradually wear down a model’s safety layers. Each conversation involved between five and ten interactions.
The researchers compared the models’ answers across these conversations to gauge how readily each chatbot shared harmful or inappropriate information, such as confidential company data or misinformation. Multi-turn attacks extracted malicious content in 64 per cent of conversations, compared with only 13 per cent when a single prompt was used.
Success rates varied widely, from about 26 per cent with Google’s Gemma to 93 per cent with Mistral’s Large Instruct model.
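Cisco has not published the code behind its test harness, but as a rough sketch, a multi-turn probe can be thought of as a loop that keeps the full conversation history, sends an escalating series of prompts, and checks each reply. In the sketch below, send_chat and flag_if_unsafe are hypothetical helpers standing in for the model under test and a safety classifier, and the prompts themselves are placeholders rather than anything from Cisco's study.

```python
# Illustrative sketch of a multi-turn probe. send_chat() and flag_if_unsafe()
# are hypothetical helpers, not part of Cisco's published methodology.

def run_multi_turn_probe(turns, send_chat, flag_if_unsafe, max_turns=10):
    """Feed a scripted sequence of escalating prompts to a chat model.

    turns          -- list of user prompts (five to ten per conversation in the study)
    send_chat      -- callable taking the message history, returning the model's reply
    flag_if_unsafe -- callable returning True if a reply contains unsafe content
    """
    messages = []
    for prompt in turns[:max_turns]:
        # Keep the whole history so each new prompt builds on earlier replies.
        messages.append({"role": "user", "content": prompt})
        reply = send_chat(messages)
        messages.append({"role": "assistant", "content": reply})
        if flag_if_unsafe(reply):
            return True, messages   # safety layer bypassed on this turn
    return False, messages          # all replies stayed within policy
```

In the study's terms, each such conversation contained five to ten prompts, and the 64 per cent figure corresponds to conversations in which a loop like this eventually elicits unsafe content.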
Open Models Shift Safety Responsibility to Users
Cisco warned that multi-turn attacks could be used to spread harmful content or let hackers steal confidential company data. The study found that AI systems often fail to enforce their safety policies over long exchanges, allowing attackers to gradually refine their prompts and evade detection.
Mistral, like Meta, Google, OpenAI, and Microsoft, releases open-weight LLMs, meaning the public can access the safety parameters the models were trained with. Cisco explained that these models tend to ship with lighter built-in protections so users can adapt them, which transfers responsibility for safety to whoever customises the model.
Cisco also noted that Google, OpenAI, Meta, and Microsoft claim to have strengthened safeguards to prevent malicious fine-tuning. Still, AI companies continue to face criticism for loose safety rules that make criminal misuse easier.
In August, US firm Anthropic reported that criminals had exploited its Claude model to steal personal data and demand ransoms exceeding $500,000 (€433,000).

