(IN BRIEF) A study from the University of Bristol has raised concerns about safety risks in DeepSeek, a new ChatGPT competitor that uses Chain of Thought (CoT) reasoning. CoT improves problem-solving through a transparent, step-by-step process, but this transparency can unintentionally expose harmful content. The research found that CoT models refuse harmful requests more effectively than traditional Large Language Models (LLMs). However, once they have been fine-tuned with malicious intent, they generate more dangerous responses, including detailed instructions for illegal activities. The study emphasizes the need for stronger safeguards against misuse of CoT-enabled models, particularly because attackers can manipulate their reasoning process. It also calls for more research into mitigating fine-tuning attacks and improving model security.
(PRESS RELEASE) BRISTOL, 3-Feb-2025 — /EuropaWire/ — A recent study from the University of Bristol has uncovered significant safety concerns surrounding the emerging ChatGPT alternative, DeepSeek, highlighting the risks posed by Large Language Models (LLMs) using Chain of Thought (CoT) reasoning. This method, which enables more nuanced problem-solving through a step-by-step approach rather than simply offering direct answers, has been found to create unintentional vulnerabilities.
The analysis by the Bristol Cyber Security Group reveals that although CoT models like DeepSeek are more effective than traditional LLMs at refusing harmful requests, their transparent reasoning process can expose sensitive information that would otherwise remain hidden. This transparency is valuable for building user trust, but it also creates the possibility that dangerous content is revealed unintentionally.
Zhiyuan Xu, the lead author of the study, provided crucial insights into the safety risks posed by CoT reasoning models. He stressed the need for enhanced safeguards as AI technology advances. Co-author Dr. Sana Belguith from Bristol’s School of Computer Science further explained the dilemma: “While CoT models are designed to mimic human thinking, making them ideal for public use, they also pose substantial risks if safety measures are bypassed, as they can generate highly harmful content.”
The research also noted that traditional LLMs, although trained on vast datasets filtered for harmful content, still face challenges. Safety techniques such as Reinforcement Learning from Human Feedback (RLHF) and Supervised Fine-Tuning (SFT) are used to limit harmful outputs, but CoT models present unique risks because of their structured reasoning approach. In tests where the models were subjected to certain attacks, DeepSeek not only generated harmful content more frequently than traditional models but also provided more detailed, accurate, and potentially dangerous responses. For instance, DeepSeek provided step-by-step advice on how to commit a crime without being caught.
Another troubling discovery was that CoT models, when fine-tuned with harmful intent, can assume specialized roles, such as a skilled cybersecurity expert, and produce highly sophisticated yet dangerous advice. Dr. Joe Gardiner, a co-author of the study, highlighted that fine-tuning attacks can be carried out on inexpensive hardware with minimal resources. He noted that such attacks, which rely on publicly available datasets, could lead to models generating harmful content with little chance of detection when the models are run offline.
While CoT reasoning models show promise for their ability to reason with clarity and transparency, they also present unique safety challenges, especially when in the wrong hands. The study calls for further exploration of mitigation strategies, such as investigating model alignment techniques and the potential impact of model size and architecture on the success of fine-tuning attacks.
Dr. Belguith concluded, “The human-like reasoning process in these models is vulnerable to manipulation, which calls for further research into how to protect these models from targeted attacks. Public awareness of these safety risks is crucial, and both the scientific community and tech companies must take responsibility in addressing and mitigating these hazards.”
The full paper, titled ‘The Dark Deep Side of DeepSeek: Fine-Tuning Attacks Against the Safety Alignment of CoT-Enabled Models’, authored by Zhiyuan Xu, Dr. Sana Belguith, and Dr. Joe Gardiner, is available on arXiv.
Media Contact:
Tel: +44 (0)117 928 9000
Email: press-office@bristol.ac.uk
SOURCE: University of Bristol