arxiv:2504.10277

RealHarm: A Collection of Real-World Language Model Application Failures

Published on Apr 14 · Submitted by pierlj on Apr 16

Abstract

Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.

Community

Take-home message:

  • RealHarm is a collection of problematic interactions with AI agents and chatbots. It is built from real conversations collected online (from the AI Incident Database, among other sources).
  • We built an evidence-based taxonomy from the observed conversations.
  • The most common hazard category is misinformation, while the main consequence for model deployers is reputational damage.
  • Existing safeguard systems are not able to catch these incidents, often struggling to understand the conversational context.

thanks for sharing @alexcombessie and team!


