Hugging Face
jinnovation's Collections
Reading List

Updated May 30, 2024

  • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

    Paper • 2401.05566 • Published Jan 10, 2024

    Note (TL;DR): Anthropic deliberately trains deceptive LLMs and tests whether that behavior survives standard safety training; the paper reports that it persists.

  • On the Societal Impact of Open Foundation Models

    Paper • 2403.07918 • Published Feb 27, 2024

  • JudgeLM: Fine-tuned Large Language Models are Scalable Judges

    Paper • 2310.17631 • Published Oct 26, 2023

  • Instruction Tuning for Large Language Models: A Survey

    Paper • 2308.10792 • Published Aug 21, 2023

  • An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers

    Paper • 2403.02839 • Published Mar 5, 2024

  • Holistic Safety and Responsibility Evaluations of Advanced AI Models

    Paper • 2404.14068 • Published Apr 22, 2024

  • A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

    Paper • 2310.17750 • Published Oct 26, 2023