Reading List - a jinnovation Collection

Reading List

updated May 30, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 30

Note (TL;DR): Anthropic deliberately trains deceptive LLMs to test whether the deception persists through safety training.

On the Societal Impact of Open Foundation Models

Paper • 2403.07918 • Published Feb 27, 2024 • 17

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

Paper • 2310.17631 • Published Oct 26, 2023 • 35

Instruction Tuning for Large Language Models: A Survey

Paper • 2308.10792 • Published Aug 21, 2023 • 1

An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers

Paper • 2403.02839 • Published Mar 5, 2024 • 1

Holistic Safety and Responsibility Evaluations of Advanced AI Models

Paper • 2404.14068 • Published Apr 22, 2024

A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Paper • 2310.17750 • Published Oct 26, 2023 • 9