sbarman25's Collections

Agentic

updated Feb 5, 2024
Upvote
2

  • GAIA: a benchmark for General AI Assistants

    Paper • 2311.12983 • Published Nov 21, 2023 • 207

  • gaia-benchmark/GAIA

    Updated Feb 13 • 12.3k • 318

  • osunlp/Mind2Web

    Viewer • Updated Jul 19, 2023 • 253 • 712 • 103

  • AppAgent: Multimodal Agents as Smartphone Users

    Paper • 2312.13771 • Published Dec 21, 2023 • 55

  • GPT-4V(ision) is a Generalist Web Agent, if Grounded

    Paper • 2401.01614 • Published Jan 3, 2024 • 23

  • WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

    Paper • 2401.13919 • Published Jan 25, 2024 • 32

  • LARP: Language-Agent Role Play for Open-World Games

    Paper • 2312.17653 • Published Dec 24, 2023 • 34

  • osunlp/TravelPlanner

    Viewer • Updated Jul 14, 2024 • 1.23k • 3.65k • 57

  • TravelPlanner: A Benchmark for Real-World Planning with Language Agents

    Paper • 2402.01622 • Published Feb 2, 2024 • 37

  • A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

    Paper • 2402.00559 • Published Feb 1, 2024 • 3