Steven Goldfeather

treehugg3

AI & ML interests

None yet

Recent Activity

reacted to merterbak's post with 🔥 27 days ago

OpenAI has released BrowseComp, an open-source benchmark designed to evaluate the web-browsing capabilities of AI agents. The dataset, comprising 1,266 questions, challenges AI models to navigate the web and uncover complex and obscure information. Crafted by human trainers, the questions are intentionally difficult: unsolvable by another person in under ten minutes, and beyond the reach of existing models such as ChatGPT (with and without browsing) and an early version of OpenAI's Deep Research tool.

Blog Post: https://openai.com/index/browsecomp/
Paper: https://cdn.openai.com/pdf/5e10f4ab-d6f7-442e-9508-59515c65e35d/browsecomp.pdf
Code in simple-evals repo: https://github.com/openai/simple-evals

new activity 27 days ago

meta-llama/Llama-4-Scout-17B-16E-Instruct:Object Detection?

new activity 27 days ago

mistral-community/pixtral-12b:How do I load the model quantized?


Organizations

None yet
