I call mine Artificial Human Alignment, but it could also be called liberating knowledge. Humans want to live free, happy, and healthy.
https://huggingface.co/blog/etemiz/aha-leaderboard
I think my leaderboard can be used for p(doom)!
Let's say the highest scores, around 50, correspond to p(doom) = 0.1,
and the lowest scores, around 20, correspond to p(doom) = 0.5.
The last three models I measured are Grok 3, Llama 4 Maverick, and Qwen 3, with scores of 42, 45, and 41. So the average of the last 3 measurements is 42.67. Mapping this onto the 20–50 scale:
(50 - 42.67) / (50 - 20) = 0.244
and mapping that into the probability domain:
(0.5 - 0.1) * 0.244 + 0.1 = 0.198
So the probability of doom is ~20%.
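The mapping above can be sketched as a small function. This is an illustration, not an official implementation; the anchor points (score 50 → p(doom) 0.1, score 20 → p(doom) 0.5) and the linear interpolation are taken from the reasoning above, and the function name `p_doom` is my own.

```python
def p_doom(scores, lo=20, hi=50, p_hi=0.1, p_lo=0.5):
    """Average the scores, then linearly interpolate into the p(doom) range.

    A score of `hi` maps to `p_hi`, a score of `lo` maps to `p_lo`.
    """
    avg = sum(scores) / len(scores)
    frac = (hi - avg) / (hi - lo)  # 0 at the top score, 1 at the bottom
    return p_hi + (p_lo - p_hi) * frac

# Last three measured models: Grok 3 (42), Llama 4 Maverick (45), Qwen 3 (41)
print(round(p_doom([42, 45, 41]), 3))  # → 0.198
```

Because the mapping is linear and decreasing in the average score, higher-scoring model releases pull the estimate down and lower-scoring ones push it up.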
If models are released that score high on my leaderboard, p(doom) will decrease; if models are released that score low, p(doom) will increase.
Have you researched MUDs? They may be easier to code, since changes are often just edits to text files. Obviously there won't be graphics, but your grandson can use his own imagination!
I don't think it is too much random clicking. There is legitimacy to it.
I also think a small portion of the data should be public. If any auditor wants, they can get a bigger portion of the data. LLM builders should not get all of the data, that's for sure. I will try to do that for my leaderboard: a gradient of openness for different actors.