Collection tutorial suggestions
#2 by Wauplin - opened
Hey @davanstrien ! I've read your collection tutorial and have a few minor comments:
- Usage of notebook_login is (softly) deprecated in huggingface_hub, as it's a simple alias for huggingface_hub.login(). So it's best to showcase login instead now. Exact same syntax and result, but the advantage is that if someone copy-pastes it into a script, it still works :)
- Instead of computing the 10% threshold and then cutting at 13, it's also possible to directly sort the list and then cut. Something like this should work:
datasets = sorted(datasets, key=lambda item: item.likes, reverse=True)[:math.ceil(len(datasets) * 0.10)]
The advantage is that you don't need the "get_threshold" method. The drawback is that datasets with an equal number of likes are separated in an arbitrary way.
- Typo: "The existed_ok parameter allows us to specify" => the parameter is exists_ok (you have it right in the code snippet, but there's a typo in the text).
- Also, the last section shows the URL of the created collection, but the link is broken (it shows https://huggingface.co/collections/librarian-bots/top-10-instruction-tuning-datasets-65117495134fd906b070c410, which gives error 400). The correct URL seems to be https://huggingface.co/collections/librarian-bots/top-10-instruction-tuning-datasets-65117eeaca29f41ae7ae39fe.
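To make the sort-and-cut trade-off from the second point concrete, here is a minimal sketch. It does not call the Hub: FakeDataset, top_fraction, above_threshold and the like counts are all made up for illustration, and the threshold filter is only my guess at what a threshold-based approach might look like, not the tutorial's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a Hub dataset entry; only `id` and `likes` matter here."""
    id: str
    likes: int


def top_fraction(datasets, fraction=0.10):
    """Sort by likes (descending) and keep the top `fraction` of items, rounding up."""
    keep = math.ceil(len(datasets) * fraction)
    return sorted(datasets, key=lambda item: item.likes, reverse=True)[:keep]


def above_threshold(datasets, threshold):
    """Assumed threshold-style alternative: keep every item with at least `threshold` likes."""
    return [d for d in datasets if d.likes >= threshold]


# Made-up data: 20 datasets, two of which tie at 13 likes.
likes = [50, 13, 13, 9, 8, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0, 0, 0]
datasets = [FakeDataset(f"user/dataset-{i}", n) for i, n in enumerate(likes)]

top = top_fraction(datasets)
print([d.id for d in top])
print([d.id for d in above_threshold(datasets, 13)])
```

With this made-up data, the sort-and-cut version keeps exactly 2 items and drops one of the two datasets tied at 13 likes, while the threshold filter keeps all 3. Python's sorted is stable, so which tied item survives depends only on the input order, which is the "arbitrary" separation mentioned above.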
Thanks for those suggestions :)