Clue-instruct dataset and different models fine-tuned on it.
Andrea Zugarini
azugarini
AI & ML interests
Natural Language Processing, Language Models, Language Model Compression
Recent Activity
Upvoted a paper (6 days ago): The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Updated a dataset (16 days ago): expertai/PharmaER.IT
Updated a dataset (about 2 months ago): azugarini/crossword-clues-QA
Collections (2)
Collection of research on adapting tokenizers to specific domains and/or languages, with a special focus on sequence compression.
- Fast Vocabulary Transfer for Language Model Compression (Paper 2402.09977)
- Multi-Word Tokenization for Sequence Compression (Paper 2402.09949)
- Zero-Shot Tokenizer Transfer (Paper 2405.07883)
- Language Model Tokenizers Introduce Unfairness Between Languages (Paper 2305.15425)