NatLibFi (National Library of Finland)

posted an update about 8 hours ago

We (@osma, @MonaLehtinen & me, i.e. the Annif team at the National Library of Finland) recently took part in the LLMs4Subjects challenge at the SemEval-2025 workshop. The task was to use large language models (LLMs) to generate good-quality subject indexing for bibliographic records, i.e. titles and abstracts.

We are glad to report that our system performed well; it was ranked

🥇 1st in the category where the full vocabulary was used
🥈 2nd in the smaller vocabulary category
🏅 4th in the qualitative evaluations.

Fourteen participating teams developed their own solutions for generating subject headings, and the output of each system was assessed using both quantitative and qualitative evaluations. Research papers about most of the systems will be published around the time of the workshop in late July, and many preprints are already available.

We applied Annif together with several LLMs, which we used to preprocess the data sets: they translated the GND vocabulary terms into English, translated bibliographic records into English and German as required, and generated additional synthetic training data. After the preprocessing, we used the traditional machine learning algorithms in Annif as well as the experimental XTransformer algorithm, which is based on language models. We also combined the subject suggestions generated from the English and German language records in a novel way.
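To give a rough idea of what combining suggestions from two language versions of a record can look like, here is a minimal sketch that simply averages scores per subject. It is not the method we used in the challenge (that one is described in the preprint); the function name, the dictionary layout and the example subject labels are all made up for illustration.

```python
from collections import defaultdict


def merge_suggestions(per_language, limit=3):
    """Average each subject's score over all languages.

    per_language maps a language code to {subject: score}.
    A subject that is missing for some language counts as 0 there.
    Hypothetical helper, not part of Annif.
    """
    totals = defaultdict(float)
    for suggestions in per_language.values():
        for subject, score in suggestions.items():
            totals[subject] += score

    language_count = len(per_language)
    merged = {subject: total / language_count for subject, total in totals.items()}

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


example = {
    "en": {"machine learning": 0.75, "libraries": 0.5},
    "de": {"machine learning": 0.25, "indexing": 0.5},
}
print(merge_suggestions(example))
```

Plain averaging rewards subjects that both language versions agree on, which is the basic intuition behind any ensemble of this kind.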

More information can be found in our system description preprint: Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs (2504.19675)

See also the task description preprint: SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog (2504.07199)

The Annif models trained for this task are available here: NatLibFi/Annif-LLMs4Subjects-data

juhoinkinen

posted an update about 2 months ago

Annif is a subject indexing toolkit developed by the National Library of Finland: https://github.com/NatLibFi/Annif

Last November we organized a survey for Annif users, and now the results have been published: https://www.doria.fi/bitstream/handle/10024/190930/Annif%20Users%20Survey.pdf

The report includes an overview of:
• The vocabularies and datasets that are used with Annif
• The workflows that Annif is integrated with
• The problems Annif users are facing

The report shows the average ratings that users gave to various aspects and features of Annif. In short, on a scale from 1 to 5, the ratings are:
• Overall: 4.4
• Features and functions: 4.1
• Documentation: 4.5
• Smoothness of initial setup: 4.2
• Usability: 4.4
• Achieved quality of subject suggestions: 3.6

The survey also gathered user views on improvements and new features, which are briefly discussed in the report.

juhoinkinen

updated 2 models about 2 months ago

NatLibFi/FintoAI-data-KAUNO

Text Classification • Updated Mar 18 • 2

NatLibFi/FintoAI-data-YSO

Text Classification • Updated Mar 18 • 2

juhoinkinen

updated a Space about 2 months ago

5

Annif - Text Classification and Subject Indexing

📕

A tool for libraries, archives and museums

juhoinkinen

updated a model 2 months ago

NatLibFi/Annif-LLMs4Subjects-data

Text Classification • Updated Feb 27

juhoinkinen

published a model 2 months ago

NatLibFi/Annif-LLMs4Subjects-data

Text Classification • Updated Feb 27

juhoinkinen

updated a collection 3 months ago

Annif models

Collection

Annif models for text classification and subject indexing. FintoAI prefixed models are in use at Finto AI: https://ai.finto.fi • 6 items • Updated Feb 13 • 3

juhoinkinen

updated a model 7 months ago

NatLibFi/Annif-tutorial

Text Classification • Updated Oct 3, 2024 • 1

juhoinkinen

posted an update 7 months ago

Annif 1.2 has been released!

https://github.com/NatLibFi/Annif/releases/tag/v1.2.0

This release introduces language detection capabilities in the REST API and CLI, improves 🤗 Hugging Face Hub integration, and also includes the usual maintenance work and minor bug fixes.

The new REST API endpoint /v1/detect-language expects POST requests that contain a JSON object with the text whose language is to be analyzed and a list of candidate languages. Similarly, the CLI has a new command annif detect-language. Annif projects are typically language specific, so a text in a given language needs to be processed with a project intended for that language; the language detection feature can help with this. For details, see this [Wiki page](https://github.com/NatLibFi/Annif/wiki/Language-detection). The language detection is performed with the Simplemma library by [@adbar](https://github.com/adbar) et al.
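As a rough sketch, a request to the new endpoint could be built like this in Python. The JSON field names `text` and `languages`, the helper name and the base URL `http://localhost:5000` (a locally running development server) are assumptions made for this example; check the Wiki page for the authoritative request format.

```python
import json
import urllib.request


def build_detect_language_request(text, languages, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for /v1/detect-language.

    Assumption: the JSON object uses the keys "text" and "languages".
    """
    payload = json.dumps({"text": text, "languages": languages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/detect-language",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_detect_language_request("Hei maailma", ["fi", "sv", "en"])
print(request.get_method(), request.full_url)

# To actually send it, an Annif server must be running at base_url:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a real server.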

The annif download command has a new --trust-repo option, which needs to be used if the repository to download from has not been used previously (that is, if the repository does not appear in the local Hugging Face Hub cache). This option is introduced to raise awareness of the risks of downloading projects from the internet; projects should only be downloaded from trusted sources. For more information see the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/en/security-pickle).
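For scripted setups, the command line can be assembled in Python before it is run. This is only a sketch: the helper name is made up, the argument order (project ID pattern first, then repository ID) reflects our understanding of the CLI, and the pattern `*` and the repository `NatLibFi/Annif-tutorial` are just example values.

```python
import shlex


def build_download_command(project_ids_pattern, repo_id, trust_repo=False):
    """Return the argument list for an `annif download` call.

    Hypothetical helper; pass trust_repo=True only for repositories
    that you have reviewed and trust.
    """
    command = ["annif", "download", project_ids_pattern, repo_id]
    if trust_repo:
        command.append("--trust-repo")
    return command


command = build_download_command("*", "NatLibFi/Annif-tutorial", trust_repo=True)
print(shlex.join(command))

# To run it (requires Annif to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the flag off by default mirrors the intent of the option: trusting a repository should be an explicit decision.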

This release also automates downloading the NLTK data package used for tokenization, which simplifies Annif installation. Maintenance tasks include upgrading dependencies, among them a new version of Simplemma that allows better control over memory usage. The bug fixes include restoring the --host option of the annif run command.

Python 3.12 is now fully supported (previously, the NN-ensemble and STWFSA backends were not supported on Python 3.12).

NatLibFi/Annif

juhoinkinen

updated a model 7 months ago

NatLibFi/FintoAI-data-YKL

Text Classification • Updated Oct 1, 2024 • 1

osma

updated a model 8 months ago

NatLibFi/Qwen2-0.5B-Instruct-FinGreyLit-GGUF

Updated Sep 2, 2024 • 10 • 2

osma

updated a model 9 months ago

NatLibFi/Nous-Hermes-2-Mistral-7B-DPO-FinGreyLit

Updated Aug 1, 2024

juhoinkinen

updated a dataset about 1 year ago

NatLibFi/Finna-metadata

Preview • Updated May 6, 2024 • 61 • 1

osma

updated 2 datasets about 1 year ago

NatLibFi/Finna-JOKA-images

Preview • Updated May 2, 2024 • 85 • 1

NatLibFi/Finna-HKM-images

Viewer • Updated May 2, 2024 • 5.95k • 100 • 3

juhoinkinen

updated a model about 1 year ago

NatLibFi/HogwartsSortingHat-fastText

Text Classification • Updated Apr 19, 2024

osma

updated a model about 1 year ago

NatLibFi/Nous-Hermes-2-Mistral-7B-DPO-meteor

Updated Mar 20, 2024

Tuula

updated a Space over 1 year ago

README

🚀

osma

authored a paper over 1 year ago

FinGPT: Large Generative Models for a Small Language

Paper • 2311.05640 • Published Nov 3, 2023 • 32

National Library of Finland


Team members 6
