Language Technology Research Group at the University of Helsinki

university

https://blogs.helsinki.fi/language-technology/

AI & ML interests

At the University of Helsinki, we focus on:
- NLP for morphologically-rich languages
- Cross-lingual NLP
- NLP in the humanities

Recent Activity

jrvc updated a dataset 22 days ago

Helsinki-NLP/mu-shroom

jrvc published a dataset 24 days ago

Helsinki-NLP/mu-shroom

michal-stefanik updated a dataset about 1 month ago

Helsinki-NLP/tatoeba_mt_full

Helsinki-NLP's activity

albertvillanova

posted an update 17 days ago

smolagents v1.14.0 is out! 🚀
🔌 MCPClient: A sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: Native support for Bedrock-hosted models.
smolagents is now more powerful, flexible, and enterprise-ready. 💼

Full release 👉 https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI

jrvc

updated a dataset 22 days ago

Helsinki-NLP/mu-shroom

Viewer • Updated 22 days ago • 11.5k • 295 • 4

jrvc

published a dataset 24 days ago

Helsinki-NLP/mu-shroom

Viewer • Updated 22 days ago • 11.5k • 295 • 4

michal-stefanik

updated a dataset about 1 month ago

Helsinki-NLP/tatoeba_mt_full

Viewer • Updated Mar 29 • 13.2B • 1.65k

odegiber

authored 6 papers about 2 months ago

Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan

Paper • 2107.07903 • Published Jul 16, 2021

Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models

Paper • 2109.07765 • Published Sep 16, 2021

tiedeman

authored a paper about 2 months ago

An Expanded Massive Multilingual Dataset for High-Performance Language Technologies

Paper • 2503.10267 • Published Mar 13

odegiber

authored a paper about 2 months ago

An Expanded Massive Multilingual Dataset for High-Performance Language Technologies

Paper • 2503.10267 • Published Mar 13

albertvillanova

posted an update 2 months ago

🚀 New smolagents update: Safer Local Python Execution! 🦾🐍

With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒

Here's why this matters & what you need to know! 🧵👇

1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.

2️⃣ New Safety Layer in smolagents 🛡️
We now inspect every return value during execution:
✅ Allowed: Safe built-in types (e.g., numbers, strings, lists)
⛔ Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
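
The idea of inspecting each evaluated value can be illustrated with a minimal sketch. This is not the smolagents implementation: the deny-lists, the `UnsafeValueError` class, and the `check_value` function are hypothetical names chosen for illustration, and a real interpreter would need far more thorough checks.

```python
import types

# Hypothetical deny-lists for illustration only; not the lists smolagents ships.
# "posix" and "nt" are included because os.system reports one of them as its module.
DANGEROUS_MODULES = {"os", "posix", "nt", "subprocess", "shutil", "socket"}
DANGEROUS_BUILTINS = {"exec", "eval", "compile", "__import__"}


class UnsafeValueError(Exception):
    """Raised when an evaluated value is a deny-listed module or function."""


def check_value(value):
    """Return `value` unchanged, or raise if it is a deny-listed module or function."""
    if isinstance(value, types.ModuleType):
        if value.__name__.split(".")[0] in DANGEROUS_MODULES:
            raise UnsafeValueError(f"module {value.__name__} is not allowed")
    elif isinstance(value, (types.FunctionType, types.BuiltinFunctionType)):
        # __module__ can be None for bound builtin methods such as [].append.
        owner = (getattr(value, "__module__", None) or "").split(".")[0]
        name = value.__name__
        if owner in DANGEROUS_MODULES:
            raise UnsafeValueError(f"{owner}.{name} is not allowed")
        if owner == "builtins" and name in DANGEROUS_BUILTINS:
            raise UnsafeValueError(f"builtin {name} is not allowed")
    # Plain data (numbers, strings, lists, ...) passes through untouched.
    return value
```

An interpreter loop would call `check_value` on the result of every evaluated expression, so that an agent cannot obtain a handle to `subprocess` or `exec` even indirectly. A deny-list like this is easy to bypass, which is why the post still recommends a sandbox.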

3️⃣ Immediate Benefits 💡
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities

4️⃣ Security Disclaimer ⚠️
🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨
If you need true isolation, use a remote sandboxed executor like Docker or E2B.

5️⃣ The Best Practice: Use Sandboxed Execution 🔐
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.

6️⃣ Upgrade Now & Stay Safe! 🚀
Check out the latest smolagents release and start building safer AI agents today.

🔗 https://github.com/huggingface/smolagents

What security measures do you take when running AI-generated code? Let’s discuss! 👇

#AI #smolagents #Python #Security

albertvillanova

posted an update 2 months ago

🚀 Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. 🦾🔒

Here's why this is a game-changer for agent-based systems: 🧵👇

1️⃣ Security First 🔐
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.

2️⃣ Deterministic & Reproducible Runs 📦
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting—no more environment mismatches or dependency issues!

3️⃣ Resource Control & Limits 🚦
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don’t spiral out of control.
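
As a rough sketch of what such limits look like when calling the Docker CLI directly (not how smolagents configures its executors), the helper below builds a `docker run` command with CPU, memory, and network restrictions. The function names, the default limits, and the `python:3.12-slim` image are assumptions made for illustration.

```python
import subprocess


def docker_run_command(image, code, cpus=1.0, memory="512m"):
    """Build a `docker run` argv that executes `code` under CPU and memory caps."""
    return [
        "docker", "run",
        "--rm",                # delete the container once it exits
        "--cpus", str(cpus),   # cap CPU usage (fractional cores allowed)
        "--memory", memory,    # cap RAM, e.g. "512m"
        "--network", "none",   # no network access from inside the container
        image, "python", "-c", code,
    ]


def run_sandboxed(code, image="python:3.12-slim", timeout_s=30):
    """Run `code` in a throwaway container; requires a local Docker daemon."""
    cmd = docker_run_command(image, code)
    # The timeout stops the `docker` client process; a production setup would
    # also kill the container itself (for example with `docker kill`).
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
```

Calling `run_sandboxed("print(2 + 2)")` on a machine with Docker installed would execute the snippet inside the container and return its captured output.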

4️⃣ Safer Code Execution in Production 🏭
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.

5️⃣ Easy to Integrate 🛠️
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend—no need for complex security setups!

6️⃣ Perfect for Autonomous AI Agents 🤖
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.

⚡ Get started now: https://github.com/huggingface/smolagents

What will you build with smolagents? Let us know! 🚀💡

Atnafu

authored a paper 3 months ago

ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding

Paper • 2411.05049 • Published Nov 7, 2024 • 3

albertvillanova

posted an update 3 months ago

🚀 Introducing @huggingface Open Deep-Research💥

In just 24 hours, we built an open-source agent that can:
✅ Autonomously browse the web
✅ Search, scroll & extract info
✅ Download & manipulate files
✅ Run calculations on data

55% on GAIA validation set! Help us improve it!💡
https://huggingface.co/blog/open-deep-research

albertvillanova

posted an update 4 months ago

Discover all the improvements in the new version of Lighteval: https://huggingface.co/docs/lighteval/

aleraganato

authored a paper 5 months ago

The University of Helsinki submissions to the WMT19 news translation task

Paper • 1906.04040 • Published Jun 10, 2019

albertvillanova

posted an update 6 months ago

🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research!
👉 open-llm-leaderboard/comparator
Now, you can not only compare models by performance, but also by their environmental footprint!

🌍 The Comparator calculates CO₂ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... 🛠️
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!

ArthurZ

posted an update 6 months ago

Native tensor parallelism has landed in transformers! https://github.com/huggingface/transformers/pull/34184 Thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

Team members 26