make shorter #2
by davanstrien (HF Staff) - opened
README.md CHANGED
@@ -1,67 +1,72 @@
 ---
-title: README
-emoji: 📚
-colorFrom: pink
-colorTo: gray
-sdk: static
-pinned: false
+title: README
+emoji: 📚
+colorFrom: pink
+colorTo: gray
+sdk: static
+pinned: false
 ---
 
 # 📚 BigLAM: Machine Learning for Libraries, Archives, and Museums
 
-**BigLAM** is a community-driven
+**BigLAM** is a community-driven initiative to build an open ecosystem of machine learning models, datasets, and tools for **Libraries, Archives, and Museums (LAMs)**.
 
-We aim to
+We aim to:
 
-- 🗃️
-- 🤖
+- 🗃️ Share machine-learning-ready datasets from LAMs via the [Hugging Face Hub](https://huggingface.co/biglam)
+- 🤖 Train and release open-source models for LAM-relevant tasks
+- 🛠️ Develop tools and approaches tailored to LAM use cases
 
 ---
 
-
+<details>
+<summary><strong>✨ Background</strong></summary>
 
-BigLAM began as a [datasets hackathon](https://github.com/bigscience-workshop/lam) within the [BigScience 🌸](https://bigscience.huggingface.co/) project
+BigLAM began as a [datasets hackathon](https://github.com/bigscience-workshop/lam) within the [BigScience 🌸](https://bigscience.huggingface.co/) project, a large-scale, open NLP collaboration.
 
-Our
-
-- Helping LAM data reach new audiences.
-- Supporting researchers and practitioners working at the intersection of AI and cultural heritage.
-- Ensuring that machine learning datasets reflect the diversity and richness of human culture.
+Our goal: make LAM datasets more discoverable and usable to support researchers, institutions, and ML practitioners working with cultural heritage data.
+</details>
 
 ---
 
-
-
-The [BigLAM organization on Hugging Face](https://huggingface.co/biglam) hosts:
+<details>
+<summary><strong>📂 What You'll Find</strong></summary>
 
-
-- ⚙️ **Models** fine-tuned for LAM tasks, such as:
+The [BigLAM organization](https://huggingface.co/biglam) hosts:
 
-
-
+- **Datasets**: image, text, and tabular data from and about libraries, archives, and museums
+- **Models**: fine-tuned for tasks like:
+- Art/historical image classification
+- Document layout analysis and OCR
 - Metadata quality assessment
-
--
+- Named entity recognition in heritage texts
+- **Spaces**: tools for interactive exploration and demonstration
+</details>
 
 ---
 
-
+<details>
+<summary><strong>🧩 Get Involved</strong></summary>
 
-We welcome contributions
-You can:
+We welcome contributions! You can:
 
--
-- Join the
--
--
+- Use our [datasets and models](https://huggingface.co/biglam)
+- Join the discussion on [GitHub](https://github.com/bigscience-workshop/lam/discussions)
+- Contribute your own tools or data
+- Share your work using BigLAM resources
+</details>
 
 ---
 
 ## 🌍 Why It Matters
 
-Cultural heritage data is
+Cultural heritage data is often underrepresented in machine learning. BigLAM helps address this by:
+
+- Supporting inclusive and responsible AI
+- Helping institutions experiment with ML for access, discovery, and preservation
+- Ensuring that ML systems reflect diverse human knowledge and expression
+- Developing tools and methods that work well with the unique formats, values, and needs of LAMs
+
+---
 
-
-- We help cultural institutions explore new forms of access and interpretation.
-- We ensure that machine learning models learn from the full range of human knowledge—not just what's convenient to crawl.
-- We develop tools and approaches that are tailored to the specific formats, challenges, and goals of libraries, archives, and museums—supporting long-term reuse and alignment with professional practices.
+*Empowering AI with the richness of human culture.*