Update README.md
README.md
CHANGED
@@ -7,10 +7,14 @@ datasets:
 pipeline_tag: automatic-speech-recognition
 ---
 
-[](https://arxiv.org/abs/2412.13071) [](https://github.com/language-modeling-lab/CLASP)
+[](https://arxiv.org/abs/2412.13071) [](https://github.com/language-modeling-lab/CLASP) [](https://clasp1.github.io/)
+
+
+[Models](https://huggingface.co/llm-lab/CLASP) | [Springer Link](https://link.springer.com/chapter/10.1007/978-3-031-88717-8_2) | [arXiv Link](https://arxiv.org/abs/2412.13071) | [Proposed Dataset](https://huggingface.co/datasets/llm-lab/SpeechBrown) | [ACM Digital Library](https://dl.acm.org/doi/10.1007/978-3-031-88717-8_2) | [Website](https://clasp1.github.io/)
+
 
 **CLASP** (Contrastive Language-Speech Pretraining) is a novel, lightweight, multilingual, multimodal representation designed for audio-text retrieval.
-To learn more about our proposed model, please refer to this [paper](https://arxiv.org/abs/2412.13071)
+To learn more about our proposed model, please refer to this [paper](https://arxiv.org/abs/2412.13071), which is published at **ECIR 2025**. All code is available on this [GitHub page](https://github.com/language-modeling-lab/CLASP).
 The newly introduced dataset, SpeechBrown, which we created for training this model, can be found on [this page](https://huggingface.co/datasets/llm-lab/SpeechBrown)
 
 CLASP creates powerful and meaningful semantic embeddings for raw speech in a 768-dimensional multilingual representation space. These embeddings can be used in various tasks such as speech retrieval or classification.
@@ -44,14 +48,21 @@ To use these models or train your own on custom datasets, please refer to our [G
 ## Citations
 If you find our paper, code, data, or models useful, please cite the paper:
 ```
-@
-
-
-
-
-
-
-
+@inproceedings{10.1007/978-3-031-88717-8_2,
+author = {Abootorabi, Mohammad Mahdi and Asgari, Ehsaneddin},
+title = {CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval},
+year = {2025},
+isbn = {978-3-031-88716-1},
+publisher = {Springer-Verlag},
+address = {Berlin, Heidelberg},
+url = {https://doi.org/10.1007/978-3-031-88717-8_2},
+doi = {10.1007/978-3-031-88717-8_2},
+abstract = {This study introduces CLASP (Contrastive Language-Speech Pretraining), a multilingual, multimodal representation tailored for audio-text information retrieval. CLASP leverages the synergy between spoken content and textual data. During training, we utilize our newly introduced speech-text dataset, which encompasses 15 diverse categories ranging from fiction to religion. CLASP’s audio component integrates audio spectrograms with a pre-trained self-supervised speech model, while its language encoding counterpart employs a sentence encoder pre-trained on over 100 languages. This unified lightweight model bridges the gap between various modalities and languages, enhancing its effectiveness in handling and retrieving multilingual and multimodal data. Our evaluations across multiple languages demonstrate that CLASP establishes new benchmarks in HITS@1, MRR, and meanR metrics, outperforming traditional ASR-based retrieval methods that rely on transcribing speech into text for subsequent text retrieval, especially in specific scenarios.},
+booktitle = {Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part IV},
+pages = {10–20},
+numpages = {11},
+keywords = {Multimodal IR, Speech Retrieval, Contrastive Learning},
+location = {Lucca, Italy}
 }
 ```
 