---
title: README
emoji: π
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---
## Chroma Datasets
Making it easy to load data into Chroma since 2023
```
pip install chroma_datasets
```
### Current Datasets
- State of the Union `from chroma_datasets import StateOfTheUnion`
- Paul Graham Essay `from chroma_datasets import PaulGrahamEssay`
- Glue `from chroma_datasets import Glue`
- SciPy `from chroma_datasets import SciPy`
`chroma_datasets` is generally backed by Hugging Face datasets, but that is not a requirement.
### How to use
The following will:
1. Download the 2022 State of the Union
2. Chunk it up for you
3. Embed it using Chroma's default open-source embedding function
4. Import it into Chroma
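```python
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# Create a Chroma client (ephemeral, in-memory by default)
chroma_client = chromadb.Client()

# Download, chunk, embed, and import the dataset into a new collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)

# Run a semantic search over the imported chunks
result = collection.query(query_texts=["The United States of America"])
print(result)
```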
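Step 2 above is handled for you by `import_into_chroma`, and this README does not say how the text is split. Purely as an illustration of what "chunking" means, here is a minimal sketch of a word-based chunker with overlap; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names and are not part of `chroma_datasets`.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Hypothetical helper: split text into chunks of at most `chunk_size`
    words, where consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Chunks produced this way could be added to a collection yourself with `collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])`, in which case Chroma embeds them with the collection's embedding function.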
Learn how to create and contribute a package at [chroma-core/chroma_datasets](https://github.com/chroma-core/chroma_datasets).