---
title: README
emoji: 📈
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---
## Chroma Datasets
Making it easy to load data into Chroma since 2023
```
pip install chroma_datasets
```
### Current Datasets
- State of the Union `from chroma_datasets import StateOfTheUnion`
- Paul Graham Essay `from chroma_datasets import PaulGrahamEssay`
- Glue `from chroma_datasets import Glue`
- SciPy `from chroma_datasets import SciPy`
Datasets in `chroma_datasets` are generally backed by Hugging Face Datasets, but this is not a requirement.
### How to use
The following will:
1. Download the 2022 State of the Union
2. Split it into chunks for you
3. Embed the chunks using Chroma's default open-source embedding function
4. Import them into Chroma
```python
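import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# In-memory client; data is not persisted between runs
chroma_client = chromadb.Client()

# Download, chunk, embed, and load the dataset into a collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)

# Query the collection for the chunks closest to the query text
result = collection.query(query_texts=["The United States of America"])
print(result)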
```
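The printed `result` is a dictionary of nested lists, with one inner list per query text. As a minimal sketch, assuming the query was run with Chroma's default `include` settings so that `ids`, `documents`, and `distances` are all present, a small helper can pair each hit with its distance. The helper name `iter_hits` and the placeholder data below are hypothetical and are not part of `chroma_datasets` or Chroma.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_hits(
    result: Dict[str, Any], query_index: int = 0
) -> Iterator[Tuple[str, str, float]]:
    """Yield (id, document, distance) for one query in a Chroma-style result.

    Assumes "ids", "documents", and "distances" are present, each holding
    one inner list per query text.
    """
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    distances = result["distances"][query_index]
    yield from zip(ids, documents, distances)


# Hypothetical placeholder shaped like a query result; in practice, pass the
# `result` returned by `collection.query(...)` instead.
example_result = {
    "ids": [["doc-1", "doc-2"]],
    "documents": [["first chunk of text", "second chunk of text"]],
    "distances": [[0.25, 0.5]],
}

for doc_id, text, distance in iter_hits(example_result):
    print(f"{doc_id}  distance={distance:.2f}  {text}")
```

Calling `iter_hits(result)` on the output of the query above should walk the matching chunks for the first (and only) query text.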
Learn how to create and contribute a package at [chroma-core/chroma_datasets](https://github.com/chroma-core/chroma_datasets).