---
title: README
emoji: π
colorFrom: red
colorTo: gray
sdk: static
pinned: false
---

## Chroma Datasets

Making it easy to load data into Chroma since 2023.

```
pip install chroma_datasets
```

### Current Datasets

- State of the Union `from chroma_datasets import StateOfTheUnion`
- Paul Graham Essay `from chroma_datasets import PaulGrahamEssay`
- Glue `from chroma_datasets import Glue`
- SciPy `from chroma_datasets import SciPy`

Most datasets in `chroma_datasets` are backed by Hugging Face datasets, but this is not a requirement.

### How to use

The following code will:

1. Download the 2022 State of the Union
2. Split it into chunks
3. Embed each chunk using Chroma's default open-source embedding function
4. Load the chunks into a Chroma collection

```python
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# In-memory Chroma client; the data is not persisted after the process exits
chroma_client = chromadb.Client()

# Download, chunk, embed, and load the dataset into a new collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)

# Return the stored chunks most similar to the query text
result = collection.query(query_texts=["The United States of America"])
print(result)
```
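Step 2 happens inside `import_into_chroma`, and this README does not say how the text is split. As a rough illustration only, the sketch below assumes a simple fixed-size character window with overlap, then shapes the chunks into the `ids`, `documents`, and `metadatas` lists that a Chroma collection's `add()` method accepts. `chunk_text`, `build_add_payload`, the default sizes, and the id and metadata scheme are all hypothetical and are not part of `chroma_datasets`.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Hypothetical chunker: fixed-size character windows overlapping by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


def build_add_payload(
    text: str, source: str, chunk_size: int = 1000, overlap: int = 100
) -> Dict[str, list]:
    """Hypothetical helper: shape chunks into the keyword arguments of collection.add()."""
    chunks = chunk_text(text, chunk_size, overlap)
    return {
        "ids": [f"{source}-{i}" for i in range(len(chunks))],  # ids must be unique
        "documents": chunks,
        "metadatas": [{"source": source, "chunk": i} for i in range(len(chunks))],
    }


if __name__ == "__main__":
    payload = build_add_payload("abcdefghij", source="demo", chunk_size=4, overlap=1)
    print(payload["documents"])  # ['abcd', 'defg', 'ghij']
```

With `chromadb` installed, such a payload could be loaded with `collection = chroma_client.get_or_create_collection(name="state_of_the_union")` (the collection name is illustrative) followed by `collection.add(**payload)`, and Chroma then embeds the documents with its default embedding function. Character windows are the simplest choice; splitting on sentence or paragraph boundaries usually keeps chunks more coherent.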

To learn how to create and contribute your own dataset package, see [chroma-core/chroma_datasets](https://github.com/chroma-core/chroma_datasets).