Chroma Datasets
Making it easy to load data into Chroma since 2023
pip install chroma_datasets
Current Datasets
- State of the Union
from chroma_datasets import StateOfTheUnion
- Paul Graham Essay
from chroma_datasets import PaulGrahamEssay
- Glue
from chroma_datasets import Glue
- SciPy
from chroma_datasets import SciPy
chroma_datasets is generally backed by Hugging Face datasets, but this is not a requirement.
How to use
The following will:
- Download the 2022 State of the Union
- Chunk it up for you
- Embed it using Chroma's default open-source embedding function
- Import it into Chroma
import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma
chroma_client = chromadb.Client()
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)
result = collection.query(query_texts=["The United States of America"])
print(result)
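The printed result is a dictionary of parallel lists, with one inner list per query text. As a minimal sketch of how to read it, the helper below pairs each returned id with its document and distance. The name `top_hits` and the sample data are hypothetical and not part of chroma_datasets. The sketch assumes the result includes the `ids`, `documents`, and `distances` keys, which Chroma returns by default.

```python
from typing import Any, Dict, List, Tuple


def top_hits(result: Dict[str, Any], query_index: int = 0) -> List[Tuple[str, str, float]]:
    """Return (id, document, distance) rows for one query in a Chroma query result.

    Hypothetical helper for illustration; it only reshapes the dictionary.
    """
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    distances = result["distances"][query_index]
    return list(zip(ids, documents, distances))


# Hand-written stand-in with the same nesting as a query result:
# the outer list has one entry per query text, the inner list one entry per hit.
sample_result = {
    "ids": [["chunk-0", "chunk-1"]],
    "documents": [["first placeholder chunk", "second placeholder chunk"]],
    "distances": [[0.42, 0.57]],
}

for hit_id, document, distance in top_hits(sample_result):
    print(f"{hit_id}  distance={distance:.2f}  {document}")
```

With a real collection, passing `result` from the query above in place of `sample_result` should work the same way, as long as the default fields are returned.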
Learn how to create and contribute a dataset package at chroma-core/chroma_datasets.
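For anyone preparing text for a new dataset, the chunking step listed in the how-to can be pictured with a small sketch like the one below. This is not the chunking logic chroma_datasets actually uses. The function name `chunk_text`, the character-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: real pipelines often split on sentences or tokens instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

The overlap keeps a little shared context between neighbouring chunks, which can help when a relevant passage straddles a chunk boundary.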