jeffreyhuber committed on
Commit
fc44b3f
·
1 Parent(s): 0c05b88

Update README.md

Files changed (1)
  1. README.md +36 -1
README.md CHANGED
@@ -7,4 +7,39 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card 🔥
+ ## Chroma Datasets
+
+ Making it easy to load data into Chroma since 2023
+
+ ```
+ pip install chroma_datasets
+ ```
+
+ ### Current Datasets
+ - State of the Union `from chroma_datasets import StateOfTheUnion`
+ - Paul Graham Essay `from chroma_datasets import PaulGrahamEssay`
+ - Glue `from chroma_datasets import Glue`
+ - SciPy `from chroma_datasets import SciPy`
+
+ `chroma_datasets` is generally backed by Hugging Face Datasets, but this is not a requirement.
+
+ ### How to use
+
+ The following will:
+ 1. Download the 2022 State of the Union
+ 2. Chunk it up for you
+ 3. Embed it using Chroma's default open-source embedding function
+ 4. Import it into Chroma
+
+ ```python
+ import chromadb
+ from chroma_datasets import StateOfTheUnion
+ from chroma_datasets.utils import import_into_chroma
+
+ chroma_client = chromadb.Client()
+ collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)
+ result = collection.query(query_texts=["The United States of America"])
+ print(result)
+ ```
+
+ Learn how to create and contribute a package at [chroma-core/chroma_datasets](https://github.com/chroma-core/chroma_datasets).
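
Note on step 2 of "How to use": the README says the text is chunked for you but does not show how. The block below is a minimal sketch of one common approach, fixed-size character windows with a small overlap. It is not taken from `chroma_datasets`; the helpers `chunk_text` and `make_ids`, and the default sizes, are hypothetical names and values chosen for illustration, and the library's actual chunking may differ.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping character windows.

    Hypothetical helper for illustration only; not part of chroma_datasets.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    step = max_chars - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is made up purely of overlap.
        if start + max_chars >= len(text):
            break
    return chunks


def make_ids(prefix: str, count: int) -> List[str]:
    """Build one stable string id per chunk (hypothetical naming scheme)."""
    return [f"{prefix}-{i}" for i in range(count)]


if __name__ == "__main__":
    sample = "abcdefghij"
    pieces = chunk_text(sample, max_chars=4, overlap=1)
    print(pieces)
    print(make_ids("sample", len(pieces)))
```

Chunks and ids produced this way have the shape that Chroma's `collection.add(documents=..., ids=...)` expects, so a hand-rolled import could pass them straight through. Overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of storing some text twice.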