Sujal Bhat committed
Commit · 323d65b
1 Parent(s): 3ae573f
colored deliverables
deliverables/Task1.md +6 -3
deliverables/Task1.md CHANGED
@@ -16,18 +16,21 @@ Hint: Create a list of potential questions that people are likely to ask!
 ✅ Deliverables:
 
 1. Describe the default chunking strategy that you will use.
-
+<div style="color: green;">
 The default chunking strategy used is a combination of size-based splitting and thematic categorization.
 This strategy uses RecursiveCharacterTextSplitter with a chunk size of 1000 characters and an overlap of 200 characters. It then categorizes these chunks based on predefined themes.
+</div>
 
 2. Articulate a chunking strategy that you would also like to test out.
 
+<div style="color: green;">
 A pure size-based chunking strategy without thematic categorization. This would involve splitting the text into fixed-size chunks without attempting to categorize them based on themes.
+</div>
 
 
 
 3. Describe how and why you made these decisions
-
+<div style="color: green;">
 The default strategy was chosen for its simplicity and efficiency:
 
 * Size-based splitting (1000 characters) ensures manageable chunk sizes for processing and embedding.
@@ -41,7 +44,7 @@ The alternative pure size-based strategy:
 * Is simpler to implement and doesn't rely on predefined themes.
 * May split semantic units, potentially affecting the coherence of individual chunks.'
 * Could be more comprehensive, including all parts of the document regardless of theme.
-
+</div>
 
 
 
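Note on the two strategies described in deliverables/Task1.md: the sketch below is one possible reading of them, not the code from this repository. It uses a plain sliding window as a stand-in for RecursiveCharacterTextSplitter (which also tries to break on separators such as paragraph boundaries), keeping only the 1000-character size and 200-character overlap stated in the file. The theme names, keywords, and the helpers `split_fixed`, `categorize`, and `chunk_with_themes` are all hypothetical; the commit does not list the actual themes or say whether chunks without a theme are dropped.

```python
# Hypothetical themes and keywords, invented for illustration only.
THEMES = {
    "privacy": ("privacy", "personal data"),
    "safety": ("safety", "harm"),
}


def split_fixed(text, chunk_size=1000, overlap=200):
    """Pure size-based strategy: fixed windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def categorize(chunk, themes=THEMES):
    """Return the names of all themes whose keywords occur in the chunk."""
    lowered = chunk.lower()
    return [name for name, keywords in themes.items()
            if any(keyword in lowered for keyword in keywords)]


def chunk_with_themes(text, chunk_size=1000, overlap=200):
    """Default strategy as read here: size-based split, then a theme tag per chunk."""
    return [{"text": chunk, "themes": categorize(chunk)}
            for chunk in split_fixed(text, chunk_size, overlap)]
```

Under this reading, the alternative strategy is `split_fixed` on its own, and the default strategy is `chunk_with_themes`. Keeping untagged chunks (with an empty `themes` list) makes it easy to compare both strategies on the same chunks.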