Sujal Bhat
committed on
Commit
·
1d923bc
1
Parent(s):
cd1f0ae
'tasks'
Files changed: deliverables/Task1.md +4 -7
deliverables/Task1.md
CHANGED
@@ -16,20 +16,17 @@ Hint: Create a list of potential questions that people are likely to ask!
 ✅ Deliverables:
 
 **1. Describe the default chunking strategy that you will use.**
-
+
 The default chunking strategy used is a combination of size-based splitting and thematic categorization.
 This strategy uses RecursiveCharacterTextSplitter with a chunk size of 1000 characters and an overlap of 200 characters. It then categorizes these chunks based on predefined themes.
-</div>
 
 **2. Articulate a chunking strategy that you would also like to test out.**
 
-<div style="color: green;">
-A pure size-based chunking strategy without thematic categorization. This would involve splitting the text into fixed-size chunks without attempting to categorize them based on themes.
-</div>
 
+A pure size-based chunking strategy without thematic categorization. This would involve splitting the text into fixed-size chunks without attempting to categorize them based on themes.
 
 **3. Describe how and why you made these decisions**
-
+
 The default strategy was chosen for its simplicity and efficiency:
 
 * Size-based splitting (1000 characters) ensures manageable chunk sizes for processing and embedding.
@@ -43,7 +40,7 @@ The alternative pure size-based strategy:
 * Is simpler to implement and doesn't rely on predefined themes.
 * May split semantic units, potentially affecting the coherence of individual chunks.'
 * Could be more comprehensive, including all parts of the document regardless of theme.
-
+
 
 
 
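
The two strategies described in deliverables/Task1.md can be compared with a rough, dependency-free sketch. This is not the `RecursiveCharacterTextSplitter` the deliverable names (that splitter also tries to break on separators such as paragraph and line breaks); it is a plain sliding window that reuses only the stated chunk size of 1000 and overlap of 200. The theme names, keywords, and the helper names `split_fixed` and `categorize` are hypothetical, since the commit does not list the predefined themes.

```python
from collections import defaultdict

CHUNK_SIZE = 1000    # chunk size stated in Task1.md
CHUNK_OVERLAP = 200  # overlap stated in Task1.md

# Hypothetical themes and keywords; the real predefined themes are not shown.
THEMES = {
    "pricing": ("price", "cost", "billing"),
    "support": ("help", "contact", "support"),
}


def split_fixed(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def categorize(chunks, themes=THEMES):
    """Group chunks by theme using a case-insensitive keyword match."""
    grouped = defaultdict(list)
    for chunk in chunks:
        lowered = chunk.lower()
        matched = [name for name, words in themes.items()
                   if any(word in lowered for word in words)]
        # Chunks that match no theme are kept rather than silently dropped.
        for name in matched or ["uncategorized"]:
            grouped[name].append(chunk)
    return dict(grouped)
```

Under these assumptions, the default strategy corresponds to `categorize(split_fixed(text))`, while the alternative corresponds to `split_fixed(text, overlap=0)` with no categorization step. Keeping an "uncategorized" bucket is one way to address the concern that theme-based filtering could leave parts of the document out.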