Sujal Bhat committed on
Commit
1d923bc
·
1 Parent(s): cd1f0ae
Files changed (1)
  1. deliverables/Task1.md +4 -7
deliverables/Task1.md CHANGED
@@ -16,20 +16,17 @@ Hint: Create a list of potential questions that people are likely to ask!
16
  ✅ Deliverables:
17
 
18
  **1. Describe the default chunking strategy that you will use.**
19
- <div style="color: green;">
20
  The default chunking strategy used is a combination of size-based splitting and thematic categorization.
21
  This strategy uses RecursiveCharacterTextSplitter with a chunk size of 1000 characters and an overlap of 200 characters. It then categorizes these chunks based on predefined themes.
22
- </div>
23
 
24
  **2. Articulate a chunking strategy that you would also like to test out.**
25
 
26
- <div style="color: green;">
27
- A pure size-based chunking strategy without thematic categorization. This would involve splitting the text into fixed-size chunks without attempting to categorize them based on themes.
28
- </div>
29
 
 
30
 
31
  **3. Describe how and why you made these decisions**
32
- <div style="color: green;">
33
  The default strategy was chosen for its simplicity and efficiency:
34
 
35
  * Size-based splitting (1000 characters) ensures manageable chunk sizes for processing and embedding.
@@ -43,7 +40,7 @@ The alternative pure size-based strategy:
43
  * Is simpler to implement and doesn't rely on predefined themes.
44
  * May split semantic units, potentially affecting the coherence of individual chunks.
45
  * Could be more comprehensive, including all parts of the document regardless of theme.
46
- </div>
47
 
48
 
49
 
 
16
  ✅ Deliverables:
17
 
18
  **1. Describe the default chunking strategy that you will use.**
19
+
20
  The default chunking strategy used is a combination of size-based splitting and thematic categorization.
21
  This strategy uses RecursiveCharacterTextSplitter with a chunk size of 1000 characters and an overlap of 200 characters. It then categorizes these chunks based on predefined themes.
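
  The two stages described above (size-based splitting, then thematic categorization) can be sketched roughly as follows. This is a simplified stand-in, not the deliverable's actual code: it uses a plain sliding window instead of `RecursiveCharacterTextSplitter` (which additionally prefers to break on separators such as paragraphs and sentences), and the theme names and keywords in `THEMES` are hypothetical placeholders.

```python
# Minimal sketch: 1000-character chunks with a 200-character overlap,
# followed by keyword-based theme tagging. Assumptions are noted inline.

def split_with_overlap(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical themes and keywords, for illustration only.
THEMES = {
    "safety": ("risk", "harm"),
    "governance": ("policy", "regulation"),
}


def categorize(chunk, themes=THEMES):
    """Return every theme whose keywords appear in the chunk."""
    lowered = chunk.lower()
    matched = [
        name for name, keywords in themes.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched or ["uncategorized"]


def chunk_and_categorize(text):
    """Pair each chunk with the themes it matched."""
    return [(chunk, categorize(chunk)) for chunk in split_with_overlap(text)]
```

  Under this sketch a chunk that matches no keyword is tagged `uncategorized` rather than dropped; whether unmatched chunks are kept or discarded is a design decision the description above does not specify.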
 
22
 
23
  **2. Articulate a chunking strategy that you would also like to test out.**
24
 
 
 
 
25
 
26
+ A pure size-based chunking strategy without thematic categorization. This would involve splitting the text into fixed-size chunks without attempting to categorize them based on themes.
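
  For comparison, the pure size-based alternative could look like the sketch below. It assumes no overlap between chunks, which the description above does not state either way, and the function name `fixed_size_chunks` is made up for illustration.

```python
# Minimal sketch of pure fixed-size chunking: no themes, no overlap (assumed).

def fixed_size_chunks(text, chunk_size=1000):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    document = "x" * 2500
    chunks = fixed_size_chunks(document)
    # Every character of the document ends up in exactly one chunk.
    print([len(c) for c in chunks])
```

  Because nothing is filtered by theme, concatenating the chunks reproduces the original text, which is one way to check that the whole document is covered.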
27
 
28
  **3. Describe how and why you made these decisions**
29
+
30
  The default strategy was chosen for its simplicity and efficiency:
31
 
32
  * Size-based splitting (1000 characters) ensures manageable chunk sizes for processing and embedding.
 
40
  * Is simpler to implement and doesn't rely on predefined themes.
41
  * May split semantic units, potentially affecting the coherence of individual chunks.
42
  * Could be more comprehensive, including all parts of the document regardless of theme.
43
+
44
 
45
 
46