Oxbridge-Economics/Data-Collection-China

gavinzli committed on Jan 5

Commit 4f4a669
Parent: 7dcce70

Update collection name to "articles" and enable separator regex in vectorization logic

Files changed (1)

controllers/vectorizer.py

@@ -43,7 +43,7 @@ vstore = AstraDBVectorStore(
         },
     ),
     namespace="default_keyspace",
-    collection_name="article",
+    collection_name="articles",
     token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
     api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"])
@@ -106,7 +106,7 @@ def vectorize(article):
             chunk_size=1000,
             chunk_overlap=200,
             length_function=token_length,
-            is_separator_regex=False,
+            is_separator_regex=True,
             separators=["\n\n", "\n", "\t"]  # Logical separators
         )
     chunks = text_splitter.split_documents(documents)
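
Note on `is_separator_regex=True`: the separators in this commit (`"\n\n"`, `"\n"`, `"\t"`) contain no regex metacharacters, so switching the flag should not change how these particular strings split text. The sketch below illustrates that with the standard `re` module only. It assumes the splitter escapes separators with `re.escape` when the flag is off, which is an assumption about the library's internals and not something this diff shows. `SAMPLE` and `BLANK_LINES` are hypothetical names introduced only for the illustration.

```python
import re

# Separators copied from the diff above.
SEPARATORS = ["\n\n", "\n", "\t"]

# Hypothetical sample text, used only for this illustration.
SAMPLE = "First paragraph.\n\nSecond paragraph, line one.\nLine two.\tTabbed field."


def split_literal(text: str, separator: str) -> list:
    """Split on the separator as a literal string (metacharacters escaped)."""
    return re.split(re.escape(separator), text)


def split_regex(text: str, separator: str) -> list:
    """Split on the separator interpreted as a regular expression."""
    return re.split(separator, text)


# For the separators in the diff, both interpretations give the same pieces.
for sep in SEPARATORS:
    same = split_literal(SAMPLE, sep) == split_regex(SAMPLE, sep)
    print(repr(sep), "same result in both modes:", same)

# A hypothetical pattern that only makes sense in regex mode:
# a run of two or more newlines treated as one paragraph break.
BLANK_LINES = r"\n{2,}"
print(split_regex("a\n\n\nb", BLANK_LINES))    # regex mode matches the whole run
print(split_literal("a\n\n\nb", BLANK_LINES))  # literal mode finds no match
```

If the intent of the change is to use pattern-style separators later (for example the blank-line run above), the flag is needed; with the current list it appears to be behavior-neutral.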