Add a print of the DataFrame in the vectorize function and log the translated content in the _crawl function for better traceability
Browse files
- controllers/vectorizer.py +1 -0
- source/eastmoney.py +1 -0
controllers/vectorizer.py
CHANGED
@@ -48,6 +48,7 @@ def vectorize(article):
|
|
48 |
# df['sentimentScore'] = df['sentimentScore'].round(2)
|
49 |
# df['sentimentScore'] = df['sentimentScore'].astype(float)
|
50 |
df['publishDate'] = pd.to_datetime(df['publishDate'])
|
|
|
51 |
loader = DataFrameLoader(df, page_content_column="content")
|
52 |
documents = loader.load()
|
53 |
text_splitter = RecursiveCharacterTextSplitter(
|
|
|
48 |
# df['sentimentScore'] = df['sentimentScore'].round(2)
|
49 |
# df['sentimentScore'] = df['sentimentScore'].astype(float)
|
50 |
df['publishDate'] = pd.to_datetime(df['publishDate'])
|
51 |
+ print(df)
|
52 |
loader = DataFrameLoader(df, page_content_column="content")
|
53 |
documents = loader.load()
|
54 |
text_splitter = RecursiveCharacterTextSplitter(
|
source/eastmoney.py
CHANGED
@@ -78,6 +78,7 @@ def _crawl(url, article, retries=3):
|
|
78 |
contenteng = ''
|
79 |
for element in contentcn.split("\n"):
|
80 |
contenteng += translate(element) + '\n'
|
|
|
81 |
article['content'] = repr(contenteng)[1:-1].strip()
|
82 |
try:
|
83 |
article['subtitle'] = summarize(article['content'])
|
|
|
78 |
contenteng = ''
|
79 |
for element in contentcn.split("\n"):
|
80 |
contenteng += translate(element) + '\n'
|
81 |
+ logging.info(contenteng)
|
82 |
article['content'] = repr(contenteng)[1:-1].strip()
|
83 |
try:
|
84 |
article['subtitle'] = summarize(article['content'])
|