Increase timeout for URL requests in crawl functions to enhance reliability cc76656 gavinzli committed on Feb 5
Add timeout parameter to URL requests in crawl functions for improved reliability fed78ac gavinzli committed on Feb 5
Handle LangDetectException in crawl_by_url function to improve error handling 48adbee gavinzli committed on Jan 9
Refactor vectorization process by removing openai_vectorize calls and updating vectorizer initialization 5fea365 gavinzli committed on Jan 4
Add validation for content length and enhance error handling in crawl_by_url function 29d3eca gavinzli committed on Jan 4
Merge branch 'main' of https://github.com/oxbridge-econ/data-collection-china b4bd94d gavinzli committed on Jan 2
Add handling for DependencyError in PDF extraction and update requirements to include pycryptodome beed350 gavinzli committed on Jan 2
Refactor content update process to ensure reference ID is set to None and re-enable vectorization functions in article processing b68d569 gavinzli committed on Dec 18, 2024
Fix table name casing in update_content function for DynamoDB 2512706 gavinzli committed on Dec 18, 2024
Add reference ID extraction and implement retry logic for document addition 693e166 gavinzli committed on Dec 18, 2024
Increase retry attempts and adjust sleep duration for translation requests 1269de7 gavinzli committed on Dec 17, 2024
Refactor translation error handling and remove debug print statements in vectorization 0750507 gavinzli committed on Dec 13, 2024
Implement retry logic for translation requests to handle RequestError exceptions c664824 gavinzli committed on Dec 9, 2024
Replace logging with print statements for content update and reference extraction functions f237a77 gavinzli committed on Dec 6, 2024
Add logging configuration and info statements for content updates and reference extraction fbf8f15 gavinzli committed on Dec 6, 2024
Limit content length to 500 characters in sentiment computation for improved analysis accuracy dcdb6e8 gavinzli committed on Dec 6, 2024
Refactor exception handling in multiple files to specify exception types and improve logging d705151 gavinzli committed on Dec 5, 2024
Refactor error handling and improve logging in utils.py; update vectorization process in vectorizer.py; adjust variable naming in eastmoney.py c39d841 gavinzli committed on Dec 5, 2024