Handle missing reference ID by setting it to None in article data and refactor URL construction for clarity
source/eastmoney.py (+4 -2)
@@ -94,6 +94,8 @@ def _crawl(url, article, retries=3):
     reference_id = extract_reference(article)
     if reference_id:
         article['referenceid'] = reference_id
+    else:
+        article['referenceid'] = None
     update_content(article)
 
 @task(name = "Data Collection - eastmoney", log_prints = True)
@@ -138,8 +140,8 @@ def crawl(delta):
         i = i + 1
         for article in reportinfo['data']:
            try:
-
-                url = f"{
+                link = "https://data.eastmoney.com/report/zw_macresearch.jshtml"
+                url = f"{link}?encodeUrl={article['encodeUrl']}"
                 _crawl(url, article)
             except (urllib.error.URLError, json.JSONDecodeError, KeyError) as error:
                 logger.error(error)