---
title: Event Data Extraction
emoji: 🌐
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
python_version: 3.10.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Event Data Extraction
A testing and demo application for extracting event data from websites.


## Repository overview

```txt
/pages/
│   └── Streamlit pages for the UI
│
/src/
├── configuration/
│   └── Streamlit-specific configuration files
│
├── crawler/
│   └── Scripts for crawling and collecting event data from websites
│
├── persistence/
│   └── Database connections and query logic
│
├── utils/
│   └── Helper functions and preprocessing utilities
│
├── nlp/
│   ├── experimental/
│   │   └── Various NLP tools and technologies under evaluation
│   │
│   └── playground/
│       └── NLP scripts used within the Streamlit app (Pages: Playground, Pipeline, Testing)
```
## Run locally
**Python Version**: 3.10

1. Install the dependencies listed in `requirements.txt`.

2. Create a Hugging Face access token on the Hugging Face platform.

3. Request the values of any environment variables you do not have yet.

4. **Create a `.env` file** in the root directory with the following environment variables (⚠️ **Do NOT commit this file!**):
```env
# MongoDB
MONGO_HOST=...
MONGO_USERNAME=...
MONGO_PASSWORD=...

# Google Maps API
GOOGLE_MAPS_API_KEY=...

# OpenAI API
OPENAI_API_KEY=...

# Hugging Face Inference API
INFERENCE_API_TOKEN=...

# Hugging Face Spaces (access token)
HUGGING_FACE_SPACES_TOKEN=...

# Google Cloud Platform API
GOOGLE_API_KEY=...
```
5. Start the Streamlit app, which opens in your browser:
```bash
  streamlit run app.py
```
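
Before starting the app, it can help to confirm that the `.env` file from step 4 is complete. The script below is a minimal sketch and not part of the repository: the helper names `parse_env_text` and `missing_vars` are hypothetical, the parser only handles plain `KEY=VALUE` lines, and treating the `...` placeholder as "not set" is an assumption based on the template above.

```python
import os
from pathlib import Path

# Variable names copied from the `.env` template in step 4.
REQUIRED_VARS = (
    "MONGO_HOST",
    "MONGO_USERNAME",
    "MONGO_PASSWORD",
    "GOOGLE_MAPS_API_KEY",
    "OPENAI_API_KEY",
    "INFERENCE_API_TOKEN",
    "HUGGING_FACE_SPACES_TOKEN",
    "GOOGLE_API_KEY",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return required names that are absent, empty, or still the '...' placeholder."""
    return [name for name in required if values.get(name) in (None, "", "...")]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = (
        parse_env_text(env_path.read_text(encoding="utf-8"))
        if env_path.exists()
        else {}
    )
    # Variables already exported in the shell take precedence over the file.
    merged = {**file_values, **os.environ}
    missing = missing_vars(merged)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it from the repository root (for example, saved under a hypothetical name such as `check_env.py`) before `streamlit run app.py`. It only reports which names are missing and never prints the secret values themselves.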