---
title: Event Data Extraction
emoji: π
colorFrom: pink
colorTo: blue
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
python_version: 3.10.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Event Data Extraction

A testing and demo application for extracting event data from websites.

## Repository overview

```txt
/pages/
└── Streamlit pages for the UI

/src/
├── configuration/
│   └── Streamlit-specific configuration files
│
├── crawler/
│   └── Scripts for crawling and collecting event data from websites
│
├── persistence/
│   └── Database connections and query logic
│
├── utils/
│   └── Helper functions and preprocessing utilities
│
└── nlp/
    ├── experimental/
    │   └── Various NLP tools and technologies under evaluation
    │
    └── playground/
        └── NLP scripts used within the Streamlit app (Pages: Playground, Pipeline, Testing)
```

## Run locally

**Python Version**: 3.10

1. Install the dependencies from the `requirements.txt` file.
2. Create a Hugging Face access token on the Hugging Face platform.
3. Request the values of any missing environment variables.
4. **Create a `.env` file** in the project root directory with the following environment variables (⚠️ **Do NOT commit this file!**):

   ```env
   # MongoDB
   MONGO_HOST=...
   MONGO_USERNAME=...
   MONGO_PASSWORD=...

   # Google Maps API
   GOOGLE_MAPS_API_KEY=...

   # OpenAI API
   OPENAI_API_KEY=...

   # Hugging Face Inference API
   INFERENCE_API_TOKEN=...

   # Hugging Face Spaces (access token)
   HUGGING_FACE_SPACES_TOKEN=...

   # Google Cloud Platform API
   GOOGLE_API_KEY=...
   ```

5. Start the Streamlit app; it opens in your browser:

   ```bash
   streamlit run app.py
   ```
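
The README does not show how the app reads these variables. As a minimal sketch, assuming the plain `KEY=value` format of the `.env` template above, the following standalone helper (hypothetical, not part of the repository; `parse_env`, `missing_vars`, and `check_env_file` are illustrative names) reports which variables still lack a real value before you launch the app. It uses only the standard library and deliberately skips edge cases that a full parser such as python-dotenv handles (multi-line values, `export` prefixes, inline comments).

```python
"""Hypothetical pre-flight check for the local .env file (not part of the repository)."""
from pathlib import Path

# Variable names copied from the README's .env template.
REQUIRED_VARS = (
    "MONGO_HOST",
    "MONGO_USERNAME",
    "MONGO_PASSWORD",
    "GOOGLE_MAPS_API_KEY",
    "OPENAI_API_KEY",
    "INFERENCE_API_TOKEN",
    "HUGGING_FACE_SPACES_TOKEN",
    "GOOGLE_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required names that are absent, empty, or still the '...' placeholder."""
    return [name for name in REQUIRED_VARS if values.get(name, "") in ("", "...")]


def check_env_file(path: str = ".env") -> list[str]:
    """Read the .env file at `path` and return the names of variables without a value."""
    return missing_vars(parse_env(Path(path).read_text(encoding="utf-8")))
```

Saved as, say, `check_env.py` (a hypothetical file name) in the project root, it can be run with `python -c "from check_env import check_env_file; print(check_env_file())"`; an empty list means every variable from the template has been filled in.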