project_root/
├── app.py                  # The main Gradio endpoint that runs the training pipeline.
├── requirements.txt        # Lists the Python dependencies.
├── source_files/           # Directory containing your input data files.
│   ├── quranic-corpus-morphology-0.4.txt
│   ├── en.sample.quran-maududi.txt
│   └── en.w4w.qurandev.txt
└── working_directory/      # Directory for intermediate outputs.
    ├── processed_data/     # Processed verse data (JSON and TXT files).
    ├── checkpoints/        # Checkpoints saved during training.
    ├── logs/               # (Optional) Additional log files.
    └── state/              # Pipeline state files (e.g., pipeline_state.json).
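
The layout above could be prepared before a first run with a small helper along these lines. This is only a sketch: the function name `ensure_layout` and the idea of reporting missing input files are assumptions for illustration, not part of `app.py`. The directory and file names are taken from the tree.

```python
from pathlib import Path

# Subdirectories of working_directory/, as listed in the layout above.
WORKING_SUBDIRS = ("processed_data", "checkpoints", "logs", "state")

# Input files expected under source_files/, as listed in the layout above.
SOURCE_FILES = (
    "quranic-corpus-morphology-0.4.txt",
    "en.sample.quran-maududi.txt",
    "en.w4w.qurandev.txt",
)


def ensure_layout(project_root):
    """Create the working directories and return the names of missing source files.

    Hypothetical helper: the real pipeline may create these directories itself.
    """
    root = Path(project_root)

    # Create working_directory/ and each subdirectory; existing ones are left alone.
    for name in WORKING_SUBDIRS:
        (root / "working_directory" / name).mkdir(parents=True, exist_ok=True)

    # Input files are never created here, only checked for.
    source_dir = root / "source_files"
    return [name for name in SOURCE_FILES if not (source_dir / name).is_file()]
```

Calling `ensure_layout(".")` from the project root creates any missing working directories and returns a list of input files that still need to be placed in `source_files/`.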