[fix] Add Hugging Face Space metadata
README.md
CHANGED
@@ -1,117 +1,9 @@
Removed lines 1-117:

<a href="assets/screenshot_2.png"><img src="assets/screenshot_2.png" width="335"></a>

## 📖 Overview

**SynthDataGen** is an AI-powered tool that generates **realistic synthetic data** for any project. Instead of collecting real information, you describe the data you need and SynthDataGen **generates it quickly**. Its **easy-to-use web interface**, built with Gradio, lets **anyone** start creating custom datasets right away.

### 🔑 **Key Features**
- The app can generate **various types of datasets**, such as **tables**, **time-series data**, or **text content**.
- The output can be saved in several **formats**: **JSON**, **CSV**, **Parquet**, or **Markdown**.
- **AI models** such as **GPT** and **Claude** generate the dataset automatically from your description.
- A short **description of the desired dataset** is all that's needed to start the generation process.
- A **download link** is provided once the dataset is ready, making it easy to save and use.
- The **interface updates its options automatically** and includes helpful **examples for inspiration**.

### 🎯 **How It Works**
1️⃣ Describe the dataset to generate by entering a short business problem or topic.

2️⃣ Select the dataset type, output format, AI model, and number of samples.

3️⃣ Download the generated dataset when it is done: clean, structured, and ready to use.
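The three steps above amount to collecting a few user choices and turning them into a prompt for the selected model. This README does not show how `app.py` does that, so the following is only a minimal sketch: the `DatasetRequest` class, the option values (taken from the feature list above), and the prompt wording are hypothetical, and no model is called.

```python
from dataclasses import dataclass

# Hypothetical option sets, mirroring the choices described in this README.
DATASET_TYPES = {"Tabular", "Time-series", "Text"}
OUTPUT_FORMATS = {"JSON", "CSV", "Parquet", "Markdown"}
MODELS = {"GPT", "Claude"}


@dataclass(frozen=True)
class DatasetRequest:
    """Hypothetical container for the user's choices in steps 1 and 2."""
    description: str
    dataset_type: str
    output_format: str
    model: str
    num_samples: int

    def __post_init__(self) -> None:
        # Reject inputs the interface would not offer.
        if not self.description.strip():
            raise ValueError("description must not be empty")
        if self.dataset_type not in DATASET_TYPES:
            raise ValueError(f"unknown dataset type: {self.dataset_type}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.output_format}")
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model}")
        if self.num_samples <= 0:
            raise ValueError("num_samples must be positive")

    def to_prompt(self) -> str:
        """Build the user message that would be sent to the chosen model."""
        return (
            f"Generate {self.num_samples} samples of {self.dataset_type.lower()} data "
            f"for the following business problem: {self.description.strip()}\n"
            f"Return the result in {self.output_format} format only."
        )


request = DatasetRequest(
    description="Daily sales of a small coffee shop",
    dataset_type="Time-series",
    output_format="CSV",
    model="Claude",
    num_samples=30,
)
print(request.to_prompt())
```

Validating the choices in one place means every caller rejects the same invalid combinations before any API request is made.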
### 🤔 **Why Choose SynthDataGen?**
- ⏰ **Time Saver**: Automatically creates tables, time-series, or text data, so you don't need to gather real data yourself.
- ⚙️ **Flexible and Accessible**: Supports multiple formats (JSON, CSV, Parquet, Markdown) with a beginner-friendly interface.
- 🤖 **Powered by GPT & Claude**: Uses two leading AI model families to produce realistic synthetic data for prototyping or research.

### 🔧 **SynthDataGen Customization**
SynthDataGen is fully customizable through Python code. You can easily modify:
- ✏️ The **system prompt**, to control how the AI models generate code
- 🤖 The **models**, by adding new **frontier** or **open-source models** (e.g., LLaMA, DeepSeek, Qwen) or any model available through **Hugging Face libraries** and **inference endpoints**
- 📊 **Dataset types**, by adding new categories such as image metadata or dialogue transcripts
- 📁 **Output formats**, such as YAML or XML
- 🎨 **Interface styling**, including layout, colors, and themes
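
The customization points above are described only in prose. One way to keep output formats pluggable is a registry of serializer functions, sketched below with the standard library only. `WRITERS`, `register_format`, and the writer functions are hypothetical names rather than SynthDataGen's actual code; Parquet is omitted because it needs a third-party library such as pyarrow.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Callable

Record = dict[str, object]
Writer = Callable[[list[Record]], str]

# Hypothetical registry mapping an output-format name to a serializer.
WRITERS: dict[str, Writer] = {}


def register_format(name: str) -> Callable[[Writer], Writer]:
    """Decorator that adds a serializer to the registry under `name`."""
    def decorator(func: Writer) -> Writer:
        WRITERS[name] = func
        return func
    return decorator


@register_format("JSON")
def to_json(rows: list[Record]) -> str:
    return json.dumps(rows, indent=2)


@register_format("CSV")
def to_csv(rows: list[Record]) -> str:
    # Assumes at least one row and that all rows share the same keys.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


@register_format("Markdown")
def to_markdown(rows: list[Record]) -> str:
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |", "|" + " --- |" * len(headers)]
    lines += ["| " + " | ".join(str(row[h]) for h in headers) + " |" for row in rows]
    return "\n".join(lines)


# A new format, such as XML, is just one more registered function.
# Keys must be valid XML tag names for this simple version to work.
@register_format("XML")
def to_xml(rows: list[Record]) -> str:
    root = ET.Element("dataset")
    for row in rows:
        item = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


rows = [{"name": "Ada", "qty": 3}, {"name": "Linus", "qty": 5}]
print(WRITERS["XML"](rows))
```

With this design, adding YAML would mean registering one more function, without touching the code that calls `WRITERS[output_format](rows)`.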
### 🏗️ **Architecture**

<a href="assets/func_architecture.png"><img src="assets/func_architecture.png"></a>
<a href="assets/tech_architecture.png"><img src="assets/tech_architecture.png"></a>

## ⚙️ Setup & Installation

**1. Clone the Repository**
```bash
git clone https://github.com/lisek75/synthdatagen_app.git
cd synthdatagen_app
```

**2. Install Dependencies**

```bash
conda env create -f synthdatagen_env.yml
conda activate synthdatagen
```

**3. Configure API Keys & Endpoints**

Create a `.env` file in the repository root with the following variables. Do not put spaces around `=`: Docker's `--env-file` option rejects variable names that contain whitespace.
```bash
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
```
Ensure that the `.env` file remains **secure** and is not shared publicly.
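
The README does not show how `app.py` reads these keys; the `python-dotenv` package's `load_dotenv()` is a common choice. To illustrate what such a loader does, here is a minimal stdlib-only sketch. `parse_env`, `load_env_file`, and `require_keys` are hypothetical helpers; like `load_dotenv()` with its default settings, the loader does not overwrite variables that are already set.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Tolerate spaces around '=' and strip surrounding quote characters.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Export variables from `path` into os.environ; existing variables are kept."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}  # no file: rely on the real environment (e.g. docker --env-file)
    values = parse_env(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def require_keys(*names: str) -> None:
    """Raise early, with a clear message, if a required key is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Typical start-up sequence (commented out so the sketch runs without a .env file):
# load_env_file()
# require_keys("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
```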
## 🚀 Running the Gradio App

**Run the Application Locally**
```bash
python app.py
```

**Run the Application with Docker**

To run the app with Docker, either build the image yourself or use the pre-built image from Docker Hub.

- Build and run the app locally:

Build the image from the provided Dockerfile, tagging it with your own Docker Hub username:
```bash
docker build -t <user-dockerhub-username>/synthdatagen:v1.0 .
docker run -d --name synthdatagen-container -p 7860:7860 --env-file .env <user-dockerhub-username>/synthdatagen:v1.0
```
This builds the Docker image and starts the app in a detached container; `-p 7860:7860` maps the container's port 7860 to http://localhost:7860.

- Run the app directly from Docker Hub:

Pull the pre-built image from Docker Hub. ⚠️ Make sure to use the latest version tag listed at https://hub.docker.com/r/lizk75/synthdatagen/tags.

```bash
docker pull lizk75/synthdatagen:v1.0
docker run -d --name synthdatagen-container -p 7860:7860 --env-file .env lizk75/synthdatagen:v1.0
```
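
Because `docker run -d` returns as soon as the container starts, the app may need a few seconds before it answers. A minimal readiness check using only the standard library could look like this; `wait_for_app` is a hypothetical helper, and its default URL assumes the `-p 7860:7860` mapping above.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://localhost:7860", timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server answers, or return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # any successful response means the app is listening
        except urllib.error.HTTPError:
            return True  # an HTTP error status still comes from a running server
        except OSError:
            pass  # connection refused/reset or timed out: container still starting
        time.sleep(interval)
    return False
```

Catching `HTTPError` before `OSError` matters, because `HTTPError` is itself a subclass of `OSError` (via `URLError`).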
## 🧑‍💻 Usage Guide
- Launch the app in one of two ways:
  - Open the **demo link** provided at the top of this README.
  - Run it **locally** with `python app.py`, from Visual Studio or any other IDE.
- **Describe your dataset** by entering a clear business problem or topic.
- Select the **dataset type** and **output format**.
- Choose an **AI model** (GPT or Claude).
- Set the desired **number of samples**.
- Click **Create Dataset** and download the generated file.

## 📓 Google Colab
A **notebook version** is available for users who prefer working in a notebook environment. It includes additional **open-source models** that require a **GPU**, so it is best run on Google Colab or a local machine with GPU support.

https://github.com/lisek75/nlp_llms_notebook/blob/main/07_data_generator.ipynb

Added lines 1-9:

---
title: SynthDataGen
emoji: 🧬
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: Dockerfile
pinned: false
---
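
Hugging Face Spaces reads its configuration from this YAML block at the top of `README.md`. For `sdk: docker` Spaces, the app is expected on the port given by `app_port`, which defaults to 7860 and matches the port used in the Docker commands above. A minimal stdlib-only sketch for checking the block locally before pushing follows; `parse_front_matter`, `check_space_readme`, and the choice of required keys are assumptions, and it only understands flat `key: value` lines, so nested YAML would need a real parser such as PyYAML.

```python
from pathlib import Path

# Assumption: the keys this local check insists on; adjust as needed.
REQUIRED_KEYS = ("title", "sdk")


def parse_front_matter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading '---' ... '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("front matter is not closed with '---'")


def check_space_readme(path: str = "README.md") -> dict[str, str]:
    """Parse the Space metadata in `path` and fail if a required key is missing."""
    meta = parse_front_matter(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError(f"missing Space metadata: {', '.join(missing)}")
    return meta
```

Values come back as plain strings (for example `pinned` is `"false"`), which is enough for a presence check but not for type validation.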