Spaces:
Upload 24 files
- Readme.md +126 -0
- app.py +27 -0
- config.py +0 -0
- executor/.DS_Store +0 -0
- executor/__pycache__/workflow.cpython-312.pyc +0 -0
- executor/workflow.py +210 -0
- nodes/.DS_Store +0 -0
- nodes/llm/.DS_Store +0 -0
- nodes/llm/__pycache__/textmodel.cpython-312.pyc +0 -0
- nodes/llm/textmodel.py +103 -0
- nodes/processing/__pycache__/list.cpython-312.pyc +0 -0
- nodes/processing/list.py +36 -0
- nodes/processing/requests.py +45 -0
- nodes/scraping/.DS_Store +0 -0
- nodes/scraping/__pycache__/consolidated.cpython-312.pyc +0 -0
- nodes/scraping/__pycache__/html.cpython-312.pyc +0 -0
- nodes/scraping/html.py +249 -0
- nodes/socialmedia/__pycache__/reddit.cpython-312.pyc +0 -0
- nodes/socialmedia/__pycache__/x.cpython-312.pyc +0 -0
- nodes/socialmedia/instagram.py +9 -0
- nodes/socialmedia/reddit.py +106 -0
- nodes/socialmedia/x.py +73 -0
- requirements.txt +10 -0
- run.py +5 -0
Readme.md
ADDED
@@ -0,0 +1,126 @@
# Flowify - Workflow Automation Platform

Flowify is a powerful automation platform that enables users to create custom workflows through a visual interface and deploy them as APIs. Whether you're managing personal tasks or streamlining business processes, Flowify helps you automate repetitive tasks and focus on what matters most.

## 🌟 Features

### Visual Workflow Builder
- Drag-and-drop interface for creating workflows
- Real-time workflow visualization
- Connect nodes with intuitive linking
- Zoom and pan controls for large workflows

### Node Types
1. **Scraping Nodes**
   - HTML content extraction
   - Image URL extraction
   - Link extraction with filtering
   - Table data extraction
   - Header extraction
   - Metadata extraction
   - JavaScript/CSS file URL extraction
   - Targeted div content extraction

2. **Social Media Nodes**
   - X (Twitter) integration
   - Reddit posting
   - Social media analytics

3. **AI Nodes**
   - AI response generation
   - Custom prompt handling

4. **Array Manipulation Nodes**
   - List creation and management
   - Element extraction
   - Array operations

### Additional Features
- Dark/Light mode support
- Responsive design
- API endpoint generation
- Workflow scheduling
- Real-time execution monitoring

## 🚀 Getting Started

1. **Create an Account**
   - Visit the registration page
   - Set up your credentials
   - Choose your plan

2. **Create Your First Workflow**
   - Navigate to the Dashboard
   - Click "Create Workflow"
   - Add and connect nodes
   - Configure node settings
   - Save and deploy

3. **Access Your Workflow**
   - Get your unique API endpoint
   - Integrate with your applications
   - Monitor execution results

## 💻 Technical Details

### API Usage
Each workflow can be accessed through a unique API endpoint. The endpoint URL is provided in the workflow settings page after deployment.
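As a rough illustration (the base URL below is a placeholder, and the request/response shape follows `app.py` and `executor/workflow.py` in this repository), a deployed workflow can be invoked with a single POST request:

```python
import requests

# Placeholder endpoint -- substitute the URL shown in your workflow settings page
BASE_URL = "https://your-flowify-endpoint.example"

workflow = {
    "steps": [
        # Step 1: fetch a page's HTML
        {"node": "node-1", "type": "scrape_html", "config": {"url": "https://example.com"}},
        # Step 2: feed node-1's output to the AI node via a node reference
        {"node": "node-2", "type": "ai_response",
         "config": {"data": "node-1", "prompt": "Summarize this page: "}},
    ]
}

# The executor exposes POST /execute and returns the output of every node it ran
response = requests.post(f"{BASE_URL}/execute", json=workflow, timeout=60)
print(response.json())
```

The response is a list with one `{'node', 'data', 'type'}` entry per executed step.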
### Node Configuration
Nodes can be configured with various parameters:
- Input/output settings
- Authentication credentials
- Filtering options
- Scheduling parameters
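For illustration, a single node's configuration is one object inside a workflow step. The field names below are taken from `executor/workflow.py` and `nodes/socialmedia/x.py`; the credential values are placeholders:

```python
# One step of a workflow, as consumed by the executor.
# "data" references the node whose output feeds this step;
# "login" holds the credentials the X (Twitter) node validates.
tweet_step = {
    "node": "node-3",
    "type": "create_tweet",
    "config": {
        "data": "node-2",  # take input from node-2's output
        "login": {
            "api_key": "YOUR_API_KEY",
            "secret_key": "YOUR_API_SECRET",
            "bearer_key": "YOUR_BEARER_TOKEN",
            "access_token": "YOUR_ACCESS_TOKEN",
            "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
        },
    },
}
```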
### Security
- Secure API authentication
- Encrypted credential storage
- Rate limiting protection
- User-specific workflow isolation

## 📊 Plans and Pricing

### Free Plan
- Up to 3 active workflows
- Basic scheduling (once per day)
- 1,000 API requests/month
- Core workflow builder access

### Silver Plan
- Up to 10 active workflows
- Hourly scheduling
- 10,000 API requests/month
- Advanced workflow builder tools

### Gold Plan
- Unlimited active workflows
- Real-time triggers
- Unlimited API usage
- Webhook support and custom scripts

## 🤝 Contributing

We welcome contributions to Flowify! Please read our contributing guidelines before submitting pull requests.

## 📄 License

Flowify is licensed under the [MIT License](LICENSE).

## 📞 Support

Need help? Contact us:
- Documentation: [docs.flowify.com](https://docs.flowify.com)
- Email: [email protected]
- GitHub Issues: [Report a bug](https://github.com/flowify/issues)

## 🌐 Links

- [Website](https://flowify.com)
- [Documentation](https://docs.flowify.com)
- [Blog](https://blog.flowify.com)
- [API Reference](https://api.flowify.com)

---

Made with ❤️ by the Flowify Team
app.py
ADDED
@@ -0,0 +1,27 @@
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from executor.workflow import execute
import pyrebase

app = FastAPI()

# Enable CORS for all origins
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/")
async def index():
    return {"message": "Welcome to the Flowify Workflow Executor API"}

@app.post("/execute")
async def execute_workflow(request: Request):
    workflow = await request.json()
    print(workflow)
    data = execute(workflow)
    return data
config.py
ADDED
File without changes

executor/.DS_Store
ADDED
Binary file (6.15 kB)

executor/__pycache__/workflow.cpython-312.pyc
ADDED
Binary file (8.11 kB)
executor/workflow.py
ADDED
@@ -0,0 +1,210 @@
from nodes.scraping.html import *
from nodes.llm.textmodel import ai_response as generate_ai_response
from nodes.socialmedia.x import *
from nodes.socialmedia.reddit import reddit_post
from nodes.processing.list import *
import ast

# Updated the executor to process multiple nodes for better execution (Mon, Jan 6, 2025)
def execute(workflow):

    print(workflow)

    # Nodes that do not depend on any other node for input
    unique_node = []

    # Nodes that depend on other nodes for input
    dependant_node = []

    # Temporary storage for saving the output of the nodes
    temp_data = []

    # Assigning unique or dependent nodes to their respective array
    for step in workflow['steps']:
        # Check if 'config' contains a reference to other nodes
        is_dependant = False
        if 'config' in step and isinstance(step['config'], dict):
            for key, value in step['config'].items():
                # If the value contains a reference like 'node-1', 'node-2', etc.
                if isinstance(value, str) and value.startswith('node-'):
                    is_dependant = True
                    break

        if is_dependant:
            dependant_node.append(step)
        else:
            unique_node.append(step)

    for step in workflow['steps']:
        print("executing step", step['node'], step['type'])

        if step['type'] == "scrape_html":
            for temp in temp_data:
                if temp['node'] == step['config']['url']:
                    print(temp['data'])
                    data = scrape_html(temp['data'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = scrape_html(step['config']['url'])
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "scrape_images":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    filter = step['config']['filter']
                    data = scrape_images(data=temp['data'], filter=filter)
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = step['config']['data']
                filter = step['config']['filter']
                response = scrape_images(data=data, filter=filter)
                temp_data.append({'node': step['node'], 'data': response, 'type': step['type']})

        elif step['type'] == "scrape_links":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    data = scrape_links(url=temp['data'], filter=step['config']['filter'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = scrape_links(url=step['config']['data'], filter=step['config']['filter'])
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "scrape_metadata":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    data = scrape_metadata(temp['data'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = step['config']['data']
                data = scrape_metadata(data)
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "scrape_text":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    data = scrape_text(str(temp['data']))
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = step['config']['data']
                data = scrape_text(data)
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "create_tweet":
            for temp in temp_data:
                print("node", temp['node'])
                if temp['node'] == step['config']['data']:
                    print('found temp node')
                    login = step['config']['login']
                    data = create_tweet(text=temp['data'], login=login)
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                print('node not found')
                data = step['config']['data']
                login = step['config']['login']
                data = create_tweet(text=data, login=login)
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "create_reddit":
            for temp in temp_data:
                print("node", temp['node'])
                if temp['node'] == step['config']['data']:
                    print('found temp node')
                    username = step['config']['username']
                    password = step['config']['password']
                    subreddit = step['config']['subreddit']
                    client_id = step['config']['id']
                    client_secret = step['config']['secret']
                    title = step['config']['title']
                    data = reddit_post(client_id, client_secret, username, password, subreddit, title, body=temp['data'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                username = step['config']['username']
                password = step['config']['password']
                subreddit = step['config']['subreddit']
                client_id = step['config']['id']
                client_secret = step['config']['secret']
                title = step['config']['title']
                data = reddit_post(client_id, client_secret, username, password, subreddit, title, body=step['config']['data'])
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step["type"] == "ai_response":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    data = generate_ai_response(step['config']['prompt'] + str(temp['data']))
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = generate_ai_response(prompt=step['config']['prompt'])
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "scrape_div":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    data = scrape_div(data=temp['data'], div=step['config']['class'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = step['config']['data']
                class_ = step['config']['class']
                data = scrape_div(data=data, div=class_)
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "extract_element":
            for temp in temp_data:
                if temp['node'] == step['config']['data']:
                    print(step['config']['data'])
                    data = extract_element(list_=temp['data'], index=step['config']['index'], value=step['config']['value'])
                    temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})
                    break
            else:
                data = extract_element(list_=step['config']['data'], index=step['config']['index'], value=step['config']['value'])
                temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "create_list":
            data = step['config']['data']
            # Convert a string representation of a list into an actual list
            if isinstance(data, str):
                try:
                    data = ast.literal_eval(data)
                except (ValueError, SyntaxError):
                    pass
            temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

        elif step['type'] == "reddit_post":
            title = step['config']['title']
            content = step['config']['data']

            # Check if title is from another node
            for temp in temp_data:
                if temp['node'] == title:
                    title = temp['data']
                    break

            # Check if content is from another node
            for temp in temp_data:
                if temp['node'] == content:
                    content = temp['data']
                    break

            data = reddit_post(
                client_id=step['config']['client_id'],
                client_secret=step['config']['client_secret'],
                username=step['config']['username'],
                password=step['config']['password'],
                subreddit=step['config']['subreddit'],
                title=title,
                body=content
            )
            temp_data.append({'node': step['node'], 'data': data, 'type': step['type']})

    return temp_data
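Note on the executor above: `node-` references are resolved by scanning `temp_data` for an already-executed node with a matching name, so a step's dependencies must appear earlier in `steps`. A minimal local sketch of that behavior (assuming the URL is reachable and returns HTML):

```python
# Hypothetical two-step run that exercises the dependency resolution in execute()
from executor.workflow import execute

result = execute({
    "steps": [
        {"node": "node-1", "type": "scrape_html", "config": {"url": "https://example.com"}},
        {"node": "node-2", "type": "scrape_text", "config": {"data": "node-1"}},
    ]
})

# result is temp_data: one {'node', 'data', 'type'} entry per executed step
for entry in result:
    print(entry["node"], entry["type"], str(entry["data"])[:80])
```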
nodes/.DS_Store
ADDED
Binary file (6.15 kB)

nodes/llm/.DS_Store
ADDED
Binary file (6.15 kB)

nodes/llm/__pycache__/textmodel.cpython-312.pyc
ADDED
Binary file (5.16 kB)
nodes/llm/textmodel.py
ADDED
@@ -0,0 +1,103 @@
from typing import Union, Dict
import requests
from tenacity import retry, stop_after_attempt, wait_exponential
import time
import json

class AIResponseError(Exception):
    """Custom exception for AI response errors"""
    pass

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    reraise=True
)
def ai_response(prompt: str) -> Union[str, Dict[str, str]]:
    cookies = {
        'intercom-id-evnv2y8k': 'fea4d452-f9be-42e0-93e3-1e47a3836362',
        'intercom-device-id-evnv2y8k': '2bb3e469-0159-4b6b-a33e-1aea4b51ccb1',
        '__stripe_mid': 'e0f7c1ba-56c6-44d4-ba1d-cf4611453eb43cf922',
        'state-csrf': 'z4pfq6gvoqmg92gkq6bljm',
'together_auth_cookie': '%7B%22expires%22%3A%222026-03-11T14%3A02%3A04.928Z%22%2C%22session%22%3A%22b672ad1b7784bcbb96a5b43058d3d4fbd8327f32dd020f12664307eed353c1b86f1e0d515a4c8b2d990dc5017ed1f13cd7514dee6263bcbd9e03446143245ba0c21968f273967cdb73dd6fedb0a9ff2b65a3ed2ce66b2cd4f94053c747be019d93327fa1f6b24bca9a559ba98ec48f2b51c3be242891d86bb670453120eed64e%22%7D',
'AMP_MKTG_7112ee0414': 'JTdCJTIycmVmZXJyZXIlMjIlM0ElMjJodHRwcyUzQSUyRiUyRmFjY291bnRzLmdvb2dsZS5jb20lMkYlMjIlMkMlMjJyZWZlcnJpbmdfZG9tYWluJTIyJTNBJTIyYWNjb3VudHMuZ29vZ2xlLmNvbSUyMiU3RA==',
'intercom-session-evnv2y8k': 'UWtNaFEraEJ3ZzcydXlwUC94MHhPcGg3eGZ6RXJkM2c3a2J3R1dwUGR4RWRzQnozWFNLQ0tqbW5za0gvU3RodmNZNXh4NVhRL3I5RWhwNjZKRnd5M21XRm9sZUZhTm05ZUUvaXMxZEYrNjQ9LS1OS0dDcFpuZGRCRE5XaWkxcDVZOEtBPT0=--83289f02195d8a45658bb26d7036c1bf9cfe9887',
        '__stripe_sid': '7e7f0bab-efaa-4ec6-ae34-2857cccc4f644bc033',
'AMP_7112ee0414': 'JTdCJTIyZGV2aWNlSWQlMjIlM0ElMjI5NGU0MzFjOS02OTM0LTQwMGItYTk3Ni0yZjEyNzZmNjg4YzklMjIlMkMlMjJ1c2VySWQlMjIlM0ElMjI2N2I4M2E1Y2Q4MzFiZTcxYjAyYjM4MmElMjIlMkMlMjJzZXNzaW9uSWQlMjIlM0ExNzQxNzAxNzE5MDE5JTJDJTIyb3B0T3V0JTIyJTNBZmFsc2UlMkMlMjJsYXN0RXZlbnRUaW1lJTIyJTNBMTc0MTcwMTc1NTU4MSUyQyUyMmxhc3RFdmVudElkJTIyJTNBODklMkMlMjJwYWdlQ291bnRlciUyMiUzQTQlN0Q=',
    }

    headers = {
        'accept': '*/*',
        'accept-language': 'en-US,en;q=0.9,ja;q=0.8',
        'authorization': 'Bearer bb80c2632e2d0ee9c8b5208fcfca771159cf0fd8f9b06404c9f2103ca936310e',
        'content-type': 'application/json',
        'origin': 'https://api.together.ai',
        'priority': 'u=1, i',
        'referer': 'https://api.together.ai/playground/chat/meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo',
        'sec-ch-ua': '"Chromium";v="134", "Not:A-Brand";v="24", "Google Chrome";v="134"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-origin',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
# 'cookie': 'intercom-id-evnv2y8k=fea4d452-f9be-42e0-93e3-1e47a3836362; intercom-device-id-evnv2y8k=2bb3e469-0159-4b6b-a33e-1aea4b51ccb1; __stripe_mid=e0f7c1ba-56c6-44d4-ba1d-cf4611453eb43cf922; state-csrf=z4pfq6gvoqmg92gkq6bljm; together_auth_cookie=%7B%22expires%22%3A%222026-03-11T14%3A02%3A04.928Z%22%2C%22session%22%3A%22b672ad1b7784bcbb96a5b43058d3d4fbd8327f32dd020f12664307eed353c1b86f1e0d515a4c8b2d990dc5017ed1f13cd7514dee6263bcbd9e03446143245ba0c21968f273967cdb73dd6fedb0a9ff2b65a3ed2ce66b2cd4f94053c747be019d93327fa1f6b24bca9a559ba98ec48f2b51c3be242891d86bb670453120eed64e%22%7D; AMP_MKTG_7112ee0414=JTdCJTIycmVmZXJyZXIlMjIlM0ElMjJodHRwcyUzQSUyRiUyRmFjY291bnRzLmdvb2dsZS5jb20lMkYlMjIlMkMlMjJyZWZlcnJpbmdfZG9tYWluJTIyJTNBJTIyYWNjb3VudHMuZ29vZ2xlLmNvbSUyMiU3RA==; intercom-session-evnv2y8k=UWtNaFEraEJ3ZzcydXlwUC94MHhPcGg3eGZ6RXJkM2c3a2J3R1dwUGR4RWRzQnozWFNLQ0tqbW5za0gvU3RodmNZNXh4NVhRL3I5RWhwNjZKRnd5M21XRm9sZUZhTm05ZUUvaXMxZEYrNjQ9LS1OS0dDcFpuZGRCRE5XaWkxcDVZOEtBPT0=--83289f02195d8a45658bb26d7036c1bf9cfe9887; __stripe_sid=7e7f0bab-efaa-4ec6-ae34-2857cccc4f644bc033; AMP_7112ee0414=JTdCJTIyZGV2aWNlSWQlMjIlM0ElMjI5NGU0MzFjOS02OTM0LTQwMGItYTk3Ni0yZjEyNzZmNjg4YzklMjIlMkMlMjJ1c2VySWQlMjIlM0ElMjI2N2I4M2E1Y2Q4MzFiZTcxYjAyYjM4MmElMjIlMkMlMjJzZXNzaW9uSWQlMjIlM0ExNzQxNzAxNzE5MDE5JTJDJTIyb3B0T3V0JTIyJTNBZmFsc2UlMkMlMjJsYXN0RXZlbnRUaW1lJTIyJTNBMTc0MTcwMTc1NTU4MSUyQyUyMmxhc3RFdmVudElkJTIyJTNBODklMkMlMjJwYWdlQ291bnRlciUyMiUzQTQlN0Q=',
    }
    model = 'meta-llama/Llama-Vision-Free'

    current_messages = []

    current_messages.append({
        'content': [{
            'type': 'text',
            'text': prompt
        }],
        'role': 'user'
    })

    json_data = {
        'model': model,
        'max_tokens': None,
        'temperature': 0.7,
        'top_p': 0.7,
        'top_k': 50,
        'repetition_penalty': 1,
        'stream_tokens': True,
        'stop': ['<|eot_id|>', '<|eom_id|>'],
        'messages': current_messages,
        'stream': True,
    }

    response_text = ""
    max_retries = 5
    base_delay = 1  # Initial delay in seconds

    for attempt in range(max_retries):
        response = requests.post('https://api.together.ai/inference', cookies=cookies, headers=headers, json=json_data, stream=True)

        if response.status_code == 200:
            for line in response.iter_lines():
                if line:
                    decoded_line = line.decode('utf-8')
                    if decoded_line.startswith('data: '):
                        data = decoded_line[6:]  # Remove 'data: ' prefix
                        if data == '[DONE]':
                            return response_text
                        try:
                            json_response = json.loads(data)
                            if 'choices' in json_response and json_response['choices']:
                                text = json_response['choices'][0].get('text', '')
                                response_text += text
                        except json.JSONDecodeError:
                            continue
            return response_text
        elif response.status_code == 429:
            if attempt < max_retries - 1:
                time.sleep(0.5)
                continue
            raise AIResponseError("Rate limited, maximum retries reached")
        else:
            raise AIResponseError(f"Unexpected status code: {response.status_code}")

    raise AIResponseError("Maximum retries reached")
nodes/processing/__pycache__/list.cpython-312.pyc
ADDED
Binary file (1.31 kB)
nodes/processing/list.py
ADDED
@@ -0,0 +1,36 @@
import random

def extract_element(index, list_, value):
    try:
        if value == "false":
            if index == "random":
                data = random.choice(list_)
                return data
            else:
                data = list_[int(index)]
                return data
        else:
            if index == "random":
                data = random.choice(list_)
                return data[0] if isinstance(data, list) else data
            else:
                data = list_[int(index)]
                return data[0] if isinstance(data, list) else data
    except Exception as e:
        raise ValueError(f"Error extracting element: {e}")

def extract_data(name, list_):
    try:
        data = list_[name]
        return data
    except Exception as e:
        raise ValueError(f"Error extracting data: {e}")

def create_list(list_):
    try:
        data = []
        for ele in list_:
            data.append(ele)
        return data
    except Exception as e:
        raise ValueError(f"Error creating list: {e}")
nodes/processing/requests.py
ADDED
@@ -0,0 +1,45 @@
from curl_cffi import requests as curl_requests
import json

def custom_requests(url, method='GET', res_type='json', kwargs=None):
    """
    Make a custom HTTP request

    Args:
        url (str): The URL to make the request to
        method (str): HTTP method (GET, POST, PUT, DELETE)
        res_type (str): Response type (json or text)
        kwargs (dict): Additional arguments for the request (headers, body, etc.)

    Returns:
        dict/str: Response data based on res_type
    """
    try:
        # Parse kwargs if it's a string
        if isinstance(kwargs, str):
            kwargs = json.loads(kwargs)
        elif kwargs is None:
            kwargs = {}

        # Make the request
        response = curl_requests.request(
            method=method.upper(),
            url=url,
            **kwargs,
            impersonate='chrome101'
        )

        # Raise for bad status
        response.raise_for_status()

        # Return based on response type
        if res_type.lower() == 'json':
            return response.json()
        else:
            return response.text

    except curl_requests.exceptions.RequestException as e:
        return {"error": str(e)}
    except json.JSONDecodeError:
        return {"error": "Invalid JSON in response"}
nodes/scraping/.DS_Store
ADDED
Binary file (6.15 kB)

nodes/scraping/__pycache__/consolidated.cpython-312.pyc
ADDED
Binary file (17.8 kB)

nodes/scraping/__pycache__/html.cpython-312.pyc
ADDED
Binary file (17.2 kB)
nodes/scraping/html.py
ADDED
@@ -0,0 +1,249 @@
from curl_cffi import requests as req
from bs4 import BeautifulSoup
import logging
from typing import Union, List, Dict, Optional
from urllib.parse import urljoin, urlparse

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class ScrapingError(Exception):
    """Custom exception for scraping errors"""
    pass

def validate_url(url: str) -> bool:
    """Validate if the given URL is properly formatted"""
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except Exception:
        return False

def clean_url(url: str) -> str:
    """Clean and normalize URL"""
    if url.startswith('//'):
        return f'https:{url}'
    return url

def scrape_html(url: str) -> Union[str, Dict[str, str]]:
    """
    Fetch HTML content from a URL with improved error handling

    Args:
        url (str): The URL to scrape

    Returns:
        str: HTML content if successful
        dict: Error information if failed
    """
    try:
        if not validate_url(url):
            return {"error": "Invalid URL format"}

        response = req.get(
            url,
            impersonate='chrome110',
            timeout=30,
            max_redirects=5
        )

        # Check if response is HTML
        content_type = response.headers.get('content-type', '').lower()
        if 'text/html' not in content_type:
            return {"error": f"Unexpected content type: {content_type}"}

        return response.text

    except Exception as e:
        logger.error(f"Unexpected error while scraping {url}: {str(e)}")
        return {"error": f"Unexpected error: {str(e)}"}

def scrape_images(data: str, filter: str = "") -> Union[List[str], Dict[str, str]]:
    """
    Extract image URLs from HTML content with improved filtering and validation

    Args:
        data (str): HTML content
        filter (str): Optional filter string for URLs

    Returns:
        list: List of image URLs if successful
        dict: Error information if failed
    """
    try:
        if not data:
            return {"error": "No HTML content provided"}

        soup = BeautifulSoup(data, 'html.parser')
        images = []

        # Look for both img tags and background images in style attributes
        for img in soup.find_all('img'):
            src = img.get('src') or img.get('data-src')
            if src:
                src = clean_url(src)
                if validate_url(src) and (not filter or filter.lower() in src.lower()):
                    images.append(src)

        # Look for background images in style attributes
        for elem in soup.find_all(style=True):
            style = elem['style']
            if 'background-image' in style:
                url_start = style.find('url(') + 4
                url_end = style.find(')', url_start)
                if url_start > 4 and url_end != -1:
                    src = style[url_start:url_end].strip('"\'')
                    src = clean_url(src)
                    if validate_url(src) and (not filter or filter.lower() in src.lower()):
                        images.append(src)

        return list(set(images))  # Remove duplicates

    except Exception as e:
        logger.error(f"Error extracting images: {str(e)}")
        return {"error": f"Failed to extract images: {str(e)}"}

def scrape_links(url: str, filter: str = "") -> Union[List[str], Dict[str, str]]:
    """
    Extract links from a webpage with improved validation and error handling

    Args:
        url (str): URL to scrape
        filter (str): Optional filter for links

    Returns:
        list: List of links if successful
        dict: Error information if failed
    """
    try:
        if not validate_url(url):
            return {"error": "Invalid URL format"}

        print(url)
        response = req.get(url, impersonate='chrome110')

        soup = BeautifulSoup(response.text, 'html.parser')
        links = []
        base_url = url

        try:
            for a in soup.find_all('a', href=True):
                href = a['href']
                # Convert relative URLs to absolute
                full_url = urljoin(base_url, href)

                if validate_url(full_url) and (not filter or filter.lower() in full_url.lower()):
                    links.append(full_url)

            return list(set(links))  # Remove duplicates

        except Exception as e:
            logger.error(f"Error processing links: {str(e)}")
            return {"error": f"Failed to process links: {str(e)}"}

    except Exception as e:
        logger.error(f"Error extracting links: {str(e)}")
        return {"error": f"Failed to extract links: {str(e)}"}

def scrape_text(data: str) -> Union[str, Dict[str, str]]:
    """
    Extract clean text content from HTML

    Args:
        data (str): HTML content

    Returns:
        str: Extracted text if successful
        dict: Error information if failed
    """
    try:
        if not data:
            return {"error": "No HTML content provided"}

        soup = BeautifulSoup(data, 'html.parser')

        # Remove script and style elements
        for element in soup(['script', 'style', 'head']):
            element.decompose()

        # Get text and clean it
        text = soup.get_text(separator='\n')
        # Remove excessive newlines and whitespace
        text = '\n'.join(line.strip() for line in text.split('\n') if line.strip())

        return text

    except Exception as e:
        logger.error(f"Error extracting text: {str(e)}")
        return {"error": f"Failed to extract text: {str(e)}"}

def scrape_div(data: str, div: str) -> Union[List[str], Dict[str, str]]:
    """
    Extract content from specific div elements

    Args:
        data (str): HTML content
        div (str): Class or ID of the div to scrape

    Returns:
        list: List of div contents if successful
        dict: Error information if failed
    """
    try:
        if not data:
            return {"error": "No HTML content provided"}
        if not div:
            return {"error": "No div selector provided"}

        soup = BeautifulSoup(data, 'html.parser')
        results = []

        # Try class first
        elements = soup.find_all(class_=div)
        if not elements:
            # Try ID if no class found
            elements = soup.find_all(id=div)
            if not elements:
                return {"error": f"No elements found with class or ID: {div}"}

        for element in elements:
            # Get both text and HTML content
            content = {
                "text": element.get_text(strip=True),
                "html": str(element)
            }
            results.append(content)

        return results

    except Exception as e:
        logger.error(f"Error extracting div content: {str(e)}")
        return {"error": f"Failed to extract div content: {str(e)}"}

# Function to scrape metadata
def scrape_metadata(data):
    soup = BeautifulSoup(data, 'html.parser')
    metadata = {}
    for meta in soup.find_all('meta'):
        name = meta.get('name') or meta.get('property')
        content = meta.get('content')
        if name and content:
            metadata[name] = content
    return metadata

# Function to scrape table data
def scrape_tables(data):
    soup = BeautifulSoup(data, 'html.parser')
    tables = []
    for table in soup.find_all('table'):
        rows = []
        for row in table.find_all('tr'):
            cells = [cell.get_text(strip=True) for cell in row.find_all(['th', 'td'])]
            rows.append(cells)
        tables.append(rows)
    return tables
nodes/socialmedia/__pycache__/reddit.cpython-312.pyc
ADDED
Binary file (4.83 kB)

nodes/socialmedia/__pycache__/x.cpython-312.pyc
ADDED
Binary file (3.52 kB)
nodes/socialmedia/instagram.py
ADDED
@@ -0,0 +1,9 @@
import instabot

def insta_post(username, password, caption, image):
    bot = instabot.Bot()
    bot.login(username=username, password=password)
    bot.upload_photo(image, caption=caption)
    bot.logout()
nodes/socialmedia/reddit.py
ADDED
@@ -0,0 +1,106 @@
import logging
from typing import Union, Dict
import praw
import random
import string
from tenacity import retry, stop_after_attempt, wait_exponential

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class RedditError(Exception):
    """Custom exception for Reddit API errors"""
    pass

def validate_reddit_credentials(client_id: str, client_secret: str, username: str, password: str) -> bool:
    """Validate Reddit credentials"""
    return all([client_id, client_secret, username, password])

def generate_random_user_agent():
    """Generate a random user agent."""
    prefix = "my_script_by_u/"
    username = "your_reddit_username"  # Replace with your Reddit username
    random_suffix = ''.join(random.choices(string.ascii_letters + string.digits, k=10))
    return f"{prefix}{username}_{random_suffix}"

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    reraise=True
)
def reddit_post(
    client_id: str,
    client_secret: str,
    username: str,
    password: str,
    subreddit: str,
    title: str,
    body: str = ""
) -> Union[Dict[str, str], Dict[str, str]]:
    """
    Create a Reddit post with improved error handling and retries

    Args:
        client_id (str): Reddit API client ID
        client_secret (str): Reddit API client secret
        username (str): Reddit username
        password (str): Reddit password
        subreddit (str): Target subreddit
        title (str): Post title
        body (str): Post content

    Returns:
        dict: Post information if successful
        dict: Error information if failed
    """
    try:
        if not validate_reddit_credentials(client_id, client_secret, username, password):
            return {"error": "Invalid or missing Reddit credentials"}

        if not title:
            return {"error": "Post title is required"}

        if not subreddit:
            return {"error": "Subreddit is required"}

        # Initialize Reddit client
        reddit = praw.Reddit(
            client_id=client_id,
            client_secret=client_secret,
            username=username,
            password=password,
            user_agent=f"python:flowify:v1.0 (by /u/{username})"
        )

        # Verify credentials
        try:
            reddit.user.me()
        except Exception:
            return {"error": "Failed to authenticate with Reddit"}

        # Create post
        try:
            subreddit_instance = reddit.subreddit(subreddit)
            post = subreddit_instance.submit(
                title=title,
                selftext=body,
                send_replies=True
            )

            return {
                "success": True,
                "post_id": post.id,
                "url": f"https://reddit.com{post.permalink}"
            }

        except praw.exceptions.RedditAPIException as e:
            error_messages = [f"{error.error_type}: {error.message}" for error in e.items]
            return {"error": f"Reddit API error: {', '.join(error_messages)}"}

    except praw.exceptions.PRAWException as e:
        logger.error(f"PRAW error: {str(e)}")
        return {"error": f"Reddit error: {str(e)}"}
    except Exception as e:
        logger.error(f"Unexpected error creating Reddit post: {str(e)}")
        return {"error": f"Failed to create post: {str(e)}"}
nodes/socialmedia/x.py
ADDED
@@ -0,0 +1,73 @@
import logging
from typing import Union, Dict
import tweepy
from tenacity import retry, stop_after_attempt, wait_exponential

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TwitterError(Exception):
    """Custom exception for Twitter API errors"""
    pass

def validate_credentials(login: Dict[str, str]) -> bool:
    """Validate Twitter API credentials"""
    required_keys = ['api_key', 'secret_key', 'bearer_key', 'access_token', 'access_token_secret']
    return all(key in login and login[key] for key in required_keys)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    reraise=True
)
def create_tweet(text: str, login: Dict[str, str]) -> Union[Dict[str, str], Dict[str, str]]:
    """
    Create a tweet with improved error handling and retries

    Args:
        text (str): Tweet content
        login (dict): Twitter API credentials

    Returns:
        dict: Tweet information if successful
        dict: Error information if failed
    """
    try:
        if not text:
            return {"error": "No tweet content provided"}

        if not validate_credentials(login):
            return {"error": "Invalid or missing Twitter credentials"}

        if len(text) > 280:
            return {"error": "Tweet exceeds 280 characters"}

        # Initialize Twitter client
        client = tweepy.Client(
            bearer_token=login['bearer_key'],
            consumer_key=login['api_key'],
            consumer_secret=login['secret_key'],
            access_token=login['access_token'],
            access_token_secret=login['access_token_secret']
        )

        # Create tweet
        response = client.create_tweet(text=text)

        if response and response.data:
            tweet_id = response.data['id']
            return {
                "success": True,
                "tweet_id": tweet_id,
                "url": f"https://twitter.com/user/status/{tweet_id}"
            }
        else:
            return {"error": "Failed to create tweet: No response data"}

    except tweepy.TweepyException as e:
        logger.error(f"Twitter API error: {str(e)}")
        return {"error": f"Twitter API error: {str(e)}"}
    except Exception as e:
        logger.error(f"Unexpected error creating tweet: {str(e)}")
        return {"error": f"Failed to create tweet: {str(e)}"}
requirements.txt
ADDED
@@ -0,0 +1,10 @@
beautifulsoup4==4.13.3
curl_cffi==0.7.4
fastapi==0.115.12
instabot==0.117.0
linkedin_api==2.3.1
praw==7.8.1
Pyrebase4==4.8.0
tenacity==9.0.0
tweepy==4.14.0
uvicorn
run.py
ADDED
@@ -0,0 +1,5 @@
import uvicorn

from app import app

if __name__ == '__main__':
    # Serve the FastAPI app with uvicorn; FastAPI apps have no app.run() method
    uvicorn.run(app, port=5000)