arpit13 committed
Commit
011960a
·
1 Parent(s): b853c37

Deploy Whale_Arbitrum on HF Spaces

.env ADDED
@@ -0,0 +1,11 @@
+ # Your current API key appears to be having issues
+ # Please replace it with your own key from https://arbiscan.io/myapikey
+ # Uncomment one of the API keys below or add your own
+ ARBISCAN_API_KEY=4YEN1UTUEZ8I8ZBWSZW5NH6ZDFYEUVKQ5U
+ # ARBISCAN_API_KEY=HVZC2W3IZWCGJWS8QDBZ56D1GZZNDJMZ25
+
+ # Gemini API key for price data
+ GEMINI_API_KEY=AIzaSyCyble5D3dlgPxDXWLlaZmu8hOM_nt-V6M
+
+ # OpenAI API key for CrewAI functionality
+ OPENAI_API_KEY=your-openai-api-key
README.md ADDED
@@ -0,0 +1,135 @@
+ # Whale Wallet AI – Market Manipulation Detection
+
+ A powerful Streamlit-based tool that tracks large holders ("whales") on the Arbitrum network to uncover potential market manipulation tactics.
+
+ ## 1. Prerequisites & Setup
+
+ ### 1.1. Python & Dependencies
+ - Ensure you have Python 3.8+ installed.
+ - Install the required packages:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 1.2. API Keys
+ You need three API keys: one for on-chain data, one for real-time prices, and one for the AI agents:
+ - **ARBISCAN_API_KEY**: For fetching Arbitrum transaction data
+ - **GEMINI_API_KEY**: For retrieving live token prices
+ - **OPENAI_API_KEY**: For powering the CrewAI agents
+
+ Save these in a file named `.env` at the project root:
+ ```env
+ ARBISCAN_API_KEY=your_arbiscan_key
+ GEMINI_API_KEY=your_gemini_key
+ OPENAI_API_KEY=your_openai_key
+ ```
+ Note: Sample API keys are provided in the default `.env` file, but you should replace them with your own for production use.
+
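+ For reference, the app loads these keys at startup with `python-dotenv`; a minimal sketch of the same step, mirroring what `app.py` and the API clients do:
+ ```python
+ import os
+ from dotenv import load_dotenv
+
+ load_dotenv()  # reads .env from the project root
+ arbiscan_key = os.getenv("ARBISCAN_API_KEY")
+ gemini_key = os.getenv("GEMINI_API_KEY")
+ openai_key = os.getenv("OPENAI_API_KEY")  # only needed for the CrewAI features
+ ```
+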
+ ### 1.3. Run the App
+ Launch the web interface with:
+ ```bash
+ streamlit run app.py
+ ```
+
+ ## 2. Core Features & How to Use Them
+
+ ### 2.1 Track Large Buy/Sell Transactions
+
+ **What it does:**
+ Monitors on-chain transfers exceeding a configurable threshold (e.g., 1,000 tokens or $100K) for any wallet or contract you specify.
+
+ **How to use:**
+ 1. In the sidebar, enter one or more wallet addresses
+ 2. Set your minimum token or USD value filter
+ 3. Click **Track Transactions**
+ 4. The dashboard will list incoming/outgoing transfers above the threshold (a programmatic sketch follows this list).
+
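+ A minimal sketch of the same query done programmatically, mirroring the call `app.py` makes through `ArbiscanClient` (the wallet address is a placeholder):
+ ```python
+ import os
+ from dotenv import load_dotenv
+ from modules.api_client import ArbiscanClient
+
+ load_dotenv()
+ client = ArbiscanClient(os.getenv("ARBISCAN_API_KEY"))
+
+ # Fetch ERC-20 transfers for the wallet, keeping only transfers of at least 1,000 tokens
+ whales = client.fetch_whale_transactions(
+     addresses=["0x1234abcd..."],  # placeholder address
+     min_token_amount=1000.0,
+     max_pages=5,                  # cap pagination, as the app does
+ )
+ print(whales.head())  # the client returns a pandas DataFrame of transfers
+ ```
+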
+ ### 2.2 Identify Trading Patterns of Whale Wallets
+
+ **What it does:**
+ Uses time-series clustering and sequence analysis to surface recurring behaviors (e.g., cyclical dumping, accumulation bursts).
+
+ **How to use:**
+ 1. Select a wallet address
+ 2. Choose a time period (e.g., last 7 days)
+ 3. Click **Analyze Patterns**
+ 4. View a summary of detected clusters and drill down into individual events.
+
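+ Pattern analysis maps to `DataProcessor.identify_patterns`, which takes the transfers DataFrame and returns a list of pattern dictionaries; a short sketch reusing `whales` from the example in 2.1:
+ ```python
+ from modules.data_processor import DataProcessor
+
+ processor = DataProcessor()
+ patterns = processor.identify_patterns(whales)  # each entry has name, description, occurrence_count, ...
+ for p in patterns:
+     print(f"{p['name']}: {p['description']} ({p['occurrence_count']} instances)")
+ ```
+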
+ ### 2.3 Analyze Impact of Whale Transactions on Token Prices
+
+ **What it does:**
+ Correlates large trades against minute-by-minute price ticks to quantify slippage, price spikes, or dumps.
+
+ **How to use:**
+ 1. Enable **Price Impact** analysis in settings
+ 2. Specify lookback/lookahead windows (e.g., 5 minutes)
+ 3. Click **Run Impact Analysis**
+ 4. See interactive line charts and slippage metrics.
+
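+ The impact figure reported per transaction is the one computed in `GeminiClient.get_price_impact`: the percentage change from the last trade price before the transaction to the first trade price after it, using Gemini trade data inside the lookback/lookahead windows:
+ ```python
+ impact_pct = ((post_price - pre_price) / pre_price) * 100
+ ```
+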
+ ### 2.4 Detect Potential Market Manipulation Techniques
+
+ **What it does:**
+ Automatically flags suspicious behaviors such as:
+ - **Pump-and-Dump:** Rapid buys followed by coordinated sell-offs
+ - **Wash Trading:** Self-trading across multiple addresses
+ - **Spoofing:** Large orders placed then canceled
+
+ **How to use:**
+ 1. Toggle **Manipulation Detection** on
+ 2. Adjust the sensitivity slider (Low/Medium/High)
+ 3. Click **Detect Manipulation**
+ 4. Examine the **Alerts** panel for flagged events.
+
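+ The alerts are produced by `ManipulationDetector`; a sketch mirroring the calls made in `app.py`, again reusing `whales` from 2.1:
+ ```python
+ from modules.detection import ManipulationDetector
+
+ detector = ManipulationDetector()
+ wallets = ["0x1234abcd..."]  # placeholder: the same addresses used to fetch `whales`
+ alerts = (
+     detector.detect_pump_and_dump(whales, "Medium")
+     + detector.detect_wash_trading(whales, wallets, "Medium")
+ )
+ for alert in alerts:
+     print(f"[{alert['risk_level']}] {alert['type']}: {alert['title']}")
+ ```
+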
+ ### 2.5 Generate Reports & Visualizations
+
+ **What it does:**
+ Compiles whale activity into PDF/CSV summaries and interactive charts.
+
+ **How to use:**
+ 1. Select **Export** in the top menu
+ 2. Choose **CSV**, **PDF**, or **PNG**
+ 3. Specify the time range and wallets to include
+ 4. Click **Download**
+ 5. The saved file will appear in your browser's download folder.
+
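+ CSV export is a direct pandas dump of the transfers table; a minimal sketch of the same step outside Streamlit (the app serves identical bytes through `st.download_button`):
+ ```python
+ csv_bytes = whales.to_csv(index=False).encode("utf-8")  # `whales` from the sketch in 2.1
+ with open("whale_transactions.csv", "wb") as f:
+     f.write(csv_bytes)
+ ```
+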
+ ## 3. Advanced Features: CrewAI Integration
+
+ This application leverages CrewAI to provide advanced analysis through specialized AI agents:
+
+ - **Blockchain Data Collector**: Extracts and organizes on-chain data
+ - **Price Impact Analyst**: Correlates trading activity with price movements
+ - **Trading Pattern Detector**: Identifies recurring behavioral patterns
+ - **Market Manipulation Investigator**: Detects potential market abuse
+ - **Insights Reporter**: Transforms data into actionable intelligence
+
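+ A sketch of how the crew is wired up in `app.py` (it requires `OPENAI_API_KEY`; the app falls back to direct analysis if CrewAI fails to initialize), reusing `client` and `processor` from the earlier sketches:
+ ```python
+ import os
+ from modules.api_client import GeminiClient
+ from modules.crew_system import WhaleAnalysisCrewSystem
+
+ gemini = GeminiClient(os.getenv("GEMINI_API_KEY"))
+ crew = WhaleAnalysisCrewSystem(client, gemini, processor)
+ report = crew.generate_market_manipulation_report(wallet_addresses=["0x1234abcd..."])
+ print(report["content"])              # markdown report body
+ for chart in report.get("charts", []):
+     chart.show()                      # Plotly figures, if any were produced
+ ```
+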
+ ## 4. Project Structure
+
+ ```
+ /Whale_Arbitrum/
+ ├── app.py                 # Main Streamlit application entry point
+ ├── requirements.txt       # Dependencies and package versions
+ ├── .env                   # API keys and environment variables
+ ├── modules/
+ │   ├── api_client.py      # Arbiscan and Gemini API clients
+ │   ├── data_processor.py  # Data processing and analysis
+ │   ├── detection.py       # Market manipulation detection algorithms
+ │   ├── visualizer.py      # Visualization and report generation
+ │   └── crew_system.py     # CrewAI agentic system
+ ```
+
+ ## 5. Use Cases
+
+ - **Regulatory Compliance & Fraud Detection**
+   Auditors and regulators can monitor DeFi markets for wash trades and suspicious dumps.
+
+ - **Investment Strategy Optimization**
+   Traders gain insight into institutional flows and can calibrate entry/exit points.
+
+ - **Market Research & Analysis**
+   Researchers study whale behavior to gauge token health and potential volatility.
+
+ - **DeFi Protocol Security Monitoring**
+   Protocol teams receive alerts on large dumps that may destabilize liquidity pools.
+
+ - **Token Project Risk Assessment**
+   Token issuers review top-holder actions to flag governance or distribution issues.
app.py ADDED
@@ -0,0 +1,719 @@
1
+ import streamlit as st
2
+ import pandas as pd
3
+ import numpy as np
4
+ import plotly.express as px
5
+ import plotly.graph_objects as go
6
+ import os
7
+ import json
8
+ import logging
9
+ import time
10
+ from datetime import datetime, timedelta
11
+ from typing import Dict, List, Optional, Union, Any
12
+ from dotenv import load_dotenv
13
+
14
+ # Configure logging - Reduce verbosity and improve performance
15
+ logging.basicConfig(
16
+ level=logging.WARNING, # Only show warnings and errors by default
17
+ format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
18
+ )
19
+
20
+ # Create a custom filter to suppress repetitive Gemini API errors
21
+ class SuppressRepetitiveErrors(logging.Filter):
22
+ def __init__(self):
23
+ super().__init__()
24
+ self.error_counts = {}
25
+ self.max_errors = 3 # Show at most 3 instances of each error
26
+
27
+ def filter(self, record):
28
+ if record.levelno < logging.WARNING:
29
+ return True
30
+
31
+ # If it's a Gemini API error for non-existent tokens, suppress it after a few occurrences
32
+ if 'Error fetching historical prices from Gemini API' in record.getMessage():
33
+ key = 'gemini_api_error'
34
+ self.error_counts[key] = self.error_counts.get(key, 0) + 1
35
+
36
+ # Only allow the first few errors through
37
+ return self.error_counts[key] <= self.max_errors
38
+
39
+ return True
40
+
41
+ # Apply the filter
42
+ logging.getLogger().addFilter(SuppressRepetitiveErrors())
43
+
44
+ from modules.api_client import ArbiscanClient, GeminiClient
45
+ from modules.data_processor import DataProcessor
46
+ from modules.visualizer import Visualizer
47
+ from modules.detection import ManipulationDetector
48
+
49
+ # Load environment variables
50
+ load_dotenv()
51
+
52
+ # Set page configuration
53
+ st.set_page_config(
54
+ page_title="Whale Wallet AI - Market Manipulation Detection",
55
+ page_icon="🐳",
56
+ layout="wide",
57
+ initial_sidebar_state="expanded"
58
+ )
59
+
60
+ # Add custom CSS
61
+ st.markdown("""
62
+ <style>
63
+ .main-header {
64
+ font-size: 2.5rem;
65
+ color: #1E88E5;
66
+ text-align: center;
67
+ margin-bottom: 1rem;
68
+ }
69
+ .sub-header {
70
+ font-size: 1.5rem;
71
+ color: #424242;
72
+ margin-bottom: 1rem;
73
+ }
74
+ .info-text {
75
+ background-color: #E3F2FD;
76
+ padding: 1rem;
77
+ border-radius: 0.5rem;
78
+ margin-bottom: 1rem;
79
+ }
80
+ .stButton>button {
81
+ width: 100%;
82
+ }
83
+ </style>
84
+ """, unsafe_allow_html=True)
85
+
86
+ # Initialize Streamlit session state for persisting data between tab navigation
87
+ if 'transactions_data' not in st.session_state:
88
+ st.session_state.transactions_data = pd.DataFrame()
89
+
90
+ if 'patterns_data' not in st.session_state:
91
+ st.session_state.patterns_data = None
92
+
93
+ if 'price_impact_data' not in st.session_state:
94
+ st.session_state.price_impact_data = None
95
+
96
+ # Performance metrics tracking
97
+ if 'performance_metrics' not in st.session_state:
98
+ st.session_state.performance_metrics = {
99
+ 'api_calls': 0,
100
+ 'data_processing_time': 0,
101
+ 'visualization_time': 0,
102
+ 'last_refresh': None
103
+ }
104
+
105
+ # Function to track performance
106
+ def track_timing(category: str):
107
+ def timing_decorator(func):
108
+ def wrapper(*args, **kwargs):
109
+ start_time = time.time()
110
+ result = func(*args, **kwargs)
111
+ elapsed = time.time() - start_time
112
+
113
+ if category in st.session_state.performance_metrics:
114
+ st.session_state.performance_metrics[category] += elapsed
115
+ else:
116
+ st.session_state.performance_metrics[category] = elapsed
117
+
118
+ return result
119
+ return wrapper
120
+ return timing_decorator
121
+
122
+ if 'alerts_data' not in st.session_state:
123
+ st.session_state.alerts_data = None
124
+
125
+ # Initialize API clients
126
+ arbiscan_client = ArbiscanClient(os.getenv("ARBISCAN_API_KEY"))
127
+ # Set debug mode to False to reduce log output
128
+ arbiscan_client.verbose_debug = False
129
+ gemini_client = GeminiClient(os.getenv("GEMINI_API_KEY"))
130
+
131
+ # Initialize data processor and visualizer
132
+ data_processor = DataProcessor()
133
+ visualizer = Visualizer()
134
+
135
+ # Apply performance tracking to key instance methods after initialization
136
+ original_fetch_whale = arbiscan_client.fetch_whale_transactions
137
+ arbiscan_client.fetch_whale_transactions = track_timing('api_calls')(original_fetch_whale)
138
+
139
+ original_identify_patterns = data_processor.identify_patterns
140
+ data_processor.identify_patterns = track_timing('data_processing_time')(original_identify_patterns)
141
+
142
+ original_analyze_price_impact = data_processor.analyze_price_impact
143
+ data_processor.analyze_price_impact = track_timing('data_processing_time')(original_analyze_price_impact)
144
+ detection = ManipulationDetector()
145
+
146
+ # Initialize crew system (for AI-assisted analysis)
147
+ try:
148
+ from modules.crew_system import WhaleAnalysisCrewSystem
149
+ crew_system = WhaleAnalysisCrewSystem(arbiscan_client, gemini_client, data_processor)
150
+ CREW_ENABLED = True
151
+ logging.info("CrewAI system loaded successfully")
152
+ except Exception as e:
153
+ CREW_ENABLED = False
154
+ logging.error(f"Failed to load CrewAI system: {str(e)}")
155
+ st.sidebar.error("CrewAI features are disabled due to an error.")
156
+
157
+ # Sidebar for inputs
158
+ st.sidebar.header("Configuration")
159
+
160
+ # Wallet tracking section
161
+ st.sidebar.subheader("Track Wallets")
162
+ wallet_addresses = st.sidebar.text_area(
163
+ "Enter wallet addresses (one per line)",
164
+ placeholder="0x1234abcd...\n0xabcd1234..."
165
+ )
166
+
167
+ threshold_type = st.sidebar.radio(
168
+ "Threshold Type",
169
+ ["Token Amount", "USD Value"]
170
+ )
171
+
172
+ if threshold_type == "Token Amount":
173
+ threshold_value = st.sidebar.number_input("Minimum Token Amount", min_value=0.0, value=1000.0)
174
+ token_symbol = st.sidebar.text_input("Token Symbol", placeholder="ETH")
175
+ else:
176
+ threshold_value = st.sidebar.number_input("Minimum USD Value", min_value=0.0, value=100000.0)
+ token_symbol = None  # define in both branches so later references (e.g. the token_symbol argument below) never raise NameError
177
+
178
+ # Time period selection
179
+ st.sidebar.subheader("Time Period")
180
+ time_period = st.sidebar.selectbox(
181
+ "Select Time Period",
182
+ ["Last 24 hours", "Last 7 days", "Last 30 days", "Custom"]
183
+ )
184
+
185
+ if time_period == "Custom":
186
+ start_date = st.sidebar.date_input("Start Date", datetime.now() - timedelta(days=7))
187
+ end_date = st.sidebar.date_input("End Date", datetime.now())
188
+ else:
189
+ # Calculate dates based on selection
190
+ end_date = datetime.now()
191
+ if time_period == "Last 24 hours":
192
+ start_date = end_date - timedelta(days=1)
193
+ elif time_period == "Last 7 days":
194
+ start_date = end_date - timedelta(days=7)
195
+ else: # Last 30 days
196
+ start_date = end_date - timedelta(days=30)
197
+
198
+ # Manipulation detection settings
199
+ st.sidebar.subheader("Manipulation Detection")
200
+ enable_manipulation_detection = st.sidebar.toggle("Enable Manipulation Detection", value=True)
201
+ if enable_manipulation_detection:
202
+ sensitivity = st.sidebar.select_slider(
203
+ "Detection Sensitivity",
204
+ options=["Low", "Medium", "High"],
205
+ value="Medium"
206
+ )
207
+
208
+ # Price impact analysis settings
209
+ st.sidebar.subheader("Price Impact Analysis")
210
+ enable_price_impact = st.sidebar.toggle("Enable Price Impact Analysis", value=True)
211
+ if enable_price_impact:
212
+ lookback_minutes = st.sidebar.slider("Lookback (minutes)", 1, 60, 5)
213
+ lookahead_minutes = st.sidebar.slider("Lookahead (minutes)", 1, 60, 5)
214
+
215
+ # Action buttons
216
+ track_button = st.sidebar.button("Track Transactions", type="primary")
217
+ pattern_button = st.sidebar.button("Analyze Patterns")
218
+ if enable_manipulation_detection:
219
+ detect_button = st.sidebar.button("Detect Manipulation")
220
+
221
+ # Main content area
222
+ tab1, tab2, tab3, tab4, tab5 = st.tabs([
223
+ "Transactions", "Patterns", "Price Impact", "Alerts", "Reports"
224
+ ])
225
+
226
+ with tab1:
227
+ st.header("Whale Transactions")
228
+ if track_button and wallet_addresses:
229
+ with st.spinner("Fetching whale transactions..."):
230
+ # Function to track whale transactions
231
+ def track_whale_transactions(wallets, start_date, end_date, threshold_value, threshold_type, token_symbol=None):
232
+ # Direct API call to Arbiscan (CrewAI is not used for this fetch)
233
+ try:
234
+ min_token_amount = None
235
+ min_usd_value = None
236
+ if threshold_type == "Token Amount":
237
+ min_token_amount = threshold_value
238
+ else:
239
+ min_usd_value = threshold_value
240
+
241
+ # Add pagination control to prevent infinite API requests
242
+ max_pages = 5 # Limit the number of pages to prevent excessive API calls
243
+ transactions = arbiscan_client.fetch_whale_transactions(
244
+ addresses=wallets,
245
+ min_token_amount=min_token_amount,
246
+ max_pages=max_pages,
247
+ min_usd_value=min_usd_value
248
+ )
249
+
250
+ if transactions.empty:
251
+ st.warning("No transactions found for the specified addresses")
252
+
253
+ return transactions
254
+ except Exception as e:
255
+ st.error(f"Error fetching transactions: {str(e)}")
256
+ return pd.DataFrame()
257
+
258
+ wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
259
+
260
+ # Use cached data or fetch new if not available
261
+ if st.session_state.transactions_data is None or track_button:
262
+ with st.spinner("Fetching transactions..."):
263
+ transactions = track_whale_transactions(
264
+ wallets=wallet_list,
265
+ start_date=start_date,
266
+ end_date=end_date,
267
+ threshold_value=threshold_value,
268
+ threshold_type=threshold_type,
269
+ token_symbol=token_symbol
270
+ )
271
+ # Store in session state
272
+ st.session_state.transactions_data = transactions
273
+ else:
274
+ transactions = st.session_state.transactions_data
275
+
276
+ if not transactions.empty:
277
+ st.success(f"Found {len(transactions)} transactions matching your criteria")
278
+
279
+ # Display transactions
280
+ if len(transactions) > 0:
281
+ st.dataframe(transactions, use_container_width=True)
282
+
283
+ # Add download button
284
+ csv = transactions.to_csv(index=False).encode('utf-8')
285
+ st.download_button(
286
+ "Download Transactions CSV",
287
+ csv,
288
+ "whale_transactions.csv",
289
+ "text/csv",
290
+ key='download-csv'
291
+ )
292
+
293
+ # Volume by day chart
294
+ st.subheader("Transaction Volume by Day")
295
+ try:
296
+ st.plotly_chart(visualizer.plot_volume_by_day(transactions), use_container_width=True)
297
+ except Exception as e:
298
+ st.error(f"Error generating volume chart: {str(e)}")
299
+
300
+ # Transaction flow visualization
301
+ st.subheader("Transaction Flow")
302
+ try:
303
+ flow_chart = visualizer.plot_transaction_flow(transactions)
304
+ st.plotly_chart(flow_chart, use_container_width=True)
305
+ except Exception as e:
306
+ st.error(f"Error generating flow chart: {str(e)}")
307
+ else:
308
+ st.warning("No transactions found matching your criteria. Try adjusting the parameters.")
309
+ else:
310
+ st.info("Enter wallet addresses and click 'Track Transactions' to view whale activity")
311
+
312
+ with tab2:
313
+ st.header("Trading Patterns")
314
+ if track_button and wallet_addresses:
315
+ with st.spinner("Analyzing trading patterns..."):
316
+ # Function to analyze trading patterns
317
+ def analyze_trading_patterns(wallets, start_date, end_date):
318
+ # Direct analysis
319
+ try:
320
+ transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
321
+ if transactions_df.empty:
322
+ st.warning("No transactions found for the specified addresses")
323
+ return []
324
+
325
+ return data_processor.identify_patterns(transactions_df)
326
+ except Exception as e:
327
+ st.error(f"Error analyzing trading patterns: {str(e)}")
328
+ return []
329
+
330
+ wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
331
+
332
+ # Use cached data or fetch new if not available
333
+ if st.session_state.patterns_data is None or track_button:
334
+ with st.spinner("Analyzing trading patterns..."):
335
+ patterns = analyze_trading_patterns(
336
+ wallets=wallet_list,
337
+ start_date=start_date,
338
+ end_date=end_date
339
+ )
340
+ # Store in session state
341
+ st.session_state.patterns_data = patterns
342
+ else:
343
+ patterns = st.session_state.patterns_data
344
+
345
+ if patterns:
346
+ for i, pattern in enumerate(patterns):
347
+ pattern_card = st.container()
348
+ with pattern_card:
349
+ # Pattern header with name and risk profile
350
+ header_cols = st.columns([3, 1])
351
+ with header_cols[0]:
352
+ st.subheader(f"Pattern {i+1}: {pattern['name']}")
353
+ with header_cols[1]:
354
+ risk_color = "green"
355
+ if pattern.get('risk_profile') == "Medium":
356
+ risk_color = "orange"
357
+ elif pattern.get('risk_profile') in ["High", "Very High"]:
358
+ risk_color = "red"
359
+ st.markdown(f"<h5 style='color:{risk_color};'>Risk: {pattern.get('risk_profile', 'Unknown')}</h5>", unsafe_allow_html=True)
360
+
361
+ # Pattern description and details
362
+ st.markdown(f"**Description:** {pattern['description']}")
363
+
364
+ # Additional strategy information
365
+ if 'strategy' in pattern:
366
+ st.markdown(f"**Strategy:** {pattern['strategy']}")
367
+
368
+ # Time insight
369
+ if 'time_insight' in pattern:
370
+ st.info(pattern['time_insight'])
371
+
372
+ # Metrics
373
+ metric_cols = st.columns(3)
374
+ with metric_cols[0]:
375
+ st.markdown(f"**Occurrences:** {pattern['occurrence_count']} instances")
376
+ with metric_cols[1]:
377
+ st.markdown(f"**Confidence:** {pattern.get('confidence', 0):.2f}")
378
+ with metric_cols[2]:
379
+ st.markdown(f"**Volume:** {pattern.get('volume_metric', 'N/A')}")
380
+
381
+ # Display main chart first
382
+ if 'charts' in pattern and 'main' in pattern['charts']:
383
+ st.plotly_chart(pattern['charts']['main'], use_container_width=True)
384
+ elif 'chart_data' in pattern and pattern['chart_data'] is not None: # Fallback for old format
385
+ st.plotly_chart(pattern['chart_data'], use_container_width=True)
386
+
387
+ # Create two columns for additional charts
388
+ if 'charts' in pattern and len(pattern['charts']) > 1:
389
+ charts_col1, charts_col2 = st.columns(2)
390
+
391
+ # Hourly distribution chart
392
+ if 'hourly_distribution' in pattern['charts']:
393
+ with charts_col1:
394
+ st.plotly_chart(pattern['charts']['hourly_distribution'], use_container_width=True)
395
+
396
+ # Value distribution chart
397
+ if 'value_distribution' in pattern['charts']:
398
+ with charts_col2:
399
+ st.plotly_chart(pattern['charts']['value_distribution'], use_container_width=True)
400
+
401
+ # Advanced metrics in expander
402
+ if 'metrics' in pattern and pattern['metrics']:
403
+ with st.expander("Detailed Metrics"):
404
+ metrics_table = []
405
+ for k, v in pattern['metrics'].items():
406
+ if v is not None:
407
+ if isinstance(v, float):
408
+ metrics_table.append([k.replace('_', ' ').title(), f"{v:.4f}"])
409
+ else:
410
+ metrics_table.append([k.replace('_', ' ').title(), v])
411
+
412
+ if metrics_table:
413
+ st.table(pd.DataFrame(metrics_table, columns=["Metric", "Value"]))
414
+
415
+ # Display example transactions
416
+ if 'examples' in pattern and not pattern['examples'].empty:
417
+ with st.expander("Example Transactions"):
418
+ # Format the dataframe for better display
419
+ display_df = pattern['examples'].copy()
420
+ # Convert timestamp to readable format if needed
421
+ if 'timeStamp' in display_df.columns and not pd.api.types.is_datetime64_any_dtype(display_df['timeStamp']):
422
+ display_df['timeStamp'] = pd.to_datetime(display_df['timeStamp'], unit='s')
423
+
424
+ st.dataframe(display_df, use_container_width=True)
425
+
426
+ st.markdown("---")
427
+ else:
428
+ st.info("No significant trading patterns detected. Try expanding the date range or adding more addresses.")
429
+ else:
430
+ st.info("Track transactions to analyze trading patterns")
431
+
432
+ with tab3:
433
+ st.header("Price Impact Analysis")
434
+ if enable_price_impact and track_button and wallet_addresses:
435
+ with st.spinner("Analyzing price impact..."):
436
+ # Function to analyze price impact
437
+ def analyze_price_impact(wallets, start_date, end_date, lookback_minutes, lookahead_minutes):
438
+ # Direct analysis
439
+ transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
440
+ # Get token from first transaction
441
+ if not transactions_df.empty:
442
+ token_symbol = transactions_df.iloc[0].get('tokenSymbol', 'ETH')
443
+ # For each transaction, get price impact
444
+ price_impacts = {}
445
+ progress_bar = st.progress(0)
446
+ for idx, row in transactions_df.iterrows():
447
+ progress = int((idx + 1) / len(transactions_df) * 100)
448
+ progress_bar.progress(progress, text=f"Analyzing transaction {idx+1} of {len(transactions_df)}")
449
+ if 'timeStamp' in row:
450
+ try:
451
+ tx_time = datetime.fromtimestamp(int(row['timeStamp']))
452
+ impact_data = gemini_client.get_price_impact(
453
+ symbol=f"{token_symbol}USD",
454
+ transaction_time=tx_time,
455
+ lookback_minutes=lookback_minutes,
456
+ lookahead_minutes=lookahead_minutes
457
+ )
458
+ price_impacts[row['hash']] = impact_data
459
+ except Exception as e:
460
+ st.warning(f"Could not get price data for transaction: {str(e)}")
461
+
462
+ progress_bar.empty()
463
+ if price_impacts:
464
+ return data_processor.analyze_price_impact(transactions_df, price_impacts)
465
+
466
+ # Create an empty chart for the default case
467
+ empty_fig = go.Figure()
468
+ empty_fig.update_layout(
469
+ title="No Price Impact Data Available",
470
+ xaxis_title="Time",
471
+ yaxis_title="Price Impact (%)",
472
+ height=400,
473
+ template="plotly_white"
474
+ )
475
+ empty_fig.add_annotation(
476
+ text="No transactions found with price impact data",
477
+ showarrow=False,
478
+ font=dict(size=14)
479
+ )
480
+
481
+ return {
482
+ "avg_impact_pct": 0,
483
+ "max_impact_pct": 0,
484
+ "min_impact_pct": 0,
485
+ "significant_moves_count": 0,
486
+ "total_transactions": 0,
487
+ "transactions_with_impact": pd.DataFrame(),
488
+ "charts": {
489
+ "main_chart": empty_fig,
490
+ "impact_distribution": empty_fig,
491
+ "cumulative_impact": empty_fig,
492
+ "hourly_impact": empty_fig
493
+ },
494
+ "insights": [],
495
+ "impact_summary": "No price impact data available"
496
+ }
497
+
498
+ wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
499
+
500
+ # Use cached data or fetch new if not available
501
+ if st.session_state.price_impact_data is None or track_button:
502
+ with st.spinner("Analyzing price impact..."):
503
+ impact_analysis = analyze_price_impact(
504
+ wallets=wallet_list,
505
+ start_date=start_date,
506
+ end_date=end_date,
507
+ lookback_minutes=lookback_minutes,
508
+ lookahead_minutes=lookahead_minutes
509
+ )
510
+ # Store in session state
511
+ st.session_state.price_impact_data = impact_analysis
512
+ else:
513
+ impact_analysis = st.session_state.price_impact_data
514
+
515
+ if impact_analysis:
516
+ # Display impact summary
517
+ if 'impact_summary' in impact_analysis:
518
+ st.info(impact_analysis['impact_summary'])
519
+
520
+ # Summary metrics in two rows
521
+ metrics_row1 = st.columns(4)
522
+ with metrics_row1[0]:
523
+ st.metric("Avg. Price Impact (%)", f"{impact_analysis.get('avg_impact_pct', 0):.2f}%")
524
+ with metrics_row1[1]:
525
+ st.metric("Max Impact (%)", f"{impact_analysis.get('max_impact_pct', 0):.2f}%")
526
+ with metrics_row1[2]:
527
+ st.metric("Min Impact (%)", f"{impact_analysis.get('min_impact_pct', 0):.2f}%")
528
+ with metrics_row1[3]:
529
+ st.metric("Std Dev (%)", f"{impact_analysis.get('std_impact_pct', 0):.2f}%")
530
+
531
+ metrics_row2 = st.columns(4)
532
+ with metrics_row2[0]:
533
+ st.metric("Significant Moves", impact_analysis.get('significant_moves_count', 0))
534
+ with metrics_row2[1]:
535
+ st.metric("High Impact Moves", impact_analysis.get('high_impact_moves_count', 0))
536
+ with metrics_row2[2]:
537
+ st.metric("Positive/Negative", f"{impact_analysis.get('positive_impacts_count', 0)}/{impact_analysis.get('negative_impacts_count', 0)}")
538
+ with metrics_row2[3]:
539
+ st.metric("Total Transactions", impact_analysis.get('total_transactions', 0))
540
+
541
+ # Display insights if available
542
+ if 'insights' in impact_analysis and impact_analysis['insights']:
543
+ st.subheader("Key Insights")
544
+ for insight in impact_analysis['insights']:
545
+ st.markdown(f"**{insight['title']}**: {insight['description']}")
546
+
547
+ # Display the main chart
548
+ if 'charts' in impact_analysis and 'main_chart' in impact_analysis['charts']:
549
+ st.subheader("Price Impact Over Time")
550
+ st.plotly_chart(impact_analysis['charts']['main_chart'], use_container_width=True)
551
+
552
+ # Create two columns for secondary charts
553
+ col1, col2 = st.columns(2)
554
+
555
+ # Distribution chart
556
+ if 'charts' in impact_analysis and 'impact_distribution' in impact_analysis['charts']:
557
+ with col1:
558
+ st.plotly_chart(impact_analysis['charts']['impact_distribution'], use_container_width=True)
559
+
560
+ # Cumulative impact chart
561
+ if 'charts' in impact_analysis and 'cumulative_impact' in impact_analysis['charts']:
562
+ with col2:
563
+ st.plotly_chart(impact_analysis['charts']['cumulative_impact'], use_container_width=True)
564
+
565
+ # Hourly impact chart
566
+ if 'charts' in impact_analysis and 'hourly_impact' in impact_analysis['charts']:
567
+ st.plotly_chart(impact_analysis['charts']['hourly_impact'], use_container_width=True)
568
+
569
+ # Detailed transactions with impact
570
+ if not impact_analysis['transactions_with_impact'].empty:
571
+ st.subheader("Transactions with Price Impact")
572
+ # Convert numeric columns to have 2 decimal places for better display
573
+ display_df = impact_analysis['transactions_with_impact'].copy()
574
+ for col in ['impact_pct', 'pre_price', 'post_price', 'cumulative_impact']:
575
+ if col in display_df.columns:
576
+ display_df[col] = display_df[col].apply(lambda x: f"{float(x):.2f}%" if pd.notnull(x) else "N/A")
577
+
578
+ st.dataframe(display_df, use_container_width=True)
579
+ else:
580
+ st.info("No transaction-specific price impact data available")
581
+ else:
582
+ st.info("No price impact data available for the given parameters")
583
+ else:
584
+ st.info("Enable Price Impact Analysis and track transactions to see price effects")
585
+
586
+ with tab4:
587
+ st.header("Manipulation Alerts")
588
+ if enable_manipulation_detection and detect_button and wallet_addresses:
589
+ with st.spinner("Detecting potential manipulation..."):
590
+ wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
591
+
592
+ # Function to detect manipulation
593
+ def detect_manipulation(wallets, start_date, end_date, sensitivity):
594
+ try:
595
+ transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
596
+ if transactions_df.empty:
597
+ st.warning("No transactions found for the specified addresses")
598
+ return []
599
+
600
+ pump_dump = detection.detect_pump_and_dump(transactions_df, sensitivity)
601
+ wash_trades = detection.detect_wash_trading(transactions_df, wallets, sensitivity)
602
+ return pump_dump + wash_trades
603
+ except Exception as e:
604
+ st.error(f"Error detecting manipulation: {str(e)}")
605
+ return []
606
+
607
+ alerts = detect_manipulation(
608
+ wallets=wallet_list,
609
+ start_date=start_date,
610
+ end_date=end_date,
611
+ sensitivity=sensitivity
612
+ )
613
+
614
+ if alerts:
615
+ for i, alert in enumerate(alerts):
616
+ alert_color = "red" if alert['risk_level'] == "High" else "orange" if alert['risk_level'] == "Medium" else "blue"
617
+
618
+ with st.expander(f" {alert['type']} - Risk: {alert['risk_level']}", expanded=i==0):
619
+ st.markdown(f"<h4 style='color:{alert_color}'>{alert['title']}</h4>", unsafe_allow_html=True)
620
+ st.write(f"**Description:** {alert['description']}")
621
+ st.write(f"**Detection Time:** {alert['detection_time']}")
622
+ st.write(f"**Involved Addresses:** {', '.join(alert['addresses'])}")
623
+
624
+ # Display evidence
625
+ if 'evidence' in alert and alert['evidence'] is not None and not (isinstance(alert['evidence'], pd.DataFrame) and alert['evidence'].empty):
626
+ st.subheader("Evidence")
627
+ try:
628
+ evidence_df = alert['evidence']
629
+ if isinstance(evidence_df, str):
630
+ # Try to convert from JSON string if needed
631
+ evidence_df = pd.read_json(evidence_df)
632
+ st.dataframe(evidence_df, use_container_width=True)
633
+ except Exception as e:
634
+ st.error(f"Error displaying evidence: {str(e)}")
635
+
636
+ # Display chart if available
637
+ if 'chart' in alert and alert['chart'] is not None:
638
+ try:
639
+ st.plotly_chart(alert['chart'], use_container_width=True)
640
+ except Exception as e:
641
+ st.error(f"Error displaying chart: {str(e)}")
642
+ else:
643
+ st.success("No manipulation tactics detected for the given parameters")
644
+ else:
645
+ st.info("Enable Manipulation Detection and click 'Detect Manipulation' to scan for suspicious activity")
646
+
647
+ with tab5:
648
+ st.header("Reports & Visualizations")
649
+
650
+ # Report type selection
651
+ report_type = st.selectbox(
652
+ "Select Report Type",
653
+ ["Transaction Summary", "Pattern Analysis", "Price Impact", "Manipulation Detection", "Complete Analysis"]
654
+ )
655
+
656
+ # Export format
657
+ export_format = st.radio(
658
+ "Export Format",
659
+ ["CSV", "PDF", "PNG"],
660
+ horizontal=True
661
+ )
662
+
663
+ # Generate report button
664
+ if st.button("Generate Report"):
665
+ if wallet_addresses:
666
+ with st.spinner("Generating report..."):
667
+ wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
668
+
669
+ if CREW_ENABLED and crew_system is not None:
670
+ try:
671
+ with st.spinner("Generating AI analysis report..."):
672
+ # Check if crew_system has llm attribute defined
673
+ if not hasattr(crew_system, 'llm') or crew_system.llm is None:
674
+ raise ValueError("LLM not initialized in crew system")
675
+
676
+ report = crew_system.generate_market_manipulation_report(wallet_addresses=wallet_list)
677
+ st.markdown(f"## AI Analysis Report")
678
+ st.markdown(report['content'])
679
+
680
+ if 'charts' in report and report['charts']:
681
+ for i, chart in enumerate(report['charts']):
682
+ st.plotly_chart(chart, use_container_width=True)
683
+ except Exception as e:
684
+ st.error(f"CrewAI report generation failed: {str(e)}")
685
+ st.warning("Using direct analysis instead")
686
+
687
+ # Fallback to direct analysis
688
+ with st.spinner("Generating basic analysis..."):
689
+ insights = detection.generate_manipulation_insights(transactions=st.session_state.transactions_data)
690
+ st.markdown(f"## Potential Manipulation Insights")
691
+
692
+ for insight in insights:
693
+ st.markdown(f"**{insight['title']}**\n{insight['description']}")
694
+ else:
695
+ st.error("Failed to generate report: CrewAI is not enabled")
696
+ else:
697
+ st.error("Please enter wallet addresses to generate a report")
698
+
699
+ # Footer with instructions
700
+ st.markdown("---")
701
+ with st.expander("How to Use"):
702
+ st.markdown("""
703
+ ### Typical Workflow
704
+
705
+ 1. **Input wallet addresses** in the sidebar - these are the whale wallets you want to track
706
+ 2. **Set the minimum threshold** for transaction size (token amount or USD value)
707
+ 3. **Select time period** for analysis
708
+ 4. **Click 'Track Transactions'** to see large transfers for these wallets
709
+ 5. **Enable additional analysis** like pattern recognition or manipulation detection
710
+ 6. **Export reports** for further analysis or record-keeping
711
+
712
+ ### API Keys
713
+
714
+ This app requires two API keys to function properly:
715
+ - **ARBISCAN_API_KEY** - For accessing Arbitrum blockchain data
716
+ - **GEMINI_API_KEY** - For real-time token price data
717
+
718
+ These should be stored in a `.env` file in the project root.
719
+ """)
modules/__init__.py ADDED
@@ -0,0 +1 @@
1
+
modules/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (157 Bytes).
 
modules/__pycache__/api_client.cpython-312.pyc ADDED
Binary file (30.1 kB).
 
modules/__pycache__/crew_system.cpython-312.pyc ADDED
Binary file (36.2 kB).
 
modules/__pycache__/crew_tools.cpython-312.pyc ADDED
Binary file (18.3 kB).
 
modules/__pycache__/data_processor.cpython-312.pyc ADDED
Binary file (44.1 kB).
 
modules/__pycache__/detection.cpython-312.pyc ADDED
Binary file (22.3 kB).
 
modules/__pycache__/visualizer.cpython-312.pyc ADDED
Binary file (23.2 kB).
 
modules/api_client.py ADDED
@@ -0,0 +1,768 @@
1
+ import requests
2
+ import json
3
+ import time
4
+ import logging
5
+ from datetime import datetime
6
+ import pandas as pd
7
+ from typing import Dict, List, Optional, Union, Any
8
+
9
+ class ArbiscanClient:
10
+ """
11
+ Client to interact with the Arbiscan API for fetching on-chain data from Arbitrum
12
+ """
13
+
14
+ def __init__(self, api_key: str):
15
+ self.api_key = api_key
16
+ self.base_url = "https://api.arbiscan.io/api"
17
+ self.rate_limit_delay = 0.2 # Delay between API calls to avoid rate limiting (200ms)
18
+
19
+ # Add caching to improve performance
20
+ self._transaction_cache = {}
21
+ self._last_api_call_time = 0
22
+
23
+ # Configure debug logging - set to True for verbose output, False for minimal output
24
+ self.verbose_debug = False
25
+
26
+ def _make_request(self, params: Dict[str, str]) -> Dict[str, Any]:
27
+ """
28
+ Make a request to the Arbiscan API with rate limiting
29
+ """
30
+ params["apikey"] = self.api_key
31
+
32
+ # Implement rate limiting
33
+ current_time = time.time()
34
+ time_since_last_call = current_time - self._last_api_call_time
35
+ if time_since_last_call < self.rate_limit_delay:
36
+ time.sleep(self.rate_limit_delay - time_since_last_call)
37
+ self._last_api_call_time = time.time()
38
+
39
+ try:
40
+ # Log the request details but only in verbose mode
41
+ if self.verbose_debug:
42
+ debug_params = params.copy()
43
+ debug_params.pop("apikey", None)
44
+ logging.debug(f"API Request: {self.base_url}")
45
+ logging.debug(f"Params: {json.dumps(debug_params, indent=2)}")
46
+
47
+ response = requests.get(self.base_url, params=params)
48
+
49
+ # Print response status and URL only in verbose mode
50
+ if self.verbose_debug:
51
+ logging.debug(f"Response Status: {response.status_code}")
52
+ logging.debug(f"Full URL: {response.url.replace(self.api_key, 'API_KEY_REDACTED')}")
53
+
54
+ response.raise_for_status()
55
+
56
+ # Parse the JSON response
57
+ json_data = response.json()
58
+
59
+ # Log the response structure but only in verbose mode
60
+ if self.verbose_debug:
61
+ result_preview = str(json_data.get('result', ''))[:100] + '...' if len(str(json_data.get('result', ''))) > 100 else str(json_data.get('result', ''))
62
+ logging.debug(f"Response Status: {json_data.get('status')}")
63
+ logging.debug(f"Response Message: {json_data.get('message', 'No message')}")
64
+ logging.debug(f"Result Preview: {result_preview}")
65
+
66
+ # Check for API-level errors in the response
67
+ status = json_data.get('status')
68
+ message = json_data.get('message', 'No message')
69
+ if status == '0' and message != 'No transactions found':
70
+ logging.warning(f"API Error: {message}")
71
+
72
+ return json_data
73
+
74
+ except requests.exceptions.HTTPError as e:
75
+ logging.error(f"HTTP Error in API Request: {e.response.status_code}")
76
+ raise
77
+
78
+ except requests.exceptions.ConnectionError as e:
79
+ logging.error(f"Connection Error in API Request: {str(e)}")
80
+ raise
81
+
82
+ except requests.exceptions.Timeout as e:
83
+ logging.error(f"Timeout in API Request: {str(e)}")
84
+ raise
85
+
86
+ except requests.exceptions.RequestException as e:
87
+ logging.error(f"API Request failed: {str(e)}")
88
+ print(f"ERROR - URL: {self.base_url}")
89
+ print(f"ERROR - Method: {params.get('module')}/{params.get('action')}")
90
+ return {"status": "0", "message": f"Error: {str(e)}", "result": []}
91
+
92
+ def get_eth_balance(self, address: str) -> float:
93
+ """
94
+ Get the ETH balance of an address
95
+
96
+ Args:
97
+ address: Wallet address
98
+
99
+ Returns:
100
+ ETH balance as a float
101
+ """
102
+ params = {
103
+ "module": "account",
104
+ "action": "balance",
105
+ "address": address,
106
+ "tag": "latest"
107
+ }
108
+
109
+ result = self._make_request(params)
110
+
111
+ if result.get("status") == "1":
112
+ # Convert wei to ETH
113
+ wei_balance = int(result.get("result", "0"))
114
+ eth_balance = wei_balance / 10**18
115
+ return eth_balance
116
+ else:
117
+ return 0.0
118
+
119
+ def get_token_balance(self, address: str, token_address: str) -> float:
120
+ """
121
+ Get the token balance of an address for a specific token
122
+
123
+ Args:
124
+ address: Wallet address
125
+ token_address: Token contract address
126
+
127
+ Returns:
128
+ Token balance as a float
129
+ """
130
+ params = {
131
+ "module": "account",
132
+ "action": "tokenbalance",
133
+ "address": address,
134
+ "contractaddress": token_address,
135
+ "tag": "latest"
136
+ }
137
+
138
+ result = self._make_request(params)
139
+
140
+ if result.get("status") == "1":
141
+ # Get token decimals and convert to proper amount
142
+ decimals = self.get_token_decimals(token_address)
143
+ raw_balance = int(result.get("result", "0"))
144
+ token_balance = raw_balance / 10**decimals
145
+ return token_balance
146
+ else:
147
+ return 0.0
148
+
149
+ def get_token_decimals(self, token_address: str) -> int:
150
+ """
151
+ Get the number of decimals for a token
152
+
153
+ Args:
154
+ token_address: Token contract address
155
+
156
+ Returns:
157
+ Number of decimals (default: 18)
158
+ """
159
+ params = {
160
+ "module": "token",
161
+ "action": "getToken",
162
+ "contractaddress": token_address
163
+ }
164
+
165
+ result = self._make_request(params)
166
+
167
+ if result.get("status") == "1":
168
+ token_info = result.get("result", {})
169
+ return int(token_info.get("divisor", "18"))
170
+ else:
171
+ # Default to 18 decimals (most ERC-20 tokens)
172
+ return 18
173
+
174
+ def get_token_transfers(self,
175
+ address: str,
176
+ contract_address: Optional[str] = None,
177
+ start_block: int = 0,
178
+ end_block: int = 99999999,
179
+ page: int = 1,
180
+ offset: int = 100,
181
+ sort: str = "desc") -> List[Dict[str, Any]]:
182
+ """
183
+ Get token transfers for an address
184
+
185
+ Args:
186
+ address: Wallet address
187
+ contract_address: Optional token contract address to filter by
188
+ start_block: Starting block number
189
+ end_block: Ending block number
190
+ page: Page number
191
+ offset: Number of results per page
192
+ sort: Sort order ("asc" or "desc")
193
+
194
+ Returns:
195
+ List of token transfers
196
+ """
197
+ params = {
198
+ "module": "account",
199
+ "action": "tokentx",
200
+ "address": address,
201
+ "startblock": str(start_block),
202
+ "endblock": str(end_block),
203
+ "page": str(page),
204
+ "offset": str(offset),
205
+ "sort": sort
206
+ }
207
+
208
+ # Add contract address if specified
209
+ if contract_address:
210
+ params["contractaddress"] = contract_address
211
+
212
+ result = self._make_request(params)
213
+
214
+ if result.get("status") == "1":
215
+ return result.get("result", [])
216
+ else:
217
+ message = result.get("message", "Unknown error")
218
+ if "No transactions found" in message:
219
+ return []
220
+ else:
221
+ logging.warning(f"Error fetching token transfers: {message}")
222
+ return []
223
+
224
+ def fetch_all_token_transfers(self,
225
+ address: str,
226
+ contract_address: Optional[str] = None,
227
+ start_block: int = 0,
228
+ end_block: int = 99999999,
229
+ max_pages: int = 10) -> List[Dict[str, Any]]:
230
+ """
231
+ Fetch all token transfers for an address, paginating through results
232
+
233
+ Args:
234
+ address: Wallet address
235
+ contract_address: Optional token contract address to filter by
236
+ start_block: Starting block number
237
+ end_block: Ending block number
238
+ max_pages: Maximum number of pages to fetch
239
+
240
+ Returns:
241
+ List of all token transfers
242
+ """
243
+ all_transfers = []
244
+ offset = 100 # Results per page (API limit)
245
+
246
+ for page in range(1, max_pages + 1):
247
+ try:
248
+ transfers = self.get_token_transfers(
249
+ address=address,
250
+ contract_address=contract_address,
251
+ start_block=start_block,
252
+ end_block=end_block,
253
+ page=page,
254
+ offset=offset
255
+ )
256
+
257
+ # No more transfers, break the loop
258
+ if not transfers:
259
+ break
260
+
261
+ all_transfers.extend(transfers)
262
+
263
+ # If we got fewer results than the offset, we've reached the end
264
+ if len(transfers) < offset:
265
+ break
266
+
267
+ except Exception as e:
268
+ logging.error(f"Error fetching page {page} of token transfers: {str(e)}")
269
+ break
270
+
271
+ return all_transfers
272
+
273
+ def fetch_whale_transactions(self,
274
+ addresses: List[str],
275
+ token_address: Optional[str] = None,
276
+ min_token_amount: Optional[float] = None,
277
+ min_usd_value: Optional[float] = None,
278
+ start_block: int = 0,
279
+ end_block: int = 99999999,
280
+ max_pages: int = 10) -> pd.DataFrame:
281
+ """
282
+ Fetch whale transactions for a list of addresses
283
+
284
+ Args:
285
+ addresses: List of wallet addresses
286
+ token_address: Optional token contract address to filter by
287
+ min_token_amount: Minimum token amount to be considered a whale transaction
288
+ min_usd_value: Minimum USD value to be considered a whale transaction
289
+ start_block: Starting block number
290
+ end_block: Ending block number
291
+ max_pages: Maximum number of pages to fetch per address (default: 10)
292
+
293
+ Returns:
294
+ DataFrame of whale transactions
295
+ """
296
+ try:
297
+ # Create a cache key based on parameters
298
+ cache_key = f"{','.join(addresses)}_{token_address}_{min_token_amount}_{min_usd_value}_{start_block}_{end_block}_{max_pages}"
299
+
300
+ # Check if we have cached results
301
+ if cache_key in self._transaction_cache:
302
+ logging.info(f"Using cached transactions for {len(addresses)} addresses")
303
+ return self._transaction_cache[cache_key]
304
+
305
+ all_transfers = []
306
+
307
+ logging.info(f"Fetching whale transactions for {len(addresses)} addresses")
308
+ logging.info(f"Token address filter: {token_address if token_address else 'None'}")
309
+ logging.info(f"Min token amount: {min_token_amount}")
310
+ logging.info(f"Min USD value: {min_usd_value}")
311
+
312
+ for i, address in enumerate(addresses):
313
+ try:
314
+ logging.info(f"Processing address {i+1}/{len(addresses)}: {address}")
315
+
316
+ # Create address-specific cache key
317
+ addr_cache_key = f"{address}_{token_address}_{start_block}_{end_block}_{max_pages}"
318
+
319
+ # Check if we have cached results for this specific address
320
+ if addr_cache_key in self._transaction_cache:
321
+ transfers = self._transaction_cache[addr_cache_key]
322
+ logging.info(f"Using cached {len(transfers)} transfers for address {address}")
323
+ else:
324
+ transfers = self.fetch_all_token_transfers(
325
+ address=address,
326
+ contract_address=token_address,
327
+ start_block=start_block,
328
+ end_block=end_block,
329
+ max_pages=max_pages
330
+ )
331
+ logging.info(f"Found {len(transfers)} transfers for address {address}")
332
+ # Cache the results for this address
333
+ self._transaction_cache[addr_cache_key] = transfers
334
+
335
+ all_transfers.extend(transfers)
336
+ except Exception as e:
337
+ logging.error(f"Failed to fetch transactions for address {address}: {str(e)}")
338
+ continue
339
+
340
+ logging.info(f"Total transfers found: {len(all_transfers)}")
341
+
342
+ if not all_transfers:
343
+ logging.warning("No whale transactions found for the specified addresses")
344
+ return pd.DataFrame()
345
+
346
+ # Convert to DataFrame
347
+ logging.info("Converting transfers to DataFrame")
348
+ df = pd.DataFrame(all_transfers)
349
+
350
+ # Log the column names
351
+ logging.info(f"DataFrame created with {len(df)} rows and {len(df.columns)} columns")
352
+ logging.info(f"Columns: {', '.join(df.columns[:5])}...")
353
+
354
+ # Apply token amount filter if specified
355
+ if min_token_amount is not None:
356
+ logging.info(f"Applying min token amount filter: {min_token_amount}")
357
+ # Convert to float and then filter
358
+ df['tokenAmount'] = df['value'].astype(float) / (10 ** df['tokenDecimal'].astype(int))
359
+ df = df[df['tokenAmount'] >= min_token_amount]
360
+ logging.info(f"After token amount filtering: {len(df)}/{len(all_transfers)} rows remain")
361
+
362
+ # Apply USD value filter if specified (this would require price data)
363
+ if min_usd_value is not None and 'tokenAmount' in df.columns:
364
+ logging.info(f"USD value filtering is not implemented yet")
365
+ # This would require token price data, which we don't have yet
366
+ # df = df[df['usd_value'] >= min_usd_value]
367
+
368
+ # Convert timestamp to datetime
369
+ if 'timeStamp' in df.columns:
370
+ logging.info("Converting timestamp to datetime")
371
+ try:
372
+ df['timeStamp'] = pd.to_datetime(df['timeStamp'].astype(float), unit='s')
373
+ except Exception as e:
374
+ logging.error(f"Error converting timestamp: {str(e)}")
375
+
376
+ logging.info(f"Final DataFrame has {len(df)} rows")
377
+
378
+ # Cache the final result
379
+ self._transaction_cache[cache_key] = df
380
+
381
+ return df
382
+
383
+ except Exception as e:
384
+ logging.error(f"Error fetching whale transactions: {str(e)}")
385
+ return pd.DataFrame()
386
+
387
+ def get_internal_transactions(self,
388
+ address: str,
389
+ start_block: int = 0,
390
+ end_block: int = 99999999,
391
+ page: int = 1,
392
+ offset: int = 100,
393
+ sort: str = "desc") -> List[Dict[str, Any]]:
394
+ """
395
+ Get internal transactions for an address
396
+
397
+ Args:
398
+ address: Wallet address
399
+ start_block: Starting block number
400
+ end_block: Ending block number
401
+ page: Page number
402
+ offset: Number of results per page
403
+ sort: Sort order ("asc" or "desc")
404
+
405
+ Returns:
406
+ List of internal transactions
407
+ """
408
+ params = {
409
+ "module": "account",
410
+ "action": "txlistinternal",
411
+ "address": address,
412
+ "startblock": str(start_block),
413
+ "endblock": str(end_block),
414
+ "page": str(page),
415
+ "offset": str(offset),
416
+ "sort": sort
417
+ }
418
+
419
+ result = self._make_request(params)
420
+
421
+ if result.get("status") == "1":
422
+ return result.get("result", [])
423
+ else:
424
+ message = result.get("message", "Unknown error")
425
+ if "No transactions found" in message:
426
+ return []
427
+ else:
428
+ logging.warning(f"Error fetching internal transactions: {message}")
429
+ return []
430
+
431
+
432
+ class GeminiClient:
433
+ """
434
+ Client to interact with the Gemini API for fetching token prices
435
+ """
436
+
437
+ def __init__(self, api_key: str):
438
+ self.api_key = api_key
439
+ self.base_url = "https://api.gemini.com/v1"
440
+ # Add caching to avoid repetitive API calls
441
+ self._price_cache = {}
442
+ # Track API errors to avoid flooding logs
443
+ self._error_count = {}
444
+ self._last_api_call = 0 # For rate limiting
445
+
446
+ def get_current_price(self, symbol: str) -> Optional[float]:
447
+ """
448
+ Get the current price of a token
449
+
450
+ Args:
451
+ symbol: Token symbol (e.g., "ETHUSD")
452
+
453
+ Returns:
454
+ Current price as a float or None if not found
455
+ """
456
+ try:
457
+ url = f"{self.base_url}/pubticker/{symbol}"
458
+ response = requests.get(url)
459
+ response.raise_for_status()
460
+ data = response.json()
461
+ return float(data.get("last", 0))
462
+ except requests.exceptions.RequestException as e:
463
+ logging.error(f"Error fetching price from Gemini API: {e}")
464
+ return None
465
+
466
+ def get_historical_prices(self,
467
+ symbol: str,
468
+ start_time: datetime,
469
+ end_time: datetime) -> Optional[pd.DataFrame]:
470
+ """
471
+ Get historical prices for a token within a time range
472
+
473
+ Args:
474
+ symbol: Token symbol (e.g., "ETHUSD")
475
+ start_time: Start datetime
476
+ end_time: End datetime
477
+
478
+ Returns:
479
+ DataFrame of historical prices with timestamps
480
+ """
481
+ # Implement simple rate limiting
482
+ current_time = time.time()
483
+ if current_time - self._last_api_call < 0.05: # 50ms minimum between calls
484
+ time.sleep(0.05)
485
+ self._last_api_call = current_time
486
+
487
+ # Create a cache key based on the parameters
488
+ cache_key = f"{symbol}_{int(start_time.timestamp())}_{int(end_time.timestamp())}"
489
+
490
+ # Check if we already have this data cached
491
+ if cache_key in self._price_cache:
492
+ return self._price_cache[cache_key]
493
+
494
+ try:
495
+ # Convert datetime to milliseconds
496
+ start_ms = int(start_time.timestamp() * 1000)
497
+ end_ms = int(end_time.timestamp() * 1000)
498
+
499
+ url = f"{self.base_url}/trades/{symbol}"
500
+ params = {
501
+ "limit_trades": 500,
502
+ "timestamp": start_ms
503
+ }
504
+
505
+ # Check if we've seen too many errors for this symbol
506
+ error_key = f"error_{symbol}"
507
+ if self._error_count.get(error_key, 0) > 10:
508
+ # If we've already had too many errors for this symbol, don't try again
509
+ return None
510
+
511
+ response = requests.get(url, params=params)
512
+ response.raise_for_status()
513
+ trades = response.json()
514
+
515
+ # Reset error count on success
516
+ self._error_count[error_key] = 0
517
+
518
+ # Filter trades within the time range
519
+ filtered_trades = [
520
+ trade for trade in trades
521
+ if start_ms <= trade.get("timestampms", 0) <= end_ms
522
+ ]
523
+
524
+ if not filtered_trades:
525
+ # Cache negative result to avoid future lookups
526
+ self._price_cache[cache_key] = None
527
+ return None
528
+
529
+ # Convert to DataFrame
530
+ df = pd.DataFrame(filtered_trades)
531
+
532
+ # Convert timestamp to datetime
533
+ df['timestamp'] = pd.to_datetime(df['timestampms'], unit='ms')
534
+
535
+ # Select and rename columns
536
+ result_df = df[['timestamp', 'price', 'amount']].copy()
537
+ result_df.columns = ['Timestamp', 'Price', 'Amount']
538
+
539
+ # Convert price to float
540
+ result_df['Price'] = result_df['Price'].astype(float)
541
+
542
+ # Cache the result
543
+ self._price_cache[cache_key] = result_df
544
+ return result_df
545
+
546
+ except requests.exceptions.HTTPError as e:
547
+ # Handle HTTP errors more efficiently
548
+ self._error_count[error_key] = self._error_count.get(error_key, 0) + 1
549
+
550
+ # Only log the first few occurrences of each error
551
+ if self._error_count[error_key] <= 3:
552
+ logging.warning(f"HTTP error fetching price for {symbol}: {e.response.status_code}")
553
+ return None
554
+
555
+ except Exception as e:
556
+ # For other errors, use a similar approach
557
+ self._error_count[error_key] = self._error_count.get(error_key, 0) + 1
558
+
559
+ if self._error_count[error_key] <= 3:
560
+ logging.error(f"Error fetching prices for {symbol}: {str(e)}")
561
+ return None
562
+
563
+ def get_price_at_time(self,
564
+ symbol: str,
565
+ timestamp: datetime) -> Optional[float]:
566
+ """
567
+ Get the approximate price of a token at a specific time
568
+
569
+ Args:
570
+ symbol: Token symbol (e.g., "ETHUSD")
571
+ timestamp: Target datetime
572
+
573
+ Returns:
574
+ Price at the specified time as a float or None if not found
575
+ """
576
+ # Look for prices 5 minutes before and after the target time
577
+ start_time = timestamp - pd.Timedelta(minutes=5)
578
+ end_time = timestamp + pd.Timedelta(minutes=5)
579
+
580
+ prices_df = self.get_historical_prices(symbol, start_time, end_time)
581
+
582
+ if prices_df is None or prices_df.empty:
583
+ return None
584
+
585
+ # Find the closest price
586
+ prices_df['time_diff'] = abs(prices_df['Timestamp'] - timestamp)
587
+ closest_price = prices_df.loc[prices_df['time_diff'].idxmin(), 'Price']
588
+
589
+ return closest_price
590
+
591
+ def get_price_impact(self,
592
+ symbol: str,
593
+ transaction_time: datetime,
594
+ lookback_minutes: int = 5,
595
+ lookahead_minutes: int = 5) -> Dict[str, Any]:
596
+ """
597
+ Analyze the price impact before and after a transaction
598
+
599
+ Args:
600
+ symbol: Token symbol (e.g., "ETHUSD")
601
+ transaction_time: Transaction datetime
602
+ lookback_minutes: Minutes to look back before the transaction
603
+ lookahead_minutes: Minutes to look ahead after the transaction
604
+
605
+ Returns:
606
+ Dictionary with price impact metrics
607
+ """
608
+ start_time = transaction_time - pd.Timedelta(minutes=lookback_minutes)
609
+ end_time = transaction_time + pd.Timedelta(minutes=lookahead_minutes)
610
+
611
+ prices_df = self.get_historical_prices(symbol, start_time, end_time)
612
+
613
+ if prices_df is None or prices_df.empty:
614
+ return {
615
+ "pre_price": None,
616
+ "post_price": None,
617
+ "impact_pct": None,
618
+ "prices_df": None
619
+ }
620
+
621
+ # Find pre and post transaction prices
622
+ pre_prices = prices_df[prices_df['Timestamp'] < transaction_time]
623
+ post_prices = prices_df[prices_df['Timestamp'] >= transaction_time]
624
+
625
+ pre_price = pre_prices['Price'].iloc[-1] if not pre_prices.empty else None
626
+ post_price = post_prices['Price'].iloc[0] if not post_prices.empty else None
627
+
628
+ # Calculate impact percentage
629
+ impact_pct = None
630
+ if pre_price is not None and post_price is not None:
631
+ impact_pct = ((post_price - pre_price) / pre_price) * 100
632
+
633
+ return {
634
+ "pre_price": pre_price,
635
+ "post_price": post_price,
636
+ "impact_pct": impact_pct,
637
+ "prices_df": prices_df
638
+ }
639
+
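The `impact_pct` returned above is a plain relative change between the last trade before the transaction and the first trade after it. A quick worked example with hypothetical prices:

```python
# Hypothetical trades: last price before the whale transaction vs. first price after it
pre_price, post_price = 1850.00, 1831.50
impact_pct = ((post_price - pre_price) / pre_price) * 100
print(round(impact_pct, 2))  # -1.0, i.e. a 1% drop immediately after the transaction
```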
640
+ def fetch_historical_prices(self, token_symbol: str, timestamp) -> Dict[str, Any]:
641
+ """Fetch historical price data for a token at a specific timestamp
642
+
643
+ Args:
644
+ token_symbol: Token symbol (e.g., "ETH")
645
+ timestamp: Timestamp (can be int, float, datetime, or pandas Timestamp)
646
+
647
+ Returns:
648
+ Dictionary with price data
649
+ """
650
+ # Convert timestamp to integer if it's not already
651
+ timestamp_value = 0
652
+ try:
653
+ # Handle different timestamp types
654
+ if isinstance(timestamp, (int, float)):
655
+ timestamp_value = int(timestamp)
656
+ elif isinstance(timestamp, pd.Timestamp):
657
+ timestamp_value = int(timestamp.timestamp())
658
+ elif isinstance(timestamp, datetime):
659
+ timestamp_value = int(timestamp.timestamp())
660
+ elif isinstance(timestamp, str):
661
+ # Try to parse string as timestamp
662
+ dt = pd.to_datetime(timestamp)
663
+ timestamp_value = int(dt.timestamp())
664
+ else:
665
+ # Default to current time if invalid type
666
+ logging.warning(f"Invalid timestamp type: {type(timestamp)}, using current time")
667
+ timestamp_value = int(time.time())
668
+ except Exception as e:
669
+ logging.warning(f"Error converting timestamp {timestamp}: {str(e)}, using current time")
670
+ timestamp_value = int(time.time())
671
+
672
+ # Check cache first
673
+ cache_key = f"{token_symbol}_{timestamp_value}"
674
+ if cache_key in self._price_cache:
675
+ return self._price_cache[cache_key]
676
+
677
+ # Implement rate limiting
678
+ current_time = time.time()
679
+ if current_time - self._last_api_call < 0.05: # 50ms minimum between calls
680
+ time.sleep(0.05)
681
+ self._last_api_call = current_time
682
+
683
+ # Check error count for this symbol
684
+ error_key = f"error_{token_symbol}"
685
+ if self._error_count.get(error_key, 0) > 10:
686
+ # Too many errors, return cached failure
687
+ return {
688
+ 'symbol': token_symbol,
689
+ 'timestamp': timestamp_value,
690
+ 'price': None,
691
+ 'status': 'error',
692
+ 'error': 'Too many previous errors'
693
+ }
694
+
695
+ try:
696
+ url = f"{self.base_url}/trades/{token_symbol}USD"
697
+ params = {
698
+ 'limit_trades': 500,
699
+ 'timestamp': timestamp_value * 1000 # Convert to milliseconds
700
+ }
701
+
702
+ response = requests.get(url, params=params)
703
+ response.raise_for_status()
704
+ data = response.json()
705
+
706
+ # Reset error count on success
707
+ self._error_count[error_key] = 0
708
+
709
+ # Calculate average price from recent trades
710
+ if data:
711
+ prices = [float(trade['price']) for trade in data]
712
+ avg_price = sum(prices) / len(prices)
713
+ result = {
714
+ 'symbol': token_symbol,
715
+ 'timestamp': timestamp_value,
716
+ 'price': avg_price,
717
+ 'status': 'success'
718
+ }
719
+ # Cache success
720
+ self._price_cache[cache_key] = result
721
+ return result
722
+ else:
723
+ result = {
724
+ 'symbol': token_symbol,
725
+ 'timestamp': timestamp_value,
726
+ 'price': None,
727
+ 'status': 'no_data'
728
+ }
729
+ # Cache no data
730
+ self._price_cache[cache_key] = result
731
+ return result
732
+
733
+ except requests.exceptions.HTTPError as e:
734
+ # Handle HTTP errors efficiently
735
+ self._error_count[error_key] = self._error_count.get(error_key, 0) + 1
736
+
737
+ # Only log first few occurrences
738
+ if self._error_count[error_key] <= 3:
739
+ logging.warning(f"HTTP error fetching price for {token_symbol}: {e.response.status_code}")
740
+ elif self._error_count[error_key] == 10:
741
+ logging.warning(f"Suppressing further logs for {token_symbol} errors")
742
+
743
+ result = {
744
+ 'symbol': token_symbol,
745
+ 'timestamp': timestamp_value,
746
+ 'price': None,
747
+ 'status': 'error',
748
+ 'error': f"HTTP {e.response.status_code}"
749
+ }
750
+ self._price_cache[cache_key] = result
751
+ return result
752
+
753
+ except Exception as e:
754
+ # For other errors
755
+ self._error_count[error_key] = self._error_count.get(error_key, 0) + 1
756
+
757
+ if self._error_count[error_key] <= 3:
758
+ logging.error(f"Error fetching prices for {token_symbol}: {str(e)}")
759
+
760
+ result = {
761
+ 'symbol': token_symbol,
762
+ 'timestamp': timestamp_value,
763
+ 'price': None,
764
+ 'status': 'error',
765
+ 'error': str(e)
766
+ }
767
+ self._price_cache[cache_key] = result
768
+ return result
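Taken together, the methods above form a small price-lookup API. A hedged usage sketch follows; it assumes the `GeminiClient` constructor takes `api_key` (as it is called elsewhere in this commit) and uses a placeholder transaction time:

```python
import os
from datetime import datetime, timedelta

from modules.api_client import GeminiClient

client = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"))

# Placeholder transaction time: one hour ago
tx_time = datetime.now() - timedelta(hours=1)

# Closest trade price around the transaction time (None if no trades were found)
price = client.get_price_at_time(symbol="ETHUSD", timestamp=tx_time)
print("Closest trade price:", price)

# Price move in a +/- 5 minute window around the transaction
impact = client.get_price_impact(symbol="ETHUSD", transaction_time=tx_time)
if impact["impact_pct"] is not None:
    print(f"ETH moved {impact['impact_pct']:.2f}% around the transaction")
```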
modules/crew_system.py ADDED
@@ -0,0 +1,1117 @@
1
+ import os
2
+ import logging
3
+ from typing import Dict, List, Optional, Union, Any, Tuple
4
+ import pandas as pd
5
+ from datetime import datetime, timedelta
6
+ import io
7
+ import base64
8
+
9
+ from crewai import Agent, Task, Crew, Process
10
+ from langchain.tools import BaseTool
11
+ from langchain.chat_models import ChatOpenAI
12
+
13
+ from modules.api_client import ArbiscanClient, GeminiClient
14
+ from modules.data_processor import DataProcessor
15
+ from modules.crew_tools import (
16
+ ArbiscanGetTokenTransfersTool,
17
+ ArbiscanGetNormalTransactionsTool,
18
+ ArbiscanGetInternalTransactionsTool,
19
+ ArbiscanFetchWhaleTransactionsTool,
20
+ GeminiGetCurrentPriceTool,
21
+ GeminiGetHistoricalPricesTool,
22
+ DataProcessorIdentifyPatternsTool,
23
+ DataProcessorDetectAnomalousTransactionsTool,
24
+ set_global_clients
25
+ )
26
+
27
+
28
+ class WhaleAnalysisCrewSystem:
29
+ """
30
+ CrewAI system for analyzing whale wallet activity and detecting market manipulation
31
+ """
32
+
33
+ def __init__(self, arbiscan_client: ArbiscanClient, gemini_client: GeminiClient, data_processor: DataProcessor):
34
+ self.arbiscan_client = arbiscan_client
35
+ self.gemini_client = gemini_client
36
+ self.data_processor = data_processor
37
+
38
+ # Initialize LLM
39
+ try:
40
+ from langchain.chat_models import ChatOpenAI
41
+ self.llm = ChatOpenAI(
42
+ model="gpt-4",
43
+ temperature=0.2,
44
+ api_key=os.getenv("OPENAI_API_KEY")
45
+ )
46
+ except Exception as e:
47
+ logging.warning(f"Could not initialize LLM: {str(e)}")
48
+ self.llm = None
49
+
50
+ # Use a factory method to safely create tool instances
51
+ self.setup_tools()
52
+
53
+ def setup_tools(self):
54
+ """Setup LangChain tools for the whale analysis crew"""
55
+ try:
56
+ # Setup clients
57
+ arbiscan_client = ArbiscanClient(api_key=os.getenv("ARBISCAN_API_KEY"))
58
+ gemini_client = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"))
59
+ data_processor = DataProcessor()
60
+
61
+ # Set global clients first
62
+ set_global_clients(
63
+ arbiscan_client=arbiscan_client,
64
+ gemini_client=gemini_client,
65
+ data_processor=data_processor
66
+ )
67
+
68
+ # Create tools (no need to pass clients, they'll use globals)
69
+ self.arbiscan_tools = [
70
+ self._create_tool(ArbiscanGetTokenTransfersTool),
71
+ self._create_tool(ArbiscanGetNormalTransactionsTool),
72
+ self._create_tool(ArbiscanGetInternalTransactionsTool),
73
+ self._create_tool(ArbiscanFetchWhaleTransactionsTool)
74
+ ]
75
+
76
+ self.gemini_tools = [
77
+ self._create_tool(GeminiGetCurrentPriceTool),
78
+ self._create_tool(GeminiGetHistoricalPricesTool)
79
+ ]
80
+
81
+ self.data_processor_tools = [
82
+ self._create_tool(DataProcessorIdentifyPatternsTool),
83
+ self._create_tool(DataProcessorDetectAnomalousTransactionsTool)
84
+ ]
85
+
86
+ logging.info(f"Successfully created {len(self.arbiscan_tools + self.gemini_tools + self.data_processor_tools)} tools")
87
+
88
+ except Exception as e:
89
+ logging.error(f"Error setting up tools: {str(e)}")
90
+ raise Exception(f"Error setting up tools: {str(e)}")
91
+
92
+ def _create_tool(self, tool_class, *args, **kwargs):
93
+ """Factory method to safely create a tool with proper error handling"""
94
+ try:
95
+ tool = tool_class(*args, **kwargs)
96
+ return tool
97
+ except Exception as e:
98
+ logging.error(f"Failed to create tool {tool_class.__name__}: {str(e)}")
99
+ raise Exception(f"Failed to create tool {tool_class.__name__}: {str(e)}")
100
+
101
+ def create_agents(self):
102
+ """Create the agents for the crew"""
103
+
104
+ # Data Collection Agent
105
+ data_collector = Agent(
106
+ role="Blockchain Data Collector",
107
+ goal="Collect comprehensive whale transaction data from the blockchain",
108
+ backstory="""You are a blockchain analytics expert specialized in extracting and
109
+ organizing on-chain data from the Arbitrum network. You have deep knowledge of blockchain
110
+ transaction structures and can efficiently query APIs to gather relevant whale activity.""",
111
+ verbose=True,
112
+ allow_delegation=True,
113
+ tools=self.arbiscan_tools,
114
+ llm=self.llm
115
+ )
116
+
117
+ # Price Analysis Agent
118
+ price_analyst = Agent(
119
+ role="Price Impact Analyst",
120
+ goal="Analyze how whale transactions impact token prices",
121
+ backstory="""You are a quantitative market analyst with expertise in correlating
122
+ trading activity with price movements. You specialize in detecting how large trades
123
+ influence market dynamics, and can identify unusual price patterns.""",
124
+ verbose=True,
125
+ allow_delegation=True,
126
+ tools=self.gemini_tools,
127
+ llm=self.llm
128
+ )
129
+
130
+ # Pattern Detection Agent
131
+ pattern_detector = Agent(
132
+ role="Trading Pattern Detector",
133
+ goal="Identify recurring behavior patterns in whale trading activity",
134
+ backstory="""You are a data scientist specialized in time-series analysis and behavioral
135
+ pattern recognition. You excel at spotting cyclical behaviors, correlation patterns, and
136
+ anomalous trading activities across multiple addresses.""",
137
+ verbose=True,
138
+ allow_delegation=True,
139
+ tools=self.data_processor_tools,
140
+ llm=self.llm
141
+ )
142
+
143
+ # Manipulation Detector Agent
144
+ manipulation_detector = Agent(
145
+ role="Market Manipulation Investigator",
146
+ goal="Detect potential market manipulation in whale activity",
147
+ backstory="""You are a financial forensics expert who has studied market manipulation
148
+ techniques for years. You can identify pump-and-dump schemes, wash trading, spoofing,
149
+ and other deceptive practices used by whale traders to manipulate market prices.""",
150
+ verbose=True,
151
+ allow_delegation=True,
152
+ tools=self.data_processor_tools,
153
+ llm=self.llm
154
+ )
155
+
156
+ # Report Generator Agent
157
+ report_generator = Agent(
158
+ role="Insights Reporter",
159
+ goal="Create comprehensive, actionable reports on whale activity",
160
+ backstory="""You are a financial data storyteller who excels at transforming complex
161
+ blockchain data into clear, insightful narratives. You can distill technical findings
162
+ into actionable intelligence for different audiences.""",
163
+ verbose=True,
164
+ allow_delegation=True,
165
+ tools=[],
166
+ llm=self.llm
167
+ )
168
+
169
+ return {
170
+ "data_collector": data_collector,
171
+ "price_analyst": price_analyst,
172
+ "pattern_detector": pattern_detector,
173
+ "manipulation_detector": manipulation_detector,
174
+ "report_generator": report_generator
175
+ }
176
+
177
+ def track_large_transactions(self,
178
+ wallets: List[str],
179
+ start_date: datetime,
180
+ end_date: datetime,
181
+ threshold_value: float,
182
+ threshold_type: str,
183
+ token_symbol: Optional[str] = None) -> pd.DataFrame:
184
+ """
185
+ Track large buy/sell transactions for specified wallets
186
+
187
+ Args:
188
+ wallets: List of wallet addresses to track
189
+ start_date: Start date for analysis
190
+ end_date: End date for analysis
191
+ threshold_value: Minimum value for transaction tracking
192
+ threshold_type: Type of threshold ("Token Amount" or "USD Value")
193
+ token_symbol: Symbol of token to track (only required if threshold_type is "Token Amount")
194
+
195
+ Returns:
196
+ DataFrame of large transactions
197
+ """
198
+ agents = self.create_agents()
199
+
200
+ # Define tasks
201
+ data_collection_task = Task(
202
+ description=f"""
203
+ Collect all transactions for the following wallets: {', '.join(wallets)}
204
+ between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
205
+
206
+ Filter for transactions {'of ' + token_symbol if token_symbol else ''} with a
207
+ {'token amount greater than ' + str(threshold_value) if threshold_type == 'Token Amount'
208
+ else 'USD value greater than $' + str(threshold_value)}.
209
+
210
+ Return the data in a well-structured format with timestamp, transaction hash,
211
+ sender, recipient, token symbol, and amount.
212
+ """,
213
+ agent=agents["data_collector"],
214
+ expected_output="""
215
+ A comprehensive dataset of all large transactions for the specified wallets,
216
+ properly filtered according to the threshold criteria.
217
+ """
218
+ )
219
+
220
+ # Create and run the crew
221
+ crew = Crew(
222
+ agents=[agents["data_collector"]],
223
+ tasks=[data_collection_task],
224
+ verbose=2,
225
+ process=Process.sequential
226
+ )
227
+
228
+ result = crew.kickoff()
229
+
230
+ # Process the result
231
+ import json
232
+ try:
233
+ # Try to extract JSON from the result
234
+ import re
235
+ json_match = re.search(r'```json\n([\s\S]*?)\n```', result)
236
+
237
+ if json_match:
238
+ json_str = json_match.group(1)
239
+ transactions_data = json.loads(json_str)
240
+
241
+ if isinstance(transactions_data, list):
242
+ return pd.DataFrame(transactions_data)
243
+ else:
244
+ return pd.DataFrame()
245
+ else:
246
+ # Try to parse the entire result as JSON
247
+ transactions_data = json.loads(result)
248
+
249
+ if isinstance(transactions_data, list):
250
+ return pd.DataFrame(transactions_data)
251
+ else:
252
+ return pd.DataFrame()
253
+ except Exception:
254
+ # Fallback to querying the API directly
255
+ token_address = None # Would need a mapping of symbol to address
256
+
257
+ transactions_df = self.arbiscan_client.fetch_whale_transactions(
258
+ addresses=wallets,
259
+ token_address=token_address,
260
+ min_token_amount=threshold_value if threshold_type == "Token Amount" else None,
261
+ min_usd_value=threshold_value if threshold_type == "USD Value" else None
262
+ )
263
+
264
+ return transactions_df
265
+
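`track_large_transactions` (and the pattern, price-impact, and manipulation methods below) all parse the crew output the same way: look for a fenced JSON block, fall back to parsing the whole string, then fall back again to a direct API call. A hypothetical helper that centralizes the parsing step could look like this (a sketch only, not part of the module):

```python
import json
import re
from typing import Any, Optional

def extract_json_block(result: str) -> Optional[Any]:
    """Pull a fenced JSON block out of raw agent output; return None if parsing fails."""
    match = re.search(r'```json\n([\s\S]*?)\n```', result)
    candidate = match.group(1) if match else result
    try:
        return json.loads(candidate)
    except (json.JSONDecodeError, TypeError):
        return None
```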
266
+ def identify_trading_patterns(self,
267
+ wallets: List[str],
268
+ start_date: datetime,
269
+ end_date: datetime) -> List[Dict[str, Any]]:
270
+ """
271
+ Identify trading patterns for specified wallets
272
+
273
+ Args:
274
+ wallets: List of wallet addresses to analyze
275
+ start_date: Start date for analysis
276
+ end_date: End date for analysis
277
+
278
+ Returns:
279
+ List of identified patterns
280
+ """
281
+ agents = self.create_agents()
282
+
283
+ # Define tasks
284
+ data_collection_task = Task(
285
+ description=f"""
286
+ Collect all transactions for the following wallets: {', '.join(wallets)}
287
+ between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
288
+
289
+ Include all token transfers, regardless of size.
290
+ """,
291
+ agent=agents["data_collector"],
292
+ expected_output="""
293
+ A comprehensive dataset of all transactions for the specified wallets.
294
+ """
295
+ )
296
+
297
+ pattern_analysis_task = Task(
298
+ description="""
299
+ Analyze the transaction data to identify recurring trading patterns.
300
+ Look for:
301
+ 1. Cyclical buying/selling behaviors
302
+ 2. Time-of-day patterns
303
+ 3. Accumulation/distribution phases
304
+ 4. Coordinated movements across multiple addresses
305
+
306
+ Cluster similar behaviors and describe each pattern identified.
307
+ """,
308
+ agent=agents["pattern_detector"],
309
+ expected_output="""
310
+ A detailed analysis of trading patterns with:
311
+ - Pattern name/type
312
+ - Description of behavior
313
+ - Frequency and confidence level
314
+ - Example transactions showing the pattern
315
+ """,
316
+ context=[data_collection_task]
317
+ )
318
+
319
+ # Create and run the crew
320
+ crew = Crew(
321
+ agents=[agents["data_collector"], agents["pattern_detector"]],
322
+ tasks=[data_collection_task, pattern_analysis_task],
323
+ verbose=2,
324
+ process=Process.sequential
325
+ )
326
+
327
+ result = crew.kickoff()
328
+
329
+ # Process the result
330
+ import json
331
+ try:
332
+ # Try to extract JSON from the result
333
+ import re
334
+ json_match = re.search(r'```json\n([\s\S]*?)\n```', result)
335
+
336
+ if json_match:
337
+ json_str = json_match.group(1)
338
+ patterns_data = json.loads(json_str)
339
+
340
+ # Convert the patterns to the expected format
341
+ return self._convert_patterns_to_visual_format(patterns_data)
342
+ else:
343
+ # Fallback to a simple pattern analysis
344
+ # First, get transaction data directly
345
+ all_transactions = []
346
+
347
+ for wallet in wallets:
348
+ transfers = self.arbiscan_client.fetch_all_token_transfers(
349
+ address=wallet
350
+ )
351
+ all_transactions.extend(transfers)
352
+
353
+ if not all_transactions:
354
+ return []
355
+
356
+ transactions_df = pd.DataFrame(all_transactions)
357
+
358
+ # Use data processor to identify patterns
359
+ patterns = self.data_processor.identify_patterns(transactions_df)
360
+
361
+ return patterns
362
+ except Exception as e:
363
+ print(f"Error processing patterns: {str(e)}")
364
+ return []
365
+
366
+ def _convert_patterns_to_visual_format(self, patterns_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
367
+ """
368
+ Convert pattern data from agents to visual format with charts
369
+
370
+ Args:
371
+ patterns_data: Pattern data from agents
372
+
373
+ Returns:
374
+ List of patterns with visualizations
375
+ """
376
+ visual_patterns = []
377
+
378
+ for pattern in patterns_data:
379
+ # Create chart
380
+ if 'examples' in pattern and pattern['examples']:
381
+ examples_data = []
382
+
383
+ # Check if examples is a JSON string
384
+ if isinstance(pattern['examples'], str):
385
+ try:
386
+ examples_data = pd.read_json(pattern['examples'])
387
+ except Exception:
388
+ examples_data = pd.DataFrame()
389
+ else:
390
+ examples_data = pd.DataFrame(pattern['examples'])
391
+
392
+ # Create visualization
393
+ if not examples_data.empty:
394
+ import plotly.express as px
395
+
396
+ # Check for timestamp column
397
+ if 'Timestamp' in examples_data.columns:
398
+ time_col = 'Timestamp'
399
+ elif 'timeStamp' in examples_data.columns:
400
+ time_col = 'timeStamp'
401
+ else:
402
+ time_col = None
403
+
404
+ # Check for amount column
405
+ if 'Amount' in examples_data.columns:
406
+ amount_col = 'Amount'
407
+ elif 'tokenAmount' in examples_data.columns:
408
+ amount_col = 'tokenAmount'
409
+ elif 'value' in examples_data.columns:
410
+ amount_col = 'value'
411
+ else:
412
+ amount_col = None
413
+
414
+ if time_col and amount_col:
415
+ # Create time series chart
416
+ fig = px.line(
417
+ examples_data,
418
+ x=time_col,
419
+ y=amount_col,
420
+ title=f"Pattern: {pattern['name']}"
421
+ )
422
+ else:
423
+ fig = None
424
+ else:
425
+ fig = None
426
+ else:
427
+ fig = None
428
+ examples_data = pd.DataFrame()
429
+
430
+ # Create visual pattern object
431
+ visual_pattern = {
432
+ "name": pattern.get("name", "Unknown Pattern"),
433
+ "description": pattern.get("description", ""),
434
+ "confidence": pattern.get("confidence", 0.5),
435
+ "occurrence_count": pattern.get("occurrence_count", 0),
436
+ "chart_data": fig,
437
+ "examples": examples_data
438
+ }
439
+
440
+ visual_patterns.append(visual_pattern)
441
+
442
+ return visual_patterns
443
+
444
+ def analyze_price_impact(self,
445
+ wallets: List[str],
446
+ start_date: datetime,
447
+ end_date: datetime,
448
+ lookback_minutes: int = 5,
449
+ lookahead_minutes: int = 5) -> Dict[str, Any]:
450
+ """
451
+ Analyze the impact of whale transactions on token prices
452
+
453
+ Args:
454
+ wallets: List of wallet addresses to analyze
455
+ start_date: Start date for analysis
456
+ end_date: End date for analysis
457
+ lookback_minutes: Minutes to look back before transactions
458
+ lookahead_minutes: Minutes to look ahead after transactions
459
+
460
+ Returns:
461
+ Dictionary with price impact analysis
462
+ """
463
+ agents = self.create_agents()
464
+
465
+ # Define tasks
466
+ data_collection_task = Task(
467
+ description=f"""
468
+ Collect all transactions for the following wallets: {', '.join(wallets)}
469
+ between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
470
+
471
+ Focus on large transactions that might impact price.
472
+ """,
473
+ agent=agents["data_collector"],
474
+ expected_output="""
475
+ A comprehensive dataset of all significant transactions for the specified wallets.
476
+ """
477
+ )
478
+
479
+ price_impact_task = Task(
480
+ description=f"""
481
+ Analyze the price impact of the whale transactions.
482
+ For each transaction:
483
+ 1. Fetch price data for {lookback_minutes} minutes before and {lookahead_minutes} minutes after the transaction
484
+ 2. Calculate the percentage price change
485
+ 3. Identify transactions that caused significant price moves
486
+
487
+ Summarize the overall price impact statistics and highlight notable instances.
488
+ """,
489
+ agent=agents["price_analyst"],
490
+ expected_output="""
491
+ A detailed analysis of price impacts with:
492
+ - Average price impact percentage
493
+ - Maximum price impact (positive and negative)
494
+ - Count of significant price moves
495
+ - List of transactions with their corresponding price impacts
496
+ """,
497
+ context=[data_collection_task]
498
+ )
499
+
500
+ # Create and run the crew
501
+ crew = Crew(
502
+ agents=[agents["data_collector"], agents["price_analyst"]],
503
+ tasks=[data_collection_task, price_impact_task],
504
+ verbose=2,
505
+ process=Process.sequential
506
+ )
507
+
508
+ result = crew.kickoff()
509
+
510
+ # Process the result
511
+ import json
512
+ try:
513
+ # Try to extract JSON from the result
514
+ import re
515
+ json_match = re.search(r'```json\n([\s\S]*?)\n```', result)
516
+
517
+ if json_match:
518
+ json_str = json_match.group(1)
519
+ impact_data = json.loads(json_str)
520
+
521
+ # Convert the impact data to visual format
522
+ return self._convert_impact_to_visual_format(impact_data)
523
+ else:
524
+ # Fallback to direct calculation
525
+ # First, get transaction data
526
+ all_transactions = []
527
+
528
+ for wallet in wallets:
529
+ transfers = self.arbiscan_client.fetch_all_token_transfers(
530
+ address=wallet
531
+ )
532
+ all_transactions.extend(transfers)
533
+
534
+ if not all_transactions:
535
+ return {}
536
+
537
+ transactions_df = pd.DataFrame(all_transactions)
538
+
539
+ # Calculate price impact for each transaction
540
+ price_data = {}
541
+
542
+ for idx, row in transactions_df.iterrows():
543
+ tx_hash = row.get('hash', '')
544
+
545
+ if not tx_hash:
546
+ continue
547
+
548
+ # Get symbol
549
+ symbol = row.get('tokenSymbol', '')
550
+ if not symbol:
551
+ continue
552
+
553
+ # Get timestamp
554
+ timestamp = row.get('timeStamp', 0)
555
+ if not timestamp:
556
+ continue
557
+
558
+ # Convert timestamp to datetime
559
+ if isinstance(timestamp, (int, float)):
560
+ tx_time = datetime.fromtimestamp(int(timestamp))
561
+ else:
562
+ tx_time = timestamp
563
+
564
+ # Get price impact
565
+ symbol_usd = f"{symbol}USD"
566
+ impact = self.gemini_client.get_price_impact(
567
+ symbol=symbol_usd,
568
+ transaction_time=tx_time,
569
+ lookback_minutes=lookback_minutes,
570
+ lookahead_minutes=lookahead_minutes
571
+ )
572
+
573
+ price_data[tx_hash] = impact
574
+
575
+ # Use data processor to analyze price impact
576
+ impact_analysis = self.data_processor.analyze_price_impact(
577
+ transactions_df=transactions_df,
578
+ price_data=price_data
579
+ )
580
+
581
+ return impact_analysis
582
+ except Exception as e:
583
+ print(f"Error processing price impact: {str(e)}")
584
+ return {}
585
+
586
+ def _convert_impact_to_visual_format(self, impact_data: Dict[str, Any]) -> Dict[str, Any]:
587
+ """
588
+ Convert price impact data to visual format with charts
589
+
590
+ Args:
591
+ impact_data: Price impact data
592
+
593
+ Returns:
594
+ Dictionary with price impact analysis and visualizations
595
+ """
596
+ # Convert transactions_with_impact to DataFrame if it's a string
597
+ if 'transactions_with_impact' in impact_data and isinstance(impact_data['transactions_with_impact'], str):
598
+ try:
599
+ transactions_df = pd.read_json(impact_data['transactions_with_impact'])
600
+ except Exception:
601
+ transactions_df = pd.DataFrame()
602
+ elif 'transactions_with_impact' in impact_data and isinstance(impact_data['transactions_with_impact'], list):
603
+ transactions_df = pd.DataFrame(impact_data['transactions_with_impact'])
604
+ else:
605
+ transactions_df = pd.DataFrame()
606
+
607
+ # Create impact chart
608
+ if not transactions_df.empty and 'impact_pct' in transactions_df.columns and 'Timestamp' in transactions_df.columns:
609
+ import plotly.graph_objects as go
610
+
611
+ fig = go.Figure()
612
+
613
+ fig.add_trace(go.Scatter(
614
+ x=transactions_df['Timestamp'],
615
+ y=transactions_df['impact_pct'],
616
+ mode='markers+lines',
617
+ name='Price Impact (%)',
618
+ marker=dict(
619
+ size=10,
620
+ color=transactions_df['impact_pct'],
621
+ colorscale='RdBu',
622
+ cmin=-max(abs(transactions_df['impact_pct'])) if len(transactions_df) > 0 else -1,
623
+ cmax=max(abs(transactions_df['impact_pct'])) if len(transactions_df) > 0 else 1,
624
+ colorbar=dict(title='Impact %'),
625
+ symbol='circle'
626
+ )
627
+ ))
628
+
629
+ fig.update_layout(
630
+ title='Price Impact of Whale Transactions',
631
+ xaxis_title='Timestamp',
632
+ yaxis_title='Price Impact (%)',
633
+ hovermode='closest'
634
+ )
635
+
636
+ # Add zero line
637
+ fig.add_hline(y=0, line_dash="dash", line_color="gray")
638
+ else:
639
+ fig = None
640
+
641
+ # Create visual impact analysis
642
+ visual_impact = {
643
+ 'avg_impact_pct': impact_data.get('avg_impact_pct', 0),
644
+ 'max_impact_pct': impact_data.get('max_impact_pct', 0),
645
+ 'min_impact_pct': impact_data.get('min_impact_pct', 0),
646
+ 'significant_moves_count': impact_data.get('significant_moves_count', 0),
647
+ 'total_transactions': impact_data.get('total_transactions', 0),
648
+ 'impact_chart': fig,
649
+ 'transactions_with_impact': transactions_df
650
+ }
651
+
652
+ return visual_impact
653
+
654
+ def detect_manipulation(self,
655
+ wallets: List[str],
656
+ start_date: datetime,
657
+ end_date: datetime,
658
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
659
+ """
660
+ Detect potential market manipulation by whale wallets
661
+
662
+ Args:
663
+ wallets: List of wallet addresses to analyze
664
+ start_date: Start date for analysis
665
+ end_date: End date for analysis
666
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
667
+
668
+ Returns:
669
+ List of manipulation alerts
670
+ """
671
+ agents = self.create_agents()
672
+
673
+ # Define tasks
674
+ data_collection_task = Task(
675
+ description=f"""
676
+ Collect all transactions for the following wallets: {', '.join(wallets)}
677
+ between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
678
+
679
+ Include all token transfers and also fetch price data if available.
680
+ """,
681
+ agent=agents["data_collector"],
682
+ expected_output="""
683
+ A comprehensive dataset of all transactions for the specified wallets.
684
+ """
685
+ )
686
+
687
+ price_impact_task = Task(
688
+ description="""
689
+ Analyze the price impact of the whale transactions.
690
+ For each significant transaction, fetch and analyze price data around the transaction time.
691
+ """,
692
+ agent=agents["price_analyst"],
693
+ expected_output="""
694
+ Price impact data for the transactions.
695
+ """,
696
+ context=[data_collection_task]
697
+ )
698
+
699
+ manipulation_detection_task = Task(
700
+ description=f"""
701
+ Detect potential market manipulation patterns in the transaction data with sensitivity level: {sensitivity}.
702
+ Look for:
703
+ 1. Pump-and-Dump: Rapid buys followed by coordinated sell-offs
704
+ 2. Wash Trading: Self-trading across multiple addresses
705
+ 3. Spoofing: Large orders placed then canceled (if detectable)
706
+ 4. Momentum Ignition: Creating sharp price moves to trigger other participants' momentum-based trading
707
+
708
+ For each potential manipulation, provide:
709
+ - Type of manipulation
710
+ - Involved addresses
711
+ - Risk level (High, Medium, Low)
712
+ - Description of the suspicious behavior
713
+ - Evidence (transactions showing the pattern)
714
+ """,
715
+ agent=agents["manipulation_detector"],
716
+ expected_output="""
717
+ A detailed list of potential manipulation incidents with supporting evidence.
718
+ """,
719
+ context=[data_collection_task, price_impact_task]
720
+ )
721
+
722
+ # Create and run the crew
723
+ crew = Crew(
724
+ agents=[
725
+ agents["data_collector"],
726
+ agents["price_analyst"],
727
+ agents["manipulation_detector"]
728
+ ],
729
+ tasks=[
730
+ data_collection_task,
731
+ price_impact_task,
732
+ manipulation_detection_task
733
+ ],
734
+ verbose=2,
735
+ process=Process.sequential
736
+ )
737
+
738
+ result = crew.kickoff()
739
+
740
+ # Process the result
741
+ import json
742
+ try:
743
+ # Try to extract JSON from the result
744
+ import re
745
+ json_match = re.search(r'```json\n([\s\S]*?)\n```', result)
746
+
747
+ if json_match:
748
+ json_str = json_match.group(1)
749
+ alerts_data = json.loads(json_str)
750
+
751
+ # Convert the alerts to visual format
752
+ return self._convert_alerts_to_visual_format(alerts_data)
753
+ else:
754
+ # Fallback to direct detection
755
+ # First, get transaction data
756
+ all_transactions = []
757
+
758
+ for wallet in wallets:
759
+ transfers = self.arbiscan_client.fetch_all_token_transfers(
760
+ address=wallet
761
+ )
762
+ all_transactions.extend(transfers)
763
+
764
+ if not all_transactions:
765
+ return []
766
+
767
+ transactions_df = pd.DataFrame(all_transactions)
768
+
769
+ # Calculate price impact for each transaction
770
+ price_data = {}
771
+
772
+ for idx, row in transactions_df.iterrows():
773
+ tx_hash = row.get('hash', '')
774
+
775
+ if not tx_hash:
776
+ continue
777
+
778
+ # Get symbol
779
+ symbol = row.get('tokenSymbol', '')
780
+ if not symbol:
781
+ continue
782
+
783
+ # Get timestamp
784
+ timestamp = row.get('timeStamp', 0)
785
+ if not timestamp:
786
+ continue
787
+
788
+ # Convert timestamp to datetime
789
+ if isinstance(timestamp, (int, float)):
790
+ tx_time = datetime.fromtimestamp(int(timestamp))
791
+ else:
792
+ tx_time = timestamp
793
+
794
+ # Get price impact
795
+ symbol_usd = f"{symbol}USD"
796
+ impact = self.gemini_client.get_price_impact(
797
+ symbol=symbol_usd,
798
+ transaction_time=tx_time,
799
+ lookback_minutes=5,
800
+ lookahead_minutes=5
801
+ )
802
+
803
+ price_data[tx_hash] = impact
804
+
805
+ # Detect wash trading
806
+ wash_trading_alerts = self.data_processor.detect_wash_trading(
807
+ transactions_df=transactions_df,
808
+ addresses=wallets,
809
+ sensitivity=sensitivity
810
+ )
811
+
812
+ # Detect pump and dump
813
+ pump_and_dump_alerts = self.data_processor.detect_pump_and_dump(
814
+ transactions_df=transactions_df,
815
+ price_data=price_data,
816
+ sensitivity=sensitivity
817
+ )
818
+
819
+ # Combine alerts
820
+ all_alerts = wash_trading_alerts + pump_and_dump_alerts
821
+
822
+ return all_alerts
823
+ except Exception as e:
824
+ print(f"Error detecting manipulation: {str(e)}")
825
+ return []
826
+
827
+ def _convert_alerts_to_visual_format(self, alerts_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
828
+ """
829
+ Convert manipulation alerts data to visual format with charts
830
+
831
+ Args:
832
+ alerts_data: Alerts data from agents
833
+
834
+ Returns:
835
+ List of alerts with visualizations
836
+ """
837
+ visual_alerts = []
838
+
839
+ for alert in alerts_data:
840
+ # Create chart based on alert type
841
+ if 'evidence' in alert and alert['evidence']:
842
+ evidence_data = []
843
+
844
+ # Check if evidence is a JSON string
845
+ if isinstance(alert['evidence'], str):
846
+ try:
847
+ evidence_data = pd.read_json(alert['evidence'])
848
+ except Exception:
849
+ evidence_data = pd.DataFrame()
850
+ else:
851
+ evidence_data = pd.DataFrame(alert['evidence'])
852
+
853
+ # Create visualization based on alert type
854
+ if not evidence_data.empty:
855
+ import plotly.graph_objects as go
856
+ import plotly.express as px
857
+
858
+ # Check for timestamp column
859
+ if 'Timestamp' in evidence_data.columns:
860
+ time_col = 'Timestamp'
861
+ elif 'timeStamp' in evidence_data.columns:
862
+ time_col = 'timeStamp'
863
+ elif 'timestamp' in evidence_data.columns:
864
+ time_col = 'timestamp'
865
+ else:
866
+ time_col = None
867
+
868
+ # Different visualizations based on alert type
869
+ if alert.get('type') == 'Wash Trading' and time_col:
870
+ # Create scatter plot of wash trading
871
+ fig = px.scatter(
872
+ evidence_data,
873
+ x=time_col,
874
+ y=evidence_data.get('Amount', evidence_data.get('tokenAmount', evidence_data.get('value', 0))),
875
+ color=evidence_data.get('From', evidence_data.get('from', 'Unknown')),
876
+ title=f"Wash Trading Evidence: {alert.get('title', '')}"
877
+ )
878
+ elif alert.get('type') == 'Pump and Dump' and time_col and 'pre_price' in evidence_data.columns:
879
+ # Create price line for pump and dump
880
+ fig = go.Figure()
881
+
882
+ # Plot price line
883
+ fig.add_trace(go.Scatter(
884
+ x=evidence_data[time_col],
885
+ y=evidence_data['pre_price'],
886
+ mode='lines+markers',
887
+ name='Price Before Transaction',
888
+ line=dict(color='blue')
889
+ ))
890
+
891
+ fig.add_trace(go.Scatter(
892
+ x=evidence_data[time_col],
893
+ y=evidence_data['post_price'],
894
+ mode='lines+markers',
895
+ name='Price After Transaction',
896
+ line=dict(color='red')
897
+ ))
898
+
899
+ fig.update_layout(
900
+ title=f"Pump and Dump Evidence: {alert.get('title', '')}",
901
+ xaxis_title='Time',
902
+ yaxis_title='Price',
903
+ hovermode='closest'
904
+ )
905
+ elif alert.get('type') == 'Momentum Ignition' and time_col and 'impact_pct' in evidence_data.columns:
906
+ # Create impact scatter for momentum ignition
907
+ fig = px.scatter(
908
+ evidence_data,
909
+ x=time_col,
910
+ y='impact_pct',
911
+ size=abs(evidence_data['impact_pct']),
912
+ color='impact_pct',
913
+ color_continuous_scale='RdBu',
914
+ title=f"Momentum Ignition Evidence: {alert.get('title', '')}"
915
+ )
916
+ else:
917
+ # Generic timeline view
918
+ if time_col:
919
+ fig = px.timeline(
920
+ evidence_data,
921
+ x_start=time_col,
922
+ x_end=time_col,
923
+ y=evidence_data.get('From', evidence_data.get('from', 'Unknown')),
924
+ color=[alert.get('risk_level', 'Medium')] * len(evidence_data),
925
+ title=f"Alert Evidence: {alert.get('title', '')}"
926
+ )
927
+ else:
928
+ fig = None
929
+ else:
930
+ fig = None
931
+ else:
932
+ fig = None
933
+ evidence_data = pd.DataFrame()
934
+
935
+ # Create visual alert object
936
+ visual_alert = {
937
+ "type": alert.get("type", "Unknown"),
938
+ "addresses": alert.get("addresses", []),
939
+ "risk_level": alert.get("risk_level", "Medium"),
940
+ "description": alert.get("description", ""),
941
+ "detection_time": alert.get("detection_time", datetime.now().strftime("%Y-%m-%d %H:%M:%S")),
942
+ "title": alert.get("title", "Alert"),
943
+ "evidence": evidence_data,
944
+ "chart": fig
945
+ }
946
+
947
+ visual_alerts.append(visual_alert)
948
+
949
+ return visual_alerts
950
+
951
+ def generate_report(self,
952
+ wallets: List[str],
953
+ start_date: datetime,
954
+ end_date: datetime,
955
+ report_type: str = "Transaction Summary",
956
+ export_format: str = "PDF") -> Dict[str, Any]:
957
+ """
958
+ Generate a report of whale activity
959
+
960
+ Args:
961
+ wallets: List of wallet addresses to include in the report
962
+ start_date: Start date for report period
963
+ end_date: End date for report period
964
+ report_type: Type of report to generate
965
+ export_format: Format for the report (CSV, PDF, PNG)
966
+
967
+ Returns:
968
+ Dictionary with report data
969
+ """
970
+ from modules.visualizer import Visualizer
971
+ visualizer = Visualizer()
972
+
973
+ agents = self.create_agents()
974
+
975
+ # Define tasks
976
+ data_collection_task = Task(
977
+ description=f"""
978
+ Collect all transactions for the following wallets: {', '.join(wallets)}
979
+ between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
980
+ """,
981
+ agent=agents["data_collector"],
982
+ expected_output="""
983
+ A comprehensive dataset of all transactions for the specified wallets.
984
+ """
985
+ )
986
+
987
+ report_task = Task(
988
+ description=f"""
989
+ Generate a {report_type} report in {export_format} format.
990
+ The report should include:
991
+ 1. Executive summary of wallet activity
992
+ 2. Transaction analysis
993
+ 3. Pattern identification (if applicable)
994
+ 4. Price impact analysis (if applicable)
995
+ 5. Manipulation detection (if applicable)
996
+
997
+ Organize the information clearly and provide actionable insights.
998
+ """,
999
+ agent=agents["report_generator"],
1000
+ expected_output=f"""
1001
+ A complete {export_format} report with all relevant analyses.
1002
+ """,
1003
+ context=[data_collection_task]
1004
+ )
1005
+
1006
+ # Create and run the crew
1007
+ crew = Crew(
1008
+ agents=[agents["data_collector"], agents["report_generator"]],
1009
+ tasks=[data_collection_task, report_task],
1010
+ verbose=2,
1011
+ process=Process.sequential
1012
+ )
1013
+
1014
+ result = crew.kickoff()
1015
+
1016
+ # Process the result - for reports, we'll use our visualizer directly
1017
+ # First, get transaction data
1018
+ all_transactions = []
1019
+
1020
+ for wallet in wallets:
1021
+ transfers = self.arbiscan_client.fetch_all_token_transfers(
1022
+ address=wallet
1023
+ )
1024
+ all_transactions.extend(transfers)
1025
+
1026
+ if not all_transactions:
1027
+ return {
1028
+ "filename": f"no_data_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.{export_format.lower()}",
1029
+ "content": ""
1030
+ }
1031
+
1032
+ transactions_df = pd.DataFrame(all_transactions)
1033
+
1034
+ # Generate the report based on format
1035
+ filename = f"whale_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
1036
+
1037
+ if export_format == "CSV":
1038
+ content = visualizer.generate_csv_report(
1039
+ transactions_df=transactions_df,
1040
+ report_type=report_type
1041
+ )
1042
+ filename += ".csv"
1043
+
1044
+ return {
1045
+ "filename": filename,
1046
+ "content": content
1047
+ }
1048
+
1049
+ elif export_format == "PDF":
1050
+ # For PDF we need to get more data
1051
+ # Run pattern detection
1052
+ patterns = self.identify_trading_patterns(
1053
+ wallets=wallets,
1054
+ start_date=start_date,
1055
+ end_date=end_date
1056
+ )
1057
+
1058
+ # Run price impact analysis
1059
+ price_impact = self.analyze_price_impact(
1060
+ wallets=wallets,
1061
+ start_date=start_date,
1062
+ end_date=end_date
1063
+ )
1064
+
1065
+ # Run manipulation detection
1066
+ alerts = self.detect_manipulation(
1067
+ wallets=wallets,
1068
+ start_date=start_date,
1069
+ end_date=end_date
1070
+ )
1071
+
1072
+ content = visualizer.generate_pdf_report(
1073
+ transactions_df=transactions_df,
1074
+ patterns=patterns,
1075
+ price_impact=price_impact,
1076
+ alerts=alerts,
1077
+ title=f"Whale Analysis Report: {report_type}",
1078
+ start_date=start_date,
1079
+ end_date=end_date
1080
+ )
1081
+ filename += ".pdf"
1082
+
1083
+ return {
1084
+ "filename": filename,
1085
+ "content": content
1086
+ }
1087
+
1088
+ elif export_format == "PNG":
1089
+ # For PNG we'll create a chart based on report type
1090
+ if report_type == "Transaction Summary":
1091
+ fig = visualizer.create_transaction_timeline(transactions_df)
1092
+ elif report_type == "Pattern Analysis":
1093
+ fig = visualizer.create_volume_chart(transactions_df)
1094
+ elif report_type == "Price Impact":
1095
+ # Run price impact analysis first
1096
+ price_impact = self.analyze_price_impact(
1097
+ wallets=wallets,
1098
+ start_date=start_date,
1099
+ end_date=end_date
1100
+ )
1101
+ fig = price_impact.get('impact_chart', visualizer.create_transaction_timeline(transactions_df))
1102
+ else: # "Manipulation Detection" or "Complete Analysis"
1103
+ fig = visualizer.create_network_graph(transactions_df)
1104
+
1105
+ content = visualizer.generate_png_chart(fig)
1106
+ filename += ".png"
1107
+
1108
+ return {
1109
+ "filename": filename,
1110
+ "content": content
1111
+ }
1112
+
1113
+ else:
1114
+ return {
1115
+ "filename": f"unsupported_format_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt",
1116
+ "content": "Unsupported export format requested."
1117
+ }
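End to end, the crew system is driven by the three clients injected in `__init__`. A hedged sketch of how `app.py` is expected to wire it up (the wallet address and token symbol below are placeholders, and a real run also needs `OPENAI_API_KEY` for the agents):

```python
import os
from datetime import datetime, timedelta

from modules.api_client import ArbiscanClient, GeminiClient
from modules.data_processor import DataProcessor
from modules.crew_system import WhaleAnalysisCrewSystem

arbiscan = ArbiscanClient(api_key=os.getenv("ARBISCAN_API_KEY"))
gemini = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"))
processor = DataProcessor()

crew_system = WhaleAnalysisCrewSystem(arbiscan, gemini, processor)

wallets = ["0x0000000000000000000000000000000000000000"]  # placeholder address
end = datetime.now()
start = end - timedelta(days=7)

# Track transfers above 1,000 tokens over the last week
large_txs = crew_system.track_large_transactions(
    wallets=wallets,
    start_date=start,
    end_date=end,
    threshold_value=1000,
    threshold_type="Token Amount",
    token_symbol="ARB",
)
print(large_txs.head())
```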
modules/crew_tools.py ADDED
@@ -0,0 +1,362 @@
1
+ """
2
+ Properly implemented tools for the WhaleAnalysisCrewSystem
3
+ """
4
+
5
+ import json
6
+ import pandas as pd
7
+ from datetime import datetime
8
+ from typing import Any, Dict, List, Optional, Type
9
+ from pydantic import BaseModel, Field
10
+ import logging
11
+
12
+ from modules.api_client import ArbiscanClient, GeminiClient
13
+ from modules.data_processor import DataProcessor
14
+ from langchain.tools import BaseTool
15
+
16
+
17
+ class GetTokenTransfersInput(BaseModel):
18
+ """Input for the get_token_transfers tool."""
19
+ address: str = Field(..., description="Wallet address to query")
20
+ contract_address: Optional[str] = Field(None, description="Optional token contract address to filter by")
21
+
22
+
23
+ # Global clients that will be used by all tools
24
+ _GLOBAL_ARBISCAN_CLIENT = None
25
+ _GLOBAL_GEMINI_CLIENT = None
26
+ _GLOBAL_DATA_PROCESSOR = None
27
+
28
+ def set_global_clients(arbiscan_client=None, gemini_client=None, data_processor=None):
29
+ """Set global client instances that will be used by all tools"""
30
+ global _GLOBAL_ARBISCAN_CLIENT, _GLOBAL_GEMINI_CLIENT, _GLOBAL_DATA_PROCESSOR
31
+ if arbiscan_client:
32
+ _GLOBAL_ARBISCAN_CLIENT = arbiscan_client
33
+ if gemini_client:
34
+ _GLOBAL_GEMINI_CLIENT = gemini_client
35
+ if data_processor:
36
+ _GLOBAL_DATA_PROCESSOR = data_processor
37
+
38
+ class ArbiscanGetTokenTransfersTool(BaseTool):
39
+ """Tool for fetching token transfers from Arbiscan."""
40
+ name = "arbiscan_get_token_transfers"
41
+ description = "Get ERC-20 token transfers for a specific address"
42
+ args_schema: Type[BaseModel] = GetTokenTransfersInput
43
+
44
+ def __init__(self, arbiscan_client=None):
45
+ super().__init__()
46
+ # Store reference to client if provided, otherwise we'll use global instance
47
+ if arbiscan_client:
48
+ set_global_clients(arbiscan_client=arbiscan_client)
49
+
50
+ def _run(self, address: str, contract_address: Optional[str] = None) -> str:
51
+ global _GLOBAL_ARBISCAN_CLIENT
52
+
53
+ if not _GLOBAL_ARBISCAN_CLIENT:
54
+ return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})
55
+
56
+ try:
57
+ transfers = _GLOBAL_ARBISCAN_CLIENT.get_token_transfers(
58
+ address=address,
59
+ contract_address=contract_address
60
+ )
61
+ return json.dumps(transfers)
62
+ except Exception as e:
63
+ logging.error(f"Error in ArbiscanGetTokenTransfersTool: {str(e)}")
64
+ return json.dumps({"error": str(e)})
65
+
66
+
67
+ class GetNormalTransactionsInput(BaseModel):
68
+ """Input for the get_normal_transactions tool."""
69
+ address: str = Field(..., description="Wallet address to query")
70
+
71
+
72
+ class ArbiscanGetNormalTransactionsTool(BaseTool):
73
+ """Tool for fetching normal transactions from Arbiscan."""
74
+ name = "arbiscan_get_normal_transactions"
75
+ description = "Get normal transactions (ETH/ARB transfers) for a specific address"
76
+ args_schema: Type[BaseModel] = GetNormalTransactionsInput
77
+
78
+ def __init__(self, arbiscan_client=None):
79
+ super().__init__()
80
+ # Store reference to client if provided, otherwise we'll use global instance
81
+ if arbiscan_client:
82
+ set_global_clients(arbiscan_client=arbiscan_client)
83
+
84
+ def _run(self, address: str, startblock: int = 0, endblock: int = 99999999, page: int = 1, offset: int = 10) -> str:
85
+ global _GLOBAL_ARBISCAN_CLIENT
86
+
87
+ if not _GLOBAL_ARBISCAN_CLIENT:
88
+ return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})
89
+
90
+ try:
91
+ txs = _GLOBAL_ARBISCAN_CLIENT.get_normal_transactions(
92
+ address=address,
93
+ start_block=startblock,
94
+ end_block=endblock,
95
+ page=page,
96
+ offset=offset
97
+ )
98
+ return json.dumps(txs)
99
+ except Exception as e:
100
+ logging.error(f"Error in ArbiscanGetNormalTransactionsTool: {str(e)}")
101
+ return json.dumps({"error": str(e)})
102
+
103
+
104
+ class GetInternalTransactionsInput(BaseModel):
105
+ """Input for the get_internal_transactions tool."""
106
+ address: str = Field(..., description="Wallet address to query")
107
+
108
+
109
+ class ArbiscanGetInternalTransactionsTool(BaseTool):
110
+ """Tool for fetching internal transactions from Arbiscan."""
111
+ name = "arbiscan_get_internal_transactions"
112
+ description = "Get internal transactions for a specific address"
113
+ args_schema: Type[BaseModel] = GetInternalTransactionsInput
114
+
115
+ def __init__(self, arbiscan_client=None):
116
+ super().__init__()
117
+ # Store reference to client if provided, otherwise we'll use global instance
118
+ if arbiscan_client:
119
+ set_global_clients(arbiscan_client=arbiscan_client)
120
+
121
+ def _run(self, address: str, startblock: int = 0, endblock: int = 99999999, page: int = 1, offset: int = 10) -> str:
122
+ global _GLOBAL_ARBISCAN_CLIENT
123
+
124
+ if not _GLOBAL_ARBISCAN_CLIENT:
125
+ return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})
126
+
127
+ try:
128
+ txs = _GLOBAL_ARBISCAN_CLIENT.get_internal_transactions(
129
+ address=address,
130
+ start_block=startblock,
131
+ end_block=endblock,
132
+ page=page,
133
+ offset=offset
134
+ )
135
+ return json.dumps(txs)
136
+ except Exception as e:
137
+ logging.error(f"Error in ArbiscanGetInternalTransactionsTool: {str(e)}")
138
+ return json.dumps({"error": str(e)})
139
+
140
+
141
+ class FetchWhaleTransactionsInput(BaseModel):
142
+ """Input for the fetch_whale_transactions tool."""
143
+ addresses: List[str] = Field(..., description="List of wallet addresses to query")
144
+ token_address: Optional[str] = Field(None, description="Optional token contract address to filter by")
145
+ min_token_amount: Optional[float] = Field(None, description="Minimum token amount")
146
+ min_usd_value: Optional[float] = Field(None, description="Minimum USD value")
147
+
148
+
149
+ class ArbiscanFetchWhaleTransactionsTool(BaseTool):
150
+ """Tool for fetching whale transactions from Arbiscan."""
151
+ name = "arbiscan_fetch_whale_transactions"
152
+ description = "Fetch whale transactions for a list of addresses"
153
+ args_schema: Type[BaseModel] = FetchWhaleTransactionsInput
154
+
155
+ def __init__(self, arbiscan_client=None):
156
+ super().__init__()
157
+ # Store reference to client if provided, otherwise we'll use global instance
158
+ if arbiscan_client:
159
+ set_global_clients(arbiscan_client=arbiscan_client)
160
+
161
+ def _run(self, addresses: List[str], token_address: Optional[str] = None,
162
+ min_token_amount: Optional[float] = None, min_usd_value: Optional[float] = None) -> str:
163
+ global _GLOBAL_ARBISCAN_CLIENT
164
+
165
+ if not _GLOBAL_ARBISCAN_CLIENT:
166
+ return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})
167
+
168
+ try:
169
+ transactions_df = _GLOBAL_ARBISCAN_CLIENT.fetch_whale_transactions(
170
+ addresses=addresses,
171
+ token_address=token_address,
172
+ min_token_amount=min_token_amount,
173
+ min_usd_value=min_usd_value,
174
+ max_pages=5 # Limit to 5 pages to prevent excessive API calls
175
+ )
176
+ return transactions_df.to_json(orient="records")
177
+ except Exception as e:
178
+ logging.error(f"Error in ArbiscanFetchWhaleTransactionsTool: {str(e)}")
179
+ return json.dumps({"error": str(e)})
180
+
181
+
182
+ class GetCurrentPriceInput(BaseModel):
183
+ """Input for the get_current_price tool."""
184
+ symbol: str = Field(..., description="Token symbol (e.g., 'ETHUSD')")
185
+
186
+
187
+ class GeminiGetCurrentPriceTool(BaseTool):
188
+ """Tool for getting current token price from Gemini."""
189
+ name = "gemini_get_current_price"
190
+ description = "Get the current price of a token"
191
+ args_schema: Type[BaseModel] = GetCurrentPriceInput
192
+
193
+ def __init__(self, gemini_client=None):
194
+ super().__init__()
195
+ # Store reference to client if provided, otherwise we'll use global instance
196
+ if gemini_client:
197
+ set_global_clients(gemini_client=gemini_client)
198
+
199
+ def _run(self, symbol: str) -> str:
200
+ global _GLOBAL_GEMINI_CLIENT
201
+
202
+ if not _GLOBAL_GEMINI_CLIENT:
203
+ return json.dumps({"error": "Gemini client not initialized. Please set global client first."})
204
+
205
+ try:
206
+ price = _GLOBAL_GEMINI_CLIENT.get_current_price(symbol)
207
+ return json.dumps({"symbol": symbol, "price": price})
208
+ except Exception as e:
209
+ logging.error(f"Error in GeminiGetCurrentPriceTool: {str(e)}")
210
+ return json.dumps({"error": str(e)})
211
+
212
+
213
+ class GetHistoricalPricesInput(BaseModel):
214
+ """Input for the get_historical_prices tool."""
215
+ symbol: str = Field(..., description="Token symbol (e.g., 'ETHUSD')")
216
+ start_time: str = Field(..., description="Start datetime in ISO format")
217
+ end_time: str = Field(..., description="End datetime in ISO format")
218
+
219
+
220
+ class GeminiGetHistoricalPricesTool(BaseTool):
221
+ """Tool for getting historical token prices from Gemini."""
222
+ name = "gemini_get_historical_prices"
223
+ description = "Get historical prices for a token within a time range"
224
+ args_schema: Type[BaseModel] = GetHistoricalPricesInput
225
+
226
+ def __init__(self, gemini_client=None):
227
+ super().__init__()
228
+ # Store reference to client if provided, otherwise we'll use global instance
229
+ if gemini_client:
230
+ set_global_clients(gemini_client=gemini_client)
231
+
232
+ def _run(
233
+ self,
234
+ symbol: str,
235
+ start_time: Optional[str] = None,
236
+ end_time: Optional[str] = None,
237
+ interval: str = "15m"
238
+ ) -> str:
239
+ global _GLOBAL_GEMINI_CLIENT
240
+
241
+ if not _GLOBAL_GEMINI_CLIENT:
242
+ return json.dumps({"error": "Gemini client not initialized. Please set global client first."})
243
+
244
+ try:
245
+ # Convert string times to datetime if provided
246
+ start_dt = None
247
+ end_dt = None
248
+
249
+ if start_time:
250
+ start_dt = datetime.fromisoformat(start_time)
251
+ if end_time:
252
+ end_dt = datetime.fromisoformat(end_time)
253
+
254
+ prices = _GLOBAL_GEMINI_CLIENT.get_historical_prices(
255
+ symbol=symbol,
256
+ start_time=start_dt,
257
+ end_time=end_dt,
258
+ interval=interval
259
+ )
260
+
261
+ return json.dumps(prices)
262
+ except Exception as e:
263
+ logging.error(f"Error in GeminiGetHistoricalPricesTool: {str(e)}")
264
+ return json.dumps({"error": str(e)})
265
+
266
+
267
+ class IdentifyPatternsInput(BaseModel):
268
+ """Input for the identify_patterns tool."""
269
+ transactions_json: str = Field(..., description="JSON string of transactions")
270
+ n_clusters: int = Field(3, description="Number of clusters for K-Means")
271
+
272
+
273
+ class DataProcessorIdentifyPatternsTool(BaseTool):
274
+ """Tool for identifying trading patterns using the DataProcessor."""
275
+ name = "data_processor_identify_patterns"
276
+ description = "Identify trading patterns in a set of transactions"
277
+ args_schema: Type[BaseModel] = IdentifyPatternsInput
278
+
279
+ def __init__(self, data_processor=None):
280
+ super().__init__()
281
+ # Store reference to processor if provided, otherwise we'll use global instance
282
+ if data_processor:
283
+ set_global_clients(data_processor=data_processor)
284
+
285
+ def _run(self, transactions_json: str, n_clusters: int = 3) -> str:
286
+ global _GLOBAL_DATA_PROCESSOR
287
+
288
+ if not _GLOBAL_DATA_PROCESSOR:
289
+ return json.dumps({"error": "Data processor not initialized. Please set global processor first."})
290
+
291
+ try:
292
+ # Convert JSON to DataFrame (accept a JSON string, per the args schema, or a pre-parsed list of records)
293
+ records = json.loads(transactions_json) if isinstance(transactions_json, str) else transactions_json
+ transactions_df = pd.DataFrame(records)
294
+
295
+ # Ensure required columns exist
296
+ required_columns = ['timeStamp', 'hash', 'from', 'to', 'value', 'tokenSymbol']
297
+ for col in required_columns:
298
+ if col not in transactions_df.columns:
299
+ return json.dumps({
300
+ "error": f"Missing required column: {col}",
301
+ "available_columns": list(transactions_df.columns)
302
+ })
303
+
304
+ # Run pattern identification
305
+ patterns = _GLOBAL_DATA_PROCESSOR.identify_patterns(
306
+ transactions_df=transactions_df,
307
+ n_clusters=n_clusters
308
+ )
309
+
310
+ return json.dumps(patterns)
311
+ except Exception as e:
312
+ logging.error(f"Error in DataProcessorIdentifyPatternsTool: {str(e)}")
313
+ return json.dumps({"error": str(e)})
314
+
315
+
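+ # Usage sketch (assumes a DataProcessor was registered via set_global_clients). The tool takes a
+ # JSON string of transaction records carrying the required columns checked in _run below:
+ #
+ #     txs = json.dumps([
+ #         {"timeStamp": 1714000000, "hash": "0xabc", "from": "0x1", "to": "0x2",
+ #          "value": "1000000000000000000", "tokenSymbol": "ARB", "tokenDecimal": "18"},
+ #     ])
+ #     DataProcessorIdentifyPatternsTool()._run(txs, n_clusters=3)
+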
316
+ class DetectAnomalousTransactionsInput(BaseModel):
317
+ """Input for the detect_anomalous_transactions tool."""
318
+ transactions_json: str = Field(..., description="JSON string of transactions")
319
+ sensitivity: str = Field("Medium", description="Detection sensitivity ('Low', 'Medium', 'High')")
320
+
321
+
322
+ class DataProcessorDetectAnomalousTransactionsTool(BaseTool):
323
+ """Tool for detecting anomalous transactions using the DataProcessor."""
324
+ name = "data_processor_detect_anomalies"
325
+ description = "Detect anomalous transactions in a dataset"
326
+ args_schema: Type[BaseModel] = DetectAnomalousTransactionsInput
327
+
328
+ def __init__(self, data_processor=None):
329
+ super().__init__()
330
+ # Store reference to processor if provided, otherwise we'll use global instance
331
+ if data_processor:
332
+ set_global_clients(data_processor=data_processor)
333
+
334
+ def _run(self, transactions_json: str, sensitivity: str = "Medium") -> str:
335
+ global _GLOBAL_DATA_PROCESSOR
336
+
337
+ if not _GLOBAL_DATA_PROCESSOR:
338
+ return json.dumps({"error": "Data processor not initialized. Please set global processor first."})
339
+
340
+ try:
341
+ # Convert JSON to DataFrame (accept a JSON string, per the args schema, or a pre-parsed list of records)
342
+ records = json.loads(transactions_json) if isinstance(transactions_json, str) else transactions_json
+ transactions_df = pd.DataFrame(records)
343
+
344
+ # Ensure required columns exist
345
+ required_columns = ['timeStamp', 'hash', 'from', 'to', 'value', 'tokenSymbol']
346
+ for col in required_columns:
347
+ if col not in transactions_df.columns:
348
+ return json.dumps({
349
+ "error": f"Missing required column: {col}",
350
+ "available_columns": list(transactions_df.columns)
351
+ })
352
+
353
+ # Run anomaly detection
354
+ anomalies = _GLOBAL_DATA_PROCESSOR.detect_anomalous_transactions(
355
+ transactions_df=transactions_df,
356
+ sensitivity=sensitivity
357
+ )
358
+
359
+ return json.dumps(anomalies)
360
+ except Exception as e:
361
+ logging.error(f"Error in DataProcessorDetectAnomalousTransactionsTool: {str(e)}")
362
+ return json.dumps({"error": str(e)})
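+
+ # Usage sketch for the anomaly tool (same global-processor assumption as above); the sensitivity
+ # string maps to a z-score cutoff inside DataProcessor.detect_anomalous_transactions:
+ #
+ #     DataProcessorDetectAnomalousTransactionsTool()._run(txs, sensitivity="High")
+ #     # -> JSON of transactions whose size deviates strongly from the typical amounts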
modules/data_processor.py ADDED
@@ -0,0 +1,1425 @@
1
+ import pandas as pd
2
+ import numpy as np
3
+ from datetime import datetime, timedelta
4
+ from typing import Dict, List, Optional, Union, Any, Tuple
5
+ from sklearn.cluster import KMeans, DBSCAN
6
+ from sklearn.preprocessing import StandardScaler
7
+ import plotly.graph_objects as go
8
+ import plotly.express as px
9
+ import logging
10
+ import time
11
+
12
+ class DataProcessor:
13
+ """
14
+ Process and analyze transaction data from blockchain APIs
15
+ """
16
+
17
+ def __init__(self):
18
+ pass
19
+
20
+ def aggregate_transactions(self,
21
+ transactions_df: pd.DataFrame,
22
+ time_window: str = 'D') -> pd.DataFrame:
23
+ """
24
+ Aggregate transactions by time window
25
+
26
+ Args:
27
+ transactions_df: DataFrame of transactions
28
+ time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)
29
+
30
+ Returns:
31
+ Aggregated DataFrame with transaction counts and volumes
32
+ """
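+ # Minimal usage sketch (illustrative data; real frames come from the Arbiscan client):
+ #     df = pd.DataFrame({"timeStamp": [...], "value": [...], "tokenDecimal": [...],
+ #                        "from": [...], "to": [...]})
+ #     DataProcessor().aggregate_transactions(df, time_window="H")  # hourly buckets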
33
+ if transactions_df.empty:
34
+ return pd.DataFrame()
35
+
36
+ # Ensure timestamp column is datetime
37
+ if 'Timestamp' in transactions_df.columns:
38
+ timestamp_col = 'Timestamp'
39
+ elif 'timeStamp' in transactions_df.columns:
40
+ timestamp_col = 'timeStamp'
41
+ else:
42
+ raise ValueError("Timestamp column not found in transactions DataFrame")
43
+
44
+ # Ensure amount column exists
45
+ if 'Amount' in transactions_df.columns:
46
+ amount_col = 'Amount'
47
+ elif 'tokenAmount' in transactions_df.columns:
48
+ amount_col = 'tokenAmount'
49
+ elif 'value' in transactions_df.columns:
50
+ # Try to adjust for decimals if 'tokenDecimal' exists
51
+ if 'tokenDecimal' in transactions_df.columns:
52
+ transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
53
+ amount_col = 'adjustedValue'
54
+ else:
55
+ amount_col = 'value'
56
+ else:
57
+ raise ValueError("Amount column not found in transactions DataFrame")
58
+
59
+ # Resample by time window
60
+ transactions_df = transactions_df.copy()
61
+ try:
62
+ transactions_df.set_index(pd.DatetimeIndex(transactions_df[timestamp_col]), inplace=True)
63
+ except Exception as e:
64
+ print(f"Error setting DatetimeIndex: {str(e)}")
65
+ # Create a safe index as a fallback
66
+ transactions_df['safe_timestamp'] = pd.date_range(
67
+ start='2025-01-01',
68
+ periods=len(transactions_df),
69
+ freq='H'
70
+ )
71
+ transactions_df.set_index('safe_timestamp', inplace=True)
72
+
73
+ # Identify buy vs sell transactions based on 'from' and 'to' addresses
74
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
75
+ from_col, to_col = 'From', 'To'
76
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
77
+ from_col, to_col = 'from', 'to'
78
+ else:
79
+ # If we can't determine direction, just aggregate total volume
80
+ agg_df = transactions_df.resample(time_window).agg({
81
+ amount_col: 'sum',
82
+ timestamp_col: 'count'
83
+ })
84
+ agg_df.columns = ['Volume', 'Count']
85
+ return agg_df.reset_index()
86
+
87
+ # Calculate net flow for each wallet address (positive = inflow, negative = outflow)
88
+ wallet_addresses = set(transactions_df[from_col].unique()) | set(transactions_df[to_col].unique())
89
+
90
+ results = []
91
+ for wallet in wallet_addresses:
92
+ wallet_df = transactions_df.copy()
93
+
94
+ # Mark transactions as inflow or outflow
95
+ wallet_df['Direction'] = 'Unknown'
96
+ wallet_df.loc[wallet_df[to_col] == wallet, 'Direction'] = 'In'
97
+ wallet_df.loc[wallet_df[from_col] == wallet, 'Direction'] = 'Out'
98
+
99
+ # Calculate net flow
100
+ wallet_df['NetFlow'] = wallet_df[amount_col]
101
+ wallet_df.loc[wallet_df['Direction'] == 'Out', 'NetFlow'] = -wallet_df.loc[wallet_df['Direction'] == 'Out', amount_col]
102
+
103
+ # Aggregate by time window
104
+ wallet_agg = wallet_df.resample(time_window).agg({
105
+ 'NetFlow': 'sum',
106
+ timestamp_col: 'count'
107
+ })
108
+ wallet_agg.columns = ['NetFlow', 'Count']
109
+ wallet_agg['Wallet'] = wallet
110
+
111
+ results.append(wallet_agg.reset_index())
112
+
113
+ if not results:
114
+ return pd.DataFrame()
115
+
116
+ combined_df = pd.concat(results, ignore_index=True)
117
+ return combined_df
118
+
119
+ # Cache for pattern identification to avoid repeating expensive calculations
120
+ _pattern_cache = {}
121
+
122
+ def identify_patterns(self,
123
+ transactions_df: pd.DataFrame,
124
+ n_clusters: int = 3) -> List[Dict[str, Any]]:
125
+ """
126
+ Identify trading patterns using clustering algorithms
127
+
128
+ Args:
129
+ transactions_df: DataFrame of transactions
130
+ n_clusters: Number of clusters to identify
131
+
132
+ Returns:
133
+ List of pattern dictionaries containing name, description, and confidence
134
+ """
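+ # High-level flow implemented below: bucket transactions by hour, build NetFlow/Count features,
+ # scale them, run K-Means, then describe each cluster (behavior label, confidence derived from
+ # within-cluster vs. total variance) with Plotly charts attached. Rough shape of one result:
+ #     {"name": "Accumulation", "confidence": 0.7, "metrics": {...}, "charts": {...}, "examples": <DataFrame>}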
135
+ # Check for empty data early to avoid processing
136
+ if transactions_df.empty:
137
+ return []
138
+
139
+ # Create a cache key from the column set, row count, and cluster count (a cheap proxy for the DataFrame's content)
140
+ try:
141
+ cache_key = f"{hash(tuple(transactions_df.columns))}_{len(transactions_df)}_{n_clusters}"
142
+
143
+ # Check cache first
144
+ if cache_key in self._pattern_cache:
145
+ return self._pattern_cache[cache_key]
146
+ except Exception:
147
+ # If hashing fails, proceed without caching
148
+ cache_key = None
149
+
150
+ try:
151
+ # Work on the input frame directly (no copy) to save memory; note that helper columns added below also appear on the caller's DataFrame
152
+ df = transactions_df
153
+
154
+ # Ensure timestamp column exists - optimize column presence checks
155
+ timestamp_cols = ['Timestamp', 'timeStamp']
156
+ timestamp_col = next((col for col in timestamp_cols if col in df.columns), None)
157
+
158
+ if timestamp_col:
159
+ # Convert timestamp only if needed
160
+ if not pd.api.types.is_datetime64_any_dtype(df[timestamp_col]):
161
+ try:
162
+ # Use vectorized operations instead of astype where possible
163
+ if df[timestamp_col].dtype == 'object':
164
+ df[timestamp_col] = pd.to_datetime(df[timestamp_col], errors='coerce')
165
+ else:
166
+ df[timestamp_col] = pd.to_datetime(df[timestamp_col], unit='s', errors='coerce')
167
+ except Exception as e:
168
+ # Create a date range index as fallback
169
+ df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
170
+ timestamp_col = 'dummy_timestamp'
171
+ else:
172
+ # If no timestamp column, create a dummy index
173
+ df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
174
+ timestamp_col = 'dummy_timestamp'
175
+
176
+ # Efficiently calculate floor hour using vectorized operations
177
+ df['hour'] = df[timestamp_col].dt.floor('H')
178
+
179
+ # Check for address columns efficiently
180
+ if 'From' in df.columns and 'To' in df.columns:
181
+ from_col, to_col = 'From', 'To'
182
+ elif 'from' in df.columns and 'to' in df.columns:
183
+ from_col, to_col = 'from', 'to'
184
+ else:
185
+ # Create dummy addresses only if necessary
186
+ df['from'] = [f'0x{i:040x}' for i in range(len(df))]
187
+ df['to'] = [f'0x{(i+1):040x}' for i in range(len(df))]
188
+ from_col, to_col = 'from', 'to'
189
+
190
+ # Efficiently determine amount column
191
+ amount_cols = ['Amount', 'tokenAmount', 'value', 'adjustedValue']
192
+ amount_col = next((col for col in amount_cols if col in df.columns), None)
193
+
194
+ if not amount_col:
195
+ # Handle special case for token values with decimals
196
+ if 'value' in df.columns and 'tokenDecimal' in df.columns:
197
+ # Vectorized calculation for improved performance
198
+ try:
199
+ # Ensure values are numeric
200
+ df['value_numeric'] = pd.to_numeric(df['value'], errors='coerce')
201
+ df['tokenDecimal_numeric'] = pd.to_numeric(df['tokenDecimal'], errors='coerce').fillna(18)
202
+ df['adjustedValue'] = df['value_numeric'] / (10 ** df['tokenDecimal_numeric'])
203
+ amount_col = 'adjustedValue'
204
+ except Exception as e:
205
+ logging.warning(f"Error converting values: {e}")
206
+ df['dummy_amount'] = 1.0
207
+ amount_col = 'dummy_amount'
208
+ else:
209
+ # Fallback to dummy values
210
+ df['dummy_amount'] = 1.0
211
+ amount_col = 'dummy_amount'
212
+
213
+ # Ensure the amount column is numeric
214
+ try:
215
+ if amount_col in df.columns:
216
+ df[f"{amount_col}_numeric"] = pd.to_numeric(df[amount_col], errors='coerce').fillna(0)
217
+ amount_col = f"{amount_col}_numeric"
218
+ except Exception:
219
+ # If conversion fails, create a dummy numeric column
220
+ df['safe_amount'] = 1.0
221
+ amount_col = 'safe_amount'
222
+
223
+ # Calculate metrics using optimized groupby operations
224
+ # Use a more efficient approach with built-in pandas aggregation
225
+ agg_df = df.groupby('hour').agg(
226
+ Count=pd.NamedAgg(column=from_col, aggfunc='count'),
227
+ ).reset_index()
228
+
229
+ # For NetFlow calculation, we need an additional pass
230
+ # This uses a more efficient calculation method
231
+ def calc_netflow(group):
232
+ # Use optimized filtering and calculations for better performance
233
+ first_to = group[to_col].iloc[0] if len(group) > 0 else None
234
+ first_from = group[from_col].iloc[0] if len(group) > 0 else None
235
+
236
+ if first_to is not None and first_from is not None:
237
+ # Ensure values are converted to numeric before summing
238
+ try:
239
+ # Convert to numeric with pd.to_numeric, coerce errors to NaN
240
+ total_in = pd.to_numeric(group.loc[group[to_col] == first_to, amount_col], errors='coerce').sum()
241
+ total_out = pd.to_numeric(group.loc[group[from_col] == first_from, amount_col], errors='coerce').sum()
242
+ # Replace NaN with 0 to avoid propagation
243
+ if pd.isna(total_in): total_in = 0.0
244
+ if pd.isna(total_out): total_out = 0.0
245
+ return float(total_in) - float(total_out)
246
+ except Exception as e:
247
+ import logging
248
+ logging.debug(f"Error converting values to numeric: {e}")
249
+ return 0.0
250
+ return 0.0
251
+
252
+ # Calculate NetFlow using apply instead of loop
253
+ netflows = df.groupby('hour').apply(calc_netflow)
254
+ agg_df['NetFlow'] = netflows.values
255
+
256
+ # Early return if not enough data for clustering
257
+ if agg_df.empty or len(agg_df) < n_clusters:
258
+ return []
259
+
260
+ # Ensure we don't have too many clusters for the dataset
261
+ actual_n_clusters = min(n_clusters, max(2, len(agg_df) // 2))
262
+
263
+ # Prepare features for clustering - with careful type handling
264
+ try:
265
+ if 'NetFlow' in agg_df.columns:
266
+ # Ensure NetFlow is numeric
267
+ agg_df['NetFlow'] = pd.to_numeric(agg_df['NetFlow'], errors='coerce').fillna(0)
268
+ features = agg_df[['NetFlow', 'Count']].copy()
269
+ primary_metric = 'NetFlow'
270
+ else:
271
+ # Calculate Volume if needed
272
+ if 'Volume' not in agg_df.columns and amount_col in df.columns:
273
+ # Calculate volume with numeric conversion
274
+ volume_by_hour = pd.to_numeric(df[amount_col], errors='coerce').fillna(0).groupby(df['hour']).sum()
275
+ agg_df['Volume'] = agg_df['hour'].map(volume_by_hour)
276
+
277
+ # Ensure Volume exists and is numeric
278
+ if 'Volume' not in agg_df.columns:
279
+ agg_df['Volume'] = 1.0 # Default value if calculation failed
280
+ else:
281
+ agg_df['Volume'] = pd.to_numeric(agg_df['Volume'], errors='coerce').fillna(1.0)
282
+
283
+ # Ensure Count is numeric
284
+ agg_df['Count'] = pd.to_numeric(agg_df['Count'], errors='coerce').fillna(1.0)
285
+
286
+ features = agg_df[['Volume', 'Count']].copy()
287
+ primary_metric = 'Volume'
288
+
289
+ # Final check to ensure features are numeric
290
+ for col in features.columns:
291
+ features[col] = pd.to_numeric(features[col], errors='coerce').fillna(0)
292
+ except Exception as e:
293
+ logging.warning(f"Error preparing clustering features: {e}")
294
+ # Create safe dummy features if everything else fails
295
+ agg_df['SafeFeature'] = 1.0
296
+ agg_df['Count'] = 1.0
297
+ features = agg_df[['SafeFeature', 'Count']].copy()
298
+ primary_metric = 'SafeFeature'
299
+
300
+ # Scale features - import only when needed for efficiency
301
+ from sklearn.preprocessing import StandardScaler
302
+ scaler = StandardScaler()
303
+ scaled_features = scaler.fit_transform(features)
304
+
305
+ # Use K-Means with reduced complexity
306
+ from sklearn.cluster import KMeans
307
+ kmeans = KMeans(n_clusters=actual_n_clusters, random_state=42, n_init=10, max_iter=100)
308
+ agg_df['Cluster'] = kmeans.fit_predict(scaled_features)
309
+
310
+ # Calculate time-based metrics from the hour column directly
311
+ if 'hour' in agg_df.columns:
312
+ try:
313
+ # Convert to datetime for hour and day extraction if needed
314
+ hour_series = pd.to_datetime(agg_df['hour'])
315
+ agg_df['Hour'] = hour_series.dt.hour
316
+ agg_df['Day'] = hour_series.dt.dayofweek
317
+ except Exception:
318
+ # Fallback for non-convertible data
319
+ agg_df['Hour'] = 0
320
+ agg_df['Day'] = 0
321
+ else:
322
+ # Default values if no hour column
323
+ agg_df['Hour'] = 0
324
+ agg_df['Day'] = 0
325
+
326
+ # Identify patterns efficiently
327
+ patterns = []
328
+ for i in range(actual_n_clusters):
329
+ # Use boolean indexing for better performance
330
+ cluster_mask = agg_df['Cluster'] == i
331
+ cluster_df = agg_df[cluster_mask]
332
+
333
+ if len(cluster_df) == 0:
334
+ continue
335
+
336
+ if primary_metric == 'NetFlow':
337
+ # Use numpy methods for faster calculation
338
+ avg_flow = cluster_df['NetFlow'].mean()
339
+ flow_std = cluster_df['NetFlow'].std()
340
+ behavior = "Accumulation" if avg_flow > 0 else "Distribution"
341
+ volume_metric = f"Net Flow: {avg_flow:.2f} ± {flow_std:.2f}"
342
+ else:
343
+ # Use Volume metrics - optimize to avoid redundant calculations
344
+ avg_volume = cluster_df['Volume'].mean() if 'Volume' in cluster_df else 0
345
+ volume_std = cluster_df['Volume'].std() if 'Volume' in cluster_df else 0
346
+ behavior = "High Volume" if 'Volume' in agg_df and avg_volume > agg_df['Volume'].mean() else "Low Volume"
347
+ volume_metric = f"Volume: {avg_volume:.2f} ± {volume_std:.2f}"
348
+
349
+ # Pattern characteristics (avg_flow/flow_std fall back to the volume metrics when NetFlow is not the primary metric)
350
+ pattern_metrics = {
351
+     "avg_flow": avg_flow if primary_metric == 'NetFlow' else avg_volume,
352
+     "flow_std": flow_std if primary_metric == 'NetFlow' else volume_std,
353
+     "avg_count": cluster_df['Count'].mean(),
354
+     "max_flow": cluster_df['NetFlow'].max() if 'NetFlow' in cluster_df.columns else None,
355
+     "min_flow": cluster_df['NetFlow'].min() if 'NetFlow' in cluster_df.columns else None,
356
+     "common_hour": cluster_df['Hour'].mode()[0] if not cluster_df['Hour'].empty else None,
357
+     "common_day": cluster_df['Day'].mode()[0] if not cluster_df['Day'].empty else None
358
+ }
359
+
360
+ # Enhanced confidence calculation
361
+ if primary_metric == 'NetFlow':
362
+ # Calculate within-cluster variance as a percentage of total variance
363
+ cluster_variance = cluster_df['NetFlow'].var()
364
+ total_variance = agg_df['NetFlow'].var() or 1 # Avoid division by zero
365
+ confidence = max(0.4, min(0.95, 1 - (cluster_variance / total_variance)))
366
+ else:
367
+ # Calculate within-cluster variance as a percentage of total variance
368
+ cluster_variance = cluster_df['Volume'].var()
369
+ total_variance = agg_df['Volume'].var() or 1 # Avoid division by zero
370
+ confidence = max(0.4, min(0.95, 1 - (cluster_variance / total_variance)))
371
+
372
+ # Create enhanced pattern charts - Main Chart
373
+ if primary_metric == 'NetFlow':
374
+ main_fig = px.scatter(cluster_df, x=cluster_df.index, y='NetFlow',
375
+ size='Count', color='Cluster',
376
+ title=f"Pattern {i+1}: {behavior}",
377
+ labels={'NetFlow': 'Net Token Flow', 'index': 'Time'},
378
+ color_discrete_sequence=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'])
379
+
380
+ # Add a trend line
381
+ main_fig.add_trace(go.Scatter(
382
+ x=cluster_df.index,
383
+ y=cluster_df['NetFlow'].rolling(window=3, min_periods=1).mean(),
384
+ mode='lines',
385
+ name='Trend',
386
+ line=dict(width=2, dash='dash', color='rgba(0,0,0,0.5)')
387
+ ))
388
+
389
+ # Add a zero reference line
390
+ main_fig.add_shape(
391
+ type="line",
392
+ x0=cluster_df.index.min(),
393
+ y0=0,
394
+ x1=cluster_df.index.max(),
395
+ y1=0,
396
+ line=dict(color="red", width=1, dash="dot"),
397
+ )
398
+ else:
399
+ main_fig = px.scatter(cluster_df, x=cluster_df.index, y='Volume',
400
+ size='Count', color='Cluster',
401
+ title=f"Pattern {i+1}: {behavior}",
402
+ labels={'Volume': 'Transaction Volume', 'index': 'Time'},
403
+ color_discrete_sequence=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'])
404
+
405
+ # Add a trend line
406
+ main_fig.add_trace(go.Scatter(
407
+ x=cluster_df.index,
408
+ y=cluster_df['Volume'].rolling(window=3, min_periods=1).mean(),
409
+ mode='lines',
410
+ name='Trend',
411
+ line=dict(width=2, dash='dash', color='rgba(0,0,0,0.5)')
412
+ ))
413
+
414
+ main_fig.update_layout(
415
+ template="plotly_white",
416
+ legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
417
+ margin=dict(l=20, r=20, t=50, b=20),
418
+ height=400
419
+ )
420
+
421
+ # Create hourly distribution chart
422
+ hour_counts = cluster_df.groupby('Hour')['Count'].sum().reindex(range(24), fill_value=0)
423
+ hour_fig = px.bar(x=hour_counts.index, y=hour_counts.values,
424
+ title="Hourly Distribution",
425
+ labels={'x': 'Hour of Day', 'y': 'Transaction Count'},
426
+ color_discrete_sequence=['#1f77b4'])
427
+ hour_fig.update_layout(template="plotly_white", height=300)
428
+
429
+ # Create volume/flow distribution chart
430
+ if primary_metric == 'NetFlow':
431
+ hist_data = cluster_df['NetFlow']
432
+ hist_title = "Net Flow Distribution"
433
+ hist_label = "Net Flow"
434
+ else:
435
+ hist_data = cluster_df['Volume']
436
+ hist_title = "Volume Distribution"
437
+ hist_label = "Volume"
438
+
439
+ dist_fig = px.histogram(hist_data,
440
+ title=hist_title,
441
+ labels={'value': hist_label, 'count': 'Frequency'},
442
+ color_discrete_sequence=['#2ca02c'])
443
+ dist_fig.update_layout(template="plotly_white", height=300)
444
+
445
+ # Find related transactions
446
+ if not transactions_df.empty:
447
+ # Get timestamps from this cluster
448
+ cluster_times = pd.to_datetime(cluster_df.index)
449
+ # Create time windows for matching
450
+ time_windows = [(t - pd.Timedelta(hours=1), t + pd.Timedelta(hours=1)) for t in cluster_times]
451
+
452
+ # Find transactions within these time windows
453
+ pattern_txs = transactions_df[transactions_df[timestamp_col].apply(
454
+ lambda x: any((start <= x <= end) for start, end in time_windows)
455
+ )].copy()
456
+
457
+ # If we have too many, sample them
458
+ if len(pattern_txs) > 10:
459
+ pattern_txs = pattern_txs.sample(10)
460
+
461
+ # If we have too few, just sample from all transactions
462
+ if len(pattern_txs) < 5 and len(transactions_df) >= 5:
463
+ pattern_txs = transactions_df.sample(min(5, len(transactions_df)))
464
+ else:
465
+ pattern_txs = pd.DataFrame()
466
+
467
+ # Comprehensive pattern dictionary
468
+ pattern = {
469
+ "name": behavior,
470
+ "description": f"This pattern shows {behavior.lower()} activity.",
471
+ "strategy": "Unknown",
472
+ "risk_profile": "Unknown",
473
+ "time_insight": "Unknown",
474
+ "cluster_id": i,
475
+ "metrics": pattern_metrics,
476
+ "occurrence_count": len(cluster_df),
477
+ "volume_metric": volume_metric,
478
+ "confidence": confidence,
479
+ "impact": 0.0,
480
+ "charts": {
481
+ "main": main_fig,
482
+ "hourly_distribution": hour_fig,
483
+ "value_distribution": dist_fig
484
+ },
485
+ "examples": pattern_txs
486
+ }
487
+
488
+ patterns.append(pattern)
489
+
490
+ # Cache results for future reuse
491
+ if cache_key:
492
+ self._pattern_cache[cache_key] = patterns
493
+
494
+ return patterns
495
+
496
+ except Exception as e:
497
+ import logging
498
+ logging.warning(f"Error during pattern identification: {str(e)}")
499
+ return []
500
+
620
+ def detect_anomalous_transactions(self,
621
+ transactions_df: pd.DataFrame,
622
+ sensitivity: str = "Medium") -> pd.DataFrame:
623
+ """
624
+ Detect anomalous transactions using statistical methods
625
+
626
+ Args:
627
+ transactions_df: DataFrame of transactions
628
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
629
+
630
+ Returns:
631
+ DataFrame of anomalous transactions
632
+ """
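+ # The detector below is a plain z-score filter: z = |amount - mean| / std, flagged when z exceeds
+ # 3.0 / 2.5 / 2.0 for Low / Medium / High sensitivity. Worked example with illustrative numbers:
+ #     z = abs(1_000_000 - 50_000) / 120_000   # ~ 7.9, flagged at any sensitivity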
633
+ if transactions_df.empty:
634
+ return pd.DataFrame()
635
+
636
+ # Ensure amount column exists
637
+ if 'Amount' in transactions_df.columns:
638
+ amount_col = 'Amount'
639
+ elif 'tokenAmount' in transactions_df.columns:
640
+ amount_col = 'tokenAmount'
641
+ elif 'value' in transactions_df.columns:
642
+ # Try to adjust for decimals if 'tokenDecimal' exists
643
+ if 'tokenDecimal' in transactions_df.columns:
644
+ transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
645
+ amount_col = 'adjustedValue'
646
+ else:
647
+ amount_col = 'value'
648
+ else:
649
+ raise ValueError("Amount column not found in transactions DataFrame")
650
+
651
+ # Define sensitivity thresholds
652
+ if sensitivity == "Low":
653
+ z_threshold = 3.0 # Outliers beyond 3 standard deviations
654
+ elif sensitivity == "Medium":
655
+ z_threshold = 2.5 # Outliers beyond 2.5 standard deviations
656
+ else: # High
657
+ z_threshold = 2.0 # Outliers beyond 2 standard deviations
658
+
659
+ # Calculate z-score for amount
660
+ mean_amount = transactions_df[amount_col].mean()
661
+ std_amount = transactions_df[amount_col].std()
662
+
663
+ if std_amount == 0:
664
+ return pd.DataFrame()
665
+
666
+ transactions_df['z_score'] = abs((transactions_df[amount_col] - mean_amount) / std_amount)
667
+
668
+ # Flag anomalous transactions
669
+ anomalies = transactions_df[transactions_df['z_score'] > z_threshold].copy()
670
+
671
+ # Add risk level based on z-score
672
+ anomalies['risk_level'] = 'Medium'
673
+ anomalies.loc[anomalies['z_score'] > z_threshold * 1.5, 'risk_level'] = 'High'
674
+ anomalies.loc[anomalies['z_score'] <= z_threshold * 1.2, 'risk_level'] = 'Low'
675
+
676
+ return anomalies
677
+
678
+ def analyze_price_impact(self,
679
+ transactions_df: pd.DataFrame,
680
+ price_data: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
681
+ """
682
+ Analyze the price impact of transactions with enhanced visualizations
683
+
684
+ Args:
685
+ transactions_df: DataFrame of transactions
686
+ price_data: Dictionary of price impact data for each transaction
687
+
688
+ Returns:
689
+ Dictionary with comprehensive price impact analysis and visualizations
690
+ """
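+ # Expected shape of price_data (keyed by transaction hash, as consumed below; values illustrative):
+ #     price_data = {"0xabc...": {"pre_price": 3500.0, "post_price": 3512.3, "impact_pct": 0.35}}
+ # Entries whose impact_pct is None are skipped when building the impact DataFrame.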
691
+ if transactions_df.empty or not price_data:
692
+ # Create an empty chart for the default case
693
+ empty_fig = go.Figure()
694
+ empty_fig.update_layout(
695
+ title="No Price Impact Data Available",
696
+ xaxis_title="Time",
697
+ yaxis_title="Price Impact (%)",
698
+ height=400,
699
+ template="plotly_white"
700
+ )
701
+ empty_fig.add_annotation(
702
+ text="No transactions found with price impact data",
703
+ showarrow=False,
704
+ font=dict(size=14)
705
+ )
706
+
707
+ return {
708
+ 'avg_impact_pct': 0,
709
+ 'max_impact_pct': 0,
710
+ 'min_impact_pct': 0,
711
+ 'significant_moves_count': 0,
712
+ 'total_transactions': 0,
713
+ 'charts': {
714
+ 'main_chart': empty_fig,
715
+ 'impact_distribution': empty_fig,
716
+ 'cumulative_impact': empty_fig,
717
+ 'hourly_impact': empty_fig
718
+ },
719
+ 'transactions_with_impact': pd.DataFrame(),
720
+ 'insights': [],
721
+ 'impact_summary': "No price impact data available"
722
+ }
723
+
724
+ # Ensure timestamp column is datetime
725
+ if 'Timestamp' in transactions_df.columns:
726
+ timestamp_col = 'Timestamp'
727
+ elif 'timeStamp' in transactions_df.columns:
728
+ timestamp_col = 'timeStamp'
729
+ # Convert timestamp to datetime if it's not already
730
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
731
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
732
+ else:
733
+ raise ValueError("Timestamp column not found in transactions DataFrame")
734
+
735
+ # Combine price impact data with transactions
736
+ impact_data = []
737
+
738
+ for idx, row in transactions_df.iterrows():
739
+ tx_hash = row.get('Transaction Hash', row.get('hash', None))
740
+ if not tx_hash or tx_hash not in price_data:
741
+ continue
742
+
743
+ tx_impact = price_data[tx_hash]
744
+
745
+ if tx_impact['impact_pct'] is None:
746
+ continue
747
+
748
+ # Get token symbol if available
749
+ token_symbol = row.get('tokenSymbol', 'Unknown')
750
+ token_amount = row.get('value', 0)
751
+ if 'tokenDecimal' in row:
752
+ try:
753
+ token_amount = float(token_amount) / (10 ** int(row.get('tokenDecimal', 0)))
754
+ except (ValueError, TypeError):
755
+ token_amount = 0
756
+
757
+ impact_data.append({
758
+ 'transaction_hash': tx_hash,
759
+ 'timestamp': row[timestamp_col],
760
+ 'pre_price': tx_impact['pre_price'],
761
+ 'post_price': tx_impact['post_price'],
762
+ 'impact_pct': tx_impact['impact_pct'],
763
+ 'token_symbol': token_symbol,
764
+ 'token_amount': token_amount,
765
+ 'from': row.get('from', ''),
766
+ 'to': row.get('to', ''),
767
+ 'hour': row[timestamp_col].hour if isinstance(row[timestamp_col], pd.Timestamp) else 0
768
+ })
769
+
770
+ if not impact_data:
771
+ # Create an empty chart for the default case
772
+ empty_fig = go.Figure()
773
+ empty_fig.update_layout(
774
+ title="No Price Impact Data Available",
775
+ xaxis_title="Time",
776
+ yaxis_title="Price Impact (%)",
777
+ height=400,
778
+ template="plotly_white"
779
+ )
780
+ empty_fig.add_annotation(
781
+ text="No transactions found with price impact data",
782
+ showarrow=False,
783
+ font=dict(size=14)
784
+ )
785
+
786
+ return {
787
+ 'avg_impact_pct': 0,
788
+ 'max_impact_pct': 0,
789
+ 'min_impact_pct': 0,
790
+ 'significant_moves_count': 0,
791
+ 'total_transactions': len(transactions_df) if not transactions_df.empty else 0,
792
+ 'charts': {
793
+ 'main_chart': empty_fig,
794
+ 'impact_distribution': empty_fig,
795
+ 'cumulative_impact': empty_fig,
796
+ 'hourly_impact': empty_fig
797
+ },
798
+ 'transactions_with_impact': pd.DataFrame(),
799
+ 'insights': [],
800
+ 'impact_summary': "No price impact data available"
801
+ }
802
+
803
+ impact_df = pd.DataFrame(impact_data)
804
+
805
+ # Calculate aggregate metrics
806
+ avg_impact = impact_df['impact_pct'].mean()
807
+ max_impact = impact_df['impact_pct'].max()
808
+ min_impact = impact_df['impact_pct'].min()
809
+ median_impact = impact_df['impact_pct'].median()
810
+ std_impact = impact_df['impact_pct'].std()
811
+
812
+ # Count significant moves (>1% impact)
813
+ significant_threshold = 1.0
814
+ high_impact_threshold = 3.0
815
+ significant_moves = len(impact_df[abs(impact_df['impact_pct']) > significant_threshold])
816
+ high_impact_moves = len(impact_df[abs(impact_df['impact_pct']) > high_impact_threshold])
817
+ positive_impacts = len(impact_df[impact_df['impact_pct'] > 0])
818
+ negative_impacts = len(impact_df[impact_df['impact_pct'] < 0])
819
+
820
+ # Calculate cumulative impact
821
+ impact_df = impact_df.sort_values('timestamp')
822
+ impact_df['cumulative_impact'] = impact_df['impact_pct'].cumsum()
823
+
824
+ # Generate insights
825
+ insights = []
826
+
827
+ # Market direction bias
828
+ if avg_impact > 0.5:
829
+ insights.append({
830
+ "title": "Positive Price Pressure",
831
+ "description": f"Transactions show an overall positive price impact of {avg_impact:.2f}%, suggesting accumulation or market strength."
832
+ })
833
+ elif avg_impact < -0.5:
834
+ insights.append({
835
+ "title": "Negative Price Pressure",
836
+ "description": f"Transactions show an overall negative price impact of {avg_impact:.2f}%, suggesting distribution or market weakness."
837
+ })
838
+
839
+ # Volatility analysis
840
+ if std_impact > 2.0:
841
+ insights.append({
842
+ "title": "High Market Volatility",
843
+ "description": f"Price impact shows high volatility (std: {std_impact:.2f}%), indicating potential market manipulation or whipsaw conditions."
844
+ })
845
+
846
+ # Significant impacts
847
+ if high_impact_moves > 0:
848
+ insights.append({
849
+ "title": "High Impact Transactions",
850
+ "description": f"Detected {high_impact_moves} high-impact transactions (>{high_impact_threshold}% price change), indicating potential market-moving activity."
851
+ })
852
+
853
+ # Temporal patterns
854
+ hourly_impact = impact_df.groupby('hour')['impact_pct'].mean()
855
+ if len(hourly_impact) > 0:
856
+ max_hour = hourly_impact.abs().idxmax()
857
+ max_hour_impact = hourly_impact[max_hour]
858
+ insights.append({
859
+ "title": "Time-Based Pattern",
860
+ "description": f"Highest price impact occurs around {max_hour}:00 with an average of {max_hour_impact:.2f}%."
861
+ })
862
+
863
+ # Create impact summary text
864
+ impact_summary = f"Analysis of {len(impact_df)} price-impacting transactions shows an average impact of {avg_impact:.2f}% "
865
+ impact_summary += f"(range: {min_impact:.2f}% to {max_impact:.2f}%). "
866
+ impact_summary += f"Found {significant_moves} significant price moves and {high_impact_moves} high-impact transactions. "
867
+ if positive_impacts > negative_impacts:
868
+ impact_summary += f"There is a bias towards positive price impact ({positive_impacts} positive vs {negative_impacts} negative)."
869
+ elif negative_impacts > positive_impacts:
870
+ impact_summary += f"There is a bias towards negative price impact ({negative_impacts} negative vs {positive_impacts} positive)."
871
+ else:
872
+ impact_summary += "The price impact is balanced between positive and negative moves."
873
+
874
+ # Create enhanced main visualization
875
+ main_fig = go.Figure()
876
+
877
+ # Add scatter plot for impact
878
+ main_fig.add_trace(go.Scatter(
879
+ x=impact_df['timestamp'],
880
+ y=impact_df['impact_pct'],
881
+ mode='markers+lines',
882
+ marker=dict(
883
+ size=impact_df['impact_pct'].abs() * 1.5 + 5,
884
+ color=impact_df['impact_pct'],
885
+ colorscale='RdBu_r',
886
+ line=dict(width=1),
887
+ symbol=['circle' if val >= 0 else 'diamond' for val in impact_df['impact_pct']]
888
+ ),
889
+ text=[
890
+ f"TX: {tx[:8]}...{tx[-6:]}<br>" +
891
+ f"Impact: {impact:.2f}%<br>" +
892
+ f"Token: {token} ({amount:.4f})<br>" +
893
+ f"From: {src[:6]}...{src[-4:]}<br>" +
894
+ f"To: {dst[:6]}...{dst[-4:]}"
895
+ for tx, impact, token, amount, src, dst in zip(
896
+ impact_df['transaction_hash'],
897
+ impact_df['impact_pct'],
898
+ impact_df['token_symbol'],
899
+ impact_df['token_amount'],
900
+ impact_df['from'],
901
+ impact_df['to']
902
+ )
903
+ ],
904
+ hovertemplate='%{text}<br>Time: %{x}<extra></extra>',
905
+ name='Price Impact'
906
+ ))
907
+
908
+ # Add a moving average trendline
909
+ window_size = max(3, len(impact_df) // 10) # Dynamic window size
910
+ if len(impact_df) >= window_size:
911
+ impact_df['ma'] = impact_df['impact_pct'].rolling(window=window_size, min_periods=1).mean()
912
+ main_fig.add_trace(go.Scatter(
913
+ x=impact_df['timestamp'],
914
+ y=impact_df['ma'],
915
+ mode='lines',
916
+ line=dict(width=2, color='rgba(255,165,0,0.7)'),
917
+ name=f'Moving Avg ({window_size} period)'
918
+ ))
919
+
920
+ # Add a zero line for reference
921
+ main_fig.add_shape(
922
+ type='line',
923
+ x0=impact_df['timestamp'].min(),
924
+ y0=0,
925
+ x1=impact_df['timestamp'].max(),
926
+ y1=0,
927
+ line=dict(color='gray', width=1, dash='dash')
928
+ )
929
+
930
+ # Add colored regions for significant impact
931
+
932
+ # Add green band for normal price movement
933
+ main_fig.add_shape(
934
+ type='rect',
935
+ x0=impact_df['timestamp'].min(),
936
+ y0=-significant_threshold,
937
+ x1=impact_df['timestamp'].max(),
938
+ y1=significant_threshold,
939
+ fillcolor='rgba(0,255,0,0.1)',
940
+ line=dict(width=0),
941
+ layer='below'
942
+ )
943
+
944
+ # Add warning bands for higher impact movements
945
+ main_fig.add_shape(
946
+ type='rect',
947
+ x0=impact_df['timestamp'].min(),
948
+ y0=significant_threshold,
949
+ x1=impact_df['timestamp'].max(),
950
+ y1=high_impact_threshold,
951
+ fillcolor='rgba(255,255,0,0.1)',
952
+ line=dict(width=0),
953
+ layer='below'
954
+ )
955
+
956
+ main_fig.add_shape(
957
+ type='rect',
958
+ x0=impact_df['timestamp'].min(),
959
+ y0=-high_impact_threshold,
960
+ x1=impact_df['timestamp'].max(),
961
+ y1=-significant_threshold,
962
+ fillcolor='rgba(255,255,0,0.1)',
963
+ line=dict(width=0),
964
+ layer='below'
965
+ )
966
+
967
+ # Add high impact regions
968
+ main_fig.add_shape(
969
+ type='rect',
970
+ x0=impact_df['timestamp'].min(),
971
+ y0=high_impact_threshold,
972
+ x1=impact_df['timestamp'].max(),
973
+ y1=max(high_impact_threshold * 2, max_impact * 1.1),
974
+ fillcolor='rgba(255,0,0,0.1)',
975
+ line=dict(width=0),
976
+ layer='below'
977
+ )
978
+
979
+ main_fig.add_shape(
980
+ type='rect',
981
+ x0=impact_df['timestamp'].min(),
982
+ y0=min(high_impact_threshold * -2, min_impact * 1.1),
983
+ x1=impact_df['timestamp'].max(),
984
+ y1=-high_impact_threshold,
985
+ fillcolor='rgba(255,0,0,0.1)',
986
+ line=dict(width=0),
987
+ layer='below'
988
+ )
989
+
990
+ main_fig.update_layout(
991
+ title='Price Impact of Whale Transactions',
992
+ xaxis_title='Timestamp',
993
+ yaxis_title='Price Impact (%)',
994
+ hovermode='closest',
995
+ template="plotly_white",
996
+ legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
997
+ margin=dict(l=20, r=20, t=50, b=20)
998
+ )
999
+
1000
+ # Create impact distribution histogram
1001
+ dist_fig = px.histogram(
1002
+ impact_df['impact_pct'],
1003
+ nbins=20,
1004
+ labels={'value': 'Price Impact (%)', 'count': 'Frequency'},
1005
+ title='Distribution of Price Impact',
1006
+ color_discrete_sequence=['#3366CC']
1007
+ )
1008
+
1009
+ # Add a vertical line at the mean
1010
+ dist_fig.add_vline(x=avg_impact, line_dash="dash", line_color="red")
1011
+ dist_fig.add_annotation(x=avg_impact, y=0.85, yref="paper", text=f"Mean: {avg_impact:.2f}%",
1012
+ showarrow=True, arrowhead=2, arrowcolor="red", ax=40)
1013
+
1014
+ # Add a vertical line at zero
1015
+ dist_fig.add_vline(x=0, line_dash="solid", line_color="black")
1016
+
1017
+ dist_fig.update_layout(
1018
+ template="plotly_white",
1019
+ bargap=0.1,
1020
+ height=350
1021
+ )
1022
+
1023
+ # Create cumulative impact chart
1024
+ cumul_fig = go.Figure()
1025
+ cumul_fig.add_trace(go.Scatter(
1026
+ x=impact_df['timestamp'],
1027
+ y=impact_df['cumulative_impact'],
1028
+ mode='lines',
1029
+ fill='tozeroy',
1030
+ line=dict(width=2, color='#2ca02c'),
1031
+ name='Cumulative Impact'
1032
+ ))
1033
+
1034
+ cumul_fig.update_layout(
1035
+ title='Cumulative Price Impact Over Time',
1036
+ xaxis_title='Timestamp',
1037
+ yaxis_title='Cumulative Price Impact (%)',
1038
+ template="plotly_white",
1039
+ height=350
1040
+ )
1041
+
1042
+ # Create hourly impact analysis
1043
+ hourly_impact = impact_df.groupby('hour')['impact_pct'].agg(['mean', 'count', 'std']).reset_index()
1044
+ hourly_impact = hourly_impact.sort_values('hour')
1045
+
1046
+ hour_fig = go.Figure()
1047
+ hour_fig.add_trace(go.Bar(
1048
+ x=hourly_impact['hour'],
1049
+ y=hourly_impact['mean'],
1050
+ error_y=dict(type='data', array=hourly_impact['std'], visible=True),
1051
+ marker_color=hourly_impact['mean'].apply(lambda x: 'green' if x > 0 else 'red'),
1052
+ name='Average Impact'
1053
+ ))
1054
+
1055
+ hour_fig.update_layout(
1056
+ title='Price Impact by Hour of Day',
1057
+ xaxis_title='Hour of Day',
1058
+ yaxis_title='Average Price Impact (%)',
1059
+ template="plotly_white",
1060
+ height=350,
1061
+ xaxis=dict(tickmode='linear', tick0=0, dtick=2)
1062
+ )
1063
+
1064
+ # Join with original transactions
1065
+ transactions_df = transactions_df.copy()
1066
+ transactions_df['Timestamp_key'] = transactions_df[timestamp_col]
1067
+ impact_df['Timestamp_key'] = impact_df['timestamp']
1068
+
1069
+ merged_df = pd.merge(
1070
+ transactions_df,
1071
+ impact_df[['Timestamp_key', 'impact_pct', 'pre_price', 'post_price', 'cumulative_impact']],
1072
+ on='Timestamp_key',
1073
+ how='left'
1074
+ )
1075
+
1076
+ # Final result with enhanced output
1077
+ return {
1078
+ 'avg_impact_pct': avg_impact,
1079
+ 'max_impact_pct': max_impact,
1080
+ 'min_impact_pct': min_impact,
1081
+ 'median_impact_pct': median_impact,
1082
+ 'std_impact_pct': std_impact,
1083
+ 'significant_moves_count': significant_moves,
1084
+ 'high_impact_moves_count': high_impact_moves,
1085
+ 'positive_impacts_count': positive_impacts,
1086
+ 'negative_impacts_count': negative_impacts,
1087
+ 'total_transactions': len(transactions_df),
1088
+ 'charts': {
1089
+ 'main_chart': main_fig,
1090
+ 'impact_distribution': dist_fig,
1091
+ 'cumulative_impact': cumul_fig,
1092
+ 'hourly_impact': hour_fig
1093
+ },
1094
+ 'transactions_with_impact': merged_df,
1095
+ 'insights': insights,
1096
+ 'impact_summary': impact_summary
1097
+ }
1098
+
1099
+ def detect_wash_trading(self,
1100
+ transactions_df: pd.DataFrame,
1101
+ addresses: List[str],
1102
+ time_window_minutes: int = 60,
1103
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
1104
+ """
1105
+ Detect potential wash trading between addresses
1106
+
1107
+ Args:
1108
+ transactions_df: DataFrame of transactions
1109
+ addresses: List of addresses to analyze
1110
+ time_window_minutes: Time window for detecting wash trades
1111
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
1112
+
1113
+ Returns:
1114
+ List of potential wash trading incidents
1115
+ """
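+ # Detection idea, as implemented below: for each ordered address pair (A, B), count "cycles" where
+ # A sends to B and B sends back to A within max_time_diff minutes; min_cycles and max_time_diff are
+ # derived from the sensitivity setting (e.g. Medium: 2 cycles within 60 minutes). Illustrative call:
+ #     DataProcessor().detect_wash_trading(txs_df, addresses=["0xA...", "0xB..."], sensitivity="Medium")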
1116
+ if transactions_df.empty or not addresses:
1117
+ return []
1118
+
1119
+ # Ensure from/to columns exist
1120
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
1121
+ from_col, to_col = 'From', 'To'
1122
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
1123
+ from_col, to_col = 'from', 'to'
1124
+ else:
1125
+ raise ValueError("From/To columns not found in transactions DataFrame")
1126
+
1127
+ # Ensure timestamp column exists
1128
+ if 'Timestamp' in transactions_df.columns:
1129
+ timestamp_col = 'Timestamp'
1130
+ elif 'timeStamp' in transactions_df.columns:
1131
+ timestamp_col = 'timeStamp'
1132
+ else:
1133
+ raise ValueError("Timestamp column not found in transactions DataFrame")
1134
+
1135
+ # Ensure timestamp is datetime
1136
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
1137
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])
1138
+
1139
+ # Define sensitivity thresholds
1140
+ if sensitivity == "Low":
1141
+ min_cycles = 3 # Minimum number of back-and-forth transactions
1142
+ max_time_diff = 120 # Maximum minutes between transactions
1143
+ elif sensitivity == "Medium":
1144
+ min_cycles = 2
1145
+ max_time_diff = 60
1146
+ else: # High
1147
+ min_cycles = 1
1148
+ max_time_diff = 30
1149
+
1150
+ # Filter transactions involving the addresses
1151
+ address_txs = transactions_df[
1152
+ (transactions_df[from_col].isin(addresses)) |
1153
+ (transactions_df[to_col].isin(addresses))
1154
+ ].copy()
1155
+
1156
+ if address_txs.empty:
1157
+ return []
1158
+
1159
+ # Sort by timestamp
1160
+ address_txs = address_txs.sort_values(by=timestamp_col)
1161
+
1162
+ # Detect cycles of transactions between same addresses
1163
+ wash_trades = []
1164
+
1165
+ for addr1 in addresses:
1166
+ for addr2 in addresses:
1167
+ if addr1 == addr2:
1168
+ continue
1169
+
1170
+ # Find transactions from addr1 to addr2
1171
+ a1_to_a2 = address_txs[
1172
+ (address_txs[from_col] == addr1) &
1173
+ (address_txs[to_col] == addr2)
1174
+ ]
1175
+
1176
+ # Find transactions from addr2 to addr1
1177
+ a2_to_a1 = address_txs[
1178
+ (address_txs[from_col] == addr2) &
1179
+ (address_txs[to_col] == addr1)
1180
+ ]
1181
+
1182
+ if a1_to_a2.empty or a2_to_a1.empty:
1183
+ continue
1184
+
1185
+ # Check for back-and-forth patterns
1186
+ cycles = 0
1187
+ evidence = []
1188
+
1189
+ for _, tx1 in a1_to_a2.iterrows():
1190
+ tx1_time = tx1[timestamp_col]
1191
+
1192
+ # Find return transactions within the time window
1193
+ return_txs = a2_to_a1[
1194
+ (a2_to_a1[timestamp_col] > tx1_time) &
1195
+ (a2_to_a1[timestamp_col] <= tx1_time + pd.Timedelta(minutes=max_time_diff))
1196
+ ]
1197
+
1198
+ if not return_txs.empty:
1199
+ cycles += 1
1200
+ evidence.append(tx1)
1201
+ evidence.append(return_txs.iloc[0])
1202
+
1203
+ if cycles >= min_cycles:
1204
+ # Create visualization
1205
+ if evidence:
1206
+ evidence_df = pd.DataFrame(evidence)
1207
+ fig = px.scatter(
1208
+ evidence_df,
1209
+ x=timestamp_col,
1210
+ y=evidence_df.get('Amount', evidence_df.get('tokenAmount', evidence_df.get('value', 0))),
1211
+ color=from_col,
1212
+ title=f"Potential Wash Trading Between {addr1[:8]}... and {addr2[:8]}..."
1213
+ )
1214
+ else:
1215
+ fig = None
1216
+
1217
+ wash_trades.append({
1218
+ "type": "Wash Trading",
1219
+ "addresses": [addr1, addr2],
1220
+ "risk_level": "High" if cycles >= min_cycles * 2 else "Medium",
1221
+ "description": f"Detected {cycles} cycles of back-and-forth transactions between addresses",
1222
+ "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
1223
+ "title": f"Wash Trading Pattern ({cycles} cycles)",
1224
+ "evidence": pd.DataFrame(evidence) if evidence else None,
1225
+ "chart": fig
1226
+ })
1227
+
1228
+ return wash_trades
1229
+
1230
+ def detect_pump_and_dump(self,
1231
+ transactions_df: pd.DataFrame,
1232
+ price_data: Dict[str, Dict[str, Any]],
1233
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
1234
+ """
1235
+ Detect potential pump and dump schemes
1236
+
1237
+ Args:
1238
+ transactions_df: DataFrame of transactions
1239
+ price_data: Dictionary of price impact data for each transaction
1240
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
1241
+
1242
+ Returns:
1243
+ List of potential pump and dump incidents
1244
+ """
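+ # Heuristic implemented below: per address, slide a window of up to 10 price-impacting transactions
+ # and flag a pump-and-dump when the cumulative price change (or any single impact) exceeds
+ # pump_threshold while the retracement from the window's peak falls below dump_threshold.
+ # Thresholds scale with sensitivity (e.g. Medium: +7% pump, -5% dump).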
1245
+ if transactions_df.empty or not price_data:
1246
+ return []
1247
+
1248
+ # Ensure timestamp column exists
1249
+ if 'Timestamp' in transactions_df.columns:
1250
+ timestamp_col = 'Timestamp'
1251
+ elif 'timeStamp' in transactions_df.columns:
1252
+ timestamp_col = 'timeStamp'
1253
+ else:
1254
+ raise ValueError("Timestamp column not found in transactions DataFrame")
1255
+
1256
+ # Ensure from/to columns exist
1257
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
1258
+ from_col, to_col = 'From', 'To'
1259
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
1260
+ from_col, to_col = 'from', 'to'
1261
+ else:
1262
+ raise ValueError("From/To columns not found in transactions DataFrame")
1263
+
1264
+ # Ensure timestamp is datetime
1265
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
1266
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])
1267
+
1268
+ # Define sensitivity thresholds
1269
+ if sensitivity == "Low":
1270
+ accumulation_threshold = 5 # Number of buys to consider accumulation
1271
+ pump_threshold = 10.0 # % price increase to trigger pump
1272
+ dump_threshold = -8.0 # % price decrease to trigger dump
1273
+ elif sensitivity == "Medium":
1274
+ accumulation_threshold = 3
1275
+ pump_threshold = 7.0
1276
+ dump_threshold = -5.0
1277
+ else: # High
1278
+ accumulation_threshold = 2
1279
+ pump_threshold = 5.0
1280
+ dump_threshold = -3.0
1281
+
1282
+ # Combine price impact data with transactions
1283
+ txs_with_impact = []
1284
+
1285
+ for idx, row in transactions_df.iterrows():
1286
+ tx_hash = row.get('Transaction Hash', row.get('hash', None))
1287
+ if not tx_hash or tx_hash not in price_data:
1288
+ continue
1289
+
1290
+ tx_impact = price_data[tx_hash]
1291
+
1292
+ if tx_impact['impact_pct'] is None:
1293
+ continue
1294
+
1295
+ txs_with_impact.append({
1296
+ 'transaction_hash': tx_hash,
1297
+ 'timestamp': row[timestamp_col],
1298
+ 'from': row[from_col],
1299
+ 'to': row[to_col],
1300
+ 'pre_price': tx_impact['pre_price'],
1301
+ 'post_price': tx_impact['post_price'],
1302
+ 'impact_pct': tx_impact['impact_pct']
1303
+ })
1304
+
1305
+ if not txs_with_impact:
1306
+ return []
1307
+
1308
+ impact_df = pd.DataFrame(txs_with_impact)
1309
+ impact_df = impact_df.sort_values(by='timestamp')
1310
+
1311
+ # Look for accumulation phases followed by price pumps and then dumps
1312
+ pump_and_dumps = []
1313
+
1314
+ # Group by address to analyze per wallet
1315
+ address_groups = {}
1316
+
1317
+ for from_addr in impact_df['from'].unique():
1318
+ address_groups[from_addr] = impact_df[impact_df['from'] == from_addr]
1319
+
1320
+ for to_addr in impact_df['to'].unique():
1321
+ if to_addr in address_groups:
1322
+ address_groups[to_addr] = pd.concat([
1323
+ address_groups[to_addr],
1324
+ impact_df[impact_df['to'] == to_addr]
1325
+ ])
1326
+ else:
1327
+ address_groups[to_addr] = impact_df[impact_df['to'] == to_addr]
1328
+
1329
+ for address, addr_df in address_groups.items():
1330
+ # Skip if not enough transactions
1331
+ if len(addr_df) < accumulation_threshold + 2:
1332
+ continue
1333
+
1334
+ # Look for continuous price increase followed by sharp drop
1335
+ window_size = min(len(addr_df), 10)
1336
+ for i in range(len(addr_df) - window_size + 1):
1337
+ window = addr_df.iloc[i:i+window_size]
1338
+
1339
+ # Get cumulative price change in window
1340
+ if len(window) >= 2:
1341
+ first_price = window.iloc[0]['pre_price']
1342
+ last_price = window.iloc[-1]['post_price']
1343
+
1344
+ if first_price is None or last_price is None:
1345
+ continue
1346
+
1347
+ cumulative_change = ((last_price - first_price) / first_price) * 100
1348
+
1349
+ # Check for pump phase
1350
+ max_price = window['post_price'].max()
1351
+ max_idx = window['post_price'].idxmax()
1352
+
1353
+ if max_idx < len(window) - 1:
1354
+ max_to_end = ((window.iloc[-1]['post_price'] - max_price) / max_price) * 100
1355
+
1356
+ # If we have a pump followed by a dump
1357
+ if (cumulative_change > pump_threshold or
1358
+ any(window['impact_pct'] > pump_threshold)) and max_to_end < dump_threshold:
1359
+
1360
+ # Create chart
1361
+ fig = go.Figure()
1362
+
1363
+ # Plot price line
1364
+ times = [t.timestamp() for t in window['timestamp']]
1365
+ prices = []
1366
+ for _, row in window.iterrows():
1367
+ prices.append(row['pre_price'])
1368
+ prices.append(row['post_price'])
1369
+
1370
+ times_expanded = []
1371
+ for t in times:
1372
+ times_expanded.append(t - 60) # 1 min before
1373
+ times_expanded.append(t + 60) # 1 min after
1374
+
1375
+ fig.add_trace(go.Scatter(
1376
+ x=times_expanded,
1377
+ y=prices,
1378
+ mode='lines+markers',
1379
+ name='Price',
1380
+ line=dict(color='blue')
1381
+ ))
1382
+
1383
+ # Highlight pump and dump phases
1384
+ max_time_idx = window.index.get_loc(max_idx)
1385
+ pump_x = times_expanded[:max_time_idx*2+2]
1386
+ pump_y = prices[:max_time_idx*2+2]
1387
+
1388
+ dump_x = times_expanded[max_time_idx*2:]
1389
+ dump_y = prices[max_time_idx*2:]
1390
+
1391
+ fig.add_trace(go.Scatter(
1392
+ x=pump_x,
1393
+ y=pump_y,
1394
+ mode='lines',
1395
+ line=dict(color='green', width=3),
1396
+ name='Pump Phase'
1397
+ ))
1398
+
1399
+ fig.add_trace(go.Scatter(
1400
+ x=dump_x,
1401
+ y=dump_y,
1402
+ mode='lines',
1403
+ line=dict(color='red', width=3),
1404
+ name='Dump Phase'
1405
+ ))
1406
+
1407
+ fig.update_layout(
1408
+ title='Potential Pump and Dump Pattern',
1409
+ xaxis_title='Time',
1410
+ yaxis_title='Price',
1411
+ hovermode='closest'
1412
+ )
1413
+
1414
+ pump_and_dumps.append({
1415
+ "type": "Pump and Dump",
1416
+ "addresses": [address],
1417
+ "risk_level": "High" if max_to_end < dump_threshold * 1.5 else "Medium",
1418
+ "description": f"Price pumped {cumulative_change:.2f}% before dropping {max_to_end:.2f}%",
1419
+ "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
1420
+ "title": f"Pump ({cumulative_change:.1f}%) and Dump ({max_to_end:.1f}%)",
1421
+ "evidence": window,
1422
+ "chart": fig
1423
+ })
1424
+
1425
+ return pump_and_dumps
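
A short worked example may help here: the scan above flags a window when the cumulative move across the window (or any single transaction's `impact_pct`) exceeds `pump_threshold` while the retrace from the window's peak to its end falls below `dump_threshold`. The sketch below runs that arithmetic on made-up numbers; the hashes and prices are invented, but the `price_data` layout mirrors the fields this function reads (keyed by transaction hash, with `pre_price`, `post_price`, `impact_pct`).

```python
# Illustrative only: invented hashes and prices, laid out like the
# price_data mapping consumed above.
price_data = {
    "0xaaa": {"pre_price": 1.00, "post_price": 1.04, "impact_pct": 4.0},
    "0xbbb": {"pre_price": 1.04, "post_price": 1.12, "impact_pct": 7.7},
    "0xccc": {"pre_price": 1.12, "post_price": 1.05, "impact_pct": -6.3},
}

impacts = list(price_data.values())
first_price = impacts[0]["pre_price"]               # 1.00
last_price = impacts[-1]["post_price"]              # 1.05
max_price = max(p["post_price"] for p in impacts)   # 1.12

cumulative_change = (last_price - first_price) / first_price * 100  # +5.0%
max_to_end = (last_price - max_price) / max_price * 100             # -6.25%

# With the "Medium" thresholds (pump_threshold=7.0, dump_threshold=-5.0) this
# window qualifies: one impact_pct (7.7%) exceeds the pump threshold and the
# retrace from the peak (-6.25%) is below the dump threshold.
```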
modules/detection.py ADDED
@@ -0,0 +1,684 @@
1
+ import pandas as pd
2
+ import numpy as np
3
+ from datetime import datetime, timedelta
4
+ from typing import Dict, List, Optional, Union, Any, Tuple
5
+ import plotly.graph_objects as go
6
+ import plotly.express as px
7
+
8
+
9
+ class ManipulationDetector:
10
+ """
11
+ Detect potential market manipulation patterns in whale transactions
12
+ """
13
+
14
+ def __init__(self):
15
+ # Define known manipulation patterns
16
+ self.patterns = {
17
+ "pump_and_dump": {
18
+ "description": "Rapid buys followed by coordinated sell-offs, causing price to first rise then crash",
19
+ "risk_factor": 0.8
20
+ },
21
+ "wash_trading": {
22
+ "description": "Self-trading across multiple addresses to create false impression of market activity",
23
+ "risk_factor": 0.9
24
+ },
25
+ "spoofing": {
26
+ "description": "Large orders placed then canceled before execution to manipulate price",
27
+ "risk_factor": 0.7
28
+ },
29
+ "layering": {
30
+ "description": "Multiple orders at different price levels to create false impression of market depth",
31
+ "risk_factor": 0.6
32
+ },
33
+ "momentum_ignition": {
34
+ "description": "Creating sharp price moves to trigger other participants' momentum-based trading",
35
+ "risk_factor": 0.5
36
+ }
37
+ }
38
+
39
+ def detect_wash_trading(self,
40
+ transactions_df: pd.DataFrame,
41
+ addresses: List[str],
42
+ sensitivity: str = "Medium",
43
+ lookback_hours: int = 24) -> List[Dict[str, Any]]:
44
+ """
45
+ Detect potential wash trading between addresses
46
+
47
+ Args:
48
+ transactions_df: DataFrame of transactions
49
+ addresses: List of addresses to analyze
50
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
51
+ lookback_hours: Hours to look back for wash trading patterns
52
+
53
+ Returns:
54
+ List of potential wash trading alerts
55
+ """
56
+ if transactions_df.empty or not addresses:
57
+ return []
58
+
59
+ # Ensure from/to columns exist
60
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
61
+ from_col, to_col = 'From', 'To'
62
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
63
+ from_col, to_col = 'from', 'to'
64
+ else:
65
+ raise ValueError("From/To columns not found in transactions DataFrame")
66
+
67
+ # Ensure timestamp column exists
68
+ if 'Timestamp' in transactions_df.columns:
69
+ timestamp_col = 'Timestamp'
70
+ elif 'timeStamp' in transactions_df.columns:
71
+ timestamp_col = 'timeStamp'
72
+ else:
73
+ raise ValueError("Timestamp column not found in transactions DataFrame")
74
+
75
+ # Ensure timestamp is datetime
76
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
77
+ if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
78
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
79
+ else:
80
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])
81
+
82
+ # Define sensitivity thresholds
83
+ if sensitivity == "Low":
84
+ min_cycles = 3 # Minimum number of back-and-forth transactions
85
+ max_time_diff = 120 # Maximum minutes between transactions
86
+ elif sensitivity == "Medium":
87
+ min_cycles = 2
88
+ max_time_diff = 60
89
+ else: # High
90
+ min_cycles = 1
91
+ max_time_diff = 30
92
+
93
+ # Filter transactions by lookback period
94
+ lookback_time = datetime.now() - timedelta(hours=lookback_hours)
95
+ recent_txs = transactions_df[transactions_df[timestamp_col] >= lookback_time]
96
+
97
+ if recent_txs.empty:
98
+ return []
99
+
100
+ # Filter transactions involving the addresses
101
+ address_txs = recent_txs[
102
+ (recent_txs[from_col].isin(addresses)) |
103
+ (recent_txs[to_col].isin(addresses))
104
+ ].copy()
105
+
106
+ if address_txs.empty:
107
+ return []
108
+
109
+ # Sort by timestamp
110
+ address_txs = address_txs.sort_values(by=timestamp_col)
111
+
112
+ # Detect cycles of transactions between same addresses
113
+ wash_trades = []
114
+
115
+ for addr1 in addresses:
116
+ for addr2 in addresses:
117
+ if addr1 == addr2:
118
+ continue
119
+
120
+ # Find transactions from addr1 to addr2
121
+ a1_to_a2 = address_txs[
122
+ (address_txs[from_col] == addr1) &
123
+ (address_txs[to_col] == addr2)
124
+ ]
125
+
126
+ # Find transactions from addr2 to addr1
127
+ a2_to_a1 = address_txs[
128
+ (address_txs[from_col] == addr2) &
129
+ (address_txs[to_col] == addr1)
130
+ ]
131
+
132
+ if a1_to_a2.empty or a2_to_a1.empty:
133
+ continue
134
+
135
+ # Check for back-and-forth patterns
136
+ cycles = 0
137
+ evidence = []
138
+
139
+ for _, tx1 in a1_to_a2.iterrows():
140
+ tx1_time = tx1[timestamp_col]
141
+
142
+ # Find return transactions within the time window
143
+ return_txs = a2_to_a1[
144
+ (a2_to_a1[timestamp_col] > tx1_time) &
145
+ (a2_to_a1[timestamp_col] <= tx1_time + pd.Timedelta(minutes=max_time_diff))
146
+ ]
147
+
148
+ if not return_txs.empty:
149
+ cycles += 1
150
+ evidence.append(tx1)
151
+ evidence.append(return_txs.iloc[0])
152
+
153
+ if cycles >= min_cycles:
154
+ # Create visualization
155
+ if evidence:
156
+ evidence_df = pd.DataFrame(evidence)
157
+
158
+ # Get amount column
159
+ if 'Amount' in evidence_df.columns:
160
+ amount_col = 'Amount'
161
+ elif 'tokenAmount' in evidence_df.columns:
162
+ amount_col = 'tokenAmount'
163
+ elif 'value' in evidence_df.columns:
164
+ # Try to adjust for decimals if 'tokenDecimal' exists
165
+ if 'tokenDecimal' in evidence_df.columns:
166
+ evidence_df['adjustedValue'] = evidence_df['value'].astype(float) / (10 ** evidence_df['tokenDecimal'].astype(int))
167
+ amount_col = 'adjustedValue'
168
+ else:
169
+ amount_col = 'value'
170
+ else:
171
+ amount_col = None
172
+
173
+ # Create figure if amount column exists
174
+ if amount_col:
175
+ fig = px.scatter(
176
+ evidence_df,
177
+ x=timestamp_col,
178
+ y=amount_col,
179
+ color=from_col,
180
+ title=f"Potential Wash Trading Between {addr1[:8]}... and {addr2[:8]}..."
181
+ )
182
+ else:
183
+ fig = None
184
+ else:
185
+ fig = None
186
+
187
+ wash_trades.append({
188
+ "type": "Wash Trading",
189
+ "addresses": [addr1, addr2],
190
+ "risk_level": "High" if cycles >= min_cycles * 2 else "Medium",
191
+ "description": f"Detected {cycles} cycles of back-and-forth transactions between addresses",
192
+ "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
193
+ "title": f"Wash Trading Pattern ({cycles} cycles)",
194
+ "evidence": pd.DataFrame(evidence) if evidence else None,
195
+ "chart": fig
196
+ })
197
+
198
+ return wash_trades
199
+
200
+ def detect_pump_and_dump(self,
201
+ transactions_df: pd.DataFrame,
202
+ price_data: Dict[str, Dict[str, Any]],
203
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
204
+ """
205
+ Detect potential pump and dump schemes
206
+
207
+ Args:
208
+ transactions_df: DataFrame of transactions
209
+ price_data: Dictionary of price impact data for each transaction
210
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
211
+
212
+ Returns:
213
+ List of potential pump and dump alerts
214
+ """
215
+ if transactions_df.empty or not price_data:
216
+ return []
217
+
218
+ # Ensure timestamp column exists
219
+ if 'Timestamp' in transactions_df.columns:
220
+ timestamp_col = 'Timestamp'
221
+ elif 'timeStamp' in transactions_df.columns:
222
+ timestamp_col = 'timeStamp'
223
+ else:
224
+ raise ValueError("Timestamp column not found in transactions DataFrame")
225
+
226
+ # Ensure from/to columns exist
227
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
228
+ from_col, to_col = 'From', 'To'
229
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
230
+ from_col, to_col = 'from', 'to'
231
+ else:
232
+ raise ValueError("From/To columns not found in transactions DataFrame")
233
+
234
+ # Ensure timestamp is datetime
235
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
236
+ if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
237
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
238
+ else:
239
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])
240
+
241
+ # Define sensitivity thresholds
242
+ if sensitivity == "Low":
243
+ accumulation_threshold = 5 # Number of buys to consider accumulation
244
+ pump_threshold = 10.0 # % price increase to trigger pump
245
+ dump_threshold = -8.0 # % price decrease to trigger dump
246
+ elif sensitivity == "Medium":
247
+ accumulation_threshold = 3
248
+ pump_threshold = 7.0
249
+ dump_threshold = -5.0
250
+ else: # High
251
+ accumulation_threshold = 2
252
+ pump_threshold = 5.0
253
+ dump_threshold = -3.0
254
+
255
+ # Combine price impact data with transactions
256
+ txs_with_impact = []
257
+
258
+ for idx, row in transactions_df.iterrows():
259
+ tx_hash = row.get('Transaction Hash', row.get('hash', None))
260
+ if not tx_hash or tx_hash not in price_data:
261
+ continue
262
+
263
+ tx_impact = price_data[tx_hash]
264
+
265
+ if tx_impact['impact_pct'] is None:
266
+ continue
267
+
268
+ txs_with_impact.append({
269
+ 'transaction_hash': tx_hash,
270
+ 'timestamp': row[timestamp_col],
271
+ 'from': row[from_col],
272
+ 'to': row[to_col],
273
+ 'pre_price': tx_impact['pre_price'],
274
+ 'post_price': tx_impact['post_price'],
275
+ 'impact_pct': tx_impact['impact_pct']
276
+ })
277
+
278
+ if not txs_with_impact:
279
+ return []
280
+
281
+ impact_df = pd.DataFrame(txs_with_impact)
282
+ impact_df = impact_df.sort_values(by='timestamp')
283
+
284
+ # Look for accumulation phases followed by price pumps and then dumps
285
+ pump_and_dumps = []
286
+
287
+ # Group by address to analyze per wallet
288
+ address_groups = {}
289
+
290
+ for from_addr in impact_df['from'].unique():
291
+ address_groups[from_addr] = impact_df[impact_df['from'] == from_addr]
292
+
293
+ for to_addr in impact_df['to'].unique():
294
+ if to_addr in address_groups:
295
+ address_groups[to_addr] = pd.concat([
296
+ address_groups[to_addr],
297
+ impact_df[impact_df['to'] == to_addr]
298
+ ])
299
+ else:
300
+ address_groups[to_addr] = impact_df[impact_df['to'] == to_addr]
301
+
302
+ for address, addr_df in address_groups.items():
303
+ # Skip if not enough transactions
304
+ if len(addr_df) < accumulation_threshold + 2:
305
+ continue
306
+
307
+ # Look for continuous price increase followed by sharp drop
308
+ window_size = min(len(addr_df), 10)
309
+ for i in range(len(addr_df) - window_size + 1):
310
+ window = addr_df.iloc[i:i+window_size]
311
+
312
+ # Get cumulative price change in window
313
+ if len(window) >= 2:
314
+ first_price = window.iloc[0]['pre_price']
315
+ last_price = window.iloc[-1]['post_price']
316
+
317
+ if first_price is None or last_price is None:
318
+ continue
319
+
320
+ cumulative_change = ((last_price - first_price) / first_price) * 100
321
+
322
+ # Check for pump phase
323
+ max_price = window['post_price'].max()
324
+ max_idx = window['post_price'].idxmax()
325
+
326
+ if window.index.get_loc(max_idx) < len(window) - 1:  # compare the peak's position in the window, not its index label
327
+ max_to_end = ((window.iloc[-1]['post_price'] - max_price) / max_price) * 100
328
+
329
+ # If we have a pump followed by a dump
330
+ if (cumulative_change > pump_threshold or
331
+ any(window['impact_pct'] > pump_threshold)) and max_to_end < dump_threshold:
332
+
333
+ # Create chart
334
+ fig = go.Figure()
335
+
336
+ # Plot price line
337
+ times = [t.timestamp() for t in window['timestamp']]
338
+ prices = []
339
+ for _, row in window.iterrows():
340
+ prices.append(row['pre_price'])
341
+ prices.append(row['post_price'])
342
+
343
+ times_expanded = []
344
+ for t in times:
345
+ times_expanded.append(t - 60) # 1 min before
346
+ times_expanded.append(t + 60) # 1 min after
347
+
348
+ fig.add_trace(go.Scatter(
349
+ x=times_expanded,
350
+ y=prices,
351
+ mode='lines+markers',
352
+ name='Price',
353
+ line=dict(color='blue')
354
+ ))
355
+
356
+ # Highlight pump and dump phases
357
+ max_time_idx = window.index.get_loc(max_idx)
358
+ pump_x = times_expanded[:max_time_idx*2+2]
359
+ pump_y = prices[:max_time_idx*2+2]
360
+
361
+ dump_x = times_expanded[max_time_idx*2:]
362
+ dump_y = prices[max_time_idx*2:]
363
+
364
+ fig.add_trace(go.Scatter(
365
+ x=pump_x,
366
+ y=pump_y,
367
+ mode='lines',
368
+ line=dict(color='green', width=3),
369
+ name='Pump Phase'
370
+ ))
371
+
372
+ fig.add_trace(go.Scatter(
373
+ x=dump_x,
374
+ y=dump_y,
375
+ mode='lines',
376
+ line=dict(color='red', width=3),
377
+ name='Dump Phase'
378
+ ))
379
+
380
+ fig.update_layout(
381
+ title='Potential Pump and Dump Pattern',
382
+ xaxis_title='Time',
383
+ yaxis_title='Price',
384
+ hovermode='closest'
385
+ )
386
+
387
+ pump_and_dumps.append({
388
+ "type": "Pump and Dump",
389
+ "addresses": [address],
390
+ "risk_level": "High" if max_to_end < dump_threshold * 1.5 else "Medium",
391
+ "description": f"Price pumped {cumulative_change:.2f}% before dropping {max_to_end:.2f}%",
392
+ "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
393
+ "title": f"Pump ({cumulative_change:.1f}%) and Dump ({max_to_end:.1f}%)",
394
+ "evidence": window,
395
+ "chart": fig
396
+ })
397
+
398
+ return pump_and_dumps
399
+
400
+ def detect_spoofing(self,
401
+ transactions_df: pd.DataFrame,
402
+ order_book_data: Optional[pd.DataFrame] = None,
403
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
404
+ """
405
+ Detect potential spoofing (placing and quickly canceling large orders)
406
+
407
+ Args:
408
+ transactions_df: DataFrame of transactions
409
+ order_book_data: Optional DataFrame of order book data
410
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
411
+
412
+ Returns:
413
+ List of potential spoofing alerts
414
+ """
415
+ # Note: This is a placeholder since we don't have direct order book data
416
+ # In a real implementation, this would analyze order placement and cancellations
417
+
418
+ # For now, return an empty list as we can't detect spoofing without order book data
419
+ return []
420
+
421
+ def detect_layering(self,
422
+ transactions_df: pd.DataFrame,
423
+ order_book_data: Optional[pd.DataFrame] = None,
424
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
425
+ """
426
+ Detect potential layering (placing multiple orders at different price levels)
427
+
428
+ Args:
429
+ transactions_df: DataFrame of transactions
430
+ order_book_data: Optional DataFrame of order book data
431
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
432
+
433
+ Returns:
434
+ List of potential layering alerts
435
+ """
436
+ # Note: This is a placeholder since we don't have direct order book data
437
+ # In a real implementation, this would analyze order book depth and patterns
438
+
439
+ # For now, return an empty list as we can't detect layering without order book data
440
+ return []
441
+
442
+ def detect_momentum_ignition(self,
443
+ transactions_df: pd.DataFrame,
444
+ price_data: Dict[str, Dict[str, Any]],
445
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
446
+ """
447
+ Detect potential momentum ignition (creating sharp price moves)
448
+
449
+ Args:
450
+ transactions_df: DataFrame of transactions
451
+ price_data: Dictionary of price impact data for each transaction
452
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
453
+
454
+ Returns:
455
+ List of potential momentum ignition alerts
456
+ """
457
+ if transactions_df.empty or not price_data:
458
+ return []
459
+
460
+ # Ensure timestamp column exists
461
+ if 'Timestamp' in transactions_df.columns:
462
+ timestamp_col = 'Timestamp'
463
+ elif 'timeStamp' in transactions_df.columns:
464
+ timestamp_col = 'timeStamp'
465
+ else:
466
+ raise ValueError("Timestamp column not found in transactions DataFrame")
467
+
468
+ # Ensure timestamp is datetime
469
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
470
+ if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
471
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
472
+ else:
473
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])
474
+
475
+ # Define sensitivity thresholds
476
+ if sensitivity == "Low":
477
+ impact_threshold = 15.0 # % price impact to trigger alert
478
+ time_window_minutes = 5 # Time window to look for follow-up transactions
479
+ elif sensitivity == "Medium":
480
+ impact_threshold = 10.0
481
+ time_window_minutes = 10
482
+ else: # High
483
+ impact_threshold = 5.0
484
+ time_window_minutes = 15
485
+
486
+ # Combine price impact data with transactions
487
+ txs_with_impact = []
488
+
489
+ for idx, row in transactions_df.iterrows():
490
+ tx_hash = row.get('Transaction Hash', row.get('hash', None))
491
+ if not tx_hash or tx_hash not in price_data:
492
+ continue
493
+
494
+ tx_impact = price_data[tx_hash]
495
+
496
+ if tx_impact['impact_pct'] is None:
497
+ continue
498
+
499
+ txs_with_impact.append({
500
+ 'transaction_hash': tx_hash,
501
+ 'timestamp': row[timestamp_col],
502
+ 'from': row.get('From', row.get('from', 'Unknown')),
503
+ 'to': row.get('To', row.get('to', 'Unknown')),
504
+ 'pre_price': tx_impact['pre_price'],
505
+ 'post_price': tx_impact['post_price'],
506
+ 'impact_pct': tx_impact['impact_pct']
507
+ })
508
+
509
+ if not txs_with_impact:
510
+ return []
511
+
512
+ impact_df = pd.DataFrame(txs_with_impact)
513
+ impact_df = impact_df.sort_values(by='timestamp')
514
+
515
+ # Look for large price impacts followed by increased trading activity
516
+ momentum_alerts = []
517
+
518
+ # Find high-impact transactions
519
+ high_impact_txs = impact_df[abs(impact_df['impact_pct']) > impact_threshold]
520
+
521
+ for idx, high_impact_tx in high_impact_txs.iterrows():
522
+ tx_time = high_impact_tx['timestamp']
523
+
524
+ # Look for increased trading activity after the high-impact transaction
525
+ follow_up_window = impact_df[
526
+ (impact_df['timestamp'] > tx_time) &
527
+ (impact_df['timestamp'] <= tx_time + pd.Timedelta(minutes=time_window_minutes))
528
+ ]
529
+
530
+ # Compare activity to baseline (same time window before the transaction)
531
+ baseline_window = impact_df[
532
+ (impact_df['timestamp'] < tx_time) &
533
+ (impact_df['timestamp'] >= tx_time - pd.Timedelta(minutes=time_window_minutes))
534
+ ]
535
+
536
+ if len(follow_up_window) > len(baseline_window) * 1.5 and len(follow_up_window) >= 3:
537
+ # Create chart
538
+ fig = go.Figure()
539
+
540
+ # Plot price timeline
541
+ all_relevant_txs = pd.concat([
542
+ pd.DataFrame([high_impact_tx]),
543
+ follow_up_window,
544
+ baseline_window
545
+ ]).sort_values(by='timestamp')
546
+
547
+ # Create time series for price
548
+ timestamps = all_relevant_txs['timestamp']
549
+ prices = []
550
+ for _, row in all_relevant_txs.iterrows():
551
+ prices.append(row['pre_price'])
552
+ prices.append(row['post_price'])
553
+
554
+ times_expanded = []
555
+ for t in timestamps:
556
+ times_expanded.append(t - pd.Timedelta(seconds=30))
557
+ times_expanded.append(t + pd.Timedelta(seconds=30))
558
+
559
+ # Plot price line
560
+ fig.add_trace(go.Scatter(
561
+ x=times_expanded[:len(prices)], # In case of any length mismatch
562
+ y=prices[:len(times_expanded)],
563
+ mode='lines',
564
+ name='Price'
565
+ ))
566
+
567
+ # Highlight the high-impact transaction
568
+ fig.add_trace(go.Scatter(
569
+ x=[high_impact_tx['timestamp']],
570
+ y=[high_impact_tx['post_price']],
571
+ mode='markers',
572
+ marker=dict(
573
+ size=15,
574
+ color='red',
575
+ symbol='circle'
576
+ ),
577
+ name='Momentum Ignition'
578
+ ))
579
+
580
+ # Highlight the follow-up transactions
581
+ if not follow_up_window.empty:
582
+ fig.add_trace(go.Scatter(
583
+ x=follow_up_window['timestamp'],
584
+ y=follow_up_window['post_price'],
585
+ mode='markers',
586
+ marker=dict(
587
+ size=10,
588
+ color='orange',
589
+ symbol='circle'
590
+ ),
591
+ name='Follow-up Activity'
592
+ ))
593
+
594
+ fig.update_layout(
595
+ title='Potential Momentum Ignition Pattern',
596
+ xaxis_title='Time',
597
+ yaxis_title='Price',
598
+ hovermode='closest'
599
+ )
600
+
601
+ momentum_alerts.append({
602
+ "type": "Momentum Ignition",
603
+ "addresses": [high_impact_tx['from']],
604
+ "risk_level": "High" if abs(high_impact_tx['impact_pct']) > impact_threshold * 1.5 else "Medium",
605
+ "description": f"Large {high_impact_tx['impact_pct']:.2f}% price move followed by {len(follow_up_window)} transactions in {time_window_minutes} minutes (vs {len(baseline_window)} in baseline)",
606
+ "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
607
+ "title": f"Momentum Ignition ({high_impact_tx['impact_pct']:.1f}% price move)",
608
+ "evidence": pd.concat([pd.DataFrame([high_impact_tx]), follow_up_window]),
609
+ "chart": fig
610
+ })
611
+
612
+ return momentum_alerts
613
+
614
+ def run_all_detections(self,
615
+ transactions_df: pd.DataFrame,
616
+ addresses: List[str],
617
+ price_data: Optional[Dict[str, Dict[str, Any]]] = None,
618
+ order_book_data: Optional[pd.DataFrame] = None,
619
+ sensitivity: str = "Medium") -> List[Dict[str, Any]]:
620
+ """
621
+ Run all manipulation detection algorithms
622
+
623
+ Args:
624
+ transactions_df: DataFrame of transactions
625
+ addresses: List of addresses to analyze
626
+ price_data: Optional dictionary of price impact data for each transaction
627
+ order_book_data: Optional DataFrame of order book data
628
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
629
+
630
+ Returns:
631
+ List of potential manipulation alerts
632
+ """
633
+ if transactions_df.empty:
634
+ return []
635
+
636
+ all_alerts = []
637
+
638
+ # Detect wash trading
639
+ wash_trading_alerts = self.detect_wash_trading(
640
+ transactions_df=transactions_df,
641
+ addresses=addresses,
642
+ sensitivity=sensitivity
643
+ )
644
+ all_alerts.extend(wash_trading_alerts)
645
+
646
+ # Detect pump and dump (if price data available)
647
+ if price_data:
648
+ pump_and_dump_alerts = self.detect_pump_and_dump(
649
+ transactions_df=transactions_df,
650
+ price_data=price_data,
651
+ sensitivity=sensitivity
652
+ )
653
+ all_alerts.extend(pump_and_dump_alerts)
654
+
655
+ # Detect momentum ignition (if price data available)
656
+ momentum_alerts = self.detect_momentum_ignition(
657
+ transactions_df=transactions_df,
658
+ price_data=price_data,
659
+ sensitivity=sensitivity
660
+ )
661
+ all_alerts.extend(momentum_alerts)
662
+
663
+ # Detect spoofing (if order book data available)
664
+ if order_book_data is not None:
665
+ spoofing_alerts = self.detect_spoofing(
666
+ transactions_df=transactions_df,
667
+ order_book_data=order_book_data,
668
+ sensitivity=sensitivity
669
+ )
670
+ all_alerts.extend(spoofing_alerts)
671
+
672
+ # Detect layering (if order book data available)
673
+ layering_alerts = self.detect_layering(
674
+ transactions_df=transactions_df,
675
+ order_book_data=order_book_data,
676
+ sensitivity=sensitivity
677
+ )
678
+ all_alerts.extend(layering_alerts)
679
+
680
+ # Sort alerts by risk level
681
+ risk_order = {"High": 0, "Medium": 1, "Low": 2}
682
+ all_alerts.sort(key=lambda x: risk_order.get(x.get("risk_level", "Low"), 3))
683
+
684
+ return all_alerts
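
For orientation, a minimal sketch of driving this detector directly is shown below. The class name and the `run_all_detections` signature come from this file; the wallet addresses, hashes, prices, and timestamps are placeholders, and `price_data` is assumed to be produced by the app's price-impact lookup.

```python
import pandas as pd
from modules.detection import ManipulationDetector

now = pd.Timestamp.now()

# Placeholder wallets and hashes; the column names (hash/from/to/timeStamp)
# match the ones the detector accepts.
transactions_df = pd.DataFrame({
    "hash": ["0xaaa", "0xbbb", "0xccc"],
    "from": ["0xwhale1", "0xwhale2", "0xwhale1"],
    "to":   ["0xwhale2", "0xwhale1", "0xwhale2"],
    "timeStamp": [now - pd.Timedelta(minutes=30),
                  now - pd.Timedelta(minutes=20),
                  now - pd.Timedelta(minutes=10)],
})

# Assumed to come from the price-impact tooling; keys are transaction hashes.
price_data = {
    "0xaaa": {"pre_price": 1.00, "post_price": 1.08, "impact_pct": 8.0},
    "0xbbb": {"pre_price": 1.08, "post_price": 1.02, "impact_pct": -5.6},
    "0xccc": {"pre_price": 1.02, "post_price": 1.01, "impact_pct": -1.0},
}

detector = ManipulationDetector()
alerts = detector.run_all_detections(
    transactions_df=transactions_df,
    addresses=["0xwhale1", "0xwhale2"],
    price_data=price_data,
    sensitivity="Medium",
)
for alert in alerts:
    print(alert["risk_level"], alert["type"], "-", alert["description"])
```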
modules/tools.py ADDED
@@ -0,0 +1,373 @@
1
+ import json
2
+ import pandas as pd
3
+ from datetime import datetime
4
+ from typing import Dict, List, Optional, Union, Any, Tuple
5
+
6
+ from langchain.tools import tool
7
+ from modules.api_client import ArbiscanClient, GeminiClient
8
+ from modules.data_processor import DataProcessor
9
+
10
+ # Tools for Arbiscan API
11
+ class ArbiscanTools:
12
+ def __init__(self, arbiscan_client: ArbiscanClient):
13
+ self.client = arbiscan_client
14
+
15
+ @tool("get_token_transfers")
16
+ def get_token_transfers(self, address: str, contract_address: Optional[str] = None) -> str:
17
+ """
18
+ Get ERC-20 token transfers for a specific address
19
+
20
+ Args:
21
+ address: Wallet address
22
+ contract_address: Optional token contract address to filter by
23
+
24
+ Returns:
25
+ List of token transfers as JSON string
26
+ """
27
+ transfers = self.client.get_token_transfers(
28
+ address=address,
29
+ contract_address=contract_address
30
+ )
31
+ return json.dumps(transfers)
32
+
33
+ @tool("get_token_balance")
34
+ def get_token_balance(self, address: str, contract_address: str) -> str:
35
+ """
36
+ Get the current balance of a specific token for an address
37
+
38
+ Args:
39
+ address: Wallet address
40
+ contract_address: Token contract address
41
+
42
+ Returns:
43
+ Token balance
44
+ """
45
+ balance = self.client.get_token_balance(
46
+ address=address,
47
+ contract_address=contract_address
48
+ )
49
+ return balance
50
+
51
+ @tool("get_normal_transactions")
52
+ def get_normal_transactions(self, address: str) -> str:
53
+ """
54
+ Get normal transactions (ETH/ARB transfers) for a specific address
55
+
56
+ Args:
57
+ address: Wallet address
58
+
59
+ Returns:
60
+ List of normal transactions as JSON string
61
+ """
62
+ transactions = self.client.get_normal_transactions(address=address)
63
+ return json.dumps(transactions)
64
+
65
+ @tool("get_internal_transactions")
66
+ def get_internal_transactions(self, address: str) -> str:
67
+ """
68
+ Get internal transactions for a specific address
69
+
70
+ Args:
71
+ address: Wallet address
72
+
73
+ Returns:
74
+ List of internal transactions as JSON string
75
+ """
76
+ transactions = self.client.get_internal_transactions(address=address)
77
+ return json.dumps(transactions)
78
+
79
+ @tool("fetch_whale_transactions")
80
+ def fetch_whale_transactions(self,
81
+ addresses: List[str],
82
+ token_address: Optional[str] = None,
83
+ min_token_amount: Optional[float] = None,
84
+ min_usd_value: Optional[float] = None) -> str:
85
+ """
86
+ Fetch whale transactions for a list of addresses
87
+
88
+ Args:
89
+ addresses: List of wallet addresses
90
+ token_address: Optional token contract address to filter by
91
+ min_token_amount: Minimum token amount
92
+ min_usd_value: Minimum USD value
93
+
94
+ Returns:
95
+ DataFrame of whale transactions as JSON string
96
+ """
97
+ transactions_df = self.client.fetch_whale_transactions(
98
+ addresses=addresses,
99
+ token_address=token_address,
100
+ min_token_amount=min_token_amount,
101
+ min_usd_value=min_usd_value
102
+ )
103
+ return transactions_df.to_json(orient="records")
104
+
105
+
106
+ # Tools for Gemini API
107
+ class GeminiTools:
108
+ def __init__(self, gemini_client: GeminiClient):
109
+ self.client = gemini_client
110
+
111
+ @tool("get_current_price")
112
+ def get_current_price(self, symbol: str) -> str:
113
+ """
114
+ Get the current price of a token
115
+
116
+ Args:
117
+ symbol: Token symbol (e.g., "ETHUSD")
118
+
119
+ Returns:
120
+ Current price
121
+ """
122
+ price = self.client.get_current_price(symbol=symbol)
123
+ return str(price) if price is not None else "Price not found"
124
+
125
+ @tool("get_historical_prices")
126
+ def get_historical_prices(self,
127
+ symbol: str,
128
+ start_time: str,
129
+ end_time: str) -> str:
130
+ """
131
+ Get historical prices for a token within a time range
132
+
133
+ Args:
134
+ symbol: Token symbol (e.g., "ETHUSD")
135
+ start_time: Start datetime in ISO format
136
+ end_time: End datetime in ISO format
137
+
138
+ Returns:
139
+ DataFrame of historical prices as JSON string
140
+ """
141
+ # Parse datetime strings
142
+ start_time_dt = datetime.fromisoformat(start_time.replace('Z', '+00:00'))
143
+ end_time_dt = datetime.fromisoformat(end_time.replace('Z', '+00:00'))
144
+
145
+ prices_df = self.client.get_historical_prices(
146
+ symbol=symbol,
147
+ start_time=start_time_dt,
148
+ end_time=end_time_dt
149
+ )
150
+
151
+ if prices_df is not None:
152
+ return prices_df.to_json(orient="records")
153
+ else:
154
+ return "[]"
155
+
156
+ @tool("get_price_impact")
157
+ def get_price_impact(self,
158
+ symbol: str,
159
+ transaction_time: str,
160
+ lookback_minutes: int = 5,
161
+ lookahead_minutes: int = 5) -> str:
162
+ """
163
+ Analyze the price impact before and after a transaction
164
+
165
+ Args:
166
+ symbol: Token symbol (e.g., "ETHUSD")
167
+ transaction_time: Transaction datetime in ISO format
168
+ lookback_minutes: Minutes to look back before the transaction
169
+ lookahead_minutes: Minutes to look ahead after the transaction
170
+
171
+ Returns:
172
+ Price impact data as JSON string
173
+ """
174
+ # Parse datetime string
175
+ transaction_time_dt = datetime.fromisoformat(transaction_time.replace('Z', '+00:00'))
176
+
177
+ impact_data = self.client.get_price_impact(
178
+ symbol=symbol,
179
+ transaction_time=transaction_time_dt,
180
+ lookback_minutes=lookback_minutes,
181
+ lookahead_minutes=lookahead_minutes
182
+ )
183
+
184
+ # Convert to JSON string
185
+ result = {
186
+ "pre_price": impact_data["pre_price"],
187
+ "post_price": impact_data["post_price"],
188
+ "impact_pct": impact_data["impact_pct"]
189
+ }
190
+ return json.dumps(result)
191
+
192
+
193
+ # Tools for Data Processor
194
+ class DataProcessorTools:
195
+ def __init__(self, data_processor: DataProcessor):
196
+ self.processor = data_processor
197
+
198
+ @tool("aggregate_transactions")
199
+ def aggregate_transactions(self,
200
+ transactions_json: str,
201
+ time_window: str = 'D') -> str:
202
+ """
203
+ Aggregate transactions by time window
204
+
205
+ Args:
206
+ transactions_json: JSON string of transactions
207
+ time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)
208
+
209
+ Returns:
210
+ Aggregated DataFrame as JSON string
211
+ """
212
+ # Convert JSON to DataFrame
213
+ transactions_df = pd.read_json(transactions_json)
214
+
215
+ # Process data
216
+ agg_df = self.processor.aggregate_transactions(
217
+ transactions_df=transactions_df,
218
+ time_window=time_window
219
+ )
220
+
221
+ # Convert result to JSON
222
+ return agg_df.to_json(orient="records")
223
+
224
+ @tool("identify_patterns")
225
+ def identify_patterns(self,
226
+ transactions_json: str,
227
+ n_clusters: int = 3) -> str:
228
+ """
229
+ Identify trading patterns using clustering
230
+
231
+ Args:
232
+ transactions_json: JSON string of transactions
233
+ n_clusters: Number of clusters for K-Means
234
+
235
+ Returns:
236
+ List of pattern dictionaries as JSON string
237
+ """
238
+ # Convert JSON to DataFrame
239
+ transactions_df = pd.read_json(transactions_json)
240
+
241
+ # Process data
242
+ patterns = self.processor.identify_patterns(
243
+ transactions_df=transactions_df,
244
+ n_clusters=n_clusters
245
+ )
246
+
247
+ # Convert result to JSON
248
+ result = []
249
+ for pattern in patterns:
250
+ # Convert non-serializable objects to serializable format
251
+ pattern_json = {
252
+ "name": pattern["name"],
253
+ "description": pattern["description"],
254
+ "cluster_id": pattern["cluster_id"],
255
+ "occurrence_count": pattern["occurrence_count"],
256
+ "confidence": pattern["confidence"],
257
+ # Skip chart_data as it's not JSON serializable
258
+ "examples": pattern["examples"].to_json(orient="records") if isinstance(pattern["examples"], pd.DataFrame) else []
259
+ }
260
+ result.append(pattern_json)
261
+
262
+ return json.dumps(result)
263
+
264
+ @tool("detect_anomalous_transactions")
265
+ def detect_anomalous_transactions(self,
266
+ transactions_json: str,
267
+ sensitivity: str = "Medium") -> str:
268
+ """
269
+ Detect anomalous transactions using statistical methods
270
+
271
+ Args:
272
+ transactions_json: JSON string of transactions
273
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
274
+
275
+ Returns:
276
+ DataFrame of anomalous transactions as JSON string
277
+ """
278
+ # Convert JSON to DataFrame
279
+ transactions_df = pd.read_json(transactions_json)
280
+
281
+ # Process data
282
+ anomalies_df = self.processor.detect_anomalous_transactions(
283
+ transactions_df=transactions_df,
284
+ sensitivity=sensitivity
285
+ )
286
+
287
+ # Convert result to JSON
288
+ return anomalies_df.to_json(orient="records")
289
+
290
+ @tool("analyze_price_impact")
291
+ def analyze_price_impact(self,
292
+ transactions_json: str,
293
+ price_data_json: str) -> str:
294
+ """
295
+ Analyze the price impact of transactions
296
+
297
+ Args:
298
+ transactions_json: JSON string of transactions
299
+ price_data_json: JSON string of price impact data
300
+
301
+ Returns:
302
+ Price impact analysis as JSON string
303
+ """
304
+ # Convert JSON to DataFrame
305
+ transactions_df = pd.read_json(transactions_json)
306
+
307
+ # Convert price_data_json to dictionary
308
+ price_data = json.loads(price_data_json)
309
+
310
+ # Process data
311
+ impact_analysis = self.processor.analyze_price_impact(
312
+ transactions_df=transactions_df,
313
+ price_data=price_data
314
+ )
315
+
316
+ # Convert result to JSON (excluding non-serializable objects)
317
+ result = {
318
+ "avg_impact_pct": impact_analysis.get("avg_impact_pct"),
319
+ "max_impact_pct": impact_analysis.get("max_impact_pct"),
320
+ "min_impact_pct": impact_analysis.get("min_impact_pct"),
321
+ "significant_moves_count": impact_analysis.get("significant_moves_count"),
322
+ "total_transactions": impact_analysis.get("total_transactions"),
323
+ # Skip impact_chart as it's not JSON serializable
324
+ "transactions_with_impact": impact_analysis.get("transactions_with_impact").to_json(orient="records") if "transactions_with_impact" in impact_analysis else []
325
+ }
326
+
327
+ return json.dumps(result)
328
+
329
+ @tool("detect_wash_trading")
330
+ def detect_wash_trading(self,
331
+ transactions_json: str,
332
+ addresses_json: str,
333
+ sensitivity: str = "Medium") -> str:
334
+ """
335
+ Detect potential wash trading between addresses
336
+
337
+ Args:
338
+ transactions_json: JSON string of transactions
339
+ addresses_json: JSON string of addresses to analyze
340
+ sensitivity: Detection sensitivity ("Low", "Medium", "High")
341
+
342
+ Returns:
343
+ List of potential wash trading incidents as JSON string
344
+ """
345
+ # Convert JSON to DataFrame
346
+ transactions_df = pd.read_json(transactions_json)
347
+
348
+ # Convert addresses_json to list
349
+ addresses = json.loads(addresses_json)
350
+
351
+ # Process data
352
+ wash_trades = self.processor.detect_wash_trading(
353
+ transactions_df=transactions_df,
354
+ addresses=addresses,
355
+ sensitivity=sensitivity
356
+ )
357
+
358
+ # Convert result to JSON (excluding non-serializable objects)
359
+ result = []
360
+ for trade in wash_trades:
361
+ trade_json = {
362
+ "type": trade["type"],
363
+ "addresses": trade["addresses"],
364
+ "risk_level": trade["risk_level"],
365
+ "description": trade["description"],
366
+ "detection_time": trade["detection_time"],
367
+ "title": trade["title"],
368
+ "evidence": trade["evidence"].to_json(orient="records") if isinstance(trade["evidence"], pd.DataFrame) else []
369
+ # Skip chart as it's not JSON serializable
370
+ }
371
+ result.append(trade_json)
372
+
373
+ return json.dumps(result)
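
Every wrapper in this module follows the same convention: DataFrames and lists travel between agent steps as JSON strings, serialized with `to_json(orient="records")` on the way out and parsed with `pd.read_json` / `json.loads` on the way in. A small sketch of that round trip, with invented sample rows, follows.

```python
import json
import pandas as pd

# Invented sample transfers; only the round-trip convention matters here.
transactions_df = pd.DataFrame({
    "hash": ["0xaaa", "0xbbb"],
    "from": ["0xwhale1", "0xwhale2"],
    "to":   ["0xwhale2", "0xwhale1"],
    "value": [1_000_000, 2_000_000],
})

transactions_json = transactions_df.to_json(orient="records")  # what a tool returns
round_tripped = pd.read_json(transactions_json)                # what a tool accepts
# (newer pandas prefers wrapping the string in io.StringIO before read_json)

addresses_json = json.dumps(["0xwhale1", "0xwhale2"])          # list-style argument
assert json.loads(addresses_json) == ["0xwhale1", "0xwhale2"]
```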
modules/visualizer.py ADDED
@@ -0,0 +1,638 @@
1
+ import pandas as pd
2
+ import numpy as np
3
+ import plotly.graph_objects as go
4
+ import plotly.express as px
5
+ from datetime import datetime, timedelta
6
+ from typing import Dict, List, Optional, Union, Any, Tuple
7
+ import io
8
+ import base64
9
+ import matplotlib.pyplot as plt
10
+ from matplotlib.backends.backend_pdf import PdfPages
11
+ from reportlab.lib.pagesizes import letter
12
+ from reportlab.pdfgen import canvas
13
+ from reportlab.lib import colors
14
+ from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer
15
+ from reportlab.lib.styles import getSampleStyleSheet
16
+
17
+
18
+ class Visualizer:
19
+ """
20
+ Generate visualizations and reports for whale transaction data
21
+ """
22
+
23
+ def __init__(self):
24
+ self.color_map = {
25
+ "buy": "green",
26
+ "sell": "red",
27
+ "transfer": "blue",
28
+ "other": "gray"
29
+ }
30
+
31
+ def create_transaction_timeline(self, transactions_df: pd.DataFrame) -> go.Figure:
32
+ """
33
+ Create a timeline visualization of transactions
34
+
35
+ Args:
36
+ transactions_df: DataFrame of transactions
37
+
38
+ Returns:
39
+ Plotly figure object
40
+ """
41
+ if transactions_df.empty:
42
+ fig = go.Figure()
43
+ fig.update_layout(
44
+ title="No Transaction Data Available",
45
+ xaxis_title="Date",
46
+ yaxis_title="Action",
47
+ height=400,
48
+ template="plotly_white"
49
+ )
50
+ fig.add_annotation(
51
+ text="No transaction data available for timeline",
52
+ showarrow=False,
53
+ font=dict(size=14)
54
+ )
55
+ return fig
56
+
57
+ try:
58
+ # Ensure timestamp column exists
59
+ if 'Timestamp' in transactions_df.columns:
60
+ timestamp_col = 'Timestamp'
61
+ elif 'timeStamp' in transactions_df.columns:
62
+ timestamp_col = 'timeStamp'
63
+ # Convert timestamp to datetime if it's not already
64
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
65
+ try:
66
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col].astype(float), unit='s')
67
+ except Exception as e:
68
+ print(f"Error converting timestamp: {str(e)}")
69
+ transactions_df[timestamp_col] = pd.date_range(start='2025-01-01', periods=len(transactions_df), freq='H')
70
+ else:
71
+ # Create a dummy timestamp if none exists
72
+ transactions_df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(transactions_df), freq='H')
73
+ timestamp_col = 'dummy_timestamp'
74
+
75
+ # Create figure
76
+ fig = go.Figure()
77
+
78
+ # Add transactions to timeline
79
+ for idx, row in transactions_df.iterrows():
80
+ # Determine transaction type
81
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
82
+ from_col, to_col = 'From', 'To'
83
+ else:
84
+ from_col, to_col = 'from', 'to'
85
+
86
+ tx_type = "other"
87
+ hover_text = ""
88
+
89
+ if pd.isna(row[from_col]) or row[from_col] == '0x0000000000000000000000000000000000000000':
90
+ tx_type = "buy"
91
+ hover_text = f"Buy: {row[to_col]}"
92
+ elif pd.isna(row[to_col]) or row[to_col] == '0x0000000000000000000000000000000000000000':
93
+ tx_type = "sell"
94
+ hover_text = f"Sell: {row[from_col]}"
95
+ else:
96
+ tx_type = "transfer"
97
+ hover_text = f"Transfer: {row[from_col]} β†’ {row[to_col]}"
98
+
99
+ # Add amount to hover text if available
100
+ if 'Amount' in row:
101
+ hover_text += f"<br>Amount: {row['Amount']}"
102
+ elif 'value' in row:
103
+ hover_text += f"<br>Value: {row['value']}"
104
+
105
+ # Add token info if available
106
+ if 'tokenSymbol' in row:
107
+ hover_text += f"<br>Token: {row['tokenSymbol']}"
108
+
109
+ # Add transaction to timeline
110
+ fig.add_trace(go.Scatter(
111
+ x=[row[timestamp_col]],
112
+ y=[tx_type],
113
+ mode='markers',
114
+ marker=dict(
115
+ size=12,
116
+ color=self.color_map.get(tx_type, "gray"),
117
+ line=dict(width=1, color='black')
118
+ ),
119
+ name=tx_type,
120
+ text=hover_text,
121
+ hoverinfo='text'
122
+ ))
123
+
124
+ # Update layout
125
+ fig.update_layout(
126
+ title='Whale Transaction Timeline',
127
+ xaxis_title='Time',
128
+ yaxis_title='Transaction Type',
129
+ height=400,
130
+ template='plotly_white',
131
+ showlegend=True,
132
+ hovermode='closest'
133
+ )
134
+
135
+ return fig
136
+
137
+ except Exception as e:
138
+ # If any error occurs, return a figure with error information
139
+ print(f"Error creating transaction timeline: {str(e)}")
140
+ fig = go.Figure()
141
+ fig.update_layout(
142
+ title="Error in Transaction Timeline",
143
+ xaxis_title="",
144
+ yaxis_title="",
145
+ height=400,
146
+ template="plotly_white"
147
+ )
148
+ fig.add_annotation(
149
+ text=f"Error generating timeline: {str(e)}",
150
+ showarrow=False,
151
+ font=dict(size=14, color="red")
152
+ )
153
+ return fig
154
+
155
+ def create_volume_chart(self, transactions_df: pd.DataFrame, time_window: str = 'D') -> go.Figure:
156
+ """
157
+ Create a volume chart aggregated by time window
158
+
159
+ Args:
160
+ transactions_df: DataFrame of transactions
161
+ time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)
162
+
163
+ Returns:
164
+ Plotly figure object
165
+ """
166
+ # Create an empty figure with appropriate message if no data
167
+ if transactions_df.empty:
168
+ fig = go.Figure()
169
+ fig.update_layout(
170
+ title="No Transaction Data Available",
171
+ xaxis_title="Date",
172
+ yaxis_title="Volume",
173
+ height=400,
174
+ template="plotly_white"
175
+ )
176
+ fig.add_annotation(
177
+ text="No transactions found for volume analysis",
178
+ showarrow=False,
179
+ font=dict(size=14)
180
+ )
181
+ return fig
182
+
183
+ try:
184
+ # Create a deep copy to avoid modifying the original
185
+ df = transactions_df.copy()
186
+
187
+ # Ensure timestamp column exists and convert to datetime
188
+ if 'Timestamp' in df.columns:
189
+ timestamp_col = 'Timestamp'
190
+ elif 'timeStamp' in df.columns:
191
+ timestamp_col = 'timeStamp'
192
+ else:
193
+ # Create a dummy timestamp if none exists
194
+ df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
195
+ timestamp_col = 'dummy_timestamp'
196
+
197
+ # Convert timestamp to datetime safely
198
+ if not pd.api.types.is_datetime64_any_dtype(df[timestamp_col]):
199
+ try:
200
+ df[timestamp_col] = pd.to_datetime(df[timestamp_col].astype(float), unit='s')
201
+ except Exception as e:
202
+ print(f"Error converting timestamp: {str(e)}")
203
+ df[timestamp_col] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
204
+
205
+ # Ensure amount column exists
206
+ if 'Amount' in df.columns:
207
+ amount_col = 'Amount'
208
+ elif 'tokenAmount' in df.columns:
209
+ amount_col = 'tokenAmount'
210
+ elif 'value' in df.columns:
211
+ # Try to adjust for decimals if 'tokenDecimal' exists
212
+ if 'tokenDecimal' in df.columns:
213
+ df['adjustedValue'] = df['value'].astype(float) / (10 ** df['tokenDecimal'].astype(int))
214
+ amount_col = 'adjustedValue'
215
+ else:
216
+ amount_col = 'value'
217
+ else:
218
+ # Create a dummy amount column if none exists
219
+ df['dummy_amount'] = 1.0
220
+ amount_col = 'dummy_amount'
221
+
222
+ # Alternative approach: manually aggregate by date to avoid index issues
223
+ df['date'] = df[timestamp_col].dt.date
224
+
225
+ # Group by date
226
+ volume_data = df.groupby('date').agg({
227
+ amount_col: 'sum',
228
+ timestamp_col: 'count'
229
+ }).reset_index()
230
+
231
+ volume_data.columns = ['Date', 'Volume', 'Count']
232
+
233
+ # Create figure
234
+ fig = go.Figure()
235
+
236
+ # Add volume bars
237
+ fig.add_trace(go.Bar(
238
+ x=volume_data['Date'],
239
+ y=volume_data['Volume'],
240
+ name='Volume',
241
+ marker_color='blue',
242
+ opacity=0.7
243
+ ))
244
+
245
+ # Add transaction count line
246
+ fig.add_trace(go.Scatter(
247
+ x=volume_data['Date'],
248
+ y=volume_data['Count'],
249
+ name='Transaction Count',
250
+ mode='lines+markers',
251
+ marker=dict(color='red'),
252
+ yaxis='y2'
253
+ ))
254
+
255
+ # Update layout
256
+ fig.update_layout(
257
+ title="Transaction Volume Over Time",
258
+ xaxis_title="Date",
259
+ yaxis_title="Volume",
260
+ yaxis2=dict(
261
+ title="Transaction Count",
262
+ overlaying="y",
263
+ side="right"
264
+ ),
265
+ height=500,
266
+ template="plotly_white",
267
+ hovermode="x unified",
268
+ legend=dict(
269
+ orientation="h",
270
+ yanchor="bottom",
271
+ y=1.02,
272
+ xanchor="right",
273
+ x=1
274
+ )
275
+ )
276
+
277
+ return fig
278
+
279
+ except Exception as e:
280
+ # If any error occurs, return a figure with error information
281
+ print(f"Error in create_volume_chart: {str(e)}")
282
+ fig = go.Figure()
283
+ fig.update_layout(
284
+ title="Error in Volume Chart",
285
+ xaxis_title="",
286
+ yaxis_title="",
287
+ height=400,
288
+ template="plotly_white"
289
+ )
290
+ fig.add_annotation(
291
+ text=f"Error generating volume chart: {str(e)}",
292
+ showarrow=False,
293
+ font=dict(size=14, color="red")
294
+ )
295
+ return fig
296
+
297
+ def plot_volume_by_day(self, transactions_df: pd.DataFrame) -> go.Figure:
298
+ """
299
+ Create a volume chart aggregated by day with improved visualization
300
+
301
+ Args:
302
+ transactions_df: DataFrame of transactions
303
+
304
+ Returns:
305
+ Plotly figure object
306
+ """
307
+ # This is a wrapper around create_volume_chart that specifically uses day as the time window
308
+ return self.create_volume_chart(transactions_df, time_window='D')
309
+
310
+ def plot_transaction_flow(self, transactions_df: pd.DataFrame) -> go.Figure:
311
+ """
312
+ Create a network flow visualization of transactions between wallets
313
+
314
+ Args:
315
+ transactions_df: DataFrame of transactions
316
+
317
+ Returns:
318
+ Plotly figure object
319
+ """
320
+ if transactions_df.empty:
321
+ # Return empty figure if no data
322
+ fig = go.Figure()
323
+ fig.update_layout(
324
+ title="No Transaction Flow Data Available",
325
+ xaxis_title="",
326
+ yaxis_title="",
327
+ height=400,
328
+ template="plotly_white"
329
+ )
330
+ fig.add_annotation(
331
+ text="No transactions found for flow analysis",
332
+ showarrow=False,
333
+ font=dict(size=14)
334
+ )
335
+ return fig
336
+
337
+ try:
338
+ # Ensure from/to columns exist
339
+ if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
340
+ from_col, to_col = 'From', 'To'
341
+ elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
342
+ from_col, to_col = 'from', 'to'
343
+ else:
344
+ # Create an error visualization
345
+ fig = go.Figure()
346
+ fig.update_layout(
347
+ title="Transaction Flow Error",
348
+ xaxis_title="",
349
+ yaxis_title="",
350
+ height=400,
351
+ template="plotly_white"
352
+ )
353
+ fig.add_annotation(
354
+ text="From/To columns not found in transactions data",
355
+ showarrow=False,
356
+ font=dict(size=14, color="red")
357
+ )
358
+ return fig
359
+
360
+ # Ensure amount column exists
361
+ if 'Amount' in transactions_df.columns:
362
+ amount_col = 'Amount'
363
+ elif 'tokenAmount' in transactions_df.columns:
364
+ amount_col = 'tokenAmount'
365
+ elif 'value' in transactions_df.columns:
366
+ # Try to adjust for decimals if 'tokenDecimal' exists
367
+ if 'tokenDecimal' in transactions_df.columns:
368
+ transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
369
+ amount_col = 'adjustedValue'
370
+ else:
371
+ amount_col = 'value'
372
+ else:
373
+ # Create an error visualization
374
+ fig = go.Figure()
375
+ fig.update_layout(
376
+ title="Transaction Flow Error",
377
+ xaxis_title="",
378
+ yaxis_title="",
379
+ height=400,
380
+ template="plotly_white"
381
+ )
382
+ fig.add_annotation(
383
+ text="Amount column not found in transactions data",
384
+ showarrow=False,
385
+ font=dict(size=14, color="red")
386
+ )
387
+ return fig
388
+
389
+ # Aggregate flows between wallets
390
+ flow_df = transactions_df.groupby([from_col, to_col]).agg({
391
+ amount_col: ['sum', 'count']
392
+ }).reset_index()
393
+
394
+ flow_df.columns = [from_col, to_col, 'Value', 'Count']
395
+
396
+ # Limit to top 20 flows to keep visualization readable
397
+ top_flows = flow_df.sort_values('Value', ascending=False).head(20)
398
+
399
+ # Create Sankey diagram
400
+ # First, create a mapping of unique addresses to indices
401
+ all_addresses = pd.unique(top_flows[[from_col, to_col]].values.ravel('K'))
402
+ address_to_idx = {addr: i for i, addr in enumerate(all_addresses)}
403
+
404
+ # Create source, target, and value arrays for the Sankey diagram
405
+ sources = [address_to_idx[addr] for addr in top_flows[from_col]]
406
+ targets = [address_to_idx[addr] for addr in top_flows[to_col]]
407
+ values = top_flows['Value'].tolist()
408
+
409
+ # Create hover text
410
+ hover_text = [f"From: {src}<br>To: {tgt}<br>Value: {val:.2f}<br>Count: {cnt}"
411
+ for src, tgt, val, cnt in zip(top_flows[from_col], top_flows[to_col],
412
+ top_flows['Value'], top_flows['Count'])]
413
+
414
+ # Shorten addresses for node labels
415
+ node_labels = [f"{addr[:6]}...{addr[-4:]}" if len(addr) > 12 else addr
416
+ for addr in all_addresses]
417
+
418
+ # Create Sankey diagram figure
419
+ fig = go.Figure(data=[go.Sankey(
420
+ node=dict(
421
+ pad=15,
422
+ thickness=20,
423
+ line=dict(color="black", width=0.5),
424
+ label=node_labels,
425
+ color="blue"
426
+ ),
427
+ link=dict(
428
+ source=sources,
429
+ target=targets,
430
+ value=values,
431
+ label=hover_text,
432
+ hovertemplate='%{label}<extra></extra>'
433
+ )
434
+ )])
435
+
436
+ fig.update_layout(
437
+ title="Whale Transaction Flow",
438
+ font_size=12,
439
+ height=600,
440
+ template="plotly_white"
441
+ )
442
+
443
+ return fig
444
+
445
+ except Exception as e:
446
+ # If any error occurs, return a figure with error information
447
+ print(f"Error in plot_transaction_flow: {str(e)}")
448
+ fig = go.Figure()
449
+ fig.update_layout(
450
+ title="Error in Transaction Flow",
451
+ xaxis_title="",
452
+ yaxis_title="",
453
+ height=400,
454
+ template="plotly_white"
455
+ )
456
+ fig.add_annotation(
457
+ text=f"Error generating transaction flow: {str(e)}",
458
+ showarrow=False,
459
+ font=dict(size=14, color="red")
460
+ )
461
+ return fig
462
+
463
+ def generate_pdf_report(self,
464
+ transactions_df: pd.DataFrame,
465
+ patterns: List[Dict[str, Any]] = None,
466
+ price_impact: Dict[str, Any] = None,
467
+ alerts: List[Dict[str, Any]] = None,
468
+ title: str = "Whale Analysis Report",
469
+ start_date: datetime = None,
470
+ end_date: datetime = None) -> bytes:
471
+ """
472
+ Generate a PDF report of whale activity
473
+
474
+ Args:
475
+ transactions_df: DataFrame of transactions
476
+ patterns: List of pattern dictionaries
477
+ price_impact: Dictionary of price impact analysis
478
+ alerts: List of alert dictionaries
479
+ title: Report title
480
+ start_date: Start date for report period
481
+ end_date: End date for report period
482
+
483
+ Returns:
484
+ PDF report as bytes
485
+ """
486
+ buffer = io.BytesIO()
487
+ doc = SimpleDocTemplate(buffer, pagesize=letter)
488
+ elements = []
489
+
490
+ # Add title
491
+ styles = getSampleStyleSheet()
492
+ elements.append(Paragraph(title, styles['Title']))
493
+
494
+ # Add date range
495
+ if start_date and end_date:
496
+ date_range = f"Period: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}"
497
+ elements.append(Paragraph(date_range, styles['Heading2']))
498
+
499
+ elements.append(Spacer(1, 12))
500
+
501
+ # Add transaction summary
502
+ if not transactions_df.empty:
503
+ elements.append(Paragraph("Transaction Summary", styles['Heading2']))
504
+ summary_data = [
505
+ ["Total Transactions", str(len(transactions_df))],
506
+ ["Unique Addresses", str(len(pd.unique(transactions_df['from'].tolist() + transactions_df['to'].tolist())))]
507
+ ]
508
+
509
+ # Add token breakdown if available
510
+ if 'tokenSymbol' in transactions_df.columns:
511
+ token_counts = transactions_df['tokenSymbol'].value_counts()
512
+ summary_data.append(["Most Common Token", f"{token_counts.index[0]} ({token_counts.iloc[0]} txns)"])
513
+
514
+ summary_table = Table(summary_data)
515
+ summary_table.setStyle(TableStyle([
516
+ ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
517
+ ('GRID', (0, 0), (-1, -1), 1, colors.black),
518
+ ('PADDING', (0, 0), (-1, -1), 6),
519
+ ]))
520
+ elements.append(summary_table)
521
+ elements.append(Spacer(1, 12))
522
+
523
+ # Add pattern analysis
524
+ if patterns:
525
+ elements.append(Paragraph("Trading Patterns Detected", styles['Heading2']))
526
+ for i, pattern in enumerate(patterns):
527
+ pattern_text = f"Pattern {i+1}: {pattern.get('name', 'Unnamed')}\n"
528
+ pattern_text += f"Description: {pattern.get('description', 'No description')}\n"
529
+ if 'risk_profile' in pattern:
530
+ pattern_text += f"Risk Profile: {pattern['risk_profile']}\n"
531
+ if 'confidence' in pattern:
532
+ pattern_text += f"Confidence: {pattern['confidence']:.2f}\n"
533
+
534
+ elements.append(Paragraph(pattern_text, styles['Normal']))
535
+ elements.append(Spacer(1, 6))
536
+
537
+ elements.append(Spacer(1, 12))
538
+
539
+ # Add price impact analysis
540
+ if price_impact:
541
+ elements.append(Paragraph("Price Impact Analysis", styles['Heading2']))
542
+ impact_text = ""
543
+ if 'avg_impact' in price_impact:
544
+ impact_text += f"Average Impact: {price_impact['avg_impact']:.2f}%\n"
545
+ if 'max_impact' in price_impact:
546
+ impact_text += f"Maximum Impact: {price_impact['max_impact']:.2f}%\n"
547
+ if 'insights' in price_impact:
548
+ impact_text += f"Insights: {price_impact['insights']}\n"
549
+
550
+ elements.append(Paragraph(impact_text, styles['Normal']))
551
+ elements.append(Spacer(1, 12))
552
+
553
+ # Add alerts
554
+ if alerts:
555
+ elements.append(Paragraph("Alerts", styles['Heading2']))
556
+ for alert in alerts:
557
+ alert_text = f"{alert.get('level', 'Info')}: {alert.get('message', 'No details')}"
558
+ elements.append(Paragraph(alert_text, styles['Normal']))
559
+ elements.append(Spacer(1, 6))
560
+
561
+ # Build the PDF
562
+ doc.build(elements)
563
+ buffer.seek(0)
564
+ return buffer.getvalue()
565
+
566
+ def generate_csv_report(self,
567
+ transactions_df: pd.DataFrame,
568
+ report_type: str = "Transaction Summary") -> str:
569
+ """
570
+ Generate a CSV report of transaction data
571
+
572
+ Args:
573
+ transactions_df: DataFrame of transactions
574
+ report_type: Type of report to generate
575
+
576
+ Returns:
577
+ CSV data as string
578
+ """
579
+ if transactions_df.empty:
580
+ return "No data available for report"
581
+
582
+ if report_type == "Transaction Summary":
583
+ # Return basic transaction summary
584
+ return transactions_df.to_csv(index=False)
585
+ elif report_type == "Daily Volume":
586
+ # Get timestamp column
587
+ if 'Timestamp' in transactions_df.columns:
588
+ timestamp_col = 'Timestamp'
589
+ elif 'timeStamp' in transactions_df.columns:
590
+ timestamp_col = 'timeStamp'
591
+ # Convert timestamp to datetime if needed
592
+ if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
593
+ try:
594
+ transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col].astype(float), unit='s')
595
+ except Exception:
596
+ return "Error processing timestamp data"
597
+ else:
598
+ return "Timestamp column not found"
599
+
600
+ # Get amount column
601
+ if 'Amount' in transactions_df.columns:
602
+ amount_col = 'Amount'
603
+ elif 'tokenAmount' in transactions_df.columns:
604
+ amount_col = 'tokenAmount'
605
+ elif 'value' in transactions_df.columns:
606
+ amount_col = 'value'
607
+ else:
608
+ return "Amount column not found"
609
+
610
+ # Aggregate by day
611
+ transactions_df['date'] = transactions_df[timestamp_col].dt.date
612
+ daily_volume = transactions_df.groupby('date').agg({
613
+ amount_col: 'sum',
614
+ 'hash': 'count' # Assuming 'hash' exists for all transactions
615
+ }).reset_index()
616
+
617
+ daily_volume.columns = ['Date', 'Volume', 'Transactions']
618
+ return daily_volume.to_csv(index=False)
619
+ else:
620
+ return "Unknown report type"
621
+
622
+ def generate_png_chart(self,
623
+ fig: go.Figure,
624
+ width: int = 1200,
625
+ height: int = 800) -> bytes:
626
+ """
627
+ Convert a Plotly figure to PNG image data
628
+
629
+ Args:
630
+ fig: Plotly figure object
631
+ width: Image width in pixels
632
+ height: Image height in pixels
633
+
634
+ Returns:
635
+ PNG image as bytes
636
+ """
637
+ img_bytes = fig.to_image(format="png", width=width, height=height)
638
+ return img_bytes
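The report helpers above all build their output in memory and return plain bytes or strings, which is exactly what a Streamlit download widget expects. As a quick illustration of the reportlab "platypus" flow that `generate_pdf_report` is built on, here is a minimal standalone sketch (illustrative only; the summary values are made up, and it assumes `reportlab` from requirements.txt is installed):

```python
import io

from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle

# Flowables are collected in a list and rendered into an in-memory buffer,
# mirroring how generate_pdf_report assembles its elements.
buffer = io.BytesIO()
doc = SimpleDocTemplate(buffer, pagesize=letter)
styles = getSampleStyleSheet()

elements = [
    Paragraph("Whale Analysis Report", styles["Title"]),
    Spacer(1, 12),
    Table(
        [["Total Transactions", "42"], ["Unique Addresses", "17"]],  # made-up summary rows
        style=TableStyle([
            ("BACKGROUND", (0, 0), (0, -1), colors.lightgrey),
            ("GRID", (0, 0), (-1, -1), 1, colors.black),
        ]),
    ),
]

doc.build(elements)            # renders the flowables into the buffer
pdf_bytes = buffer.getvalue()  # same kind of bytes generate_pdf_report returns

with open("sample_report.pdf", "wb") as f:
    f.write(pdf_bytes)
```

In the app itself, the returned bytes would presumably be handed straight to `st.download_button(..., data=pdf_bytes, mime="application/pdf")`, and the CSV and PNG helpers can be wired up the same way.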
requirements.txt ADDED
@@ -0,0 +1,12 @@
1
+ streamlit==1.30.0
2
+ pandas==2.1.1
3
+ numpy==1.26.0
4
+ matplotlib==3.8.0
5
+ plotly==5.18.0
6
+ python-dotenv==1.0.0
7
+ requests==2.31.0
8
+ scikit-learn==1.3.1
9
+ crewai>=0.28.0
10
+ langchain>=0.1.0,<0.2.0
11
+ reportlab==4.0.5
12
+ weasyprint==60.1
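Note: `generate_png_chart` relies on Plotly's static-image export (`fig.to_image`), which needs a rendering engine such as `kaleido` installed at runtime; that engine is not pinned in this list, so if PNG export is actually used in the deployed Space, a `kaleido` entry would most likely need to be added alongside `plotly`.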
test_api.py ADDED
@@ -0,0 +1,205 @@
1
+ import os
2
+ import sys
3
+ import json
4
+ import urllib.request
5
+ import urllib.parse
6
+ import urllib.error
7
+ from urllib.error import URLError, HTTPError
8
+
9
+ # Simple dotenv implementation since the module may not be available
10
+ def load_dotenv():
11
+ try:
12
+ with open('.env', 'r') as file:
13
+ for line in file:
14
+ line = line.strip()
15
+ if not line or line.startswith('#') or '=' not in line:
16
+ continue
17
+ key, value = line.split('=', 1)
18
+ os.environ[key] = value
19
+ except Exception as e:
20
+ print(f"Error loading .env file: {e}")
21
+ return False
22
+ return True
23
+
24
+ # Load environment variables
25
+ load_dotenv()
26
+
27
+ # Get API key from .env
28
+ ARBISCAN_API_KEY = os.getenv("ARBISCAN_API_KEY")
29
+ if not ARBISCAN_API_KEY:
30
+ print("ERROR: ARBISCAN_API_KEY not found in .env file")
31
+ sys.exit(1)
32
+
33
+ print(f"Using Arbiscan API Key: {ARBISCAN_API_KEY[:5]}...")
34
+
35
+ # Test addresses (known active ones)
36
+ TEST_ADDRESSES = [
37
+ "0x5d8908afee1df9f7f0830105f8be828f97ce9e68", # Arbitrum Treasury
38
+ "0x2b1ad6184a6b0fac06bd225ed37c2abc04415ff4", # Large holder
39
+ "0xc47ff7f9efb3ef39c33a2c492a1372418d399ec2", # Active trader
40
+ ]
41
+
42
+ # User-provided addresses (from command line arguments)
43
+ if len(sys.argv) > 1:
44
+ USER_ADDRESSES = sys.argv[1:]
45
+ TEST_ADDRESSES.extend(USER_ADDRESSES)
46
+ print(f"Added user-provided addresses: {USER_ADDRESSES}")
47
+
48
+ def test_api_key():
49
+ """Test if the API key is valid"""
50
+ base_url = "https://api.arbiscan.io/api"
51
+ params = {
52
+ "module": "stats",
53
+ "action": "ethsupply",
54
+ "apikey": ARBISCAN_API_KEY
55
+ }
56
+
57
+ try:
58
+ print("\n===== TESTING API KEY =====")
59
+ # Construct URL with parameters
60
+ query_string = urllib.parse.urlencode(params)
61
+ url = f"{base_url}?{query_string}"
62
+ print(f"Making request to: {url}")
63
+
64
+ # Make the request
65
+ with urllib.request.urlopen(url) as response:
66
+ response_data = response.read().decode('utf-8')
67
+ data = json.loads(response_data)
68
+
69
+ print(f"Response status code: {response.status}")
70
+ print(f"Response JSON status: {data.get('status')}")
71
+ print(f"Response message: {data.get('message', 'No message')}")
72
+
73
+ if data.get("status") == "1":
74
+ print("βœ… API KEY IS VALID")
75
+ return True
76
+ else:
77
+ print("❌ API KEY IS INVALID OR HAS ISSUES")
78
+ if "API Key" in data.get("message", ""):
79
+ print(f"Error message: {data.get('message')}")
80
+ print("β†’ You need to register for an API key at https://arbiscan.io/myapikey")
81
+ return False
82
+
83
+ except HTTPError as e:
84
+ print(f"❌ HTTP Error: {e.code} - {e.reason}")
85
+ return False
86
+ except URLError as e:
87
+ print(f"❌ URL Error: {e.reason}")
88
+ return False
89
+ except Exception as e:
90
+ print(f"❌ Error testing API key: {str(e)}")
91
+ return False
92
+
93
+ def test_address(address):
94
+ """Test if an address has transactions on Arbitrum"""
95
+ base_url = "https://api.arbiscan.io/api"
96
+
97
+ # Test for token transfers
98
+ params_token = {
99
+ "module": "account",
100
+ "action": "tokentx",
101
+ "address": address,
102
+ "startblock": "0",
103
+ "endblock": "99999999",
104
+ "page": "1",
105
+ "offset": "10", # Just get 10 for testing
106
+ "sort": "desc",
107
+ "apikey": ARBISCAN_API_KEY
108
+ }
109
+
110
+ # Test for normal transactions
111
+ params_normal = {
112
+ "module": "account",
113
+ "action": "txlist",
114
+ "address": address,
115
+ "startblock": "0",
116
+ "endblock": "99999999",
117
+ "page": "1",
118
+ "offset": "10", # Just get 10 for testing
119
+ "sort": "desc",
120
+ "apikey": ARBISCAN_API_KEY
121
+ }
122
+
123
+ print(f"\n===== TESTING ADDRESS: {address} =====")
124
+
125
+ # Check token transfers
126
+ try:
127
+ print("Testing token transfers...")
128
+ # Construct URL with parameters
129
+ query_string = urllib.parse.urlencode(params_token)
130
+ url = f"{base_url}?{query_string}"
131
+
132
+ # Make the request
133
+ with urllib.request.urlopen(url) as response:
134
+ response_data = response.read().decode('utf-8')
135
+ data = json.loads(response_data)
136
+
137
+ if data.get("status") == "1":
138
+ transfers = data.get("result", [])
139
+ print(f"βœ… Found {len(transfers)} token transfers")
140
+ if transfers:
141
+ print(f"First transfer: {json.dumps(transfers[0], indent=2)[:200]}...")
142
+ else:
143
+ print(f"❌ No token transfers found: {data.get('message', 'Unknown error')}")
144
+
145
+ except HTTPError as e:
146
+ print(f"❌ HTTP Error: {e.code} - {e.reason}")
147
+ except URLError as e:
148
+ print(f"❌ URL Error: {e.reason}")
149
+ except Exception as e:
150
+ print(f"❌ Error testing token transfers: {str(e)}")
151
+
152
+ # Check normal transactions
153
+ try:
154
+ print("\nTesting normal transactions...")
155
+ # Construct URL with parameters
156
+ query_string = urllib.parse.urlencode(params_normal)
157
+ url = f"{base_url}?{query_string}"
158
+
159
+ # Make the request
160
+ with urllib.request.urlopen(url) as response:
161
+ response_data = response.read().decode('utf-8')
162
+ data = json.loads(response_data)
163
+
164
+ if data.get("status") == "1":
165
+ transactions = data.get("result", [])
166
+ print(f"βœ… Found {len(transactions)} normal transactions")
167
+ if transactions:
168
+ print(f"First transaction: {json.dumps(transactions[0], indent=2)[:200]}...")
169
+ else:
170
+ print(f"❌ No normal transactions found: {data.get('message', 'Unknown error')}")
171
+
172
+ except HTTPError as e:
173
+ print(f"❌ HTTP Error: {e.code} - {e.reason}")
174
+ except URLError as e:
175
+ print(f"❌ URL Error: {e.reason}")
176
+ except Exception as e:
177
+ print(f"❌ Error testing normal transactions: {str(e)}")
178
+
179
+ def main():
180
+ """Main function to run tests"""
181
+ print("=================================================")
182
+ print("Arbitrum API Diagnostic Tool")
183
+ print("=================================================")
184
+
185
+ # Test the API key first
186
+ api_valid = test_api_key()
187
+
188
+ if not api_valid:
189
+ print("\n⚠️ Please update your API key in the .env file")
190
+ print("Register for an API key at https://arbiscan.io/myapikey")
191
+ return
192
+
193
+ # Test each address
194
+ for address in TEST_ADDRESSES:
195
+ test_address(address)
196
+
197
+ print("\n=================================================")
198
+ print("RECOMMENDATIONS:")
199
+ print("1. If your API key is invalid, update it in the .env file")
200
+ print("2. If test addresses work but yours don't, your addresses might not have activity on Arbitrum")
201
+ print("3. Use one of the working test addresses in your app for testing")
202
+ print("=================================================")
203
+
204
+ if __name__ == "__main__":
205
+ main()
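The diagnostic script can be run on its own once the `.env` file is in place: `python test_api.py` validates the API key against the built-in test addresses, and any extra wallet addresses passed on the command line (for example, `python test_api.py 0xYourWalletAddress`) are appended to the list and checked the same way.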