Commit 6592f8a by maxhirez (verified), parent dc46890: Create README.md

# ShorNet
ShorNet is a multi-layer perceptron for predicting the two prime factors of large 1024-bit numbers given as hex strings. It is a dual-output-head model built with PyTorch using MPS acceleration on a Mac Studio (M3 Ultra), developed as an experiment and learning exercise.

## Background
ShorNet was conceived as a function approximation of Shor's prime-factorization algorithm for quantum computers. It was inspired by Google DeepMind's AlphaFold, in that protein folding was another problem long thought to be solvable in polynomial time only by quantum computers with far more usable qubits than exist in any current system. AlphaFold nevertheless achieved remarkable success at predicting the complex process of protein folding, including predictions for the complete human proteome, 20 model organisms representing over 365,000 predicted proteins, and the structure of the SARS-CoV-2 virus.

## Sample usage
```
import torch
import torch.nn as nn

# Input your 1024-bit hex number to factorize
number_to_factorize = "0x7703af0000000000fa6ead00000000008d2a480000000000772ba480000000007c0a7100000000006b72bb00000000001b842200000000000f9c57100000000015e642c00000000050bf8f00000000003b7c390000000000127718200000000052345d8000000000e7d9db00000000004058fd00000000005eb1d50000000000"

# Model architecture
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear1, self.linear2 = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.activation = nn.ReLU()

    def forward(self, x):
        identity = x
        x = self.activation(self.linear1(self.norm1(x)))
        return self.linear2(self.norm2(x)) + identity


class PrimeFactorMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers=6):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, hidden_dim)
        self.residual_blocks = nn.ModuleList([ResidualBlock(hidden_dim) for _ in range(num_layers)])
        self.norm, self.activation = nn.LayerNorm(hidden_dim), nn.ReLU()
        # Two output heads: one per predicted prime factor
        self.p_head, self.q_head = nn.Linear(hidden_dim, output_dim), nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.input_proj(x)
        for block in self.residual_blocks:
            x = block(x)
        x = self.activation(self.norm(x))
        return self.p_head(x), self.q_head(x)


# Helper functions
def int_to_tensor(num, bits=1024, chunk_size=64):
    # Split `num` into 64-bit chunks scaled to [0, 1], most significant chunk first
    chunks = bits // chunk_size
    tensor = torch.zeros(chunks)
    for i in range(chunks):
        mask = (1 << chunk_size) - 1
        chunk_val = (num >> (i * chunk_size)) & mask
        tensor[chunks - i - 1] = chunk_val / ((1 << chunk_size) - 1)
    return tensor


def tensor_to_int(tensor, chunk_size=64):
    # Inverse of int_to_tensor: rescale each chunk and reassemble the integer
    num = 0
    for i, chunk_val in enumerate(tensor):
        val = int(round(chunk_val.item() * ((1 << chunk_size) - 1)))
        num |= val << ((len(tensor) - i - 1) * chunk_size)
    return num


# Convert input to integer
n = int(number_to_factorize, 16)

# Load model (download from HuggingFace or local path)
model = PrimeFactorMLP(input_dim=16, hidden_dim=2048, output_dim=8)
model.load_state_dict(torch.load('final_model.pth', map_location='cpu')['model_state_dict'])
model.eval()

# Prepare input: 16 chunks of 64 bits each, with a batch dimension
n_tensor = int_to_tensor(n).unsqueeze(0)

# Predict factors
with torch.no_grad():
    p_pred, q_pred = model(n_tensor)
    p = tensor_to_int(p_pred[0])
    q = tensor_to_int(q_pred[0])

# Print results
print(f"Input number: {number_to_factorize[:34]}...")
print(f"Predicted P: 0x{p:0128x}")
print(f"Predicted Q: 0x{q:0128x}")
print(f"Product: 0x{p*q:0256x}")
print(f"Bit match: {((p*q) == n)}")  # True only if the predicted factors multiply back to n exactly
```
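
The snippet above loads `final_model.pth` from a local path. If the checkpoint is published on the Hugging Face Hub, it could be fetched with `huggingface_hub` first; note that the repository id and filename below are assumptions about how the files are hosted, not something stated in this README.

```
# A minimal sketch, assuming the checkpoint is hosted as `final_model.pth`
# in a repo such as `maxhirez/ShorNet` (both names are assumptions; adjust as needed).
import torch
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(repo_id="maxhirez/ShorNet", filename="final_model.pth")
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])  # `model` as defined in the snippet above
model.eval()
```
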
## Outcome
It will come as no surprise that the model does not effectively predict the prime factors of a known product, but the results do have some interesting implications.

First, the model was trained and fine-tuned to a validation loss of ~0.16. A sample of test runs indicated the following (a sketch of how these metrics might be computed follows the list):
- Consistent Bit Accuracy: The model achieved remarkably consistent bit matching across samples, averaging around 80-82% of bits correct for both prime factors. This uniformity suggests the model is capturing structural patterns rather than simply memorizing training examples.
- High-Order Bit Preservation: Looking at the hex representations, the model often gets the most significant digits (i.e. the first few characters of each predicted factor) fairly accurate, indicating it prioritizes the high-order bits, which have the largest impact on the product.
- Q Factor Pattern Convergence: The predicted Q values exhibited a noticeable pattern: many predictions had similar digit sequences in their middle sections. For example, many Q predictions contained segments like 80...7f...80... in similar positions, suggesting the model is applying a learned template.
- Relative Error Consistency: The average relative errors for the Ps and Qs were almost identical (~0.159 vs. ~0.160), indicating the model doesn't favor one factor over the other.
- Prediction Regularization: The predicted factors often appeared "smoother" and less random than the true factors, suggesting the network applies some form of regularization or pattern-based prediction rather than capturing the true randomness of the primes.
- Value Range Compression: The predicted values appear compressed relative to the true values, particularly for the Q factor; the model's predictions don't span as wide a numerical range as the actual values.
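
The evaluation code behind these observations is not included here, so the following is only a minimal sketch of how bit accuracy and relative error could be computed for a predicted factor against a known one; `bit_accuracy`, `relative_error`, and the toy values are illustrative, not the project's actual evaluation script.

```
# Illustrative metric helpers (not the project's evaluation code).
def bit_accuracy(true_val: int, pred_val: int, bits: int = 512) -> float:
    # Fraction of bit positions (out of `bits`) where the two integers agree.
    matches = sum(((true_val >> i) & 1) == ((pred_val >> i) & 1) for i in range(bits))
    return matches / bits

def relative_error(true_val: int, pred_val: int) -> float:
    # |pred - true| / true, one plausible definition of the relative-error figures above.
    return abs(pred_val - true_val) / true_val

# Tiny example with 6-bit numbers rather than real 512-bit primes:
print(bit_accuracy(0b101101, 0b100101, bits=6))  # 5/6 bits agree -> ~0.83
print(relative_error(45, 37))                    # -> ~0.178
```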

## Conclusion
This model seems to have learned to:
1. Approximate the general magnitude of each prime factor (getting high-order bits mostly right)
2. Apply certain templates or patterns it found useful during training
3. Balance errors between the two factors rather than optimizing for one

The fact that it achieves ~80% bit accuracy yet still has ~16% relative error highlights the challenge: in prime factorization, even small errors in individual bits lead to large numerical differences when multiplied, as the toy example below illustrates.
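
Concretely, flipping a single bit at position k of a factor p shifts the product p·q by q·2^k, so even one wrong bit in a 512-bit factor moves the product by an enormous amount. A toy illustration with small primes standing in for 512-bit ones:

```
# Illustrative only: a single flipped bit in p changes the product by q * 2**k.
p, q = 1000003, 1000033     # small primes standing in for 512-bit factors
n = p * q                   # 1000036000099

k = 10                      # flip bit 10 of p
p_flipped = p ^ (1 << k)

print(abs(p_flipped * q - n))            # 1024033792, i.e. q * 2**10
print(abs(p_flipped * q - n) == q << k)  # True
```
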
While this approach is not practical for cryptographic applications, it is fascinating that the neural network found ways to approximate a mathematical problem traditionally considered intractable for machine-learning approaches.

The consistent bit accuracy around 80% suggests there may be a theoretical limit to how well this architecture can learn the factorization function without incorporating more specific number-theoretic knowledge or specialized architectures.

A quick calculation shows that, even as a starting point for an iterative, brute-force classical approach, 80% bit accuracy means roughly 20% of bits (about 100 bits of each 512-bit prime) are incorrect. This reduces the search space from 2^512 to approximately 2^100, a dramatic improvement but still exponentially large, meaning the effective classical search over the remaining bits of any prediction is still exponential time. It may be possible to apply probabilistic search algorithms or lattice reduction, but such methods were beyond the intended scope of this machine learning experiment.
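
A back-of-envelope version of that calculation, under the simplifying (and optimistic) assumption that the positions of the wrong bits were known, which the model does not actually provide:

```
import math

prime_bits = 512
bit_acc = 0.80                                  # approximate accuracy reported above
wrong_bits = round((1 - bit_acc) * prime_bits)  # ~102 incorrect bits per factor

# Enumerating only the wrong bits would cost about 2**wrong_bits candidates:
print(wrong_bits)                                                       # 102
print(f"2^{wrong_bits} is about 10^{wrong_bits * math.log10(2):.0f}")   # ~10^31
```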


_Claude Sonnet 3.7 was used heavily for code generation and analysis in this project_

The 12 lines added to README.md in this commit are the model card front matter:

---
license: bigscience-openrail-m
datasets:
- maxhirez/large-hex-prime-factor-dataset
metrics:
- accuracy
pipeline_tag: graph-ml
tags:
- cryptography
- math
- quantum_computing
---