BugWhisperer
Overview
BugWhisperer is an open-source approach designed to automate the detection of security vulnerabilities in system-on-chip (SoC) designs at the Register-Transfer Level (RTL). By fine-tuning large language models (LLMs) with domain-specific hardware security knowledge, the framework addresses the limitations of traditional, manual security verification methods. The work leverages a comprehensive hardware vulnerability database—built using golden benchmarks and augmented through design replication—to generate diverse Verilog code samples encapsulating 13 distinct vulnerability types. This enables the fine-tuned model to not only detect known vulnerabilities but also generalize across varied coding styles and architectures.
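As a rough sketch of the intended workflow, the example below queries a fine-tuned checkpoint with a small Verilog snippet via the Hugging Face `transformers` API. The repository name, the prompt wording, and the snippet itself are illustrative assumptions, not the model's documented interface.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name -- substitute the actual BugWhisperer repo.
MODEL_ID = "your-org/BugWhisperer-Mistral-7B-Instruct-v0.3"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# A small RTL snippet with a deliberately suspicious pattern: the lock
# register can be cleared by any write, with no privilege check.
verilog_snippet = """
module lock_reg (input clk, input rst, input wr_en, input unlock,
                 output reg locked);
  always @(posedge clk) begin
    if (rst) locked <= 1'b1;
    else if (wr_en) locked <= ~unlock;  // no privilege check on writes
  end
endmodule
"""

prompt = ("Analyze the following Verilog code for hardware security "
          f"vulnerabilities and explain any findings:\n{verilog_snippet}")

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```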
Key Contributions
Fine-Tuned LLMs:
The approach fine-tunes the open-source Mistral-7B-Instruct-v0.3 model specifically for hardware security tasks, enabling it to detect subtle vulnerabilities that general-purpose LLMs often miss.
Performance Gains:
Fine-tuning improves detection accuracy dramatically (the fine-tuned Mistral-7B-Instruct reaches 84.8% accuracy versus a non-fine-tuned baseline of 42.5%), demonstrating that open-source models can become cost-effective, transparent alternatives to proprietary solutions.
Training Setup Highlights
Dataset
- Samples: 4,000 vulnerable Verilog code samples used as the fine-tuning dataset (a possible record layout is sketched below)
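This section does not specify how the samples are serialized; instruction-tuning corpora are often stored as JSONL instruction/response pairs, so the layout below is purely an assumed sketch. The field names and the label text are illustrative, not the dataset's actual schema.

```python
import json

# Hypothetical record layout; field names and label text are assumptions.
record = {
    "instruction": "Identify any security vulnerability in this Verilog module.",
    "input": "module ... endmodule",  # one of the ~4,000 vulnerable samples
    "output": "Vulnerability: debug-mode bypass (one of the 13 classes).",
}

with open("bugwhisperer_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```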
Parameter-Efficient Fine-Tuning
- Method: Low-Rank Adaptation (LoRA)
- Configuration (see the PEFT sketch below):
  - Rank: 128
  - Alpha: 256
  - Dropout: 0.1
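Expressed with the Hugging Face PEFT library, these values correspond to a LoRA config roughly like the following. The target modules are an assumption (the attention projections are a common choice for Mistral); they are not listed above.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                 # LoRA rank, as reported
    lora_alpha=256,        # scaling factor alpha
    lora_dropout=0.1,      # dropout on the LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
    # Target modules are an assumption; the card does not list them.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# The adapter would then be attached with peft.get_peft_model(base_model, lora_config).
```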
Computational Resources
- Hardware: Training performed on two NVIDIA A100 GPUs
- Precision: 4-bit quantization (NF4) with float16 compute precision to optimize memory usage (see the configuration sketch below)
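In the transformers/bitsandbytes stack, the reported NF4 4-bit loading with float16 compute corresponds to a quantization config along these lines; this is a sketch, assuming the base checkpoint is `mistralai/Mistral-7B-Instruct-v0.3` as named in the contributions above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # 4-bit weight quantization
    bnb_4bit_quant_type="nf4",            # NormalFloat4, as reported
    bnb_4bit_compute_dtype=torch.float16, # float16 compute precision
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
```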
Optimization Details
- Learning Rate: 2×10⁻⁶
- Batch Size: 4, with gradient accumulation over 1 step
- Training Epochs: 3
- Optimizer: Paged AdamW (32-bit) with a weight decay of 0.001
- Gradient Clipping: Maximum norm of 0.3
- Scheduler: Constant learning rate with a warmup ratio of 0.03
- Additional Techniques: Gradient checkpointing and a maximum sequence length capped at 512 tokens for efficient context retention (these settings are collected in the sketch below)
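Taken together, these settings map onto a `transformers` `TrainingArguments` block roughly as follows; the output path is a placeholder, and the use of TRL's `SFTTrainer` for the 512-token cap is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./bugwhisperer-ft",      # placeholder path, not from the card
    num_train_epochs=3,                  # Training Epochs: 3
    per_device_train_batch_size=4,       # Batch Size: 4
    gradient_accumulation_steps=1,       # accumulation over 1 step
    learning_rate=2e-6,                  # Learning Rate: 2x10^-6
    optim="paged_adamw_32bit",           # Paged AdamW with 32-bit states
    weight_decay=0.001,
    max_grad_norm=0.3,                   # gradient clipping at max norm 0.3
    lr_scheduler_type="constant_with_warmup",  # constant LR after warmup
    warmup_ratio=0.03,
    gradient_checkpointing=True,
    fp16=True,                           # float16 compute, per the setup above
)

# If training uses TRL's SFTTrainer (an assumption), max_seq_length=512
# would enforce the 512-token sequence cap described above.
```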