Gesture-to-Code Adapter for StarCoder2-3B
Model Description
This repository contains a Gesture-to-Code Adapter designed to work with the StarCoder2-3B language model. By injecting gesture embeddings into the StarCoder2-3B token space, the adapter enables real-time translation of recognized gestures into structured programming code. It leverages StarCoder2-3B’s powerful code generation capabilities, extending them to multimodal input.
Key Features
- Base Model: StarCoder2-3B, a 3-billion-parameter LLM specialized in code.
- Adapter: A lightweight MLP-based projection layer that aligns gesture embeddings (from a CNN or other visual encoder) with StarCoder2-3B’s 3072-dimensional token embeddings; a sketch follows this list.
- Training Objective: Mean-squared error (MSE) alignment of gesture–token pairs, plus optional contrastive alignment to refine embeddings.
- Usage: Real-time sign language to code snippet generation, focusing on accessibility for Deaf or hard-of-hearing programmers.
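The projection itself can be as small as a two-layer MLP. The sketch below (PyTorch) is illustrative: the hidden width and activation are assumptions, while the 256/512-dimensional input and 3072-dimensional output follow the description above.

import torch.nn as nn

class GestureAdapter(nn.Module):
    """Maps a CNN gesture embedding into StarCoder2-3B's token-embedding space."""
    def __init__(self, gesture_dim=512, hidden_dim=1024, token_dim=3072):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(gesture_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, gesture_emb):
        # (batch, gesture_dim) -> (batch, token_dim)
        return self.proj(gesture_emb)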
Dataset
- Name: A custom gesture dataset containing images for typical code-related gestures (e.g., “for loop,” “if statement,” “function definition”).
- Format: Each gesture is an image or short video snippet that is converted to a fixed-size CNN embedding; the embedding is labeled with the code structure it is intended to produce (an illustrative record layout follows this list).
- Scale: The dataset includes around XX,000 samples, covering ~XX discrete gestural instructions.
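For illustration only (the field names below are assumptions, not the released schema), one training record could be represented as:

import torch

sample = {
    "gesture_label": "for_loop",             # one of the discrete gestural instructions
    "gesture_embedding": torch.randn(512),   # stand-in for the 256/512-dim CNN embedding
    "target_code": "for i in range(n):",     # code construct the gesture maps to
}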
Training Process
- Gesture Encoder: A CNN-based classifier extracts 256- or 512-dimensional embeddings from sign images.
- Adapter Learning: We train a simple projection (fully connected layer plus activation) to map these embeddings into StarCoder2-3B’s input space (a single-step training sketch follows this list).
- Integration: During code generation, the adapter’s output replaces the embedding of a special placeholder token (e.g., <G>). The code model then produces a relevant code snippet conditioned on the recognized gesture (the How to Use section shows this wiring).
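A single alignment step, assuming the GestureAdapter sketch above and StarCoder2-3B’s input embedding table (via model.get_input_embeddings()), might look like the following; the optimizer handling is illustrative, not the exact released recipe:

import torch
import torch.nn.functional as F

def mse_alignment_step(adapter, embedding_table, gesture_emb, target_token_ids, optimizer):
    """Pull adapter outputs toward the embeddings of the matched StarCoder2-3B tokens."""
    projected = adapter(gesture_emb)                  # (batch, 3072)
    with torch.no_grad():
        targets = embedding_table(target_token_ids)   # (batch, 3072), frozen targets
    loss = F.mse_loss(projected, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()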
Model Performance
- Cosine Similarity: measured between the adapter’s outputs and the embeddings of the matched StarCoder2-3B tokens (a short check follows this list).
- Accuracy/F1: sign-to-code classification performance on the recognized gesture set.
- Code Quality: Preliminary tests show valid syntax ~XX% of the time, with advanced logic requiring additional prompt context or manual checks.
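The cosine-similarity figure can be reproduced roughly as below; the random tensors are stand-ins for real adapter outputs and matched token embeddings:

import torch
import torch.nn.functional as F

projected = torch.randn(8, 3072)   # stand-in for adapter outputs
targets = torch.randn(8, 3072)     # stand-in for matched StarCoder2-3B token embeddings
mean_cos = F.cosine_similarity(projected, targets, dim=-1).mean().item()
print(f"mean cosine similarity: {mean_cos:.3f}")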
Intended Use
- Accessibility: Provide a new input modality for coding, especially beneficial for Deaf/hard-of-hearing individuals.
- Educational Tools: Enable sign-based code demonstrations in academic settings or coding bootcamps.
- Research: Investigate multimodal alignment between visual gestures and textual code embeddings.
Limitations
- Limited Gesture Set: Only covers a subset of sign language gestures and code constructs. Expanding coverage requires additional labeled data.
- Hardware Requirements: Real-time inference typically requires GPU acceleration for both CNN and StarCoder2-3B.
- Complex Code: While StarCoder2-3B is capable, end-to-end generation of complicated multi-file or large-project code may not be feasible.
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer
# 1. Load StarCoder2-3B (a causal-LM head is needed for generation)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
starcoder = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")
# 2. Load the adapter
# e.g., adapter = load_adapter("YourName/gesture2code_adapter")
# 3. Integration
# Recognized gesture -> CNN embedding -> adapter -> StarCoder2-3B token embedding
# Replace the special <G> token's embedding with the adapter output before generation
# (a fuller sketch follows below).
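Extending that outline, a minimal end-to-end sketch might look like the following; the stand-in adapter, the <G> token registration, and the inputs_embeds wiring are illustrative assumptions rather than the exact released integration:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
starcoder = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# Register <G> as a special token so it tokenizes to a single id.
tokenizer.add_special_tokens({"additional_special_tokens": ["<G>"]})
starcoder.resize_token_embeddings(len(tokenizer))
g_id = tokenizer.convert_tokens_to_ids("<G>")

# Stand-ins for the trained adapter and a recognized gesture's CNN embedding.
adapter = torch.nn.Linear(512, 3072)      # replace with the trained GestureAdapter
gesture_emb = torch.randn(1, 512)         # replace with a real CNN embedding

# Tokenize a prompt containing the placeholder and swap in the projected embedding.
inputs = tokenizer("# Gesture: <G>\n", return_tensors="pt")
inputs_embeds = starcoder.get_input_embeddings()(inputs.input_ids)
inputs_embeds[inputs.input_ids == g_id] = adapter(gesture_emb)

# Generate a code snippet conditioned on the gesture.
output_ids = starcoder.generate(inputs_embeds=inputs_embeds,
                                attention_mask=inputs.attention_mask,
                                max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))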