Gesture-to-Code Adapter for StarCoder2-3B

Model Description

This repository contains a Gesture-to-Code Adapter designed to work with the StarCoder2-3B language model. By injecting gesture embeddings into StarCoder2-3B's token embedding space, the adapter enables real-time translation of recognized gestures into structured programming code. It leverages StarCoder2-3B's powerful code generation capabilities, extending them to multimodal input.

Key Features

  • Base Model: StarCoder2-3B, a 3-billion parameter LLM specialized in code.
  • Adapter: A lightweight MLP-based projection layer that aligns gesture embeddings (from a CNN or other visual encoder) with StarCoder2-3B's 3072-dimensional token embeddings (a sketch follows this list).
  • Training Objective: Mean-squared error (MSE) alignment of gesture–token pairs, plus optional contrastive alignment to refine embeddings.
  • Usage: Real-time sign language to code snippet generation, focusing on accessibility for Deaf or hard-of-hearing programmers.
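As a rough illustration of the adapter described above, the following is a minimal sketch of such an MLP projection layer. The class name, hidden width, and default dimensions are illustrative assumptions, not the exact shipped configuration.

import torch
import torch.nn as nn

class GestureToCodeAdapter(nn.Module):
    """Minimal sketch: project a CNN gesture embedding (256-d or 512-d)
    into StarCoder2-3B's 3072-d token-embedding space."""

    def __init__(self, gesture_dim: int = 512, token_dim: int = 3072, hidden_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(gesture_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, token_dim),
        )

    def forward(self, gesture_emb: torch.Tensor) -> torch.Tensor:
        # gesture_emb: (batch, gesture_dim) -> (batch, token_dim)
        return self.proj(gesture_emb)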

Dataset

  • Name: A custom gesture dataset containing images for typical code-related gestures (e.g., “for loop,” “if statement,” “function definition”).
  • Format: Each gesture is an image or short video snippet that is converted to a fixed-size CNN embedding and labeled with the intended code structure (an illustrative sample layout follows this list).
  • Scale: The dataset includes around XX,000 samples, covering ~XX discrete gestural instructions.
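For concreteness, one way such samples might be organized is sketched below. The class and field names are hypothetical and not part of the released dataset.

import torch
from torch.utils.data import Dataset

class GestureCodeDataset(Dataset):
    """Hypothetical sample layout: one precomputed CNN embedding per gesture,
    paired with the code construct it denotes."""

    def __init__(self, samples):
        # samples: list of (embedding, code) pairs, e.g.
        # (torch.Tensor of shape (512,), "for i in range(n):\n    ...")
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        embedding, code = self.samples[idx]
        return {"gesture_embedding": embedding, "target_code": code}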

Training Process

  1. Gesture Encoder: A CNN-based classifier extracts 256- or 512-dimensional embeddings from sign images.
  2. Adapter Learning: We train a simple projection (fully connected layers plus an activation) to map these embeddings into StarCoder2-3B's input embedding space (a training sketch follows this list).
  3. Integration: During code generation, the adapter’s output replaces a special token’s embedding (e.g., <G>). The code model then produces a relevant code snippet conditioned on the recognized gesture.
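A minimal sketch of the adapter-learning step (step 2) is shown below. It assumes the targets are StarCoder2-3B embeddings of the token(s) each gesture should map to, reuses the GestureToCodeAdapter sketch from the Key Features section, and uses synthetic tensors and placeholder hyperparameters.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors stand in for real CNN gesture embeddings and the
# StarCoder2-3B token embeddings each gesture should align to.
gesture_embs = torch.randn(256, 512)      # (num_samples, gesture_dim)
target_tok_embs = torch.randn(256, 3072)  # (num_samples, token_dim)
loader = DataLoader(TensorDataset(gesture_embs, target_tok_embs), batch_size=32)

adapter = GestureToCodeAdapter(gesture_dim=512, token_dim=3072)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

for epoch in range(10):
    for gesture_emb, target_emb in loader:
        projected = adapter(gesture_emb)            # (batch, 3072)
        loss = F.mse_loss(projected, target_emb)    # MSE alignment objective
        # An optional contrastive term (e.g., InfoNCE over in-batch negatives)
        # could be added here to further refine the embedding space.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()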

Model Performance

  • Embedding Alignment: cosine similarity between the adapter's outputs and the matched StarCoder2-3B token embeddings (a short computation sketch follows this list).
  • Gesture Recognition: accuracy/F1 on sign-to-code classification for recognized gestures.
  • Code Quality: Preliminary tests show valid syntax ~XX% of the time, with advanced logic requiring additional prompt context or manual checks.
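The cosine-similarity metric can be computed as in the short sketch below; the tensors are random stand-ins for real adapter outputs and matched token embeddings.

import torch
import torch.nn.functional as F

# Random stand-ins for a batch of adapter outputs and their matched
# StarCoder2-3B token embeddings.
projected = torch.randn(32, 3072)
targets = torch.randn(32, 3072)

# Mean cosine similarity across the evaluation batch.
mean_cos = F.cosine_similarity(projected, targets, dim=-1).mean()
print(f"Mean cosine similarity: {mean_cos.item():.3f}")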

Intended Use

  1. Accessibility: Provide a new input modality for coding, especially beneficial for Deaf/hard-of-hearing individuals.
  2. Educational Tools: Enable sign-based code demonstrations in academic settings or coding bootcamps.
  3. Research: Investigate multimodal alignment between visual gestures and textual code embeddings.

Limitations

  • Limited Gesture Set: Only covers a subset of sign language gestures and code constructs. Expanding coverage requires additional labeled data.
  • Hardware Requirements: Real-time inference typically requires GPU acceleration for both CNN and StarCoder2-3B.
  • Complex Code: While StarCoder2-3B is advanced, generating complicated multi-file or large-project code end-to-end may not be feasible.

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. Load StarCoder2-3B (the causal-LM head is needed for code generation)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
starcoder = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# 2. Load the adapter
# e.g., adapter = load_adapter("YourName/gesture2code_adapter")

# 3. Integration
# Recognized gesture -> CNN embedding -> adapter -> StarCoder2-3B token embedding.
# The special token <G>'s embedding is replaced with the adapter output (see the sketch below).
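The end-to-end injection in step 3 could look like the following sketch. Here `adapter` and the random `gesture_embedding` are placeholders from the earlier sketches, and the prompt wording is illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-3b")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-3b")

# Register a placeholder token that marks where the gesture embedding goes.
tokenizer.add_special_tokens({"additional_special_tokens": ["<G>"]})
model.resize_token_embeddings(len(tokenizer))

prompt = "Write the Python construct indicated by <G>:\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Embed the prompt, then overwrite the <G> slot with the adapter output.
    inputs_embeds = model.get_input_embeddings()(inputs["input_ids"])
    g_id = tokenizer.convert_tokens_to_ids("<G>")
    g_pos = (inputs["input_ids"][0] == g_id).nonzero(as_tuple=True)[0]

    gesture_embedding = torch.randn(1, 512)               # stand-in for a CNN output
    inputs_embeds[0, g_pos] = adapter(gesture_embedding)  # (1, 3072)

    # Generate code conditioned on the injected gesture embedding.
    output_ids = model.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=inputs["attention_mask"],
        max_new_tokens=64,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))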