Model: ashimdahal/microsoft-git-base_microsoft-git-base

This repository contains model artifacts for a run named microsoft-git-base_microsoft-git-base, likely a PEFT adapter.

Training Source

This model was trained as part of the project/codebase available at: https://github.com/ashimdahal/captioning_image/blob/main

Base Model Information (Heuristic)

  • Processor/Vision Encoder (Guessed): microsoft/git-base
  • Decoder/Language Model (Guessed): microsoft/git-base

โš ๏ธ Important: The base_model tag in the metadata above is initially empty. The models listed here are heuristic guesses based on the training directory name (microsoft-git-base_microsoft-git-base). Please verify these against your training configuration and update the base_model: list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.

How to Use (Example with PEFT)

from transformers import AutoProcessor, AutoModelForVision2Seq # Or other relevant classes, e.g. Blip2ForConditionalGeneration
from peft import PeftModel
from PIL import Image
import torch

# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "microsoft/git-base" # <-- Replace with correct HF ID
base_model_id = "microsoft/git-base" # <-- Replace with correct HF ID (e.g., Salesforce/blip2-opt-2.7b)

# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Load the base model (ensure the class matches the architecture used during training).
# AutoModelForVision2Seq covers GIT and most other image-captioning architectures:
base_model = AutoModelForVision2Seq.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,  # Or torch.bfloat16 / torch.float32; match training/inference needs
)
# For other model types, swap in the matching class, e.g.:
# base_model = Blip2ForConditionalGeneration.from_pretrained(base_model_id, torch_dtype=torch.float16)
# base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

# --- Load PEFT Adapter ---
# Load the adapter weights and attach them to the base model
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload() # Merge weights for inference (optional but often recommended)
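# (Skip merge_and_unload() if you want to keep the adapter separate, e.g. to
#  continue training it or to swap in a different adapter later.)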
model.eval() # Set model to evaluation mode

# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("example.jpg")  # <-- Replace with the path to your image
text = "a photo of" # Optional prompt start

inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16) # Match model dtype

generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {{generated_text}}")

More model-specific documentation, evaluation results, and usage examples should be added here.
