# Model: ashimdahal/microsoft-git-base_microsoft-git-base
This repository contains model artifacts for a run named `microsoft-git-base_microsoft-git-base`, likely a PEFT adapter.
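If you want to confirm what this run produced before loading anything, you can list the repository files; a PEFT adapter typically ships an `adapter_config.json` plus `adapter_model.safetensors` (or `adapter_model.bin`). A minimal sketch using `huggingface_hub`:

```python
from huggingface_hub import list_repo_files

# List the artifacts in this repository; the presence of adapter_config.json
# and adapter_model.safetensors (or .bin) indicates a PEFT adapter.
files = list_repo_files("ashimdahal/microsoft-git-base_microsoft-git-base")
print("\n".join(files))
```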
## Training Source
This model was trained as part of the project/codebase available at: https://github.com/ashimdahal/captioning_image/blob/main
## Base Model Information (Heuristic)
- Processor/Vision Encoder (guessed): `microsoft/git-base`
- Decoder/Language Model (guessed): `microsoft/git-base`
⚠️ **Important:** The `base_model` tag in the metadata above is initially empty. The models listed here are heuristic guesses based on the training directory name (`microsoft-git-base_microsoft-git-base`). Please verify them against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.
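One way to verify the guesses, assuming the adapter was saved with standard PEFT tooling: the adapter's `adapter_config.json` records the base model it was trained against, and `PeftConfig` exposes it as `base_model_name_or_path`. The value printed below is what belongs in the `base_model:` YAML list.

```python
from peft import PeftConfig

# The adapter's own config records which base model it was trained on top of.
# Use this to verify (or correct) the heuristic guesses above.
adapter_config = PeftConfig.from_pretrained("ashimdahal/microsoft-git-base_microsoft-git-base")
print(adapter_config.base_model_name_or_path)  # e.g., "microsoft/git-base"
```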
## How to Use (Example with PEFT)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq  # or a concrete class such as Blip2ForConditionalGeneration
from peft import PeftModel
import torch

# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "microsoft/git-base"  # <-- Replace with the correct HF ID
base_model_id = "microsoft/git-base"      # <-- Replace with the correct HF ID (e.g., Salesforce/blip2-opt-2.7b)

# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Load the base model. AutoModelForVision2Seq covers GIT, BLIP, BLIP-2, and similar
# captioning architectures; make sure the class matches the one used during training.
base_model = AutoModelForVision2Seq.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32, to match training/inference needs
)

# --- Load PEFT Adapter ---
# Wrap the base model with the adapter weights from this repository
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload()  # Merge weights for inference (optional but often recommended)
model.eval()  # Set model to evaluation mode

# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = ...  # Load your image (e.g., using PIL)
text = "a photo of"  # Optional prompt start

inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16)  # Match model dtype
generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {generated_text}")
```
More model-specific documentation, evaluation results, and usage examples should be added here.