---
license: apache-2.0
tags:
- generated-by-script
- peft
- image-captioning
base_model: []
---

# Model: ashimdahal/microsoft-git-base_microsoft-git-base

This repository contains model artifacts for a training run named `microsoft-git-base_microsoft-git-base`, most likely a PEFT adapter.
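
To confirm that the artifacts really are a PEFT adapter, one quick check (a sketch using `huggingface_hub`) is to list the repository files and look for the usual adapter files:

```python
from huggingface_hub import list_repo_files

# A PEFT adapter repo typically contains adapter_config.json and
# adapter_model.safetensors (or adapter_model.bin).
files = list_repo_files("ashimdahal/microsoft-git-base_microsoft-git-base")
print([f for f in files if f.startswith("adapter")])
```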

## Training Source

This model was trained as part of the project/codebase available at:
https://github.com/ashimdahal/captioning_image/blob/main

## Base Model Information (Heuristic)

* **Processor/Vision Encoder (Guessed):** `microsoft/git-base`
* **Decoder/Language Model (Guessed):** `microsoft/git-base`

**⚠️ Important:** The `base_model` tag in the metadata above is initially empty. The models listed here are *heuristic guesses* based on the training directory name (`microsoft-git-base_microsoft-git-base`). Please verify them against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.
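
One way to verify the guesses is to read the base model ID recorded in the adapter's own config (assuming this repo contains a standard PEFT `adapter_config.json`):

```python
from peft import PeftConfig

# The adapter config records the base model it was trained on; this value
# should match what goes into the base_model: field in the YAML metadata.
peft_config = PeftConfig.from_pretrained("ashimdahal/microsoft-git-base_microsoft-git-base")
print(peft_config.base_model_name_or_path)
```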

## How to Use (Example with PEFT)

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel
import torch

# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "microsoft/git-base"  # <-- Replace with the correct HF ID
base_model_id = "microsoft/git-base"  # <-- Replace with the correct HF ID (e.g., Salesforce/blip2-opt-2.7b)

# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/microsoft-git-base_microsoft-git-base"

# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)

# Load the base model (ensure the class matches the one used during training).
# AutoModelForVision2Seq covers GIT and similar image-to-text checkpoints; a
# BLIP-2 base would instead use Blip2ForConditionalGeneration, and other bases
# may need another task-specific class (e.g., AutoModelForCausalLM).
base_model = AutoModelForVision2Seq.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32, match training/inference needs
)

# --- Load PEFT Adapter ---
# Wrap the base model with the adapter weights from this repository
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload()  # Merge weights for inference (optional but often recommended)
model.eval()  # Set model to evaluation mode

# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = ...  # Load your image (e.g., with PIL: Image.open("path").convert("RGB"))
text = "a photo of"  # Optional prompt start

inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16)  # Match model dtype

generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {generated_text}")
```
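
If you plan to serve the merged model directly, you can optionally persist it so future loads skip the adapter step (the output directory below is just an illustrative name):

```python
# Save the merged model and processor as a standalone checkpoint
# ("git-base-captioning-merged" is a hypothetical output directory).
model.save_pretrained("git-base-captioning-merged")
processor.save_pretrained("git-base-captioning-merged")
```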

*More model-specific documentation, evaluation results, and usage examples should be added here.*