microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 7 days ago • 262k • 1.37k
Running • 544 • Vision Arena (Testing VLMs side-by-side) • Analyze images to detect and label objects