Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features • Paper • 2504.00557 • Published Apr 1
Talking Face Generation with Multilingual TTS • Generate a talking face video from text