Generate text descriptions from images
Generate images from text prompts
Process and tokenize text input