It is now possible to feed an image to the VLM as a condition for generation! This is different from image2video, where the image itself becomes part of the video (typically its first frame). IP2V instead uses the image as a component of the prompt, extracting the concept and style of the image.
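The difference between the two conditioning modes can be illustrated with a toy sketch. This is not the actual IP2V implementation: the `<image>` placeholder, the function names, and the string "tokens" are all assumptions made for illustration. Real systems work with embedding tensors rather than strings. The sketch only shows where the image ends up: inside the prompt sequence for IP2V, or pinned as a frame for image2video.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical placeholder marking where the image goes in the prompt.
IMAGE_TAG = "<image>"


def build_ip2v_sequence(prompt: str, image_tokens: Sequence[str]) -> List[str]:
    """IP2V-style conditioning (toy version): splice the image tokens into
    the prompt at the placeholder, so the image is read as part of the prompt."""
    if prompt.count(IMAGE_TAG) != 1:
        raise ValueError(f"prompt must contain exactly one {IMAGE_TAG} placeholder")
    before, after = prompt.split(IMAGE_TAG)
    # Whitespace tokenization stands in for a real tokenizer (assumption).
    return before.split() + list(image_tokens) + after.split()


def build_i2v_condition(prompt: str, image_id: str, num_frames: int) -> Dict[str, list]:
    """image2video-style conditioning (toy version): the prompt stays text-only
    and the image is pinned as frame 0 of the output video."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    frames: List[Optional[str]] = [image_id] + [None] * (num_frames - 1)
    return {"prompt_tokens": prompt.split(), "frames": frames}


if __name__ == "__main__":
    # The image influences the prompt sequence, not any specific frame.
    print(build_ip2v_sequence("a video in the style of <image> at sunset", ["img_0", "img_1"]))
    # The image is the first frame; remaining frames are left to the model.
    print(build_i2v_condition("a cat walking", "cat.png", 4))
```

In the IP2V case the image never appears as a frame; it only shapes what the text encoder passes on, which is why it carries over concept and style rather than exact pixels.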