sound & animation
At the beginning of the pandemic, I started writing close to a track a day, give or take a few days. Titanus Colony was written a few months in and is meant to convey the essence of what seemed at the time to be an endless lockdown.
Just after the mixdown, I decided the song needed visuals, and I wanted to approach them a little differently, since I was experimenting with an “AI” model at the time.
Recent advances in machine learning have created opportunities for AI technologies to help unlock creativity in powerful ways. PyTTI is a toolkit on Google Colab that facilitates image generation, animation, and manipulation using processes that could be thought of as a human artist collaborating with AI assistants.
One of the core components of PyTTI (and most text-guided AI image generation methods) is a technique which is able to project both text and images into the same latent space, a “multi-modal” space which can be used to represent either text or images.
As with a single-modality space, we can measure how similar two chunks of text are or how similar two images are in this space, where “similar” is a measure of their semantic content. What’s really special here is that now we can measure how similar the semantic content of an image is relative to the semantic content of some chunk of text!
One way to think about this is to imagine a region in the multi-modal latent space that represents something like the concept “cat”. If we project an image containing a picture of a cat into this space, it’ll land close to the region associated with this platonic “cat” concept. Similarly, if we grab a bit of text that describes a cat and project it into the same space, the result ends up somewhere close to the “cat” concept’s location as well.
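To make the idea of “closeness” a little more concrete, here is a minimal sketch in Python. It assumes that similarity in the shared space is measured with cosine similarity, which is a common choice for embeddings. The three-number vectors are made up purely for illustration; real embeddings come out of a trained model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written embeddings (NOT output from a real model).
cat_photo = [0.9, 0.1, 0.2]   # pretend projection of a photo of a cat
cat_text = [0.8, 0.2, 0.1]    # pretend projection of the text "a cat"
dog_text = [0.1, 0.9, 0.3]    # pretend projection of the text "a dog"

sim_cat = cosine_similarity(cat_photo, cat_text)
sim_dog = cosine_similarity(cat_photo, dog_text)

print(f"photo vs 'a cat': {sim_cat:.3f}")
print(f"photo vs 'a dog': {sim_dog:.3f}")
```

With these toy numbers the cat photo scores much higher against the “cat” text than against the “dog” text, which is the behaviour the shared space is meant to give us.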
This is the key to how PyTTI uses CLIP to “guide” image generation. PyTTI takes the image, measures how near or far away it is from the latent space representation of the guiding prompts you provided, and tries to adjust the image in ways that move its latent space representation closer to the latent space representation of the prompt.
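The loop described above can be sketched as a toy in Python. This is not PyTTI's actual code: the “image” is just four numbers, `toy_encoder` is a hypothetical stand-in for CLIP's image encoder (a fixed linear map), and the adjustment step uses a simple finite-difference gradient rather than the automatic differentiation a real toolkit would rely on. All names and values here are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_encoder(pixels):
    """Hypothetical stand-in for an image encoder: maps 4 'pixels' to a 3-d embedding."""
    weights = [
        [0.5, -0.2, 0.1, 0.3],
        [0.1, 0.4, -0.3, 0.2],
        [-0.2, 0.1, 0.6, 0.1],
    ]
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def guide(pixels, prompt_embedding, steps=200, lr=0.1, eps=1e-4):
    """Nudge the 'image' so its embedding moves closer to the prompt embedding."""
    pixels = list(pixels)
    for _ in range(steps):
        base = cosine_similarity(toy_encoder(pixels), prompt_embedding)
        grad = []
        for i in range(len(pixels)):
            # Estimate how the similarity changes if this one pixel is nudged.
            nudged = list(pixels)
            nudged[i] += eps
            score = cosine_similarity(toy_encoder(nudged), prompt_embedding)
            grad.append((score - base) / eps)
        # Step each pixel in the direction that increases similarity.
        pixels = [p + lr * g for p, g in zip(pixels, grad)]
    return pixels

prompt_embedding = [0.2, 0.9, -0.1]   # made-up embedding of the guiding prompt
start = [1.0, 0.0, 0.0, 0.0]          # made-up starting 'image'

before = cosine_similarity(toy_encoder(start), prompt_embedding)
result = guide(start, prompt_embedding)
after = cosine_similarity(toy_encoder(result), prompt_embedding)

print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

The point of the sketch is only the shape of the process: score the current image against the prompt, work out which small changes raise the score, apply them, and repeat.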