Midjourney Magic: A Glimpse into How it is Trained and How it Works
In the world of AI image generators, some companies have been fairly transparent about how their systems work: OpenAI has published research describing Dall-E 2, and Stability AI has released the source code and model weights for Stable Diffusion. Midjourney, however, has remained enigmatic, leaving us to make educated guesses about how it operates.
Drawing upon similarities with competitors like Dall-E 2 and Stable Diffusion, let’s take a look at the training and capabilities of the mysterious Midjourney AI image generator.
The Training Process
While Midjourney has been tight-lipped about its background and training, it’s plausible that its image generator works much like Dall-E 2 and Stable Diffusion, whose developers have described their training methods in detail. Midjourney has acknowledged that, like its rivals, it gathered images and their accompanying text descriptions from the internet, amassing millions of published images for training.
Diffusion Modeling: A Noisy Yet Effective Approach
At the heart of most publicly available AI image generators lies a process called diffusion modeling. During training, random noise is added to an image step by step until it looks like television static. The model then learns to reverse this noising process, predicting and removing the noise a little at a time. Once trained, it can start from pure noise and clean it up over many small steps, guided by a text prompt, until a realistic image emerges.
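Midjourney has not published its model, so the snippet below is only a sketch of the forward noising process as described in the published denoising diffusion (DDPM) research that systems like Stable Diffusion build on. It assumes a toy five-pixel "image", a linear noise schedule over 1,000 steps and illustrative function names (`linear_beta_schedule`, `add_noise`, `recover_x0`); real systems work on large images (Stable Diffusion works on a compressed latent representation) and use a trained neural network to predict the noise instead of being handed it.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (an illustrative DDPM-style choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta): how much of the original signal survives at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def add_noise(x0, t, a_bar, rng):
    """Jump straight to step t: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(a_bar[t])
    noise = math.sqrt(1.0 - a_bar[t])
    xt = [signal * p + noise * e for p, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps_pred, t, a_bar):
    """Undo the noising given a noise estimate; a trained model would supply eps_pred."""
    return [(x - math.sqrt(1.0 - a_bar[t]) * e) / math.sqrt(a_bar[t])
            for x, e in zip(xt, eps_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [-1.0, -0.5, 0.0, 0.5, 1.0]  # toy 5-pixel "image" scaled to [-1, 1]
    a_bar = alpha_bars(linear_beta_schedule(1000))
    for t in (0, 250, 999):
        xt, _ = add_noise(image, t, a_bar, rng)
        # As t grows, a_bar shrinks and the pixels drift toward pure noise.
        print(t, round(a_bar[t], 5), [round(v, 2) for v in xt])
    # With a perfect noise prediction, the original pixels come back.
    xt, eps = add_noise(image, 500, a_bar, rng)
    print([round(v, 6) for v in recover_x0(xt, eps, 500, a_bar)])
```

Because the noisy version at any step is just a weighted mix of the original pixels and Gaussian noise, a model that predicts the noise accurately can undo it; `recover_x0` shows that inversion, with the true noise standing in for a trained model's prediction.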
A dataset widely used to train AI image generators, and reportedly used by Midjourney, is LAION (https://laion.ai/), a web-scraped collection of image-text pairs released by a German non-profit organization of the same name.
The Secret Sauce: Connecting Images and Text
Midjourney’s AI model, like others, has learned the relationships between images and the text that describes them. Because it has learned these connections, the image generator can work out what you’re asking for in your prompt, much as a human artist would interpret a commission request.
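Midjourney hasn’t said how it links text to images. Dall-E 2 and Stable Diffusion both rely on text encoders from the CLIP family, which are trained contrastively so that matching images and captions land close together in a shared vector space. The sketch below illustrates that idea with hand-made three-dimensional "embeddings"; the vectors, captions, helper names and the temperature of 0.07 are illustrative assumptions, not anything Midjourney has published. Real encoders produce vectors with hundreds of dimensions, learned from millions of image-text pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def similarity_matrix(image_embs, text_embs):
    """Row i, column j holds the similarity between image i and caption j."""
    return [[cosine(img, txt) for txt in text_embs] for img in image_embs]


def best_captions(sim, captions):
    """For each image, pick the caption with the highest similarity."""
    return [captions[max(range(len(row)), key=row.__getitem__)] for row in sim]


def contrastive_loss(sim, temperature=0.07):
    """Symmetric cross-entropy: each image should score highest with its own
    caption (the diagonal), and each caption with its own image."""
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_sum - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    return (cross_entropy(logits) + cross_entropy(columns)) / 2


if __name__ == "__main__":
    captions = ["a man riding a donkey", "a cat asleep indoors", "a city skyline at night"]
    # Hypothetical embeddings; dimensions loosely stand for (person, animal, outdoor scene).
    image_embs = [[0.9, 0.8, 0.3], [0.1, 0.9, 0.0], [0.2, 0.0, 0.9]]
    text_embs = [[0.8, 0.9, 0.2], [0.0, 1.0, 0.1], [0.1, 0.1, 1.0]]
    sim = similarity_matrix(image_embs, text_embs)
    print(best_captions(sim, captions))
    # → ['a man riding a donkey', 'a cat asleep indoors', 'a city skyline at night']
    print(round(contrastive_loss(sim), 3))
```

Training pushes the loss down, which pulls each image toward its own caption and away from the others; a text prompt encoded this way can then steer the denoising process toward matching visuals.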
The Art of Visual Associations
A common misconception is that AI models like Midjourney store images in a database and cut and paste pieces of them together, like a collage. That is not quite how they work. Midjourney’s image generator relies on neural networks that learn visual concepts during training. These networks generate new visuals from the associations they have learned between images and text, much as an artist draws on reference material to complete an original piece.
For example, if a human artist were commissioned to draw Keanu Reeves riding a donkey, the artist would very likely look at photographs of Keanu Reeves and of donkeys during their creative process.
Midjourney works in a similar way. From the many images of Keanu Reeves and donkeys it has been trained on, it has learned to associate those words with the corresponding visual features, and it uses those associations to create an entirely new image never seen before.
While the Midjourney AI image generator remains somewhat shrouded in mystery, educated comparisons to similar applications like Dall-E 2 and Stable Diffusion suggest it employs diffusion modeling, extensive web-scraped datasets, and a keen understanding of image-text relationships. This powerful blend allows Midjourney to create stunning, unique visuals that rival those of human artists, showcasing the potential of AI-generated art.