Google unveils Genie AI which can create video games from text and image prompts
- 28 Feb 2024
Why is it in the News?
Recently, Google DeepMind unveiled Genie, a novel model capable of creating interactive video games based solely on textual or image prompts.
What is Genie AI?
- Genie is a foundation world model that is trained on videos sourced from the Internet.
- The model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”
- It is the first generative interactive environment that has been trained in an unsupervised manner from unlabelled internet videos.
- Genie has 11B parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.
- Together, these components let users act in generated environments on a frame-by-frame basis, even though the model was trained without action labels or any other domain-specific requirements.
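To picture how the three components named above could fit together, here is a minimal toy sketch in Python. It is not DeepMind's implementation: nothing is learned, the "world" is a one-dimensional strip with a single character cell, and every name (`tokenize`, `infer_latent_action`, `predict_next_token`, `play`) is hypothetical and chosen only for illustration. Each function is a hand-coded stand-in for the role a learned component would play.

```python
from typing import List

WIDTH = 8          # toy world: a 1D strip of cells (an assumption for illustration)
Frame = List[int]  # one frame = WIDTH cells; the cell holding 1 is the character


def tokenize(frame: Frame) -> int:
    """Stand-in for the video tokenizer: compress a frame to one discrete token."""
    return frame.index(1)


def detokenize(token: int) -> Frame:
    """Decode a token back into a frame."""
    return [1 if i == token else 0 for i in range(WIDTH)]


def infer_latent_action(prev: Frame, nxt: Frame) -> int:
    """Stand-in for the latent action model: label the change between two
    consecutive frames with a discrete action id (0=left, 1=stay, 2=right)."""
    delta = tokenize(nxt) - tokenize(prev)
    return max(-1, min(1, delta)) + 1


def predict_next_token(history: List[int], action: int) -> int:
    """Stand-in for the dynamics model: next token from past tokens plus an action."""
    position = history[-1] + (action - 1)
    return max(0, min(WIDTH - 1, position))  # keep the character inside the strip


def play(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Start from a single prompt frame and generate one new frame per action."""
    history = [tokenize(prompt)]
    for action in actions:
        history.append(predict_next_token(history, action))
    return [detokenize(token) for token in history]


frames = play(detokenize(3), [2, 2, 0])    # right, right, left
print([tokenize(f) for f in frames])       # [3, 4, 5, 4]
```

In this sketch the action labels are recovered purely by comparing consecutive frames, which mirrors the idea of learning controls from unlabelled video; in the real model that mapping would be learned from data rather than written by hand.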
What does Genie do?
- Genie is a new kind of generative AI that enables anyone – even children – to dream up and step into generated worlds similar to human-designed simulated environments.
- It can be prompted to generate a diverse set of interactive and controllable environments although it is trained on video-only data.
- It is a breakthrough as it makes playable environments from a single image prompt.
- According to Google DeepMind, Genie can be prompted with images it has never seen.
- This includes real-world photographs and sketches, allowing people to interact with their imagined virtual worlds.
- For training, the researchers focused on videos of 2D platformer games and robotics.
- However, the method behind Genie is general, so it should work for any type of domain, and it is scalable to even larger Internet datasets.
Why is it Important?
- The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.
- This is noteworthy because internet videos do not have labels about the action that is performed in the video, or even which part of the image should be controlled.
- It allows you to create an entirely new interactive environment from a single image.
- This opens up many possibilities, especially new ways to create and step into virtual worlds.
- With Genie, anyone will be able to create their own entirely imagined virtual worlds.