Google unveils Genie AI which can create video games from text and image prompts

  • 28 Feb 2024

Why is it in the News?

Recently, Google DeepMind unveiled Genie, a novel model capable of creating interactive video games based solely on textual or image prompts.

What is Genie AI?

  • Genie is a foundation world model that is trained on videos sourced from the Internet.
  • The model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”
  • It is the first generative interactive environment that has been trained in an unsupervised manner from unlabelled internet videos.
  • Genie has 11 billion parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.
    • Together, these components let Genie act in generated environments on a frame-by-frame basis, even though it was trained without action labels or any other domain-specific requirements.
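
The frame-by-frame loop described above can be illustrated with a toy sketch. This is not Genie's code: every name, constant, and the arithmetic standing in for each learned model is a hypothetical placeholder, and the latent action model's training-time job of inferring actions from unlabelled video is not modelled. The sketch only shows the flow, assuming the player supplies one discrete latent action per frame: tokenize a prompt image, then repeatedly predict the next frame's tokens from the token history and the chosen action.

```python
from typing import List

# All values below are illustrative assumptions, not Genie's real settings.
NUM_LATENT_ACTIONS = 8   # size of the toy discrete action vocabulary
VOCAB_SIZE = 16          # size of the toy token vocabulary

Frame = List[int]    # toy "image": a flat list of pixel values in 0..255
Tokens = List[int]   # discrete tokens in 0..VOCAB_SIZE-1


def tokenize(frame: Frame) -> Tokens:
    """Stand-in for the video tokenizer: quantize pixels into discrete tokens."""
    return [pixel * VOCAB_SIZE // 256 for pixel in frame]


def detokenize(tokens: Tokens) -> Frame:
    """Stand-in for the tokenizer's decoder: map tokens back to pixel values."""
    return [token * 256 // VOCAB_SIZE for token in tokens]


def dynamics_step(history: List[Tokens], action: int) -> Tokens:
    """Stand-in for the autoregressive dynamics model.

    A real model would be a learned network conditioned on all past tokens;
    this toy rule just shifts the latest tokens by the latent action index.
    """
    if not 0 <= action < NUM_LATENT_ACTIONS:
        raise ValueError(f"latent action must be in 0..{NUM_LATENT_ACTIONS - 1}")
    return [(token + action) % VOCAB_SIZE for token in history[-1]]


def rollout(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Generate one new frame per action, starting from a single prompt image."""
    history = [tokenize(prompt)]
    frames = [prompt]
    for action in actions:
        next_tokens = dynamics_step(history, action)
        history.append(next_tokens)
        frames.append(detokenize(next_tokens))
    return frames


if __name__ == "__main__":
    for frame in rollout([0, 128, 255], [1, 0, 3]):
        print(frame)
```

The point of the sketch is the control flow: each generated frame depends only on earlier tokens and an abstract action index, so no game engine or labelled controller input is involved.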

What does Genie do?

  • Genie is a new kind of generative AI that enables anyone – even children – to dream up and step into generated worlds similar to human-designed simulated environments.
  • Although it is trained on video-only data, it can be prompted to generate a diverse set of interactive, controllable environments.
    • Its key advance is that it can produce a playable environment from a single image prompt.
  • According to Google DeepMind, Genie can be prompted with images it has never seen.
    • This includes real-world photographs and sketches, allowing people to interact with their imagined virtual worlds.
  • For training, the researchers focused mainly on videos of 2D platformer games and robotics.
  • The training method itself is general, so it should work for any type of domain and scale to even larger Internet datasets.

Why is it Important?

  • The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.
  • This is noteworthy because internet videos do not have labels about the action that is performed in the video, or even which part of the image should be controlled.
  • It allows you to create an entirely new interactive environment from a single image.
  • This opens up many possibilities, especially new ways to create and step into virtual worlds.
  • In principle, a model like Genie could let anyone create their own entirely imagined virtual worlds.