Nvidia’s foray into world models – AI models inspired by human mental models of the world
At CES 2025 in Las Vegas, Nvidia announced that it is making a family of world models, called Cosmos World Foundation Models (Cosmos WFMs), openly available. These models can predict and generate ‘physics-aware’ videos, and are available from Nvidia’s API and NGC catalogs, GitHub, and the AI dev platform Hugging Face.
What are Cosmos WFMs?
The Cosmos WFM family consists of a range of models divided into three categories: Nano for low latency and real-time applications, Super for ‘highly performant baseline’ models, and Ultra for maximum quality and fidelity outputs. The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra being the largest.
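To give a rough sense of what the 4 billion to 14 billion parameter range means in practice, the sketch below estimates the memory needed to hold the weights alone. The bytes-per-parameter values are assumptions about common numeric precisions, not figures Nvidia has published for Cosmos, and the estimate ignores activations and any other runtime overhead.

```python
# Rough weight-memory estimate. The precisions below are assumptions for
# illustration; the article does not say which precision Cosmos WFMs ship in.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str = "bf16") -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts from the article: the family spans 4B to 14B.
SMALLEST = 4_000_000_000
LARGEST = 14_000_000_000

for label, n in [("smallest (4B)", SMALLEST), ("largest (14B)", LARGEST)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(n, precision)
        print(f"{label} @ {precision}: about {gb:.0f} GB of weights")
```

Under these assumptions, the gap between the smallest and largest models is a factor of 3.5 in weight memory at any given precision, which is consistent with the smaller tier being pitched at low-latency use.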
How do Cosmos WFMs work?
Image: The output of one of Nvidia’s Cosmos World Foundation Models (Image Credits: Nvidia)
Cosmos WFMs can be fine-tuned for specific applications and are designed to generate ‘controllable, high-quality’ synthetic data to bootstrap the training of models for robotics, driverless cars, and more. They can simulate realistic environments like factory floors and generate physics-based videos from a combination of inputs, such as text, images, and video, as well as robot sensor or motion data.
Training Data
Cosmos WFMs were trained on 9,000 trillion tokens from 20 million hours of real-world human interaction, environment, industrial, robotics, and driving data. However, Nvidia wouldn’t say where this training data came from, and at least one report and one lawsuit allege that the company trained on copyrighted YouTube videos without permission.
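The two training figures quoted above imply an average token density that is easy to work out. The sketch below does the arithmetic in Python. It assumes the tokens are spread evenly across the footage, which is a simplification; the article does not say how a token is defined for this data.

```python
# Back-of-the-envelope arithmetic on the reported training figures.
# Assumption: tokens are averaged uniformly over all hours of data.
TOTAL_TOKENS = 9_000 * 10**12   # 9,000 trillion tokens
TOTAL_HOURS = 20 * 10**6        # 20 million hours
SECONDS_PER_HOUR = 3_600


def tokens_per_hour(tokens: int, hours: int) -> float:
    """Average number of tokens per hour of source data."""
    return tokens / hours


def tokens_per_second(tokens: int, hours: int) -> float:
    """Average number of tokens per second of source data."""
    return tokens_per_hour(tokens, hours) / SECONDS_PER_HOUR


print(f"{tokens_per_hour(TOTAL_TOKENS, TOTAL_HOURS):,.0f} tokens per hour")
print(f"{tokens_per_second(TOTAL_TOKENS, TOTAL_HOURS):,.0f} tokens per second")
```

On that averaging assumption, the figures work out to 450 million tokens per hour of data, or 125,000 tokens per second.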
Nvidia’s Response
When reached for comment, an Nvidia spokesperson told TechCrunch that Cosmos ‘isn’t designed to copy or infringe any protected works.’ The spokesperson said that Cosmos learns just like people learn, and that the company gathered data from a variety of public and private sources. However, copyright experts say claims like Nvidia’s may not stand up to judicial scrutiny.
Implications
The release of Cosmos WFMs matters most for robotics and autonomous vehicles. Synthetic data generated by the models could bootstrap training for these applications and accelerate their development.
However, there are also concerns about the use of copyrighted material without permission. Whether companies like Nvidia will prevail in court cases related to this issue remains to be seen.
Industry Reaction
Companies including Waabi, Wayve, Foretellix, and Uber have already committed to piloting Cosmos WFMs for various use cases, from video search and curation to building AI models for self-driving vehicles. Uber CEO Dara Khosrowshahi said in a statement that generative AI will power the future of mobility, requiring both rich data and very powerful compute.
Nvidia’s Position
It is worth noting that Nvidia’s world models aren’t ‘open source’ in the strictest sense. To abide by one widely accepted definition of ‘open source’ AI, an AI model has to provide enough information about its design that a person could ‘substantially’ recreate it, and its developer has to disclose any pertinent details about its training data, including its provenance and how the data can be obtained or licensed.
Nvidia hasn’t published Cosmos WFM training data details, nor has it made available all the tools needed to recreate the models from scratch. That’s probably why the tech giant is referring to the models as ‘open’ as opposed to open source.
Conclusion
Cosmos WFMs give developers in robotics and autonomous vehicles openly available models for generating synthetic training data. Concerns about the use of copyrighted material without permission remain unresolved, but the potential benefits of these models are substantial.
As Nvidia CEO Jensen Huang said onstage during a press event on Monday, "We really hope [Cosmos will] do for the world of robotics and industrial AI what Llama … has done for enterprise."