OpenAI intros two open-weight language models that can run on consumer GPUs — optimized to run on devices with just 16GB of memory

OpenAI logo on a phone
(Image credit: Getty Images)

OpenAI has developed a pair of new open-weight language models optimized for consumer GPUs. In a blog post, OpenAI announced "gpt-oss-120b" and "gpt-oss-20b", the former designed to run on a single 80GB GPU and the latter optimized to run on edge devices with just 16GB of memory.
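A rough back-of-envelope calculation shows why a 20-billion-parameter model can plausibly fit on a 16GB device. The 4-bit weight figure below is an assumption for illustration; the article does not state what quantization gpt-oss-20b uses.

```python
# Memory estimate for a 20B-parameter model's weights.
# The 4-bit-per-weight assumption is ours, not from the article.
params = 20e9                 # 20 billion parameters
bits_per_weight = 4           # assumed aggressive quantization
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # leaves headroom within 16GB for activations and KV cache
```

At 4 bits per weight the raw weights alone come to roughly 10GB, which is why such quantization is typically needed to leave room for activations and the KV cache on a 16GB device.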

Both models are Transformers built on a mixture-of-experts (MoE) architecture, an approach popularized by DeepSeek R1. Despite gpt-oss-120b and 20b's design focus on consumer GPUs, both support context lengths of up to 131,072 tokens, the longest available for local inference. gpt-oss-120b activates 5.1 billion parameters per token, and gpt-oss-20b activates 3.6 billion. Both models use alternating dense and locally banded sparse attention patterns, along with grouped multi-query attention with a group size of 8.
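The mixture-of-experts idea explains why the models activate only a few billion parameters per token despite being much larger overall: a router selects a small subset of expert networks for each token. The sketch below is a minimal, self-contained illustration of top-k MoE routing; all sizes (32 experts, top-4 routing, a 64-wide hidden state) are made-up toy values, not the real gpt-oss configuration.

```python
import numpy as np

# Toy MoE layer sizes -- illustrative only, not the gpt-oss config.
num_experts = 32      # total experts in the layer
top_k = 4             # experts activated per token
d_model = 64          # hidden dimension

rng = np.random.default_rng(0)
router_w = rng.standard_normal((d_model, num_experts))          # router scores experts
experts = rng.standard_normal((num_experts, d_model, d_model))  # one weight matrix per expert

def moe_layer(x):
    """Route one token vector through only its top-k experts."""
    logits = x @ router_w                      # score every expert: (num_experts,)
    top = np.argsort(logits)[-top_k:]          # pick the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only top_k of num_experts expert matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)

active = top_k * d_model * d_model             # expert params used for this token
total = num_experts * d_model * d_model        # expert params in the whole layer
print(f"active expert params per token: {active} of {total}")
```

In this toy layer only 1/8 of the expert parameters run per token; the same principle is how a large MoE model keeps its per-token compute (5.1B or 3.6B active parameters in gpt-oss) far below its total parameter count.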


Aaron Klotz is a contributing writer for Tom's Hardware, covering news related to computer hardware such as CPUs and graphics cards.
