OpenAI intros two open-weight language models that can run on consumer GPUs — optimized to run on devices with just 16GB of memory

OpenAI logo on a phone
(Image credit: Getty Images)

OpenAI has developed a pair of new open-weight language models optimized for consumer GPUs. In a blog post, OpenAI announced "gpt-oss-120b" and "gpt-oss-20b", the former designed to run on a single 80GB GPU and the latter optimized to run on edge devices with just 16GB of memory.
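A rough back-of-envelope calculation shows why a 20-billion-parameter model can plausibly fit on a 16GB device. The 4-bit weight figure below is an assumption for illustration; the article does not state what quantization gpt-oss-20b uses.

```python
# Memory estimate for a 20B-parameter model's weights.
# The 4-bit-per-weight assumption is ours, not from the article.
params = 20e9                 # 20 billion parameters
bits_per_weight = 4           # assumed aggressive quantization
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # leaves headroom within 16GB for activations and KV cache
```

At 4 bits per weight the raw weights alone come to roughly 10GB, which is why such quantization is typically needed to leave room for activations and the KV cache on a 16GB device.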

Both models are Transformers built on a mixture-of-experts (MoE) architecture, an approach popularized by DeepSeek R1. Despite gpt-oss-120b and 20b's design focus on consumer GPUs, both support context lengths of up to 131,072 tokens, the longest available for local inference. gpt-oss-120b activates 5.1 billion parameters per token, and gpt-oss-20b activates 3.6 billion. Both models use alternating dense and locally banded sparse attention patterns, along with grouped multi-query attention with a group size of 8.
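The mixture-of-experts idea explains why the models activate only a few billion parameters per token despite being much larger overall: a router selects a small subset of expert networks for each token. The sketch below is a minimal, self-contained illustration of top-k MoE routing; all sizes (32 experts, top-4 routing, a 64-wide hidden state) are made-up toy values, not the real gpt-oss configuration.

```python
import numpy as np

# Toy MoE layer sizes -- illustrative only, not the gpt-oss config.
num_experts = 32      # total experts in the layer
top_k = 4             # experts activated per token
d_model = 64          # hidden dimension

rng = np.random.default_rng(0)
router_w = rng.standard_normal((d_model, num_experts))          # router scores experts
experts = rng.standard_normal((num_experts, d_model, d_model))  # one weight matrix per expert

def moe_layer(x):
    """Route one token vector through only its top-k experts."""
    logits = x @ router_w                      # score every expert: (num_experts,)
    top = np.argsort(logits)[-top_k:]          # pick the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only top_k of num_experts expert matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)

active = top_k * d_model * d_model             # expert params used for this token
total = num_experts * d_model * d_model        # expert params in the whole layer
print(f"active expert params per token: {active} of {total}")
```

In this toy layer only 1/8 of the expert parameters run per token; the same principle is how a large MoE model keeps its per-token compute (5.1B or 3.6B active parameters in gpt-oss) far below its total parameter count.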


Aaron Klotz is a contributing writer for Tom's Hardware, covering news related to computer hardware such as CPUs and graphics cards.
