Liquid AI's smallest model yet LFM2.5-230M beats models 4X its size at data extraction, can run 'anywhere'

22 hours ago 55

Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model yet, LFM2.5-230M, and enterprises would do well to consider it for their uses in data extraction and local deployment on smartphones, laptops and robotics.

This is a 230-million-parameter foundation model explicitly designed for on-device agentic workflows, and as Liquid states in its release blog post, that small size makes it possible to run nearly "anywhere." According to Liquid, it also outperforms models more than 4X its size on selected benchmarks, specifically doing better at data extraction than the 800 million parameter count Alibaba Qwen3.5-0.8B (Instruct) and 1-billion parameter Google Gemma 3 1B.

Liquid AI LFM2.5-230M benchmark comparison chart

The model targets developers and engineers building lightweight data extraction pipelines and autonomous edge systems.

Operating under a dual-use commercial license, the model remains free for individuals and companies generating less than $10 million in annual revenue, while requiring a paid enterprise agreement for larger corporations.

This release distinguishes itself from other small AI models by utilizing the LFM2 architecture to achieve high inference speeds without the massive memory overhead typical of parameter-heavy transformers.

While major AI companies Anthropic, OpenAI, Google, Microsoft, Meta and others push parameter counts into the hundreds of billions or trillions to achieve frontier performance, a parallel race focuses entirely on the edge and local deployments.

Liquid AI's launch of LFM2.5-230M signals a pivotal shift toward architectural efficiency over brute-force scaling. By squeezing 19 trillion tokens of pre-training into a 230-million-parameter footprint, the company demonstrates that edge devices do not need massive computational power or persistent cloud connections to execute complex, multi-step agentic workflows.

How LFM2.5-230M works

The LFM2.5-230M model diverges from standard transformer architectures, relying instead on the LFM2 framework. This architecture functions as a hybrid system, interleaving gated short-range convolutions with grouped-query attention to process information efficiently.

For those tracking the evolution of efficient architectures, Liquid’s approach shares a similar conceptual goal: managing long contexts and sequential data effectively on edge hardware without the quadratic memory costs of pure attention mechanisms. The model supports an expansive 32K context window, allowing it to ingest substantial documents or continuous streams of robotic telemetry.

When analyzing the performance charts provided in the release, the architectural efficiency becomes visually apparent. The model maintains a memory footprint of under 400MB while achieving prefill and decode speeds that outpace comparable models like Gemma 3 1B IT and Granite 4.0-H-350M.

On a Samsung Galaxy S25 Ultra equipped with a Qualcomm Snapdragon Gen4 CPU, the model reaches a decode speed of 213 tokens per second. Even on a highly constrained Raspberry Pi 5, the model maintains a decode rate of 42 tokens per second. Furthermore, internal benchmarking shows the GPU inference stack delivers lower end-to-end latency than competing small models across all concurrency levels.

Why it matters for enterprises

To understand why a 230-million-parameter model is necessary, one must look at how enterprises currently manage data.

Organizations have traditionally relied on rigid, rule-based Extract, Transform, Load (ETL) scripts to move and process data. However, these legacy systems are notoriously brittle; a simple change in a document's layout or a schema update can break the entire pipeline.

To solve this, the industry is shifting toward "AI ETL," where machine learning infers mappings, detects schema drift, and adapts to changes automatically. In a modern lightweight data extraction pipeline, an AI model connects to unstructured sources—like PDFs, emails, or web forms—and structures the data into formats like JSON without requiring hardcoded rules.

For enterprises, using a massive flagship model like Claude Opus 4.6 (which costs $5.00 per million input tokens) to parse routine invoices, format addresses, or route telemetry data is economically unviable.

This is where models like LFM2.5-230M become critical. Designed explicitly as a lightweight extraction engine, it allows companies to automate repetitive formatting and data parsing at a fraction of the compute cost and latency, running directly on local hardware rather than relying on expensive, continuous cloud API calls.

Small Model Benchmarks: LFM vs. The 3B Class

The AI industry in mid-2026 is seeing a renaissance in "small" models, but the definition of "small" varies wildly.

Recently, the open-weight community was stunned by Weibo's VibeThinker-3B, a 3-billion-parameter model built on a Qwen2-style backbone that achieved a massive 94.3 on the AIME 2026 math benchmark, rivaling 600-billion-parameter behemoths through aggressive data curation and reinforcement learning.

Similarly, Google's Gemma 4 family — which recently crossed 200 million downloads — pushes frontier AI to the edge, including the E2B (2 billion parameters) designed specifically for mobile and IoT deployments.

By contrast, Liquid AI's LFM2.5-230M operates in a completely different weight class. At just 230 million parameters, it is roughly one-tenth the size of Google's smallest Gemma 4 model and VibeThinker-3B.

Because of its microscopic footprint, LFM2.5-230M is not designed to compete on reasoning-heavy workloads like advanced math, coding, or creative writing—a constraint Liquid AI explicitly acknowledges.

However, in its intended domains of data extraction and tool calling, the model punches well above its weight class.

Benchmarks released by Liquid AI show LFM2.5-230M scoring 43.26 on the BFCLv3 tool-use benchmark, dominating IBM's Granite 4.0-350M (39.58) and completely outpacing larger 1-billion-parameter models like Google's Gemma 3 1B IT (16.61).

Liquid AI LFM2.5-230M benchmark comparison bar chart

On CaseReportBench for data extraction, it scores 22.51, decimating the Qwen3.5-0.8B (Instruct).

LFM2.5-230M proves that while 3-billion-parameter models like VibeThinker are solving advanced calculus, a 230-million-parameter model is the superior, highly optimized choice for executing structured tool calls and keeping agentic pipelines running efficiently on constrained hardware.

Advanced research uses

Because it excels at tool calling, LFM2.5-230M functions primarily as a skill-selection layer. Liquid AI demonstrated this capability by deploying the model on a Unitree G1 humanoid robot.

Running entirely on-device via the robot's onboard NVIDIA Jetson Orin compute module, the model successfully processes complex environmental commands.

As noted in the company's technical blog, the model takes a free-form instruction like, *"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters,"* and automatically translates it into a structured multi-step plan calling on pre-trained low-level skills provided by NVIDIA's SONIC framework.

The base and post-trained models are available immediately on Hugging Face, with native day-one support across the inference ecosystem for llama.cpp (GGUF), MLX, vLLM, SGLang, and ONNX.

Dual-use, custom LFM Open License

Liquid AI ships LFM2.5-230M under the LFM Open License v1.0. Despite the word "open" in the title, this is not an Open Source Initiative (OSI) compliant license; it operates as a restricted, dual-use commercial framework.

For independent developers, researchers, and early-stage startups, the license functions identically to open-source software.

Users receive a perpetual, worldwide, royalty-free license to reproduce, modify, and distribute the model, provided they retain original copyright notices and prominently state any modifications.

However, the license includes a strict "Commercial Use Limitation". Any legal entity generating $10 million or more in annual revenue loses the right to use the model commercially under this agreement.

Large enterprises crossing this financial threshold must negotiate a separate, paid commercial agreement with Liquid AI to deploy the model in production.

This strategy protects the company from having its intellectual property absorbed by major technology conglomerates for free, while still seeding the model at the grassroots developer level.

Read Entire Article