Gemini 3.1 Flash-Lite arrives as Google’s cheapest Gemini 3 model yet

March 3, 2026•News•

•2 min read

Gemini 3.1 Flash-Lite arrives as Google’s cheapest Gemini 3 model yet

After launching Gemini 3.1 Pro last month, Google is now adding Gemini 3.1 Flash-Lite, a new model aimed at developers who need lower costs and quicker responses more than they need the company’s highest-end reasoning model. It is rolling out in preview starting today through the Gemini API in Google AI Studio and Vertex AI.

Google describes 3.1 Flash-Lite as the fastest and cheapest model in its Gemini 3 family so far. Pricing starts at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens, with the company pitching it for high-volume tasks like translation, moderation, and other jobs where speed and cost matter more than anything else.

According to Google, Gemini 3.1 Flash-Lite delivers a 2.5 times faster time to first token than Gemini 2.5 Flash and a 45% boost in output speed, while still holding up on quality. Google also says the model beats other systems in the same tier on reasoning and multimodal benchmarks, which is the bigger point here. This is meant to be the model developers can use constantly without paying for a heavier system.

Flash-Lite also includes adjustable thinking levels, so developers can decide how much reasoning they want the model to use for a task. That gives it a bit more range than the “Lite” name might suggest. Google is clearly pitching it for repetitive, high-frequency work, but also for tasks that need stronger instruction following without moving up to a more expensive model.

On the API side, Gemini 3.1 Flash-Lite supports a 1 million token context window and can take in text, code, images, audio, video, and PDFs, while returning text output. That helps place it pretty clearly in Google’s lineup. It is not the flagship model, but it is the one built for scale.

Google also says early testers including Latitude, Cartwheel, and Whering are already using the model. The broader pitch is fairly straightforward. Gemini 3.1 Flash-Lite is supposed to be the model developers reach for when they want something fast, inexpensive, and reliable enough to run all day, with Google also highlighting output speed advantages over Claude 4.5 Haiku in its launch comparison chart.

Y. Anush Reddy

Y. Anush Reddy is a contributor to this blog.

How Platforms Are Using AI to Break Language Barriers

The next wave of streaming isn’t about better shows, it’s automated voiceovers. AI is teaching stories to speak your language, one voice at a time.

Apple Just Built Translation Into the OS

Apple just made your iPhone an AI interpreter. Your iPhone can translate speech in real time on calls, video chats, and in-person.

New York Times Sues Perplexity in Copyright Dispute

A new lawsuit from The New York Times puts Perplexity AI at the center of the copyright fight over AI answer engines. The Times claims its content was used without authorization. The outcome could influence how AI search cites, pays, and competes with publishers.

Amazon in talks to invest $10 billion in OpenAI

Amazon and OpenAI are discussing a $10 billion deal that could value OpenAI above $500 billion. The talks would deepen OpenAI’s AWS ties while it expands beyond Microsoft. The push comes after GPT-5.2 and rising pressure from Google Gemini 3.

Related Articles