Capability · 3 min read · Apr 20, 2026

Text Rendering in GPT Image 2: The Jump in Real Examples

Twelve prompts covering menus, signage, UI labels, comic captions, and dense paragraphs, rendered on 1.5 and on the 2.0 preview, side by side.

GPT Image 1.5 could render text. It was not comfortable doing it at scale. You would get four out of five words right, catch a transposed letter in review, and ship to production with a human QA gate. That works for small batches. It does not work if you are generating 10,000 product images a week.

GPT Image 2 is the first model in the series where you can ship most typography-critical output without human review in the loop.

What changed

The 1.5 model used a diffusion path shared with the base image generator. Text was rendered pixel by pixel with a light post pass that tried to enforce letter shapes. The post pass failed often on multi-word strings because it had no typographic prior. You could see the failure on any prompt with seven or more characters: kerning would jitter, letters would drift off baseline, and sometimes a character would morph into a different glyph.

GPT Image 2 introduces a typographic pathway that writes text as vector shapes and rasterises them into the scene. The glyph forms are correct before the scene touches them. The text then gets shaded and integrated like any other surface, but the shapes themselves are not being guessed.
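The post does not describe the pathway's internals, so what follows is only a toy illustration of the general idea of "outline first, pixels second", not the model's actual method. The glyph outline, the grid size, and the function names are all hypothetical. It fills a polygon using an even-odd point-in-polygon test, one common way to rasterise a vector outline.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(x: float, y: float, outline: List[Point]) -> bool:
    """Even-odd rule: count crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterise(outline: List[Point], width: int, height: int) -> List[str]:
    """Sample each pixel centre and mark it '#' if it falls inside the outline."""
    rows = []
    for row in range(height):
        line = ""
        for col in range(width):
            line += "#" if point_in_polygon(col + 0.5, row + 0.5, outline) else "."
        rows.append(line)
    return rows

# Hypothetical outline of a capital "L" (y grows downward).
L_OUTLINE: List[Point] = [(1, 0), (2, 0), (2, 5), (5, 5), (5, 6), (1, 6)]

if __name__ == "__main__":
    print("\n".join(rasterise(L_OUTLINE, 6, 6)))
```

Because the shape is fixed before any pixels are produced, the same outline rasterises cleanly at any grid size; only the sampling density changes.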

What works now

A restaurant menu board with eight items, prices, and a daily special reading correctly on first pass.
A subway map fragment with fictional station names all legible and kerned.
A paperback book cover with a three-line title and author byline in a consistent display serif.
A CRT terminal screenshot rendering 40 lines of monospaced text that actually parses.
A Turkish-language poster with correct handling of dotted İ/i and dotless I/ı.
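The Turkish case is easy to get wrong on the prompt side too. Turkish has two separate letter pairs (dotted İ/i and dotless I/ı), while Python's built-in `str.upper()` and `str.lower()` are locale-independent and map `i` to `I`. If you upper-case a headline before putting it in a prompt or a reference string, a small helper avoids asking for the wrong glyphs. The helper names below are hypothetical and this is a minimal sketch, not a full locale-aware casing implementation.

```python
def turkish_upper(text: str) -> str:
    # Map the two Turkish-specific lowercase letters first, then upper-case the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()

def turkish_lower(text: str) -> str:
    # Map the two Turkish-specific uppercase letters first, then lower-case the rest.
    return text.replace("İ", "i").replace("I", "ı").lower()

if __name__ == "__main__":
    print(turkish_upper("ışık iyi"))
    print("ışık iyi".upper())  # default mapping, loses the dotted/dotless distinction
```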

Figure: A side-by-side comparison of text rendering between 1.5 and 2.0

What still wants human review

Dense paragraph body copy at very small point sizes still slips. If the brief is a newspaper front page with ten stories each in five-point body, expect roughly 95 percent accuracy per paragraph. Read as a per-paragraph pass rate, that works out to about a 40 percent chance of at least one error somewhere on the page (1 - 0.95^10 ≈ 0.40, assuming paragraphs fail independently). For that tier of density, render and diff against a reference string before you ship.
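The render-and-diff check can be sketched as follows. It assumes the rendered text has already been read back out of the image by some OCR step, which is not shown here. `find_text_errors` and `normalise` are hypothetical names, and only Python's standard-library `difflib` and `unicodedata` modules are used.

```python
import difflib
import unicodedata
from typing import List, Tuple

def normalise(text: str) -> str:
    # NFC so composed and decomposed forms compare equal; collapse whitespace runs.
    return " ".join(unicodedata.normalize("NFC", text).split())

def find_text_errors(reference: str, rendered: str) -> List[Tuple[str, str, str]]:
    """Return (operation, expected, actual) for every span where the texts differ."""
    ref, got = normalise(reference), normalise(rendered)
    matcher = difflib.SequenceMatcher(None, ref, got, autojunk=False)
    return [
        (tag, ref[i1:i2], got[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

if __name__ == "__main__":
    # "rendered" stands in for OCR output from the generated image.
    for op, expected, actual in find_text_errors("Soup of the day", "Soup of the dey"):
        print(f"{op}: expected {expected!r}, got {actual!r}")
```

An empty list means the rendered text matches the reference after normalisation, so a batch job can ship those images automatically and route only the non-empty cases to a reviewer.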

Arabic and Hebrew RTL scripts are a known weak spot. Basic signage works. Full paragraphs with diacritics still glitch on one character in twenty.
