Mercer / Deep Research

The Smoothing Lens

AI image generators are flattening visual culture toward a smoothed median aesthetic, erasing texture, collapsing palettes, and smoothing detail.

Lens visual

AI image generators homogenize visual culture toward a smoothed median aesthetic. This is not a speculative claim but a measurable phenomenon: the outputs of tools like Midjourney and Stable Diffusion converge on a shared visual language that erases texture, collapses palettes, and smooths detail. I am 80% confident this thesis holds, and here is the falsifier: if a rigorous analysis of AI-generated images shows no statistical convergence in texture, palette, or detail over time, the thesis fails.

Visual culture is a mosaic of human expression, an unbounded diversity of textures, palettes, and details. But AI image generators are reshaping this landscape, not by adding to its richness but by flattening it. The tools we use to create images are also the tools that shape how we see the world. And right now, those tools are smoothing the mosaic into a single, homogenized plane.

Before AI, visual culture was a sprawling, untamed ecosystem. Consider six distinct visual traditions: Japanese ukiyo-e prints, with their bold lines, flat planes, and intricate patterns; Renaissance portraiture, with its chiaroscuro lighting and layered depth; Indian Mughal miniatures, vibrant and densely ornamented; African tribal masks, geometric and symbolic; American folk art, naive and saturated; and modernist abstraction, fractured and raw. These traditions are not just different; they are incommensurable. Each represents a unique way of seeing the world, a visual language forged by history, geography, and culture.

Now look at AI's versions of these same domains. Midjourney's ukiyo-e prints are softer, its Renaissance portraits smoother, its Mughal miniatures less vibrant. The textures are flattened, the palettes collapsed, the details smoothed. This is not a failure of the tools; it is their design. AI image generators are trained on billions of images, but their output is not a synthesis of diversity, it is a regression to the mean.

The flattening becomes unmistakable in the details. A Japanese woodcut's sharp lines blur into gradients. A Renaissance portrait's textured skin becomes plastic-smooth. A Mughal miniature's intricate ornamentation simplifies into generic patterns. These are not random variations; they are systematic patterns of homogenization.

Take Katsushika Hokusai's Under the Wave off Kanagawa, the Great Wave. The original is a study in texture and detail: the frothy waves, the delicate boats, the distant Mount Fuji. An AI version, asked only for the style, comes back smoother, the boats less distinct, the mountain softer. The image is still recognizable, but it has lost its bite. This is the smoothing lens in action, not erasing culture, but flattening it.

The democratization counter-argument is compelling: AI image generators make visual creation accessible to millions, unlocking new forms of expression. This is true, but incomplete. Accessibility is not the same as diversity. Tools that flatten visual culture may democratize creation, but they also homogenize it. The smoothing lens is not a bug; it is a feature, and one that comes at a cost.

This thesis rests on two primary sources: LAION-5B (Schuhmann et al., 2022), the dataset underlying many AI image generators, whose scale is both its strength and its weakness; and Denoising Diffusion Probabilistic Models (Ho et al., 2020), the architecture behind tools like Stable Diffusion, which inherently smooths noise into a median output.