Maket.ai will generate 200 floor plan options for you in under a minute. Type in four bedrooms, three baths, 2,400 square feet, and the algorithm exhales a grid of layouts so fast your browser barely has time to render the thumbnails. A million users have signed up, drawn by the promise of architectural choice without architectural fees. Scroll through those 200 plans, though, and something strange starts to happen. You cannot quite articulate what is wrong with them, but you can feel it: the rooms all flow the same way, the hallways converge on the same geometries, and by the fortieth option you realize you have been looking at variations on a theme rather than genuinely different approaches to organizing domestic space.
A pair of researchers at the University of Pisa just proved that feeling is mathematically real.
What Diversity Collapse Looks Like
Stoppani and Bacciu's 2025 paper, a collaboration between Pisa's computer science department and the H&M Group, introduced a metric they call the Diversity Score. Their finding is blunt: as diffusion models train longer and get better at generating realistic-looking floor plans, they simultaneously get worse at generating diverse ones. Fréchet Inception Distance, the standard metric the field uses to evaluate generative models, measures only realism. It cannot detect whether a model that scores well is producing 200 genuinely different layouts or 200 pixel-level variations of the same three. FID is blind to homogeneity. A model can ace the realism test while flunking the diversity test, and nobody notices because nobody was measuring diversity in the first place.
Worse, their out-of-distribution evaluations showed that these models are prisoners of their training data. Given a boundary shape the training set did not contain, they do not improvise. They hallucinate familiar shapes into unfamiliar constraints, producing layouts that are technically valid but spatially incoherent, the architectural equivalent of an autocomplete engine finishing a sentence it has never actually read.
A Dataset That Has Never Seen a Ranch House
RPLAN, published by Wu et al., contains more than 80,000 annotated floor plans. It is the dataset that launched a subfield. Nearly every academic paper on AI-generated floor plans trains on it, benchmarks against it, or both, and the commercial tools that evolved from those papers inherited its DNA whether they acknowledge the lineage or not.
Every plan in RPLAN is a Chinese urban apartment, without exception. Single-unit, predominantly rectangular rooms, organized around spatial norms that reflect Chinese residential architecture: wet and dry bathroom separation, compact kitchens designed for wok cooking with dedicated exhaust, entry vestibules where shoes come off (the xuanguan), living rooms oriented south for sunlight and feng shui. These are legitimate architectural conventions, well-suited to the culture that produced them. They are also nothing like the spatial grammar of American single-family housing, which organizes around attached garages that open into mudrooms, open-concept kitchen-to-living flows, split bedroom plans that isolate the master suite, dedicated laundry rooms that Chinese apartments fold into balcony space, and two-story foyers that exist in no apartment on earth.
ResPlan, a newer dataset from Indiana University, attempts to fill the gap with 17,000 plans of greater geometric and typological variety. Its authors explicitly acknowledge that RPLAN "predominantly consists of simple, single-unit layouts with mostly rectangular room shapes." But 17,000 plans competing against 80,000 entrenched ones, in a field where researchers default to the largest available benchmark, is not a replacement. It is a footnote the field has yet to take seriously.
Calculating the Real Number of Options
If you accept Stoppani and Bacciu's framework, the effective diversity ratio of diffusion-based floor plan generators is sobering. Their Diversity Score measures how many topologically distinct layouts a model produces when given identical constraints. Across their experiments, models that FID said were improving were actually converging: generating outputs that, while visually distinct at the pixel level, shared the same room adjacency graphs, the same circulation patterns, the same fundamental spatial organization. Two plans that differ only in whether the bedroom is 12 feet wide or 14 feet wide are not two options. They are one option at two scales.
Applied to a commercial context, this means a tool that advertises 200 generated layouts may contain, by topological diversity measures, somewhere between 15 and 40 meaningfully distinct spatial configurations. That is still more options than a single architect would present in a first meeting, and it is dramatically fewer than the marketing suggests.
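The deduplication logic behind that estimate can be made concrete. What follows is a minimal sketch, not Stoppani and Bacciu's actual Diversity Score: it represents each plan as a labeled room-adjacency graph (the `plan` dictionary format here is invented for illustration), collapses any plans that share a topology, and reports the fraction of a batch that is genuinely distinct. Room dimensions never enter the calculation, which is exactly why a 12-foot and a 14-foot bedroom count as one option.

```python
def adjacency_key(plan):
    """Canonical, dimension-free form of a plan's room-adjacency graph.

    `plan` maps each room label to the set of rooms it connects to.
    Each connection becomes an unordered pair, so direction and room
    sizes are ignored: only the topology survives.
    """
    return frozenset(
        frozenset((room, neighbor))
        for room, neighbors in plan.items()
        for neighbor in neighbors
    )

def effective_diversity(plans):
    """Fraction of topologically distinct layouts in a batch."""
    return len({adjacency_key(p) for p in plans}) / len(plans)

# Two "different" plans that share one adjacency graph...
plan_a = {"kitchen": {"dining"}, "dining": {"kitchen", "living"},
          "living": {"dining"}}
plan_b = dict(plan_a)  # same topology; imagine every room 2 ft wider
# ...and one genuinely different spatial organization.
plan_c = {"kitchen": {"living"}, "living": {"kitchen", "dining"},
          "dining": {"living"}}

print(effective_diversity([plan_a, plan_b, plan_c]))  # 2 of 3 are distinct
```

Run against a real batch of 200 generated plans, a ratio of 0.075 to 0.2 would correspond to the 15-to-40 distinct configurations inferred above.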
In 2024, the U.S. Census Bureau recorded 947,000 single-family housing starts. How many of those were influenced by generative design tools whose training data contained zero American single-family homes? We do not know, because the commercial tools do not disclose their training data composition. But the academic lineage is public, and RPLAN sits beneath it like a geological stratum, underlying everything built on top.
| Tool | Approach | Training Data | Price | Diversity Risk |
|---|---|---|---|---|
| Maket.ai | Generative (likely diffusion-based) | Undisclosed | $30/mo | High (if RPLAN-descended) |
| Finch3D | Graph-based rules | None (rule system) | €800–2,750/yr | Low (no training collapse) |
| Academic models | Diffusion / GAN | RPLAN (80K Chinese apts) | Free / research | Proven high |
Not Every Tool Is Guilty
Finch3D, built by a team of Swedish architects, avoids the diversity collapse problem entirely by using a graph-based approach rather than training on images. Its proprietary Finch Graph encodes spatial relationships as nodes and edges, generating layouts through rule satisfaction rather than pattern replication. Founded by Pamela Nunez Wallgren, Jesper Wallgren, and Martin Kretz, the tool focuses on multifamily and commercial projects and costs between €800 and €2,750 per year, which prices it beyond the casual homeowner but within reach for architectural firms and developers who need architecturally coherent outputs rather than pixel-plausible ones.
This distinction matters more than any benchmark score. Graph-based systems do not suffer from training convergence because they do not train in the conventional sense; they apply constraint-satisfaction rules that produce layouts from spatial relationships rather than statistical patterns. A graph system that knows kitchens should be adjacent to dining rooms and that bedrooms need exterior walls will produce layouts constrained by those relationships rather than by the statistical mean of whatever dataset it absorbed. Its biases, where they exist, live in explicit rules that can be read, debated, and revised, not buried in a dataset no one can audit.
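To make the contrast tangible, here is a toy rule-satisfaction check. This is emphatically not Finch3D's proprietary Finch Graph; the rule names, the `plan` format, and the rules themselves are invented for illustration. The point is structural: a candidate layout either satisfies the encoded spatial relationships or it does not, and no training distribution is consulted anywhere.

```python
# Hypothetical rule set for illustration only. Each rule is a named
# predicate over a candidate plan; real systems encode far richer
# constraints (egress, daylight, circulation, code compliance).
RULES = [
    ("kitchen adjacent to dining",
     lambda plan: "dining" in plan["adjacency"].get("kitchen", set())),
    ("every bedroom touches an exterior wall",
     lambda plan: all(room in plan["exterior"]
                      for room in plan["adjacency"]
                      if room.startswith("bedroom"))),
]

def violated_rules(plan):
    """Return the names of violated rules; an empty list means valid."""
    return [name for name, rule in RULES if not rule(plan)]

candidate = {
    "adjacency": {"kitchen": {"dining"},
                  "dining": {"kitchen", "living"},
                  "living": {"dining", "bedroom1"},
                  "bedroom1": {"living"}},
    "exterior": {"kitchen", "living", "bedroom1"},
}
print(violated_rules(candidate))  # [] -> both rules hold
```

A generator built this way searches the space of layouts that pass such checks; it never averages over a corpus, which is why there is no corpus bias to inherit.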
But graph-based tools are expensive, technically complex, and designed for professionals. Maket.ai's million-user base is not architects. It is homeowners and small builders who want a starting point, and those users are the ones most likely to trust that 200 options means 200 choices.
As Elizabeth Bowie Christoforetti of the Harvard Graduate School of Design has argued, "technology tends to amplify, accelerate, or consolidate the inherited values and value systems of our society." RSM US documented the physical parallel: 5-over-1 construction, enabled by IBC Section 510.2, has already homogenized American multifamily exteriors from Portland to Charlotte. AI floor plan tools risk doing to the interior what 5-over-1 did to the facade, collapsing spatial diversity into a statistical mean and calling it optimization.
If You Are Using an AI Floor Plan Tool
Ask what it trained on. If the company will not answer, assume the worst, because in a field where the dominant public dataset is culturally monocultural, silence about training data composition is not a trade secret defense but an admission that the question has never been asked internally, never been answered externally, and never been surfaced in the marketing materials that promise you hundreds of unique design possibilities for $30 a month.
Count meaningful differences, not cosmetic ones. When evaluating generated options, sketch the room adjacency graph for each: which rooms connect to which? If 15 of your 20 "different" plans have the same adjacency graph with different dimensions, you received one plan at 15 scales. A human architect who presented that range would be fired. You would not pay twice for the same blueprint with the font changed.
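The adjacency sketch does not require graph theory, or even software, but for readers who want to automate the pairwise check, a few lines suffice. The `doors` format below is an assumption for illustration: each plan lists which room pairs connect, and two plans are the same layout exactly when those sets match.

```python
def adjacency_graph(plan):
    """Dimension-free fingerprint: the set of room-to-room connections."""
    return {frozenset(edge) for edge in plan["doors"]}

# A 12-foot and a 14-foot bedroom are the same layout...
narrow = {"doors": [("hall", "bedroom"), ("hall", "bath"), ("hall", "living")],
          "bedroom_width_ft": 12}
wide = {"doors": [("hall", "bedroom"), ("hall", "bath"), ("hall", "living")],
        "bedroom_width_ft": 14}
# ...while routing the bath through the bedroom is a different one.
ensuite = {"doors": [("hall", "bedroom"), ("bedroom", "bath"), ("hall", "living")]}

print(adjacency_graph(narrow) == adjacency_graph(wide))     # True: one plan, two scales
print(adjacency_graph(narrow) == adjacency_graph(ensuite))  # False: distinct layouts
```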
Budget for a human review. At $30 per month for Maket.ai, the tool is cheap enough to use as brainstorming software and expensive enough to make users skip the architect who would catch the problems. A licensed architect charges $2,000 to $5,000 for schematic design on a custom home. If an AI tool saves you three weeks of design iteration but leads to a plan that ignores your local building code's setback requirements, egress window minimums, or ADA-compliant hallway widths, the time savings evaporate in the permitting office. Maket.ai's own reviews note that "zoning compliance claims need professional verification before permit submission." Treat the tool as a sketch pad, not a blueprint.
If you are an architect evaluating generative tools for your practice, Finch3D's graph-based approach sidesteps the training data problem entirely, though it targets multifamily at a price point that reflects that market. For single-family residential, the honest answer in April 2026 is that no commercial AI floor plan generator has published evidence that its outputs are topologically diverse across American housing typologies. You are better served using the tools for constraint exploration, rapid iteration on adjacency concepts, and client presentations, not as substitutes for spatial design judgment honed on actual buildings.
Limitations of This Analysis
Stoppani and Bacciu studied academic diffusion models, not commercial products. Maket.ai and competitors may use proprietary architectures, supplemental training data, or post-generation filtering that mitigates diversity collapse in ways the academic literature has not yet measured. We do not have access to any commercial floor plan generator's training data composition, and the effective diversity ratio of 15 to 40 distinct layouts per 200 generated is inferred from the Diversity Score findings, not directly measured on any specific tool's output. "Diversity" in floor plans resists clean definition: two plans with identical adjacency but different proportions may be meaningfully different for a buyer whose priority is closet space rather than circulation flow. The DS metric is new, not yet independently replicated, and may not generalize to all model architectures. The 947,000 single-family housing starts figure includes all homes regardless of whether any generative design tool was involved.