A homeowner in Atlanta called her HVAC contractor last March because the air conditioning ran nonstop on 85-degree days but the house never quite felt dry. His diagnosis: the system wasn’t big enough. He recommended upgrading from a three-ton unit to a five-ton. She would have done it, too, except a friend who works in building science told her to get an actual Manual J load calculation first.
Her load calculation came back at 17,400 BTU/h, roughly 1.5 tons. Her existing three-ton system was already double what the house needed, and the contractor’s proposed five-ton unit would have been more than triple the need.
She wasn’t unusual. She was average.
400 Square Feet Per Ton, and Other Lies
America’s dominant method for sizing residential HVAC is not ACCA Manual J, the nationally recognized ANSI standard. It is a contractor standing in your living room, doing mental math. Here is how the rule works: take the conditioned floor area, divide by some number between 400 and 600, and that is your tonnage. A 2,000-square-foot house gets a 3.5- to 5-ton system. Numbers vary by region and by contractor, but the logic is always the same. Square footage in, tonnage out. Insulation levels, window orientation, air sealing, duct leakage, occupancy patterns, internal heat gains from appliances? Ignored, all of it.
Energy Vanguard, a building science firm run by Allison Bailes, PhD, published data from 40 residential Manual J calculations performed across the southeastern United States, Texas, California, and the Midwest. Not a single result fell at or below 600 square feet per ton, with the lowest in the entire dataset coming in at 624 and most exceeding 1,000. Average: 1,431 square feet per ton.
Run those numbers backward and the scale of the problem becomes obvious. A contractor using the 500 sf/ton rule on a house that actually needs cooling at 1,431 sf/ton will install a system 2.86 times as large as the house requires. On a 2,400-square-foot home, the rule says 4.8 tons while Manual J says 1.7. That is not a rounding error. That is buying a Ford F-350 to commute to an office four miles away.
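The arithmetic above is simple enough to check in a few lines. The sketch below is illustrative only: the function names are made up for this example, and the one outside fact it relies on is the standard conversion of 12,000 BTU/h per ton of cooling.

```python
BTU_PER_TON = 12_000  # one ton of cooling capacity equals 12,000 BTU/h


def tons_from_area(floor_area_sqft: float, sqft_per_ton: float) -> float:
    """Tonnage implied by a square-feet-per-ton figure."""
    if sqft_per_ton <= 0:
        raise ValueError("sqft_per_ton must be positive")
    return floor_area_sqft / sqft_per_ton


def btuh_to_tons(load_btuh: float) -> float:
    """Convert a load in BTU/h (as reported by Manual J) to tons."""
    return load_btuh / BTU_PER_TON


def oversizing_ratio(installed_tons: float, required_tons: float) -> float:
    """How many times larger the installed system is than the load."""
    if required_tons <= 0:
        raise ValueError("required_tons must be positive")
    return installed_tons / required_tons


# The 2,400-square-foot example: 500 sf/ton rule vs. the 1,431 sf/ton average.
rule_tons = tons_from_area(2_400, 500)
manual_j_tons = tons_from_area(2_400, 1_431)
print(f"Rule of thumb: {rule_tons:.1f} tons")
print(f"Manual J average: {manual_j_tons:.1f} tons")
print(f"Oversizing ratio: {oversizing_ratio(rule_tons, manual_j_tons):.2f}x")

# The Atlanta house: 17,400 BTU/h against the existing three-ton unit.
atlanta_tons = btuh_to_tons(17_400)
print(f"Atlanta load: {atlanta_tons:.2f} tons")
print(f"Existing system: {oversizing_ratio(3.0, atlanta_tons):.2f}x the load")
```

The ratio does not depend on floor area. It is just 1,431 divided by 500, so the same multiple applies to any house that matches the dataset average.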
Why Oversizing Kills Heat Pumps Faster Than Furnaces
Gas furnaces are somewhat forgiving of oversizing because they fire, heat the air, and shut off. Crude but functional, and houses get warm. Utility bills run 15-25% higher than they should, and nobody notices because they have no baseline for comparison.
Heat pumps are different, and dangerously so. An oversized heat pump in cooling mode satisfies the thermostat so quickly that it shuts down before the evaporator coil has time to wring moisture from the air. This is called short-cycling. A compressor fires for four or five minutes instead of the 10-15 minutes it needs to reach steady-state dehumidification, and indoor relative humidity climbs above 60%. At 65% you start growing mold, and at 70% you can smell it.
Variable-capacity heat pumps are marketed as the solution to this because they modulate down, but they cannot modulate infinitely. A three-ton variable-capacity system might turn down to one ton. If the house only needs half a ton at 55 degrees outside, the system is still oversized by 100% at minimum output, and it will cycle endlessly. Its compressor ramps up and down searching for a steady state it can never find because the load is too far below the floor of its operating range. NEEP’s cold climate heat pump analysis showed that even well-sized variable systems spend roughly 12% of annual operating hours cycling, and oversized ones spend far more.
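The turndown problem can be put in numbers. This is a deliberately idealized sketch: it treats capacity and load as steady values in tons and ignores startup losses and manufacturer control logic, and the function names are hypothetical.

```python
def oversize_at_min_output(min_capacity_tons: float, load_tons: float) -> float:
    """Fraction by which the lowest output exceeds the load (1.0 = 100%)."""
    if load_tons <= 0:
        raise ValueError("load_tons must be positive")
    return min_capacity_tons / load_tons - 1.0


def must_cycle(min_capacity_tons: float, load_tons: float) -> bool:
    """True when the load sits below the floor of the modulation range."""
    return load_tons < min_capacity_tons


def runtime_fraction_at_min(min_capacity_tons: float, load_tons: float) -> float:
    """Idealized share of each hour the compressor runs at minimum output."""
    if min_capacity_tons <= 0:
        raise ValueError("min_capacity_tons must be positive")
    return min(1.0, load_tons / min_capacity_tons)


# A three-ton system that turns down to one ton, serving a half-ton load.
min_output = 1.0
mild_day_load = 0.5
print(f"Oversized at minimum output: {oversize_at_min_output(min_output, mild_day_load):.0%}")
print(f"Forced to cycle: {must_cycle(min_output, mild_day_load)}")
print(f"Idealized runtime fraction: {runtime_fraction_at_min(min_output, mild_day_load):.0%}")
```

Under these assumptions the compressor can run only about half of each hour on that mild day, however good the modulation is above the one-ton floor.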
Financial damage compounds quickly. An oversized heat pump costs $1,500-4,000 more upfront per ton of unnecessary capacity. It runs less efficiently because it never settles into the middle of its performance curve where the COP is highest. Compressors wear faster from repeated start-stop cycles, each of which draws a surge current 4-8x the running amperage. Equipment lifespan drops from the 15-20 year typical range toward 10-12. And the homeowner, mystified by high humidity and mediocre comfort, calls the contractor who installed it, who recommends a supplemental dehumidifier for another $1,800-2,500. Nobody investigates the root cause: that the system was sized by arithmetic divination rather than physics.
48% of Your Energy Bill, Sized by a Guess
HVAC systems consume 48% of the energy used in a typical American home, according to DOE data cited by the New Jersey Green Building Manual. Nearly half the energy bill, sized by a rule of thumb that predates energy codes, building wraps, low-E glass, and spray foam insulation. It was a heuristic for a time when houses leaked like colanders and insulation was an afterthought. Modern homes built to the 2021 IECC have dramatically lower loads per square foot, making the old rules not just wrong but dangerously wrong in the specific direction that wastes the most money and energy.
Engineers oversize intentionally, too, piling a 25% safety factor on top of a rule of thumb that already oversizes by 100-200%. The safety factor pushes the installed capacity to absurd multiples of actual need. Engineers protect against a one-in-forty-year design day. Rutgers’ analysis notes that these extreme conditions occur only 1-2.5% of the time, but the oversized equipment runs at degraded efficiency during the other 97.5-99%.
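Safety factors multiply rather than add, which is why the result gets large so fast. A minimal sketch, assuming the 25% factor is applied to the already-oversized rule-of-thumb capacity (the function name is illustrative):

```python
def compounded_multiple(rule_oversize_pct: float, safety_factor_pct: float) -> float:
    """Installed capacity as a multiple of the true load.

    rule_oversize_pct: how far the rule of thumb overshoots (100 = double).
    safety_factor_pct: extra margin applied on top of that result.
    """
    return (1 + rule_oversize_pct / 100) * (1 + safety_factor_pct / 100)


for rule_pct in (100, 200):
    multiple = compounded_multiple(rule_pct, 25)
    print(f"Rule oversizes by {rule_pct}% plus 25% margin: {multiple:.2f}x the load")
```

With the figures quoted above, installed capacity lands at 2.5 to 3.75 times what the house needs.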
The New Stack: Satellite, Utility Data, and Inference
Several startups and research groups are attacking the sizing problem with ML-driven platforms that replace or augment Manual J, combining data sources that are already available but have never been integrated for HVAC design.
Satellite and aerial imagery gives you roof area, orientation, tree shading, and building footprint without a site visit. Google’s Project Sunroof proved this works for solar sizing years ago, and the same imagery feeds HVAC models.
Utility consumption data, when available, reveals the building’s actual energy profile across seasons. Twelve months of electric and gas bills tell you more about a home’s thermal performance than any single-point calculation.
Real estate and assessor records provide construction year, square footage, number of stories, and sometimes insulation and window data. Construction year alone is a powerful proxy for building envelope performance because each era has characteristic building practices.
Local weather data, including design temperatures, heating and cooling degree days, and humidity profiles, gets layered on top.
Arch, a data intelligence platform that raised $6.2 million in seed funding from Gigascale Capital, Coatue, and others in February 2024, integrates data from over 12 sources and uses proprietary algorithms to help HVAC contractors size and sell heat pump systems. Arch’s pitch: contractors spend 80% of their time on leads that never convert, partly because the sales process requires a manual site visit and hand-calculated proposal. Arch generates a right-sized system recommendation from a desk. By early 2024, the platform had processed over $4 million in heat pump sales. Aurora Solar’s co-founders are angel investors, which says something about the pattern.
QuitCarbon takes a whole-home approach, generating AI-driven electrification plans that include heat pump sizing alongside panel upgrades, water heater replacements, and induction stove swaps. Their partnership with the city of Palo Alto offers residents a personalized plan based on their address, utility data, and home characteristics.
On the research side, Lawrence Berkeley National Laboratory validated three major building energy simulation engines against measured data in their FLEXLAB facility. LBNL’s engines predicted peak heating and cooling loads with a Root Mean Square Error of roughly 10%. Not perfect, but dramatically more accurate than a contractor’s rule of thumb that misses by 150-200%.
The IRA Made This Urgent
The IRA changed the economics of heat pump adoption overnight. Rewiring America estimates the IRA provides an average of $10,600 per household for full electrification, including up to $2,000 for a qualified heat pump. Twenty-five states representing 55% of the U.S. population committed to installing 20 million residential heat pumps by 2030. According to AHRI shipment data, heat pump sales outpaced gas furnaces by 37% between November 2023 and November 2024, a record margin.
All those new heat pumps are being sized by someone. If that someone uses a rule of thumb, the country is pouring billions into equipment that runs too large, cycles too frequently, dehumidifies too poorly, and dies too early. RMI calculates that a properly selected heat pump can reduce climate pollution by up to 93% compared to a gas furnace. But that 93% figure assumes the system actually runs at its rated efficiency, which requires correct sizing. An oversized system operating at degraded part-load efficiency might deliver 60-70% of the theoretical emissions reduction. Still better than gas, but roughly a third of the benefit is left on the table because a contractor used the wrong arithmetic.
What a Right-Sized System Actually Looks Like
A properly sized heat pump runs longer cycles at lower intensity. In cooling mode, the evaporator coil stays cold enough, long enough, to pull moisture from the air and drain it out the condensate line instead of blowing it back into the house. Indoor humidity stays in the 45-55% range where humans feel comfortable and mold cannot establish. The compressor reaches peak COP, which for modern cold-climate heat pumps ranges from 3.0 to 4.5 at moderate outdoor temperatures. For every unit of electricity consumed, three to four and a half units of heat are moved into or out of the house.
In heating mode, a right-sized system at design temperature runs nearly continuously, which is correct behavior because a heat pump operating at 100% capacity when it is 5°F outside is doing exactly what it should be doing. Contractors who have spent careers installing gas furnaces interpret continuous running as failure, but a furnace that runs continuously is undersized while a heat pump that runs continuously at design temperature is perfectly sized.
Cost differences between a right-sized and oversized installation are not trivial. For a 2,400-square-foot home in Climate Zone 4:
| Metric | Rule of Thumb (4.5 ton) | Manual J / AI Sized (2 ton) |
|---|---|---|
| Equipment cost | $8,500-12,000 | $4,500-7,000 |
| Annual energy (cooling) | ~$1,100 | ~$740 |
| Indoor RH (summer avg) | 58-68% | 45-55% |
| Compressor lifespan | 10-12 years | 15-18 years |
| Supplemental dehumidifier | Likely ($1,800-2,500) | Unnecessary |
Over a 15-year system lifetime, the right-sized installation saves $9,000-14,000 in equipment, energy, and avoided supplemental dehumidification. And those figures assume electricity prices stay flat, which they will not.
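As a rough cross-check, the table's own figures can be added up. The sketch below makes assumptions the table does not state: it pairs low estimates with low and high with high, keeps the annual cooling cost flat, and treats the dehumidifier as either bought once or not at all. Replacement costs from the shorter compressor lifespan are left out.

```python
YEARS = 15

# Figures copied from the table above, as (low, high) in dollars.
EQUIPMENT_RULE_OF_THUMB = (8_500, 12_000)
EQUIPMENT_RIGHT_SIZED = (4_500, 7_000)
ANNUAL_COOLING_RULE_OF_THUMB = 1_100
ANNUAL_COOLING_RIGHT_SIZED = 740
DEHUMIDIFIER = (1_800, 2_500)


def lifetime_savings(include_dehumidifier: bool) -> tuple:
    """Return (low, high) savings over YEARS, pairing low with low and high with high."""
    energy = (ANNUAL_COOLING_RULE_OF_THUMB - ANNUAL_COOLING_RIGHT_SIZED) * YEARS
    low = EQUIPMENT_RULE_OF_THUMB[0] - EQUIPMENT_RIGHT_SIZED[0] + energy
    high = EQUIPMENT_RULE_OF_THUMB[1] - EQUIPMENT_RIGHT_SIZED[1] + energy
    if include_dehumidifier:
        low += DEHUMIDIFIER[0]
        high += DEHUMIDIFIER[1]
    return low, high


for with_dehumidifier in (False, True):
    low, high = lifetime_savings(with_dehumidifier)
    label = "with" if with_dehumidifier else "without"
    print(f"Savings {label} avoided dehumidifier: ${low:,} to ${high:,}")
```

Under these pairings the totals run from $9,400 without the dehumidifier to $12,900 with it, inside the $9,000-14,000 range quoted above. Other pairings of the low and high equipment prices would widen the spread.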
Existing Homes Have a Simpler Path
Allison Bailes published a method in 2025 that eliminates the need for a Manual J calculation on existing homes that meet certain criteria. Two conditions: the home must not be undergoing major renovations that change the thermal envelope, and the current system must still be performing at roughly its original capacity.
The method is time-and-temperature measurement. Track the system’s runtime against outdoor temperature during both a heating and a cooling season. If your current three-ton system runs 40% of the time on a 95°F day, the house needs about 1.2 tons at that condition. Straightforward math. Free data. And the result is more accurate than most Manual J calculations performed in the field because it captures the building as it actually behaves, not as the operator’s assumptions about insulation R-values and infiltration rates predict it should behave.
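The runtime arithmetic can be sketched as a single function. It assumes a single-stage system that delivers its full nominal capacity whenever it runs, in line with the condition that the equipment still performs at roughly its original capacity. The function name and the one-hour window are choices made for this example, not part of the published method.

```python
def load_from_runtime(
    nominal_tons: float,
    runtime_minutes: float,
    window_minutes: float = 60.0,
) -> float:
    """Estimate the load in tons from runtime observed in a fixed window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    fraction = runtime_minutes / window_minutes
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("runtime must be between 0 and the window length")
    return nominal_tons * fraction


# A three-ton system that ran 24 of the last 60 minutes on a 95°F afternoon.
estimate = load_from_runtime(nominal_tons=3.0, runtime_minutes=24)
print(f"Estimated load: {estimate:.1f} tons")
```

A runtime fraction pinned at 100% means only that the load is at least the nominal capacity, so the estimate is a lower bound in that case.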
AI platforms could automate this entirely, since smart thermostats already log runtime data and utility-grade smart meters record consumption at 15-minute intervals. Connect the two data streams to a weather API and a simple regression model, and you have a load profile for every conditioned hour of every day for a year, with no site visit, no input errors, and no safety factor motivated by liability fear rather than physics.
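A minimal version of that regression fits in the standard library. Everything in this sketch is an assumption made for illustration: the sample readings are invented, the relationship between runtime and outdoor temperature is taken to be linear, and the system is treated as single-stage at nominal capacity. A real pipeline would pull thermostat logs and weather data from whatever APIs the vendor and utility expose.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("outdoor temperatures must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def design_load_tons(outdoor_temps_f, runtime_fractions, nominal_tons, design_temp_f):
    """Extrapolate the runtime fraction to the design temperature and scale by capacity."""
    slope, intercept = fit_line(outdoor_temps_f, runtime_fractions)
    fraction = max(0.0, slope * design_temp_f + intercept)
    # A fraction above 1.0 is left uncapped: it signals an undersized system.
    return nominal_tons * fraction


# Hypothetical hourly readings: outdoor temperature (°F) and share of the hour running.
temps = [75, 80, 85, 90]
fractions = [0.10, 0.175, 0.25, 0.325]

load = design_load_tons(temps, fractions, nominal_tons=3.0, design_temp_f=95)
print(f"Estimated cooling load at 95°F: {load:.2f} tons")
```

The invented readings were chosen so the fitted line reaches a 40% runtime at 95°F, matching the three-ton example above. Real thermostat data will scatter around the line, and solar gain and humidity will account for part of that scatter.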
The Strongest Case Against
AI sizing tools depend on data that is not uniformly available. Utility data access varies wildly by state and utility. Green Button data standards exist but adoption is spotty. Satellite imagery captures roofs, not walls. A house with a recent deep energy retrofit, new windows, or spray-foamed attic will look identical to its pre-retrofit self from a satellite. Without manual updates, the model will oversize the system because the inputs reflect the old, leakier building, not the new one. And manual updates are exactly the kind of input that the platforms are designed to eliminate.
Manual J itself is imperfect. LBNL’s FLEXLAB testing showed roughly 10% RMSE in peak load predictions even from the most sophisticated simulation engines. Field-performed Manual J calculations are worse because they depend on operator estimates of infiltration rates and insulation conditions that are frequently wrong. A Manual J that overestimates infiltration by 50% will suggest a system one full ton larger than necessary on a typical house. Any calculation is only as good as its inputs, and the inputs for an existing home are often guesses dressed up as measurements.
There is also a structural incentive problem: contractors make more money installing larger systems. A five-ton heat pump has higher equipment cost, higher installation labor, and a larger markup than a two-ton. Until the payment model shifts, or until building departments actually enforce the Manual J requirement that has been in the International Residential Code since 2009, contractors will continue to oversize because oversizing pays better and nobody complains when their house is too cold. They complain when it is not cold enough. Undersizing is a callback, but oversizing is a quiet theft of efficiency that the homeowner never detects.
What I Did Not Prove
No longitudinal study has tracked AI-sized heat pump installations against rule-of-thumb installations in the same housing stock and measured the difference in energy consumption, equipment lifespan, and occupant comfort over a full equipment lifecycle. The 2.86x oversizing figure comes from comparing Energy Vanguard’s Manual J averages against the midpoint of the contractor rule-of-thumb range, not from a controlled experiment. Cost savings in the table above use representative industry figures, not data from a specific project. Arch’s $4 million in processed sales is self-reported. LBNL’s FLEXLAB results apply to commercial buildings with overhead mixing ventilation and have not been validated against residential construction specifically. RMI’s 93% emissions reduction assumes a grid mix that varies by state and will change as more renewables come online.
Oversizing is real, and its scale is supported by the data available. Whether AI platforms can solve it at scale depends on data access, contractor adoption, and code enforcement mechanisms that do not yet exist in most jurisdictions.
Sources
- Energy Vanguard / Allison Bailes, PhD, “Manual J Load Calculations vs. Rules of Thumb,” Green Building Advisor (2016)
- NJ Green Building Manual, “Properly-Sized HVAC Equipment,” Rutgers University
- LBNL, “Accuracy of HVAC Load Predictions: Validation of EnergyPlus and DOE-2 Using FLEXLAB Measurements,” OSTI Technical Report (2020)
- NEEP, “Not Too Big, Not Too Small: New Tools for Improved Air Source Heat Pump Selection” (2022)
- Environment America, “Heat Pumps Are Outselling Gas Furnaces by a Record Margin,” citing AHRI Nov 2024 data
- Pulse2, “Arch: HVAC Data Intelligence Contractor Company Raises $6.2 Million” (2024)
- ACCA, Manual J® Residential Load Calculation, 8th Edition
- Allison Bailes, PhD, “You Don’t Need a Load Calculation,” Green Building Advisor (2025)
- QuitCarbon, “QuitCarbon + Palo Alto: Helping Residents Switch to Clean Energy”