1. How Much Energy AI Actually Uses
Global data centers consumed roughly 460 TWh of electricity in 2022 — about 2% of total world electricity demand. The IEA projects that figure could reach 1,000 TWh by 2026, with AI workloads accounting for a growing share of that increase. For perspective, 460 TWh is comparable to the entire annual electricity consumption of France. AI is not yet the dominant cause, but it is the fastest-growing one.
A single H100 GPU draws around 700 W under full load. A training cluster of 10,000 such GPUs running continuously for 30 days consumes approximately 5 GWh — enough to power more than 1,000 average European households for a year. That is one training run for one model at one company. The industry is running many such jobs in parallel, and the clusters are getting larger with each generation of frontier models.
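The cluster arithmetic is easy to check in a few lines. The GPU power, cluster size, and duration come from the figures above; the household consumption figure is an assumption (European household estimates vary, which is why home-equivalent counts differ between sources):

```python
# Back-of-envelope check of the training-cluster figures in the text.
GPU_POWER_W = 700               # H100 under full load
NUM_GPUS = 10_000
DAYS = 30
HOUSEHOLD_KWH_PER_YEAR = 4_000  # assumed average European household

total_kwh = GPU_POWER_W * NUM_GPUS * DAYS * 24 / 1_000  # W-hours -> kWh
total_gwh = total_kwh / 1_000_000

print(f"Training run: {total_gwh:.2f} GWh")
print(f"Household-years: {total_kwh / HOUSEHOLD_KWH_PER_YEAR:,.0f}")
```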
At inference, costs per query are smaller but accumulate at scale. A single ChatGPT query is estimated to consume around 0.001–0.01 kWh, depending on query length and model size — roughly 10 times the energy of a standard Google Search. With hundreds of millions of ChatGPT users and comparable usage at competing services, inference energy adds up to gigawatt-hours per day across the industry. The per-query cost sounds trivial; the aggregate does not.
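How small per-query costs compound can be sketched the same way. The per-query range is from the text; the industry-wide query volume is an illustrative assumption, not a reported figure:

```python
# Aggregate inference energy across the industry, using the per-query
# range from the text and an assumed total query volume.
KWH_PER_QUERY_LOW = 0.001
KWH_PER_QUERY_HIGH = 0.01
QUERIES_PER_DAY = 1_000_000_000  # assumed 1B queries/day industry-wide

low_gwh = QUERIES_PER_DAY * KWH_PER_QUERY_LOW / 1e6   # kWh -> GWh
high_gwh = QUERIES_PER_DAY * KWH_PER_QUERY_HIGH / 1e6

print(f"Industry inference: {low_gwh:.1f}-{high_gwh:.1f} GWh per day")
```

At that assumed volume, the industry lands in the single-digit gigawatt-hours per day — consistent with the aggregate claim above.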
2. The Carbon Cost of Training
Training a large language model produces substantial CO2-equivalent emissions. A 2019 study by Strubell et al. estimated that training a large transformer with neural architecture search could emit around 284 tonnes of CO2e. More recent frontier models are dramatically larger. Estimates for GPT-3 (175B parameters) put training emissions somewhere between 500 and 550 tonnes CO2e, and GPT-4-scale runs are estimated in the range of several thousand tonnes, depending on the grid supplying the compute.
To make that tangible: a transatlantic round-trip flight between Oslo and New York produces roughly 1.5–2 tonnes CO2e per passenger. Training a frontier model is therefore equivalent to somewhere between 250 and several thousand such flights, depending on the model scale and grid carbon intensity. The comparison is imperfect — a trained model serves billions of queries while a flight serves one person — but it illustrates the order of magnitude. The emissions happen at the point of training, regardless of how widely the resulting model is used.
Carbon intensity varies enormously by location and time. A training run on hardware in Iceland, where the grid is nearly 100% renewable geothermal and hydro, has a fraction of the footprint of the same run in a coal-heavy region. The average US grid emits roughly 386 g CO2 per kWh (2023 EPA figure); the Swedish grid runs at around 13 g CO2/kWh. Running the same 5 GWh training job shifts from approximately 1,930 tonnes CO2e on the US average to about 65 tonnes in Sweden — a factor of 30 difference from grid selection alone.
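Using the grid intensities quoted above, the factor-of-30 claim falls straight out of the arithmetic:

```python
# Emissions of the same 5 GWh training job on different grids,
# using the intensity figures from the text.
TRAINING_KWH = 5_000_000  # 5 GWh

grid_g_co2_per_kwh = {
    "US average (2023)": 386,
    "Sweden": 13,
}

for grid, intensity in grid_g_co2_per_kwh.items():
    tonnes = TRAINING_KWH * intensity / 1e6  # grams -> tonnes
    print(f"{grid}: {tonnes:,.0f} t CO2e")
```

The same job swings from roughly 1,930 tonnes to roughly 65 tonnes — nothing about the compute changes, only where it runs.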
3. Inference Is the Bigger Long-Term Problem
Training grabs headlines because the numbers are large and discrete — a single event you can attach a figure to. But inference is where the cumulative energy cost compounds. OpenAI processes an estimated 10 million or more queries per day on ChatGPT alone. Google, Meta, Baidu, and dozens of enterprise deployments contribute billions more. Each query burns a small amount of energy, but the aggregate is enormous and grows monotonically with user adoption.
The useful frame here is capital versus operating expenditure. Training is a capital cost — paid once per model generation, roughly every 12–18 months for frontier models. Inference is operating cost — paid every second of every day. Over the lifetime of a deployed model, inference energy typically exceeds training energy by a significant margin. Some estimates put the ratio at 10:1 or higher for widely-used models, simply because the query volume is enormous and sustained.
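The capex-versus-opex framing can be made concrete with a sketch. The training cost and per-query estimate come from earlier sections; the query volume and deployment lifetime are illustrative assumptions for a single widely-used model:

```python
# Lifetime inference energy vs. a one-off training cost.
TRAINING_GWH = 5.0             # one training run (from the text)
KWH_PER_QUERY = 0.003          # mid-range per-query estimate
QUERIES_PER_DAY = 50_000_000   # assumed for one widely-used deployment
LIFETIME_DAYS = 365            # ~one model generation

inference_gwh = QUERIES_PER_DAY * KWH_PER_QUERY * LIFETIME_DAYS / 1e6
ratio = inference_gwh / TRAINING_GWH

print(f"Lifetime inference: {inference_gwh:.0f} GWh (~{ratio:.0f}x training)")
```

Even at these modest assumed volumes the ratio lands around 10:1, which is why inference dominates the long-term trajectory.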
As models get embedded in more products — coding assistants, search engines, customer support bots, document processing pipelines — the inference load increases. The energy trajectory for AI over the next decade will be determined primarily by how efficiently the industry serves inference at scale, not by how efficiently it trains the next generation of models. Architectural improvements like speculative decoding, grouped-query attention, and key-value cache sharing exist precisely because inference efficiency has become a first-order engineering concern.
4. Water Consumption for Data Center Cooling
Energy consumption generates heat, and heat requires cooling. Data centers use water-based cooling systems extensively, and AI's GPU-dense compute clusters run hotter than conventional server racks. Microsoft reported that its global water consumption increased by 34% between 2021 and 2022, reaching 6.4 million cubic meters — roughly the volume of 2,500 Olympic swimming pools. That increase coincided with rapid AI infrastructure expansion. Google reported a similar trajectory, with global water consumption reaching 5.6 billion gallons (roughly 21 billion liters) in 2022.
Water usage effectiveness (WUE) — liters of water consumed per kWh of IT energy — typically ranges from 0.5 to 2.0 L/kWh for modern data centers. At the lower end of that range, a 5 GWh training run consumes 2.5 million liters of water. Training GPT-3 is estimated to have required on the order of 700,000 liters of fresh water for cooling — roughly what 1,200 people drink in a year. A 100 MW data center running continuously can draw as much water per day as a small city.
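The WUE arithmetic for the 5 GWh training run discussed earlier, across the quoted range:

```python
# Cooling-water consumption for a 5 GWh job across the WUE range
# quoted in the text (0.5-2.0 liters per kWh).
TRAINING_KWH = 5_000_000  # 5 GWh

for wue in (0.5, 2.0):
    liters = TRAINING_KWH * wue
    print(f"WUE {wue} L/kWh -> {liters / 1e6:.1f} million liters")
```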
The geographic distribution of data centers makes this particularly acute. Many large facilities are in water-stressed regions — Arizona and Nevada, for example — where water is a genuinely scarce resource. Evaporative cooling consumes water permanently; it does not return to the local watershed. This is an environmental cost that receives less attention than carbon but is equally material in water-constrained regions. Nordic data centers benefit from cool ambient temperatures that enable free-air cooling for large fractions of the year, effectively eliminating the water consumption problem while also providing access to low-carbon electricity.
5. Practical Tips for Developers to Reduce AI Energy Use
The most impactful lever is model selection. Smaller, purpose-built models use drastically less energy than frontier general-purpose models. A fine-tuned 7B-parameter model can match or exceed GPT-4-class performance on a specific narrow task while consuming 50–100x less energy per query. A single H100 can serve a quantized 7B model; GPT-4-class serving requires clusters of expensive hardware. If your application does sentiment classification, entity extraction, or intent routing, you do not need a 70B model — benchmark a smaller one first.
Quantization is a high-yield, low-effort technique. Moving from FP32 to FP16 halves memory bandwidth and compute requirements with negligible accuracy loss on most tasks. INT8 quantization cuts that in half again. Methods like GPTQ and AWQ bring 4-bit quantization within a few percentage points of full-precision quality on many benchmarks. A 4-bit quantized 13B model fits on a single 24 GB GPU; the FP16 version requires twice the hardware. Knowledge distillation — training a small student model to replicate a large teacher — is the more involved version of this: it produces compact models with strong task-specific performance and is now well-supported in open-source tooling.
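The memory side of that trade-off follows directly from parameter count times bits per weight. A small helper makes the numbers above concrete (weights only — the KV cache and activations add overhead on top):

```python
# Weight-memory footprint of a model at different precisions.
# Weights only: KV cache and activation memory come on top of this.
def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GB for a given precision."""
    return params_billion * 1e9 * bits / 8 / 1e9  # params -> bytes -> GB

for bits in (32, 16, 8, 4):
    print(f"13B model @ {bits}-bit: {weight_gb(13, bits):.1f} GB")
```

At 4 bits a 13B model's weights occupy about 6.5 GB, leaving room on a 24 GB card for cache and activations; at FP16 the weights alone are 26 GB, which is why the hardware requirement doubles.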
At the application level: cache repeated responses rather than re-running inference for identical or near-identical inputs. If your application serves the same system prompt followed by varied user queries, prefix caching can eliminate redundant prefill computation for the shared portion. Batch requests wherever latency allows — GPUs are significantly more energy-efficient processing batches than single requests, because utilization improves. Audit your application code for unnecessary API calls triggered by retries, polling loops, or verbose logging pipelines. These are standard engineering hygiene measures that simultaneously reduce energy consumption and API costs.
6. Choosing Providers and Regions with Renewable Energy
Cloud region selection is a concrete, actionable lever. AWS, Google Cloud, and Azure all publish carbon intensity estimates for their regions, and the differences are large. Running compute in us-east-1 (Virginia, mixed gas and coal) versus eu-north-1 (Stockholm, nearly 100% hydro and wind) can represent a 20–30x difference in grams of CO2 per kWh. For training jobs where geographic latency does not matter, routing to a low-carbon region is a direct reduction in real emissions — not just a certificate purchase.
For workloads where timing is flexible, carbon-aware scheduling compounds the benefit. Grid carbon intensity varies through the day as solar and wind generation fluctuate. Running batch jobs when renewable generation is high — midday solar, overnight wind — rather than at fixed times can reduce emissions by 30–50% in some regions with no change to the compute itself. The Electricity Maps API and cloud-native carbon dashboards (Google's Carbon-Intelligent Computing initiative, AWS Customer Carbon Footprint Tool) make this automatable.
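A carbon-aware scheduler can be as simple as picking the lowest-intensity window in an hourly forecast. The forecast values below are invented for illustration (a midday solar dip); in practice they would come from a source like the Electricity Maps API or a cloud carbon dashboard:

```python
# Sketch of carbon-aware scheduling: choose the start hour that minimizes
# total grid carbon intensity over a deferrable job's duration.
def best_start_hour(forecast: list[float], job_hours: int) -> int:
    """Return the start hour with the lowest summed intensity for the job window."""
    windows = range(len(forecast) - job_hours + 1)
    return min(windows, key=lambda h: sum(forecast[h:h + job_hours]))

# Illustrative 24-hour forecast in g CO2/kWh, with a midday solar dip.
forecast = [420, 410, 400, 395, 390, 380, 350, 300,
            240, 180, 140, 120, 110, 115, 150, 210,
            280, 340, 390, 410, 420, 430, 435, 430]

start = best_start_hour(forecast, job_hours=4)
print(f"Schedule the 4-hour batch job at hour {start}")
```

Running the same job at the forecast's peak instead of its trough changes nothing about the compute, only its emissions — which is the whole point of the technique.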
Beyond the hyperscalers, purpose-built sustainable AI compute providers are worth evaluating for dedicated training workloads. Companies like CoreWeave and Lambda Labs colocate in regions with favorable renewable energy access. Nordic cloud infrastructure — Iceland's geothermal grid, Swedish and Norwegian hydro — offers some of the lowest carbon intensities in the world, typically 20–50 g CO2/kWh compared to 400–600 g CO2/kWh for coal-heavy grids. The unit economics for pure GPU compute are often competitive with hyperscalers.
7. AI Applications That Help Fight Climate Change
It would be intellectually dishonest to discuss AI's energy consumption without acknowledging what the technology enables on the other side of the ledger. DeepMind's GraphCast generates 10-day global weather forecasts in under a minute, compared to hours for traditional numerical weather prediction models requiring thousands of CPU cores. Better weather forecasts enable more accurate renewable energy scheduling, which reduces curtailment and improves grid stability. The compute cost of one GraphCast inference is a rounding error compared to the grid efficiency gains it enables.
Materials discovery is another high-leverage application. AlphaFold dramatically accelerated protein structure prediction; analogous AI tools are being applied to finding new electrolyte materials for batteries, new catalysts for green hydrogen production, and new photovoltaic materials with higher efficiency. Identifying a solar cell material with 10% better efficiency could displace many orders of magnitude more emissions than the compute required to find it. Google DeepMind applied reinforcement learning to data center cooling and reported a 40% reduction in cooling energy — AI directly reducing AI's own physical footprint.
Grid optimization, supply chain emissions tracking, building energy management, and carbon capture process optimization are further areas where AI delivers measurable climate benefits. The distinction that matters is between AI deployed as a general-purpose chat layer — where the energy-to-value ratio is diffuse — versus AI applied to specific high-leverage scientific or engineering problems where the return on compute is clearly positive. Both are happening simultaneously, and the net balance depends substantially on how the industry chooses to prioritize its compute resources.
8. A Balanced Assessment
AI's energy footprint is real, growing, and worth taking seriously. The numbers are not catastrophic on a global scale today — AI-specific electricity consumption is still a small fraction of total demand — but the growth rate is steep and the trajectory matters more than the current snapshot. A projected doubling of data center energy use by 2026 driven substantially by AI is a signal worth paying attention to, not dismissing.
At the same time, the footprint is manageable with deliberate choices. The gap between a carelessly architected AI system and a thoughtfully designed one is not marginal — it is often a factor of 10 to 100 in energy consumption for the same task. Smaller models, quantization, caching, batching, renewable energy sourcing, and geographic placement are all levers that developers and organizations can pull today. They are also levers that reduce costs, which means the economic incentive and the environmental incentive are aligned.
The physics is straightforward: every joule of compute has to come from somewhere, and every degree of temperature rise in a GPU rack requires active cooling. Being precise about what those costs are — rather than ignoring them or catastrophizing about them — is the starting point for making better decisions. The AI systems being built today will run for years. Design choices made now accumulate into a real-world energy and emissions trajectory. That is worth getting right.