Project Highlight - Distributed LLM pretraining during renewable curtailment windows: a feasibility study

This technical report presents a system that performs full-parameter LLM training across geo-distributed GPU clusters during regional curtailment windows, elastically switching between local single-site training and federated multi-site synchronization as sites become available or unavailable.

Completed for: VSD Research
Year:
Focus area: Federated learning

Overview

Training large language models consumes substantial energy. At the same time, renewable generation regularly produces more electricity than the grid can absorb, forcing clean power to be curtailed. In California alone, 3.4 million MWh of wind and solar output was curtailed in 2024, a 29% increase from the prior year. These windows of surplus generation represent compute time that is both low-carbon and low-cost.

This project, a collaboration with Philipp Wiesner, Soeren Becker, Dominik Scheinert, Alexander Acker, and Odej Kao at exalsius and TU Berlin, demonstrates a system for full-parameter LLM training across geo-distributed GPU clusters during regional curtailment windows. The system dynamically adapts its execution mode based on how many sites are currently experiencing curtailment: training locally when a single site is available, performing federated synchronization when multiple sites overlap, and pausing entirely when no curtailment is detected. Curtailment signals are derived from real-world marginal carbon intensity traces provided by WattTime.
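The three-way mode switch described above can be sketched as a small decision function. This is an illustrative sketch, not the project's actual control-plane code; the `Mode` enum and `select_mode` function are hypothetical names:

```python
from enum import Enum

class Mode(Enum):
    PAUSED = "paused"        # no curtailment anywhere: wait for surplus power
    LOCAL = "local"          # one site curtailed: train locally, no sync overhead
    FEDERATED = "federated"  # overlapping windows: synchronize across sites

def select_mode(curtailed_sites: list[str]) -> Mode:
    """Pick an execution mode from the set of currently curtailed sites."""
    if not curtailed_sites:
        return Mode.PAUSED
    if len(curtailed_sites) == 1:
        return Mode.LOCAL
    return Mode.FEDERATED
```

In practice the input would be derived from per-region marginal carbon intensity signals (here, WattTime traces) thresholded into a boolean curtailment flag per site.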

We trained a 561M-parameter transformer model on 12.8 billion tokens across three clusters representing California, Texas, and South Australia. The Flower federated learning framework handles multi-site coordination, while the exalsius control plane manages elastic provisioning and deprovisioning of GPU nodes across clusters. A custom hysteresis mechanism prevents oscillation from short-lived signal fluctuations, and work-weighted federated averaging accommodates heterogeneous compute and sites joining or leaving mid-round.
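The two stabilizing mechanisms mentioned above, debouncing the curtailment signal and weighting each site's contribution by work completed, can be illustrated with minimal sketches. These are assumptions about the general technique, not the project's implementation; class and function names (`Hysteresis`, `work_weighted_average`, the `hold` parameter) are hypothetical:

```python
class Hysteresis:
    """Debounce a boolean curtailment signal: only flip the committed state
    after the raw signal has disagreed with it for `hold` consecutive
    readings, suppressing short-lived fluctuations."""

    def __init__(self, hold: int = 3, initial: bool = False):
        self.hold = hold
        self.state = initial
        self._streak = 0  # consecutive readings disagreeing with state

    def update(self, raw: bool) -> bool:
        if raw == self.state:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.hold:
                self.state = raw
                self._streak = 0
        return self.state


def work_weighted_average(updates):
    """FedAvg variant: weight each site's parameters by the amount of work
    (e.g. samples or steps) it actually processed this round, so sites that
    joined late or left early contribute proportionally.

    `updates` is a list of (params, work) pairs, where params maps
    parameter names to values (scalars here for simplicity)."""
    total_work = sum(work for _, work in updates)
    averaged = {}
    for params, work in updates:
        weight = work / total_work
        for name, value in params.items():
            averaged[name] = averaged.get(name, 0.0) + weight * value
    return averaged
```

In the real system the averaged values would be model tensors aggregated through Flower's strategy interface rather than scalars, but the weighting logic is the same.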

The curtailment-aware approach reaches training quality comparable to conventional single-site and continuous federated baselines while reducing operational carbon emissions to 5-12% of single-region runs. Energy consumption across all scenarios was comparable (36-38 kWh), confirming that emissions reductions come from shifting when and where energy is consumed rather than from reduced compute.

I've written an accompanying blog post discussing the implications of this work for multi-agent energy systems. Read it here.

View the full report here. The prototype is available on GitHub.

Tools used

  • Python
  • PyTorch
  • Flower
  • Kubernetes
  • Redis
  • Vessim
CO2 emissions reduction: 95%
Geo-distributed clusters: 4
Training parity vs. baseline: 99.3%

Other past projects

Webpage build for Keep Cool newsletter

I built a new custom subscribe page for my favorite climate tech newsletter, Keep Cool. The site uses Next.js + Tailwind and sits on top of the Beehiiv-managed back end, requiring no change in workflow for the client.

Read more

TEA: Flexible GPU clusters as distributed energy resources

I completed a techno-economic analysis (TEA) for a proposed company called "Nth Power," a distributed GPU cloud provider and Virtual Power Plant (VPP) operator for grid operators, utilities, and clean energy companies.

Read more

Start a conversation

Follow me