
Replatforming an omnichannel retailer onto Azure ahead of peak
A 240-store UK retailer was running a brittle e-commerce stack on ageing infrastructure. Black Friday the previous year had taken the site down for 71 minutes at peak. We rebuilt the platform in time for the next one and they did not lose a minute.
The challenge
A 240-store UK omnichannel retailer with GBP 180m annual turnover ran their customer-facing e-commerce platform on physical infrastructure in a single colocation. Releases were monthly, painful and required an out-of-hours window. Test environments did not match production. Observability stopped at the load balancer.
The previous Black Friday, the platform had taken 71 minutes of customer-visible downtime at peak, with a board-estimated GBP 1.4m revenue impact and a re-platforming mandate. Two competing proposals had landed: a USD 2.1m lift-and-shift to AWS from one vendor, and a 14-month transformation programme from another. Both missed the next peak.
We were brought in to find a third path that delivered measurable resilience before the November cutover.
Our approach
Architecture: pragmatic, not perfect
We designed a multi-region Azure landing zone with autoscaling AKS for the storefront, a managed PostgreSQL Hyperscale tier for the product catalogue, Front Door + CDN for edge caching and a clear seam between the order capture pipeline and the legacy ERP. No big-bang ERP replacement.
Cutover: one weekend, eight hour window
We migrated production with a single planned 8 hour cutover. DNS was pre-staged with low TTL. The legacy stack was kept warm for 30 days as a fall-back. We ran a full dress rehearsal two weeks ahead in a clone environment.
Engineering culture: code, not tickets
Infrastructure as code with Terraform and a GitHub Actions pipeline replaced the manual change process. Within one quarter the team moved from monthly releases to multiple deploys per week with feature flags for safer rollout.
Peak readiness: the test that mattered
We ran load tests at four times the previous Black Friday peak using k6 and synthetic user journeys, surfacing two hot-spots in the basket service that we fixed in week two. The November peak landed without a single autoscaling event reaching its ceiling.
Results delivered
- Zero customer-visible downtime during the next Black Friday peak (vs. 71 minutes the year before)
- Monthly cloud spend 34 percent lower than the competing lift-and-shift quote
- Deploy frequency moved from 1 per month to 11+ per week
- Mean time to recover from production incidents: 87 minutes down to 9 minutes
- Test environment parity reached 96 percent (was 41 percent)
- Engineering team retained: 0 voluntary leavers in the 12 months following replatform
“We finally trust our infrastructure. The site stays up, the bill stays sensible, and we ship features without a meeting about it. The team feels like ours, but better.”