FACGFACG
Retail and e-commerce
All case studies Retail and e-commerce

Replatforming an omnichannel retailer onto Azure ahead of peak

A 240-store UK retailer was running a brittle e-commerce stack on ageing infrastructure. Black Friday the previous year had taken the site down for 71 minutes at peak. We rebuilt the platform in time for the next one and they did not lose a minute.

0 min
Customer-visible downtime during the next Black Friday
34%
Reduction in monthly cloud spend versus the lift-and-shift quote
11x
Increase in deploy frequency in the first quarter post-go-live

The challenge

A 240-store UK omnichannel retailer with GBP 180m annual turnover ran their customer-facing e-commerce platform on physical infrastructure in a single colocation. Releases were monthly, painful and required an out-of-hours window. Test environments did not match production. Observability stopped at the load balancer.

The previous Black Friday, the platform had taken 71 minutes of customer-visible downtime at peak, with a board-estimated GBP 1.4m revenue impact and a re-platforming mandate. Two competing proposals had landed: a USD 2.1m lift-and-shift to AWS from one vendor, and a 14-month transformation programme from another. Both missed the next peak.

We were brought in to find a third path that delivered measurable resilience before the November cutover.

Our approach

Architecture: pragmatic, not perfect

We designed a multi-region Azure landing zone with autoscaling AKS for the storefront, a managed PostgreSQL Hyperscale tier for the product catalogue, Front Door + CDN for edge caching and a clear seam between the order capture pipeline and the legacy ERP. No big-bang ERP replacement.

Cutover: one weekend, eight hour window

We migrated production with a single planned 8 hour cutover. DNS was pre-staged with low TTL. The legacy stack was kept warm for 30 days as a fall-back. We ran a full dress rehearsal two weeks ahead in a clone environment.

Engineering culture: code, not tickets

Infrastructure as code with Terraform and a GitHub Actions pipeline replaced the manual change process. Within one quarter the team moved from monthly releases to multiple deploys per week with feature flags for safer rollout.

Peak readiness: the test that mattered

We ran load tests at four times the previous Black Friday peak using k6 and synthetic user journeys, surfacing two hot-spots in the basket service that we fixed in week two. The November peak landed without a single autoscaling event reaching its ceiling.

Results delivered

  • Zero customer-visible downtime during the next Black Friday peak (vs. 71 minutes the year before)
  • Monthly cloud spend 34 percent lower than the competing lift-and-shift quote
  • Deploy frequency moved from 1 per month to 11+ per week
  • Mean time to recover from production incidents: 87 minutes down to 9 minutes
  • Test environment parity reached 96 percent (was 41 percent)
  • Engineering team retained: 0 voluntary leavers in the 12 months following replatform

We finally trust our infrastructure. The site stays up, the bill stays sensible, and we ship features without a meeting about it. The team feels like ours, but better.

CTO, UK omnichannel retailer (reference call available after NDA)