Company Overview
Dexcom, founded in 1999, is a global leader in continuous glucose monitoring (CGM) technology. Headquartered in San Diego, California, the company develops cutting-edge glucose monitoring solutions that empower people with diabetes to manage their condition with confidence and control. Dexcom’s mission is to improve health outcomes and quality of life through innovative sensor technology and seamless digital integration. With operations spanning the U.S., Malaysia, and Ireland, Dexcom remains at the forefront of diabetes care, driving advancements that benefit both patients and healthcare providers.
Objectives
Dexcom’s Cloud Engineering team set the following objectives for 2024:
- Speed up and de-risk lifecycle management across Kubernetes (GKE) and critical cloud-native projects (e.g., Istio, Kafka, Keycloak, Prometheus, NGINX Ingress) to stay ahead of release cycles and avoid rushed, high-risk changes.
- Reduce reliance on a small number of expert team members, freeing their bandwidth for strategic innovation projects.
- Enhance application safety posture by defining guardrails for app teams that assure change safety throughout the software development lifecycle.
- Ensure compliance with stringent regulatory standards by providing documented proof of operational resilience for mission-critical applications.
"Each Cloud Native project and addon, especially stateful and datapath projects, required weeks of preparation. The risk of breakage and downtime made every change a high-stakes event." — Sean Blog, Sr. Manager, Dexcom Cloud Engineering.
Challenges
Dexcom Cloud Engineering runs a platform on GKE supporting hundreds of developers and thousands of applications. These workloads are mission-critical and must comply with stringent healthcare regulations.
Kubernetes and surrounding projects evolve quickly. Between February 2023 and August 2024, GKE released six Kubernetes versions (v1.26 through v1.31). In parallel, projects like Istio, Kafka, Keycloak, and Prometheus shipped multiple releases each. This pace created the following challenges:
- Complex, interdependent upgrades – Continuous change across Kubernetes and cloud-native projects demanded extensive validation, testing, and cross-team coordination to prevent breakages and disruptions.
- Velocity and agility bottlenecks – Dependencies across numerous projects introduced unknown incompatibilities, slowing safe change. Dexcom didn’t want to choose between safety and speed.
- Risk of falling behind – Lagging on Kubernetes or project releases increases operational risk and can lead to condensed, high-stress upgrade windows.
- Ensuring availability through change – Real-time data processing makes continuous uptime non-negotiable for safety and compliance.
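The "risk of falling behind" can be made concrete by measuring how many minor releases each cluster trails the newest available version. The following is a minimal sketch, not Dexcom's or Chkk's tooling; the cluster names and versions in the inventory are hypothetical, and only the v1.26 and v1.31 endpoints come from the release window described above.

```python
import re


def parse_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.28' or '1.29.4'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def minor_lag(current: str, latest: str) -> int:
    """Count the minor releases separating current from latest (same major only)."""
    cur_major, cur_minor = parse_minor(current)
    new_major, new_minor = parse_minor(latest)
    if cur_major != new_major:
        raise ValueError("major versions differ; lag is not comparable")
    return max(0, new_minor - cur_minor)


# Hypothetical inventory: cluster name -> running Kubernetes version.
clusters = {"prod-a": "v1.26", "prod-b": "1.29.4", "staging": "v1.31"}
latest = "v1.31"

lag_report = {name: minor_lag(version, latest) for name, version in clusters.items()}
print(lag_report)  # {'prod-a': 5, 'prod-b': 2, 'staging': 0}
```

A report like this only covers Kubernetes itself; the same comparison would have to be repeated for each add-on (Istio, Kafka, Keycloak, and so on) to reflect the full picture described in this section.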
Rather than adding headcount, Dexcom partnered with Chkk to navigate interdependencies across Kubernetes and the cloud-native projects while accelerating safe upgrades.
Solution
Dexcom implemented the Chkk Platform to simplify lifecycle management and enhance compliance.
- Streamlined Lifecycle Management – Chkk automated key tasks such as dependency analysis, release note processing, and impact assessment across Kubernetes and cloud-native projects, reducing research and planning time by up to 8x.
- Upgrade Copilot & Preverified Plans – Chkk’s Upgrade Copilot automated tedious pre-work and delivered Preverified Upgrade Plans for clusters and cloud-native projects, tested on a digital twin of Dexcom’s infrastructure, ensuring safe, well-orchestrated upgrades.
- Repeatable Upgrades with Curated Workflows – Chkk standardized workflows and enabled task delegation, reducing reliance on expert knowledge and making complex upgrades repeatable and efficient.
- Avoiding Breakages with Safety, Health, and Readiness Checks – Chkk’s curated library of preflight, inflight, and postflight checks covers thousands of versions across 300+ cloud-native projects. Dexcom used these checks extensively to keep upgrades disruption-free.
- Conformance to Operational Guardrails – Dexcom used Chkk’s Guardrails to update hundreds of Helm charts owned by application teams, ensuring conformance to safety primitives at the source of their software development lifecycle.
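To illustrate what a guardrail applied "at the source" might look like, here is a minimal sketch that inspects a rendered Deployment manifest for a few common safety primitives. The specific rules (at least two replicas, a readiness probe, resource requests) and the function name `check_deployment` are assumptions chosen for illustration; the source does not describe which rules Chkk's Guardrails enforce or how they are implemented.

```python
def check_deployment(manifest: dict) -> list[str]:
    """Return guardrail violations for a Deployment-like manifest (illustrative rules)."""
    violations = []
    spec = manifest.get("spec", {})

    # Kubernetes defaults to a single replica when the field is omitted.
    if spec.get("replicas", 1) < 2:
        violations.append("replicas < 2: a single pod cannot survive a node drain")

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        if "readinessProbe" not in container:
            violations.append(f"{name}: missing readinessProbe")
        if "requests" not in container.get("resources", {}):
            violations.append(f"{name}: missing resource requests")
    return violations


# Hypothetical manifest, as it might look after rendering a Helm chart.
example = {
    "kind": "Deployment",
    "spec": {
        "replicas": 1,
        "template": {
            "spec": {
                "containers": [
                    {"name": "api", "resources": {"requests": {"cpu": "100m"}}}
                ]
            }
        },
    },
}

for problem in check_deployment(example):
    print(problem)
```

Running a check like this in a chart repository's CI pipeline is one way to catch unsafe defaults before they reach a cluster, which is the general idea behind enforcing guardrails early in the software development lifecycle.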
"Chkk transformed our upgrade process from a high-risk, manual effort into a streamlined, automated workflow. The level of insight and safety nets they provide is unparalleled." — John Rzeszotarski, VP of Infrastructure.
Outcomes
By implementing Chkk, Dexcom achieved significant operational and financial benefits:
- 200% increase in upgrade productivity, ensuring business, regulatory, and compliance goals were met.
- 80% reduction in upgrade preparation time, eliminating weeks of manual research and validation.
- Improved operational efficiency – The Cloud Engineering team could focus on strategic initiatives rather than break-fix work.
- Repurposed 2 FTEs by reducing manual upgrade workloads, allowing them to focus on high-value work.
- Higher compliance assurance – Timely upgrades ensured adherence to regulatory standards and mitigated continuity risks.
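Percentage figures and "Nx" multipliers are easy to conflate, so it may help to see how the two relate. The sketch below converts a percentage increase and a percentage reduction into multipliers. Reading the 200% and 80% figures this way is an interpretation for illustration, not a calculation stated by Dexcom or Chkk.

```python
def multiplier_from_increase(percent_increase: float) -> float:
    """A P% increase means the new value is (100 + P) / 100 times the old one."""
    return (100 + percent_increase) / 100


def speedup_from_reduction(percent_reduction: float) -> float:
    """A P% reduction in time means the task runs 100 / (100 - P) times faster."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 100 / (100 - percent_reduction)


print(multiplier_from_increase(200))  # 3.0 -> a 200% increase is 3x the baseline
print(speedup_from_reduction(80))     # 5.0 -> an 80% reduction is a 5x speedup
```

Under this reading, a 200% productivity increase corresponds to 3x and an 80% cut in preparation time corresponds to 5x.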
Dexcom’s experience with Chkk demonstrates how automated lifecycle management of Kubernetes and Cloud Native projects can transform infrastructure operations, reducing risk, saving costs, and enabling teams to focus on innovation.
"With Chkk, we’ve cut our Cloud Native infrastructure upgrade prep time from weeks to days while ensuring the highest levels of safety and compliance." — Chakri Paladugu, Staff Engineer, Dexcom Cloud Engineering.
Takeaway Lessons
- The real upgrade challenge isn’t Kubernetes; it’s the cloud-native projects around it – Managing interdependencies across dozens of projects like Istio, Kafka, and Keycloak is the primary source of complexity.
- Delaying upgrades poses a serious business continuity risk – Waiting for extended support only delays the inevitable; without proactive planning, companies face rushed, high-risk upgrades.
- Lifecycle safety tooling is a must – For large-scale, regulated environments, safety and compliance must be baked into upgrade workflows to prevent breakages and failures.
- A proactive approach accelerates upgrades by 3x to 5x – Chkk helps keep upgrades fast, safe, and compliant.
- Safety and agility are achievable together – The right automation and risk mitigation tools enable velocity without sacrificing stability.