A Biotech Company Achieved 99.99% Continuous Uptime After Recurring Shutdowns Within Every 24-Hour Period

Company Summary

A clinical-stage company approached us to resolve problems that arose in its cloud infrastructure each time it deployed critical line-of-business applications.

Cloud Service Provider

Kubernetes on GCP

Business Challenges

The company's IT team had little knowledge of how best to deal with a recurring bug that appeared whenever specific service-level applications were deployed. The bug caused their cloud operations to shut down at random several times within a 24-hour period, and the problem was especially severe during off-hours.

Our Solution

We used “MatosSphere” to run a deep observability test on the system and detect the root cause of the issues. This enabled us to remediate their cloud environment with one click by reverting the infrastructure to its last working state. We then set up replicas for the working pods so that, in Kubernetes, at least one workload pod is active at any given time. After that, we configured auto-remediation that uses snapshots to trigger new instances, and finally we set up disaster recovery (DR) to enable continuous uptime.
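The replica step can be illustrated with a minimal sketch. It is not the configuration used in this engagement: the application name `lab-app`, the image `example.com/lab-app:1.0`, and the helper functions are hypothetical. It builds a standard Kubernetes Deployment with more than one replica, plus a PodDisruptionBudget that asks the cluster to keep at least one pod available during voluntary disruptions such as node drains.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment that runs several identical pods."""
    if replicas < 2:
        raise ValueError("use at least 2 replicas so one pod can fail safely")
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def pdb_manifest(name: str, min_available: int = 1) -> dict:
    """Build a policy/v1 PodDisruptionBudget for the same pods."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": name}},
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and image, for illustration only.
    bundle = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            deployment_manifest("lab-app", "example.com/lab-app:1.0"),
            pdb_manifest("lab-app"),
        ],
    }
    print(json.dumps(bundle, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Replicas cover the case where a single pod crashes; the disruption budget only guards against voluntary evictions, so it complements the replicas rather than replacing them.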

The Results

Our 1-click remediation solution fully restored their cloud infrastructure to operation, and they no longer had issues when applications were deployed. The company achieved 99.99% uptime and strong security across its cloud infrastructure, and it continues to use snapshots to trigger new instances and auto-remediate issues immediately.
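To put the 99.99% figure in concrete terms, the short sketch below converts an availability percentage into the downtime it permits over a given period. The function name and the 30-day month and 365-day year are assumptions made for illustration, not figures from the engagement.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Return the downtime, in minutes, permitted by an availability target."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60.0


if __name__ == "__main__":
    periods = (("day", 24), ("30-day month", 24 * 30), ("365-day year", 24 * 365))
    for label, hours in periods:
        minutes = allowed_downtime_minutes(99.99, hours)
        print(f"{label}: {minutes:.2f} minutes")
```

Under these assumptions, 99.99% availability allows roughly 4.32 minutes of downtime in a 30-day month and about 52.56 minutes in a year.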