Pedro Martinez

Pedro is a Technology Leader, former CIO/CTO and co-founder, speaker, father of 3, former paratrooper and US Army veteran.  He has 19 years of experience in Tech.  Follow Pedro for valuable information about Cloud adoption and overall Digital Transformation.

“You’re either part of the solution or you’re part of the problem.”


Achieving Zero Failure


These are the pillars of Zero Failure

After years of walking the DevOps and Resiliency road, I have come to the conclusion that Failure is not being able to Recognize, Measure, Partner, and practice CICDCF (Continuous Integration, Continuous Delivery, and Continuous Feedback).  Well, that feels kind of cheesy and tricky, doesn’t it?  Let me explain.

In the world of DevOps and Site Reliability Engineering (SRE), what do you consider a failure?  Is it a percentage of outages, or system or application failures?  Maybe buggy code?  Is it getting fired from the job as a result of a high-severity incident?

Zero Failure

Recognize system and application failures will always happen.

We must recognize that system failures will always happen; this is true for a “monolithic architecture,” for a microservices architecture, for infrastructure deployed in containers, and even for serverless workflows.

We have heard the phrase “Fail Fast and Continuously” many times.  However, do we know why?  We now do.  

Our job as SREs or DevOps Engineers is to build for Failure, with Resiliency and Automation baked into it.

The faster we find flaws or failures, the faster we can recover, in theory.  We can begin designing and building for failure.  However, how do we identify a failure in systems, applications, or even business processes?  Well, we measure.  What does that mean?  How do we measure?

Center Image Credit: Tanu McCabe

To understand what success looks like, we need to measure.

We have all heard the saying “you can’t improve what you don’t measure,” but as a DevOps or Reliability engineer, how do I go about doing that?

I’m glad you asked. Let’s take a look at a couple of examples:

  1. Agile stories: Let’s say we are operating in an Agile or Scrumban delivery model. We want to size the amount of work that an Epic or Story represents.  We estimate in story points, which could be equivalent to hours, days, or weeks, and that gives us a way to gauge whether we are delivering according to our agreement with the customer.
  2. Systems Availability: We can measure systems availability by uptime percentage, by outage models, and by scalability models, among other methods.  Here is where Observability comes into play.
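
As a rough illustration of the uptime-percentage measurement mentioned above, here is a minimal Python sketch.  The function name, the outage durations, and the 99.9% target are hypothetical values chosen for the example; they are not taken from any specific tool or team.

```python
from datetime import timedelta
from typing import Iterable


def availability_percent(period: timedelta, outages: Iterable[timedelta]) -> float:
    """Return uptime as a percentage of the measurement period."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    downtime = sum(outages, timedelta(0))
    # Cap downtime at the period length so the result never goes negative.
    downtime = min(downtime, period)
    return 100.0 * (1 - downtime / period)


# Hypothetical example: a 30-day month with two outages (40 minutes total).
month = timedelta(days=30)
outages = [timedelta(minutes=30), timedelta(minutes=10)]
uptime = availability_percent(month, outages)

target = 99.9  # assumed availability objective, for illustration only
status = "met" if uptime >= target else "missed"
print(f"uptime: {uptime:.3f}% (target {target}%): {status}")
```

The point is less the arithmetic than the agreement behind it: once a target is written down, "success" and "failure" become something you can check instead of something you debate.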

Let’s talk about Observability, or Cloud Observability for that matter, for a moment, because it is a critical part of Capital One’s success in our DevOps and SRE journey.

Tanu McCabe, Distinguished Engineer at Capital One, wrote an article earlier this year introducing the “4 Steps of the Resiliency Framework.”  I won’t go into details, but she proposes that if you build your resiliency around the framework of Detecting, Alerting, Responding & Recovering, and Refining & Testing, you can build an Adaptable, Self-healing and Predictive resiliency solution. The article is a good read.  Check it out.

You can begin by setting up monitoring intelligence with triggers and alarms on your microservices and systems to help you know what’s happening, and how the service or system is behaving or misbehaving.

The key word here is ACTIONABLE.

Monitoring for the sake of monitoring, or showcasing a pretty dashboard with information that doesn’t make sense, is not the right approach.  Furthermore, if you are monitoring something only to get paged, or called, in the middle of the night, that’s not the right approach either.

As you begin to identify components or things to measure, you also have to think about how to automate corrective or self-healing remediation; this is what is also known as reducing toil.
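
To make "actionable" a little more concrete, here is a minimal Python sketch of the idea, under stated assumptions: every alarm either has an automated remediation attached or it escalates to a human.  The `Alarm` class, the `evaluate` function, the alarm names, and the thresholds are all hypothetical; a real setup would wire this into whatever monitoring platform you use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alarm:
    name: str
    threshold: float
    # Automated fix to run when the alarm fires, if one exists (hypothetical hook).
    remediate: Optional[Callable[[], None]] = None


def evaluate(alarm: Alarm, value: float) -> str:
    """Decide what to do with a single metric reading."""
    if value <= alarm.threshold:
        return "ok"
    if alarm.remediate is not None:
        alarm.remediate()  # self-healing path: no human needed
        return "auto-remediated"
    return "page on-call"  # no automation yet: this is the toil to reduce


# Hypothetical alarms: one with a self-healing action, one without.
restarts = []
error_rate = Alarm("checkout-error-rate", threshold=0.05,
                   remediate=lambda: restarts.append("checkout"))
disk_usage = Alarm("db-disk-usage", threshold=0.90)

print(evaluate(error_rate, 0.12))
print(evaluate(disk_usage, 0.95))
print(evaluate(disk_usage, 0.40))
```

Every alarm that still ends in "page on-call" is a candidate for the next piece of automation.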

Build and nurture partnerships to achieve Zero Failure.

A partnership can be defined as an arrangement where parties, known as Partners, agree to cooperate to advance their mutual interests.

Who are our Partners?  Our Partners can be Stakeholders, our organization leaders, other teams that have a stake in our goals, and our customers, whether internally or externally.

Our partners can help us accelerate Success and enable Transparency.  If you fail to build the right partnerships, you may achieve short-term success, but it’s not an experience built to last, and it is not a sustainable operating model.

Let me give you some examples:

  1. In the past, my teams used to build multiple tools and applications to solve similar problems.  We used to incentivize innovation for the sake of innovation.  There was a lot of motivation to build an excellent shiny tool to solve a problem.  The real dilemma was that you had 5 other teams building a shiny tool to solve the same problem.  That wasn’t sustainable; it was a waste of talent, resources, and money.  Once we focused on establishing partnerships, we began to reach out to, and engage with, teams across the company to solve common problems.  Things got resolved faster, with fewer resources, and we began to turn our focus to the customer.  Our customers were now our most significant and most important partners.
  2. When you involve your architecture team, your security or cyber team, the business team, the customer and others, you can accelerate adoption.  Your team can be better prepared for architecture reviews, code testing, or pipeline approvals.  You have more eyes watching for possible areas of weakness. Involve them from the very beginning and let them help you define priorities and roadmaps.
  3. Just as in the Observability use cases, we often “don’t know what we don’t know.”  The same principles apply to partnerships.  You can gain:
    • Perspective
    • Knowledge
    • Experiences
    • Skills
    • Meaningful relationships that you won’t get if you don’t partner
  4. When you partner, you can begin to align your goals at an enterprise level; you can begin to identify dependencies, collisions, and resource needs, and begin to operate at the same cadence; you begin to identify common goals.
  5. A culture of diversity and inclusion is one that elevates each other. Diversity and Inclusion are direct byproducts of an organization that makes Partnerships a top priority. Some argue it’s the other way around, but trust me [insert wink here], I don’t drink light beer.
Image Credit: John Stevens at Synopsys

Continuous Integration, Continuous Delivery, and Continuous Feedback.

I wanted to share this image by John Stevens from Synopsys.  He wrote an interesting article back in March of 2018 where he highlights the differences between Agile, CICD, and DevOps.  You should check it out.

I argue that a team should leverage a combination of those 3 principles to have a successful SRE or DevOps workshop.  There is one thing that seems to be missing from these discussions just about every single time: the Customer, arguably the most critical persona.

I define Continuous Feedback as the practice of frequently sharing experiences with a given product in both directions, between the team and the customer.

Add the customer/consumer to your pipeline.  Don’t just collect feedback; analyze and discuss it with your partners openly and transparently.

Who owns ensuring these principles are put into practice?  You, me, all of us.  It starts with each engineer, and if we all do it, we’re going to start seeing a change in the culture.

The process of transformation can be an incredible journey.  You can start by looking internally and recognizing that change is needed, that transformation is needed, with the end goal in mind.  The principles of Zero Failure apply to all stages of the evolution cycle.

Including the customers on the journey can help you deliver a Delightful Experience to them.

That is how we can achieve “Zero Failure.”  This is a view from a Management perspective. This topic is larger than a single blog post or article, and I didn’t even address the subject of the Architecture Validation Cycle; that’s for another time.

What are your thoughts?

DISCLOSURE STATEMENT: These opinions are those of the author. Unless noted otherwise in this post, Capital One or any other organization is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are the property of their respective owners.