DevOps Decrypted: Ep.22 - Leaving the Danger Zone Behind in Modern Organisations with Gene Kim
In this very special episode of DevOps Decrypted, we are joined by none other than Gene Kim, the bestselling author of The Phoenix Project, The DevOps Handbook and The Unicorn Project.
Summary
It's a really great chat where Gene talks to us about his latest book, Wiring the Winning Organization. Join our very own Laura, Rasmus, Jobin and Matt, along with CIO Neal, in this fascinating discussion about slowing down to speed up, the limits of limitless communication, and the "social wiring" of companies that operate at the highest levels.
There was so much to discuss that we’ll have to invite Gene back to talk more!
Laura Larramore:
Welcome to DevOps Decrypted episode 22, where we talk about all things DevOps. I'm your host, Laura Larramore, here with our Adaptavist panel, Matt Saunders, Rasmus, and Jobin, and we also have our CIO Neal with us today. We have a special guest today, Gene Kim, who is a Wall Street Journal best-selling author and researcher; welcome, Gene.
Gene Kim:
I am so happy to be here. Good afternoon to you all. Good morning here.
Laura Larramore:
So, Gene. We have copies of the Phoenix Project and the Unicorn Project in all of our offices.
But now I hear there's a new book to be added to the library entitled Wiring the Winning Organization. Do you want to tell us a little bit about what writing this book was like and what generated the idea for it?
Gene Kim:
Oh, for sure. Yeah. Thank you for the question, Laura, yeah. So as some background.
I've been studying high-performing technology organisations since 1999. That journey started back when I was CTO and technical co-founder of a company called Tripwire in the information security space, where I was for 13 years before leaving in 2010. Our goal was to study these amazing high-performing organisations that were the best at Dev, the best at Ops, the best at infosec and compliance, and understand how they made the good-to-great transformation, so we could figure out how other organisations could replicate those amazing outcomes.
And so the big surprise in that journey was how it took me into the middle of the DevOps movement, which is so urgent and important. One of the most amazing things that happened on that journey was meeting Dr. Steven Spear at the MIT Sloan School of Management in 2014.
And I cannot overstate how much his thinking has influenced my own. In fact, The DevOps Handbook was at least one year late because of that course. It was like, "Oh, my gosh! We've missed so many important things."
And so the quest we've been on for three or four years is trying to figure out what agile, DevOps, the Toyota Production System, lean, safety culture, and resilience engineering have in common.
And the conclusion is really that they're all incomplete expressions of a far greater whole. I've got to tell you, it's been the most intellectually challenging thing I've ever worked on, but also the most intellectually rewarding. They say the goal of science is to explain the most with the least, and I think it's been dazzling to be able to say that, in any transformation, there are three mechanisms of performance. You can express everything we see in these other frameworks through these three very simple things.
So that's what I've been working on. It's been so fun to have the book finally out there and to actually get people reacting to it. I hope that it was rewarding and interesting to read.
Neal Riley:
It was also interesting to listen to. I was on Audible; that's how I listen to my books these days. And it was interesting to hear you make the assertion that you have found not only something fundamental but also the first principle that leads up to agile and DevOps. You even talk about Team Topologies and many of these other principles falling under this general construction.
How did you land on this being the underpinnings of the construction? Are the "ifications" the underpinnings that you're referring to?
Gene Kim:
That's fantastic, Neal. So yeah, there are three of them. Let's give them some names and say what they are first. We're saying that there are three mechanisms which leaders use to make it so that people can do their work easily and well. I think the underpinning assertion we make is that, since the job of leaders isn't actually to do the work, it stands to reason that their job must be to help others do their work well.
We're saying there are three main ways they do it. One is that they have to slow down.
There are some problems where you don't want people making all their mistakes in production environments, where the consequences are so high that you can't undo them and you can't learn, because learning is fundamentally experimental and experiential.
And so that one's all about slowing down to speed up. We have many adages for that, like "stop sawing to sharpen the saw".
It's interesting that there's no word in English that actually captures that one concept, the notion of making a short-term investment for longer-term gain. So it was important enough that we made up a word.
The second is about simplification. So the first is "slowifying", moving some work in time so that it happens not in production but in planning and practice. The second one is making the problems easier to solve.
We know that people need to improve at solving problems. And it's very difficult to solve problems when everything is in a big, intertwined ball of mud, where small actions in one spot can ripple out and cause massive global chaos and catastrophic outcomes.
And so, simplification is all about dividing large problems into small ones.
One way is incrementalism, like going from waterfall to agile. Another is modularisation; that's the Amazon API re-architecture project. And the other is sequentialisation, or linearisation; that's the Toyota Production System. What modularisation does for parallel activities, linearisation does for interdependent sequential activities.
The third one is about amplification, the notion that you must take weak signals of failure and amplify them so they can be decisively acted upon, to better prevent, detect or correct. So, to answer your question, Neal: the process was trying to show all these disparate practices across all these different frameworks, whether it's Conway's law or Team Topologies.
We know all these things are important, and we then tried to synthesise them into a simpler whole. I love the saying that anyone can take something simple and make it complex; it's a little more of a trick to take something complex and really make it simple. Anyway, sorry for the long answer.
Jobin Kuruvilla:
No, that's great; I learned a lot from that simple, quick answer. Quick question: all these ifications do make a lot of sense. But at the same time (by the way, I'm the head of DevOps here at Adaptavist), when we go and transform organisations, there's a question that might arise: why slowification? That's the opposite of what we are trying to achieve, right?
I mean, simplification, amplification: yeah, absolutely, we get it. We want to do it right. But why should we be slowing down? The point of hiring you, or getting your help, or moving to this fancy new toolset, is to deploy 1,000 times a day, right?
Why should we even slow down? So can you go a little deeper into that particular ification?
Gene Kim:
Absolutely. And you're so right. We don't want to be doing four releases a year; that doesn't feel so good. We want to do more like 136,000 per day, like Amazon reported doing in 2015. And yet here you are talking about slowification. Or maybe it sounds like: we want to ship more features, and yet you're trying to pay down this, quote, technical debt, and do things that don't sound like features to me.
What they have in common is the notion that at some point we have to slow down to speed up. And that's not free, right? There have been so many attempts to say that quality is free, but this is a short-term correction for a longer-term gain. We thought it was interesting that there was no word in English for this. The Germans have one; it's called Verveserum, the notion that you're deliberately slowing down to speed up in the long term. We hypothesize that maybe the absence of a specific word in the management literature prevents us from actually saying and expressing something very important. If you can't say it, maybe you can't think it?
And there are many other words that we looked at. One was deceleration, but the point isn't to decelerate; ultimately, you want to accelerate. One other thing that we thought was important is that the primary influence for slowification is the work of Dr. Daniel Kahneman and Amos Tversky.
So Kahneman won the Nobel Prize for his contributions, and he described two modes of thinking. One is all about being slow, deliberative, contemplative, and creative.
And then there's the second mode, which is all about fast instincts, biases, habits, and routines. When we are doing most work, that fast mode is the only one we can access most of the time. Figuring out how to do things differently requires activating the slow mode. And we say there's something very specific that needs to happen in knowledge work as well: we have to trigger the slowification so we can activate the slow mode and develop new modes of thinking.
We've got to figure out what's going on so we can automate it. We've got to figure out how to reserve time to do the automation work, again for that longer-term gain.
Rasmus Praestholm:
Yeah. And I had thought about it, too, because I've been in a similar boat as Jobin. Overall, like… slowification is a great word for a book. It gets people talking, which is probably on purpose.
But then, when you're trying to sell something to clients, it's like, "Whoa, what are you doing here?" I was also trying to think about it behind the scenes in relation to the Agile Manifesto and all these things: it's not so much that you're trying to slow everything down.
You're trying to be more thorough or even more mindful.
So rather than a slowification, it's more like a thorough-ification, or being mindful, but "mindful" and other terms have gotten hyped up on other things, too. But I can see it makes sense that you slow down to go faster. You're really just making space to be more thorough and thoughtful in your process.
Gene Kim:
Yeah, I love what you just said, Rasmus. I love the fact that you said it embodies a concept, a sensibility, that you've developed over time.
And my fondest hope is that when people find themselves, in their daily work, in situations where things are going terribly wrong, where learning can't happen right now and we have limited ability to undo, they will say, "This is an opportunity for us to slowify," and do things like tabletop exercises, better planning and rehearsals, to set ourselves up so that we can get to a place where we can learn. The example that we use in the book is Chaos Monkey. Chaos Monkey is one of the best examples of slowification: Netflix made all these investments so that, ultimately, they were able to deliberately inject faults into the production environment and gain confidence that they could survive when inopportune things happen. That is one reason why, on 21 April 2011, they were one of the few services that survived the first massive availability zone (AZ) failure at AWS. So yeah, I love what you just said, Rasmus.
Neal Riley:
I want to echo this, too, because there's one of my favourite YouTube videos that I send to people I used to know from a world that wasn't tech. Now I live in this tech world, and they ask me what that world's like. John Cleese did a really good take on this in the eighties.
Have you seen it? He was speaking at a sort of management seminar; the version that I saw was, I believe, in Polish. But he was talking about open-mode and closed-mode thinking.
Closed-mode thinking is what you're calling mode 2: once you know the idea, you shut off everything around you that could distract you, and you just march towards it. That's a sprint. That's agile. That's that style of event, let's call it that.
Open-mode thinking is completely different. There's no wrong answer in open-mode thinking; all you're trying to do is expand as many opportunities as you can. It turns out the Stanford Design School's method follows this pattern, right? These things have a lot of very good lineage across a number of different areas. It's true.
But I agree with Rasmus; slow is an interesting word in that context…
Gene Kim:
And by the way, to tell you where my head went: when you're in a fast-moving production environment, in the middle of an outage, you have no idea what's going on. You're just trying to make sense of it. There are certain avenues of thinking that are not readily available to you, right? Chances are you're in fight-or-flight mode, and you have to rely on routines, training and previous experience. It is not the best time to ask wide, expansive, ranging questions like, "OK, what's the underlying causality here?" Those are not modes of thinking that are available to you at that time. And even the people who are good at it are often not close enough to the loop for the people in the middle of that situation to access them.
One example is the Apollo 11 landing. I was talking to the person who wrote the lunar landing software, who was… So when the 1201 and 1202 program fault codes came up, apparently Neil Armstrong actually lost all displays of altitude, rate of descent, etc.
All the people at the MIT labs were there. They were listening to it, but they were not in the loop. They could not actually contribute a lot of valuable data for the landing that was happening on the moon, hundreds of thousands of miles away.
So it's like open and closed thinking within our brains: those modes may not be accessible at the time. And there's all this expertise that we need to be able to retrieve but can't, because it's too far away, either temporally or spatially, to directly contribute to the problem-solving.
Anyways, that's fascinating.
Neal Riley:
No, I completely agree. And I have to say, as the child of an army colonel, it was very interesting to see that your foreword is from the head of the Navy, basically going, "Look, if I had read this, it would have been so much easier to do my job."
How much more of a promotion can you possibly get? It's like, "Oh, do you want to run a navy? Well, yeah, hang out with Gene. It's great."
The reason why I bring up the John Cleese example is that one of the things he talked about way back then was setting the right environment for the different modes of thinking.
And it's exactly as you were describing just a second ago. For closed-mode thinking, you need a very specific environment to be successful. You need to be able to remove distractions. You need to know that you're only focusing on one thing, and there's the deadline, etc.
In the book, you identify these three ifications, and we've focused on one of them so far. You also talk quite a lot about the shape and the melding of an organisation through this metaphor of circuitry. Now, circuitry triggered all sorts of thoughts in my head, some of them biological, some of them… I mean, I even went to quantum physics, like…
OK, what is the circuit, really? What do you mean, Gene? As you wrote, when you look up social circuitry on Wikipedia, what you get back is how a brain is wired to be social. I think you're talking about something else. You were talking about the shape of an organisation, weren't you?
Gene Kim:
Yeah, exactly. So, there are essentially three layers where work is performed. Layer 1 is the work in front of us. That could be the code, it could be the binary that's running in production, or it could be the patient in front of us.
Layer 2 is the tools we work through. That could be the platform on which our code runs, or it could be the IDE. There are usually teams associated with supporting those technologies.
Then, layer 3 is what we're calling the social circuitry. It's the organisational wiring.
And I love that Winston Churchill quote. You know, we shape our buildings, and after that they shape us, right? As leaders, we create the social circuitry, and after that, it shapes us. We make the architecture we work within, and forever after, it shapes us.
And we are saying that leaders are ultimately responsible for creating the wiring that everyone works within. When we talk about the Amazon API re-architecture in the early 2000s, they had found themselves in an architecture that used to serve them well. They could do hundreds of deployments per year when they had two products, books and music, but things got harder and harder, to the point where they could only do tens of deployments a year, and most of those didn't finish. They were caught in this social circuitry, this architecture in which even small changes required massive amounts of communication and coordination and signaling and prioritisation and sequencing and deconfliction…
And worse, everyone had to deploy together, so anything going wrong caused the whole deployment to abort.
And so the whole magic of the two pizza team was, you know, Jeff Bezos saying that he wanted less communication, not more. He wanted teams to be able to work independently of each other so they could be deployed towards Amazon's largest problems on behalf of customers.
And that's what led to this hard partition between teams, so that teams could regain independence of action, so that they could independently develop and deploy value to customers without needing to communicate and coordinate.
If I could make a little detour.
I read a fantastic interview with then Amazon CTO Werner Vogels, who is still CTO. He talked about that era, and I've read that paper 20, 40 times. Until earlier this year, I had missed this one phrase, where he talked about the ridiculous situation of the Amazon digital team (so that's Kindle, music, video): whenever you ordered one of those products, you still had to provide a physical shipping address.
And he said, yeah, it was ridiculous, but they had to go to 60 different ordering teams and say, "Could you please change it so we wouldn't have to provide the physical shipping address?" And the response was, "We didn't budget for it," and so they were stuck. That's what led to that famous memo about "thou shalt only communicate and coordinate with each other through these APIs", which was a rewiring of the organisation.
And so leaders rewire in one of three ways. They can slowify, moving some activities from production to planning and preparation. They can change how we work so that it's more modular, more linear, or less large-batch and waterfallish. Or they can change the signals, so that we create a system where even weak signals of failure are noticed and acted upon. In the worst cases, we accidentally create a culture where weak signals are suppressed or even extinguished entirely, which is obviously not so good.
Rasmus Praestholm:
That's a hilarious example that feels relevant, because I work with a client in the UK, and even when I just have to put in a ticket to access a repository, I have to enter a shipping address. And since I'm in the UK, I apparently live in Buckingham Palace when it comes to sending me things related to my repo request to GitHub.
So, there's so much fun stuff out there. And much like that case study with the Navy in the intro that Neal pointed out, there are a lot of interesting case studies that you put in that book, including the old ones, that probably hint at conceptual similarities in this problem space dating back to, like, the punchcard era… because people are still people, and the tech just changes. Like, one thing we've done previously in this podcast is talk about demystifying Kubernetes.
Because Kubernetes is complex, and at the end of the session, we got to the conclusion that Kubernetes is still complex. You just kind of move the complexity around. Instead of having a guy that's just physically wiring things in a server room, you now have Kubernetes experts that are just digitally wiring virtual hardware.
You have kept it the same.
But we've incidentally gained a scaling benefit. But the social change has yet to really be there.
So that leaves me with a few questions for you… Do you think we are genuinely gaining any radical social benefits over time? Are we just solving the same problems repeatedly with new tech and new terms?
Gene Kim:
Oh, that is a great question.
So, one of the things that was so fun about working with Dr. Spear was the fact that we come from totally different backgrounds.
His work started with studying Toyota and its famous production system. In 1999 he wrote "Decoding the DNA of the Toyota Production System", which I read in the early 2000s. He worked in aviation design and plant safety at Alcoa.
My background is primarily software. And yet we had read so many of the same papers and had a common language, which was so interesting. So I learned many things.
But here's one of the shocking things I learned only in the last 90 days.
He said that hospitals were very easy to manage in the 1950s compared to now. You had two functional specialities, doctors and nurses, and very little technology.
Compare that to these days, where there are scores of functional specialities within the doctors alone. You even have radiology, which used to be just X-rays and is now called imaging, because it involves five different potential technologies, each with other teams supporting them.
Now there's pharmacy and supply chain, and so forth. And it explains why it's so challenging to get excellent healthcare: the number of functional specialities has gone up by somewhere between 40 and 100x.
And so the big aha moment is that the job the layer 3 circuitry has to do was so much easier back in the 1950s, both because the functional specialities were fewer and because there were fewer technology silos compared to now. I think what you're describing is very similar in technology today: we have so many more functional specialities. I'll name a couple: security, authentication, authorization, database security, data masking, containers, Kubernetes, observability. You have all these things.
These are all things I don't want to do. In fact, over my worst Thanksgiving, last year, I spent one very unrewarding week learning about logging, Java logging. It wasn't something I particularly wanted to know, but it turned out to be important, and I finally had to learn it. Did I care about it? No. Did I get a lot of joy and reward from learning about Java logging? No. But was it important? Yes.
Would I rather have relied on a group whose entire role in life was to solve logging for thousands of engineers? Yes.
Is that available to solo developers?
No! Anyway, the fact that there are so many specialities is where being big is an inherent advantage, because it allows for these specialisations that can empower, arm, and uplift thousands of other developers.
And it's a shame, Rasmus, when we create these functional specialities where there's not even a problem, when they're not even relevant to the main problem being solved. Right? We've just stood up another Kubernetes specialist group that makes it even more difficult for work to get out the door, with no actual benefit.
That would be a shame if that were actually happening, right?
Rasmus Praestholm:
This reminds me a bit of how I have interpreted the promise of platform engineering: there should be a group focusing on logging, and then they expose it. Here are ways to do logging on some platform, like an IDP, Backstage, etc.
Gene Kim:
Can I give you one of my favourite examples? I love platform engineering, and I love how Team Topologies articulates the need for it. But one of my most shocking examples of this was…
A small team of 5 to 8 engineers at Google called the Java Platform Team. Their goal is to be very good at Java, because it turns out that even the thousands of Google developers who have a decades-long history with Java struggled with the migration from Java 8 to Java 11. It actually triggered out-of-memory errors because of how garbage collection was different in G1 GC. At any rate, the point is that as teams migrated to Java 11 and the new version of the JVM, the rate of container out-of-memory errors went through the roof. One of the reasons is that the developer instinct is to ask for more memory, and that still caused errors, which caused them to ask for even more memory. So you had these services in production that were using wildly more memory than they should. And so this team figured out: all right, how do we fix this for developers?
But they actually had a longer-term goal of making it so that services spend more time in garbage collection because the most scarce and most expensive commodity at Google is not compute; it’s memory.
And so they showed these experiments where you could get Google services running in a tenth of the memory, saving tens of millions of dollars, and this was just in the pilot. Anyway, it's just so great that small teams can have such incredible benefits for thousands of engineers, impacting tens of millions of dollars and potentially heading to hundreds of millions of dollars.
Neal Riley:
And would you call that small team of Java… let's call them zealots, let's call them aficionados… would you call them a slowification team?
'Cause really, they're not in the day-to-day dregs of "I'm migrating this version of this JVM on this day." They're going, "How does one think about the JVM in the first place? What is the…"
Gene Kim:
Yeah, exactly. And their product takes advantage of modularisation, right? For the developers, their world didn't change at all. Their interface to the outside world didn't change at all, and yet they gained all the benefits of what the small team is doing in terms of memory usage, cost, power consumption, etc.
Neal Riley:
Yeah, we know these problems all too well. Building a JVM product on top of a JVM, right? That's a very common trope I think, but…
Jobin Kuruvilla:
But in the organisation as such, will these small teams be the bottleneck? Will they be the single point of failure? That is always a concern, isn't it? And how do you scale this to a much, much wider organisation?
Gene Kim:
Yeah, I love it; this is a great question. I think what is amazing about these platform teams is that everything is self-service, so they can't become the bottleneck, because everything is on demand. You don't have to open up a ticket. You don't have to pester them for weeks. You can get what you need. I can get the logging to go where it needs to go. I can scrub all the data without waiting for someone to actually work on my ticket.
That's how these amazing platform teams and cloud have empowered and enabled so much developer productivity. But there's another thing that platform teams do, which is that they work behind these fixed interfaces so that they can experiment independently of everyone else.
In other words, imagine if every time this Java platform team at Google wanted to change a setting, they had to communicate and coordinate with a thousand other developers at Google. If any one of them said, "Oh, no, not right now, I'm working on something very critical," then they would be stuck.
What happens instead is that, just as with any good modular system, behind this interface teams can change independently. They can experiment, they can try, they can search a large exploration space, right?
It's like you now have a hundred roulette wheels that you can spin. If any of them hit, then you can shift that in and take the winnings, and if they don't hit, you don't change anything at all. So you take the wins, and you ignore the losses. I think it's fascinating how economists talk about the benefits of modularity.
So, to your point, right? You know you're doing something wrong when a platform team inadvertently creates a bottleneck. I think it's either because they violated self-service or they haven't created a hard partition, you know, between them and the rest of the world.
How am I doing?!
Jobin Kuruvilla:
That's absolutely great. Yeah, I think we are doing really great; yeah, fascinating chat.
Rasmus Praestholm:
So… This still feels a lot like talking tech. There's a lot in that which is helpful to people: here are platforms that help teams innovate, here are good things, and so on.
I am also wondering, what can we do at a tech level to encourage improvements to the social circuitry of an organisation?
So, for instance, we are working on this DevOps product that's, you know, platform engineering, of course, because everybody's doing platform engineering.
And I've been leading some of the design elements for it. And I have tried to say: OK, yep, tools are over here. They're fine. We know that's not novel; everybody's doing tools. We know platform engineering and so on. But we want to focus on the people and process bits of the DevOps triad. But how do we do that? How do we support that inside a product?
So far, I've mostly focused on really emphasising relationships between things, so you can find the groups, and you can know how they interact and how they're doing over time. That way you can raise visibility, amplification, that these teams are struggling in some ways, so let's help them.
But that's still just about making more visibility and so on. What can you think of that would help tech support the improvement of social circuitry?
Gene Kim:
Yeah, that's a really great question. In fact, it's so good that I wish we had put this into the book.
But let's see if we can sort of survey the landscape; I mean, let's go through each one of the principles, and let's see if we can tease out what they must be.
So with slowification, I think the failure mode that would emerge is everyone knows that this is like the best thing since sliced bread; everyone should be using it. But there's no time in the individual team's schedule to actually adopt it. Right? The pressure to ship features is so great, you know, that we can't make this short-term investment, even though there's a longer-term gain. Right? So that's clearly a need for leadership to help slowify, you know, to help groups get from here to there.
I think that's definitely one. And simplification, right? You know, one failure mode would be like shifting everybody all at once. Right? And yeah, that's probably bad because, you know, we've now forced every group to take on a bunch of novelty, and we can't learn from lessons learned by the previous group, right?
So, you know, we've sort of inadvertently pushed everyone to do it all at once when a better approach would probably be something staged. Right? Let's take the people who really want to do this, who see the benefits, right, and then use that as a stepping stone, you know, for more widespread adoption.
Rasmus Praestholm:
I just had a thing that hit me. Of course, you put some neat diagrams in which you show the danger zone and the good ones and map things. I wonder if you could do live maps like that on a platform.
Gene Kim:
Oh, yeah, absolutely. In fact, you know, what I'm doing right now is walking through each phase: what are the danger zone and winning zone characteristics? So I'll continue to think out loud and sort of explicitly name those.
That's a great point, Rasmus. So another definite failure mode would be around modularization. Right? So, you know, with a platform capability the goal would be that, once you're on the platform, you can hide more of the world from the users. I mean, I think one of the amazing things about Google was Google Borg. And one of the surprising things about Google Borg is that it was considered such a competitive advantage that there was no outside mention of it written down until about 2013.
So it ran over a decade in production. So what is Google Borg? It is the predecessor that Kubernetes is modeled upon. It used to be called the cluster manager. In fact, Google Kubernetes Engine actually runs on Google Borg – most core services run on Google Borg.
And the difference it made to Google engineers is that you could write code as if you didn't have to worry about what happens when your server just disappears; you didn't have to worry about, like, which data center you're running on, right?
So, all the things that we take advantage of and take for granted in Kubernetes. It allowed developers to focus on, you know, the business functionality, right? Not about resilience, not about observability, not about, you know, hardening against resilience failures.
And so what's marvelous about platforms is that once you're running on the platform, a whole bunch of other problems are taken care of for you, and you don't need to worry about them. And it creates independence of action, right? The platform can improve independent of the teams that are running on that platform. So what's the danger zone there?
Somehow the dependencies leak out. The abstraction leaks, so that suddenly developers now do have to worry about, you know, Java logging configuration settings, when arguably that should be hidden from them.
Linearization is, you know, when you have dependent activities that should happen sequentially, and things get entangled between them, right? And so the beauty of CI/CD is that not only is it automated, but the build engineers now have independence of action from the QA engineers and the deployment engineers.
And so we know that independence of action liberates lots of creativity and abilities to solve problems independently of each other. Right? The opposite of that is what we had in the bad old days, 10, 15 years ago, where you have these huge Gantt charts of, you know, 1,300 activities that all must happen sequentially. And if anything goes wrong, you know, the entire thing might blow up.
And then amplification… What's the danger zone there? Oh, yeah, right: things go wrong in the system, and the signals don't go to where they need to go, or they're so weak or faint that they might get transmitted, but never get received.
If I can add one story that I just love in the book, it is the Southwest Airlines holiday failure.
When this was unfolding, and this is about one year ago, Admiral John Richardson, Steven Spear, and I were texting for almost a week because it was this incredible example of signals not getting to where they needed to go. So, you know, when Winter Storm Elliott hit, there were nearly 10,000 flight cancellations.
And what was remarkable, though, was that most airlines could recover in a couple of days. What happened to Southwest Airlines was the number of cancellations kept increasing, and the reason was their crew scheduling system. As was widely reported, at Southwest Airlines, at the end of each day, if crews were not in the city they were supposed to be in, they had to call, like, a special number in the crew scheduling office, and they had hold times of, you know, half an hour, hours, sometimes scores of hours.
So if that information didn't get factored into the schedule, then the next day the planes weren't where the scheduling system thought they would be, right, and so they had to cancel the flight, and so ultimately what they had to do was essentially reboot the airline network, right?
These empty planes need to be flown to where they need to be to resume normal operations.
And so this was such a great example of where the control overlay, the layer three social circuitry, could not transmit information effectively enough to where it needed to go, to the people at the edge. I thought, this is such a great example of, you know, control theory in action, happening, you know, in a space where you could actually see the consequences, where the information was actually being transmitted by telephone, right, into crew scheduling systems.
Neal Riley:
Makes perfect sense. The questions of level 1, level 2, and level 3 are a really interesting interplay. Because you're right. The social circuitry is built on those relationships and those interconnections, coupling and loose coupling as you define them in the book, and yeah, it's really good.
The reality is, though, I think the case of Southwest is a really good example. We see examples like this all the time of customers dealing with the issue of not having the right environment to be in that winning sphere, and there are only so many levers I can pull quickly. One of the levers I can pull is swapping out GitHub for GitLab. Right? I can do these kinds of things: okay, cool, everyone's now on this thing.
But as Rasmus was asking about it, I think I agree. The issue really is in that relationship between levels 2 and 3. It seems to me, with the customers we talk to and the way that we work and operate with them, there are things that can happen at level 2 that are almost… preconditions, those things that actually enable just the very basics.
So you're talking about amplification. Amplification, getting something to the right person at the right time, is kind of a tiny bit of automation, if you're looking at it in that level 2 sense, right? It's a tiny bit of automation.
But it's also quite a lot of, actually, you really need to know a bit more about who you're dealing with. You need to know a bit more about the team. You need to know a bit more about what they do and what they're working on. Otherwise, everyone is just getting blasted with – hey, somebody uploaded something to the board. Oh, God! I can't even imagine the notification queue. You know what I mean?
Gene Kim:
So, one of the things I'm hoping people really remember from the book is the story of Steve and Gene moving a couch.
And the story goes, you know, Steve and Gene need to move a couch. And you would think that this is all brawn work, no brain work required. And it turns out that there are actually a ton of problems that they need to solve. Where's the centre of gravity? You know, to get through a narrow doorway, around which axis do they need to rotate? To get down a narrow, winding set of stairs, you know, who should go first, and do they face forwards or backwards?
And what's remarkable is that, you know, they don't need a bunch of consultants. They don't need focus groups, right? Just by picking up the couch, through trial and error, fast feedback, right, and communication and coordination, they will probably figure out how to solve the problem.
But there are all these things that leaders can do to make it very difficult for Steve and Gene to do their work. They can turn off all the lights, which, you know, makes the work more dangerous. It will take longer. People could get hurt, furniture could get damaged.
We could also, like, introduce a lot of background noise, like a siren, like loud, loud music. This makes the problem more difficult, but in a way different than just turning off the lights. We could introduce an intermediary that prevents Steve and Gene from talking directly with each other. We can make them go through Jira tickets. We can make them, you know, go through work orders with lawyers and account managers involved, right?! And what's remarkable is that this, too, will make it very difficult for Steve and Gene to do the work.
And so moving a couch is a metaphor for joint cognition and joint problem solving. And so what you were saying, Neal, is that often, the best thing that we can do is embed someone from the platform engineering group into the Dev team, because the developers may not have the language to be able to describe what they need, right? And so the best thing that we can do is to actually have a platform engineering expert observe their work, to deeply understand the problem, to help them onboard, or better yet, if there's something that the platform can't do, build it for them, because these platform engineers are so expert that they can understand what needs to be done, even though the customer can't even articulate what their problems are.
And so I was mentioning to somebody that I actually saw a properly formed Kubernetes deployment file, and it was so shocking to me, and it made me realize I don't even know what correct YAML looks like.
And this is not where I want to be spending my time. And so if I can add one more thing on that: whenever you have people moving a couch, it means that they're inherently coupled together. You know, what affects the couch affects Steve and Gene; what affects Gene affects Steve, and vice versa.
And you know, sometimes that coupling is inherent in the problem. Dev and Ops, it turns out, was that kind of problem. There were so many problems to be solved, and the information that needed to be transferred was so great, that Dev and Ops had to be coupled together, right? Microservices represent the case where you don't actually need a lot of communication and coordination.
And my favourite example of this was the Apollo Capcoms. It turns out that during the Apollo space program, for the crew in space, there were only about 10 people on the ground who were allowed to communicate with them.
They were called the capcoms, and they were astronauts, and they were not just any astronauts. They were either the people who trained the astronauts in space or they were the backup crew. And so when you have an emergency in space, it turns out the bandwidth is finite, and the need to communicate and coordinate is so critical that you need an astronaut on both sides of the channel. Right? Because, as Alan Kay once said:
When a message is so important, don't send a message. You must send a messenger – right?
And so it just says, when moving a couch, sometimes the best thing you can do is actually own both sides of the couch, right? So that you can get maximum information through that scarce, sparse channel.
Jobin Kuruvilla:
But this is a great example, right? I mean, as I said before, you know, we look at transforming organisations. And one of the things that we try to train people on is, don't look at just the solution. Look at also the problem, right? And people come to us and ask, Hey, Steve and Gene, I need help moving this couch.
Now, pause and ask, okay, why are you moving the couch? Okay, we can definitely help you move the couch. We have all the tools, and you know, the people to move the couch. But why are you actually trying to move the couch? That probably might tell you an answer where you don't actually have to move the couch.
There may be a better solution out there, right?
Neal Riley:
It's also why moving the couch is hard. I love that question; our CTO Jon, who couldn't make it with us on this episode, will be in a future one. This is a question he asks all the time.
It's like, What about this conversation is hard. Right? Okay, there's a couch – couch moving is something most people could do. What's going on here?
But, Gene, it's not going to surprise you. We're one of these groups that's unique. In other podcast episodes, we've talked about our structure and how we operate. Rasmus and I are giggling behind the scenes as you're talking about this because in this relationship in our group, we actually have those project managers – we even have legal involvement sometimes when we're working together on contracts.
The reason why we can do that is because of our consultative background and because of the growth that we've had in our product businesses. We're actually pretty good at writing a minimum viable contract.
And I truly mean writing on a piece of paper, like, a minimum viable contract. And actually, right now, my customer in the room is Rasmus. So in between our business units, the way we're operating, we actually have these internal contracts and statements of work. The reason why we do this is we codify the way of working into the contract, and in some cases, what you need is, oh, you need staff augmentation. You need someone dedicated to your team doing what you do, whatever.
Sometimes, your internal or external customers might go: You know what? I don't need the people. I really don't care about Java logging. I really don't wanna care about this. I never do – cool! We've got an app that solves that.
And I was laughing as you were telling us. One of the groups working with us right now has solved that logging problem four different ways. One's a consultancy, one's a managed service, one's a product, and one's an SDK, right? And it is one of these areas, depending on how you want to think about Java logging. But it's one of those very good – you've got a lot of really good examples, Rasmus. We were talking about the reno example, right?
Rasmus Praestholm:
Yes, I'm a big fan of the hotel renovation example and the couch moving, especially because I could see myself there, which leads to a question.
I also wondered when I was looking at the book – great book, new perspective, new terms. I love it – who is it for? And I'll put myself into the example for that because I feel like sometimes I'm a little bit of a weirdo nobody. I don't have reports or an area of authority. I post things on Slack, and nobody listens, so like.
Can I do anything with this book other than throw it at people and try to hit them or get them to read it? But then I put myself in that example.
I am standing outside the hotel.
And I'm seeing all these movers and painters run all over the place, and they're, like, out the window, and these 2 dudes are trying to organise it.
And like, they're getting places. They finally figured out some things. But then I'm like: I can see something. I have an idea. I think it'll help. But like, I am out here, I'm a nobody.
How can I help? How can I be in the picture?
Gene Kim:
Yeah. But one of my favourite descriptions of the Phoenix Project was exactly that use case – he would throw the book. And I think he said he had a special version with like serrated edges, right? That he would throw the book at people.
And I think that was definitely a metaphorical figurative…
But you get to, I think, the most important question that anyone writing a book should ask from the very beginning is, Who is the book for?
We specifically targeted the book at the boss of the technology leader. One of my areas of passion for the last 10 years has been studying, you know, what we've called the DevOps Enterprise community. And so we've had 19 conferences, you know, over 1,100 experience reports, 1,500 leaders. And the goal is to really chronicle the transformations of, you know, technology leaders in large, complex organizations.
And we recently renamed it the Enterprise Technology Leadership Summit, because a lot of people were saying, why would I wanna go to a deployment pipeline conference? Like, oh, that's not what DevOps was meant to connote, you know, back in 2009.
So the observation is, I would say, twofold.
One is that the people driving these kinds of DevOps transformations, where they're trying to break down silos to better create value and help win in the marketplace, were being promoted at a rate far higher than the rest of the population. This is both for the, you know, quote, managers and individual contributors.
But it's been my observation that even those people, their fates, were heavily reliant on who their boss was, and if that boss changed it could go from great to not so great in a heartbeat.
And it just said to me that the technology function is so often misunderstood. And so Steve has his own goals, but my goal was really: how do we show things like, you know, platform engineering, team topologies, you know, breaking down silos?
These are important for every organization, and they can be made explainable to that technology leader's boss. We can explain them in a way that will resonate with their own experiences. One of the things I'm proud of that many people will find surprising is that of the 25 case studies, less than a quarter of them are technology-related.
So I think the benefit of that is, the examples that we bring up are, you know, Steve's experience bringing his daughter into an emergency room for a broken wrist and finding that, oh, my gosh! They had to wait for hours in the waiting room. The X-ray was initially going to be taken on the wrong arm because of incorrect or unreadable documentation. The cast, instead of being fibreglass, which is very kid-friendly, was actually a plaster cast, you know, because of a breakdown within the communication supply chain.
To make the follow-up appointment actually required them to call an outside line, a number that no one actually knew. All these things were indicative of, you know, inadequate layer 3 social circuitry.
And turns out like, you know, these problems can be found in any engineering and any organisation doing significant things.
So I would say that even the choice of the foreword from Admiral John Richardson, former chief of naval operations for the entire US Navy – he's on the Board of Boeing, he has to oversee the performance of CEOs and highly consequential organisations.
The goal is to aim for the people that he interacts with, the people that this technology leader's boss has to interact with, to make it relevant to them. Not just something, you know, bespoke to technology. But no, this is something that is universal to the way organisations work.
So I hope this enables that use case to say, Hey, look, there's something really bad happening here because we're not slowifying. We're not modularising, we're not linearising, we're not amplifying.
Here's a book that explains why it's important, what it is and how it's relevant to us and provides some prescription that, you know, I think any technology leader, you know, that means leaders at all levels, internal and external, can use to help drive a business case for things that we need to do to get from here to there.
Rasmus Praestholm:
So you're telling me that I don't need to get a book with serrated edges and throw it to knock him out? Take his job and then tell people I can actually, otherwise.
Neal Riley:
I love this…
Gene, I have to tell you. You've reminded me of a story from my past, and I've never told this story as a breakdown in level 3 social circuitry. But I will start this story in a way that most people in the UK where I live now won't be able to recognise.
I was held up at gunpoint in Oakland, California. So, long story short, I will shorten this story to say that this was the year that Find My iPhone came out for the dev people. So, people had access to the Apple developer community, right? I had enabled that on an iPad, and I had just bought a little 4G device, which used to be huge, right, given my age, but I was robbed at gunpoint. They stole the iPad, the 4G device, and the whole bag.
I'm at my house with the Oakland police department, who at the time were pretty infamous for not being really great. They're much better now. But at the time, they were having some trouble. Luckily, they asked the question, 'cause it's Silicon Valley; they said, do you have anything in your bag that can be tracked? Aha! Level 2, aha! I have.
I have enabled the tool. Fine. It turns out it was on Highway 280, so they say, get in the back of the car. So I'm riding around in the back of a car following Find My iPhone. I'm hoping it works. This is my trust in Apple, that the developer community was there. Turns out it did. We get to a pizza place, it's a long story, near Fruitvale. Sure enough, we find the bag in the car.
In fact, it was the fourteenth thing that had gotten stolen that day, including the car.
Level 2 on the game. Now, here's the problem with Level 3.
I identified 2 types of police officers that day. There are the police officers who were level 3 engaged. They had their social circuitry switched on, or they were ready to accept social circuitry.
They came to me, and they asked the question, How did you do this? Because this is unique. How did this happen? And I explained Find My iPhone: if anybody in the next month says they've turned this on, this is a thing that you all can use.
This is incredible.
One of them (won't name them, don't know their name, really) said: Hang on, what do you mean you can track any iPhone?
No, no, no, no, no, no. You turn it on for your family. Blah, blah, blah!
So you're saying my spouse can track my phone? Okay? Breakdown social circuitry.
Sorry, you're not getting it. Yeah. Yeah.
And you could see the divide.
There was a gap, and it wasn't an age gap. It wasn't a gender gap. It wasn't anything like that. It was just. There are people who were looking at that level 2, the increase in the technology as a capability that they could use, and possibly you might call in some cases surveilled – in this case, I had the approval to be surveilled. Thank you very much for helping me out.
But right? It's just one of these very interesting things. You're right. These are technology problems, for sure. People who are operating in technology organisations or technology-enabled organisations, they have these problems more than most.
But this is a problem even with petty crime; this is a problem with, you know, theft on the street, one might argue.
Gene Kim:
And I'll resonate with that story. So in, you know, adoption, we're always looking for these early adopters who not only have a desire and a knack for finding promising technologies to solve the next kind of horizon of problems, but are also trusted. They're vocal; you know, people view them as, you know, the early adopters.
So you know: do you want to use this new platform? I know, I'll wait to see Neal's experiences, because we trust his judgment. So whoever these mavens are, you know, whoever he or she may be, right, they serve a very important function because they are judgmental and critical. Ideally, they have the best long-term interest of the organisation at heart. Right? They're not doing it for technology’s sake, but they're doing it because they have a sense of how people should be working, right?
What problems shouldn't they be working on like Java logging – important, but not something that is an area that should be an area of expertise for, you know, thousands of engineers.
In fact – can I tell you a brief story of, like, you know, the same patterns I love around platform adoption?
I'll tell you 3 of them because one of them. You were just reminding me of it.
There was a presentation at the DevOps Enterprise Summit 2015. It was Ralph Loura, CIO of HP. This was before the split-up. And, you know, he talked about the role of shared services.
We view them as buoys, not boundaries. So the metaphor he used was, like in a river channel, right? You stay within the buoys; you know you'll be safe; you know they're dredged. They're highly maintained.
But if you need to stray beyond the buoys, right, you can; you must make a business case for it. You have to follow the same principles, like around security, compliance, etc.
But you know, we encourage you to do that, because that innovation might be the next thing that we have to take, you know, within the buoys. I thought that was great.
And I think directionally it was just such a beautiful aha moment about what leaders needed to be doing in terms of setting norms around innovation.
The second case study was from Target in 2017. And so for those in the UK, Target is the sixth largest retailer in the US. And you know, it's one of the organisations I've studied the most over the last decade, and so they famously outsourced almost all of the technology folks, you know, that's thousands of engineers, and in 2013, they started bringing them back in.
And you know, the seminal moment was from Ross Clanton, Director of Development. Actually, I think he was Director of Architecture and Operations; he was formerly in security.
He said his aha moment was when they had to make a schema change to their data warehouse and the estimates came back at 10,000 hours and 1 million dollars – because it was spread between 3 different outsourcers.
That should be 15 minutes of work. What are we doing wrong? Anyway, to atone for their policy errors, they wanted every development team to be able to choose the technologies they used, to help atone for decades of, you know, not allowing them to do that. So in 2017 Levi Gaynor, the Director of Engineering, described this problem they had, which was that, you know, they had allowed so many technologies to come into the organization that developers could no longer switch teams easily, because they were all using different technology stacks.
And so what did they do about it? He described a quarterly meeting that they had with their CIO at the time, Mike McNamara, and they published 3 lists as their kind of new mechanism of governance, and it was just amazing.
Everything lived on a GitHub repo.
The first category was, here's a technology that we love. And here's all the groups using it.
Second was – we're not sure yet. But here's all the groups using it, you know, so if you want to know the goods and bads, you know. Just ask them.
Third is do not use. Yeah, we have active projects to deprecate this middleware database – don't make a new one which we then have to eradicate from the enterprise.
So I love that because it was just so different than, you know, what I've seen architecture review boards do, you know, for generations.
So the third one came from Comcast. They own Sky in the UK. They own Universal Pictures, and they're the largest broadband provider in the US.
About 8,000 engineers.
And they said, You know, we want every engineer, every technologist to innovate all the time – but not in certain areas. They said, especially, not in CI/CD pipelines, because we already have 14 of them in use.
We should have one, or maybe 2. And they described this process they went through, where it wasn't really a popularity contest. They wanted to choose a technology by having people rate each one from one to 5, and the goal wasn't to pick the technology with the most number ones. It was, like, who had the most answers of 3 or better, 3 meaning I could support it "if".
And what was remarkable was Jonathan Moore, one of the presenters. He was the chief software architect, and he said it was amazing that we could get to a set of technologies, even though there was no short-term gain for any of the development groups. But everyone saw that there was a long-term gain, and it was actually caused by an outage that happened because a certain team was using a CI/CD platform that they didn't understand very well or that wasn't well supported in the organisation.
And so the goal was, let's pick one or 2 that we get really, really good at. And then, you know, create a transition plan where, you know, we can move thousands of teams onto these one or 2 platforms. Anyway, so they seem mutually contradictory, yet I think all of them are made with the same intent, right? How do we enable teams to do what they do best, right, enabling them to lift the tide that lifts all boats?
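The selection rule Gene describes for Comcast, counting how many people rate a tool 3 or better rather than how many rank it first, can be sketched in a few lines of Python. This is only an illustration, not Comcast's actual process: the tool names and ballots are made up, and it assumes 1 is the most favourable rating and 5 the least, which is how the "3 or better" remark reads.

```python
from collections import Counter

# Hypothetical ballots: each engineer rates each candidate tool from 1 to 5.
# Assumption: 1 is the most favourable rating; 3 means "I could support it if...".
BALLOTS = [
    {"tool_a": 1, "tool_b": 3},
    {"tool_a": 1, "tool_b": 2},
    {"tool_a": 5, "tool_b": 3},
    {"tool_a": 4, "tool_b": 2},
]


def first_choice_counts(ballots):
    """Count how many ballots give each tool the top rating of 1."""
    counts = Counter()
    for ballot in ballots:
        for tool, rating in ballot.items():
            if rating == 1:
                counts[tool] += 1
    return counts


def broad_support_counts(ballots, threshold=3):
    """Count how many ballots rate each tool at `threshold` or better."""
    counts = Counter()
    for ballot in ballots:
        for tool, rating in ballot.items():
            if rating <= threshold:  # lower number = more favourable
                counts[tool] += 1
    return counts


popular = first_choice_counts(BALLOTS)
supported = broad_support_counts(BALLOTS)
print("most first choices:", popular.most_common(1)[0][0])
print("broadest support:", supported.most_common(1)[0][0])
```

In this made-up data the two rules disagree: one tool collects the first-choice votes but is rejected by half the group, while the other is nobody's favourite yet everyone could support it.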
Neal Riley:
Absolutely great. The rising tide lifts all boats metaphorically. It's a really interesting one. And it's interesting in that context of the buoys.
Because you're right; actually, this is our stated CTO strategy. And this is how we approach our consultancy, products, and everything else we do. But those shared services are a guide or guide post and should be the centre of gravity of where everyone probably needs to go.
However, when they are not and when they're not fit for purpose, every group that we've got has the autonomy to decide a different path, or an amended path, or recompose some of these services together to bring out something that's more fit for purpose and commit it back to the core.
Actually, we like it this way; we just swapped our identity providers. We were an Auth0 shop for most of our customer-facing things. I think I saw you giggling about this on Twitter about…
Like how? Oh, my God, what is the bill? Yeah, we had that at a much smaller level, right? So we shifted to these guys called Clerk, which are doing really great. But what came very quickly was their dev experience was so good, we took it, and we put those [inaudible] out and went everybody. The runway was much shorter than we thought to get out of that innovation. You know, the early adopter phase is at this. It is actually fit for purpose. What do we need to do to scale this? What do we need to do to enable others to get the same success as some of these other groups?
It's this shifting of the organisation across different groups, right?
Gene Kim:
And what a brilliant property of modularisation! You could switch authentication providers without impacting all of your customers, right? You can make all your changes on your side of the interface without affecting them. Absolutely marvellous. By the way, Rasmus, it just occurred to me while Neal was talking: another tool that one could use, right, to reshape, or to create signals that trigger reshaping, the layers of social circuitry is Net Promoter Score.
So, one of my favourite tools used within the technology leadership community is employing net promoter scores, because it's like everyone is competing in an internal marketplace.
You know, very few organisations these days say, you must use this, and there is no ability to choose the competition. And I think the modern reality is, especially for platforms, you are competing against other platforms. And so, how do you measure that? One way is the net promoter score, and this is a simple question: on a scale of one to 10, to what degree would you recommend this to a colleague or friend?
And you know, the secret is, I think the answer goes from negative 100 to 100. And you're only really counting the nines and tens, and great scores are like 60 and above. For certain shared services, I've learned of one group. It was for an SAP security role administrator change.
They had an SLA of being able to make changes in two weeks; they averaged about two and a half months.
They had a net promoter score of negative 87, which is actually the lowest I had ever heard of. I knew they could go negative, but I didn't know they could go to negative 100; at any rate, this amazing technology leader said, we love shared services, but not here. They actually broke up the team.
They put those SAP security experts into the business units. And, you know, now it's at, like, plus 40. Same people, same equipment; the only change is the wiring, right, which shows how the difference maker is often in layer three, the social circuitry.
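[Editor's note: for readers who haven't met the arithmetic behind the scores Gene quotes, here is a minimal sketch in Python. It assumes the common survey convention, which goes a little beyond what is said in the conversation: ratings of 9 or 10 count as promoters, 6 or below as detractors, everything in between is ignored, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample ratings are made up for illustration.]

```python
def net_promoter_score(ratings):
    """Return a score between -100 and 100 for a list of numeric ratings.

    Assumed convention (not spelled out in the episode):
    promoters are 9-10, detractors are 6 or below, 7-8 are passives.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Percentage of promoters minus percentage of detractors.
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses: 5 promoters, 2 passives, 3 detractors.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 9, 2]
print(net_promoter_score(sample))  # 20.0
```

[Because only the extremes are counted, a team where nearly everyone answers six or lower lands close to negative 100, which is how a score like negative 87 can happen.]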
Rasmus Praestholm:
If you want to throw some more concepts and things I can put in my platform, I need a new co-author for the next book. Let me know.
Gene Kim:
And, by the way, what a great example. It explains why great platforms, great SaaS products, often have a little pop-up box that says, you know, not "Can I help?" but "On a scale of one to 10, how likely would you be to recommend this service to your friend?" Right? It's actually a powerful signal of, you know, to what extent people love using your platform.
Neal Riley:
We actually combine them, so we use NPS all over the place internally. It's not used as much as it is externally, but we're one of the few groups that we know of in our space that actually do it on every consultative engagement to find out, you know, whether you liked the relationship you've had with us. And most importantly, it's not just about the people that you got, our brand, and what your expectations were, but also the mode of operation you expected.
Were you delivered what you were actually sold? Every one of those is a subtle promoter question. And they're not always using NPS; sometimes they're using a different variant. But yeah, that feedback loop is super important.
And also internally, we see this in our group; it's predicated on the idea that, you know, you're the customer, and that shared service is a vendor. And as long as you have those expectations, you put those hats on. And you have that relationship.
And you know, at the end of the day, you will provide a rotasport to this group and how they give you feedback. You're in the driver's seat – you can ask for what you want, and someone may not take your contract. We've had some situations that happen, too, right?
This is going well, and I need this mode of operation to be cool. This group can't do that. We need to find a different way, right? This isn't going to be the way that we operate. Totally fine. I don't know if you've come across Boundaryless, but guys like Simone have that podcast, which is really good.
They preach composable business. Yeah, like, this is one of the big tenets of how we operate. We love those guys, the Boundaryless kids, who are really, really fun. But yeah, this is definitely part and parcel of our DNA, how we operate.
Gene Kim:
On that. Just briefly. One of the core thesis… theses? Thesis…
Neal Riley:
Thesi?!
Gene Kim:
One of the core... one of the things that we really built the book on was, you know, that the job of leaders is to make it so that the people they support can do their work easily and well. And, you know, there's the work that I helped contribute to.
One of the things I was so proud to work on was the State of DevOps research with Dr. Nicole Forsgren and Jez Humble, you know, from 2013 to 2019. And that's when I really got introduced to the team member net promoter score: the notion that, you know, instead of asking about our products, you ask, would you recommend your team as a great place to work to your colleagues and friends? And it's such a great signal that says, you know, is it possible?
I think you can sort of start tracing the dots of, like, what makes it so that you would want to recommend your job to someone you care about, you know, to a colleague or friend? If it requires heroics even to get small things done, if everything requires tons of communication, coordination, authorisation, synchronisation and deconfliction and blah blah blah, right? I mean, that's not a fun place to work.
You can see this in developer experience, but I think it's true for anything in general. If you don't have enough time allocated for slowification, and all your work is horrendous, difficult, dangerous, scary – you know that these are not great working conditions, and leaders are responsible for creating those conditions.
And so I would encourage any leader to say, alright, maybe a very important signal that I need is not just telemetry around my product but, you know, that basic telemetry of: are people happy? Do people like being on this team? One of the things I learned came from Dr Jennifer Petoff, who was recently based in Ireland as a director of SRE at Google. I asked her how directors of SRE are measured. Right? It can't be based on the success of the products they support. Right?
And she said, one of the dominant measures is, you know, to what degree are the SREs working underneath them happy? How satisfied are they that they can do their job well? And that was just such a terrific validation of this notion, you know, of the job of leaders.
Jobin Kuruvilla:
It's funny that you mentioned that Google recently announced that DORA metrics don't necessarily cover that.
The happiness, the measure of happiness, the burnout. Because the core DORA metrics focus on time to market and things like that, they don't really cover whether Rasmus is happy or Jobin is happy.
Gene Kim:
The key finding was that the technical measures also correlate with organisational performance, which includes to what degree we are exceeding profitability, market share, and productivity goals.
And, you know, we know that it also influences employee net promoter score: those high performers are twice as likely to recommend their organisations as a great place to work.
And, by the way, just a little trivia fact: where did that team member net promoter score come from? It came from Dr. Andre Martin, who was later VP of People Dev at Google.
And he just came out with a great book called Right Fit Wrong Fit. Anyways, that actually came from a series of text messages in 2015 or 2016. It was like, "Oh, my gosh, Nicole! I have a great question." It was like, we've been searching for this culture score. Yeah, there's one called the Gallup 13, but we can't ask 13 questions. So it turns out the team member net promoter score is just an incredible sort of collapse, into that one question, of these engagement questions that are often used in HR circles.
Matt Saunders:
Do you know where I'd love to ask what the net promoter score is? It's in a company called Parts Unlimited.
I'm just wondering, as we start to wrap up: I think back to Brent in Parts Unlimited and Maxine in the Unicorn Project, very unhappy developers, whoever they are.
And now we're talking about rewiring organisations with social circuitry.
Do you think, as an industry, if you look at, you know, the period between, like, the Phoenix Project and the Unicorn Project, coming on to the latest book, are we actually getting there?
Are we speaking to the same people with the new book? Or is the NPS score just going to be terrible forever? What do you think?
Gene Kim:
Whoa! In fact, yeah, it was funny: in the Unicorn Project, they actually do a net promoter score, and within the business, with the store managers, right, it was world-class. I think it was like 50 plus.
And then they asked IT. And I think it was negative 14?! But yeah, I think the story ends where, you know, the technologists were some of the most switched-on people in the group, where they were attracting the best talent, not just from technology but, you know, also from, quote, the business.
So I think, yeah, if the question is, are we making any progress? I have to say the answer from my perspective is indisputably yes, having chronicled the technology leadership journey for the last 10 years across 1,100-plus experience reports.
There's just no doubt in my mind that you know that, increasingly, business leadership recognises that the technology organisation is a critical competency to win in the marketplace. And you know they might. Yeah, it's not a hundred per cent, but it's much larger than before.
And so I credit, you know, those incredible technology leaders who helped pioneer, you know, these practices that were typically only associated with the tech giants (Facebook, Amazon, Netflix, Google, Microsoft) and showed that, hey, it's relevant for everybody. I'm just so grateful for, you know, the ability to help chronicle those journeys because I think it affects all of us.
Neal Riley:
You're exactly right. Gartner refers to those giants you're referring to as the digital dragons, and I love the sort of ostentatiousness of that idea. But I must say, to sort of ground out and drop your last questions, it's been such a breath of fresh air. Having an entire podcast where we didn't say ChatGPT once feels so good.
But I broke the seal. So all of this is great, and as you were writing your book, I'm sure none of this had come out yet. Where are you at with this? How are you looking at this new technology? Because no one yet is saying, "Oh, this is killing DevOps." They're talking about industries that are changing drastically with these new generative capabilities.
Have you been playing with this recently? Like, what's been the plan?
Gene Kim:
I gotta tell you, I haven't had this much fun in technology in 20-plus years. I think it's so radically different. It's not deterministic. It's amazing. I'll tell you one thing I've been having a ton of fun doing. That's been a huge eye-opener for me as I publish this new video podcast, Thinking Out Loud. So it's a little extension of the Idealcast podcast I've done for several years.
And so it's just a little easier lift than the Idealcast podcast episodes. So, one of the things I did was use ChatGPT to confirm some things I wanted to do during the writing. But I only have had a chance to do it now that the book's out.
One of them was, you know, I gave ChatGPT, GPT-4, the couch analogy and then asked it, you know, can you validate and explain to me how the couch metaphor can explain the Amazon re-architecture in the early 2000s? And it said, yes, absolutely.
You know, Amazon in 2001 had 3,500-ish engineers, all coupled to one couch. So, for any of them to get anything done, they had to coordinate with 3,500 other engineers, so they split it up into microservices, right, into smaller couches, to regain independence of action.
I'm like, awesome. Alright. Now, here's the Amazon Prime Video example. Please explain to me, using the couch as a metaphor, what that is. And it's like, oh, exactly: it was like they had split the couches into too many tiny fragments, so, you know, too much of the effort was in coordination and transport, copying these video files in and out of storage buckets.
I'm like, yes, exactly. So the countermeasure was to glue the couches back together into one monolith so that, you know, they could recreate coherence. I love that because I think the mark of a good model is that, you know, it can explain the most with the least, and I'm just amazed at how good ChatGPT is, not just at summarisation but, you know, at making these kinds of inferences and taking metaphors, and for me, just using it to help concretise and validate my own thinking.
Man, it's just so fun. For the technology leadership community, we spent the first part of this interview talking about how much harder life is when there are more functional specialities. Well, generative AI, man, adds one more functional speciality to the left, right?
You know, you've got all the data and AI engineers on the left, and Dev has to meet them there, and we also have to push them further to the right into production.
I mean, it's as if our job weren't difficult enough. We've just added one more thing that the layer three social circuitry has to do, which is integrate this whole bunch of people who don't look like your typical engineer into the value stream, and that introduces a whole bunch of risks that we've never had to deal with in production. Right? I think it's a phenomenal frontier.
And it just shows that what worked for us one year may not be enough to work the next year…
Neal Riley:
Adaptation is key, isn't it? Right? Because building content last year is not how you need to protect yourself from content created this year. You know, I agree with you. The landscape is vast, and the cliffs are buried. You don't know where those black holes live, right? It's fascinating to see the mathematics behind them.
You're right about ChatGPT; ChatGPT as an interface is one thing; GPT-4 as the underlying technology is another.
We're seeing organisations looking at Mistral and even Facebook's LLaMA, right? How these things compare against each other depends on what task you're doing and what guidance you're giving it. And things like what supporting material makes its way in, and how you cut your sentences apart. It's just a whole new world that, frankly, we're looking at exactly the same way. This is really exciting. It's a really exciting time to be in tech.
Rasmus Praestholm:
This is probably a whole other podcast.
Neal Riley:
You could talk all day on this couldn’t you?
Gene Kim:
And I have to say, at the Enterprise Technology Leadership Summit last year, we had about 20 to 30% of talks on generative AI. And it's just so exciting to see how much energy it injects into the conference because it dazzles us. It tells us what's possible, it excites us in terms of, like, what we can do for our organisation.
And it's also, you know, the next frontier that technology leadership invariably has to tackle.
Laura Larramore:
But as we tackle that, people will always be there. And so that's what I appreciate about your book: it wasn't just platitudes on leadership, like, here are some nice things to say about leadership and how to live. It's practical.
Here's how to help people work better. And I think it can lead to a conversation about, with this AI frontier, how we can integrate people and help people work more efficiently with it.
Gene Kim:
Laura. I have to say. You've actually mentioned one of my favourite words – platitudes! It's actually something I despise to my core! My worst fear would be writing a book of platitudes, and my favourite one is buy low, sell high – technically correct, but not really useful.
Right? It's all about the how. So I love that you brought that up like… One should be suspicious of platitudes.
Laura Larramore:
Well, we appreciate your time here today, thanks for coming on the podcast – it was a very enlightening conversation, and I would love to continue it more. So maybe we can do that.
But thanks for joining us to discuss DevOps Decrypted on this podcast. We hope you're enjoying the show. Let us know what you think on social media, @Adaptavist. We look forward to continuing this conversation there.
So for Gene Kim and our Adaptavist panel, which includes Jobin, Rasmus, Matt and Neal, I'm Laura Larramore, and DevOps Decrypted is part of the Adaptavist Live podcast network.
Why not leave us a review on your podcast platform of choice? Let us know how we're doing or highlight topics you would like us to discuss in our upcoming episodes.
We truly love to hear your feedback, and we will be giving out some free Adaptavist swag bags to say thank you for your ongoing support!