When we peel back the layers of the stack, there’s one human characteristic we’re sure to find: errors. Mistakes, mishaps, and miscalculations are fundamental to being human, and as such, error is built into every piece of infrastructure and code we create. Of course, learning from our errors is critical in our effort to create functional, reliable tech. But could our mistakes be as important to technological development as our ideas? And what happens when we try to change our attitude towards errors…or remove them entirely?
In this fascinating episode of Traceroute, we start back in 1968, when “The Mother of All Demos” was supposed to change the face of personal computing…before the errors started. We’re then joined by Andrew Clay Shafer, a DevOps pioneer who has seen the evolution from “errors” to “incidents” through practices like Scrum, Agile, and Chaos Engineering. We also speak with Courtney Nash, a cognitive neuroscientist and researcher whose Verica Open Incident Database (VOID) has changed the way we look at incident reporting.
Amy Tobey:
You're listening to Traceroute, a podcast about the inner workings of our digital world. I'm Amy Tobey.
Fen Aldrich:
I'm Fen Aldrich.
John Taylor:
And I'm John Taylor. And I thought we'd kick things off here looking at a website together, since we're all just kind of staring into a computer screen anyway. I think you've heard of this website before. It's called the Verica Open Incident Database.
Fen Aldrich:
Oh yeah, the VOID.
Amy Tobey:
Oh yeah, we know those people.
Fen Aldrich:
Yeah, we can all stare into the VOID together.
John Taylor:
Absolutely. Yeah, and we're going to actually talk with the people who created this database. But before we do, I thought we should jump in and take a look at some of the incidents listed here. It's at thevoid.community. And I've got this one from February 15th: "Well, I just accidentally took down Twitter." Well, so... Yeah, here's another one: "You Broke Reddit: The Pi-Day Outage."
Fen Aldrich:
Down for 314 minutes, no less.
John Taylor:
Wow. Wow.
Fen Aldrich:
Nailed it. Nailed Pi Day.
John Taylor:
Oh, that's great. Yeah, I mean, and there are incidents of all different kinds. There's even FAA ground stoppage here. Yeah.
Amy Tobey:
There was a really good one recently from the FAA about a near miss.
John Taylor:
Oh, that was [inaudible 00:01:26].
Amy Tobey:
Where I think a plane had a tail hit.
Fen Aldrich:
It's actually the one that's listed here, January 26th, Alaska Airlines.
Amy Tobey:
Oh, good, good. That's a wonderful report, because it's a near miss, and these are super-duper interesting because usually we talk about incidents where a thing has happened and impacted customers or impacted the audience. A near miss is something like a plane has a tail strike as it's taking off. It doesn't actually harm anyone, the passengers are okay, but you still go do this deep investigation and when you do, you discover issues that prevent a more serious disaster from happening.
John Taylor:
Yeah. So that's the part that I find really fascinating. So, like this incident up above, a woman named Leah Culver, who was a Twitter employee, turned on a Spaces feature that didn't perform well at scale, and then Twitter just went down. Not long after that, the site's stability team managed to roll it back and Twitter was up and running again. And though Leah was later laid off from Twitter, she was not fired as a result of this error. Instead, it was reported as an incident and posted online for anyone and everyone to see. How did this happen? How did errors evolve into incidents, and why does this seem to be a uniquely tech industry phenomenon? Do either of you have a personal experience in a former job or something, where it wasn't that way? Where the culture was more of, how do I cover myself for this error, than how do I report and/or learn from this error?
Amy Tobey:
Lots of experiences with that. In fact, that's what I've done for a job for a long time, is reversing that behavior, because that's the natural tendency of most folks, because most folks, even in tech, are working a job because they got to eat. And it creates a situation where there's pressure on folks to not admit when they messed up or when a mistake has happened that they feel responsible for, even if they're not really responsible for it.
John Taylor:
How about you, Fen, anything that comes to mind?
Fen Aldrich:
Yeah, I think there are probably several examples throughout my career. Because of that tendency, most organizations want to place the blame somewhere, right? And having a person to point to who made the mistake and being able to say, "We got rid of that person. Problem fixed," makes people feel better.
John Taylor:
Tech didn't always get a pass from the consequences of errors, and certainly, tech never got an exemption from the fear of what consequences might occur if mistakes were made. And perhaps the greatest historical example of this was a demonstration given at the Fall Joint Computer Conference in San Francisco in December of 1968. The demonstration had the inauspicious title of A Research Center for Augmenting Human Intellect. But for Douglas Engelbart, the head of the Augmentation Research Center at the Stanford Research Institute, this presentation was the culmination of his life's work. The live demonstration Engelbart held that day featured the introduction of a complete computer hardware and software package called the oN-Line System, or NLS. The idea behind the NLS was to free computing from merely being about number crunching, and for it to become a tool for communication and information retrieval, accessible to everyone. Engelbart needed this demonstration to go off without a hitch. But it didn't. At one point in the demonstration, Engelbart wanted to show that if you accidentally delete a file, the machine can reload it if you saved it. But as he tells the audience...
Douglas Engelbart:
I'll load that file, and it'll come back in as it was when I last saved it, telling me the date that I wrote it, and unfortunately, I didn't save enough. Oops.
John Taylor:
And he can't bring it back. Two minutes later, the machine is supposed to organize items on a shopping list, but...
Douglas Engelbart:
Okay, so let me organize it by saying, just generally produce. What the hell happened? All right, I'm going to try to... I entered a statement. It says, "Hey, that's fine."
Amy Tobey:
The screen goes black for eight seconds, followed by an audio cutoff for another 19 seconds. The error code illegal entity flashes on the screen twice, and later, the entire screen freezes. The screen then goes black for another five seconds. There are at least 30 separate occasions where the presentation does not go according to plan. In fact, because of the errors and the delay they caused, Engelbart has to skip the entire last 10 minutes of the demonstration.
Douglas Engelbart:
I'll see what I can skip so we can get through in the time we were supposed to like 10 minutes ago. Sorry about that.
Amy Tobey:
Well, I got to ask you something.
John Taylor:
Sure.
Amy Tobey:
Have you heard of the Demo Gods?
John Taylor:
I have not.
Amy Tobey:
So there's a concept that everyone who does a live demo basically always runs into. It's kind of a common trope in live demos. I think it predates computers even, right? And so, what I'm saying is, there is some precedent for having a successful demo that has errors in it. People do it all the time. They do a live demo on stage at a conference, and it worked the hundred times they tried it before the conference, and then on stage, some small thing went wrong. And they go, "Oh crap, something went wrong." And then the whole audience goes, "Oh, it must be the Demo Gods. They didn't like you today."
John Taylor:
All right, so as much as I'd love to think that I wouldn't get fired for a bad demo because my boss understood that a tech-oriented deity wasn't on my side that day, I don't think that's why I wouldn't be filing for unemployment. No. For some reason, tech looks at and reacts to errors in ways that other industries don't. For example, in 2012, heads rolled at JP Morgan when a cut and paste error in an Excel spreadsheet cost the company $3.1 billion. In 2005, a single typing error from one employee caused a company to sell 41 times the number of shares it meant to sell on the Tokyo Stock Exchange. The president of the stock exchange himself resigned over the incident. Now, you may argue that, of course employees were fired, because these are really high dollar mistakes. Well, in 2021, the Big 5 tech giants generated a combined $1.4 trillion in revenue. So, I don't think that's it. No. A paradigm shift occurred in tech that didn't happen in other industries. Something that changed how technologists in particular look at errors. Somebody gave the tech industry permission to look at mistakes in an entirely different way. Somebody with a lot of influence, and with their fingers on a lot of purse strings.
John F. Kennedy:
We choose to go to the moon in this decade and do the other things, not because they are easy, but because they're hard.
John Taylor:
On September 12th, 1962, President John F. Kennedy told the science community in America to do the impossible. He decided we were going to put a man on the moon within seven years, and this was only 35 years after Lindbergh made the first solo transatlantic flight. And JFK's reason we were going to put a man on the moon, Cold War politics aside, was because it was hard to do. That mandate, that reasoning, implicitly gave engineers and computer scientists a green light to look at errors in a fundamentally different way. But this paradigm shift would also require massive cultural changes, changes in the way we talk about and react to errors and mistakes. And even in the tech space, those changes were going to be, as JFK put it, hard.
Amy Tobey:
Even if you work at a psychologically safe organization where, in theory, you should be able to be vulnerable about any kind of mistake or error or thing that went wrong, people still have this kind of fundamental fear, because this job is how I feed my family.
Fen Aldrich:
If you can remove that pressure of, "If I admit a mistake, I am going to potentially not be able to survive anymore," that is strong pressure to not admit that you are doing things and not actually learn from incidents, because you have to create this reality in which you are the perfect worker in order to maintain that. And so, these movements always push for more power for workers because they have to. You have to be able to drive to a point where someone can say, "Hey, I did this and here's what went wrong." And not worry about them immediately losing their job or losing their livelihood or not being able to provide for their family, which is a much stronger pressure than-
John Taylor:
Yes.
Fen Aldrich:
... did product come out okay?
John Taylor:
Yeah. And that's interesting because you go from this, there's this paradigm shift from, "Oh, that wasn't supposed to happen." To, "We know these things happen."
Fen Aldrich:
Right.
John Taylor:
And so, I guess that that's a question that I have, is where did this sort of turnaround, where and when did we go from, "Oh my God, that's an error and I feel ashamed and I'm going to lose my job." To, "We need to find all the incidents that we can." Even changing from error to incident, where and when did that really start to happen?
Amy Tobey:
Challenger. Started with Challenger.
Speaker 6:
Looks like a couple of the solid rocket boosters blew away from the side of the shuttle in an explosion.
John Taylor:
In January of '86, I was living in this crappy old house just off campus with about five other people. And though I didn't have school that day, I set my alarm early enough so that I could wake up and watch the launch of the Space Shuttle Challenger, because I hadn't missed a single launch since the inaugural flight of Columbia back in 1981. The year before, my friends and I had even journeyed out to the high desert to watch the space shuttle land at Edwards Air Force Base. So I stood there in front of my ancient TV with the little rabbit ears on top, sipping my coffee, just absolutely geeking out as this marvel of engineering roared off the launch pad and into the clear, blue Florida sky. Sorry. I had this friend... I had this friend who worked for NASA, and she was really good friends with the pilot of the Challenger. And she called me, and I had never heard her cry before. And then, one minute and 13 seconds into this picture-perfect, by-the-numbers launch, the unthinkable happened.
Speaker 7:
Flight controllers here looking very carefully at the situation. Obviously a major malfunction.
John Taylor:
The Challenger broke apart.
Speaker 7:
We have no downlink.
John Taylor:
46,000 feet above the Atlantic Ocean, killing all seven crew members on board. The cause of the disaster was the failure of the primary and secondary redundant O-ring seals in a joint in the shuttle's right solid rocket booster. The rubber O-rings froze up in the low morning temperatures, reducing their ability to seal the joints. The seals were breached after liftoff, and hot pressurized gas from within the SRB leaked and eventually burned all the way through to the main propellant tank. But following an investigation that occurred in tandem with a 32-month hiatus of the space shuttle program, it was discovered that there was more going on than just engineering errors.
Amy Tobey:
Most resilience folks think it started with Challenger, where they started to think differently about safety at large scale. And so, that's where the research started around resilience engineering, where we talk a lot about how incidents work and how to learn from them. And why it's so important is because that's where they started to discover things like normalization of deviance, which wasn't really a concept that was used before that. What they would do is go and be like, "Which engineer put that O-ring in the shuttle?" And go fire them. Right? And then when Challenger happened, it involved everybody. There wasn't a smoking gun, there wasn't a single error, there was just this huge chain of causality. And so they had no choice but to face this reality that errors are more complex than this idea of a root cause.
John Taylor:
Tell me a little bit more about, what did you call it? The normalization of deviancy? Aside from being my typical Saturday night, what does that mean?
Amy Tobey:
I like yours better. It means what you sort of alluded to in the lead up, which was something happens and somebody notices it. So an O-ring fails, in the case of the Challenger. When the shuttle comes back and they inspect it, they notice, "Oh, that O-ring failed, but the secondary O-ring succeeded." So no big deal, we replace the O-ring and we move on. And this happens again in the next launch, and it happens again in the next launch, and then everybody assumes that one of the O-rings failing is no big deal.
John Taylor:
Yeah.
Amy Tobey:
So they don't actually address the issue of a failing O-ring. They just go, "Well, that just always happens. Just kick the machine and move on."
John Taylor:
Yeah.
Amy Tobey:
And so, we see this all over in life, right?
Fen Aldrich:
Oh, it just does that sometimes.
Amy Tobey:
Yeah, it just does that and you bang it [inaudible 00:16:42] percussive maintenance, and then it comes back and it's cool. When really, what you need to do is take it apart and clean some things and put it back together the right way. And so that's normalization of deviance, is when we just kind of, something that isn't right, we get used to it, it becomes normal, and so we lose sensitivity to it, and then it becomes one of the factors. And then another factor goes wrong, and obviously now we actually have an accident.
John Taylor:
So the normalization of deviance is something that occurs progressively, right, over time. So if that's the case, you can make an argument that the normalization of deviance that created the Challenger disaster finds its roots in another incident that took place almost 16 years earlier, Apollo 13.
Tony:
Here's Tony, you have Apollo.
John Taylor:
Experts note that one of the things that helped prevent the Apollo 13 mission from becoming a fatal catastrophe, aside from the incredible ingenuity involved, was the number of redundant systems onboard the spacecraft. Scientists and engineers were able to repurpose entire systems on the fly and bring the three astronauts safely back to Earth. Though there were definitely reports and inquiries and program delays, the genius of redundant systems was ultimately reinforced, which resulted in a normalization of deviance. When Challenger was lost with seven crew members on board, that catastrophe brought to light the need to change the culture entirely. Interestingly, the same year we lost Challenger, two Japanese researchers introduced the term Scrum in the context of product development in their 1986 Harvard Business Review article, The New New Product Development Game. The authors described a new approach to commercial product development that would increase speed and flexibility, based on case studies of manufacturing firms in the automotive, photocopier, and printer industries. But the idea of Scrum didn't really take off until another newfangled technology began to emerge: the internet.
Speaker 9:
Alison, can you explain what internet is?
John Taylor:
So the basic idea of the Scrum framework was to allow for continuous feedback and flexibility. It required teams to self-organize by encouraging physical co-location or close online collaboration. In other words, the closer you work together, the less likely the chance of miscommunication and other errors. Scrum eventually saw widespread adoption, especially within the software industry. Then, in 2001, several technologists published the Manifesto for Agile Software Development. The Agile Manifesto took the ideas of Scrum to the next level, and for evangelists who espoused these new sets of tools and practices, evangelists like Andrew Clay Shafer, Principal at Ergonautic, Agile was a game changer. Sort of.
Andrew Clay Shafer:
When I first got into the startups, I literally thought Agile was the dumbest thing ever, right? So it was super dumb, and it was just because it was this vanilla, watered down scrum thing where just a lot of ritual and no real understanding. To me, it's just basically baby waterfall. And everyone sort of believes that if you stop writing documentation and start having standups, that you're somehow magically going to get software, which I never thought worked or was a good idea.
John Taylor:
But it wasn't the quantity of human interaction that Andrew objected to. It was the quality.
Andrew Clay Shafer:
But I think it's a false dichotomy to say you have to sacrifice culture to have good technical practices. In fact, I would say that in order to have those high-performing team dynamics, you have to have those also, right? You're faster, you're safer, and everyone's happier.
John Taylor:
For Andrew, this new way of looking at the interactions and communications between teams needed to be dependent on the team's mutual goals.
Andrew Clay Shafer:
So the type of process that you have is going to be different depending on the scale and the criticality of the work you're going to do, right? So then, if you are building something like Flickr, which is literally putting cat pictures on the internet, then the types of process you can get away with at the scale and the complexity and the criticality is different than if you're trying to solve problems where there's financial transactions involved, and then different, still, from the problems where there's life and death involved.
John Taylor:
More and more tech companies that adopted Agile practices began looking at errors in a different light. If errors are not only inevitable but acceptable as part of the process, then maybe we should look at how we can learn and grow from errors. In fact, maybe we shouldn't even look at them as errors at all. Maybe they're just incidents.
Courtney Nash:
My name is Courtney Nash. I am, as far as we know, the only Internet Incident Librarian in the world. My real title is Research Analyst. It's boring. So I like Internet Incident Librarian a lot better.
John Taylor:
Courtney's background is in cognitive neuroscience, but as she puts it, she ran off to join the internet and ended up working with Microsoft, Amazon, and Fastly. However, it was during her stint with O'Reilly Media that she got involved with some of the early coverage of DevOps, eventually chairing the Velocity Conference with John Allspaw. And this is when she started marrying her old interests in cognitive science with complex systems and software. When she was laid off as a result of the pandemic, she received a call from Verica, a company that was using a new technique called Chaos Engineering to make systems more secure and less vulnerable to costly incidents. Verica asked if she'd like to come and do research for them, and Courtney jumped at the chance.
Courtney Nash:
I'd started looking at failure outage reports. Dan Luu had this Kubernetes.af repo of collections of incident reports, and I sort of fell down that rabbit hole, and at some point, I had more than we had for those products. I had thousands of these. People were sending them to me, people were giving me their archives. So it built up, and the next thing I knew, I turned around and had this sort of burgeoning database of incident reports, software incident reports. And that turned into what's now the VOID, the Verica Open Incident Database. And it's taken on a life of its own. And so, the joke is that I have the Dewey Decimal System of internet incidents. But it kind of has reached that point where if you say, "Oh, give me an example of..." I can be like, "Oh yeah, well, there was that Atlassian outage two years ago. Blah, blah, blah." So I've spent a lot of time reading these things, but I've also built up a whole... I think we have over 10,000 incidents in the VOID now. So I have a lot of data.
John Taylor:
Okay. So this is exactly what we were talking about when we were staring into the VOID earlier. So when you put your incident reports out there, just out in the open for everyone to see, is the goal to de-stigmatize errors that just normally occur as part of the development process?
Fen Aldrich:
That is exactly the stigma that I think we're trying to combat, right, is that everybody does it. Right? It's a similar push towards mental health. No, people struggle with this stuff, and if we can talk about it openly and with each other, we can help learn from each other. We don't have to do all the work ourselves.
John Taylor:
Yeah.
Fen Aldrich:
And it's about starting to rely on each other. So this is like, if we talk about all of our incidents publicly, other people can learn stuff from what we did. And you browse through it when something happens, and you're like, "Oh, this sounds familiar. Oh, right. That was this other company that had a challenge with this, and here's what they did and it worked. Oh, look, it worked for us too."
John Taylor:
Exactly.
Fen Aldrich:
Or it didn't, and we learned yet another new thing.
Amy Tobey:
And sometimes, or very often, organizations have internal policy and sometimes customer contracts that require them to write these. And so the ones that get written internally are usually very different from what would be available in the VOID report, or a publicly released retrospective.
John Taylor:
In essence, the VOID database helps to destigmatize incidents, which could make this cultural shift happen exponentially faster. Just as important, the reports allow companies to focus on the incidents themselves as opposed to the people involved in the incidents. So remember that $3.1 billion cut and paste error I mentioned earlier? Incident reporting helps a company to say, "Perhaps we should implement a new system that doesn't use cut and paste." Rather than, "Perhaps we should just fire the person who did the cutting and pasting." Courtney has analyzed a lot of data from the VOID, and she finds that tech companies are seeing other advantages to incident reporting as well.
Courtney Nash:
I think the companies that do this are learning a couple of things. One, it builds trust from their customers, right? It's much worse to say nothing than to say something. That's PR 101, right? But, the folks that are coming out and being incredibly transparent, they're winning over sort of hearts and minds, if you will, of the engineering community at the very least. So it also helps from a hiring standpoint. People want to go, engineers want to go work at companies that believe that this is important and valuable and that invest in learning from their incidents and their outages, versus sort of blaming engineers and building a culture that's very toxic in that regard. And then the third piece of that for me is, from the industry as a whole, if we don't do this, right, if we don't take this on and do a good job of it, someone's going to force us to do it in a way that we don't like, which is the looming specter sort of essentially of regulation, right? And that's happening already in the security industry, in various ways. And so the question is now, would you be forced to report on every availability incident you have?
John Taylor:
All right. Let's take a look at our checklist here, shall we? Let's see. We've got incident reporting? Check. A need for competitive advantage? Check. A blameless culture? Check. Sociotechnical systems? Check. Enhanced communication? Check. I see. We're talking about DevOps here. So, around 2007, people in the software development and IT communities raised concerns that when one team writes and creates software, while another separate team deploys and supports it, sometimes a fatal dysfunction can occur. DevOps was created as a set of practices and tools to integrate and automate the work of software development and IT operations. And for people like Courtney, incident reporting was a natural fit for DevOps.
Courtney Nash:
The thing that was so compelling to me as a cognitive scientist about DevOps was, I was like, "Oh, wait, y'all figured out it's actually people?" Yes! Welcome to the party. And so that's why DevOps didn't feel revolutionary to me. It felt right. And so, the people who practice Chaos Engineering types of approaches, who have learning-from-incidents groups and blameless culture, just culture, all of that, are really tapping into that notion that what we're building are socio-technical systems, right? How our people work together with the machines and the systems and the automation and all of this stuff that we're building. So I feel like it's just advanced-level DevOps. It's not just, you own it, you run it, you have the pager. It's the way humans work in those systems. It's the way we create success. It's the way that the things we do that create success the majority of the time can also contribute to failure. And so, it's this much more holistic, systemic view. So I think, for folks who've been doing DevOps for a long time, you end up here. This is the next stop on that road.
Amy Tobey:
DevOps was invented and became a concept and took off in the industry, and it was about 10 years later before folks like John Allspaw went off and started bringing resilience engineering in as a way to explain why DevOps actually works, why it's important, right? So we went and did this and it became about continuous integration and continuous deployment, and we didn't really focus on the human issue as much, as an industry, right? Some of us who are early DevOps folks have been angry about this the whole time. But the movement ended up being about the tools, and they forgot about the "people and process before tools" part of it.
Fen Aldrich:
So this has been a thing we've been talking about as humans for a very long time, talking about surprises, incidents, errors, whatever we're calling them, because words have meaning but are squishy. They're more complex than pure cause and effect. We're recognizing that the things that we're building and what we're dealing with have so many moving parts and so many people involved, and so many people have so many different understandings of the thing, that we can't just assume that somebody did something bad and should be punished. And this is true of lots of movements that focus on people and not an economic benefit or a thing that can be sold to you. Like Austin Parker talked about very well at SREcon this year, the commodification of all these different movements, right? Because before DevOps, we had Agile that was trying to do this with the software development world and saying, "Hey, we need to focus on people over process. We need to actually care about the results of what we're doing. Is this what we're after? What are the actual goals we're trying to accomplish? And make sure that continues to be true throughout our development life cycle." And then it was like, "Hey, this applies to more than just writing code and delivering a product. Also, maybe we should think about Ops, and is Ops actually delivering what we're asking it to?" And this was DevOps, and it cared about people, and the socio end of the socio-technical systems, but you can make a lot of money selling the technical end, and it's a lot harder to sell the socio end.
John Taylor:
Yeah.
Fen Aldrich:
Because it requires people to do hard work.
John Taylor:
It's funny because there's this saying in DevOps, "You can't buy DevOps, but I'm willing to sell it." In fact, I think it was you, Fen, that said that when we were preparing for this episode. And as ironic and probably cynical as it is to think that we've come full circle to this point where people are trying to commoditize error, well, Courtney has a different perspective.
Courtney Nash:
It's admitting that we are humans in systems made by humans, and we don't always make those systems work well for us. None of us go to work to take the internet down, right? That's not what we do. And so we make the best decisions we can at the time with the information and the context that we have. That's what we're doing, day in and day out. And 90 whatever percent of the time, it works.
John Taylor:
And perhaps 90 whatever percent is an excellent goal. But what about 100%? Is the bottom line here that we should be looking at removing the human element entirely, in order to create systems that are free of error?
Fen Aldrich:
There are some people that get mad when you talk about root cause, and these same people get real mad when you talk about human error, and I'm one of those people. So the concept of it, I think, is actually a misdirection. What's happened a lot with DevOps and with software development is thinking like, "Oh, we want to automate this thing to get the human error out of it." It's never like, "Oh, someone keeps typing B instead of C. How do we stop that? Oh, by putting a computer there." Well, maybe, but why were they typing B instead of C? Right? There's something more interesting there. Do they think B is correct? Did they just keep typing the wrong letter, in which case maybe we should have a check here to double-check? There are all different ways we can do that. But just removing people from it also removes the thing that makes you able to pivot and adapt and change when something surprising happens in the system.
John Taylor:
Because surprises do happen in the system. Remember Douglas Engelbart's demonstration from 1968? Remember how it was so riddled with error that he couldn't even finish what he had planned to demonstrate? Well, when the demo finally came to an end, Douglas Engelbart sat alone on a stage in front of 1,000 of his peers, and the crowd went absolutely nuts. Engelbart and his team received a standing ovation that rattled the walls of the conference room. The presentation would go on to become known in the history books as the Mother of All Demos. This 90-minute banger of a demonstration basically set the stage for modern personal computing: windows, hypertext, graphics, efficient navigation and command input, video conferencing, the computer mouse, word processing, dynamic file linking, revision control, and a collaborative real-time editor. The demo was riddled with errors, but because they were learning to look beyond these mistakes, legendary technologists in the audience like Alan Kay, Charles Irby, Andy van Dam, and Bob Sproull were able to see the genius in this tech and bring it to life. If we vilify the human element in the hardware and software we create, then we're no longer creating systems for humans. Perhaps the biggest mistake we can make is punishing mistakes. Perhaps the mother of all errors is calling it an error in the first place.
Fen Aldrich:
I think that's part of the value of something like the Mother of All Demos: it shows, "Hey, not only is this possible, but we've kind of got a working prototype, or at least we kind of know what it should look and feel like, even if it's error-prone when it goes off." Now people have the inspiration: "Oh, that's neat. What else could we do? What other cutting-edge things could we actually do if we tried?"
John Taylor:
Yeah. When Douglas Engelbart took the stage on that momentous day in 1968, did he know his demonstration would be riddled with errors? Probably not. But what he might have known is that any errors that did occur wouldn't diminish his ideas, and that made all the difference. When we peel back the layers of the stack, we find our mistakes side by side with our genius. When we look at the amazing things that we've accomplished, we see a trail that weaves through a forest of errors. What tech in particular is trying to do is look at errors as another facet of just being human. And like every aspect of our humanity, we have a choice. Judge it or embrace it.
Fen Aldrich:
Traceroute is a podcast from Equinix and Stories Bureau. This episode was produced by John Taylor with help from Tim Balint and Cat Bagsic. It was edited by Joshua Ramsey and mixed by Jeremy Tuttle, with additional editing and sound design by Matthar DeLeon. Our theme song was composed by Ty Gibbons. You can check us out on Twitter at origins_dev, that's D-E-V. And type origins.dev into your browser for even more stories about the human layer of the stack. We'll leave these links and more including an episode transcript down in the show notes. If you enjoyed this show, please share it wherever you hang out online and consider leaving a five star rating on Apple and Spotify because it really helps other people find the show. Traceroute will be back in two weeks. Until then, I'm Fen Aldrich. Thanks for listening.