Computing is fragile

I loaded and reloaded, and yet the message stayed the same. 404, no access, can’t compute. Why wasn’t Trello working? I needed Trello. I use a program called Pomello to help track my productivity, and it won’t work without Trello. Like Pavlov’s dog, when I hear the clock winding noise as Pomello starts I get into work mode for 25 minutes. When I hear the “ding” from Pomello after that time, I know I can take a five minute break. And then it continues.

But no clock winding, no ding, and I felt a little lost.

As it turns out, a major AWS outage today caused the problem. It wasn’t just me; it was millions of users across thousands of heavily frequented sites. For some, like me, it was annoying. For business owners who had sites and apps running in AWS, it was expensive and embarrassing. It can be hard to explain to a customer why they can’t access your site when it feels a little outside your control.

Problems in computing are not limited to cloud providers like AWS, Azure, and others, all of which have had outages of some kind or another. Recently it was revealed that CloudFlare had a security flaw that affected only “one in every 3,300,000 HTTP requests” from February 13th to February 18th. Not a big deal, right? Yet CloudFlare operates in front of 5.5 million sites. Not too long ago, botnets attacked DNS servers on the East Coast and prevented users from accessing major websites. Problems are not limited to the internet, either. Hard drive failures, dropping your computer, trying to pair to a Bluetooth device that isn’t showing up in your list: these are all things we can relate to. Technology can be fragile.

The fragility of technology leaks into the physical world as well. Auto flush toilets have never been a problem for me, but I recently found out that they are terrorizing kids. Maybe you are a non-parent like me and think it’s a little funny, but apparently children are scared to go to the restroom when the thing vortexes randomly, and parents now have to take special precautions to keep their kids’ therapy bills a bit lower later in life.

This is technology being fragile from a user experience standpoint. Remember toilet handles in public restrooms? A bit icky of an image, maybe, but you pushed it down and it flushed. That’s it. No problems. With more and more things being connected to the Internet, computing is going to continue becoming ingrained into our everyday experiences. Setting aside the classic problems in product design, with the additional complexity of microcontrollers, internet connections, WiFi, LTE, automation, and sensors, we’re going to see a host of new design challenges to overcome (thankfully I have a different blog post, The 5 Principles of IoT UX Design, that can help you get started).

So do we become luddites? I’ll start working on the “Let Us Flush Our Own Toilets” and “Forget The Cloud And Start Using Paper Again” signs for our protest against technology.

No, of course the answer in our increasingly complex world isn’t to ditch technology altogether. It’s to continue to improve and mitigate the risks. In fact, these fragility problems are exactly the type of things DevOps and Agile methodologies set out to solve. The question isn’t “how do we avoid change and stick with what works,” it’s “how do we embrace change and plan for it.” These processes can be applied to nearly any problem: user experience, security, and uptime are just a few.

There has been and continues to be an industry-wide shift to these ideas that embrace iterative improvement, and for good reason. The image that I conjure up to remind myself of the differences between the old way of trying to discourage change to mitigate risk and the modern method of embracing change to encourage robustness is that of a Windows XP machine sitting under a dusty desk. “Don’t touch that, Frank. Bad things happen when we try to update that machine.” Actually, I don’t have to stretch my imagination all that far.

Recently at a local theme park, I dropped off our personal belongings into some lockers. It was one of those lockers where you scan your fingerprint and come back later to put in the locker number, scan your finger again, and then retrieve your belongings. If you don’t come back soon enough, they charge you for the time. All is fair in love and theme park locker rentals, I suppose. So we had some fun screaming our heads off on a roller coaster and came back to get our belongings. We were met with an error message on the screen on what was very clearly a Windows XP machine. In the year 2017. And I had just put my fingerprints into this thing.

A harried woman haughtily shuffled over to fix it. I’m certain she had done this thousands of times before, ever since this system had been created. Meanwhile, a line formed. Once she was done rebooting, a clearly time-honored tradition, our time had expired and it was asking us for a credit card. With all of our cash and cards still inside the locker.

It was eventually figured out, but this alternative to our modern, complex, and fast-paced development world is clearly not the way to go. So, we must embrace change, but in a manageable way. Namely, be iterative, get feedback, and make changes. The smaller that loop is and the more information you can gain within it, the better. Fragility will be conquered by mastering complexity. Although what we have right now is not perfect, we are continuously moving toward improvement. It may be hard to remember twenty years ago when the server ran on a wheezing machine underneath the IT director’s desk, but those were not halcyon days.

In those times, multi-day outages due to hardware failure were almost normal. And expect data loss, too. As time marched forward, we realized maybe we should back up that computer. Then we figured out maybe there should be some redundancy of the live system. Oh, and maybe we should have a Patch Tuesday so it stays up to date, but make sure you send out an email beforehand warning users of inevitable downtime! Maybe we should virtualize this environment so we can scale and recover from hardware failures faster. Next we made the move to the cloud, with geo-redundancy and autoscaling. In terms of software development methodologies, when we realized the need for iterative changes, we transitioned from Waterfall to more flexible Agile methodologies. Now we are still using those, but folding in DevOps practices to help give a holistic view of testing, deploying, gaining feedback, and making changes.

In 2017, development practices are not sitting still. The disruptions to how we work keep coming, whether that’s containers, serverless architecture, or machine learning. If you want to see the future, check out Azure’s Service Fabric, which takes care of managing microservices with mind-boggling complexity and yet with incredible robustness. With a DevOps mentality we are continuing to improve the process of deployment and development. Gaining meaningful insights into ever-changing applications is becoming a must. The world is not slowing down. Customers and businesses are expecting applications that just work, that scale, and that can be easily changed as needs shift.

As we head down this road of faster, more incremental changes, what we are giving up in simplicity and perceived stability we are making up for in being able to rapidly respond to security flaws, uptime problems, and usability issues. We are learning that overcoming fragility is not just a goal, but a process.

If you’d like to speed up your team’s adoption of DevOps practices, streamline your deployment process to Azure, or get some training on the latest development best practices, that’s exactly what we at Nebbia are here for. Contact us and get ahead of the curve.