Archive for the ‘News’ Category

Please welcome dev2ops’s newest blogger…. Lee Thompson


Damon Edwards / 

Lee Thompson is joining us as a dev2ops contributor.

Lee is currently a consultant specializing in development and operations practices for large-scale and mission-critical web applications. His current clients include household names in the banking, social networking, and e-commerce fields.

Previously, Lee was the Chief Technologist of E*TRADE Financial. To learn more about Lee, you might want to check out this interview we did a few months back.

Lee has seen the world from the Dev side, the Ops side, and everything in between. Alex and I are pleased to welcome him to the dev2ops community and look forward to his contributions.

 

Q&A: Lee Thompson, former Chief Technologist of E*TRADE Financial


Damon Edwards / 

I recently caught up with Lee Thompson to discuss a variety of Dev to Ops topics including deployment, testing, and the conflict between dev and ops.

 

Lee recently left E*TRADE Financial where he was VP & Chief Technologist. Lee’s 13 years at E*TRADE saw two major boom and bust cycles and dramatic expansion in E*TRADE’s business.

 

 

Damon:
You’ve had large scale ops and dev roles… what lessons have you learned that required the perspective of both roles?

Lee:
I was heavy into ops during the equity bubble of 1998 to 2000 and during that time we scaled E*TRADE from 40,000 trades a day to 350,000 trades a day. After going through that experience, all my software designs changed. Operability and deployability became large concerns for any kind of infrastructure I was designing. Your development and your architecture staff have to hand the application off to your production administrators so the architects can get some sleep. You don’t want your developers involved in running the site. You want them building the next business function for the company. The only way that is going to happen is to have the non-functional requirements (deployability, scalability, operability) already built into your design. So that experience taught me the importance of non-functional requirements in the design process.

 

Damon:
You use the phrase “wall of confusion” a lot… can you explain the nature of the dev and ops conflict?

Lee:
When dealing with a large distributed compute infrastructure you are going to have applications that are difficult to run. The operations and systems engineering staff who is trying to keep the business functions running is going to say “oh these developers don’t get it”. And then back over on the developer side they are going to say “oh these ops guys don’t get it”. It’s just a totally different mindset. The developers are all about changing things very quickly and the ops team is all about stability and reliability. One company, but two very different mindsets. Both want the company to succeed, but they just see different sides of the same coin. Change and stability are both essential to a company’s success.

 

Damon:
How can we break down the wall of confusion and resolve the dev and ops conflicts?

Lee:
The first step is being clear about non-functional requirements and avoiding what I call “peek-a-boo” requirements.

Here’s a common “peek-a-boo” scenario:
Development (glowing with pride in the business functions they’ve produced): “Here’s the application”
Operations: “Hey, this doesn’t work”
Development: “What do you mean? It works just fine”
Operations: “Well it doesn’t run with our deployment stack”
Development: “What deployment stack?”
Operations: “The stuff we use to push all of the production infrastructure”

The non-functional requirements become late cycle “peek-a-boo” requirements when they aren’t addressed early in development. Late cycle requirements violate continuous integration and agile development principles. The production tooling and requirements have to be accounted for in the development environment, but most enterprises don’t do that. Since the deployment requirements aren’t covered in dev, the operations staff receiving the application has to innovate out of necessity. They end up writing a number of tools that, over time, grow bigger and become more of a problem, which Alex described last year in the Stone Axe post. Deployability is a business requirement and it needs to be accounted for in the development environment just like any other business requirement.

 

Damon:
Deployment seems to be a topic of discussion that is increasing in popularity… why is that?

Lee:
Deployability and the other non-functional requirements have always been there, they were just often overlooked. You just made do. But a few things have happened.

1. Complexity and commodity computing, in both hardware and software, have meant that your deployment is getting to the point where automation is mandatory. When I started at E*TRADE there were 3 Solaris boxes. When I left, the number of servers was orders and orders of magnitude larger (the actual number is proprietary). Since the operations staff can’t afford to log into every box, they end up writing tools because the developers didn’t give them any.

2. Composite applications, where applications are composed of other applications, mean that every application has to be deployed and scaled independently.  Versioning further complicates matters. Before the Internet, in the PC and mainframe industries, you were used to delivering and maintaining multiple versions of software simultaneously. In the early days of the Internet, a lot of that went away and you only had two versions — the one deployed and the one about to be deployed. Now with the componentization of services and the mixing and matching of components you’ll find that typically you have several versions of a piece of infrastructure facing different business functions. So you might be running three or four independently deployed and managed versions of the same application component within your firewall.  

3. Cloud computing takes both complexity and the need for deployability up another notch. Now you are looking to remote a portion of your infrastructure into the Internet. Almost everyone who is starting up a company right now is not talking about building a datacenter, they are all talking about pushing to the cloud. So deployability is very much at the forefront of their thinking about how to deliver their business functions.  And the cloud story is only beginning. For example, what happens when you get a new generation of requirements like the ability to automate failover between cloud vendors?

 

Damon:
Testing is one of those things that everyone knows is good, but seems to rarely get adequately funded or properly executed. Why is that?

Lee:
Well, like many things it’s often simply a poor understanding of what goes into doing it right and an oversimplification of what the business value really is.

Just like deployment infrastructure, proper testing infrastructure is a distributed application in and of itself. You have to coordinate the provisioning of a mocked up environment that mimics your production conditions and then boot up a distributed application that actually runs the tests. The level of thought and effort that has to go into properly doing that can’t be overlooked. Well, not if you are serious about delivering on quality of service.

While integration testing should be a very important piece of your infrastructure, the importance of antagonistic testing also can’t be overlooked. For example, the CEO is going to want to know what happens when you double the load on your business. The only way to really know that is to have a good facsimile of your business application mocked up and those exact scenarios tested. That is a large scale application in and of itself and takes investment.

Beyond service quality, there is business value in proper testing infrastructure that is often overlooked. When you start to build up a large quantity of high fidelity tests those tests actually represent knowledge about your business.  Knowledge is always an asset to your business. It’s pretty clear that businesses who know a lot about themselves tend to do well and those who lack that knowledge tend not to be very durable. 

 

Damon:
The culture of any large company is going to be restrictive. Large financial institutions are, by their very nature, going to be more restrictive. Coming out of that culture, what are you most excited to do?

Lee:
Punditry, blogging and using social media to start! You really can’t do that from behind the firewall in the FI world. There are legitimate reasons for the restrictions, and I understand that. Because you have to contend with a lot of regulatory concerns, you just aren’t going to see a lot of financial institution technologists ranting online about what is going on behind the firewall. So I’m excited about becoming a producer of social media content rather than just a consumer.

I also find consulting exciting. It’s been fun getting out there and seeing a variety of companies and how similar their problems are to each other and to what I worked on at E*TRADE. It reminds me how advanced E*TRADE really is and what we had to contend with. I enjoy applying the lessons I’ve learned over my career to helping other companies avoid pitfalls and helping them position their IT organization for success.

Are sys admins soon to be relics?


Damon Edwards / 

One of the ideas that can be extrapolated from the positions of the “infrastructure as code” crowd, is that the future of systems administration will look dramatically different than it does today.*

The extreme view of the future is that you’ll have a set of domain experts (application architects/developers, database architects, storage architects, performance management, platform security, etc.) who produce the infrastructure code and everything else happens automatically. The image of today’s workhorse, pager wearing, fire extinguishing sys admin doesn’t seem to have a role in that world.

Of course, the reality will be somewhere in the pragmatic middle. But a glimpse of that possible future should make sys admins question which direction they are taking their job skills.

I finally got around to digging into the conference wrap up report that O’Reilly publishes after its annual web operations conference, Velocity. Most of it was the standard self-serving kudos. However, the table below really caught my eye and inspired me to write this post.

Attendee Job Titles (multiple answers accepted)

  • Software Developer 60%
  • IT Management/Sys Admin 27%
  • Computer Programmer 20%
  • CXO/Business Strategist 19%
  • Web/UI Design 17%
  • Business Manager 16%
  • Product Manager 10%
  • Consultant 9%
  • Entrepreneur 8%
  • Business Development 4%
  • Community Activist 3%
  • Marketing Professional 2%
  • Other 5%

Now of course you have to look at this data with a cautious eye. People were asked to self-describe, you could select multiple titles, some people were attending to learn about design tricks for faster page load times, and most people blow through marketing surveys without much care. However, it did catch my eye that somewhere between 60% and 80% described themselves as having a development role. Only 27% described themselves as having a sys admin role.
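To make the 60 to 80% range concrete, here is a minimal sketch of the arithmetic, assuming that "development role" means the two titles Software Developer (60%) and Computer Programmer (20%) from the table above. Because respondents could pick more than one title, the combined share is at least the larger of the two figures (if every programmer also ticked developer) and at most their sum (if nobody ticked both). The function name `union_bounds` is purely illustrative.

```python
def union_bounds(shares):
    """Bounds (in percent) on respondents who picked at least one of the titles.

    shares: percentages for each title, from a multiple-answer survey.
    Lower bound: total overlap, so the union equals the largest single share.
    Upper bound: no overlap, so the union equals the sum, capped at 100.
    """
    low = max(shares)
    high = min(100, sum(shares))
    return low, high


# Figures taken from the attendee table; the grouping is an assumption.
dev_titles = {"Software Developer": 60, "Computer Programmer": 20}
low, high = union_bounds(list(dev_titles.values()))
print(f"Development role: between {low}% and {high}% of respondents")
```

The survey report does not say how much the two groups overlap, so the true figure could fall anywhere in that range.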

Now is it a big leap to point to this data as an early warning signal of the demise of the traditional sys admin role? Probably… but it fully jibes with the anecdotal evidence we saw around the conference halls. From large .com employees (Facebook, Twitter, Google, Flickr, Shopzilla, etc.) to the open source tools developers, the thought (and action) leaders were developers who happened to focus on systems administration, not systems administrators who happened to have development skills.

* Disclosure: I’m a member of the infrastructure as code crowd

The CIO / Ops Perception Gap


Alex Honor / 

Every IT manager should read this article: Making Business Service Management a Reality. I think the original title, “BSM Evolution – The CIO / Ops Perception Gap”, more accurately reflects the essence of the issues it draws out.
* CIOs prematurely believing they have a handle on running their software services
* VPs of Ops afraid to admit that they’ve just begun a long journey that assumes continuous improvement approaches and no one time fixes
* No clear visibility from the biz level into the level of quality of service operations delivered by the CIO on down through the tech management ranks
* The need to focus on fundamentals

The article made me think of the strategies put forth in the Visible Ops book. But, even more so, it really indicates the need for true visibility into how ops is conducted at every level (no obfuscation tolerated).

Web Operations: the canary in the IT Management coal mine


Damon Edwards / 

Rob England (The IT Skeptic) recently wrote some very nice things about this blog.

After I got over the fact that one of my favorite bloggers is writing about this blog, I realized that his post does raise a good question: If good IT Management is good IT Management no matter what business you are in, why does this blog focus so much on the Web Operations perspective?

Part of the reason is that Web Operations is the world that Alex and I live in on a daily basis (via ControlTier… helping e-commerce and SaaS companies improve the efficiency and reliability of their operations).

The other part of the reason is that we see Web Operations as the canary in the coal mine for IT Operations. When a company’s entire business is operating software as a revenue-producing service, the shortcomings and the successes of your IT Operations go right to your bottom line. The tolerance for the status quo dissipates a lot quicker and there is stronger political will to think outside of the box.

Put it this way, pretend you’re the CEO of a Fortune 100 size company that makes aircraft engines or automobiles. Where is improving the efficiency and reliability of your IT Operations going to fall on the list of things you worry about every day? 32 on a top 50 list might be generous.

Now pretend you are the CEO of an online company whose sole source of revenue comes from what you can generate through your website. Suddenly the efficiency and reliability of your IT Operations jumps to near the top of the list.

Update: While people point out to me that I’m stretching the “canary in a coal mine” metaphor a bit far… I’m loading The Police’s Zenyatta Mondatta album into my iTunes.

Web Operations: Are you developing an asset or a liability?


Damon Edwards / 

“Buy vs. Build”. It’s a term you hear repeatedly when it comes to businesses weighing their options for application and systems management solutions. But as anyone who spends time in the web operations trenches knows, the reality is always something closer to “build vs. build”. Buy something from a software vendor, use open source tools, develop something from scratch: in each situation there just isn’t a one size fits all option and there is always going to be custom integration involved. This reality was previously covered in Alex’s “Stone Axes” post.

So being resigned to the fact that there is a “build” aspect to any solution, the next critical choice then becomes what guidelines you impose on your organization to steer their design choices. The most pervasive design criterion seems to be technical completeness or elegance. From a technical architect’s purist point of view this makes sense, but what this often fails to take into account is the business impact of those technical decisions.

While many technical design options might seem to have identical business impact on day 1 (they cost roughly x to develop and provide feature y), what is the true cost of those decisions down the road? Have those decisions put the company in a position to continuously leverage those design choices into increasingly greater returns? Or have those decisions placed an anchor around the company’s neck that it will be weighed down by, and paying for, well into the future? To put it into loose economic terms: have you developed an asset or a liability for your company?

What would be an example of building an asset? Using off the shelf open source tools and only developing thin layers of integration where they need to plug into your existing systems.

What would be an example of building a liability? Writing a custom system that mirrors the available functionality of existing off the shelf tools, thereby saddling your company with the sole responsibility for the forward progress of the design and maintenance of that tooling.

The asset vs. liability concept is one that obviously needs to be fleshed out quite a bit more. In any case, it’s shocking how infrequently companies actually analyze the long-term business impact of the technical design decisions made about their tooling.

(Note: Thanks to Lee Thompson for framing this as an asset vs liability debate)
