Hell is for Heroes

Last December I attended the London XPDay. The session I enjoyed most was run by Chris Matts, on the subject of Heroes in the context of software development teams.

What makes a Hero?

Chris put forward the idea of the ‘Hero’: an unofficial role assumed by an experienced developer who is critical to the project. There are positive aspects to the role. This is the person turned to in the moment of crisis when something must be fixed ‘right now’; they have the deepest knowledge of the system and are an indispensable contributor to the project. As with many things, however, this strength can also be a weakness. The feeling of being indispensable is very powerful, and freely sharing knowledge and collaborating with less experienced or less capable team mates only undermines it. If the Hero is no longer the only person who can fix the problem, surely that makes them less important?

In extreme cases the presence of the Hero becomes toxic. Reluctance to collaborate is unpleasant, but active obstruction and obfuscation is something else. At this point the team and the project have some serious problems: on the one hand the project is doomed to failure without the Hero; on the other, the group’s ability to act as a team evaporates and progress slows to all but a standstill.

What to do when a Hero goes bad?

We spoke on the subject for about an hour, and while there were one or two suggestions that partially dealt with the problem, ranging from ‘move them to another project’ through to ‘fire them’ (!), no one in attendance was able to provide a truly positive example of recovering from such a scenario.

I am fortunate never to have worked with anyone quite as extreme as the examples presented in the session, but where I have seen glimpses of this behaviour, my sense is that it is overwhelmingly the product of the environment rather than any particular failing on the part of the individual.

The participants in the session seemed to be mostly managers and coaches rather than out-and-out developers, which may explain why much of the discussion presupposed that the fault lay solely with the Hero.

Common environmental factors that I have observed include:-

  • Perceived ambiguity over who has technical responsibility for a system
  • Poor performance feedback and/or poorly communicated career development
  • Lack of trust/respect amongst team mates
  • Seemingly overwhelming operational issues
  • Compensation schemes pitting team mates against one another

It is the manager, not the Hero, who has the most influence over these points. So I think that before answering the question ‘What to do when your Hero goes bad?’, a better question to ask as a manager is ‘What have I done wrong to allow my Hero to go bad?’.

Focus on Teams

Unhelpfully, as with all management problems, the best way to solve this one is not to have it in the first place. Placing greater emphasis on the performance of the team, rather than on the individual, can help here. Any action that benefits the whole team is recognised and celebrated, so the Hero need not lose prestige by supporting those around him. In fact the Hero’s standing increases, since he is now multiplying his own capability by increasing the skills of the team. As a side effect, because the team is now more capable, the Hero has more time to spend on the truly difficult problems, which, in time, he will pass on to the rest of the team.

Switching focus away from individuals and towards the team is a non-trivial exercise, but if the agile movement has brought us anything it is methods to engender collaboration, trust and team-level thinking.

In Praise of Continuous Deployment

It doesn’t matter if you ever get there; every step along the way is an improvement.
Me, praising Continuous Deployment

Ever since coming across the idea on Eric Ries’s blog I’ve been a big fan of Continuous Deployment. For those unfamiliar with the term, it means writing your code, testing frameworks and monitoring systems in such a way that the entire process of going from a source control commit to deployment on a live system can be automated without causing a quality meltdown. This means teams can find themselves deploying 50 times a day as a matter of course.


It’s not without its critics, and a lot of people see it as a one-way ticket to putting out poorly tested, buggy code. I think those folk completely miss the point, and that in many scenarios the opposite is in fact true. The thing I really like, though, is that whether or not you ever get to the point of automatically deploying every commit to live, every step you might take to get there is hugely positive.

So, really, what would have to happen in order to employ a Continuous Deployment regime?

Eighteen months ago my then team started to take this idea more seriously. I thought it would be interesting to give an overview of the steps we have taken towards Continuous Deployment and, since we’re certainly not there yet, what we plan to do in the future.

We started from a point where we would release to the live environment every few weeks. Deployments, including pre- and post-deploy testing, could take two people half a day, sometimes more. I should also say that we are dealing with machine-to-machine SaaS systems where the expectation is that the service is always available.

Reduce manual deployment load

Our first efforts aimed to reduce the human load of deployment through automation. Fear meant that we still needed to ssh into every node to restart it, but every other step was taken care of. This meant it eventually became commonplace to deploy multiple times a week across multiple platforms.
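
As a rough illustration of that first pass (a sketch, not our actual tooling; the host names, paths and build command below are invented), the automation can start as a script that builds the artefact once and pushes it to every node, while deliberately leaving the restart as a manual ssh step:

    #!/usr/bin/env python3
    """Minimal deploy helper: build once, push to every node, leave restarts manual."""
    import subprocess

    NODES = ["app1.example.com", "app2.example.com"]   # hypothetical node list
    ARTEFACT = "build/service.tar.gz"                  # hypothetical build output
    REMOTE_DIR = "/opt/service/releases/"

    def build():
        # Produce the release artefact (placeholder build command).
        subprocess.run(["./build.sh"], check=True)

    def push(node):
        # Copy the artefact to the node; scp is deliberately boring and auditable.
        subprocess.run(["scp", ARTEFACT, f"{node}:{REMOTE_DIR}"], check=True)

    if __name__ == "__main__":
        build()
        for node in NODES:
            push(node)
            # The one step we kept manual, out of caution:
            print(f"Now: ssh {node} and restart the service when ready.")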

Improve system test coverage

Once a deploy was live we were still spending considerable time on behaviour verification. To address this we worked to improve our system and load testing capability. Doing so meant that we had more time to manually verify deploy-specific behaviour, safe in the knowledge that the general behaviour was covered by the automated tests.
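
To give a flavour of what that coverage buys you (again a sketch rather than our actual suite; the health endpoint and its payload are invented), even a post-deploy smoke test as small as hitting a health URL on every node and checking the reported version takes a chunk of manual verification off the table:

    #!/usr/bin/env python3
    """Post-deploy smoke test: check each node is healthy and on the expected version."""
    import json
    import sys
    import urllib.request

    NODES = ["app1.example.com", "app2.example.com"]   # hypothetical hosts
    EXPECTED_VERSION = sys.argv[1]                      # e.g. ./smoke_test.py 1.4.2

    def check(node):
        # Hit the (invented) health endpoint and parse its JSON body.
        with urllib.request.urlopen(f"http://{node}/health", timeout=5) as resp:
            body = json.load(resp)
        assert body["status"] == "ok", f"{node} is unhealthy: {body}"
        assert body["version"] == EXPECTED_VERSION, f"{node} is running {body['version']}"

    if __name__ == "__main__":
        for node in NODES:
            check(node)
        print("All nodes healthy and on the expected version.")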

Improve system monitoring

This approach also requires a high level of trust in system monitoring. We have our own in-house monitoring system whose capabilities we expanded during this period. In particular, we improved our expression language to better state what constituted erroneous behaviour, and we also worked on better long-term trend analysis, taking inspiration from this paper. It’s no surprise to me that it came out of IMVU, who have been practising Continuous Deployment for a long time.
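
Our expression language is in-house, so the snippet below is only an analogy (the metric names and thresholds are invented), but it shows the kind of rule such a language lets you state: alert when the windowed error rate crosses a threshold, provided traffic volume makes the rate meaningful.

    # Analogy for the kind of rule an alerting expression encodes (invented names/thresholds):
    # alert if the 5-minute error rate exceeds 2% AND request volume is non-trivial.
    from dataclasses import dataclass

    @dataclass
    class WindowStats:
        requests: int
        errors: int

    def error_rate(w: WindowStats) -> float:
        return w.errors / w.requests if w.requests else 0.0

    def is_erroneous(w: WindowStats) -> bool:
        # The volume guard stops a single failed request from paging anyone at 3am.
        return w.requests >= 100 and error_rate(w) > 0.02

    print(is_erroneous(WindowStats(requests=500, errors=3)))   # False (0.6%)
    print(is_erroneous(WindowStats(requests=500, errors=20)))  # True  (4%)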

Reduce deploy size

Since the act of deployment was now much less expensive, we looked to reduce the number of changes that went out in each deploy. At first this felt wrong: after all, if the user can’t use the feature in its entirety, what’s the point? We soon realised that smaller chunks were easier to verify and sped us up over time. We took an approach that I’ve since heard referred to as ‘experiments’, so that new functionality could be deployed live but remained hidden from regular users. It meant that we could demo new functionality in production without disrupting the business-as-usual service.
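
A minimal sketch of the ‘experiments’ idea follows (the flag name, user ids and report functions are hypothetical; our real mechanism differed in the details). New code paths ship to production, but only allow-listed users ever exercise them:

    # Feature-flag style 'experiment': deployed live, hidden from regular users.
    ENABLED_EXPERIMENTS = {
        "new_billing_report": {"demo-account", "internal-qa"},  # allow-listed user ids
    }

    def experiment_enabled(name: str, user_id: str) -> bool:
        return user_id in ENABLED_EXPERIMENTS.get(name, set())

    def render_old_report(user_id: str) -> str:
        return f"old report for {user_id}"

    def render_new_report(user_id: str) -> str:
        return f"new report for {user_id}"

    def billing_report(user_id: str) -> str:
        if experiment_enabled("new_billing_report", user_id):
            return render_new_report(user_id)   # demoable in production
        return render_old_report(user_id)       # business-as-usual path

    print(billing_report("demo-account"))   # sees the new behaviour
    print(billing_report("customer-123"))   # still sees the old behaviour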

Embrace lean inspired methodology

Breaking down deploys into a few days’ worth of work also improved our lead time, meaning that we could be more responsive in the event of a change of plan. It was during this period that we switched from time-boxing to Kanban. This is interesting, since Continuous Deployment is often championed by the lean startup movement.

The future

More recently, actively pursuing Continuous Deployment has taken a back seat, but the next logical steps could be to further flesh out the system test coverage and then look to completely automate deployment to the staging environment (modulo database changes).

However, it doesn’t really matter what we do next: if it takes us a step closer to being able, in theory, to deploy continuously, it will undoubtedly improve our existing lead time and responsiveness.

This post contains a number of Continuous Deployment resources, but a few further articles I found interesting include:-