It all comes down to two things
- I get to work with people who really love what they do.
- I get to work with people who are insanely open to change.
Ask anyone about hiring developers and the advice is always the same: ‘only hire the best’. The principal reasons being that
On the face of it this seems like great advice: who wouldn’t want to hire the best? It turns out pretty much everybody.
For instance, how long are you willing to wait to fill the position? What if you are really, really stretched? What if you’re so stretched that you worry for existing staff? What if hiring a specific individual will mean huge disparities in pay between equally productive staff? What if not making the hire is the difference between keeping a key client and losing them? At some point every company has to draw a line and elect to hire ‘the best we’ve seen so far’.
The difference between the great companies and the rest is how they deal with this problem. Great organisations place recruitment at the centre of what they do. If hiring is genuinely everyone’s number one priority then hiring the best becomes more achievable. For starters, you might even have half a chance of getting ‘the best’ into your interview room in the first place.
It doesn’t matter if you get there, every step along the way is an improvement.
Ever since coming across the idea on Eric Ries’s blog I’ve been a big fan of Continuous Deployment. For those unfamiliar with the term, it means writing your code, testing frameworks and monitoring systems in such a way that it is possible to completely automate the process of going from source control commit to deployment on a live system without risking a quality meltdown. This means teams can find themselves deploying 50 times a day as a matter of course.
It’s not without its critics, and a lot of people see this as a one-way ticket to putting out poorly tested, buggy code. I think that those folk completely miss the point and that in many scenarios the opposite is in fact true. The thing I really like, though, is that whether or not you ever get to the point of automatically deploying every commit to live, every step that you might take to get there is hugely positive.
So, really, what would have to happen in order to employ a Continuous Deployment regime?
18 months ago my then team started to take this idea more seriously. I thought it would be interesting to give an overview of the steps taken towards Continuous Deployment and, since we’re certainly not there yet, what we plan to do in the future.
We started from a point where we would release to the live environment every few weeks. Deployments, including pre- and post-deploy testing, could take two people half a day, sometimes more. I should also say that we are dealing with machine-to-machine SaaS systems where the expectation is that the service is always available.
Our first efforts aimed to reduce the human load on deployment through automation. Fear meant that we still needed to ssh into every node to restart, but every other step was taken care of. This meant that it eventually became commonplace to deploy multiple times a week across multiple platforms.
Once a deploy was live we were still spending considerable time on behaviour verification. To address this we worked to improve our system and load testing capability. Doing so meant that we had more time to manually verify deploy specific behaviour, safe in the knowledge that the general behaviour was covered by the tester.
This approach also requires a high level of trust in system monitoring. We have our own in-house monitoring system whose capabilities we expanded during this period. In particular, we improved our expression language to better state what constituted erroneous behaviour, and we also worked on better long-term trend analysis, taking inspiration from this paper. It’s no surprise to me that it came out of IMVU, who have been practicing Continuous Deployment for a long time.
Since the act of deployment was now much less expensive we looked to reduce the number of changes that went out in each deploy. At first this felt false; after all, if the user can’t use the feature in its entirety, what’s the point? We soon realised that smaller chunks were easier to verify and sped us up over time. We took an approach that I’ve since heard referred to as ‘experiments’, so that new functionality could be deployed live but was hidden from regular users. It meant that we could demo new functionality in production without disrupting the business-as-usual service.
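To make the ‘experiments’ idea concrete, here is a minimal sketch of one way such a gate could look. It is not the code we ran: the `Experiment` class, the `visible_features` helper, and the feature and account names are all made up for illustration. It assumes that visibility is decided per user from an opt-in list.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """A feature that is deployed to live but hidden from regular users."""
    name: str
    # Hypothetical opt-in list: only these users can see the feature.
    allowed_users: set = field(default_factory=set)

    def is_visible_to(self, user_id: str) -> bool:
        return user_id in self.allowed_users


def visible_features(user_id, stable_features, experiments):
    """Return the feature names a given user can see."""
    visible = list(stable_features)  # business-as-usual service
    visible.extend(e.name for e in experiments if e.is_visible_to(user_id))
    return visible


# Illustrative names only.
STABLE = ["lookup", "billing"]
EXPERIMENTS = [Experiment("bulk_lookup", allowed_users={"demo-account"})]

demo_view = visible_features("demo-account", STABLE, EXPERIMENTS)
regular_view = visible_features("regular-customer", STABLE, EXPERIMENTS)
print(demo_view)
print(regular_view)
```

The useful property is that the new code path ships with every deploy, while the decision to expose it is a separate and much cheaper change.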
Breaking down deploys into a few days’ worth of work also improved our lead time, meaning that we could be more responsive in the event of a change of plan. It was during this period that we switched from time boxing to Kanban. This is interesting since Continuous Deployment is often championed by the lean startup movement.
More recently, actively pursuing Continuous Deployment has taken a back seat, but the next logical steps could be to further flesh out the system test coverage and then look to completely automate deployment to the staging environment (modulo database changes).
However, it doesn’t really matter what we do next: if it takes us a step closer to theoretically being able to deploy continuously, it will undoubtedly improve our existing lead time and responsiveness.
This post contains a number of Continuous Deployment resources, but a few further articles I found interesting include:-
In the previous post, Technical Debt is Different, I talked about the need to treat management of technical debt as a separate class of problem to that of feature requests generated outside of the team.
As with any project above a certain size, team collaboration is key, and that means having a reliable method of prioritising technical debt that the whole team can buy into. This post will describe a method that I have been using over the past year that satisfies this need.
I was new to my current project and wanted to get an idea from the team of the sorts of things that needed attention. I mentioned this just before lunch one day and by the time I got back from my sandwich I had an etherpad with over 100 items. By the end of the afternoon, I discovered that etherpad really doesn’t deal with documents above a certain size.
It was clear that we needed a way to reference and store these ideas. I had two main requirements.
The first step was to go through the list and group items into similar themes, which helped identify duplicate or overlapping items. At this stage some items were rewritten to ensure that they were suitably specific and well-bounded.
Now that we had a grouped list of tasks it was time to attempt to prioritise. As discussed in the previous post, prioritising refactoring tasks can be challenging and passions are likely to run high. I felt that rather than simply stack ranking all items, it was better to categorise them against a set of orthogonal metrics. This led to a much more reasoned (though no less committed) debate of the relative merits of different tasks.
Every item was classified according to:-
The simplest metric, this is a very high level estimate of what sort of size the item was likely to be. Estimating the size helped highlight any differences in perceived scope, and in some cases items were broken down further at this point. Size estimation works best when estimates for tasks are relative to one another; however, to seed the process we adopted the following rough convention.
Timeliness speaks of how the team feels about the task in terms of willingness to throw themselves into it. Items were assigned a timeliness value from four options.
How much will the team benefit from the change? Is it an area of the code base that is touched often? Perhaps it will noticeably speed development of certain types of features. It could be argued that Value is the only metric that matters; however, Value needs to be considered in the context of risk (addressed through timeliness) and effort (addressed through size).
All items for a given Timeliness are measured relative to one another and given a score of ‘High’, ‘Medium’ or ‘Low’. Low value items are rarely tackled, and even then, only if they happen to be in the Opportunity category.
Once all items had been classified, it was time to visualise the work items. To do this we transferred the items to cards and stuck them to a pin board, with timeliness on the horizontal axis and value on the vertical axis (each card contained a reference to the task size). Now it was possible to view all items at once, and from this starting point it was much easier to make decisions over which items to take next.
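We used a physical pin board, but the same grid can be sketched in a few lines of code. What follows is an illustration rather than a tool the team used: `DebtItem`, `build_board`, the item titles, the size labels and the "Some other column" timeliness label are placeholders. Only the ‘High’/‘Medium’/‘Low’ value scale and the Opportunity category come from the process described here.

```python
from collections import defaultdict
from dataclasses import dataclass

# Value scale used for the vertical axis of the board.
VALUES = ("High", "Medium", "Low")


@dataclass(frozen=True)
class DebtItem:
    title: str
    size: str        # rough size estimate, noted on the card
    timeliness: str  # horizontal axis (column)
    value: str       # vertical axis (row)


def build_board(items):
    """Group items into cells keyed by (timeliness, value)."""
    board = defaultdict(list)
    for item in items:
        if item.value not in VALUES:
            raise ValueError(f"unknown value {item.value!r} for {item.title!r}")
        board[(item.timeliness, item.value)].append(item)
    return dict(board)


# Placeholder items; titles, sizes and the second column name are invented.
items = [
    DebtItem("Split the config module", "small", "Opportunity", "High"),
    DebtItem("Replace the logging wrapper", "large", "Opportunity", "Low"),
    DebtItem("Untangle the build scripts", "medium", "Some other column", "High"),
]
board = build_board(items)
for (timeliness, value), cards in sorted(board.items()):
    print(timeliness, value, [f"{c.title} ({c.size})" for c in cards])
```

Keeping size as an annotation on the card, rather than as a third axis, mirrors the board: two dimensions decide where a card sits, and size informs the final choice.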
Since the whole team had contributed to the process it was clear to individuals why, even though their own proposals were important, there was greater value in working on other items first. Crucially, we also had a process to ensure that these mid-priority items were not going to be forgotten, and trust that they would be attended to in due course.
When a task is completed, we place a red tick against it to demonstrate progress. This helps build trust within the team that we really are working off our technical debt. Sometimes a specific piece of work will, as a side effect, lead to the team indirectly making progress against a technical debt item. When this happens we add a half tick, indicating that this card should be prioritised over other similarly important items so that we get it finished off completely.
This system is effective in reducing the stress that comes with managing technical debt and provides a means for the whole team to have a say in where the team spends its effort. However, one area where it is weak is in managing very small, relatively low value tasks that can be completed in an hour or so. Examples might include removing unused code, reducing visibility on public fields and renaming confusingly named classes: in essence, things that you might expect to happen as part of general refactoring were you already working in the area.
To manage these small easy wins, the team maintains an etherpad of ‘Tiny Tasks’ and reviews new additions to the list on a weekly basis. The rule is that if anyone considers a task to be anything other than trivial it is thrown out and considered as part of the process above. These tasks are then picked up by the developer acting as the maintainer during the week.
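The weekly review rule is simple enough to state as code. This is only a sketch of the rule as described above, not a tool we actually run. The `triage_tiny_tasks` function, the vote format and the third task title are assumptions for illustration.

```python
def triage_tiny_tasks(votes):
    """Split proposed tasks into (tiny, escalated).

    `votes` maps a task title to a list of booleans, one per team member,
    where True means "I consider this task trivial".
    """
    tiny, escalated = [], []
    for title, opinions in votes.items():
        # One dissenting voice is enough to send the task back to the
        # main prioritisation process; unreviewed tasks are escalated too.
        if opinions and all(opinions):
            tiny.append(title)
        else:
            escalated.append(title)
    return tiny, escalated


weekly_votes = {
    "Remove unused code": [True, True, True],
    "Rename confusingly named classes": [True, True, True],
    "Rewrite the persistence layer": [True, False, True],  # invented example
}
tiny, escalated = triage_tiny_tasks(weekly_votes)
print(tiny)
print(escalated)
```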
Generally it is easier if an individual has final say over the prioritisation of tasks, but in the case of technical debt this is harder since the whole team should be involved. Therefore, a trusted method of highlighting and prioritising technical debt tasks is needed. By breaking down the prioritisation process into separate ‘Size’, ‘Timeliness’ and ‘Value’ metrics, it was possible to have a more reasoned discussion over the relative merits of items. Visualising the items together at the end of the categorisation process enables the team to make better decisions over what to work on next and builds trust that items will not be simply forgotten. Very small items can still be prioritised if the team agrees that they really are trivial.
Technical debt is a great metaphor to describe what happens to your code base if you don’t continually keep it clean and tidy.
Any software team accrues technical debt, either intentionally to satisfy a short term win, or unintentionally as the design and requirements of a system drift over time. It’s a subject that has been written about extensively and I particularly liked NRG’s attempts to track it as part of their weekly metrics (you’ll want slide 9).
The thing I’ve noticed about servicing technical debt is that it is very different from other work a team might undertake and that it requires an alternative approach to manage it. The principal differences that I see are:-
I am fortunate to work for a company with strong engineering leadership that acknowledges and makes provision for the servicing of technical debt. However, even if the argument for technical debt has been won, deciding how best to tackle debt can be highly contentious and in some cases destructive.
The biggest problem is that of prioritisation. In many agile teams you would hope to have a single product owner who can make prioritisation decisions for product features. In practice this can be hard for an organisation to provide, but the key point is that it’s important to minimise the number of final decision makers.
In the case of technical debt it is the dev team that decides, which means thrashing out the priorities across the entire team. Each developer will have a different, often very strong, view on what is important, and arriving at a conclusion can be a long and painful journey. Additionally, existing project prioritisation tools such as MoSCoW do not lend themselves to technical debt prioritisation.
A trusted means to prioritise tasks makes it possible to identify a team-wide strategy. Without a clear strategy there is the temptation for individuals to ‘go it alone’, which means that over time the overall impact is reduced. Firstly, larger items that are too big for one person are ignored and secondly, if it is not possible to decide on what is important, then collaboration becomes difficult. This means that the impact of smaller items is also diminished, since they will feed into the individual developer’s strategic vision rather than the team’s overall vision. This in itself can become toxic as it breaks down trust within the team and further hampers collaboration. Rachel Davies has a great post describing the effects of self orientation on team trust.
The fact that technical debt is being tackled at all is a good thing, but it would be nice to do this in as efficient a way as possible. My team and I already spend a significant amount of energy on improving our ability to deliver valuable software in a consistent fashion, and our approach to managing technical debt should be no less disciplined. The only difference is that this time around, we are our own customer.
It’s clear that some form of prioritisation method is necessary, but committees are generally not a good way to make decisions. One approach is to assign a final decision maker, perhaps a tech lead or senior member of the team, but I really want a system where the entire team buys into the process. If the process is right then it should be rare for someone to have to say ‘this is how it is’.
Over the past year I’ve been working on a system to better manage my team’s technical debt. In my next post I’ll go on to explain the approach.