Retrofitting code analysis to legacy projects – my 4 step strategy for long-term success

This entry is part of a 3 part series: Roslyn-based code analysers for the win!

In my last post, I talked about why I love code analysers. Towards the end, I mentioned how easy it is to get started, but I glossed over the fact that flipping the switch on an existing application can be a pretty harrowing experience. In today’s post, I hope to make the jump a little easier by walking you through a 4 step strategy that I’ve found helpful on a couple of projects.

In this post, I presume some knowledge of Roslyn-based code analysers. If you feel like this doesn’t apply to you, it might be worth starting with my introductory post on the subject.

Image courtesy of the US Air Force.

OK, so you’ve flipped the switch and enabled code analysers on your creaking legacy project. Good for you! But, oh boy, that’s a lot of warnings… What are you going to do?

Step 0: Self-care

Firstly there are the inevitable feelings of guilt and shame. Find a mirror. Look at yourself in it. Tell yourself: “this is OK”. Remember that the code analysers demand their own (very opinionated) version of perfection and accept that it would have been impossible to make your code fully conformant to this magnificent vision without their help.

Secondly, you’ll feel daunted by the sheer scale of the work needed. You’ll feel like you’re drowning in a sea of warnings. Don’t be deterred! The approach I’m going to share involves breaking the problem down into bite-size chunks and dealing with the backlog on your own terms.

Thirdly, remember this is a team game. If you took the advice from my previous post, you’ll have won the buy-in from your colleagues already. Sooner or later they’re going to have one of their builds fail due to the new rules, so you need to ensure you don’t let them become your rules. Luckily, there’s a bit of synergy here: The best way to keep them feeling positive about the project is also the best way to preserve your sanity – keep them engaged by making it a team effort.

Step 1: Limit your scope

Don’t try to eat the apple in one go. We’re going to divide and conquer. Remember, you’re starting from nothing. You don’t have to fix it all in one go. Pick somewhere to start and come back to the rest later.

Start with a single project

If you’re working on a big legacy project, you likely have more than one project. My advice would be to pick just one to start with. The ruleset file we’ll be editing is scoped to the project level (i.e. not solution-wide), so starting with one will save you from the potential confusion of managing multiple rulesets at once.

In practical terms, limiting the scope of a code analyser should be as simple as managing to which projects the analyser package is added. As I found out recently, a bug somewhere in the build chain (NuGet?) means that analysers “flow” along project dependencies. Luckily, the fix is simple: just add PrivateAssets="All" to the relevant PackageReference elements in your project file.

Start with a single analyser

As well as the venerable FxCop (ported to Roslyn), there are a fair few Roslyn code analysers out there. Some are pretty specific, but others are quite general. It can be hard to know which ones to use. I bet you can guess my advice here: just pick one to start with.

I’d suggest making your initial analyser a fairly general one. I find the Microsoft FxCop rules a pretty good starting point usually. That said, if you wanted a gentle introduction, StyleCop violations tend to be really easy to fix (add a space here, remove an empty line there…), so it could also be a good first analyser. In fact, I’m using StyleCop in the demo project for this post, because it’s so easy to grok.

Image courtesy of Zinneke on Wikimedia Commons.

Step 2: Taking stock

Now that we’ve selected a project to work on first, we’re ready to get our hands dirty. But before we start fixing things, it pays to fully understand the problem.

Since the ruleset file is where the rules are managed, I like to use it as the documentation too. It means that you’ve got one place where you can see the status of each of the rules that your project is violating.

What we’re going to do is take the build output and use a little Excel-fu to generate the contents of our ruleset file. We’re going to use XML comments to help us organise it.

  1. Copy & paste the output from MSBuild into Excel, or something like it.
  2. Munge the data so that you’ve got a de-duped list of the IDs (e.g. CA2140) of all the messages.
  3. Count how many instances you have of each message ID. Sort your list on this. Having a sense of the “length of the long tail” can help you decide where to start.
  4. Use your list to generate lines to go into your custom ruleset. For now, set them all to “none” to have them ignored.
  5. Re-build your project and see that all the messages have disappeared. Breathe a sigh of relief.

Don’t underestimate the importance of this step. Whilst your code isn’t any cleaner, you know now the extent and nature of the problems that you face. Effectively, you’ve just completed an audit of your code. What’s more, you’ve started your analysis journey without blowing up your project.

BONUS: I’ve included an example Excel file in the demo project on Github. It does the job as it is, but should also be dead simple to tweak for your own needs.

Step 3: Lather, rinse, repeat…

Now that we’ve populated our ruleset file, it can serve as both our to-do list and our map. By including the counts of each rule violation, we can see pretty easily which are widespread and which are just the “long tail”. Viewing it in this way is much easier than trying to work with the build output directly.

And we’re going to carry on using the ruleset file too. As we fix the violations of each rule, we’ll move it from the (initially) long list of ignored rules to one of two more prominent lists: one for enforced rules and one for rules we’ve actively decided to ignore going forward. I like to demarcate my lists using comments and whitespace. As an example, check out the ruleset file in the example project, as it was after I fixed the first issue

Start small

To get started, I recommend picking a rule from the bottom of the list – the one with the smallest number of violations. (In the demo project I picked one from the top to make the changes more obvious.)

Once you’ve picked one, go and look it up in the documentation and make a decision. Does this rule apply to you? Will fixing it yield enough benefit to warrant the pain of fixing it and keeping it fixed? It can be hard to find a balance, but remember that these rules are generally written by Smart People (I mean, that’s why we’re implementing them, right?) and make sure you understand why a rule is suggested before dismissing it. I tend to ignore rules very rarely.

One by one

If we are going to ignore a rule, we can simply move it to the ignore list. Generally, I’ll leave the initial violation count in the comment. I will also always add a short description of the rule (because anyone who tells you they can remember what SA1508 means is a dirty liar).

If we decide we’re going to fix the violations of a rule, we’ll start by changing the rule to “warning” and rebuilding the project. This will tell us where the issues are. We already know what the problem is (because we read the docs and made an informed decision to fix it), so the next step it to get stuck into the code file and get it sorted. Finally, rebuild the project to verify that the issue is resolved and the warnings are gone.

Congratulations! You’ve taken your first step towards a cleaner, more robust codebase.

Your next steps are (as the title of this section implies) going to be pretty repetitive in terms of process, but hopefully interesting in terms of the rules themselves. Choose another rule to fix and get stuck. Soon you’ll be on a roll.

This lovely photo is Joel Telling’s daughter’s first shampoo mohawk. Thanks for sharing Joel!

Our algorithm in full

Let’s recap. Our full algorithm for working our way through our rule violations is as follows:

  1. Pick a rule from the unreviewed list
  2. Consult the docs and decide whether to fix or ignore
  3. If “ignore”, then…
    1. Move it to the “ignore” list in your ruleset
    2. Add a note explaining the rule (and why it’s being ignored)
  4. If “fix”, then…
    1. Change it from being ignored (Action="None") to a warning (Action="Warning).
    2. Build the project.
    3. Get the location of the violation(s)
    4. Fix it/them
    5. Build the project again to confirm the fix
    6. Move the rule to the “fixed” section of your ruleset
  5. GOTO 1

This approach will mean you can see progress really rapidly. Moving items from one list to another makes it really obvious.

Step 4: The new normal

Once you’ve fixed a few rules, you’ll want to merge your changes back to a shared branch. Just make sure you keep communicating with your colleagues about the fact that you’re applying these rules. You don’t want anyone to be surprised when their code starts producing warnings.

You need to maintain your momentum now. Perhaps agree as a team to each knock two rules off the unreviewed list each iteration. Or set time aside to pair up and work through them. However you want to do it, make sure you continue to engage your team whilst you collectively improve your code.

Photo courtesy of Indi Samarajiva on Flickr.

Warnings are errors?

Now would also be a good time to discuss whether to start treating warnings as errors. It’s all too easy to ignore warnings at build time – especially if they are “just style warnings”. When the build fails, though, people pay attention.

It’s easy enough to enable this for your project. Just add the following line to your project file:

<TreatWarningsAsErrors>true</TreatWarningsAsErrors>

This works really well with CI systems – and especially with protected branches – as it basically prevents non-compliant code from escaping into the shared codebase.

Wrapping up

Woah – this started as a bunch of scrawled notes but ended up being a pretty long post. Thanks for sticking with me.

The first time I tried to add code analysis to an existing project, I just worked my way through the whole thing in one go. It was a soul-destroying marathon because I didn’t know the scale of the problem at the start and couldn’t appreciate my progress as I plodded blindly along. Frankly, if it hadn’t been just me by myself (and a relatively small project) there’s no way I’d have been able to get through it all.

Since then, I’ve used the technique I describe above with a couple of teams. On the whole, it’s worked pretty well. The main lessons I’ve learnt are:

  • You need the support of your colleagues for this to succeed
  • Know what you’re facing and keep a visible record of your progress
  • Don’t be put off by the scale of the task – break it up into small pieces
  • Don’t let prioritisation derail you – getting started is more important
  • The hard part is keeping it up, so think about changes in habit or process to sustain your endeavours

I really hope this helps someone. If it helps you, maybe let me know in the comments. For now, though, I’m off to fix some more rules in my own code. It turns out that it can be pretty addictive when you get the process right!

1 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.