The FreeAgent code base dates back to 2006, when we were running with Rails 1.0 and Ruby 1.8.6.
Since then, Ruby and Rails have undergone several transformations. We want FreeAgent to always run with the latest security patches installed, we want to take advantage of the latest versions of available plug-ins, and we want our developers to use the most up-to-date libraries, so we have always migrated to the latest versions of Ruby and Rails once the releases are stable. This future-proofs our code, reduces technical debt and allows us to develop FreeAgent aggressively.
Our engineers have recently been working hard to migrate FreeAgent from Rails 2.3 to 3.0 and from Ruby Enterprise Edition 1.8.7 to Ruby 1.9. These new technologies offer significant development and performance benefits which we want, and arguably need, to take advantage of, but the migration has been a significant undertaking, much more so than any previous update.
To put this work into perspective, FreeAgent currently contains over 93,000 lines of code - 31,000 lines of application code and 62,000 lines of test code. The nature of the changes in Rails 3 and Ruby 1.9 meant that migrating FreeAgent took 974 separate commits and changes to over 1,500 files. That's a lot of changes to review and QA, which is why it has taken time.
This work was completed over the past two months. In parallel with the development, our QA team spent a significant amount of time testing, and re-testing, every single page in FreeAgent. Similarly, our platform team has been preparing for the large number of infrastructure and system changes required as part of this upgrade. We were satisfied that we had thoroughly covered all angles and eventualities.
In the early hours of Friday morning we deployed this change. All went smoothly until some users started reporting rendering issues a few hours after the release went live. We investigated immediately, and it quickly became apparent that the move to Ruby 1.9 had introduced a character-encoding issue that was affecting some accounts. At that time we couldn't quickly diagnose how widespread the problem was, so we decided to roll back the release rather than spend time attempting to solve the issue at a peak period. We had provisioned for this rollback, but it took longer to execute than we had estimated. Once we had rolled back successfully, we took the time to examine data integrity carefully and confirm there was no corruption before going live again. This extended the downtime period but allowed us to go live again with the confidence that everything was operating normally.
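As a general illustration of this class of problem (not the actual FreeAgent code, and not necessarily the exact fault involved): in Ruby 1.8 a string was a plain sequence of bytes, whereas in Ruby 1.9 every string carries an encoding tag. The minimal sketch below shows how bytes carrying the wrong tag can render as garbled text, and how relabelling them can be checked. The sample text and the helper names `garble_as_latin1` and `relabel_as_utf8` are hypothetical.

```ruby
# Illustrative only: the sample text and both helper names are hypothetical.

# Simulates the bug: UTF-8 bytes are wrongly labelled as ISO-8859-1 and then
# converted to UTF-8, so each byte of a multi-byte character becomes its own
# character.
def garble_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Relabels raw bytes as UTF-8 and checks they are valid before trusting them.
# force_encoding changes only the encoding tag, never the bytes themselves.
def relabel_as_utf8(bytes)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "bytes are not valid UTF-8" unless candidate.valid_encoding?
  candidate
end

# UTF-8 bytes for "Café", tagged as binary the way an encoding-unaware
# driver or file read might hand them back.
raw = "Caf\xC3\xA9".dup.force_encoding(Encoding::BINARY)

puts raw.length                    # 5 (bytes, not characters)
puts garble_as_latin1(raw)         # CafÃ©
puts relabel_as_utf8(raw)          # Café
puts relabel_as_utf8(raw).length   # 4
```

Validating with `valid_encoding?` before trusting relabelled bytes is one cautious design choice; it makes bad data fail loudly instead of rendering incorrectly on a page.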
Over the next few days we'll be addressing this problem area, and we will complete the upgrade shortly afterwards.
Unplanned downtime is something we work extremely hard to prevent and it's hugely frustrating for us and our customers when it happens, so we would like to extend our apologies for the bother and frustration that this downtime period may have caused you this morning.
There is a lot we can take away from this experience to make sure a similar issue doesn't arise in the future, and we'll be going through this in detail in the coming days. We're working extremely hard to build and maintain a scalable and robust platform whilst we experience high growth, and this demands that a lot of technical changes go on behind the scenes, which I'll be blogging more about soon. These changes should be transparent and go entirely unnoticed by customers, but clearly this time we haven't been able to achieve that aim and we're hugely disappointed as a result.
I hope this clarifies the reasons behind the downtime and also reassures you that we have an excellent team working behind the scenes building and maintaining our service upon which so many businesses now rely. Thanks for your understanding and for continuing to support FreeAgent.
Olly, CTO, FreeAgent