A few months back I shared how I used Trucker to migrate data from a legacy Rails application into a more current one. This method works pretty well if your data is reasonably straightforward, but as I noted it hiccups on a couple of things:
users table: rather than overwrite, I need to acknowledge the existing users and process accordingly.)
Address. The best solution is again to tweak the gem’s code.
These issues, combined with a much more complex data structure (basically a total re-engineering of the data layer), led me to look into other options for legacy data migration for my next project. As it turns out, it’s not terrifically difficult—it’s just a matter of setting up some Rake tasks, mapping old data to new, and paying attention to the details.
New to creating your own Rake tasks? Review this Railscasts episode on developing your own custom Rake tasks. It’s an important skill for any Rails developer.
The best place to start with this approach to legacy data is Zach Holman’s Impress the Ladies with Legacy Migrations. It outlines a simple but effective strategy: create a Rake task for each model you need to migrate, create an ActiveRecord class for the model, and customize as needed. I added a few of my own takes on the process:
Rather than creating temporary tables in my production database, I decided to establish a separate connection to my legacy database. You could connect directly to the live database; instead I opted to first mysqldump the data, copy it to my development computer, and set up a local copy. This makes the migration process a little quicker and guards against accidentally doing something nasty to live data.
To connect it to my Rails application, I used a procedure I learned from Chad Fowler’s Rails Recipes, 3rd Edition (get early beta access now from Pragmatic Programmers). First you create the connection in your
Then access that database from each legacy class—for example:
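A minimal sketch of such a class, meant to live inside the Rails app rather than run on its own. It assumes a connection entry named legacy exists in the app's database configuration and that the old table is called projects; both names are illustrative and not taken from the original application.

```ruby
# Fragment for a Rails app, where ActiveRecord is already loaded.
# `legacy` and `projects` are assumed names used for illustration.
class LegacyProject < ActiveRecord::Base
  # Read from the separate legacy database, not the app's default connection.
  establish_connection :legacy

  # The class name doesn't match the table, so name the table explicitly.
  self.table_name = "projects"
end
```

On Rails versions before 3.2, the equivalent of the last line is `set_table_name "projects"`.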
You can name each legacy class whatever you want, as long as it’s not the same as a class in your new application—you’ll need to access the new application’s classes to actually move data. The first step is to tell ActiveRecord the legacy data table’s name, since it can’t deduce this from the class name as it normally would. This also requires a little extra work when it comes to defining any associations the legacy class may have, but that’s fairly straightforward as well—I’ll get to it in a moment.
Now, where to put those legacy classes? I had two problems with putting them inline in my Rake task. First, I had to do a lot of tweaking in each class, so my Rake tasks were getting pretty cluttered. Second, I had to access some classes in multiple Rake tasks, so it made good sense to put them somewhere from which I could access them in any of my tasks. My solution was to move them into a separate file. For simplicity’s sake I just put this file in my tasks folder alongside the actual Rake file, then included it in each task:
If you don’t know how your ORM customizes the ways your application’s models associate with others, you’ll need to take a crash course to get everything connected. Again, since we’re slightly breaking from convention in our class names, Rails can’t automatically hook them to tables as it normally would. Depending on what kind of association you’re establishing, this may be as straightforward as defining the class name used in the association, or as complex as also defining the keys and join tables. Luckily ActiveRecord (like most other ORMs) makes this pretty straightforward; review ActiveRecord’s class methods to get a handle on them all. Here are a few examples:
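As a rough illustration, the fragment below wires three hypothetical legacy classes together. The table names, the foreign-key columns (projects.user_id, tasks.project_id), and the legacy connection name are all assumptions made for the sketch; substitute whatever your old schema actually uses.

```ruby
# Fragment for a Rails app. Assumed legacy schema: projects.user_id points
# at users.id, and tasks.project_id points at projects.id.
class LegacyUser < ActiveRecord::Base
  establish_connection :legacy
  self.table_name = "users"

  # The default foreign key would be legacy_user_id, so spell it out.
  has_many :projects, class_name: "LegacyProject", foreign_key: "user_id"
end

class LegacyProject < ActiveRecord::Base
  establish_connection :legacy
  self.table_name = "projects"

  # Without class_name, Rails would resolve :user to the new app's User model.
  belongs_to :user, class_name: "LegacyUser", foreign_key: "user_id"
  has_many :tasks, class_name: "LegacyTask", foreign_key: "project_id"
end

class LegacyTask < ActiveRecord::Base
  establish_connection :legacy
  self.table_name = "tasks"

  belongs_to :project, class_name: "LegacyProject", foreign_key: "project_id"
end
```

With these in place, something like `LegacyProject.first.user.email` reads entirely from the legacy database.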
If your new application’s data set will only consist of what you’re moving over from legacy, you can use existing ID values for associations. If not, you’ll need to figure out something else that’s unique and base associations off of that. In my case, projects have unique names; users have unique email addresses. Thus instead of assigning legacy_project.user directly, I’d make the association via new_project.user = User.find_by_email(legacy_project.user.email). Note that User is the model from the new application in this case: I want to find that user and associate them with the new project.
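The idea of matching on a shared unique value instead of an ID can be shown without a database. The sketch below uses plain Structs as stand-ins for the legacy and new models (the sample data is hypothetical), and adds one assumption of its own: emails are compared case-insensitively.

```ruby
# Stand-ins for the real models; in the app these would be ActiveRecord classes.
LegacyUser    = Struct.new(:id, :email)
LegacyProject = Struct.new(:name, :user)
User          = Struct.new(:id, :email)
Project       = Struct.new(:name, :user)

# Users already in the new app, with IDs unrelated to the legacy IDs.
new_users = [User.new(101, "ann@example.com"), User.new(102, "bob@example.com")]

# Index the new users once by the shared unique key (assumption: emails
# should match regardless of case).
users_by_email = new_users.each_with_object({}) do |user, index|
  index[user.email.downcase] = user
end

legacy_owner   = LegacyUser.new(7, "Bob@Example.com")
legacy_project = LegacyProject.new("Website redesign", legacy_owner)

# Resolve the owner by email, never by the legacy ID (7 means nothing here).
owner       = users_by_email.fetch(legacy_project.user.email.downcase)
new_project = Project.new(legacy_project.name, owner)
```

In the real task the lookup is simply the find_by_email call from the text; building a hash up front is an optional way to avoid one query per row.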
If you need to keep your old data’s existing timestamps (and I don’t think it’s a bad idea), use ActiveRecord::Base, as noted by Zach:
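One way to do this, sketched here as a fragment for a migration task, is ActiveRecord's record_timestamps setting. The model names and the name column are assumptions for illustration, and the attributes are set one by one to sidestep any mass-assignment protection on the new model.

```ruby
# Fragment for a Rake task inside a Rails app; LegacyProject, Project, and
# the `name` column are assumed names.
# Stop ActiveRecord from stamping created_at/updated_at automatically.
ActiveRecord::Base.record_timestamps = false

begin
  LegacyProject.find_each do |legacy_project|
    new_project = Project.new
    new_project.name       = legacy_project.name
    new_project.created_at = legacy_project.created_at
    new_project.updated_at = legacy_project.updated_at
    new_project.save!
  end
ensure
  # Restore normal timestamp behaviour even if a record fails to save.
  ActiveRecord::Base.record_timestamps = true
end
```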
Speaking of timestamps, and other data you might have protected behind the likes of attr_accessible in your new application’s models: You’ll need to temporarily comment out this protection during your migrations, or override it. My new application uses a trick shared by Ryan Bates to create a dynamic attr_accessible for each model in my app; I use this to my advantage by including the following in each legacy migration task:
Failing that, the simplest approach may be to comment out your attr_accessible setup. Just don’t forget to uncomment it prior to deployment.
Legacy migrations may often require a lot of extra data manipulation, as you bend old data to work in new models. The process is thus a great opportunity to empty your Ruby toolbox and get practice with both standard library utilities and other gems like the wonderful Chronic natural language parser for time and date. In my case, I had to merge dates and datetimes into new structures; creating and processing timestamps via Chronic turned out to be much more straightforward than using Ruby’s usual date and time-related methods. Check the Ruby Toolbox for other potential time-savers.
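To make the date-merging problem concrete, here is a minimal sketch that combines a date-only column with a separate time-of-day string. It deliberately uses only Ruby's standard library rather than Chronic, and the column names and sample values are hypothetical.

```ruby
require "date"
require "time"

# Assumed legacy columns: a date-only value and a separate time-of-day string.
legacy_due_on   = Date.new(2011, 3, 14)
legacy_due_time = "16:30"

# Time.parse takes any missing date parts from its second argument, so the
# time of day lands on the legacy date (in the local time zone).
midnight = Time.local(legacy_due_on.year, legacy_due_on.month, legacy_due_on.day)
due_at   = Time.parse(legacy_due_time, midnight)
```

This works for tidy input like the above; a natural language parser earns its keep when the old columns hold looser, hand-typed values.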
There’s always a chance that for whatever reason a few records won’t cleanly migrate from your old app to the new one. Rather than tweak your Rake task to handle these unique exceptions, wrap the code that actually creates new database records inside a begin/rescue/end block, log the exception, and deal with outliers individually.
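The pattern looks roughly like this. The rows and the failure condition are stand-ins so the sketch runs on its own; in a real task the line that raises would be the call that saves the new record, and the logger would write to a file.

```ruby
require "logger"
require "stringio"

log_output = StringIO.new          # a real task would log to a file instead
logger     = Logger.new(log_output)

# Stand-in legacy rows; the second one is missing a required name.
legacy_rows = [
  { id: 1, name: "Alpha" },
  { id: 2, name: nil },
  { id: 3, name: "Gamma" }
]

migrated = []
failed   = []

legacy_rows.each do |row|
  begin
    # Stand-in for saving the new record, which raises when the data is invalid.
    raise ArgumentError, "name can't be blank" if row[:name].nil?
    migrated << row[:id]
  rescue StandardError => e
    # Log enough detail to find and fix this row by hand later, then move on.
    logger.error("legacy project #{row[:id]} failed: #{e.class}: #{e.message}")
    failed << row[:id]
  end
end
```

One bad row no longer stops the run, and the log doubles as a to-do list of outliers to handle once the bulk of the data is across.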
Legacy migrations will take a while to run, especially if you’re moving a lot of data or doing a lot of manipulation to it before saving it out to the new database. A general rule of thumb: care more about moving your data reliably than about moving it quickly. As a result, some processes may turn out to be slower than they would otherwise be. Plan ahead. In my case, I know I’ll probably need to dedicate about a day of non-stop processing to get everything from my old application (with a couple hundred thousand database rows) into the new one.
So let’s put together a rough example of how one of these might look. First the Rake task:
And here are a couple of classes used by the Rake task:
It may not be pretty, but as you can see, handling legacy migrations on your own gives you a lot of flexibility—and in the end, isn’t any more difficult than relying on a third party solution. Even if your project only consists of a few tables of data, I strongly recommend using this approach. The keys are to pay attention to the details and to allow plenty of time for both development and processing. If you have additional tips to add, please do so by posting a comment below. Thanks for reading and happy migrations!