Rake task to transfer a Rails database, say from MySQL to Postgres and back again

Posted by face on August 31, 2008


development:
  adapter: mysql
  database: legaltorrents_development
  username: fred
  password: password
  socket: /tmp/mysql.sock

production:
  adapter: postgresql
  database: legaltorrents_production
  username: fred
  password: password
  host: localhost


Thanks to Ruby on Rails, transferring and converting database from one database platform to another only takes a few lines of code. There are rake tasks for dumping to YAML and back. However the existing YAML scripts I found had issues with some of our data and then failed for blobs. This script will only work with a “rails style” database. By “rails style” I mean any database where every table has a unique key named “id”.

A special thanks to Matson Systems, Inc. for having me write this script for LegalTorrents and then contribute it to all under a BSD license.


Be sure to read the warnings in script. Here is a complete example converting a production postgresql database into a development mysql database. Let’s start with config/database.yml found to the left. Now both schemas must be identical. For this example let’s ensure both schemas are at the same migration:
1
2
rake db:migrate
env RAILS_ENV=production rake db:migrate
Ok, let’s get the rake task and run it. Running the rake task may take a long time.
1
2
3
4
5
cd lib/tasks
wget 'http://github.com/face/rails_db_convert_using_adapters/tree/master%2Fconvert.rake?raw=true' -O convert.rake

# and run it
rake db:convert:prod2dev
You should now have a copy of the production data in the development database.


Update Oct 1, 2008:Fixed a bug today for Rails 2.1.1. Also got rid of the hash of data that was a relic from an early version of the script that used a single model object.

Here is the full code to convert.rake:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
#
# Convert/transfer data from production => development.    This facilitates
# a conversion one database adapter type to another (say postgres -> mysql )
#
# WARNING 1: this script deletes all development data and replaces it with
#                     production data
#
# WARNING 2: This script assumes it is the only user updating either database.
#                     Database integrity could be corrupted if other users where 
#                     writing to the databases.
#
# Usage:  rake db:convert:prod2dev
#
# It assumes the development database has a schema identical to the production 
# database, but will delete any data before importing the production data
#
# A couple of the outer loops evolved from 
#    http://snippets.dzone.com/posts/show/3393
#
# For further instructions see 
#    http://myutil.com/2008/8/31/rake-task-transfer-rails-database-mysql-to-postgres
#
# The master repository for this script is at github:
#    http://github.com/face/rails_db_convert_using_adapters/tree/master
#
#
# Author: Rama McIntosh
#         Matson Systems, Inc.
#         http://www.matsonsystems.com
#
# This rake task is released under this BSD license:
#
# Copyright (c) 2008, Matson Systems, Inc. All rights reserved.
# 
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 
# * Redistributions of source code must retain the above copyright
#   notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
#   notice, this list of conditions and the following disclaimer in the
#   documentation and/or other materials provided with the distribution.
# * Neither the name of Matson Systems, Inc. nor the names of its
#   contributors may be used to endorse or promote products derived
#   from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

# PAGE_SIZE is the number of rows updated in a single transaction.
# This facilitates tables where the number of rows exceeds the systems
# memory
PAGE_SIZE=10000

namespace :db do
  namespace :convert do    
    desc 'Convert/import production data to development.   DANGER Deletes all data in the development database.   Assumes both schemas are already migrated.'
    task :prod2dev => :environment do

      # We need unique classes so ActiveRecord can hash different connections
      # We do not want to use the real Model classes because any business
      # rules will likely get in the way of a database transfer
      class ProductionModelClass < ActiveRecord::Base
      end
      class DevelopmentModelClass < ActiveRecord::Base
      end

      skip_tables = ["schema_info", "schema_migrations"]
      ActiveRecord::Base.establish_connection(:production)
      (ActiveRecord::Base.connection.tables - skip_tables).each do |table_name|

        ProductionModelClass.set_table_name(table_name)
        DevelopmentModelClass.set_table_name(table_name)
        DevelopmentModelClass.establish_connection(:development)
        DevelopmentModelClass.reset_column_information
        ProductionModelClass.reset_column_information
        DevelopmentModelClass.record_timestamps = false

        # Page through the data in case the table is too large to fit in RAM
        offset = count = 0;
        print "Converting #{table_name}..."; STDOUT.flush
        # First, delete any old dev data
        DevelopmentModelClass.delete_all
        while ((models = ProductionModelClass.find(:all, 
            :offset=>offset, :limit=>PAGE_SIZE)).size > 0)

          count += models.size
          offset += PAGE_SIZE

          # Now, write out the prod data to the dev db
          DevelopmentModelClass.transaction do
            models.each do |model|
              new_model = DevelopmentModelClass.new(model.attributes)
              new_model.id = model.id
              new_model.save(false)
            end
          end
        end
        print "#{count} records converted\n"
      end
    end
  end
end

Digg! Delicious! Technorati Blinklist Furl Reddit
Comments

Leave a response

  1. FodyFebruary 04, 2009 @ 02:35 AM

    How to run it??

  2. faceFebruary 06, 2009 @ 11:04 AM

    Good point….It was in the github readme but I messed it here. It has been added to the post above ( rake db:convert:prod2dev ).

    Thanks

  3. Ferdie February 16, 2009 @ 10:31 PM

    These work great!

    But do you have sample script rake task, like select table on mysql server then insert to sqlserver.

    Thanks

Comment

Hint: Comments now accept textile.