No Pugs

they're evil

I almost always use postgresql when working on a rails application. I won’t list all the little reasons why, but a major reason is for transactional DDL statements, which means when I run a migration that fails, I don’t have to then go run a bunch of cleanup queries to get my database back to how it was before the migration was ran.

When I was setting up this instance of typo, I decided I’d go ahead and go with mysql since I didn’t plan to hack on typo very much. Long story short: I decided to migrate from mysql to postgresql. This howto was done with mysql 5.0.70, postgresql 8.3.5, and typo 5.1.3 It probably will work with any mysql 5+ and postgresql 8+.

In case anybody else out there might be interested in doing likewise, here’s how I did it. These steps will be for a production database, but the changes required for doing it to a development database should be obvious.

Step 0: Backup your data

You don’t really need to be told this, do you?

Step 1: Dump the data from mysql

run the following to dump the data.

mysqldump --compatible=postgresql --no-create-info -u root -p --skip-extended-insert --complete-insert --skip-opt typo > typo.dump 

We are only dumping the data, hence the –no-create-info option.

Step 2: Create your postgresql database

You can do this however you see fit. I’ve included how I do it in case it’s useful:

CREATE USER typo_prod;
CREATE DATABASE typo_prod OWNER typo_prod ENCODING 'utf8';

\password typo_prod

and enter the password you wish to use.

Step 3: Change your database.yml to use your new postgresql database

Again, do this however you want. Here’s my database.yml with passwords omitted:

defaults: &defaults
  database: typo
  adapter: postgresql
  encoding: utf8
  host: localhost

  username: typo_dev
  database: typo_dev
  <<: *defaults

  username: typo_test
  database: typo_test
  <<: *defaults

  username: typo_prod
  database: typo_prod
  host: salmon
  <<: *defaults

Step 4: Create the schema in your new database

To do this we’ll run the db:migrate rake task

RAILS_ENV="production" rake db:migrate

Step 5: Fire up a rails console to fix stuff

Now we need to fire up a rails console to do a lot of necessary cleanup work before we can import our data

ruby script/console production

Once it’s ready to go, type (or more practically, copy pasta)

conn = ActiveRecord::Base.connection

We’ll need this for a lot of the commands we have yet to run. You’ll keep this console open for the remainder of this howto. Any ruby code you see in this document will go into this console.

Step 6: Remove data created during the migrations

The typo migrations automatically add some default data, like some default pages/articles/blog. All of the data we want is in the dump we created earlier. Let’s delete all this stuff that’s in the way

conn.tables.each do |table|
  conn.execute "delete from #{table}"

Step 7: Temporarily change boolean columns to integers

mysqldump dumps it’s booleans as 0/1. These are interpreted by postgres as integers. It will not automatically cast these into booleans just because the column is boolean (I’m not sure why.) It’s too time consuming to go add casts to all of these 0/1’s, and a regular expression to use with sed would be far too complex to bother with since not all 1’s and 0’s in the dump correspond to boolean data.

So, we will temporarily change the boolean columns in our shiny new database to integers. Before we do this, we need to temporarily drop the defaults for these boolean columns because there won’t be an implicit cast from false/true to 0/1.

This code will build a couple of hashes to store which columns are booleans and what the defaults are.

bools = {}
defaults = {}

conn.tables.each do |table|
  conn.columns(table).each do |col|
    if col.type.to_s == "boolean"
      (bools[table] ||= []) <<
      (defaults[table] ||= {})[] = col.default if !col.default.nil?

here’s the value of bools and defaults in my console after the above code:

{"resources"=>["itunes_metadata", "itunes_explicit"], "contents"=>["published", 
"allow_pings", "allow_comments"], "users"=>["notify_via_email", 
"notify_on_new_articles", "notify_on_comments", "notify_watch_my_articles", 
"notify_via_jabber"], "feedback"=>["published", "status_confirmed"], 
{"contents"=>{"published"=>false}, "feedback"=>{"published"=>false}}

Let’s now temporarily drop the defaults

defaults.each_pair do |table,cols|
  cols.each_key do |col|
    conn.execute "alter table #{table} alter column #{col} DROP DEFAULT"      

Now let’s alter the column types for the columns in bools.

We’ll use a closure to run the alter statements, so that we can use it again later to alter them back to booleans.

change_to_type = proc {|to_type|
  bools.each_pair do |table, cols|
    cols.each do |col|
      conn.execute "alter table #{table} alter column #{col} type #{to_type} 
                                USING (#{col}::#{to_type});"
} :integer

Step 8: Load the data dump into the new database

Ah, finally. Let’s load the data. Back to a shell in a directory with the dump, run:

sed "s/\\\'/\'\'/g" typo.dump | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" | psql -1 typo_prod

Pass whatever options you need to connect to psql as you normally would. The first sed converts all of the \’ to two consecutive ‘s, which is what psql expects. The next two calls to sed in the pipeline replace the escaped carriage returns and newlines with actual carriage returns and newlines, which is again what psql expects.

You may get a couple warnings, but hopefully no errors. The few warnings I received were inconsequential.

Step 9: Change the boolean columns back to boolean and restore the default columns

Back to our rails console. We now have the data in place and can change the columns back using our closure from earlier: :boolean

And then restore the defaults we dropped:

defaults.each_pair do |table, cols|
  cols.each_pair do |col, default|
    conn.execute "alter table #{table} alter column #{col} SET DEFAULT #{default}"

Step 10: Repair the sequences.

Another annoying aspect of postgresql is that inserting a value into a serial column doesn’t automatically advance the sequence to be ready to serve up an unused value. There will be a sequence called “#{table}idseq” for each table with an id column in the database.

We manually have to advance all of the sequences:

conn.tables.each do |table|
  if conn.columns(table).detect{|i| == "id"}
    conn.execute "SELECT setval('#{table}_id_seq', (SELECT max(id) FROM #{table}))"


So that should do it. Restart your mongrel cluster (or whatever you are using to manage your rails server processes) and you should now be using your blog with a postgresql backend!

Published on 01/04/2009 at 11:44PM under , , , .


Powered by Typo – Thème Frédéric de Villamil | Photo Glenn