How to migrate typo from mysql to postgresql
I almost always use postgresql when working on a rails application. I won’t list all the little reasons why, but a major reason is for transactional DDL statements, which means when I run a migration that fails, I don’t have to then go run a bunch of cleanup queries to get my database back to how it was before the migration was ran.
When I was setting up this instance of typo, I decided I’d go ahead and go with mysql since I didn’t plan to hack on typo very much. Long story short: I decided to migrate from mysql to postgresql. This howto was done with mysql 5.0.70, postgresql 8.3.5, and typo 5.1.3 It probably will work with any mysql 5+ and postgresql 8+.
In case anybody else out there might be interested in doing likewise, here’s how I did it. These steps will be for a production database, but the changes required for doing it to a development database should be obvious.
Step 0: Backup your data
You don’t really need to be told this, do you?
Step 1: Dump the data from mysql
run the following to dump the data.
mysqldump --compatible=postgresql --no-create-info -u root -p --skip-extended-insert --complete-insert --skip-opt typo > typo.dump
We are only dumping the data, hence the –no-create-info option.
Step 2: Create your postgresql database
You can do this however you see fit. I’ve included how I do it in case it’s useful:
CREATE USER typo_prod;
CREATE DATABASE typo_prod OWNER typo_prod ENCODING 'utf8';
\password typo_prodand enter the password you wish to use.
Step 3: Change your database.yml to use your new postgresql database
Again, do this however you want. Here’s my database.yml with passwords omitted:
defaults: &defaults
database: typo
adapter: postgresql
encoding: utf8
host: localhost
password:
development:
username: typo_dev
database: typo_dev
<<: *defaults
test:
username: typo_test
database: typo_test
<<: *defaults
production:
username: typo_prod
database: typo_prod
password:
host: salmon
<<: *defaults
Step 4: Create the schema in your new database
To do this we’ll run the db:migrate rake task
RAILS_ENV="production" rake db:migrateStep 5: Fire up a rails console to fix stuff
Now we need to fire up a rails console to do a lot of necessary cleanup work before we can import our data
ruby script/console productionOnce it’s ready to go, type (or more practically, copy pasta)
conn = ActiveRecord::Base.connectionWe’ll need this for a lot of the commands we have yet to run. You’ll keep this console open for the remainder of this howto. Any ruby code you see in this document will go into this console.
Step 6: Remove data created during the migrations
The typo migrations automatically add some default data, like some default pages/articles/blog. All of the data we want is in the dump we created earlier. Let’s delete all this stuff that’s in the way
conn.tables.each do |table|
conn.execute "delete from #{table}"
endStep 7: Temporarily change boolean columns to integers
mysqldump dumps it’s booleans as 0/1. These are interpreted by postgres as integers. It will not automatically cast these into booleans just because the column is boolean (I’m not sure why.) It’s too time consuming to go add casts to all of these 0/1’s, and a regular expression to use with sed would be far too complex to bother with since not all 1’s and 0’s in the dump correspond to boolean data.
So, we will temporarily change the boolean columns in our shiny new database to integers. Before we do this, we need to temporarily drop the defaults for these boolean columns because there won’t be an implicit cast from false/true to 0/1.
This code will build a couple of hashes to store which columns are booleans and what the defaults are.
bools = {}
defaults = {}
conn.tables.each do |table|
conn.columns(table).each do |col|
if col.type.to_s == "boolean"
(bools[table] ||= []) << col.name
(defaults[table] ||= {})[col.name] = col.default if !col.default.nil?
end
end
endhere’s the value of bools and defaults in my console after the above code:
#bools
{"resources"=>["itunes_metadata", "itunes_explicit"], "contents"=>["published",
"allow_pings", "allow_comments"], "users"=>["notify_via_email",
"notify_on_new_articles", "notify_on_comments", "notify_watch_my_articles",
"notify_via_jabber"], "feedback"=>["published", "status_confirmed"],
"categorizations"=>["is_primary"]}
#defaults
{"contents"=>{"published"=>false}, "feedback"=>{"published"=>false}}Let’s now temporarily drop the defaults
defaults.each_pair do |table,cols|
cols.each_key do |col|
conn.execute "alter table #{table} alter column #{col} DROP DEFAULT"
end
endNow let’s alter the column types for the columns in bools.
We’ll use a closure to run the alter statements, so that we can use it again later to alter them back to booleans.
change_to_type = proc {|to_type|
bools.each_pair do |table, cols|
cols.each do |col|
conn.execute "alter table #{table} alter column #{col} type #{to_type}
USING (#{col}::#{to_type});"
end
end
}
change_to_type.call :integer
Step 8: Load the data dump into the new database
Ah, finally. Let’s load the data. Back to a shell in a directory with the dump, run:
sed "s/\\\'/\'\'/g" typo.dump | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" | psql -1 typo_prodPass whatever options you need to connect to psql as you normally would. The first sed converts all of the \’ to two consecutive ‘s, which is what psql expects. The next two calls to sed in the pipeline replace the escaped carriage returns and newlines with actual carriage returns and newlines, which is again what psql expects.
You may get a couple warnings, but hopefully no errors. The few warnings I received were inconsequential.
Step 9: Change the boolean columns back to boolean and restore the default columns
Back to our rails console. We now have the data in place and can change the columns back using our closure from earlier:
change_to_type.call :booleanAnd then restore the defaults we dropped:
defaults.each_pair do |table, cols|
cols.each_pair do |col, default|
conn.execute "alter table #{table} alter column #{col} SET DEFAULT #{default}"
end
endStep 10: Repair the sequences.
Another annoying aspect of postgresql is that inserting a value into a serial column doesn’t automatically advance the sequence to be ready to serve up an unused value. There will be a sequence called “#{table}_id_seq” for each table with an id column in the database.
We manually have to advance all of the sequences:
conn.tables.each do |table|
if conn.columns(table).detect{|i|i.name == "id"}
conn.execute "SELECT setval('#{table}_id_seq', (SELECT max(id) FROM #{table}))"
end
endConclusion
So that should do it. Restart your mongrel cluster (or whatever you are using to manage your rails server processes) and you should now be using your blog with a postgresql backend!
Posted in Ruby, Ruby on Rails, typo | no comments |
externals-tutorial
What is externals and what is it used for?
externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:
You register subprojects with your main project. When you checkout the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main projects and any subprojects from where it’s ran. You commit changes to the the projects all seperately as needed. If somebody else does an update, they will get the changes to the subprojects as well.
For a more detailed explanation of why I started the externals project, please visit http://nopugs.com/why-ext It’s largely a rant about git-submodule.
On with the tutorial
Installation
ext should run on unix-like systems and windows systems. All the unit tests pass on Linux and Windows vista (with cygwin).
First we need to install externals. The first, and easiest, method is to use gem:
gem install extThe other method is to use github:
git clone git://github.com/azimux/externals.git
chmod u+x externals/bin/extIf you install using git clone instead of rubygems, be sure to add the externals/bin directory to your path.
Creating a repository to play around with
I will use git for the main project, and will use git and subversion for the subprojects (the tutorial would be mostly identical if I used svn for the main project, that’s part of the point of ext.)
Now let’s create a repository for use with our project. I like to test out stuff like this in my ~/tmp/ folder.
cd
mkdir tmp
cd tmp
mkdir repo
mkdir work
cd repo
mkdir rails_app.git
cd rails_app.git
git init --bareNow let’s go to our work directory and make a rails app to push to this repository.
cd ../../work/
rails rails_app
cd rails_app
git init
git add .
git commit -m "created fresh rails app"
git remote add origin ../../repo/rails_app.git
git push origin masterIf you’re like me, you consider empty directories in your project’s directory structure to be part of the project. Git will not track empty directories. So, here’s our first use of ext:
ext touch_emptydirs
git add .
git commit -m "touched empty dirs"
git pushThis adds a .emptydir file to every empty directory so that git will track these folders.
Using “ext install” to register subprojects.
Now for our second use of ext. Let’s add the current edge rails to our application:
ext install git://github.com/rails/rails.gitIt should take a moment because rails is a large project.
Now that that’s done, let’s see what “ext install” did.
$ cat .externals
[.]
scm = git
type = rails
[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git.externals is the externals configuration file. This is the file used to keep track of your subprojects. Projects are stored in the form:
[path/to/project]
repository = urlfor://project.repository/url
branch = somebranch
scm = git/svnThe format is very similar to ini format. The section name is the path to the project. The main project’s settings are stored under [.]
Some things to notice: externals was automatically able to figure out that we’re using git for the main project (scm = git under [.]) Also, note that the type of the main project has been detected as rails (type = rails) This means that we can leave the paths off of the repositories in .externals (when using “ext install”) and ext will automatically know where to install stuff (if it’s called rails it goes in vendor/rails otherwise it goes in vendor/plugins/) Let’s make sure it’s there.
$ ls vendor/rails
Rakefile activemodel activesupport pushgems.rb
actionmailer activerecord ci railties
actionpack activeresource doc release.rbThat’s not all, take a look at the ignore file:
$ cat .gitignore
vendor/railsThis makes sense because we don’t want the main repository to track any of the files in the subproject. The files in the subproject are tracked by their own repository, possibly of a different SCM than the main project.
Let’s add some more subprojects: some rails plugins this time. We’ll add a couple that are tracked under subversion and one tracked under git to demnostrate how ext is scm agnostic.
ext install git://github.com/lazyatom/engines -b edge
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/redhillonrails_core
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/foreign_key_migrationslet’s see if our plugins made it
$ du --max-depth=2 -h vendor/plugins/ | grep lib
252K vendor/plugins/foreign_key_migrations/lib
340K vendor/plugins/redhillonrails_core/lib
24K vendor/plugins/engines/liblooks good
$ cat .externals
[.]
scm = git
type = rails
[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git
[vendor/plugins/engines]
path = vendor/plugins/engines
repository = git://github.com/lazyatom/engines
scm = git
branch = edge
[vendor/plugins/redhillonrails_core]
path = vendor/plugins/redhillonrails_core
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/red
hillonrails_core
scm = svn
[vendor/plugins/foreign_key_migrations]
path = vendor/plugins/foreign_key_migrations
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/for
eign_key_migrations
scm = svn…and the ignore file…
$ cat .gitignore
vendor/rails
vendor/plugins/acts_as_list
vendor/plugins/foreign_key_migrations
vendor/plugins/redhillonrails_corealso looks very good!
Something worth noting: if we were using svn for our main project, ext is smart enough to set the ignores using ‘svn propset svn:ignore’ on the appropriate directories.
Let’s now commit and push our work.
git add .
git commit -m "added 4 subprojects"
git pushUsing “ext checkout” and “ext export”
And now let’s delete and check it out again to make sure we get the sub projects
cd ..
rm -rf rails_app
ext checkout ../repo/rails_app.gitIt will take a moment as it clones rails from github again.
Let’s make sure all of the subprojects were checked out properly:
$ cd rails_app
$ du --max-depth=3 -h vendor/ | grep lib
12K vendor/plugins/acts_as_list/lib
66K vendor/plugins/foreign_key_migrations/lib
162K vendor/plugins/redhillonrails_core/lib
382K vendor/rails/actionmailer/lib
1.5M vendor/rails/actionpack/lib
104K vendor/rails/activemodel/lib
791K vendor/rails/activerecord/lib
92K vendor/rails/activeresource/lib
2.4M vendor/rails/activesupport/lib
584K vendor/rails/railties/liblet’s also make sure the engines plugin is on a branch called “edge” (which is tracking the remote repository’s edge branch)
$ cd vendor/plugins
$ git branch -a
* edge
master
origin/HEAD
origin/add_test_for_rake_task_redefinition
origin/edge
origin/master
origin/timestamped_migrationsNotice how the subprojects were automatically fetched. As mentioned in the why ext article, the main project is usually incapable of functioning without it’s subprojects, so it makes sense to fetch the subprojects when we do a checkout or export. (This is what svn checkout does when it checks out a folder that has svn:externals set on it. It fetches the external projects automatically, which is very convenient.)
Note that you can use “ext export” instead of checkout if you don’t want histories to accompany the files. This tells ext to use “svn export” for subversion managed (sub)projects and “git clone --depth 1” for git managed (sub)projects. This can save a lot of time and is useful for deployment.
looks good, let’s go back to the rails_app directory to continue the tutorial
cd ../../../“ext status” propagates through subprojects
Let’s modify a subproject.
echo "lol, internet" >> vendor/plugins/foreign_key_migrations/READMEAnd now let’s check the status
$ ext status
status for .:
# On branch master
nothing to commit (working directory clean)
status for vendor/rails:
# On branch master
nothing to commit (working directory clean)
status for vendor/plugins/acts_as_list:
# On branch master
nothing to commit (working directory clean)
status for vendor/plugins/redhillonrails_core:
status for vendor/plugins/foreign_key_migrations:
M READMEAs expected, foreign_key_migrations has a modified file. This same (very common) task is a bit of a pain in the neck with git-submodule (unless I’m missing something), and impossible in this situation where the subproject is not managed under the same source control system as the main project (as in this example.)
Deployment with capistrano
Most commands also have a short version of the command. The short versions only operate on the subprojects and not the main projects. “ext checkout” or “ext export” fetches the main project and subprojects but “ext co” and “ext ex” (meant to be ran in the working folder of the main project, use --workdir to do it from elsewhere) will fetch all subprojects and doesn’t touch the main project.
If you deploy with capistrano, you can have all your subprojects fetched on deployment by adding the following to your deploy.rb:
task :after_update_code, :roles => :app do
run "ext --workdir #{release_path} ex"
endNotice how I chose to use “ex” instead of “co” This is because I never do work from a deployed project’s working directory, so the history is pointless.
If people find externals usefull, I’d be happy to add a :ext scm type to capistrano so that it runs ext instead of git/svn. Then it would pickup all the subprojects during a deploy without having to supply the above after_update_code task. I could also add a switch to rails “./script/plugin install” (perhaps -X) to tell it to use ext to manage the project (kind of how you can use -x to tell it to use svn:externals.) Though, this isn’t really any easier to make use of than just doing “ext install”
A few other tips
“ext help” will show you all the available commands. Also, feel free to manage the .externals file manually if you wish.
Conclusion
For issue tracking, at the moment I’m using lighthouseapp. Report bugs to http://ext.lighthouseapp.com/
I also have a rubyforge account for this project at http://rubyforge.org/projects/ext/ if you would prefer to submit bugs/feature requests via rubyforge’s tracking system. I’ve used both sites but never managed a project with either, so I don’t know which is better. Rubyforge seems to be more feature complete.
Externals is my first attempt at contributing a useful open source project to the community. If you have some tips for me in this regard, please feel free to share them.
Cheers!
Posted in externals, Ruby, Ruby on Rails | no comments |
Why externals?
Externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:
You register subprojects with your main project. When you checkout the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main projects and any subprojects from where it’s ran. You commit changes to the the projects all seperately as needed. If somebody else does an update, they will get the changes to the subprojects as well.
Probably like you, I’ve started using git for some projects/plugins. Git has a feature called git-submodule that is supposed to work similar to svn:externals. Git-submodule’s annoyances are what inspired me to start the externals project, and here are some of the problems I have with git-submodule’s workflow:
- When you clone/pull an existing project, you have to do git-submodule init/update manually to get the subprojects (or to update them.) You may be thinking this isn’t a big deal, but it’s an extra step. The main project is almost never functional without the subprojects so why would you ever want to pull in only the main project? Having to run extra steps every time is annoying.
- With git-submodule, the subprojects are pulled in at a specific commit, not at a branch tip. This is not a commit at some point in some branch’s history. It’s a commit with the working directory completely disconnected from any branch. If you want to make any edits to a subproject, you first have to checkout the branch you want to work with. This is extremely annoying. When I start adding a new feature to the main project, I can’t predict which subprojects may need to be modified along the way. Should I go to them all manually and do a checkout? If not, this usually means that as I’m stepping through the debugger I have to keep track of what subprojects I’m about to change, stop what I’m doing and go to them and checkout a branch. Not only is this disruptive to my workflow, on more than one occasion I have forgot to checkout a branch before making edits, and wound up issuing the wrong commands to do the checkout once I realized I was detatched. The result was I wiped out all of my changes in the subproject. This can be extremely irritating.
- Status doesn’t propagate through the subprojects. This means that when it’s time to make some commits, you have to go to every single subproject and do a ‘git status’ because you aren’t 100% sure which ones you’ve made changes too.
- Because of #2, when you do make changes to a subproject, you have to remember to do a ‘git add path/to/subproject’ so that the new commit is pointed to by the main project.
However, even if git-submodule was as useful to me as svn:externals, I would still see a need for ext: when you have a project with subprojects managed by different SCMs. What I’ve been doing is if the main project is git, all subprojects of type git I would manage with git-submodule, all subprojects using subversion I would simply commit the whole subversion work directories into my git repository. When the main project is subversion I’ve been using svn:externals for managing subprojects that use subversion, and checking whole git repositories into my main project’s repository. Now with externals, I can have a uniform workflow regardless of SCM combination.
For info on how to use externals, please see: http://nopugs.com/ext-tutorial
Posted in externals, Ruby, Ruby on Rails | no comments |