No Pugs

they're evil

How to migrate typo from mysql to postgresql

Posted by miles Sun, 04 Jan 2009 11:16:00 GMT

I almost always use postgresql when working on a rails application. I won’t list all the little reasons why, but a major reason is for transactional DDL statements, which means when I run a migration that fails, I don’t have to then go run a bunch of cleanup queries to get my database back to how it was before the migration was ran.

When I was setting up this instance of typo, I decided I’d go ahead and go with mysql since I didn’t plan to hack on typo very much. Long story short: I decided to migrate from mysql to postgresql. This howto was done with mysql 5.0.70, postgresql 8.3.5, and typo 5.3.1. It probably will work with any mysql 5+ and postgresql 8+.

In case anybody else out there might be interested in doing likewise. These steps will be for a production database, but the changes required for doing it to a development database should be obvious.

here’s how I did it:

Step 0: backup your data

You don’t really need to be told this, do you?

Step 1: Dump the data from mysql

run the following to dump the data.


mysqldump --compatible=postgresql --no-create-info -u root -p --skip-extended-insert --complete-insert --skip-opt typo > typo.dump 

We are only dumping the data, hence the –no-create-info option.

Step 2: Create your postgresql database

You can do this however you see fit. I’ve included how I do it in case it’s useful:

CREATE USER typo_prod;
CREATE DATABASE typo_prod OWNER typo_prod ENCODING 'utf8';

\password typo_prod

and enter the password you wish to use.

Step 3: Change your database.yml to use your new postgresql database

Again, do this however you want. Here’s my database.yml with passwords omitted:

defaults: &defaults
  database: typo
  adapter: postgresql
  encoding: utf8
  host: localhost
  password: 

development:
  username: typo_dev
  database: typo_dev
  <<: *defaults

test:
  username: typo_test
  database: typo_test
  <<: *defaults

production:
  username: typo_prod
  database: typo_prod
  password: 
  host: salmon
  <<: *defaults

Step 4: Create the schema in your new database

To do this we’ll run the db:migrate rake task

RAILS_ENV="production" rake db:migrate

Step 5: Fire up a rails console to fix stuff

Now we need to fire up a rails console to do a lot of necessary cleanup work before we can import our data

ruby script/console production

Once it’s ready to go, type (or more practically, copy pasta)

conn = ActiveRecord::Base.connection

We’ll need this for a lot of the commands we have yet to run. You’ll keep this console open for the remainder of this howto. Any ruby code you see in this document will go into this console.

Step 6: Remove data created during the migrations

The typo migrations automatically add some default data, like some default pages/articles/blog. All of the data we want is in the dump we created earlier. Let’s delete all this stuff that’s in the way

conn.tables.each do |table|
  conn.execute "delete from #{table}"
end

Step 7: Temporarily change boolean columns to integers

mysqldump dumps it’s booleans as 0/1. These are interpreted by postgres as integers. It will not automatically cast these into booleans just because the column is boolean (I’m not sure why.) It’s too time consuming to go add casts to all of these 0/1’s, and a regular expression to use with sed would be far too complex to bother with since not all 1’s and 0’s in the dump correspond to boolean data.

So, we will temporarily change the boolean columns in our shiny new database to integers. Before we do this, we need to temporarily drop the defaults because there won’t be an implicit cast from false/true to 0/1.

Here’s a hash of table => columns that are of boolean data type.

tc = { :categorizations => %w(is_primary),
:contents => %w(published allow_pings allow_comments),
:feedback => %w(published status_confirmed),
:resources => %w(itunes_metadata itunes_explicit),
:users => %w(via_email on_new_articles on_comments 
                        watch_my_articles via_jabber).map {|i|"notify_#{i}"}
}

This hash can be built programmatically as well, but this should be sufficient. If any of these differ from the actual boolean columns present in your instance of typo you’ll have to adjust this hash (pretty unlikely) or build it programmatically.

This code will build a hash of the defaults and drop them. The only column with a default should be contents.published. Still, this code should work if any others have defaults (If I had known it was only one column I wouldn’t have bothered with this code, although this code is what revealed to me that only one had a default.)

defaults = {}

tc.each_pair do |table, cols|
  cols.each do |col|
    default = conn.columns(table).select{|c|c.name == col.to_s}[0].default
    if !default.nil?
      (defaults[table] ||= {})[col] = default
      conn.execute "alter table #{table} alter column #{col} DROP DEFAULT"      
    end
  end
end

We’ll use a closure to run the alter statements, so that we can use it again later to alter them back to booleans.

change_to_type = proc {|to_type|
  tc.each_pair do |table, cols|
    cols.each do |col|
      conn.execute "alter table #{table} alter column #{col} type #{to_type} 
                                USING (#{col}::#{to_type});"
    end
  end
}

change_to_type.call :integer

Step 8: Load the data dump into the new database

Ah, finally. Let’s load the data. Back to a shell in a directory with the dump, run:

sed "s/\\\'/\'\'/g" typo.dump | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" | psql -1 typo_prod

Pass whatever options you need to connect to psql as you normally would. The first sed converts all of the \’ to two consecutive ‘s, which is what psql expects. The next two calls to sed in the pipeline replace the escaped carriage returns and newlines with actual carriage returns and newlines, which is again what psql expects.

Step 9: Change to boolean columns back to boolean and reset the default columns

Back to our rails console. We now have the data in place and can change the columns back using our closure from earlier:

change_to_type.call :boolean

And then restore the defaults we dropped:

defaults.each_pair do |table, cols|
  cols.each_pair do |col, default|
    conn.execute "alter table #{table} alter column #{col} SET DEFAULT #{default}"
  end
end

Step 10: Repair the sequences.

Another annoying aspect of postgresql is that inserting a value into a serial column doesn’t automatically advance the sequence to be ready to serve up an unused value. We manually have to advance all of the sequences:

conn.tables.each do |table|
  #check if it has an id column
  if conn.columns(table).detect{|i|i.name == "id"}
    conn.execute "SELECT setval('#{table}_id_seq', (SELECT max(id) FROM #{table}))"
  end
end

Conclusion

So that should do it. Restart your mongrel cluster (or whatever you are using to manage your rails server processes) and you should now be using your blog with a postgresql backend.

Posted in , , | no comments |

how to debug a Ruby on Rails script in NetBeans

Posted by miles Fri, 12 Dec 2008 10:12:00 GMT

So you’ve written some helper script or possibly a script that’s ran by cron to do some background work on your site (updating full text indexes, sending out notification emails, generating reports, etc) but you can’t find a way to debug it in rails so that it hits your breakpoints. Annoying.

What I did to solve this was first to create a rake task that creates other rake tasks based on script name. This way you can debug it the way you would debug any rake task (and it’s also convenient to be able to execute scripts from the rake context menu anyways.) This allows you to run any script by right clicking on the project in Netbeans, and going to “Run/Debug Rake Task->script->your_killer_script.rb”

Place this in a file called scripts.rake and place it in your lib/tasks folder

require 'find'

namespace :scripts do
  Find.find("#{RAILS_ROOT}/script/") do |p|
    if File.file?(p) && p !~ /(\.svn-base|\.netbeans-base)$/
      desc "Run #{File.basename(p)}"
      task File.basename(p, "*") => :environment do
        load p
      end
    end
  end
end

Then, right click on your project and hit “Run Rake Task->Refresh List”

You should now be able to right click on your project and hit “Debug Task Rake Task->script->your_killer_script.rb”

It should hit any breakpoints you have set. Happy debugging!

Posted in , | no comments |

How to debug an individual rails test in Netbeans

Posted by miles Fri, 12 Dec 2008 10:00:00 GMT

In Netbeans, to test a Ruby application normally I right click on the project and go to “Run Rake Task -> test”

A problem arises when I try to debug a test. None of the breakpoints get hit. I think this is because a new process is spawned off to actually run the tests and the debugger is attached to the parent process. So you can really only hit breakpoints involved in spawning the tests but none in the tests or in any of your application code called by the tests.

Opening an individual test and right clicking in the buffer and hitting “Debug your_mom_test.rb” seems to fail for me with rails 2.1. I was able to correct this by changing the line at the top of the test from

require 'test_helper'

to

$:.unshift File.join(File.dirname(__FILE__), '..', 'lib') if $0 == __FILE__
require File.dirname(__FILE__) + '/../test_helper'

Enjoy hitting your breakpoints while testing!

Posted in , | no comments |

externals-tutorial

Posted by miles Sun, 07 Sep 2008 06:58:00 GMT

What is externals and what is it used for?

externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:

You register subprojects with your main project. When you checkout the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main projects and any subprojects from where it’s ran. You commit changes to the the projects all seperately as needed. If somebody else does an update, they will get the changes to the subprojects as well.

For a more detailed explanation of why I started the externals project, please visit http://nopugs.com/why-ext It’s largely a rant about git-submodule.

On with the tutorial

Installation

ext should run on unix-like systems and windows systems. All the unit tests pass on Linux and Windows vista (with cygwin).

First we need to install externals. The first, and easiest, method is to use gem:

gem install ext

The other method is to use github:

git clone git://github.com/azimux/externals.git
chmod u+x externals/bin/ext

If you install using git clone instead of rubygems, be sure to add the externals/bin directory to your path.

Creating a repository to play around with

I will use git for the main project, and will use git and subversion for the subprojects (the tutorial would be mostly identical if I used svn for the main project, that’s part of the point of ext.)

Now let’s create a repository for use with our project. I like to test out stuff like this in my ~/tmp/ folder.

cd
mkdir tmp
cd tmp

mkdir repo
mkdir work

cd repo
mkdir rails_app.git
cd rails_app.git
git init --bare

Now let’s go to our work directory and make a rails app to push to this repository.

cd ../../work/
rails rails_app
cd rails_app
git init
git add . 
git commit -m "created fresh rails app"
git remote add origin ../../repo/rails_app.git 
git push origin master

If you’re like me, you consider empty directories in your project’s directory structure to be part of the project. Git will not track empty directories. So, here’s our first use of ext:

ext touch_emptydirs
git add .
git commit -m "touched empty dirs"
git push

This adds a .emptydir file to every empty directory so that git will track these folders.

Using “ext install” to register subprojects.

Now for our second use of ext. Let’s add the current edge rails to our application:

ext install git://github.com/rails/rails.git

It should take a moment because rails is a large project.

Now that that’s done, let’s see what “ext install” did.

$ cat .externals 
[.]
scm = git
type = rails

[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git

.externals is the externals configuration file. This is the file used to keep track of your subprojects. Projects are stored in the form:

[path/to/project]
repository = urlfor://project.repository/url
branch = somebranch
scm = git/svn

The format is very similar to ini format. The section name is the path to the project. The main project’s settings are stored under [.]

Some things to notice: externals was automatically able to figure out that we’re using git for the main project (scm = git under [.]) Also, note that the type of the main project has been detected as rails (type = rails) This means that we can leave the paths off of the repositories in .externals (when using “ext install”) and ext will automatically know where to install stuff (if it’s called rails it goes in vendor/rails otherwise it goes in vendor/plugins/) Let’s make sure it’s there.

$ ls vendor/rails
Rakefile      activemodel     activesupport  pushgems.rb
actionmailer  activerecord    ci             railties
actionpack    activeresource  doc            release.rb

That’s not all, take a look at the ignore file:

$ cat .gitignore
vendor/rails

This makes sense because we don’t want the main repository to track any of the files in the subproject. The files in the subproject are tracked by their own repository, possibly of a different SCM than the main project.

Let’s add some more subprojects: some rails plugins this time. We’ll add a couple that are tracked under subversion and one tracked under git to demnostrate how ext is scm agnostic.

ext install git://github.com/lazyatom/engines -b edge
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/redhillonrails_core
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/foreign_key_migrations

let’s see if our plugins made it

$ du --max-depth=2 -h vendor/plugins/ | grep lib
252K    vendor/plugins/foreign_key_migrations/lib
340K    vendor/plugins/redhillonrails_core/lib
24K vendor/plugins/engines/lib

looks good

$ cat .externals 
[.]
scm = git
type = rails

[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git

[vendor/plugins/engines]
path = vendor/plugins/engines
repository = git://github.com/lazyatom/engines
scm = git
branch = edge

[vendor/plugins/redhillonrails_core]
path = vendor/plugins/redhillonrails_core
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/red
hillonrails_core
scm = svn

[vendor/plugins/foreign_key_migrations]
path = vendor/plugins/foreign_key_migrations
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/for
eign_key_migrations
scm = svn

…and the ignore file…

$ cat .gitignore 
vendor/rails
vendor/plugins/acts_as_list
vendor/plugins/foreign_key_migrations
vendor/plugins/redhillonrails_core

also looks very good!

Something worth noting: if we were using svn for our main project, ext is smart enough to set the ignores using ‘svn propset svn:ignore’ on the appropriate directories.

Let’s now commit and push our work.

git add .
git commit -m "added 4 subprojects"
git push

Using “ext checkout” and “ext export”

And now let’s delete and check it out again to make sure we get the sub projects

cd ..
rm -rf rails_app
ext checkout ../repo/rails_app.git

It will take a moment as it clones rails from github again.

Let’s make sure all of the subprojects were checked out properly:

$ cd rails_app
$ du --max-depth=3 -h vendor/ | grep lib
12K     vendor/plugins/acts_as_list/lib
66K     vendor/plugins/foreign_key_migrations/lib
162K    vendor/plugins/redhillonrails_core/lib
382K    vendor/rails/actionmailer/lib
1.5M    vendor/rails/actionpack/lib
104K    vendor/rails/activemodel/lib
791K    vendor/rails/activerecord/lib
92K     vendor/rails/activeresource/lib
2.4M    vendor/rails/activesupport/lib
584K    vendor/rails/railties/lib

let’s also make sure the engines plugin is on a branch called “edge” (which is tracking the remote repository’s edge branch)

$ cd vendor/plugins
$ git branch -a
* edge
  master
  origin/HEAD
  origin/add_test_for_rake_task_redefinition
  origin/edge
  origin/master
  origin/timestamped_migrations

Notice how the subprojects were automatically fetched. As mentioned in the why ext article, the main project is usually incapable of functioning without it’s subprojects, so it makes sense to fetch the subprojects when we do a checkout or export. (This is what svn checkout does when it checks out a folder that has svn:externals set on it. It fetches the external projects automatically, which is very convenient.)

Note that you can use “ext export” instead of checkout if you don’t want histories to accompany the files. This tells ext to use “svn export” for subversion managed (sub)projects and “git clone --depth 1” for git managed (sub)projects. This can save a lot of time and is useful for deployment.

looks good, let’s go back to the rails_app directory to continue the tutorial

cd ../../../

“ext status” propagates through subprojects

Let’s modify a subproject.

echo "lol, internet" >> vendor/plugins/foreign_key_migrations/README

And now let’s check the status

$ ext status
status for .:
# On branch master
nothing to commit (working directory clean)

status for vendor/rails:
# On branch master
nothing to commit (working directory clean)

status for vendor/plugins/acts_as_list:
# On branch master
nothing to commit (working directory clean)

status for vendor/plugins/redhillonrails_core:


status for vendor/plugins/foreign_key_migrations:
M      README

As expected, foreign_key_migrations has a modified file. This same (very common) task is a bit of a pain in the neck with git-submodule (unless I’m missing something), and impossible in this situation where the subproject is not managed under the same source control system as the main project (as in this example.)

Deployment with capistrano

Most commands also have a short version of the command. The short versions only operate on the subprojects and not the main projects. “ext checkout” or “ext export” fetches the main project and subprojects but “ext co” and “ext ex” (meant to be ran in the working folder of the main project, use --workdir to do it from elsewhere) will fetch all subprojects and doesn’t touch the main project.

If you deploy with capistrano, you can have all your subprojects fetched on deployment by adding the following to your deploy.rb:

task :after_update_code, :roles => :app do
  run "ext --workdir #{release_path} ex"
end

Notice how I chose to use “ex” instead of “co” This is because I never do work from a deployed project’s working directory, so the history is pointless.

If people find externals usefull, I’d be happy to add a :ext scm type to capistrano so that it runs ext instead of git/svn. Then it would pickup all the subprojects during a deploy without having to supply the above after_update_code task. I could also add a switch to rails “./script/plugin install” (perhaps -X) to tell it to use ext to manage the project (kind of how you can use -x to tell it to use svn:externals.) Though, this isn’t really any easier to make use of than just doing “ext install”

A few other tips

“ext help” will show you all the available commands. Also, feel free to manage the .externals file manually if you wish.

Conclusion

For issue tracking, at the moment I’m using lighthouseapp. Report bugs to http://ext.lighthouseapp.com/

I also have a rubyforge account for this project at http://rubyforge.org/projects/ext/ if you would prefer to submit bugs/feature requests via rubyforge’s tracking system. I’ve used both sites but never managed a project with either, so I don’t know which is better. Rubyforge seems to be more feature complete.

Externals is my first attempt at contributing a useful open source project to the community. If you have some tips for me in this regard, please feel free to share them.

Cheers!


Posted in , , | no comments |

Why externals?

Posted by miles Fri, 05 Sep 2008 02:09:00 GMT

Externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:

You register subprojects with your main project. When you checkout the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main projects and any subprojects from where it’s ran. You commit changes to the the projects all seperately as needed. If somebody else does an update, they will get the changes to the subprojects as well.

Probably like you, I’ve started using git for some projects/plugins. Git has a feature called git-submodule that is supposed to work similar to svn:externals. Git-submodule’s annoyances are what inspired me to start the externals project, and here are some of the problems I have with git-submodule’s workflow:

  • When you clone/pull an existing project, you have to do git-submodule init/update manually to get the subprojects (or to update them.) You may be thinking this isn’t a big deal, but it’s an extra step. The main project is almost never functional without the subprojects so why would you ever want to pull in only the main project? Having to run extra steps every time is annoying.
  • With git-submodule, the subprojects are pulled in at a specific commit, not at a branch tip. This is not a commit at some point in some branch’s history. It’s a commit with the working directory completely disconnected from any branch. If you want to make any edits to a subproject, you first have to checkout the branch you want to work with. This is extremely annoying. When I start adding a new feature to the main project, I can’t predict which subprojects may need to be modified along the way. Should I go to them all manually and do a checkout? If not, this usually means that as I’m stepping through the debugger I have to keep track of what subprojects I’m about to change, stop what I’m doing and go to them and checkout a branch. Not only is this disruptive to my workflow, on more than one occasion I have forgot to checkout a branch before making edits, and wound up issuing the wrong commands to do the checkout once I realized I was detatched. The result was I wiped out all of my changes in the subproject. This can be extremely irritating.
  • Status doesn’t propagate through the subprojects. This means that when it’s time to make some commits, you have to go to every single subproject and do a ‘git status’ because you aren’t 100% sure which ones you’ve made changes too.
  • Because of #2, when you do make changes to a subproject, you have to remember to do a ‘git add path/to/subproject’ so that the new commit is pointed to by the main project.

However, even if git-submodule was as useful to me as svn:externals, I would still see a need for ext: when you have a project with subprojects managed by different SCMs. What I’ve been doing is if the main project is git, all subprojects of type git I would manage with git-submodule, all subprojects using subversion I would simply commit the whole subversion work directories into my git repository. When the main project is subversion I’ve been using svn:externals for managing subprojects that use subversion, and checking whole git repositories into my main project’s repository. Now with externals, I can have a uniform workflow regardless of SCM combination.

For info on how to use externals, please see: http://nopugs.com/ext-tutorial


Posted in , , | no comments |

How to get timestamps in your ruby on rails logs

Posted by miles Fri, 11 Jul 2008 01:49:00 GMT

For some reason, BufferedLogger is the default logger in rubyonrails. This logger at this time has no way to customize it’s format, and what’s worse, is it’s format is practically unusable.

At some point most of us are going to encounter a situation where knowing when something was logged is nearly essential.

So, after fighting with rails for a while, here is how I got timestamps in my logs using rails 2.1.0

It’s worth pointing out that I made several errors while trying to do this, however the behavior of rails with a misconfigured logger was completely unhelpful in 90% of the situations I found myself in. So hopefully this can save somebody some time.

In environment.rb, inside the Initializer.run block:

  config.logger = Logger.new(File.dirname(__FILE__) + "/../log/#{RAILS_ENV}.log") 
  config.logger.formatter = Logger::Formatter.new

Then in development.rb, test.rb and production.rb:

config.logger.level = Logger::DEBUG

You can set this to whatever logging level is appropriate for the given environment. I personally use Logger::WARN for production and Logger::DEBUG for everything else.

Now, Logger::Formatter cannot be customized but at least it gives you a timestamp. This was good enough for me and if you’re satisfied then stop here.

If you need further customization, you can write your own formatter class. The interface is pretty straight forward, you simply needs to implement:

def call(severity, time, progname, msg) 
   #your code goes here that builds the entry you want to see in the logfile
end

Posted in | no comments |