DISTINCT/:select is wiped out/overwritten by COUNT when using .size on an association proxy... with rails + PostgreSQL
If you create a named scope that uses :select, and then call “.size” or “.count” on the resulting proxy served by a call to the scoped method, and if that :select option sets DISTINCT, then it will be ignored because rails ignores the old :select and creates a new one of “COUNT(*) AS count_all”
Here is a made up example:
class Book < ActiveRecord::Base
named_scope :on_a_favorite_list,
:select => "DISTINCT ON (books.id) books.*",
:joins => "INNER JOIN favorites_list_entries
ON favorites_list_entries.book_id = books.id"
endNow let’s pretend that there is only one book on anybody’s favorites_list, but that two people have that book on their list.
Then this:
Book.on_a_favorite_list.map(&:id).sizewould return 1, because size would be called on an array that only has 1 integer in it.
but…
Book.on_a_favorite_list.sizeWould return 2! Why? Because the proxy object (remember, Book.on_a_favorite_list is NOT an array of books… it’s a proxy object that will fetch the books if needed) realizes it needs to create a COUNT query (it’s not going to actually fetch the books) and it does this by overriding the :select option from “DISTINCT ON (books.id) books.*” with “COUNT(*) as count_all”. So DISTINCT has been lost. Now the duplicate row appears to the select statement and 2 is returned as count_all to rails .size method.
One way around this is to pass the thing being counted, including the DISTINCT keyword, to size…
Book.on_a_favorite_list.size("DISTINCT books.id")This will return the expected result of 1.
Posted in Ruby on Rails, PostgreSQL | no comments |
Fix for long index names in rails using postgresql (fix for "Input string is longer than NAMEDATALEN-1 (63)")
If you are getting this when running rails migrations using a postgresql adapter:
Input string is longer than NAMEDATALEN-1 (63)
Then this tiny plugin can solve this problem. Index names longer than 63 characters work fine with some postgresql adapters and not others. I’m not sure why. This adds a method to ActiveRecord::ConnectionAdapters::PostgreSQLAdapter that keeps the index_names under the limit.
here’s the entire piece of code:
ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.class_eval do
#There are PostgreSQL drivers/versions of drivers that truncate
#index names to 63 characters. Since some do not, this override makes
#sure that the truncate occurs regardless of PostgreSQL driver.
def index_name(table, columns)
super(table, columns)[0..62]
end
endTo install this plugin using ext, you can use:
ext install git://github.com/azimux/ax_fix_long_psql_index_names.git
or, using rake plugin:
./script/plugin install git://github.com/azimux/ax_fix_long_psql_index_names.git
Posted in Ruby on Rails, PostgreSQL | 2 comments |
How to migrate typo from mysql to postgresql
I almost always use postgresql when working on a rails application. I won’t list all the little reasons why, but a major reason is for transactional DDL statements, which means when I run a migration that fails, I don’t have to then go run a bunch of cleanup queries to get my database back to how it was before the migration was ran.
When I was setting up this instance of typo, I decided I’d go ahead and go with mysql since I didn’t plan to hack on typo very much. Long story short: I decided to migrate from mysql to postgresql. This howto was done with mysql 5.0.70, postgresql 8.3.5, and typo 5.1.3 It probably will work with any mysql 5+ and postgresql 8+.
In case anybody else out there might be interested in doing likewise, here’s how I did it. These steps will be for a production database, but the changes required for doing it to a development database should be obvious.
Step 0: Backup your data
You don’t really need to be told this, do you?
Step 1: Dump the data from mysql
run the following to dump the data.
mysqldump --compatible=postgresql --no-create-info -u root -p --skip-extended-insert --complete-insert --skip-opt typo > typo.dump
We are only dumping the data, hence the –no-create-info option.
Step 2: Create your postgresql database
You can do this however you see fit. I’ve included how I do it in case it’s useful:
CREATE USER typo_prod;
CREATE DATABASE typo_prod OWNER typo_prod ENCODING 'utf8';
\password typo_prodand enter the password you wish to use.
Step 3: Change your database.yml to use your new postgresql database
Again, do this however you want. Here’s my database.yml with passwords omitted:
defaults: &defaults
database: typo
adapter: postgresql
encoding: utf8
host: localhost
password:
development:
username: typo_dev
database: typo_dev
<<: *defaults
test:
username: typo_test
database: typo_test
<<: *defaults
production:
username: typo_prod
database: typo_prod
password:
host: salmon
<<: *defaults
Step 4: Create the schema in your new database
To do this we’ll run the db:migrate rake task
RAILS_ENV="production" rake db:migrate
Step 5: Fire up a rails console to fix stuff
Now we need to fire up a rails console to do a lot of necessary cleanup work before we can import our data
ruby script/console production
Once it’s ready to go, type (or more practically, copy pasta)
conn = ActiveRecord::Base.connectionWe’ll need this for a lot of the commands we have yet to run. You’ll keep this console open for the remainder of this howto. Any ruby code you see in this document will go into this console.
Step 6: Remove data created during the migrations
The typo migrations automatically add some default data, like some default pages/articles/blog. All of the data we want is in the dump we created earlier. Let’s delete all this stuff that’s in the way
conn.tables.each do |table|
conn.execute "delete from #{table}"
endStep 7: Temporarily change boolean columns to integers
mysqldump dumps it’s booleans as 0/1. These are interpreted by postgres as integers. It will not automatically cast these into booleans just because the column is boolean (I’m not sure why.) It’s too time consuming to go add casts to all of these 0/1’s, and a regular expression to use with sed would be far too complex to bother with since not all 1’s and 0’s in the dump correspond to boolean data.
So, we will temporarily change the boolean columns in our shiny new database to integers. Before we do this, we need to temporarily drop the defaults for these boolean columns because there won’t be an implicit cast from false/true to 0/1.
This code will build a couple of hashes to store which columns are booleans and what the defaults are.
bools = {}
defaults = {}
conn.tables.each do |table|
conn.columns(table).each do |col|
if col.type.to_s == "boolean"
(bools[table] ||= []) << col.name
(defaults[table] ||= {})[col.name] = col.default if !col.default.nil?
end
end
endhere’s the value of bools and defaults in my console after the above code:
#bools
{"resources"=>["itunes_metadata", "itunes_explicit"], "contents"=>["published",
"allow_pings", "allow_comments"], "users"=>["notify_via_email",
"notify_on_new_articles", "notify_on_comments", "notify_watch_my_articles",
"notify_via_jabber"], "feedback"=>["published", "status_confirmed"],
"categorizations"=>["is_primary"]}
#defaults
{"contents"=>{"published"=>false}, "feedback"=>{"published"=>false}}Let’s now temporarily drop the defaults
defaults.each_pair do |table,cols|
cols.each_key do |col|
conn.execute "alter table #{table} alter column #{col} DROP DEFAULT"
end
endNow let’s alter the column types for the columns in bools.
We’ll use a closure to run the alter statements, so that we can use it again later to alter them back to booleans.
change_to_type = proc {|to_type|
bools.each_pair do |table, cols|
cols.each do |col|
conn.execute "alter table #{table} alter column #{col} type #{to_type}
USING (#{col}::#{to_type});"
end
end
}
change_to_type.call :integer
Step 8: Load the data dump into the new database
Ah, finally. Let’s load the data. Back to a shell in a directory with the dump, run:
sed "s/\\\'/\'\'/g" typo.dump | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" | psql -1 typo_prod
Pass whatever options you need to connect to psql as you normally would. The first sed converts all of the \’ to two consecutive ‘s, which is what psql expects. The next two calls to sed in the pipeline replace the escaped carriage returns and newlines with actual carriage returns and newlines, which is again what psql expects.
You may get a couple warnings, but hopefully no errors. The few warnings I received were inconsequential.
Step 9: Change the boolean columns back to boolean and restore the default columns
Back to our rails console. We now have the data in place and can change the columns back using our closure from earlier:
change_to_type.call :booleanAnd then restore the defaults we dropped:
defaults.each_pair do |table, cols|
cols.each_pair do |col, default|
conn.execute "alter table #{table} alter column #{col} SET DEFAULT #{default}"
end
endStep 10: Repair the sequences.
Another annoying aspect of postgresql is that inserting a value into a serial column doesn’t automatically advance the sequence to be ready to serve up an unused value. There will be a sequence called “#{table}idseq” for each table with an id column in the database.
We manually have to advance all of the sequences:
conn.tables.each do |table|
if conn.columns(table).detect{|i|i.name == "id"}
conn.execute "SELECT setval('#{table}_id_seq', (SELECT max(id) FROM #{table}))"
end
endConclusion
So that should do it. Restart your mongrel cluster (or whatever you are using to manage your rails server processes) and you should now be using your blog with a postgresql backend!
Posted in Ruby, Ruby on Rails, typo, PostgreSQL | 2 comments |