No Pugs

they're evil

ruby oddity with creating local variables dynamically using eval

It seems that you cannot create a local variable dynamically in ruby using eval with a binding.

Here’s an example from an IRB session:

irb(main):001:0> def make_a(b)
irb(main):002:1>   eval "a = 'this is a!'", b
irb(main):003:1> end
=> nil
irb(main):004:0> a
NameError: undefined local variable or method `a' for main:Object
        from (irb):4
        from :0
irb(main):005:0> make_a(binding)
=> "this is a!"
irb(main):006:0> a
=> "this is a!"
irb(main):007:0> def test_a
irb(main):008:1>   a = 10
irb(main):009:1>   make_a(binding)
irb(main):010:1>   puts a
irb(main):011:1> end
=> nil
irb(main):012:0> test_a
this is a!
=> nil
irb(main):013:0> def test_a
irb(main):014:1>   make_a(binding)
irb(main):015:1>   puts a
irb(main):016:1> end
=> nil
irb(main):017:0> test_a
NameError: undefined local variable or method `a' for main:Object
        from (irb):15:in `test_a'
        from (irb):17
        from :0
irb(main):018:0>

The last NameError in that IRB session makes no sense to me. I have tried this in Ruby 1.8.6 and in Ruby 1.9.1 and both give the same result.

Can somebody explain what is going on here? Are there any known workarounds? I was trying to DRY up some particularly repetitive code, part of which involves initializing a local variable, when I stumbled upon this. I’m having trouble finding blog posts that explain this unexpected behavior, and none of the official documentation I’ve looked at for Binding and eval points out that you cannot declare a local variable in this manner.

Here’s a stripped down version of the code if you want to put it in your own IRB session and play with it:


 def make_a(b)
   eval "a = 'this is a!'", b
 end
 a
 make_a(binding)
 a
 def test_a
   a = 10
   make_a(binding)
   puts a
 end
 test_a
 def test_a
   make_a(binding)
   puts a
 end
 test_a
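
One workaround, consistent with the `a = 10` case in the session above: eval can assign to a local that already exists in the binding, so pre-declaring the variable before handing off the binding works. A minimal sketch:

 def make_a(b)
   eval "a = 'this is a!'", b
 end

 def test_a
   a = nil          # pre-declare so the binding already contains "a"
   make_a(binding)
   puts a           # prints: this is a!
 end

 test_a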

Published on 03/10/2010 at 02:29PM

multimodel_transactions plugin: Saving and creating multiple model objects in a single controller action

In Ruby on Rails, it’s common for a resource to correspond one-to-one with a model. Often you will have a create action that looks something like this:

class UsersController < ApplicationController
  # ... other actions, etc

  def create
    User.transaction do
      @user = User.new(params[:user])

      if @user.save
        flash[:notice] = 'User was successfully created.'
        redirect_to(@user)
      else
        render :action => "new"
      end
    end
  end

  # ... other actions
end

Now what if another model object is also managed by this action, say a profile object, perhaps using the fields_for helper in the view?

Then we might change our code so that it looks something like this:

class UsersController < ApplicationController
  # ... other actions, etc

  def create
    User.transaction do
      @user = User.new(params[:user])
      @profile = Profile.new(params[:profile])

      @user.profile = @profile

      if @user.save && @profile.save
        flash[:notice] = 'User was successfully created.'
        redirect_to(@user)
      else
        User.connection.rollback_db_transaction
        render :action => "new"
      end
    end
  end

  # ... other actions
end

And our new.html.erb view will have a form that looks something like this:

<% form_for(@user) do |f| %>
  <%= f.error_messages %>

  <p>
    <%= f.label :name %><br />
    <%= f.text_field :name %>
  </p>

  <%# Other fields using the "f" builder %>

  <% fields_for(@profile) do |builder| -%>
    <%= builder.error_messages %>
    <p>
      <%= builder.label :favorite_movie %><br />
      <%= builder.text_field :favorite_movie %>
    </p>

    <%# Other fields using the "builder" builder %>

  <% end %>
  <p>
    <%= f.submit "Create" %>
  </p>
<% end %>

There should be no problems here, right? Wrong. The problem is that edit.html.erb is also going to have a form_for(@user) and fields_for(@profile). So how do new.html.erb and edit.html.erb know to create a form action of create_user_url and update_user_url, respectively? They do it by checking the new_record? method of the passed-in model.
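
To make that concrete, here is a rough sketch of the dispatch the form helpers perform. The method name form_target_for is made up for illustration, and the real Rails helpers go through polymorphic routing rather than anything this literal:

# Illustrative sketch only: the choice between the create and update
# targets hinges entirely on new_record?
def form_target_for(record)
  if record.new_record?
    { :url => :create_user_url, :method => :post } # new.html.erb case
  else
    { :url => :update_user_url, :method => :put }  # edit.html.erb case
  end
end

So form_target_for(User.new) picks the create target, while form_target_for on a saved user picks update.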

Here’s where a problem is created. Let’s revisit part of our controller code for the create action:

      if @user.save && @profile.save
        flash[:notice] = 'User was successfully created.'
        redirect_to(@user)
      else
        User.connection.rollback_db_transaction
        render :action => "new"
      end

If @user.save fails, @user.new_record? will still be true. This is what we want since we will be calling render :action => 'new' and we want the form generated there to call create_user_url, not update_user_url.

But what happens when @user.save works and @profile.save fails? @profile.new_record? will still be true, which is good, but @user.new_record? will be false, because it successfully saved! The user row will not exist in the database, because we rolled back our transaction, but only @profile knew to revert to an unsaved active record state. @user has no idea what’s going on.

So then, when new.html.erb renders form_for(@user), it will create a form with the update_user_url instead of create_user_url! This will result in an exception like this:

 ActiveRecord::RecordNotFound in UsersController#update

Couldn't find User with ID=2

To understand how to fix this, we need to understand why it works with only one model. It works because @user.save calls rollback_active_record_state!, which records the object’s state with respect to things like new_record?, and takes a block that it executes in a begin/rescue block. If it catches an exception, it reverts the object to the state it was in before it yielded to the block, and re-raises the exception. The exception is usually ActiveRecord::Rollback, which is gobbled up by a transaction that @user.save starts before calling rollback_active_record_state!

We can fix the problem by using this method on both objects, and throwing an exception if any of the models fail to save.

The above if statement becomes something like this:

      # an anonymous Exception class; it's best to create a real class for use
      # in multiple locations
      ec = Class.new(Exception)

      begin
        @user.rollback_active_record_state! do
          @profile.rollback_active_record_state! do
            if @user.save && @profile.save
              flash[:notice] = 'User was successfully created.'
              redirect_to(@user)
            else
              User.connection.rollback_db_transaction
              raise ec.new
            end
          end
        end
      rescue ec
        render :action => "new"
      end

I have created a plugin that helps replace this type of code. It works with any number of model objects across any number of database connections.

It is at git://github.com/azimux/multimodel_transactions.git

With it installed, the above code becomes:

ax_multimodel_if([@user, @profile],
  :if => proc {
    @user.save && @profile.save
  },
  :is_true => proc {
    flash[:notice] = 'User was successfully created.'
    redirect_to(@user)
  },
  :is_false => proc {
    render :action => "new"
  }
)

Note to Smalltalk users: if you need an example of Ruby’s block-passing syntax making certain method calls look a little ugly, there you go :P

There is also another method in the plugin, ax_multimodel_transaction, that helps when models live in multiple databases (or when there’s a future possibility of that happening).

To apply it to the above call to ax_multimodel_if, it would look somewhat like this:

ax_multimodel_transaction [@profile], :already_in => @user do
  ax_multimodel_if([@user, @profile],
    :if => proc {
      @user.save && @profile.save
    },
    :is_true => proc {
      flash[:notice] = 'User was successfully created.'
      redirect_to(@user)
    },
    :is_false => proc {
      render :action => "new"
    }
  )
end

Gotchas

1) If @profile has the user_id foreign key, it’s important to do:

@user.profile = @profile

and not:

@profile.user = @user

when neither of the objects has been saved yet. I’m not exactly sure why this is; I just know from experience that if you do it the other way around, @profile will be saved with a nil user_id.

2) As always when doing things sensitive to transactions and rollbacks, you might wish to add “self.use_transactional_fixtures = false” to the relevant functional test. This is really only necessary if your test executes queries that aren’t expecting to be inside a rolled-back transaction after the call to “post :create” or “post :update” or whatever. It’s only rarely necessary, but it’s good to keep in mind when writing more complex functional tests.
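
For what that looks like in practice, here’s a minimal sketch of such a functional test; the test name and assertions are illustrative, assuming the users controller from above and its name attribute:

class UsersControllerTest < ActionController::TestCase
  # disable transactional fixtures so the controller's own
  # transaction/rollback behavior is actually exercised
  self.use_transactional_fixtures = false

  def test_create_rolls_back_user_when_profile_fails
    post :create, :user => { :name => "Ax" }, :profile => {}
    assert_template "new"
    # the user row must not have survived the rolled-back transaction
    assert_nil User.find_by_name("Ax")
  end
end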

Published on 01/27/2010 at 03:01PM

Missing ExceptionNotifiable module with exception_notification plugin

It looks like the exception_notification plugin is undergoing a pretty big overhaul to get it working well with Rails 3. The official git repository for that plugin only has one branch, master. So what should you do for your existing Rails 2 applications that are not ready for migration to Rails 3?

If you are getting an error somewhat like this:

rake aborted!
uninitialized constant ApplicationController::ExceptionNotifiable
.../rails/activesupport/lib/active_support/dependencies.rb:102:in `const_missing'

You can fix it by using commit e8b603e523c14f145da7b3a1729f5cc06eba2dd1 of that plugin.

Something like this should do the trick:

cd vendor/plugins/exception_notification
git checkout -b rails2 e8b603e523c14f145da7b3a1729f5cc06eba2dd1

That particular commit is from November 13, 2008. It is the last commit that has nothing to do with Rails 3, it is very stable, and it is unlikely to need modification for existing projects.

If you are using externals to manage your subprojects, you can fix it quickly like this (from the main project directory):

ext freeze exception_notification e8b603e523c14f145da7b3a1729f5cc06eba2dd1

Published on 01/26/2010 at 06:26PM

bacterial colony mutation simulation

Below is a quick Ruby program I threw together as a very rough simulation of the impact of rare and barely beneficial mutations in bacterial colonies.

As the code stands, there’s a strong type of bacteria (with the beneficial mutation) and a weak type (without the beneficial mutation).

As the numbers are currently set, the colony starts with one weak bacterium and grows from there. There’s a 1 in 10 million chance that a weak bacterium will gain the strong mutation, or that a strong bacterium will lose its strong mutation. The benefit of the mutation is a 0.1% improvement in survival rate. Any time the colony grows above capacity, a bunch are killed off (the current survival rate is 50%, so half die every time the colony grows above 4 million bacteria.)

Feel free to play with the numbers and see the various outcomes.

A quick explanation of the algorithm: I don’t test every single bacterium to see if it survives. Instead, suppose 49.9% of them are supposed to die and there are 5 bacteria. 5 * 0.499 = 2.495, so I kill 2 bacteria and then generate a random number between 0 and 1 to test against the leftover 0.495 to see if a 3rd bacterium dies or not. Not testing every single bacterium directly makes it less like nature, of course, but it also makes the algorithm much faster.
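
In isolation, that rounding trick looks like this (the variable names mirror the Integer#cut helper defined at the bottom of the listing below):

full = 5 * 0.499        # 2.495 expected deaths
whole = full.to_i       # 2 certain deaths
partial = full - whole  # 0.495 chance of one more death
deaths = rand < partial ? whole + 1 : whole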

MUTATION_RATE = 1.0 / 10_000_000
CAPACITY = 4_000_000

BENEFIT = 0.001
SURVIVAL_RATE = 0.5

class Colony
  attr_accessor :survival_rate, :count
  def initialize rate, count = 0
    self.survival_rate = rate
    self.count = count
  end

  def double
    self.count *= 2
  end

  def kill
    self.count = self.count.cut(survival_rate)
  end
end

class Experiment
  attr_accessor :colony_w, :colony_s, :milestones, :capacity, :mutation_rate,
    :generations, :min_w, :min_s, :max_w, :max_s, :first_mutation

  def initialize survival_rate, benefit, capacity, mutation_rate
    self.capacity = capacity
    self.mutation_rate = mutation_rate
    self.colony_w = Colony.new(survival_rate, 1)
    self.colony_s = Colony.new(survival_rate + benefit)
    self.milestones = (0..10).map {|i| i * 0.1}
    self.generations = 0
    self.min_w = colony_w.count
    self.min_s = colony_s.count
    self.max_w = 0
    self.max_s = 0
    update_min_max
  end

  def update_min_max
    self.min_w = [min_w, colony_w.count].min
    self.min_s = [min_s, colony_s.count].min
    self.max_w = [max_w, colony_w.count].max
    self.max_s = [max_s, colony_s.count].max
  end

  def total
    colony_w.count + colony_s.count
  end

  def percent_w
    colony_w.count.to_f / total
  end

  def percent_s
    colony_s.count.to_f / total
  end

  def double
    [colony_w, colony_s].each {|c| c.double}
  end

  def above_capacity?
    total > capacity
  end

  def tick
    run
    update_min_max
    check_milestone
    if generations % 10_000 == 0
      print_status
    end
  end

  def check_milestone
    if percent_s > milestones[0]
      stone = milestones.shift
      puts "colony_s has reached #{stone * 100}% of the population at generation #{generations}.
      total population: #{total}
      "
    end
  end

  def print_status
    puts "gen #{generations}: weak: #{colony_w.count} strong: #{colony_s.count}"
  end
  
  def mutate
    cols = [colony_s,colony_w]
    [cols, cols.reverse].each do |cs|
      mutated = cs[0].count.cut(mutation_rate)
      if mutated > 0
        cs[1].count += mutated
        cs[0].count -= mutated
        
        unless first_mutation
          puts "first mutation arises at gen #{generations}"
          print_status
          self.first_mutation = true
        end
      end
    end
  end

  def run
    double
    mutate
    while above_capacity?
      colony_w.kill
      colony_s.kill
    end
    self.generations += 1
  end
end

Integer.class_eval do
  def cut prob
    full = self * prob
    whole = full.to_i
    partial = full - whole

    rand < partial ? whole + 1 : whole
  end
end

e = Experiment.new SURVIVAL_RATE, BENEFIT, CAPACITY, MUTATION_RATE

e.print_status

while e.percent_w > 0.001 && e.generations <= 500_000_000
  e.tick
end

puts "took #{e.generations} generations to get population_strong from min of #{e.min_s}
to #{e.colony_s.count} (#{e.percent_s * 100}%)
and populations_weak from #{e.min_w} to a max of #{e.max_w} down
to #{e.colony_w.count} (#{e.percent_w * 100})%"

Here’s some of the results I get when running it with different values:

Using the current values

gen 0: weak: 1 strong: 0
first mutation arises at gen 23
gen 23: weak: 4194303 strong: 1
colony_s has reached 0.0% of the population at generation 29.
      total population: 2097152

colony_s has reached 10.0% of the population at generation 3911.
      total population: 2329272

colony_s has reached 20.0% of the population at generation 4317.
      total population: 2620602

colony_s has reached 30.0% of the population at generation 4587.
      total population: 2995564

colony_s has reached 40.0% of the population at generation 4808.
      total population: 3494804

colony_s has reached 50.0% of the population at generation 5010.
      total population: 2097183

colony_s has reached 60.0% of the population at generation 5213.
      total population: 2621926

colony_s has reached 70.0% of the population at generation 5434.
      total population: 3495609

colony_s has reached 80.0% of the population at generation 5703.
      total population: 2622864

colony_s has reached 90.0% of the population at generation 6108.
      total population: 2623759

took 7307 generations to get population_strong from min of 0
to 3258961 (99.0003848879678%)
and populations_weak from 1 to a max of 2097152 down
to 32906 (0.999615112032169)%

So with a benefit of 0.001, it took 29 generations for the mutation to surface, 5010 generations for it to match the frequency of the weaker bacteria, and 7307 generations for the strong bacteria to make up over 99% of the colony.

With a 0.0001 benefit (strong bacteria have a 0.01% better chance of survival than the weak bacteria):

gen 0: weak: 1 strong: 0
first mutation arises at gen 20
gen 20: weak: 2097151 strong: 1
colony_s has reached 0.0% of the population at generation 21.
      total population: 2097152

gen 10000: weak: 2095041 strong: 6691
gen 20000: weak: 2093070 strong: 56031
colony_s has reached 10.0% of the population at generation 27047.
      total population: 2324111

gen 30000: weak: 2091153 strong: 420262
colony_s has reached 20.0% of the population at generation 31090.
      total population: 2613776

colony_s has reached 30.0% of the population at generation 33781.
      total population: 2986635

colony_s has reached 40.0% of the population at generation 35989.
      total population: 3484132

colony_s has reached 50.0% of the population at generation 38015.
      total population: 2090424

gen 40000: weak: 1045189 strong: 1554583
colony_s has reached 60.0% of the population at generation 40043.
      total population: 2613202

colony_s has reached 70.0% of the population at generation 42254.
      total population: 3485008

colony_s has reached 80.0% of the population at generation 44952.
      total population: 2615286

colony_s has reached 90.0% of the population at generation 49018.
      total population: 2620706

gen 50000: weak: 262319 strong: 2870166
gen 60000: weak: 33896 strong: 2649526
gen 70000: weak: 5297 strong: 2445672
took 75938 generations to get population_strong from min of 0
to 2004344 (99.9000174446134%)
and populations_weak from 1 to a max of 2097151 down
to 2006 (0.0999825553866474)%

It took 38015 generations for the strong and weak bacteria to be equal in numbers, and 75938 generations for the strong bacteria to go from 0 to over 99% of the population. Almost 10 times longer, which is interesting because the benefit was decreased by a factor of 10.

With 0 benefit

Note: for this run I changed the code to only print the status every 50 million generations, and to run a maximum of 500 million generations.

gen 0: weak: 1 strong: 0
first mutation arises at gen 25
gen 25: weak: 4194303 strong: 1
colony_s has reached 0.0% of the population at generation 27.
      total population: 2097151

colony_s has reached 10.0% of the population at generation 1117956.
      total population: 2097329

colony_s has reached 20.0% of the population at generation 2550113.
      total population: 2097589

colony_s has reached 30.0% of the population at generation 4577301.
      total population: 2097995

colony_s has reached 40.0% of the population at generation 8054126.
      total population: 2097269

colony_s has reached 50.0% of the population at generation 35885018.
      total population: 2101332

gen 50000000: weak: 1052661 strong: 1052518
gen 100000000: weak: 1051453 strong: 1052153
gen 150000000: weak: 1051255 strong: 1050806
gen 200000000: weak: 1052545 strong: 1051996
gen 250000000: weak: 1057002 strong: 1055415
gen 300000000: weak: 1053651 strong: 1055580
gen 350000000: weak: 1057720 strong: 1056790
gen 400000000: weak: 1059992 strong: 1059863
gen 450000000: weak: 1060187 strong: 1059305
gen 500000000: weak: 1058117 strong: 1059018
took 500000001 generations to get population_strong from min of 0
to 1059018 (50.0212787564326%)
and populations_weak from 1 to a max of 2097152 down
to 1058117 (49.9787212435674)%

Looks like it works its way to 50% very slowly, and then stays there. This is interesting and isn’t what I expected. I had expected it to stay at whatever frequency it was at when it reached capacity. I guess what is happening is that whichever type there is more of will have more bacteria mutate into the other type than it gets back in return.
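
A quick back-of-the-envelope check of that guess: with mutation rate m and counts N_w and N_s, the expected flow per generation is m * N_w from weak to strong and m * N_s back the other way, so the expected net change in the strong colony is m * (N_w - N_s). That’s positive while the weak outnumber the strong, negative in the reverse case, and zero only when N_w = N_s, so a 50/50 mix is exactly the equilibrium the run above slowly drifts toward.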

Published on 03/20/2009 at 04:00PM

How to migrate typo from mysql to postgresql

I almost always use postgresql when working on a rails application. I won’t list all the little reasons why, but a major one is transactional DDL statements, which means that when I run a migration that fails, I don’t have to go run a bunch of cleanup queries to get my database back to how it was before the migration was run.

When I was setting up this instance of typo, I decided I’d go ahead and go with mysql since I didn’t plan to hack on typo very much. Long story short: I decided to migrate from mysql to postgresql. This howto was done with mysql 5.0.70, postgresql 8.3.5, and typo 5.1.3. It will probably work with any mysql 5+ and postgresql 8+.

In case anybody else out there might be interested in doing likewise, here’s how I did it. These steps will be for a production database, but the changes required for doing it to a development database should be obvious.

Step 0: Backup your data

You don’t really need to be told this, do you?

Step 1: Dump the data from mysql

Run the following to dump the data:

mysqldump --compatible=postgresql --no-create-info -u root -p --skip-extended-insert --complete-insert --skip-opt typo > typo.dump 

We are only dumping the data, hence the --no-create-info option.

Step 2: Create your postgresql database

You can do this however you see fit. I’ve included how I do it in case it’s useful:

CREATE USER typo_prod;
CREATE DATABASE typo_prod OWNER typo_prod ENCODING 'utf8';

\password typo_prod

and enter the password you wish to use.

Step 3: Change your database.yml to use your new postgresql database

Again, do this however you want. Here’s my database.yml with passwords omitted:

defaults: &defaults
  database: typo
  adapter: postgresql
  encoding: utf8
  host: localhost
  password: 

development:
  username: typo_dev
  database: typo_dev
  <<: *defaults

test:
  username: typo_test
  database: typo_test
  <<: *defaults

production:
  username: typo_prod
  database: typo_prod
  password: 
  host: salmon
  <<: *defaults

Step 4: Create the schema in your new database

To do this we’ll run the db:migrate rake task

RAILS_ENV="production" rake db:migrate

Step 5: Fire up a rails console to fix stuff

Now we need to fire up a rails console to do a lot of necessary cleanup work before we can import our data

ruby script/console production

Once it’s ready to go, type (or more practically, copy pasta)

conn = ActiveRecord::Base.connection

We’ll need this for a lot of the commands we have yet to run. You’ll keep this console open for the remainder of this howto. Any ruby code you see in this document will go into this console.

Step 6: Remove data created during the migrations

The typo migrations automatically add some default data, like some default pages/articles and a default blog. All of the data we want is in the dump we created earlier, so let’s delete all this stuff that’s in the way:

conn.tables.each do |table|
  conn.execute "delete from #{table}"
end

Step 7: Temporarily change boolean columns to integers

mysqldump dumps its booleans as 0/1. Postgres interprets these as integers and will not automatically cast them into booleans just because the column is boolean (I’m not sure why). It’s too time consuming to go add casts to all of these 0/1’s, and a regular expression to use with sed would be far too complex to bother with, since not all 1’s and 0’s in the dump correspond to boolean data.

So, we will temporarily change the boolean columns in our shiny new database to integers. Before we do this, we need to temporarily drop the defaults for these boolean columns because there won’t be an implicit cast from false/true to 0/1.

This code will build a couple of hashes to store which columns are booleans and what the defaults are.

bools = {}
defaults = {}


conn.tables.each do |table|
  conn.columns(table).each do |col|
    if col.type.to_s == "boolean"
      (bools[table] ||= []) << col.name
      (defaults[table] ||= {})[col.name] = col.default if !col.default.nil?
    end
  end
end

Here’s the value of bools and defaults in my console after running the above code:

#bools
{"resources"=>["itunes_metadata", "itunes_explicit"], "contents"=>["published", 
"allow_pings", "allow_comments"], "users"=>["notify_via_email", 
"notify_on_new_articles", "notify_on_comments", "notify_watch_my_articles", 
"notify_via_jabber"], "feedback"=>["published", "status_confirmed"], 
"categorizations"=>["is_primary"]}
#defaults
{"contents"=>{"published"=>false}, "feedback"=>{"published"=>false}}

Let’s now temporarily drop the defaults

defaults.each_pair do |table,cols|
  cols.each_key do |col|
    conn.execute "alter table #{table} alter column #{col} DROP DEFAULT"      
  end
end

Now let’s alter the column types for the columns in bools.

We’ll use a closure to run the alter statements, so that we can use it again later to alter them back to booleans.

change_to_type = proc {|to_type|
  bools.each_pair do |table, cols|
    cols.each do |col|
      conn.execute "alter table #{table} alter column #{col} type #{to_type} 
                                USING (#{col}::#{to_type});"
    end
  end
}

change_to_type.call :integer

Step 8: Load the data dump into the new database

Ah, finally. Let’s load the data. Back to a shell in a directory with the dump, run:

sed "s/\\\'/\'\'/g" typo.dump | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" | psql -1 typo_prod

Pass whatever options you need to connect to psql as you normally would. The first sed converts every \' into two consecutive single quotes (''), which is what psql expects. The next two seds in the pipeline replace the escaped carriage returns and newlines with actual carriage returns and newlines, which is again what psql expects.

You may get a couple warnings, but hopefully no errors. The few warnings I received were inconsequential.

Step 9: Change the boolean columns back to boolean and restore the default columns

Back to our rails console. We now have the data in place and can change the columns back using our closure from earlier:

change_to_type.call :boolean

And then restore the defaults we dropped:

defaults.each_pair do |table, cols|
  cols.each_pair do |col, default|
    conn.execute "alter table #{table} alter column #{col} SET DEFAULT #{default}"
  end
end

Step 10: Repair the sequences.

Another annoying aspect of postgresql is that inserting a value into a serial column doesn’t automatically advance the sequence to be ready to serve up an unused value. There will be a sequence called “#{table}_id_seq” for each table with an id column in the database.

We have to manually advance all of the sequences:

conn.tables.each do |table|
  if conn.columns(table).detect{|i|i.name == "id"}
    conn.execute "SELECT setval('#{table}_id_seq', (SELECT max(id) FROM #{table}))"
  end
end

Conclusion

So that should do it. Restart your mongrel cluster (or whatever you are using to manage your rails server processes) and you should now be using your blog with a postgresql backend!

Published on 01/04/2009 at 11:44PM

externals-tutorial

What is externals and what is it used for?

externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:

You register subprojects with your main project. When you check out the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main project and any subprojects from wherever it’s run. You commit changes to the projects separately as needed. If somebody else does an update, they will get the changes to the subprojects as well.

For a more detailed explanation of why I started the externals project, please visit http://nopugs.com/why-ext It’s largely a rant about git-submodule.

On with the tutorial

Installation

ext should run on unix-like systems and Windows. All the unit tests pass on Linux and on Windows Vista (with cygwin).

First we need to install externals. The easiest method is to use gem:

gem install ext

The other method is to use github:

git clone git://github.com/azimux/externals.git
chmod u+x externals/bin/ext

If you install using git clone instead of rubygems, be sure to add the externals/bin directory to your path.

Creating a repository to play around with

I will use git for the main project, and git and subversion for the subprojects (the tutorial would be mostly identical if I used svn for the main project; that’s part of the point of ext.)

Now let’s create a repository for use with our project. I like to test out stuff like this in my ~/tmp/ folder.

cd
mkdir tmp
cd tmp

mkdir repo
mkdir work

cd repo
mkdir rails_app.git
cd rails_app.git
git init --bare

Now let’s go to our work directory and make a rails app to push to this repository.

cd ../../work/
rails rails_app
cd rails_app
git init
git add . 
git commit -m "created fresh rails app"
git remote add origin ../../repo/rails_app.git 
git push origin master

If you’re like me, you consider empty directories in your project’s directory structure to be part of the project. Git will not track empty directories. So, here’s our first use of ext:

ext touch_emptydirs
git add .
git commit -m "touched empty dirs"
git push

This adds a .emptydir file to every empty directory so that git will track these folders.

Using “ext install” to register subprojects.

Now for our second use of ext. Let’s add the current edge rails to our application:

ext install git://github.com/rails/rails.git

It should take a moment because rails is a large project.

Now that that’s done, let’s see what “ext install” did.

$ cat .externals 
[.]
scm = git
type = rails

[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git

.externals is the externals configuration file. This is the file used to keep track of your subprojects. Projects are stored in the form:

[path/to/project]
repository = urlfor://project.repository/url
branch = somebranch
scm = git/svn

The format is very similar to ini format. The section name is the path to the project. The main project’s settings are stored under [.]

Some things to notice: externals was automatically able to figure out that we’re using git for the main project (scm = git under [.]). Also, note that the type of the main project has been detected as rails (type = rails). This means that we can leave the paths off of the repositories in .externals (when using “ext install”) and ext will automatically know where to install things: if it’s called rails it goes in vendor/rails, otherwise it goes in vendor/plugins/. Let’s make sure it’s there.

$ ls vendor/rails
Rakefile      activemodel     activesupport  pushgems.rb
actionmailer  activerecord    ci             railties
actionpack    activeresource  doc            release.rb

That’s not all; take a look at the ignore file:

$ cat .gitignore
vendor/rails

This makes sense because we don’t want the main repository to track any of the files in the subproject. The files in the subproject are tracked by their own repository, possibly of a different SCM than the main project.

Let’s add some more subprojects: some rails plugins this time. We’ll add a couple that are tracked under subversion and one tracked under git to demonstrate how ext is scm agnostic.

ext install git://github.com/lazyatom/engines -b edge
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/redhillonrails_core
ext install svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/foreign_key_migrations

Let’s see if our plugins made it:

$ du --max-depth=2 -h vendor/plugins/ | grep lib
252K    vendor/plugins/foreign_key_migrations/lib
340K    vendor/plugins/redhillonrails_core/lib
24K vendor/plugins/engines/lib

Looks good.

$ cat .externals 
[.]
scm = git
type = rails

[vendor/rails]
path = vendor/rails
repository = git://github.com/rails/rails.git
scm = git

[vendor/plugins/engines]
path = vendor/plugins/engines
repository = git://github.com/lazyatom/engines
scm = git
branch = edge

[vendor/plugins/redhillonrails_core]
path = vendor/plugins/redhillonrails_core
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/redhillonrails_core
scm = svn

[vendor/plugins/foreign_key_migrations]
path = vendor/plugins/foreign_key_migrations
repository = svn://rubyforge.org/var/svn/redhillonrails/trunk/vendor/plugins/foreign_key_migrations
scm = svn

…and the ignore file…

$ cat .gitignore 
vendor/rails
vendor/plugins/acts_as_list
vendor/plugins/foreign_key_migrations
vendor/plugins/redhillonrails_core

also looks very good!

Something worth noting: if we were using svn for our main project, ext is smart enough to set the ignores using ‘svn propset svn:ignore’ on the appropriate directories.

Let’s now commit and push our work.

git add .
git commit -m "added 4 subprojects"
git push

Using “ext checkout” and “ext export”

And now let’s delete it and check it out again to make sure we get the subprojects:

cd ..
rm -rf rails_app
ext checkout ../repo/rails_app.git

It will take a moment as it clones rails from github again.

Let’s make sure all of the subprojects were checked out properly:

$ cd rails_app
$ du --max-depth=3 -h vendor/ | grep lib
12K     vendor/plugins/acts_as_list/lib
66K     vendor/plugins/foreign_key_migrations/lib
162K    vendor/plugins/redhillonrails_core/lib
382K    vendor/rails/actionmailer/lib
1.5M    vendor/rails/actionpack/lib
104K    vendor/rails/activemodel/lib
791K    vendor/rails/activerecord/lib
92K     vendor/rails/activeresource/lib
2.4M    vendor/rails/activesupport/lib
584K    vendor/rails/railties/lib

Let’s also make sure the engines plugin is on a branch called “edge” (which is tracking the remote repository’s edge branch):

$ cd vendor/plugins/engines
$ git branch -a
* edge
  master
  origin/HEAD
  origin/add_test_for_rake_task_redefinition
  origin/edge
  origin/master
  origin/timestamped_migrations

Notice how the subprojects were automatically fetched. As mentioned in the why ext article, the main project is usually incapable of functioning without its subprojects, so it makes sense to fetch the subprojects when we do a checkout or export. (This is what svn checkout does when it checks out a folder that has svn:externals set on it: it fetches the external projects automatically, which is very convenient.)

Note that you can use “ext export” instead of checkout if you don’t want histories to accompany the files. This tells ext to use “svn export” for subversion-managed (sub)projects and “git clone --depth 1” for git-managed (sub)projects. This can save a lot of time and is useful for deployment.
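
Presumably mirroring the checkout invocation above (I’m inferring the argument form from “ext checkout”; run “ext help” to confirm), an export of our tutorial repository would look like:

ext export ../repo/rails_app.git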

Looks good. Let’s go back to the rails_app directory to continue the tutorial:

cd ../../../

“ext status” propagates through subprojects

Let’s modify a subproject.

echo "lol, internet" >> vendor/plugins/foreign_key_migrations/README

And now let’s check the status

$ ext status
status for .:
# On branch master
nothing to commit (working directory clean)

status for vendor/rails:
# On branch master
nothing to commit (working directory clean)

status for vendor/plugins/acts_as_list:
# On branch master
nothing to commit (working directory clean)

status for vendor/plugins/redhillonrails_core:


status for vendor/plugins/foreign_key_migrations:
M      README

As expected, foreign_key_migrations has a modified file. This same (very common) task is a bit of a pain in the neck with git-submodule (unless I’m missing something), and impossible when the subproject is not managed under the same source control system as the main project (as in this example).

Deployment with capistrano

Most commands also have a short version. The short versions operate only on the subprojects, not the main project. “ext checkout” or “ext export” fetches the main project and subprojects, but “ext co” and “ext ex” (meant to be run in the working folder of the main project; use --workdir to do it from elsewhere) fetch all subprojects and don’t touch the main project.

If you deploy with capistrano, you can have all your subprojects fetched on deployment by adding the following to your deploy.rb:

task :after_update_code, :roles => :app do
  run "ext --workdir #{release_path} ex"
end

Notice how I chose to use “ex” instead of “co”. This is because I never do work from a deployed project’s working directory, so the history is pointless.

If people find externals useful, I’d be happy to add an :ext scm type to capistrano so that it runs ext instead of git/svn. Then it would pick up all the subprojects during a deploy without having to supply the above after_update_code task. I could also add a switch to rails’ “./script/plugin install” (perhaps -X) to tell it to use ext to manage the project (kind of like how you can use -x to tell it to use svn:externals). Though, this isn’t really any easier to make use of than just doing “ext install”.

A few other tips

“ext help” will show you all the available commands. Also, feel free to manage the .externals file manually if you wish.

Conclusion

For issue tracking, at the moment I’m using lighthouseapp. Report bugs to http://ext.lighthouseapp.com/

I also have a rubyforge account for this project at http://rubyforge.org/projects/ext/ if you would prefer to submit bugs/feature requests via rubyforge’s tracking system. I’ve used both sites but never managed a project with either, so I don’t know which is better. Rubyforge seems to be more feature complete.

Externals is my first attempt at contributing a useful open source project to the community. If you have some tips for me in this regard, please feel free to share them.

Cheers!


Published on 09/06/2008 at 11:58PM

Why externals?

Externals allows you to make use of an svn:externals-like workflow with any combination of SCMs. What is the svn:externals workflow? I would describe it roughly like this:

You register subprojects with your main project. When you check out the main project, the subprojects are automatically checked out. Doing a ‘status’ will tell you the changes in the main project and any subprojects from wherever it’s run. You commit changes to the projects separately as needed. If somebody else does an update, they will get the changes to the subprojects as well.

Probably like you, I’ve started using git for some projects/plugins. Git has a feature called git-submodule that is supposed to work similar to svn:externals. Git-submodule’s annoyances are what inspired me to start the externals project, and here are some of the problems I have with git-submodule’s workflow:

  • When you clone/pull an existing project, you have to run git-submodule init/update manually to get the subprojects (or to update them). You may be thinking this isn’t a big deal, but it’s an extra step. The main project is almost never functional without the subprojects, so why would you ever want to pull in only the main project? Having to run extra steps every time is annoying.
  • With git-submodule, the subprojects are pulled in at a specific commit, not at a branch tip, and the working directory is left completely disconnected from any branch (a detached HEAD). If you want to make any edits to a subproject, you first have to check out the branch you want to work with. This is extremely annoying. When I start adding a new feature to the main project, I can’t predict which subprojects may need to be modified along the way. Should I go to them all manually and do a checkout? If not, this usually means that as I’m stepping through the debugger I have to keep track of which subprojects I’m about to change, stop what I’m doing, and go check out a branch in each. Not only is this disruptive to my workflow; on more than one occasion I have forgotten to check out a branch before making edits, and wound up issuing the wrong commands to do the checkout once I realized I was detached. The result was that I wiped out all of my changes in the subproject. This can be extremely irritating.
  • Status doesn’t propagate through the subprojects. This means that when it’s time to make some commits, you have to go to every single subproject and do a ‘git status’, because you aren’t 100% sure which ones you’ve made changes to.
  • Because of the detached-commit behavior above, when you do make changes to a subproject, you have to remember to do a ‘git add path/to/subproject’ so that the new commit is pointed to by the main project.

However, even if git-submodule were as useful to me as svn:externals, I would still see a need for ext when a project has subprojects managed by different SCMs. What I’ve been doing: if the main project is git, I manage the git subprojects with git-submodule and simply commit whole subversion working directories into my git repository. When the main project is subversion, I use svn:externals for the subversion subprojects and check whole git repositories into my main project’s repository. Now, with externals, I can have a uniform workflow regardless of the SCM combination.

For info on how to use externals, please see: http://nopugs.com/ext-tutorial


Published on 09/04/2008 at 07:09PM
