At Bloom, we have a largish Rails application: 12 000 lines of model code, alone. Yesterday, we found a major problem, and the cause was very non-obvious.
Bloom Digital Platforms has a product called AdGear. AdGear is an ad distribution platform. The obvious models are Ad and Campaign: a Campaign has many ads. From a UI perspective, it’s nice to show how many active ads are in the campaign. In the campaign’s index view, we have something like this:
1 <%# app/views/campaigns/_campaign.html.erb %> 2 <tr> 3 <td><%= link_to(h(campaign.name), campaign) %></td> 4 <td><%= pluralize(campaign.active_ad_units_count, "active ads") %></td> 5 </tr> 6
Notice how we’re using an attribute named active_ad_units_count. This is a cached value. We keep this value updated by doing the following:
1 class Ad < ActiveRecord::Base 2 belongs_to :campaign 3 after_save :update_campaign_cached_values 4 5 private 6 7 def update_campaign_cached_values 8 campaign.update_cached_values 9 end 10 end 11 12 class Campaign < ActiveRecord::Base 13 def update_cached_values 14 update_attribute(:active_ad_units_count, ad_units.active.count) 15 end 16 end
Pretty reasonable, you’ll say. And I agree: good names explain the intent, without needing to read all the details.
Let’s throw a monkey wrench in there:
1 class Campaign < ActiveRecord::Base 2 before_save :ensure_consistency 3 4 attr_accessor :conditions 5 6 def conditions 7 return @conditions if @conditions 8 @conditions =|| serialized_conditions ? YAML.load(serialized_conditions) : nil 9 end 10 11 private 12 13 # Method renamed to protect the innocent 14 def ensure_consistency 15 if some_condition then 16 self.serialized_conditions = conditions ? conditions.to_yaml : nil 17 return true 18 end 19 20 false 21 end 22 end
Can you spot the issue? Note this code is in Campaign, not Ad. It’s pretty obvious when boiled down to 20 lines of code. When you’re wading through 12 000, it’s pretty tough.
If you read the documentation for #before_save, you’ll notice this gem:
1 # == Canceling callbacks 2 # 3 # If a <tt>before_*</tt> callback returns +false+, all the later callbacks and the associated action are cancelled. If an <tt>after_*</tt> callback returns 4 # +false+, all the later callbacks are cancelled. Callbacks are generally run in the order they are defined, with the exception of callbacks 5 # defined as methods on the model, which are called last.
Excerpt from activerecord/lib/active_record/callbacks.rb
In the example above, some_condition might return false, which would cancel the save, but let’s unwind the stack:
- AdsController#update
- Ad#update_campaign_cached_values
- Campaign#update_cached_values
- ActiveRecord::Base#update_attribute
- Campaign#ensure_consistency – returns false
Because we were calling #update_attribute, which returns false when it fails to update the object, we ended up not updating the Campaign’s cached values. And we ended up with a stale cache. And ads which didn’t get updated. A careful tower of convoluted code tumbled down upon us.
Why did this happen? It’s obvious the developer who wrote the #ensure_consistency method didn’t know about ActiveRecord’s #serialize, which did exactly what he wanted to. He also didn’t know about the dangers of returning false from a callback. These two point to relative inexperience / lack of knowledge about Rails & ActiveRecord.
We’re obviously not the first ones to hit this kind of problem:
James Golick also ranted off about something similar a while back: Crazy, Heretical, and Awesome: The Way I Write Rails Apps
This points to a misfeature of ActiveRecord: a seemingly innocuous change had unintended consequences in a very unrelated portion of the code. Rails has a lot of magic / implicits. You have to be very careful. Ignorance is no excuse. The more I think about it, the more I like the service / command oriented way James offers, and it boils down to the discussion we had at the last Montreal.rb: implicit (Rails/Ruby) vs explicit (Django/Python).