Now that I have a couple of minutes, let me tell you what I changed exactly in Mephisto to support multiple spam detection engines.

Initially, app/models/comment.rb looked like this:

class Comment < Content
  def check_approval(site, request)
    self.approved = site.approve_comments?
    if valid_comment_system?(site)
      akismet = Akismet.new(site.akismet_key, site.akismet_url)
      self.approved = !akismet.comment_check(comment_spam_options(site, request))
      logger.info "Checking Akismet (#{site.akismet_key}) for new comment on Article #{article_id}.  #{approved? ? 'Approved' : 'Blocked'}"
      logger.warn "Odd Akismet Response: #{akismet.last_response.inspect}" unless Akismet.normal_responses.include?(akismet.last_response)
    end
  end
end

app/models/comment.rb@bugfixing

The original method was intimately tied to Akismet: it instantiated an Akismet client and used it directly. The first thing I needed was the ability to use any kind of spam detection engine without knowing its details. In fact, what I needed was an instance of the Adapter pattern.

The code changed from the above to this:

class Comment < Content
  def check_approval(site, request)
    self.approved = site.approve_comments? || spam_engine(site).ham?(article.permalink_url(site, request), self)
  end

  def spam_engine(site)
    site.spam_engine
  end
end

app/models/comment.rb@multiengine
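
To make the adapter idea concrete, here is a minimal sketch in plain Ruby. The two client classes and their methods (FakeAkismetClient#comment_check, FakeDefensioClient#audit) are hypothetical stand-ins I made up for illustration, not the real libraries' APIs; only the ham?(permalink_url, comment) call mirrors the code above.

```ruby
# Hypothetical stand-ins for two third-party clients with different APIs.
class FakeAkismetClient
  # Returns true when the text is considered spam.
  def comment_check(options)
    options[:comment_content].include?("viagra")
  end
end

class FakeDefensioClient
  # Returns a hash with a spam flag instead of a bare boolean.
  def audit(body)
    { :spam => body.include?("viagra") }
  end
end

# Each adapter exposes the same #ham? interface that the Comment model calls.
class AkismetAdapter
  def initialize(client = FakeAkismetClient.new)
    @client = client
  end

  def ham?(permalink_url, comment)
    !@client.comment_check(:permalink => permalink_url, :comment_content => comment.body)
  end
end

class DefensioAdapter
  def initialize(client = FakeDefensioClient.new)
    @client = client
  end

  def ham?(permalink_url, comment)
    !@client.audit(comment.body)[:spam]
  end
end

# Tiny stand-in for the Comment model; only #body is needed here.
SketchComment = Struct.new(:body)
```

The caller never needs to know which service sits behind the adapter; it only asks whether the comment is ham.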

So now, I had to get hold of an instance of SpamDetectionEngine. I again turned to the Gang of Four patterns, and this is when I realized I needed a Strategy Pattern. The Site was the most obvious place to put this, as the configuration for storing Akismet details was already there. I simply extended the concept further to allow any engine to store any configuration settings in Site, and made Site return an instance of my spam engine:

class Site < ActiveRecord::Base
  def spam_engine
    klass_name = read_attribute(:spam_detection_engine)
    return Mephisto::SpamDetectionEngines::NullEngine.new(self) if klass_name.blank?
    klass_name.constantize.new(self)
  end
end

app/models/site.rb@multiengine
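
The lookup above can be tried outside Rails with a rough sketch like the following. It assumes plain Ruby, so Object.const_get stands in for ActiveSupport's constantize, and a simple nil/empty check stands in for blank?. The SketchSite and SketchEngines names are hypothetical and exist only for this example.

```ruby
module SketchEngines
  # Fallback engine used when nothing is configured.
  class DefaultEngine
    attr_reader :site

    def initialize(site)
      @site = site
    end
  end

  # A second, equally trivial engine that can be selected by name.
  class StrictEngine < DefaultEngine
  end
end

class SketchSite
  # Holds the engine's class name as a string, like the database column would.
  attr_accessor :spam_detection_engine

  def spam_engine
    klass_name = spam_detection_engine
    # Fall back to the default engine when no class name is stored.
    return SketchEngines::DefaultEngine.new(self) if klass_name.nil? || klass_name.strip.empty?
    # Resolve the stored string to a class and instantiate it.
    Object.const_get(klass_name).new(self)
  end
end

site = SketchSite.new
default_engine = site.spam_engine

site.spam_detection_engine = "SketchEngines::StrictEngine"
strict_engine = site.spam_engine
```

Switching engines is then only a matter of storing a different class name; the calling code does not change.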

What’s a NullEngine, you ask? It’s the default engine: a do-nothing engine, and an instance of the Null Object pattern. What does this engine really do? It returns canned data for all requests. It’s always #valid?, and it always accepts comments.
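
A Null Object along those lines might look like the sketch below. This is my own simplified illustration, not the actual Mephisto::SpamDetectionEngines::NullEngine; the class name SketchNullEngine is hypothetical, and only the #valid? and #ham? method names come from the text above.

```ruby
class SketchNullEngine
  def initialize(site)
    @site = site
  end

  # No configuration is required, so the engine is always valid.
  def valid?
    true
  end

  # Every comment is treated as ham (not spam), whatever its content.
  def ham?(permalink_url, comment)
    true
  end
end

# Calling code needs no nil checks or special cases for "no engine configured".
engine = SketchNullEngine.new(nil)
approved = engine.valid? && engine.ham?("http://example.com/post", "any comment")
```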

So, I moved the old Akismet code to the AkismetEngine, and created a new DefensioEngine. Both of these engines have real code in place to do the validation. Although I haven’t verified that the AkismetEngine still works, I believe it should. I used Marc-André Cournoyer’s Defensio Rails plugin to actually talk to Defensio.

Next, I needed a way to configure those different engines. I wanted the engines themselves to provide the templates for their settings, and the regular Rails views to render them. I made myself a custom Template class, which renders a simple ERB template, and I put each template right next to the engine it belongs to. Then it was just a matter of rendering the template. Looking at the AkismetEngine, the #settings_template code looks like this:

module Mephisto
  module SpamDetectionEngines
    class AkismetEngine < Mephisto::SpamDetectionEngine::Base
      class << self
        def settings_template(site)
          load_template(File.join(File.dirname(__FILE__), "akismet_settings.html.erb")).render(:site => site, :options => site.spam_engine_options)
        end
      end
    end
  end
end

lib/mephisto/spam_detection_engines/akismet_engine.rb@multiengine
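
For readers curious about what such a Template class could look like, here is a minimal sketch built on Ruby's standard ERB library. It is an assumption on my part, not the class from the repository: the name SketchTemplate is hypothetical, the template comes from an inline string rather than a file, and it relies on ERB#result_with_hash, which needs Ruby 2.5 or later.

```ruby
require "erb"

# Hypothetical minimal template wrapper; the real Mephisto class may differ.
class SketchTemplate
  def initialize(source)
    @erb = ERB.new(source)
  end

  # Each key in locals becomes a local variable inside the template.
  def render(locals = {})
    @erb.result_with_hash(locals)
  end
end

source = "Akismet key: <%= options[:akismet_key] %> for <%= site %>"
html = SketchTemplate.new(source).render(
  :site => "example.com",
  :options => { :akismet_key => "abc123" }
)
```

Reading the source with File.read instead of an inline string would give something close to the load_template call shown above.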

Finally, Defensio provides statistics through its API. I wanted to show nice graphics, and turned to the Google Chart API to generate them. In the end, what does all of this look like? Well, this:

The Defensio engine's configuration and statistics

I had a bit of a discussion with Carl Mercier about the accuracy graph. He said Defensio looked poor because the green bar is very low. I countered that on a 0 to 100% scale, the difference between 95 and 96 percent would only be 1 pixel if the graph were 100 pixels high. The graph’s scale is actually 95 to 100%, so the difference between 95 and 96 percent is 20 pixels if the bar is 100 pixels high. Anyway, this is an unresolved issue, and I’d like to know what people think. You can change this yourself by cloning the repository and editing the Defensio statistics template:

<%
  spam = @statistics.spam.to_i
  ham = @statistics.ham.to_i
  accuracy = @statistics.accuracy.to_f * 100.0
  mapped_accuracy = ((accuracy - 95) * 5.0)
  spam_pct = spam.to_f / (spam + ham) * 100.0 # to_f avoids integer division, which would truncate the ratio to 0
  ham_pct = 100.0 - spam_pct
  false_positives, false_negatives = @statistics.false_positives.to_i, @statistics.false_negatives.to_i
  false_positives_pct = false_positives.to_f / (false_positives + false_negatives) * 100.0
  false_negatives_pct = 100.0 - false_positives_pct
  stats_chd = sprintf("t:%.1f,%.1f", spam_pct, ham_pct)
  accuracy_chd = sprintf("t:%.1f|%.1f", mapped_accuracy, 100.0)
  retraining_chd = sprintf("t:%.1f,%.1f", false_positives_pct, false_negatives_pct)
%>

lib/mephisto/spam_detection_engines/defensio_statistics.html.erb@multiengine
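
The pixel arithmetic from my discussion with Carl can be checked with a few lines of Ruby. This is a standalone sketch, not code from the template; the bar_height helper is a name I made up for the example.

```ruby
# Map a percentage from the range lo..hi onto a bar that is `height` pixels tall.
def bar_height(percent, lo, hi, height = 100.0)
  (percent - lo) * height / (hi - lo)
end

# On a 0..100% scale, one percentage point is a single pixel.
full_scale = bar_height(96, 0, 100) - bar_height(95, 0, 100)

# On a 95..100% scale, the same point stretches to a fifth of the bar.
zoomed = bar_height(96, 95, 100) - bar_height(95, 95, 100)
```

Here full_scale works out to 1 pixel and zoomed to 20 pixels, which is the trade-off in question: the zoomed scale makes small differences visible, but a 95% accuracy draws as an empty bar.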

All in all, I really enjoyed refactoring a tool that I use regularly. I neither found nor fixed any major problems along the way, but if I do find some, I’ll be sure to fix them and make the fixes available.

So, what are you waiting for?

git clone git://github.com/francois/mephisto.git mephisto_defensio
cd mephisto_defensio
git checkout multiengine
rake db:bootstrap db:migrate
thin start

NOTE: There is a book called Refactoring to Patterns. I haven’t read it yet, but it seems like a good read.

I am François Beausoleil, a Ruby on Rails coder. During the day, I work on XLsuite. At night, I am interested in many things.
