Using SQS and S3 to decouple image resizing from uploading
Message queues are a nice addition to any programmer’s bag of tricks. They decouple the sender from the receiver. Let’s take the usual problem: what to do with images once they are received? We don’t want to show the full-size image, so we need to scale and crop it.
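To make the idea concrete, here is a minimal in-process sketch in plain Ruby. Thread::Queue stands in for SQS and the integers stand in for photo ids. This is only an analogy, because a Thread::Queue is neither durable nor shared between processes. The uploading side pushes work and moves on, and a separate consumer does the slow part.

```ruby
# Analogy only: Thread::Queue plays the role of SQS inside a single process.
requests = Thread::Queue.new
processed = []

# The consumer ("resizer") blocks on pop until work arrives.
# A nil message is this sketch's stop signal.
consumer = Thread.new do
  while (photo_id = requests.pop)
    processed << "resized photo #{photo_id}"
  end
end

# The producer ("upload request") only enqueues work and returns immediately.
[1, 2, 3].each { |id| requests.push(id) }
requests.push(nil) # tell the consumer to stop
consumer.join

puts processed
```

The producer never waits for the resizing to finish. That is the property SQS gives us across processes and machines.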
In this article, I will show you how to use the RightScale gems to decouple the upload request from the actual resizing.
Starting from a fresh Rails application (I’m using 2.0.2), install AttachmentFu:
script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/
Edit config/amazon_s3.yml and add the following:
development:
  bucket_name: amazon-sqs-development-yourname
  access_key_id: "your key"
  secret_access_key: "your secret access key"
  queue_name: amazon-sqs-development-resizer-yourname
queue_name is new. AttachmentFu does not require it, but we are going to read this file from our own code too, so it’s better to keep all the configuration in one place.
Generate a scaffolded Photo model using:
$ script/generate scaffold photo filename:string size:integer content_type:string width:integer height:integer parent_id:integer thumbnail:string
Edit app/views/photos/new.html.erb and replace everything with this:
<h1>New photo</h1>

<%= error_messages_for :photo %>

<% form_for(@photo, :html => {:multipart => true}) do |f| %>
  <p>
    <label for="photo_uploaded_data">File:</label>
    <%= f.file_field :uploaded_data %>
  </p>

  <p>
    <%= f.submit "Create" %>
  </p>
<% end %>

<%= link_to 'Back', photos_path %>
All we did here was tell Rails to use a multipart-encoded form and to provide a single file upload field.
Edit app/models/photo.rb and add the AttachmentFu plugin configuration:
class Photo < ActiveRecord::Base
  has_attachment :content_type => :image, :storage => :s3
  validates_as_attachment
end
Start your server and confirm you can upload a file. No thumbnails are generated, because we did not configure any. We don’t want AttachmentFu to generate them during the upload request, so we deliberately leave thumbnail sizes out of the has_attachment call.
To use RightScale’s AWS SQS component, we have to configure it with the access key and secret access key. Add this to the end of the Photo class:
class Photo < ActiveRecord::Base
  def queue
    self.class.queue
  end

  class << self
    def queue
      # This creates the queue if it doesn't exist
      @queue ||= sqs.queue(aws_config["queue_name"])
    end

    def sqs
      @sqs ||= RightAws::Sqs.new(
          aws_config["access_key_id"], aws_config["secret_access_key"],
          :logger => logger)
    end

    def aws_config
      return @aws_config if @aws_config
      @aws_config = YAML.load(File.read(File.join(RAILS_ROOT, "config", "amazon_s3.yml")))
      @aws_config = @aws_config[RAILS_ENV]
      raise ArgumentError, "Missing #{RAILS_ENV} configuration from config/amazon_s3.yml file." if @aws_config.nil?
      @aws_config
    end
  end
end
#aws_config reads the configuration for the current environment from config/amazon_s3.yml. #sqs provides access to an instance of RightAws::Sqs, pre-configured with the correct access keys. #queue uses #sqs to get or create the named queue. There’s also an instance version of #queue, to ease our code later on.
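If you want to check the configuration logic without Rails or AWS, here is a stdlib-only sketch of what #aws_config does. The helper name load_aws_config is hypothetical. The YAML is inlined instead of being read from config/amazon_s3.yml, and the env argument stands in for RAILS_ENV. YAML.safe_load is enough here because the file contains only plain strings.

```ruby
require "yaml"

# Hypothetical stand-in for Photo.aws_config: parse the YAML, pick the
# section for the given environment, and fail loudly if it is missing.
def load_aws_config(yaml_text, env)
  config = YAML.safe_load(yaml_text)[env]
  raise ArgumentError, "Missing #{env} configuration from config/amazon_s3.yml file." if config.nil?
  config
end

yaml_text = <<~YAML
  development:
    bucket_name: amazon-sqs-development-yourname
    access_key_id: "your key"
    secret_access_key: "your secret access key"
    queue_name: amazon-sqs-development-resizer-yourname
YAML

config = load_aws_config(yaml_text, "development")
puts config["queue_name"] # prints amazon-sqs-development-resizer-yourname

# A missing environment raises instead of silently returning nil.
begin
  load_aws_config(yaml_text, "production")
rescue ArgumentError => e
  puts e.message
end
```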
Let’s add the code that sends the resize request:
class Photo < ActiveRecord::Base
  def send_resize_request
    # Don't send a resize request for thumbnails
    return true unless self.parent_id.blank?

    params = Hash.new
    params[:id] = self.id
    params[:sizes] = Hash.new
    params[:sizes][:square] = "75x75"
    params[:sizes][:thumbnail] = "100x"

    begin
      queue.push(params.to_yaml)
    rescue
      logger.warn {"Unable to send resize request. Error: #{$!.message}"}
      logger.warn {$!.backtrace.join("\n")}
      # Don't raise the error so the request goes through.
      # We don't want the user to see a 500 error because
      # we can't talk to Amazon.
    end
  end
end
Now, this is getting interesting. AttachmentFu knows whether the current record is a thumbnail by looking at parent_id. If it’s nil, the record is the original image; otherwise, it is a thumbnail. We apply the same test here and skip thumbnails.
Then, we set up a couple of parameters to send to the resizer. Notice that we send the actual thumbnail sizes in the message itself.
Next, we do the most important part: queue.push. This sends a message string (limited to 256 KiB) to Amazon SQS and returns. If there is an error, we don’t want to prevent the request from completing, so we rescue the exception and log it. A bare rescue catches StandardError and its subclasses. If you have the ExceptionNotifier plugin installed, this is a good place to report the error to it as well.
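Here is a small sketch of the message body on its own, so you can see what actually goes over the wire and check it against the size limit. The helper names and the MAX_MESSAGE_BYTES constant are assumptions for illustration. The limit is the figure quoted above; check the current SQS quota before relying on it. One caveat applies if you run this code on a newer Ruby: since Ruby 3.1 (Psych 4), YAML.load refuses to load Symbols by default. The decoding side therefore has to permit them explicitly, which this sketch does with YAML.safe_load.

```ruby
require "yaml"

# Assumed limit, taken from the figure quoted in the text.
MAX_MESSAGE_BYTES = 256 * 1024

# Hypothetical helper: builds the same body as Photo#send_resize_request.
def resize_request_body(photo_id)
  params = { :id => photo_id, :sizes => { :square => "75x75", :thumbnail => "100x" } }
  body = params.to_yaml
  raise ArgumentError, "message too large (#{body.bytesize} bytes)" if body.bytesize > MAX_MESSAGE_BYTES
  body
end

# Hypothetical helper for the resizer side. Symbol keys must be permitted
# explicitly because safe_load rejects them otherwise.
def parse_resize_request(body)
  YAML.safe_load(body, permitted_classes: [Symbol])
end

body = resize_request_body(42)
puts body
params = parse_resize_request(body)
p params
```

The body is a few dozen bytes, so the size limit is nowhere near a concern for this design. Sending sizes rather than image data is what keeps it that way.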
Now that we have a way to send the resize request, we have to execute it at some point. The controller is not the right place to do it. If you create Photo models from more than one controller, you’re bound to forget to call #send_resize_request. It’s better to do it in an #after_create callback, which we’ll do with a single line:
class Photo < ActiveRecord::Base
  after_create :send_resize_request
end
Next, we have to receive the messages, so we write a new class method on Photo:
class Photo < ActiveRecord::Base
  class << self
    def fetch_and_thumbnail
      messages = queue.receive_messages(20)
      return if messages.blank?

      logger.debug {"==> Photo\#fetch_and_thumbnail -- received #{messages.size} messages"}
      messages.each do |message|
        params = YAML.load(message.body)
        photo = Photo.find_by_id(params[:id])
        if photo.blank? then
          # The Photo was deleted before we got a chance to thumbnail it.
          # We must delete the message, or we'll always get it afterwards.
          message.delete
          next
        end

        photo.generate_thumbnails(params[:sizes])
        message.delete
      end
    end
  end
end
The first thing we do is see if there are any messages. The call to #queue is the helper method we defined earlier on. We ask to receive up to 20 messages at a time. If there were no messages, we simply return.
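Then we iterate over the messages, decoding each body back into the original parameters Hash. The important thing is to delete each message after we have processed it. A received message is only hidden for the queue’s visibility timeout, so if we don’t delete it, it will be delivered again next time around. This also means that if #generate_thumbnails raises, the message is not deleted and will be retried later.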
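The redelivery behaviour is easier to see with a toy model. The sketch below is an in-memory stand-in, not the RightAws or SQS API. ToyQueue and its methods are hypothetical, and time is passed in as a number of seconds so the run is deterministic. A received message is hidden for the visibility timeout and comes back unless it was deleted.

```ruby
# Toy model of SQS receive/delete semantics (hypothetical class, not a real API).
class ToyQueue
  Message = Struct.new(:id, :body, :visible_at)

  def initialize(visibility_timeout)
    @visibility_timeout = visibility_timeout
    @messages = []
    @next_id = 0
  end

  # Enqueues a message that is visible immediately.
  def push(body, now)
    @next_id += 1
    @messages << Message.new(@next_id, body, now)
  end

  # Returns up to `max` currently visible messages and hides them
  # until `now + visibility_timeout`.
  def receive(max, now)
    visible = @messages.select { |m| m.visible_at <= now }.first(max)
    visible.each { |m| m.visible_at = now + @visibility_timeout }
    visible
  end

  # Removes a message permanently; only then is it never redelivered.
  def delete(message)
    @messages.delete(message)
  end
end

queue = ToyQueue.new(30)
queue.push("resize photo 1", 0)
queue.push("resize photo 2", 0)

first = queue.receive(20, 0)                                      # both messages, now hidden
first.each { |m| queue.delete(m) if m.body == "resize photo 1" }  # only photo 1 gets processed

hidden = queue.receive(20, 10) # photo 2 is still hidden
again = queue.receive(20, 31)  # timeout expired: photo 2 is delivered again
puts again.map(&:body).inspect
```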
#generate_thumbnails is important, but uninteresting in this discussion.
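Since #generate_thumbnails is not shown, here is only a hypothetical sketch of one piece of it: turning the size strings from the message into target dimensions. It assumes ImageMagick-style geometry strings. "75x75" is treated as a fixed square box, to be filled by scaling and cropping. "100x" means a width of 100 pixels, with the height scaled to keep the aspect ratio. The actual resizing, with RMagick or a similar library, is left out.

```ruby
# Hypothetical helper: maps a geometry string plus the original image size
# to the [width, height] the thumbnail should end up with.
def target_dimensions(geometry, width, height)
  match = /\A(\d+)x(\d*)\z/.match(geometry)
  raise ArgumentError, "unsupported geometry: #{geometry.inspect}" unless match

  target_width = match[1].to_i
  if match[2].empty?
    # "Wx": scale the height proportionally, rounded to the nearest pixel.
    [target_width, (height * target_width.to_f / width).round]
  else
    # "WxH": a fixed box, e.g. the square crop.
    [target_width, match[2].to_i]
  end
end

p target_dimensions("75x75", 640, 480) # => [75, 75]
p target_dimensions("100x", 640, 480)  # => [100, 75]
```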
January 21st, 2008 at 02:50 PM
Hey,
isn’t it too slow to use an external MQ server (such as Amazon SQS) for a webapp? It sounds strange to me; I thought SQS was normally used for internal S3 apps.
January 21st, 2008 at 09:04 PM
And to be very specific (and perhaps dense), you’re calling `Photo.fetch_and_thumbnail` through `script/runner` via a cron job or something similar?
January 21st, 2008 at 09:46 PM
Stephen, yes, that’s essentially it.
dH, you’re right that using an external message queue server is going to be slow. But I wanted to illustrate the process, not necessarily say “Do it this way, the one true way.”
January 30th, 2008 at 04:31 PM
@dH
It’s not running in the request/mongrel thread, so who cares if there’s a little latency between requests to Amazon?
February 2nd, 2008 at 12:10 PM
It seems like overkill to run something as conceptually simple as a queue on a server that has all the lag of the Internet and charges money for every access.
Can someone suggest an open source queue project that has equivalent functionality to SQS?
February 17th, 2008 at 02:33 PM
@Don – Apache ActiveMQ is what you need
I think the point of this service as a middle layer is to decouple one process from another and make it safely asynchronous. The SQS system flies when connecting from EC2 servers, and because of the way EC2 operates (i.e. a dead image = lost data), it’s going to be better to do asynchronous processing and rely on the queue system to increase fault tolerance. I haven’t done a calculation on this yet, but I assume it’s cheaper to use SQS than to keep 2 EC2 servers running another MQ server. It certainly reduces complexity in a system, because you don’t have to worry about performance, redundancy, load balancing, etc.