1 # JRuby 1.6.0 required
 2 require "sequel"
 3 require "jdbc/postgresql"
 4 
 5 DB = Sequel.connect "jdbc:postgresql://127.0.0.1:5432/mydb"
 6 
 7 table = DB[:mytable]
 8 time = Benchmark.measure do
 9   (1..10000).each do |n|
10     table.insert(:a => n, :b => 2*n)
11   end
12 end
13 
14 puts time

How much time do you think this is going to take? This is the most inefficient way to insert many rows to a database. Remember each call to #insert will do a round trip to the database. Even if your round trip time is 1ms, you’ll still pay for 10 seconds of round trip time, time which your program could be doing something much more useful, such as generating revenue (somehow).

Instead, you should use the bulk copy feature of your database engine. In my case, that’s PostgreSQL. Since I’m using JRuby, I have to turn to the JDBC world, but that’s all right: everything has been implemented already, by someone, somewhere. I’ll refer you to the relevant pages:

And the relevant code would be:

 1 # JRuby 1.6.0 required
 2 require "sequel"
 3 require "jdbc/postgresql"
 4 require "java"
 5 
 6 DB = Sequel.connect "jdbc:postgresql://127.0.0.1:5432/mydb"
 7 
 8 time = Benchmark.measure do
 9   DB.synchronize do |connection|
10     copy_manager = org.postgresql.copy.CopyManager.new(connection)
11     stream = copy_manager.copy_in("COPY mytable(a, b) FROM STDIN WITH CSV")
12 
13     begin
14       (1..10000).each do |n|
15         # Don't forget we're streaming CSV data, thus each row/line MUST be terminated with a newline
16         row = "#{n},#{2*n}\n".to_java_bytes
17         stream.write_to_copy(row, 0, row.length)
18       end
19     rescue
20       stream.cancel_copy
21       raise
22     else
23       stream.end_copy
24     end
25   end
26 end
27 
28 puts time

This will execute a single round trip to the database server: you’ll pay the latence cost only once.

On an unrelated note, this is the first time ever I use an else clause on a begin/rescue. If an exception is raised, we want to cancel the copy (the rescue clause), but on the other hand, if nothing is raised, we want to end the copy (the else clause). One or the other must happen, but not both.

If you’re curious what difference bulk copying makes, here are the benchmark results:

10000 INSERT statements
  7.012000   0.000000   7.012000 (  7.012000)

1 COPY FROM STDIN statement
  0.848000   0.000000   0.848000 (  0.848000)

The numbers speak for themselves: 8× faster. Not too shabby, and remember this ratio will simply increase as the number of rows increases.

Search

Your Host

A picture of me

I am François Beausoleil, a Ruby on Rails and Scala developer. During the day, I work on Seevibes, a platform to measure social interactions related to TV shows. At night, I am interested many things. Read my biography.

Top Tags

Books I read and recommend

Links

Projects I work on

Projects I worked on