Quick Puppet Tip

2012-01-24

While testing my Puppet recipes, I usually boot a new instance then apply configuration manually. I have a base configuration, which is extended with other things. I just noticed I could apply a manifest using STDIN:

# ( echo "include sysbase" ; echo "include rabbitmq" ) | puppet apply -v
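
The same trick can be scripted. Here is a minimal Ruby sketch, assuming `puppet` is on the PATH. `manifest_for` and `apply_manifest` are hypothetical helper names, not part of Puppet:

```ruby
require "open3"

# Build a throwaway manifest: one "include" line per class name.
def manifest_for(class_names)
  class_names.map { |name| "include #{name}\n" }.join
end

# Feed the manifest to `puppet apply -v` on STDIN, like the one-liner does.
# Returns the combined output and whether the command succeeded.
def apply_manifest(class_names, puppet = "puppet")
  output, status = Open3.capture2e(puppet, "apply", "-v",
                                   :stdin_data => manifest_for(class_names))
  [output, status.success?]
end

# Print the manifest that would be piped to Puppet.
puts manifest_for(%w[sysbase rabbitmq])
```

Calling `apply_manifest(%w[sysbase rabbitmq])` on a freshly booted instance would then do the same as the shell pipeline.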

Viva the Unix tradition!

If you do any kind of configuration management, I highly recommend Puppet.

I just learned this morning that Needium will join Seevibes in sponsoring the beer at the Scala Hackaton and SQL Workshop. Both events are on February 2nd. If you haven’t already done so and wish to join us, please register at EventBrite:

Hope to see you there!

I would like to invite you to attend one or two events on February 2nd: Analyzing Twitter Social Data using Scala and Akka Actors and Social Media Metrics using SQL Engines.

Schedule

  • 2:00 PM Doors open
  • 2:30 PM Hackaton — Analyzing Twitter Social Data using Scala and Akka Actors
  • 5:00 PM Beer & Pizza, sponsored by Seevibes
  • 6:30 PM Workshop — Social Media Metrics using SQL Engines
  • 9:00 PM Socializing

The event will be at the Notman House, 51 Sherbrooke W. We have a limited number of places available, so be sure to reserve your seat now. Both events are free to attend, and both are bilingual (French and English).

Registration

Setup

Before you come in, please be sure to follow these instructions to get you started:

Scala Hackaton

The Scala Hackaton is an event where you’ll build whatever you wish: word counter, word frequency, hashtag frequency, etc. You get to choose. People more familiar with Scala and Akka will be at the event and can help you. The first 30 minutes of the event will be reserved for a quick introduction to Scala.

  • Clone the scala-hackaton Git repository, or download a ZIP
  • In the repository / project, run mvn test and mvn exec:java -Dexec.mainClass=seevibes.HelloActor

The two Maven steps download all necessary dependencies ahead of time. If you skip them, you’ll lose a lot of time at the event waiting for your dependencies to download.

If you are unfamiliar with Java and Scala, I strongly recommend you use an IDE, which will help with code completion and syntax awareness. I happen to prefer JetBrains’ IDEA, but this is like Vim vs Emacs. You can use Eclipse if you prefer. If you use Eclipse, be sure to use the Scala IDE extension. In the case of IDEA, download and install the Scala plugin.

If you have any issues, please email me, François Beausoleil, and I’ll help you out. I’ll post updates to this page if common errors pop up.

SQL Workshop

The SQL workshop will be a series of directed examples:

1. I will present a problem, a report or a question we want answered, and some details on how you can accomplish the goal;
2. You will answer the question with the knowledge you have;
3. I’ll ask people to present their solutions;
4. I’ll present my solution and discuss specifics

I have 6 exercises planned out, from 15 to 45 minutes each. The topics range from indexing to joining to using intersections and unions, and end with windowing functions. The workshop is for people who wish to learn more about SQL and how to use the capabilities of their favorite SQL engines more effectively.

  • Install PostgreSQL 9.1 (latest is currently 9.1.2)
  • Load this PostgreSQL database dump svworkshop.sql.bz2 (315 MiB) in your cluster using the following command:
bzcat svworkshop.sql.bz2 | psql

The dump file expects to create a new database named svworkshop using your default user.

Hope to see you there!

I sometimes have to do sysadmin work, such as when I’m the sole technical person on a project. When I need to keep a service running, I usually turn to daemontools. Daemontools was written by D. J. Bernstein, a mathematician and author of many UNIX utilities.

From daemontools’ home page:

daemontools is a collection of tools for managing UNIX services.

daemontools home page

What this means is daemontools is designed to run, and keep running, a service. Daemontools also includes other utilities which I find useful, such as setuidgid, envdir and multilog. I searched for an article such as this, but couldn’t find one. If you find a factual error, please let me know immediately. If you have your own best practices, let me know so I can expand on the list.

Read the articles themselves here:

I had lots of difficulties running my tests under IDEA. The exact error message was:

Error running All Tests:
Not found suite class.

Where All Tests was the name of my Run configuration.

I finally ended up with the right incantations. In my POM, I have the following:

<dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.9.0-1</artifactId>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <version>1.8.1</version>
      <scope>test</scope>
    </dependency>
</dependencies>

Then, I had to extend org.scalatest.junit.JUnitSuite for my test classes, like this:

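import org.scalatest.junit.JUnitSuite
import org.junit.Test
import scala.collection.immutable.Queue

class GivenAnEmptyQueue extends JUnitSuite {
  // Assumes Queue is the standard immutable Queue; the post does not say which one.
  @Test def thenItShouldNotHaveAnyElements() {
    assert(Queue.empty[Int].isEmpty)
  }
}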

Finally, I had to verify that both the JUnit and Scala plugins were enabled and at their latest versions in the IDE itself. After that, I was able to run my tests from within the IDE.

I’m starting in Scala, because Seevibes’ code is in Scala. Scala has a tool named simple-build-tool for managing your projects. sbt is similar to Ruby’s Bundler and Clojure’s Leiningen in that it manages dependencies and helps build a project for you.

Unfortunately, I had problems getting started. After following the Setup instructions, I was consistently getting this error:

Getting org.scala-tools.sbt sbt_2.8.1 0.10.1 ...

:: problems summary ::
:::: WARNINGS
    [NOT FOUND  ] commons-logging#commons-logging;1.0.4!commons-logging.jar (5ms)

  ==== Maven2 Local: tried

    file:///Users/francois/.m2/repository/commons-logging/commons-logging/1.0.4/commons-logging-1.0.4.jar

    ::::::::::::::::::::::::::::::::::::::::::::::

    ::              FAILED DOWNLOADS            ::

    :: ^ see resolution messages for details  ^ ::

    ::::::::::::::::::::::::::::::::::::::::::::::

    :: commons-logging#commons-logging;1.0.4!commons-logging.jar

    ::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
download failed: commons-logging#commons-logging;1.0.4!commons-logging.jar
Error during sbt execution: Error retrieving required libraries
  (see /Users/francois/Projects/project/boot/update.log for complete log)
Error: Could not retrieve sbt 0.10.1

I hit #scala and RSchulz pointed me to ~/.ivy2. After I rm -rf’d ~/.ivy2 and ~/.m2, sbt ran to completion. I was unable to find any mentions of the errors above, so hopefully this may help someone else.

I ran into a little gotcha today, using Sequel. I’m writing an importer, you know the kind: read record from database A, apply some transformations, write to database B. No rocket science required. But, Sequel has a little gotcha that stumped me for a bit. My script looked like this:

DBa = Sequel.connect "..."
DBb = Sequel.connect "..."

class APerson < Sequel::Model(DBa[:people])
end

class BContact < Sequel::Model(DBb[:contacts])
end

contacts = Hash.new
APerson.all.each do |person|
  contact = BContact.create(
    :name        => person.last_name + ", " + person.first_name,
    :tenant_code => ENV["TENANT_CODE"],
    :updated_by  => "import",
    :updated_at  => Time.now)
  contacts[ person.id ] = contact.contact_id
end

# Now I can map A's IDs to the correct value in database B, such as
# for attaching email addresses, phone numbers, etc.

The contacts table in the B database is declared like this:

create_table :contacts do
  column :tenant_code, :integer,       :null => false
  column :contact_id,  :serial,        :null => false
  column :name,        "varchar(240)", :null => false

  primary_key [:tenant_code, :contact_id]
  foreign_key [:tenant_code], :tenants
end

Notice tenant_code and contact_id are part of the primary key. I don’t write to contact_id because I want the sequence’s value to be returned to me. But I must write my own value to the tenant_code column. I was receiving an exception on the #create call:

/Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:1491:in `block in set_restricted': method tenant_code= doesn't exist or access is restricted to it (Sequel::Error)
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:1486:in `each'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:1486:in `set_restricted'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:1077:in `set'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:1456:in `initialize_set'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:764:in `initialize'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:134:in `new'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:134:in `create'
  from /Users/francois/.rvm/gems/ruby-1.9.2-p180/gems/sequel-3.25.0/lib/sequel/model/base.rb:248:in `find_or_create'
  from script/import:65:in `block (2 levels) in <top (required)>'

I was very much stumped, and turned to the excellent documentation. I eventually found my way to #set_all, and changed my code accordingly:

APerson.all.each do |person|
  contact = BContact.new.set_all(
    :name            => person.last_name + ", " + person.first_name,
    :tenant_code     => ENV["TENANT_CODE"],
    :last_updated_by => "import",
    :last_updated_at => Time.now)
  contacts[ person.id ] = contact.contact_id
end

Even though the Sequel RDoc says #set_all ignores restricted columns, I was still receiving the same exception. I was now doubly stumped, until I found a reference to #unrestrict_primary_key. I added the declaration to BContact and was able to proceed:

class BContact < Sequel::Model(DBb[:contacts])
  unrestrict_primary_key
end

You know the drill though: where you import one model, you’ll have more to import shortly. Ruby to the rescue!

class Sequel::Model
  # Open classes win every time!
  unrestrict_primary_key
end

Problem solved!
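
If the restriction itself is unclear, it can be modelled in a few lines of plain Ruby. The sketch below is a toy, not Sequel's implementation: `TinyModel`, `Contact` and every method on them are hypothetical names that only mimic the behaviour described in this post, where mass assignment refuses primary key columns until the class opts out.

```ruby
# Toy model of the behaviour described in the post. This is NOT Sequel's code.
class TinyModel
  class << self
    attr_reader :primary_key_columns

    # Declare which columns make up the primary key.
    def primary_key(*columns)
      @primary_key_columns = columns
    end

    # Opt out of the protection, as the post does for BContact.
    def unrestrict_primary_key
      @restrict_primary_key = false
    end

    # Primary keys are restricted unless explicitly unrestricted.
    def restrict_primary_key?
      @restrict_primary_key != false
    end
  end

  attr_reader :values

  def initialize
    @values = {}
  end

  # Mass assignment: refuse primary key columns while they are restricted.
  def set(hash)
    hash.each do |column, value|
      if self.class.restrict_primary_key? &&
         self.class.primary_key_columns.include?(column)
        raise ArgumentError, "method #{column}= doesn't exist or access is restricted to it"
      end
      @values[column] = value
    end
    self
  end
end

class Contact < TinyModel
  primary_key :tenant_code, :contact_id
end

begin
  Contact.new.set(:tenant_code => 1, :name => "Doe, Jane")
rescue ArgumentError => e
  puts "restricted: #{e.message}"
end

Contact.unrestrict_primary_key
contact = Contact.new.set(:tenant_code => 1, :name => "Doe, Jane")
puts contact.values.inspect
```

The first `set` call fails on `tenant_code` because it is part of the primary key. After `unrestrict_primary_key`, the same call goes through.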

Sometimes, you need to know what your program’s doing, or how long it’s taking to do something. You could always log to a file, then use a combination of grep, awk and/or wc to gather the statistics yourself, but why bother? There are many tools out there which will do exactly what you want, just use them: Cacti, Graphite or plain-old RRD.

For instance, at Yatter, we need to know how fast our ranking algorithms are running, and we must know how long the ranking takes compared to the number of users and pages we have on hand. Graphing is the perfect solution for that, and Graphite fit the bill just fine for us. But Graphite alone won’t do all that we need: we also needed a way to instrument our code, hence the Counters library:

require "counters"
require "sequel"

DB = Sequel.connect "jdbc:postgresql://127.0.0.1:5432/db"
Counter = Counters::StatsD.new(:url => "udp://127.0.0.1:8125", :namespace => "ranker")

users = Counter.latency "fetch.users" do
  DB[:users].all
end

pages = Counter.latency "fetch.pages" do
  DB[:pages].all
end

Counter.magnitude "count.users", users.length
Counter.magnitude "count.pages", pages.length

Counter.latency "ranking" do
  entropy = 1.0
  while entropy > MIN_ENTROPY
    Counter.hit "iteration"
    # Reduce entropy
  end
end

At the end of the day, we’ll have hierarchical counters in Graphite which will give us all kinds of statistics. From the API above, you can gather that values are stored under hierarchical keys separated by full stops. If you’re interested in the code, make yourself at home with the Counters GitHub repository.
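
For context, StatsD accepts plain-text datagrams of the form `name:value|type`. The sketch below shows how a namespaced, dot-separated key could be turned into such a datagram. It is not the Counters library's code: the `statsd_line` and `send_metric` helpers, and the mapping of `hit`, `latency` and `magnitude` to the StatsD types `c`, `ms` and `g`, are assumptions made for illustration.

```ruby
require "socket"

# StatsD metric types: counter, timer (milliseconds), gauge.
# Mapping them to hit/latency/magnitude is an assumption for illustration.
STATSD_TYPES = { :hit => "c", :latency => "ms", :magnitude => "g" }

# Build one StatsD datagram from a namespace, a dotted key and a value.
def statsd_line(namespace, key, value, kind)
  "#{namespace}.#{key}:#{value}|#{STATSD_TYPES.fetch(kind)}"
end

# Send a datagram over UDP (hypothetical helper, not called here).
def send_metric(line, host = "127.0.0.1", port = 8125)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket.close if socket
end

puts statsd_line("ranker", "fetch.users", 12, :latency)  # ranker.fetch.users:12|ms
puts statsd_line("ranker", "iteration", 1, :hit)         # ranker.iteration:1|c
```

Graphite then files each value under its dotted path, which is where the hierarchy comes from.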

Counters is certified to run on JRuby 1.6.0 in 1.8 mode, and MRI 1.9.2.

# JRuby 1.6.0 required
require "benchmark"
require "sequel"
require "jdbc/postgresql"

DB = Sequel.connect "jdbc:postgresql://127.0.0.1:5432/mydb"

table = DB[:mytable]
time = Benchmark.measure do
  (1..10000).each do |n|
    table.insert(:a => n, :b => 2*n)
  end
end

puts time

How much time do you think this is going to take? This is about the least efficient way to insert many rows into a database. Remember that each call to #insert makes a round trip to the database. Even if your round trip time is 1 ms, 10,000 inserts cost 10 seconds of round trip time alone, time which your program could spend doing something much more useful, such as generating revenue (somehow).
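
As a back-of-the-envelope check of that figure, here is a tiny sketch. `round_trip_seconds` is a hypothetical helper, and it counts only network latency, ignoring the time the server spends executing each statement.

```ruby
# Seconds spent purely on network round trips when inserting row by row.
def round_trip_seconds(rows, latency_ms)
  rows * latency_ms / 1000.0
end

puts round_trip_seconds(10_000, 1)   # 10.0
puts round_trip_seconds(100_000, 1)  # 100.0
```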

Instead, you should use the bulk copy feature of your database engine. In my case, that’s PostgreSQL. Since I’m using JRuby, I have to turn to the JDBC world, but that’s all right: everything has been implemented already, by someone, somewhere. I’ll refer you to the relevant pages:

And the relevant code would be:

# JRuby 1.6.0 required
require "benchmark"
require "sequel"
require "jdbc/postgresql"
require "java"

DB = Sequel.connect "jdbc:postgresql://127.0.0.1:5432/mydb"

time = Benchmark.measure do
  DB.synchronize do |connection|
    copy_manager = org.postgresql.copy.CopyManager.new(connection)
    stream = copy_manager.copy_in("COPY mytable(a, b) FROM STDIN WITH CSV")

    begin
      (1..10000).each do |n|
        # Don't forget we're streaming CSV data, thus each row/line MUST be terminated with a newline
        row = "#{n},#{2*n}\n".to_java_bytes
        stream.write_to_copy(row, 0, row.length)
      end
    rescue
      stream.cancel_copy
      raise
    else
      stream.end_copy
    end
  end
end

puts time

This will execute a single round trip to the database server: you’ll pay the latency cost only once.

On an unrelated note, this is the first time I have ever used an else clause on a begin/rescue. If an exception is raised, we want to cancel the copy (the rescue clause), but on the other hand, if nothing is raised, we want to end the copy (the else clause). One or the other must happen, but not both.
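
Here is a small, database-free sketch of that control flow. `copy_rows` and the `steps` array are hypothetical names used only to record which clauses ran. Unlike the real code, the rescue clause here does not re-raise, so the result can be inspected.

```ruby
# Records which clauses ran, to show that rescue and else are mutually exclusive.
def copy_rows(rows)
  steps = []
  begin
    rows.each do |row|
      raise ArgumentError, "bad row" if row.nil?
      steps << :write
    end
  rescue ArgumentError
    steps << :cancel   # runs only when the body raised
  else
    steps << :finish   # runs only when the body did not raise
  ensure
    steps << :cleanup  # runs in both cases
  end
  steps
end

p copy_rows([1, 2])    # [:write, :write, :finish, :cleanup]
p copy_rows([1, nil])  # [:write, :cancel, :cleanup]
```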

If you’re curious what difference bulk copying makes, here are the benchmark results:

10000 INSERT statements
  7.012000   0.000000   7.012000 (  7.012000)

1 COPY FROM STDIN statement
  0.848000   0.000000   0.848000 (  0.848000)

The numbers speak for themselves: roughly 8× faster. Not too shabby, and remember the time saved grows with the number of rows.
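
To make the comparison concrete, the speedup and the per-row cost can be worked out from the two timings above. This is a quick sketch using only the published numbers. The variable names are mine.

```ruby
insert_seconds = 7.012   # 10000 INSERT statements
copy_seconds   = 0.848   # 1 COPY FROM STDIN statement
rows           = 10_000

speedup           = insert_seconds / copy_seconds
per_row_insert_ms = insert_seconds / rows * 1000
per_row_copy_ms   = copy_seconds / rows * 1000

puts format("speedup: %.1fx", speedup)
puts format("per row: %.3f ms (INSERT) vs %.3f ms (COPY)",
            per_row_insert_ms, per_row_copy_ms)
```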

The README says it all:


You’re knee deep in a debugger session, and you can’t understand why something’s wrong. You wish you could fire up your application against the test database, but sadly, the process which is running the tests is within a transaction, and thus the actual data is opaque. What can you do?

# Somewhere deep in your tests
test "the frobble touches the widget" do
  assert_equal 42, frobble.widget_id
end

You’ve been on this assert_equal call for the past hour wondering. Frustration’s been mounting, because you don’t understand why the frobble doesn’t touch the widget. Clearly, there’s something wrong with the fixtures, but you can’t understand what it is. Time to fire up the debugger and dump the data:

[814, 823] in test/unit/widget_test.rb
   814          frobble.save!
   815        end
   816
   817        test "the frobble touches the widget" do
   818          debugger
=> 819          assert_equal 42, frobble.widget_id
   820        end
   821
   822        test "the widget touched the frobble in turn" do
   823          assert widget.touched_by_frobble?
test/unit/widget_test.rb:819
=> 819          assert_equal 42, frobble.widget_id
(rdb:112)

Since the data_dumper gem is already declared in your Gemfile (if not, declare it, bundle install, then run your tests again), type:

(rdb:112) Dir.mkdir(Rails.root + "dump")
(rdb:113) DataDumper.dump(Rails.root + "dump")

Then, quit your failing tests, and from the trusty command line:

$ rails console
> DataDumper.load(Rails.root + "dump")
> exit

$ rails server

Any and all data from your test database will be loaded in your development environment. You can now explore your model with your trusty application, to find out what’s really going on.

Your Host

I am François Beausoleil, a Ruby on Rails and Scala developer. During the day, I work on Seevibes, a platform to measure social interactions related to TV shows. At night, I am interested in many things. Read my biography.