06 Mar 2008

When duplication is not duplication

Posted by Jamis on Thursday, March 6

I was looking through some C code today, and stumbled across this lovely little gem:

tmp = "\"#";
while (*tmp) {
  FD_SET(*tmp, url_encode_map);
  tmp++;
}

Now, be honest. I don’t care how good you are at C, it takes you a few brain cycles to process that and figure out that it is just setting two bits in a bit field. It really should have been written like this:

FD_SET('"', url_encode_map);
FD_SET('#', url_encode_map);

This raises the question: why wasn’t it? I’ll tell you why:

Programmers have this burning desire to avoid code duplication. We’re taught, almost since the cradle, to abhor duplicated code and to avoid it at all costs. Duplicating code is evil, it leads to unmaintainable code, and propagates bugs. Never, ever, do it!!!

Allow me to let you in on a little secret.

Calling the same function twice is NOT duplicating code. Not if the arguments change between calls.

Even calling the same function three times in a row is kosher. Four times, even. At some point, you might want to consider a loop, if the arguments can be determined functionally, but only do so when the list of similar function calls is harder to read and understand than the loop is. This is often when the loop takes fewer lines of code than the function calls do:

for (i = 127; i < 256; i++) {
  FD_SET(i, hdr_encode_map);
  FD_SET(i, url_encode_map);
}

There. Had to get that off my chest. Now, back to work.

Posted in Essays and Rants | 17 comments

07 Jan 2008

Never. Ever. Cargo-cult.

Posted by Jamis on Monday, January 7

I was told today on a mailing list that some people have been justifying their coding decisions by saying things like “but that’s how Jamis does it!”

And I was mortified. Because someday a time will come (and likely already has!) when the things I’ve written will be surpassed by a better way, and I will wilt with embarrassment if anyone uses “that’s how Jamis does it” to justify continuing with the antiquated style.

I’m learning, constantly. Every project I undertake teaches me something new. Every programmer I’ve ever worked with has shown me a better way to do things. “How X does it” (for absolutely any mortal value of X) is a moving target, and if you’re blindly basing your designs on something I (or anyone else) wrote a year or two ago, then you should step cautiously.

Never. Ever. Cargo-cult. If someone writes about something that you find clever, understand why you think it is clever. If someone preaches a better algorithm, understand why the algorithm is better. And if someone asks why you do something a certain way, argue it on its own merits, without resorting to an appeal to someone’s (supposed) authority. If you can argue that something is better than something else solely by contrasting its pros and cons against the alternative, you’ll be taken much more seriously. And you’ll have a much better chance of recognizing a better way when it is presented to you.

I’ll say it again. Never. Ever. Cargo-cult. Ever.

That said, I’ve been very, very quiet lately, and I apologize. I’ve been rethinking some priorities and experimenting with some new interests. Also, I’ve been trying to finish up (finally) Net::SSH v2 and Net::SFTP v2. Hopefully this year I’ll climb out of the hole I dug for myself last year and have more to blog about again.

Posted in Essays and Rants | 23 comments

23 Feb 2007

Method visibility in Ruby

Posted by Jamis on Friday, February 23

A common point of confusion to even experienced Ruby programmers is the visibility of public, protected, and private methods in Ruby classes. This largely stems from the fact that the behavior of those keywords in Ruby is different from what you might have learned from Java and C++.

To demonstrate these differences, let’s set up a little script:

class Foo
  def a; end

  # call 'a' with explicit 'self' as receiver
  def b; self.a; end

  # call 'a' with implicit 'self' as receiver
  def c; a; end
end

def safe_send(receiver, method, message)
  # can't use 'send' because it bypasses visibility rules
  eval "receiver.#{method}"
rescue => e
  puts "#{message}: #{e}"
else
  puts "#{message}: succeeded"
end

visibility = ARGV.shift || "public"
Foo.send(visibility, :a)

foo = Foo.new
safe_send(foo, :a, "explicit receiver       ")
safe_send(foo, :b, "explicit 'self' receiver")
safe_send(foo, :c, "implicit 'self' receiver")

Basically, the script just creates a class “Foo” with three methods: a, which we’ll invoke directly with an explicit, non-self receiver; b, which invokes a with self as receiver, and c, which invokes a with an implicit receiver of self. We’ll use the safe_send method to call each of those methods and log the result.

So, first: the public keyword. In Ruby, public means that the method may be invoked just about any way you please; in technical terms, the receiver of the message may be either explicit (“foo.bar”), self (“self.bar”) or implicit (“bar”).

$ ruby demo.rb public
explicit receiver       : succeeded
explicit 'self' receiver: succeeded
implicit 'self' receiver: succeeded

The protected keyword puts a straitjacket around the method. Any method declared protected may only be called if the receiver is self, explicitly or implicitly. (Update: protected methods may actually be called any time the receiver is of the same class as ‘self’...and an explicit self as receiver is just a specific case of that. Modifying the script to demonstrate this condition is left as an exercise for the reader.)

$ ruby demo.rb protected
explicit receiver       : protected method `a' called for #<Foo:0x3fc18>
explicit 'self' receiver: succeeded
implicit 'self' receiver: succeeded
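
The exercise mentioned in the update above might look something like the following sketch. The `Account` class and its `balance` and `richer_than?` methods are made up for illustration; the point is only that a protected method accepts an explicit receiver as long as that receiver is of the same class as `self`.

```ruby
# Hypothetical class, used only to illustrate the "same class" rule.
class Account
  def initialize(balance)
    @balance = balance
  end

  # `other.balance` uses an explicit, non-self receiver. It is allowed
  # because `other` is an Account, the same class as `self`.
  def richer_than?(other)
    balance > other.balance
  end

  protected

  def balance
    @balance
  end
end

a = Account.new(100)
b = Account.new(50)
puts a.richer_than?(b)

begin
  a.balance # called from outside any Account instance
rescue NoMethodError => e
  puts "outside call rejected: #{e.class}"
end
```

Called from inside another `Account`, `balance` works; called from top-level code, it raises `NoMethodError`.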

Lastly, the private keyword is the tightest setting of all. A private method cannot be called with an explicit receiver at all, even if that receiver is “self”.

$ ruby demo.rb private
explicit receiver       : private method `a' called for #<Foo:0x3fc18>
explicit 'self' receiver: private method `a' called for #<Foo:0x3fc18>
implicit 'self' receiver: succeeded

Note that, unlike languages such as Java, inheritance plays absolutely no part in determining method visibility in Ruby. Subclasses can access both protected and private methods of the superclass without trouble, so long as they abide by the rules laid out above.
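
A small sketch of that last point, using hypothetical `Base` and `Child` classes: the subclass can call the superclass's private method, provided it uses an implicit receiver.

```ruby
# Hypothetical classes, for illustration only.
class Base
  private

  def secret
    "from Base"
  end
end

class Child < Base
  # Implicit receiver: allowed, even though `secret` is private
  # and defined in the superclass.
  def reveal
    secret
  end
end

child = Child.new
puts child.reveal

begin
  child.secret # explicit, non-self receiver
rescue NoMethodError => e
  puts "rejected: #{e.class}"
end
```

One caveat for readers on newer interpreters: since Ruby 2.7, a private method may also be called with a literal `self` as receiver (`self.secret`), so the middle line of the private output shown above reflects the older Ruby this demo was run against.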

The difference between protected and private is very subtle, as you can see, which explains why protected is rarely used by most Rubyists. If it is used at all, it is generally as a convention, to document methods that are internal to the class, but which lie closer to the public interface than others. In Rails, for instance, you might declare your controller filter methods and model validation methods as “protected” (because the framework will call those methods) and reserve the “private” designation for those methods that are only ever called from within your own model or controller code.

Posted in Essays and Rants | 12 comments

26 Jan 2007

Scaffolding's place

Posted by Jamis on Friday, January 26

Scaffolding, scaffolding, scaffolding… In a recent article I said that “I have lots of issues with scaffolding”. Why would that be? I mean, what’s not to like about scaffolding, really? It’s all about rapid application development, and prototyping, and getting real, isn’t it? Isn’t it?? WELL????

Specifically, the issue I have with scaffolding is this: it puts the emphasis on the application’s model, instead of the user interface. It assumes that you know the domain of the application before you know how the user is going to interact with it. It assumes that the user interface can successfully follow your conjured domain. It assumes, frankly, far too much.

Now, don’t get me wrong: as a pedagogical aid, scaffolding is great. It lets newcomers to Rails quickly get a skeletal app up and running, giving them a platform from which to begin learning Rails without stumbling over too many details. That’s great. But scaffolding is not for building real applications.

Your users don’t care about the data model. Face it, they just don’t care. They will never interact with the data model. They will never interact with your carefully crafted schema. They interact with the UI. Therefore, it is very important that when you start an application, you start with what the users will care about. Get the UI right. Sketch it out, mock it up, get it real. Once you have a “real” UI to work from, it is amazing how much it can tell you about the application’s domain.

A single screen can tell you more about what models you need and the relationships between them than a hundred-page written specification. A picture really is worth a thousand words. And the remarkable thing is this: the model you infer from the UI is often not what you would have created had you gone for the model first.

Furthermore, working with scaffolding makes it nigh impossible to do test-driven development, whereas working from a UI makes it very, very easy. With scaffolding, what tests would you write first? What is the behavior you want your final product to have? That’s not a very easy question to answer when all you know is the set of models you think your application needs.

When working from a UI, though, you can look at all the elements and data on the page and immediately start seeing what tests you need. “If the user is an administrator and they view the page, they ought to see this link, but otherwise that link is hidden.” BAM, instant test case. And you immediately know you’re going to need (at the very least) “users”, some of whom can be “administrators”.
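
As a rough illustration of how directly that sentence turns into a test, here is a plain-Ruby sketch. Every name in it (`User`, `administrator?`, `visible_links`, the link labels) is invented for the example; a real Rails app would express this as a functional test against the rendered page.

```ruby
# Hypothetical names throughout: User, administrator?, visible_links.
User = Struct.new(:name, :admin) do
  def administrator?
    admin
  end
end

# Decides which links the page shows for a given user.
def visible_links(user)
  links = ["Home"]
  links << "Manage accounts" if user.administrator?
  links
end

admin  = User.new("alice", true)
member = User.new("bob", false)

# The "instant test case" from the paragraph above, as plain assertions.
raise "admin should see the link" unless visible_links(admin).include?("Manage accounts")
raise "member should not see it"  if visible_links(member).include?("Manage accounts")
puts "ok"
```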

I’ll say it again, scaffolding is a great learning tool, like training wheels or parachuting in tandem with an instructor. But when you do the real thing, those training wheels come off. You jump from the plane alone. You design the UI first.

Posted in Essays and Rants | 23 comments

10 Nov 2006

Just say "no" to certification

Posted by Jamis on Friday, November 10

Pat Eyler is looking into designing a certification program, in conjunction with a university course. This really got me thinking.

As a general rule, I believe certifications are a joke. Plain and simple. When I was at BYU, and the mandate came from the suits that we had to drop everything and become Java certified, I saw firsthand what a joke it was. The very idea that a test can, in any way, imply competence is laughable.

Now, I know and respect Pat. He’s got more planned for this than just a test, and that’s great. I certainly commend the idea of a Ruby course. But I have to plead against the introduction of “certification.”

Can certification produce competent programmers? I say “no”. If you are certified and are competent, then you were competent before you were certified. The two have no relation, except insofar as the certification process might ignite the passion of a competent programmer to improve themselves. The problem is that you don’t have to be passionate or competent to take and pass these tests. You just have to be good at memorizing and cargo culting.

Certifications are used primarily by ignorant decision makers as a discriminator. Thus, if someone wants to get noticed by said decision makers, they need to take and pass the test. It’s certification for certification’s sake. This encourages anything but learning. It encourages large-scale mediocrity, caused by people memorizing exactly what the test demands, and nothing more. It encourages learning out of context. It encourages cargo culting, rather than original thinking.

And what happens to the community when this happens? It becomes diluted. The passion gets leeched away. The language becomes inundated by people with little concern for the language itself, or for what they will use the language. They have little care for the community, except insomuch as the community can help them solve their own problems. They take. They demand. They question. They do not give. And the community suffers.

So please, Pat, and anyone else out there that is contemplating a certification program of any sort: don’t do it. By all means, educate, teach, spread the word, and encourage passionate programmers. But don’t certify.

Posted in Essays and Rants | 21 comments

07 Nov 2006

Don't be afraid of harnessing SQL

Posted by Jamis on Tuesday, November 7

Even after ten years of working with SQL, I still find myself tickled by how powerful it is, in spite of its warts.

In Basecamp, users can create to-do list “templates”. Each template is essentially just a name, an optional description, and a bunch of items. Once defined, users can create new to-do lists based on one of these templates.

We used to do this entirely via the ActiveRecord helper methods. First, we’d create a new list, and then create the items for the list one at a time, for each item in the template. It looked something like this:

class TodoListTemplate < ActiveRecord::Base
  has_many :todo_item_templates

  def instantiate
    list = TodoList.create(:name => name, :description => description)
    todo_item_templates.each do |item|
      list.todo_items.create :content => item.content
    end
    list
  end
end

This worked, but was very inefficient. It results in a lot of SQL statements being sent down the pipe, mostly because we’ve got some before_create hooks and observers set up that perform work for each new to-do item that is created. As our traffic grew, we started running into deadlock issues. All those hooks and observers, so convenient at the time, were now wreaking havoc on the database.

The problem was easily solved. First of all, a little thought helped me see that those hooks and observers were either not needed in this case, or could be done slightly differently. Secondly, instead of copying each item template to an item, one at a time, we could do it all in SQL, as a single statement. Here’s more or less how we rewrote it:

def instantiate
  list = TodoList.create(:name => name, :description => description)

  TodoItem.connection.insert <<-SQL, "Populating items"
    INSERT INTO todo_items (todo_list_id, content, position, created_at)
      SELECT #{list.id}, content, position, UTC_TIMESTAMP()
        FROM todo_item_templates
       WHERE todo_list_template_id = #{id}
  SQL

  list.todo_items.reset
  list
end

Basically, the INSERT takes the associated SELECT statement, and inserts the results of each returned row into the todo_items table. Not only is this blazing fast, but it is much nicer to the database.

Once everything has been inserted, we call todo_items.reset, to force the todo_items association on the list to be unloaded, and then we return the list.
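
Since the statement interpolates two ids straight into the SQL, one cautious variation is to coerce them to integers before building the string. The sketch below is not how Basecamp does it; `copy_items_sql` is a hypothetical helper that only builds the statement, so it can be tried without a database.

```ruby
# Hypothetical helper: returns the INSERT ... SELECT statement as a string.
def copy_items_sql(list_id, template_id)
  # Integer() raises ArgumentError for strings that are not valid integers
  # (and TypeError for nil), so nothing unexpected reaches the SQL.
  list_id = Integer(list_id)
  template_id = Integer(template_id)

  <<-SQL
    INSERT INTO todo_items (todo_list_id, content, position, created_at)
      SELECT #{list_id}, content, position, UTC_TIMESTAMP()
        FROM todo_item_templates
       WHERE todo_list_template_id = #{template_id}
  SQL
end

puts copy_items_sql(42, 7)
```

In the `instantiate` method above, the resulting string would simply be passed to `TodoItem.connection.insert` in place of the inline heredoc.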

Your own situation may require more or less logic than this. You may even be completely fine doing everything via ActiveRecord. But if you find your application beginning to flounder in places where you are doing lots of database queries, consider rethinking those areas to consolidate some of that work.

Don’t be afraid of harnessing SQL.

I’ll probably begin publishing these kinds of “best practices” articles to The Rails Way, instead of to this blog. If you want to follow along, be sure and subscribe to that feed, too.

Posted in Essays and Rants | 16 comments

28 Oct 2006

Prolog in Ruby

Posted by Jamis on Saturday, October 28

About a month ago, I began experimenting with Prolog. (If you’re a Mac user wanting to tinker with Prolog, I’d recommend SWI-Prolog. I couldn’t get any other Prolog implementation to build or run on my MacBook Pro.) I’m certainly not an expert now, and I’m not leaving Ruby for Prolog, but I did learn enough to appreciate the power of logic programming. (Curiously, I found that logic programming is very similar to functional programming in some respects.)

How timely, then, was Mauricio Fernandez’s article today about Logic Programming in Ruby.

It is cool stuff, to be sure! Prolog, in Ruby. You could just drop Mauricio’s library into your app and have a logic engine available for you, using a Prolog-esque DSL. (A previous article on a similar topic, but which only described a possible DSL, is here.)

That Prolog DSL in Ruby is an excellent first step. It opens all kinds of doors. The next step, I think, is a way to do logic programming in Ruby, using a Rubyish syntax. Prolog is nice and all, and its syntax (intentionally) mirrors the mathematical syntax of formal logic, but admit it: unless you’re familiar with that formal syntax, the meaning of a Prolog program is about as transparent as a two-year-old Perl program. Consider the following example from Mauricio’s article:

sibling[:X,:Y] <<= [ parent[:Z,:X], parent[:Z,:Y], noteq[:X,:Y] ]
parent[:X,:Y] <<= father[:X,:Y]
parent[:X,:Y] <<= mother[:X,:Y]

father["matz", "Ruby"].fact
mother["Trude", "Sally"].fact
father["Tom", "Sally"].fact
father["Tom", "Erica"].fact
father["Mike", "Tom"].fact

query sibling[:X, "Sally"]

Wouldn’t it be cool if you could define that with something closer to natural language? (Natural language, I know, introduces all kinds of ambiguities, which is why mathematicians use a more rigorous formal language for describing things like logic, but just follow along for a minute.) The following has not been implemented (at least by me), but wouldn’t it be nifty if it worked?

:X.sibling_of(:Y).if :Z.parent_of(:X).and(:Z.parent_of(:Y)).and(:X.noteq(:Y))
:X.parent_of(:Y).if :X.father_of(:Y)
:X.parent_of(:Y).if :X.mother_of(:Y)

"matz".father_of "Ruby"
"Trude".mother_of "Sally"
"Tom".father_of "Sally"
"Tom".father_of "Erica"
"Mike".father_of "Tom"

# returns an Enumerable of the possible solutions
result = :X.sibling_of("Sally").solutions
result.each { |solution| p solution }

Maybe that’s too verbose, or too much syntax. I’m sure it’s a little naive. (The Towers of Hanoi example, for instance, is hard to convert to this kind of syntax.) It’s pretty much off the top of my head, and could no doubt be made better. Nevertheless, I think it reads more naturally than Prolog, and feels more like Ruby.
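
For comparison, here is what the sibling query computes when written by hand in plain Ruby, with no logic engine at all. This is only a sketch of the expected answer, not an implementation of either DSL; the `PARENTS` table and `siblings_of` method are names invented for the example. A logic engine earns its keep by deriving this kind of search from the rules instead of having it hand-coded.

```ruby
# [parent, child] pairs: the father and mother facts from the examples above.
PARENTS = [
  ["matz", "Ruby"],
  ["Trude", "Sally"],
  ["Tom", "Sally"],
  ["Tom", "Erica"],
  ["Mike", "Tom"]
]

# X is a sibling of `name` if some Z is a parent of both and X != name.
def siblings_of(name)
  parents_of_name = PARENTS.select { |_, child| child == name }.map(&:first)
  PARENTS.select { |parent, child| parents_of_name.include?(parent) && child != name }
         .map(&:last)
         .uniq
end

p siblings_of("Sally") # prints ["Erica"]
```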

Perhaps I’ll tinker on this…I’ve got at least one side project that could use a logic engine, and I’d love to use one with a clean, Ruby-esque syntax. If anyone beats me to the punch, though, I won’t be disappointed.

Posted in Essays and Rants | 6 comments

23 Oct 2006

Indexing for DB performance

Posted by Jamis on Monday, October 23

Isn’t Rails great? It makes interacting with your database so easy, and removes almost every vestige of SQL from the development process. You can build and mutate your entire database schema (thanks to ActiveRecord::Migration and ActiveRecord::Schema), go crazy shoving data into your database (with ActiveRecord::Base.create and friends) and query your data in a very friendly Ruby DSL (ActiveRecord::Base#find).

Wonderful! But I think most of us have experienced the puzzlement and frustration of wondering why our application, which ran so beautifully during testing and for the first few days or weeks after launch, is suddenly running slower and slower, and why our database is being so incredibly overworked. What happened?

Chances are, you forgot to add indexes to your tables. Rails won’t (and, honestly, can’t) do it for you. In fact, Rails doesn’t even try to tell you where those indexes might be needed. And without those indexes, the only recourse the database has when fulfilling your query is to do a “full table scan”, basically looking at each row in the table, one at a time, to find all matching records. That’s not too bad when there are only a few tens (or even thousands, on a fast machine) of rows, but when you start getting tens of thousands, hundreds of thousands, or even millions of rows, just imagine how hard your database has to work to satisfy those queries!
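
To get a feel for the difference, here is a toy model in plain Ruby. It is an analogy only: real database indexes are typically B-trees rather than hashes, and the `Row` struct, `scan` method and `user_name` values are all made up for the sketch.

```ruby
# A toy "table" of 10,000 rows.
Row = Struct.new(:id, :user_name)
rows = (1..10_000).map { |i| Row.new(i, "user#{i}") }

# Full table scan: look at rows one at a time until one matches.
# Returns the matching row and how many rows were examined.
def scan(rows, name)
  examined = 0
  match = rows.find do |row|
    examined += 1
    row.user_name == name
  end
  [match, examined]
end

# A stand-in for an index: built once, then a lookup goes straight to the row.
index = rows.each_with_object({}) { |row, h| h[row.user_name] = row }

match, examined = scan(rows, "user9999")
puts "scan found id #{match.id} after examining #{examined} rows"
puts "index found id #{index["user9999"].id} with a single lookup"
```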

So you may be wondering, “alright, I need indexes…how do I know what indexes to create?”

Here are a few general tips. My experience is primarily with MySQL, so that’s where my advice is directed, but I believe most of these tips apply regardless of your DBMS:

  • If you have a foreign key on a table (or, phrased another way, you have a belongs_to, has_many, has_one, or has_and_belongs_to_many association on a model), then you almost certainly need to add an index for it, because any time you access those associations, Rails is generating SQL under the covers that queries based on those foreign keys.
  • If you find yourself frequently doing queries on a non-foreign-key column (like user_name or permalink), you’ll definitely want an index on that column.
  • If you frequently sort on a column or combination of columns, make sure the index that is being used for the query includes those sort columns, too (if at all possible). Indexes store the data in sorted order, so if your index includes the sort column, the database can return the sorted data at almost no extra cost.
  • Many databases (like MySQL, or Postgres prior to 8.1) will only use a single index per table, per query, so make sure you have indexes defined for the column combinations that you will query on frequently. A common mistake is to define an index on “user_name” and an index on “account_id”, and then expect the database to use both indexes to satisfy a query that references both columns. (Some databases will use both indexes, though; be sure and understand how your DBMS uses indexes.)
  • Don’t go crazy defining indexes. It is tempting to just add an index on every column that could conceivably be queried on, just to preemptively destroy any possible DB performance problems that may arise. This is bad. Too many indexes can be just as bad as too few, since the DB has to try and determine which of the myriad indexes to use to satisfy a particular query. Also, indexes consume disk space, and they have to be kept in sync every time an insert, delete, or update statement is executed. Lots of indexes means lots of overhead, so try to strike a good balance. Start with only the indexes you absolutely need, and try to use only those. As problem queries surface, see if they can be rewritten to use existing indexes, and only if they can’t should you go ahead and add indexes to fix them.
  • EXPLAIN (MySQL) or EXPLAIN ANALYZE (Postgres) (or whatever means your DB provides) are your best friends. Get to know them. Learn how to read their output. They will tell you what indexes (if any) a query will use, and how the database expects to be able to fulfil the query. It is a good idea to play with these commands during testing, to try and locate problem spots before they become problems. Note, though, that the number of rows in a table can affect how the database chooses indexes, so just because your query looks fine with only a handful of test rows in the database, don’t expect it to perform well when there are thousands of rows. In a perfect world, you could test your app with a large corpus of real data. In an imperfect world, you just have to make do.

In short, know your database. As convenient as ActiveRecord makes things, never assume you can get along with zero knowledge of SQL and how your database will work. Find a good book about your DBMS of choice. Read up on it. Take the time to educate yourself—it will pay off handsomely in the long run.

Posted in Essays and Rants | 17 comments

18 Oct 2006

Skinny Controller, Fat Model

Posted by Jamis on Wednesday, October 18

When first getting started with Rails, it is tempting to shove lots of logic in the view. I’ll admit that I was guilty of writing more than one template like the following during my Rails novitiate:

<!-- app/views/people/index.rhtml -->
<% people = Person.find(:all,
      :conditions => ["added_at > ? and deleted = ?", Time.now.utc, false],
      :order => "last_name, first_name") %>
<% people.reject { |p| p.address.nil? }.each do |person| %>
  <div id="person-<%= person.new_record? ? "new" : person.id %>">
    <span class="name">
      <%= person.last_name %>, <%= person.first_name %>
    </span>
    <span class="age">
      <%= (Date.today - person.birthdate) / 365 %>
    </span>
  </div>
<% end %>

Not only is the above difficult to read (just you try and find the HTML elements in it), it also completely bypasses the “C” in “MVC”. Consider the controller and model implementations that support that view:

# app/controllers/people_controller.rb
class PeopleController < ActionController::Base
end

# app/models/person.rb
class Person < ActiveRecord::Base
  has_one :address
end

Just look at that! Is it really any wonder that it is so tempting for novices to take this approach? They’ve got all their code in one place, and they don’t have to go switching between files to follow the logic of their program. Also, they can pretend that they haven’t actually written any Ruby code; I mean, look, it’s just the template, right?

For various reasons, though, this is a very, very bad idea. MVC has been successful for many reasons, and some of those reasons are “readability”, “maintainability”, “modularity”, and “separation of concerns”. You’d like your code to have those properties, right?

A better way is to move as much of the logic as possible into the controller. Seriously, isn’t that what the controller is for? It is supposed to mediate between the view and the model. Let’s make it earn its right to occupy a position in our source tree:

<!-- app/views/people/index.rhtml -->
<% @people.each do |person| %>
  <div id="person-<%= person.new_record? ? "new" : person.id %>">
    <span class="name">
      <%= person.last_name %>, <%= person.first_name %>
    </span>
    <span class="age">
      <%= (Date.today - person.birthdate) / 365 %>
    </span>
  </div>
<% end %>

# app/controllers/people_controller.rb
class PeopleController < ActionController::Base
  def index
    @people = Person.find(:all,
      :conditions => ["added_at > ? and deleted = ?", Time.now.utc, false],
      :order => "last_name, first_name")
    @people = @people.reject { |p| p.address.nil? }
  end
end

Better! Definitely better. We dropped that big noisy chunk at the top of the template, and it’s more immediately obvious what the structure of the HTML file is. Also, you can see by reading the controller code roughly what kind of data is going to be displayed.

However, we can do better. There’s still a lot of noise in the view, mostly related to conditions and computations on the model objects. Let’s pull some of that into the model:

# app/models/person.rb
class Person < ActiveRecord::Base
  # ...

  def name
    "#{last_name}, #{first_name}"
  end

  def age
    (Date.today - birthdate) / 365
  end

  def pseudo_id
    new_record? ? "new" : id
  end
end

<!-- app/views/people/index.rhtml -->
<% @people.each do |person| %>
  <div id="person-<%= person.pseudo_id %>">
    <span class="name"><%= person.name %></span>
    <span class="age"><%= person.age %></span>
  </div>
<% end %>

Wow. Stunning, isn’t it? The template is reduced to almost pure HTML, with only a loop and some simple insertions sprinkled about. Note, though, that this is not just a cosmetic refactoring: by moving name, age and pseudo_id into the model, we’ve made it much easier to be consistent between our views, since any time we need to display a person’s name or age we can simply call those methods and have them computed identically every time. Even better, if we should change our minds and decide that (e.g.) age needs to be computed differently, there is now only one place in our code that needs to change.
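
To make that last point concrete: dividing a day count by 365 ignores leap years and does not yield a whole number. If you did decide to compute age differently, a calendar-based version might look like the sketch below. `age_on` is a hypothetical standalone method, written outside the model so it can be run on its own; in the app, the same logic would live in `Person#age`.

```ruby
require "date"

# Hypothetical helper: whole years between birthdate and today.
def age_on(birthdate, today = Date.today)
  years = today.year - birthdate.year
  # Has this year's birthday already happened (or is it today)?
  had_birthday = today.month > birthdate.month ||
                 (today.month == birthdate.month && today.day >= birthdate.day)
  had_birthday ? years : years - 1
end

puts age_on(Date.new(1980, 3, 6), Date.new(2008, 3, 5)) # prints 27
puts age_on(Date.new(1980, 3, 6), Date.new(2008, 3, 6)) # prints 28
```

Because the views only ever call `person.age`, swapping in a computation like this would not touch a single template.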

However, there’s still a fair bit of noise in the controller. I mean, look at that index action. If you were new to the application, coming in to add a new feature or fix a bug, that’s a lot of line noise to parse just to figure out what is going on. If we abstract that code into the model, we can not only slim the controller down, but we can effectively document the operation we’re doing by naming the method in the model appropriately. Behold:

# app/models/person.rb
class Person < ActiveRecord::Base
  def self.find_recent
    people = find(:all,
      :conditions => ["added_at > ? and deleted = ?", Time.now.utc, false],
      :order => "last_name, first_name")
    people.reject { |p| p.address.nil? }
  end

  # ...
end

# app/controllers/people_controller.rb
class PeopleController < ActionController::Base
  def index
    @people = Person.find_recent
  end
end

Voila! Looking at PeopleController#index, you can now see immediately what is going on. Furthermore, in the model, that query is now self-documenting, because we gave the method a descriptive name, find_recent. (If you wanted, you could even take this a step further and override the find method itself, as I described in Helping ActiveRecord finders help you. Then you could do something like Person.find(:recent) instead of Person.find_recent. There’s not a big advantage in that approach in this example, so it mostly depends on what you prefer, esthetically.)

Be aggressive! Try to keep your controller actions and views as slim as possible. A one-line action is a thing of wonder, as is a template that is mostly HTML. It is also much more maintainable than a view that is full of assignment statements and chained method calls.

Another (lesser) nice side-effect of lean controllers: it allows respond_to to stand out that much more, making it simple to see at a glance what the possible output types are:

# app/controllers/people_controller.rb
class PeopleController < ActionController::Base
  def index
    @people = Person.find_recent

    respond_to do |format|
      format.html
      format.xml { render :xml => @people.to_xml(:root => "people") }
      format.rss { render :action => "index.rxml" }
    end
  end
end

Give all this a try in your next project. Like adopting RESTful practices, it may take some time to wrap your mind around the refactoring process, especially if you’re still accustomed to throwing lots of logic in the view. Just be careful not to go too far; don’t go putting actual view logic in your model. If you find your model rendering templates or returning HTML or Javascript, you’ve refactored further than you should. In that case, you should make use of the helper modules that script/generate so kindly stubs out for you in app/helpers. Alternatively, you could look into using a presenter object.

Posted in Essays and Rants | 35 comments

29 Sep 2006

D&D, Knowledge bases, and Prolog (oh, my!)

Posted by Jamis on Friday, September 29

I only get to tinker on my D&D DSL (as mentioned in 1d6 more reasons to love Ruby) a very little bit each week—maybe an hour or two, if I’m lucky. This means the project is moving ahead with excruciating slowness, but it also gives me plenty of time to think about the roadblocks I face.

My current obstacle is how best to represent dependencies. For instance, certain feats are only available to characters who have met specific prerequisites. Likewise, prestige classes have various lists of requirements, as do certain magic items. Additionally, the list of goal conditions that the user requests of the utility (“I want a mid-level male elven wizard who can cast Fireball”) is essentially a list of dependencies, too.

I did manage to implement a simple tree for representing some kinds of dependencies. It can take the prerequisites for the Archmage prestige class and apply them to a character, telling me not only whether the character is eligible for the class, but if not what the character still requires in order to become eligible. It’s pretty slick, though hardly something to brag about at this point.

The last couple of weeks have seen me pondering over how to represent another kind of dependency; that is, the parameterized dependency, and especially a chain of parameterized dependencies, where each link in the chain has the same parameter. Consider the case of the Greater Weapon Specialization feat. You have to select a weapon that you already have the Weapon Specialization and Greater Weapon Focus feats for. Weapon Specialization and Greater Weapon Focus both require the Weapon Focus feat, and Weapon Focus requires (among other things), Weapon Proficiency in the weapon of choice. The chain from Greater Weapon Specialization to weapon proficiency requires that each link reference the same weapon; weapon proficiency with a dagger won’t make you eligible for Weapon Focus with a longsword.

It’s one thing to evaluate this chain when the parameter is known. If I want to know if a character is eligible for Greater Weapon Specialization (Longsword), then I know that they have to have the requisite feats with the longsword as well. However, sometimes I need to ask “what feats is the character eligible for right now?” In that case, I can see that the character has weapon proficiency with a particular set of weapons, which implies that the character may be eligible for Weapon Focus in any of those weapons as well. Even trickier is the case of a prestige class that simply says “must have the Greater Weapon Specialization feat”, but doesn’t require a specific weapon. In that case, when I ask whether or not the character is eligible for the prestige class, I basically have to use a variable for the feat’s parameter and then bind it, at the end, to the set of all weapons that the character might be able to use to eventually meet that requirement.
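A rough sketch of the two kinds of question, under some loud assumptions: feats are stored as [name, weapon] pairs, the PREREQS table lists only the same-weapon links from the chain above (the "other things" are left out), and the method names are made up. With a known weapon the check is a simple lookup; with an unknown weapon, the code tries every weapon the character has any feat for and keeps the ones that fit.

```ruby
# Hypothetical sketch: each parameterized feat lists prerequisites that must
# be held for the SAME weapon. Non-weapon prerequisites are omitted.
PREREQS = {
  "Weapon Focus"                  => ["Weapon Proficiency"],
  "Weapon Specialization"         => ["Weapon Focus"],
  "Greater Weapon Focus"          => ["Weapon Focus"],
  "Greater Weapon Specialization" => ["Weapon Specialization", "Greater Weapon Focus"]
}.freeze

# Known parameter: is the character eligible for `feat` with this weapon?
def eligible?(owned, feat, weapon)
  PREREQS.fetch(feat, []).all? { |needed| owned.include?([needed, weapon]) }
end

# Unknown parameter: return every weapon the feat could be taken with.
def weapons_for(owned, feat)
  candidates = owned.map { |_, weapon| weapon }.uniq
  candidates.select { |weapon| eligible?(owned, feat, weapon) }
end

owned = [
  ["Weapon Proficiency", "dagger"],
  ["Weapon Proficiency", "longsword"],
  ["Weapon Focus", "longsword"]
]

p weapons_for(owned, "Weapon Focus")          # every weapon with proficiency
p weapons_for(owned, "Weapon Specialization") # only weapons with Weapon Focus
```

This brute-force search only looks one link down the chain. Asking what the character could eventually qualify for needs real variable binding across the whole chain, which is exactly where it gets hairy.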

Ah, my head spins!

However, as I was pondering all of this, I kept getting a little ping from my university memories. Something I studied (and promptly forgot) 10 years ago was trying to tell me it was now relevant…

Enter automated theorem proving. As I began researching and remembering the hours I spent on my homework and programming assignments, the concepts of Resolution and Unification came flooding back. I actually really enjoyed that class (which is probably why I remembered anything at all about it), even though I was sure I would never ever be doing anything with that knowledge.

For about 10 years, I was right. It was useless data stored in my brain.

But suddenly, it was relevant. How? Well, what my NPC generator needs to be is a knowledge base of all the facts and relationships between the various data in the system. Generating a character is (essentially) a query against the knowledge base: “has this character met this goal?” The knowledge base then needs to come back and either say “yes” (in which case the goal is met), or “no” (in which case the response includes the actions that need to be taken to help the character achieve the goal). Revelation!

With that in mind, I finally decided it was time to learn Prolog. It’s been one of those languages on my “huh, maybe I ought to look at that someday” list, but now it actually has relevance to something I want to accomplish. Mostly, I only want to use Prolog to test my ideas, and to prototype the NPC generator. I still love Ruby and think I could make a killer DSL for this in Ruby, but we’ll see what happens.

So far, all I’ve managed to do in Prolog is hard code a bunch of assertions that define a genealogy database, along with some rules that I can use to ask things like “who are the grandparents of this person”. It’s fun, and I’m looking forward to delving further in. I’m especially excited to see how far I can apply this to my original problem domain: random generation of gaming characters with some (potentially arbitrary) set of constraints.
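For readers who haven't seen this style of program, here is roughly the same idea transliterated into Ruby (not Prolog, and not my actual database). The facts are [parent, child] pairs and the "rules" are small methods over them; all of the names are invented.

```ruby
# Facts: [parent, child] pairs. Every name here is made up for illustration.
PARENT_OF = [
  ["alice", "bob"],
  ["arthur", "bob"],
  ["bob", "carol"],
  ["beth", "carol"]
].freeze

# Rule: the parents of X are everyone with a [parent, X] fact.
def parents(person)
  PARENT_OF.select { |_, child| child == person }.map(&:first)
end

# Rule: a grandparent is a parent of a parent.
def grandparents(person)
  parents(person).flat_map { |parent| parents(parent) }.uniq
end

p grandparents("carol")
```

The difference is that a Prolog rule can be run in any direction (who are carol's grandparents? whose grandparent is alice?) without writing a separate method for each question, which is what makes it attractive for the dependency problem.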

Posted in Essays and Rants | 11 comments

27 Sep 2006

1d6 more reasons to love Ruby

Posted by Jamis on Wednesday, September 27

Like any self-respecting geek, I can claim countless hours spent poring over Dungeons & Dragons manuals. After high school I more-or-less stopped gaming (for various reasons), but I rediscovered the game shortly after the 3rd edition came out. About five years ago (coincidentally about the same time that I discovered Ruby) I actually wrote a suite of utilities to aid Dungeon Masters in creating random non-player characters, treasure hoards, towns and cities, and dungeon maps. In those days, C was my language of choice, so all of those were done “the hard way”. (But boy, was it ever fun!)

Now, looking back (and with a bit of experience with dynamic languages under my belt) I’ve been wondering how I might have done those generators differently. Most notably, the NPC generator uses all hard-coded data and rules, which makes it quite difficult to extend with new data and rules as more D&D supplements are published. (In fact, it is so hard to extend that I haven’t touched it in about 4 years.)

Well, Ruby is all dynamic and stuff, right? Domain-specific languages, etc, etc?

Right. So I began tinkering. I figured if I could come up with a DSL that would let me represent the data for the D&D game, the rest should (more or less) fall into place.

This article isn’t about that DSL, though. That particular DSL is still under construction, as I tinker on it a little bit at a time. However, one particular aspect of that DSL has matured nicely, and I wanted to share.

Just a quick aside for the uninitiated: dice in role-playing games are described both by how many dice you need to roll, and by the number of sides on each die. Thus, “2d4” means “roll two 4-sided dice”, and “4d8” means “roll four 8-sided dice”.

I think every gamer-programmer in existence has written a “dice roller” app. I’m embarrassed to admit that I’ve done it myself. They are ridiculously simple to write, and all but impossible to use at the gaming table. Face it: real, physical dice are where it’s at.

However, in my DSL, I wanted to be able to define things like hit-dice for character classes, or starting wealth, or the rules for height/weight for the different races, all of which are described in terms of dice rolls.

Originally, I tried defining hit-dice like this:

character_class :wizard do
  hit_die 4
  ...
end

Sadly, that didn’t scale well to specifying things like the starting wealth of a character, which is often defined in terms like “3d4 * 10”. My DSL wasn’t rich enough! I considered just using strings ("3d4*10") and parsing them on demand to determine the dice to roll. Yucky. Then, I considered introducing a “dice” helper method (dice(3,4)*10). Also yucky. I needed a way to represent dice, using the language that gamers use. My first attempt was as follows:

class Integer
  def d4
    sum = 0
    times { sum += (rand(4) + 1) }
    sum
  end
end

Using the above, I could say things like 3.d4 and have it return the simulated result. Slick! In my DSL, I could now do:

character_class :wizard do
  hit_die 4
  gold { 3.d4 * 10 }
  ...
end

Here, gold represents the starting wealth of a new wizard. The intention is that when a new character is created, the gold block is evaluated to determine the starting wealth.

All well and good, except there is now this inconsistency with how the hit dice are described. I could describe hit dice the same way, but then I lose the ability to keep track of what the hit die actually was—I only know how many hit points are gained at each level. (I may appear to be splitting the proverbial hair rather fine here, but trust me, to a DM, the difference matters.)

So, the second revision became something like this:

class Integer
  def d4
    Dice.new(self, 4)
  end
end

The 3.d4 invocation now returns a Dice instance (described below). This new object lets me encapsulate all kinds of nifty functionality. For instance, I can represent 3.d4 * 10 and 1.d8 + 1 and so forth, because I can override the multiplication and addition operators on the Dice class. Using that, I can unify all of my dice-references in the DSL:

character_class :wizard do
  hit_die 1.d4
  gold    3.d4 * 10
  ...
end

Then, in the code that actually generates an NPC using this data, I can do:

character.hit_points += cclass.hit_die.roll
character.hit_dice += cclass.hit_die
character.gold += cclass.gold.roll

Awesome! Things are looking good. However, ability score generation (for determining how smart, strong, wise, etc. a character is) uses some tricky rules, like “roll 4d6 and discard the lowest, summing the rest”. Well, this kind of rule becomes easy to encapsulate with the Dice class:

# the "long" way
score = 4.d6.to_a.sort.last(3).inject(0) { |n,v| n + v }

# after encapsulating the above in a "best" method
score = 4.d6.best(3)

(Note that Dice#to_a returns an array containing the result of each rolled die. It comes in very handy!)

So, conclusion: Ruby is awesome. Comparing this to the mess of dice rolling routines in my C-implemented utilities, this DSL is wonderful.

Below, you’ll find the link to the Dice implementation I’m using, but I’d encourage you to try implementing your own before you go peeking—it’s quite fun! Consider implementing the following interface:

  • Dice#*(n): create a new dice instance that represents self multiplied by some integer value.
  • Dice#+(n): create a new dice instance that represents self incremented by some integer value.
  • Dice#roll: roll the dice and return the result. Consider making it return an integer or an array, depending on an optional parameter.
  • Dice#best(n): return the sum (or array) of the best n dice after rolling
  • Dice#max: return the highest possible value the dice object could return
  • Dice#min: return the lowest possible value the dice object could return
  • Dice#average: return the average value of the dice
  • Dice#to_s: return a string that represents the dice in a nice, readable format (like “4d6+2”).

Once you’ve got your Dice object, try to monkeypatch Integer to give you the nice “4.d6” DSL I described, for each of the standard die types (d4, d6, d8, d10, d12, and d20).
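If you want something to compare your attempt against, here is one possible sketch of that interface. To be clear, this is not the dice.rb linked below; the internal representation (a multiplier and a bonus stored alongside the count and sides) is just one assumption about how to do it, and it only handles a single die type per object.

```ruby
# A hypothetical Dice sketch, written for illustration only.
class Dice
  attr_reader :count, :sides, :multiplier, :bonus

  def initialize(count, sides, multiplier = 1, bonus = 0)
    @count, @sides, @multiplier, @bonus = count, sides, multiplier, bonus
  end

  # Arithmetic returns a new Dice; nothing is rolled until #roll is called.
  def *(n)
    Dice.new(count, sides, multiplier * n, bonus * n)
  end

  def +(n)
    Dice.new(count, sides, multiplier, bonus + n)
  end

  # One random result per die, before the multiplier and bonus are applied.
  def to_a
    Array.new(count) { rand(sides) + 1 }
  end

  def roll
    to_a.sum * multiplier + bonus
  end

  # Sum of the n highest dice from a single roll.
  def best(n)
    to_a.max(n).sum * multiplier + bonus
  end

  def min
    count * multiplier + bonus
  end

  def max
    count * sides * multiplier + bonus
  end

  def average
    (min + max) / 2.0
  end

  def to_s
    text = "#{count}d#{sides}"
    text += "*#{multiplier}" unless multiplier == 1
    text += "+#{bonus}" if bonus > 0
    text += bonus.to_s if bonus < 0
    text
  end
end

# Monkeypatch Integer with one method per standard die type.
class Integer
  [4, 6, 8, 10, 12, 20].each do |sides|
    define_method("d#{sides}") { Dice.new(self, sides) }
  end
end

puts 3.d4 * 10
puts((4.d6 + 2).max)
puts 4.d6.best(3)
```

Because * and + return new Dice objects instead of numbers, an expression like 3.d4 * 10 can be stored in the DSL and rolled later, as many times as needed.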

Finally, my implementation: dice.rb

Enjoy!

Posted in Essays and Rants | 19 comments

20 Apr 2006

Writing Domain Specific Languages

Posted by Jamis on Thursday, April 20

I received an email from Erik Kastner recently, in which he asked me, “How do you get to the point where you are writing Domain Specific Languages?”

I had never really thought critically about the process of writing a DSL. It’s like, if someone were to ask you, “how do you get to the point where you are programming computers?” For me, at least, it was something I just gradually started playing with, a little at a time. I certainly don’t consider myself an expert on the topic, but what follows are some of my thoughts regarding DSL creation.

On the technical end, the trick to writing DSLs in Ruby is really knowing what you can and can’t do with Ruby’s metaprogramming features. For instance, how would you:

  • write a method that works just like attr_reader?
  • write a cattr_reader method, which works just like attr_reader, but operates at the class level instead of the instance level?
  • write a method like Array#each?
  • create a mixin like Enumerable that provides similar functionality, simply based on the existence of #each?
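If you'd like a nudge, here is one way those exercises might be approached. Everything below is a sketch with made-up names (my_attr_reader, my_cattr_reader, MyEnumerable, Recipe, Pantry). In particular, my_cattr_reader reads a class-level instance variable, which is an assumption of this sketch and not necessarily how any library's cattr_reader works.

```ruby
class Module
  # Like attr_reader: define one reader method per name.
  def my_attr_reader(*names)
    names.each do |name|
      define_method(name) { instance_variable_get("@#{name}") }
    end
  end
end

class Class
  # Class-level variant: the reader is defined on the class object itself
  # and reads a class-level instance variable.
  def my_cattr_reader(*names)
    names.each do |name|
      define_singleton_method(name) { instance_variable_get("@#{name}") }
    end
  end
end

# A mixin that needs nothing from its host except an #each method.
module MyEnumerable
  def my_map
    result = []
    each { |item| result << yield(item) }
    result
  end
end

class Recipe
  my_attr_reader :title
  my_cattr_reader :default_servings
  @default_servings = 1

  def initialize(title)
    @title = title
  end
end

class Pantry
  include MyEnumerable

  def initialize(*items)
    @items = items
  end

  # An each like Array#each: hand every item to the caller's block.
  def each
    @items.each { |item| yield item }
  end
end

puts Recipe.new("PBJ Sandwich").title
puts Recipe.default_servings
p Pantry.new("bread", "jam").my_map { |item| item.upcase }
```

Reopening Module and Class is done here only to keep the example short; in a real project you would probably put the macros in a module and extend just the classes that need them.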

The fascinating thing is that, in my experience, most well-written Ruby programs are already a DSL, just by nature of Ruby’s syntax. Symbols, blocks, and optional parentheses around parameters all go a long way toward making Ruby programs read naturally. Additionally, a well-designed application encapsulates its problem domain, which also just happens to be a good metric for determining the effectiveness of a DSL. A DSL can be thought of as (and in many cases, really is) an API for your application.

As with any interface, GUI or otherwise, mockups are critical in the design phase. How else will you know what you want to implement? I’ve found that when I want to write a DSL, it helps to mock it up. Just as I would throw together some HTML to mock up a new web application, I will throw together a simple “mock.rb” file that contains what I would like the DSL to look like. It can even be helpful to disregard the limits of Ruby syntax at first: make it look like what you would most prefer, in an ideal world, and when it is done, strip it back to fit what Ruby’s syntax actually allows. Once I’ve got something that reads well and seems to cover all the bases, I’ll convert that mockup into unit tests, and then start implementing it from there.

For example, suppose you were designing a DSL to represent meal recipes. Ideally, it might look like:

PBJ Sandwich

ingredients:
- two slices of bread
- one heaping tablespoon of peanut butter
- one teaspoon of jam

instructions:
- spread peanut butter on one side of one slice of bread
- spread jam on top of peanut butter
- place other slice of bread on top

servings: 1
prep time: 2 minutes

This is definitely not a syntax that the Ruby parser will accept. However, with a few tweaks we can get it pretty close to what we’d like, and still have it parsable by Ruby:

recipe "PBJ Sandwich"

ingredients "two slices of bread",
            "one heaping tablespoon of peanut butter",
            "one teaspoon of jam"

instructions "spread peanut butter...",
             "spread jam...",
             "place other slice..."

servings 1
prep_time "2 minutes"

From there, we would build some unit tests to make sure each of the elements of the DSL work as expected, and that they work together as we would like. However, first we need to determine what kind of DSL we are making. This decision will depend on the format of our DSL, and will impact how we do our testing. There are basically four significant approaches to DSL design:

  • Instantiation. This is the form that is seen most often in Ruby projects, and which most Rubyists probably don’t even think of as a DSL. Basically, your DSL is simply methods of an object. You interact with it by instantiating the object and calling the methods. The HTML creation DSL of Ruby’s CGI class uses this approach, as does the XML creation DSL of Jim Weirich’s Builder.
  • Class macros. You define your DSL as methods on some ancestor class, and subclasses can then use those methods to tweak the behavior of themselves and their subclasses. These kinds of macros often create new methods. Think “attr_reader” in the stdlib, or “has_many” in ActiveRecord.
  • Top-level methods. Your application basically loads a “configuration” file, which is just a Ruby script augmented with your DSL syntax. Your application defines the DSL as top-level methods, and then invokes load with the path to your DSL script. When those methods are called in the configuration file, they modify some central (typically global) data, which your application uses to determine how it should execute. Rake is an example of this kind of DSL.
  • Sandboxing. This approach is a special case of the more general instantiation technique. Your DSL is defined as methods of some object, but that object is really just a “sandbox”. Interacting with the object’s methods modifies some state in the sandbox, which is then queried by the application. Typically, this approach is used in conjunction with instance_eval and friends, so that some configuration file is loaded (or a block is given) and executed within the context of the sandbox. (This sounds similar to the top-level methods technique, with the exception that the DSL is restricted to the sandbox; there are no global methods involved.) Capistrano and Needle both use this approach.

Looking at the recipe example earlier, we don’t want to use instantiation, because that would require explicit receivers (e.g. x.recipe "PBJ..."). We don’t want class macros, because that would imply that the recipes are defined within a class. What we want is either the top-level methods approach or the sandboxing approach; the choice depends on our tolerance for adding methods to the global namespace, and on whether we can live with a global data store for the entire application.

Once we know what approach we are going to use, we would then define the unit tests based on that decision.
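As a rough illustration of the sandboxing route for the recipe DSL, consider the sketch below. The RecipeSandbox class, its evaluate method, and the hash it builds are all assumptions made for this example; the recipe text is passed in as a string, where a real application would more likely read it from a file.

```ruby
# Hypothetical sandbox for the recipe DSL. Each DSL word is an ordinary
# method that records what it was given.
class RecipeSandbox
  attr_reader :data

  def initialize
    @data = {}
  end

  def recipe(name)
    @data[:name] = name
  end

  def ingredients(*list)
    @data[:ingredients] = list
  end

  def instructions(*steps)
    @data[:instructions] = steps
  end

  def servings(count)
    @data[:servings] = count
  end

  def prep_time(text)
    @data[:prep_time] = text
  end

  # Run the DSL source inside a fresh sandbox and return what it recorded.
  def self.evaluate(source)
    sandbox = new
    sandbox.instance_eval(source)
    sandbox.data
  end
end

source = <<~RECIPE
  recipe "PBJ Sandwich"

  ingredients "two slices of bread",
              "one heaping tablespoon of peanut butter",
              "one teaspoon of jam"

  servings 1
  prep_time "2 minutes"
RECIPE

pbj = RecipeSandbox.evaluate(source)
puts "#{pbj[:name]} serves #{pbj[:servings]}"
```

Note that "sandbox" here only describes where the DSL methods live. instance_eval still runs arbitrary Ruby, so this is no protection against untrusted input.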

Regardless of the approach you use, some of the language features you can use to make your DSL come to life include:

  • symbols. These have less line-noise than strings and tend to be favored by DSL writers.
  • procs. More than anything else, these make DSLs in Ruby read and work naturally. They allow simple encapsulation of functionality (so you can write augmented branching constructs), and also let you do delayed evaluation of code.
  • modules. With modules you can easily specialize individual objects with DSL methods.
  • eval, instance_eval, and class_eval. It is definitely worth learning the difference between these three, and how they can be used. These are critical to many different dynamic techniques.
  • define_method. This lets you define new methods that can reference their closure, which you can’t do so easily using the eval methods.
  • alias_method. Rails uses this to good effect to allow modules to override behavior of the classes they are included in.
  • Module#included lets you do additional processing at the moment that a module is included in a class.
  • Class#inherited lets you keep track of who is inheriting from what.
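Here is a small sketch showing three of those features working together: Module#included, define_method, and Class#inherited. The names (Flaggable, flag, Item, Sword, registry) are invented for the example and don't come from any real library.

```ruby
module Flaggable
  # Module#included runs at the moment a class includes this module;
  # here it adds a class-level macro to that class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class macro: `flag :magical` defines an instance method `magical?`.
    def flag(*names)
      names.each do |name|
        # define_method takes a block, so the new method closes over `name`.
        define_method("#{name}?") { flags.include?(name) }
      end
    end
  end

  def flags
    @flags ||= []
  end
end

class Item
  include Flaggable
  flag :magical, :cursed

  # Class#inherited runs whenever a subclass is created.
  def self.inherited(subclass)
    super
    Item.registry << subclass
  end

  def self.registry
    @registry ||= []
  end
end

class Sword < Item; end

sword = Sword.new
sword.flags << :magical
puts sword.magical?
puts sword.cursed?
p Item.registry
```

The included hook that extends the including class is a common Ruby idiom for shipping class macros in a mixin.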

There are, of course, many more tools that a DSL writer can use, but I won’t enumerate them all here. Hopefully some of this is helpful. I keep seeing people on the mailing lists asking for “books to learn how to write DSL’s”, but I don’t think it is something a book can really help you with. It’s a different way of thinking about writing code, and as such needs to be learned by doing, not by reading. Experimentation is the key!

Posted in Essays and Rants | 16 comments

27 Jan 2006

Wait Until it Hurts

Posted by Jamis on Friday, January 27

There’s a story behind the recent release of Net::SSH 1.0.7 that I want to share, and which ties in nicely with the indoctrination that I’ve been immersed in (and finding invaluable) at work.

For some time there has been a bug in Net::SSH that caused requests to die sporadically with a “corrupted mac detected” error. People have reported this, sending me bug reports and stack traces, but I was never able to duplicate it. Because I was never able to duplicate it, and because I wasn’t being flooded with reports of it, I felt no pain. Sure, I empathized with the people reporting the bug, in a “gee, I’m really sorry about that” kind of way, but I had no motivation to dig in and find the problem.

Last week I began playing with some fun SwitchTower tasks. For instance, I wanted a way to tail the Rails logs of all of our applications at once, so I could count the number of requests per second that were being handled. Our applications are distributed across four application servers, so this seemed like a great opportunity for SwitchTower. Here’s what I came up with:

desc "Show application statistics in real-time"
task :watch_status, :roles => :app do
  count = 0
  last = Time.now
  run "tail -f /first/rails.log /second/rails.log" do |ch, stream, out|
    # puts returns nil, so "puts ... and break" would never reach the break
    if stream == :err
      puts "#{ch[:host]}: #{out}"
      break
    end

    count += 1 if out =~ / (\w+) \w+\[(\d+)\]: Completed in/

    if Time.now - last >= 1
      puts "%2d rps" % count
      count = 0
      last = Time.now
    end
  end
end

(It’s rough, but it really works—just replace the rails.log paths with the paths to your applications. Feel free to polish it off and make it more useful.)

While running this, I finally saw my first, honest-to-goodness, in-the-flesh “corrupted mac detected” error, and it happened reproducibly. Suddenly, it got personal. I felt the pain. I really wanted this feature, but it would never be practical as long as it could not be relied upon to work for extended periods.

Armed with this new motivation, as well as a way to reproduce the problem, I set out to ease the pain. It turned out to be a problem that only occurred in a multithreaded environment. The scheduler was periodically interrupting the socket read of the mac value. All that was needed was to keep retrying the read until the full length of the data was obtained. And now it’s fixed!
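The shape of the fix is simple enough to sketch. What follows is not the actual Net::SSH patch: read_fully and TrickleIO are hypothetical names, and TrickleIO exists only to simulate a socket that hands back fewer bytes than were asked for.

```ruby
require "stringio"

# Keep reading until `length` bytes have arrived or the stream ends.
def read_fully(io, length)
  data = String.new
  while data.bytesize < length
    # readpartial may return fewer bytes than requested, so loop.
    data << io.readpartial(length - data.bytesize)
  end
  data
rescue EOFError
  raise EOFError, "expected #{length} bytes, got #{data.bytesize}"
end

# Test double: never returns more than max_chunk bytes per read.
class TrickleIO
  def initialize(string, max_chunk)
    @io = StringIO.new(string)
    @max_chunk = max_chunk
  end

  def readpartial(maxlen)
    @io.readpartial([maxlen, @max_chunk].min)
  end
end

mac = read_fully(TrickleIO.new("0123456789abcdef", 3), 16)
puts mac.bytesize
```

A single read that assumes it got everything works fine until something interrupts it partway, which is why the bug only showed up sporadically.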

The lesson? Wait until you’re feeling pain before you fix something. We all have lots of things vying for our time, and no one likes being forced to make something a priority. Wait until something hurts, whether because you’re being affected personally, or because you’re being flooded by support requests. It always feels nice to make something stop hurting.

Posted in Essays and Rants | 8 comments

19 Nov 2005

Revealing hidden assumptions in estimation

Posted by Jamis on Saturday, November 19

37signals recently released an update to Backpack that allows each page to have multiple to-do lists, rather than just one. Many, many people seemed to think this should have been a trivial thing to implement, so rather than leave them with that impression, I wrote up a bit about the reality of the situation on Signal vs. Noise.

Posted in Essays and Rants | 2 comments

14 Jul 2005

Application Deployment with Rails

Posted by Jamis on Thursday, July 14

Update: as others have pointed out, this article may sound as if this approach is unique to Rails. It isn’t. I openly acknowledge that. The purpose of the article is to dispel the FUDdy claim that being a Ruby/Rails programmer somehow means you don’t know what it means to follow good deployment practices.

Picture this:

It’s late at night. You need to deploy an update to your production application ASAP. You type a (single!) command on your local development box which deploys your application to both of your application servers and restarts the fcgi processes for them. To your horror, though, you discover that the recent “fix” actually broke a few things! So, you type another (single!) command, and voila!, the update is rolled back and your app is running on the previous version again. You make the necessary corrections, type another (single!) command, and everything is beautiful.

Sound fun? Sound easy? Anyone want to take a guess at what environment we’re talking about?

Certainly not Java. Nor .NET. Heaven forbid it should be anything so “enterprisey”.

If you guessed Ruby on Rails, you’d be dead on. This is, in fact, the very way 37signals manages their application deployment.

How it works

37signals has developed (and will soon release) a “release manager” application, which they use in-house to do atomic, distributed deployment of their RoR applications. Both Basecamp and Backpack are deployed using this tool.

It allows you to define a few simple configuration items in a YAML file, things like “hosts to deploy to”, “deployment path”, and even Ruby hooks to be executed at various points during the deployment.

This is then hooked up into the rakefiles for those applications, so they can do things like:

  rake deploy
  rake rollback

A deploy simply establishes an SSH connection to each box to deploy to, uploads a deployment script to that box, and executes it. This is done atomically, as well, so if the deploy fails on one box, it is automatically rolled back on all boxes. If the deploy succeeds, the fcgi’s are restarted and the application begins running on the new version.

Managing versions

Rolling back is possible because of the way the deployment works. Every production application has the following directory structure:

  [approot]
      +--- releases
      |       +--- 1234
      |       |      +--- app
      |       |      +--- doc
      |       |      +--- cache
      |       |      +--- log --> [approot]/shared/log
      |       |      +--- public
      |       |      +--- test
      |       +--- 1245
      |       +--- 1371
      |       +--- 1511
      |       ...
      |       +--- 2713
      +--- shared
      |       +--- log
      |       ...
      +--- current --> [approot]/releases/2713

In this lovely ASCII diagram, you see the approot has two subdirectories (releases and shared) and one symbolic link (current). The releases subdirectory contains one subdirectory for each release, named for the Subversion revision number of that release. The current symlink always points to the most recent release in that directory.

The web server is then configured so that the webroot of the application is [approot]/current/public.

When a deployment occurs, the latest release is checked out of the svn repository into the releases directory (of all of the production app servers) and the current symlink is updated. If all goes well on the other deployment servers, the fcgi processes are then restarted on all servers.

This makes rolling back to the previous version a snap. You just update the symlink, delete the bad version, and restart the fcgis.
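To show how little is involved, here is a sketch of the symlink switch and the rollback on a single box. This is not the 37signals tool: activate and rollback are hypothetical helper names, the demo runs in a throwaway temp directory, it assumes a POSIX filesystem and that the bad release is the newest one, and restarting the fcgi processes is left out entirely.

```ruby
require "fileutils"
require "tmpdir"

# Point [approot]/current at the given release. The link is built under a
# temporary name and renamed into place so the switch happens in one step.
def activate(approot, release)
  target  = File.join(approot, "releases", release)
  current = File.join(approot, "current")
  staging = "#{current}.tmp"
  FileUtils.rm_f(staging)
  File.symlink(target, staging)
  File.rename(staging, current)
end

# Re-point current at the previous release and delete the newest (bad) one.
def rollback(approot)
  releases = Dir.children(File.join(approot, "releases")).sort_by(&:to_i)
  bad = releases.pop
  raise "no earlier release to roll back to" if releases.empty?
  activate(approot, releases.last)
  FileUtils.rm_rf(File.join(approot, "releases", bad))
  releases.last
end

Dir.mktmpdir do |approot|
  %w[1511 2713].each do |rev|
    FileUtils.mkdir_p(File.join(approot, "releases", rev, "public"))
  end
  activate(approot, "2713")
  rollback(approot)
  puts File.readlink(File.join(approot, "current"))
end
```

Because the web server only ever looks at [approot]/current/public, nothing else has to change when the link moves.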

Conclusion

As you can see, there need be nothing haphazard about application deployment in RoR. To be honest, I’ve used Java war files and ear files (a lot) and hated them. They weren’t for me. I find the kind of agile deployment described in this article much more powerful, and simpler.

And, hey, it’s all written in Ruby. What’s not to like about that?

Posted in Essays and Rants | 6 comments