
Scaling and Tools Diversity: Google vs. Facebook

Steve Yegge in one of his posts talks about Google’s policy of standardizing on the use of only a few programming languages. He mentions how he, as a guy interested in different languages, was at first annoyed by that fact, but later came to realize it was the only sensible way of building systems as scalable as theirs have to be.

In what seems to contradict that view, in Facebook’s recent discussion of the decisions they had to make when designing Facebook Chat, they mention choosing Erlang because, well, it was made to do distributed, realtime systems with… message passing. Can’t get a better fit for a Chat project than that.

To make that code interface with their existing codebase, they used Thrift, their free software “framework for scalable cross-language services development”, whose white paper begins with the strong remark:

“In our implementation of these services [Facebook], various programming languages have been selected to optimize for the right combination of performance, ease and speed of development, availability of existing libraries, etc. By and large, Facebook’s engineering culture has tended towards choosing the best tools and implementations available over standardizing on any one programming language and begrudgingly accepting its inherent limitations.”

Now, of course Google and Facebook have very different needs (crawling, storing, rating and indexing the whole web on a regular basis must be a bit more challenging than showing user profiles, even if it’s showing many of them, many many times a day), and playing it safe has surely worked out well for Google so far. But sometimes that requires reimplementing Ruby on Rails in JavaScript just to suit a company requirement.

It’s good to see there is a place in the monster-traffic world for programming language enthusiasts :)

Making Gmail always use HTTPS without any Greasemonkey

It’s the stupidest solution, but I hadn’t thought of it before.

I had seen that the Better Gmail extension had this feature of making Gmail always run over HTTPS, but it also came with so much other stuff that I found it to be just too bloated, and kept wishing for a simpler solution (meanwhile exposing my privacy to all those sniffers out there — oh the danger!).

Of course, I could always replace “http” with “https” on the address bar, but it’s a pain doing that every time. If only I could set it to be always like that…

Wait a second: I always start Gmail as my home page. Always use Alt+Home when I want to go to it. Yes, people, I had this brilliant idea: why not just put the address with “HTTPS://” in the configs as my Home page?

And that I did. Now I’m a safe, happy Gmail user. No extensions, no glitchy Greasemonkey scripts. Just the ululating obvious.

When testing isn’t worth the price

I’m just starting to use RSpec for Rails instead of Test::Unit, and with it comes a little novelty: there are separate Controller and View tests (unlike Test::Unit’s functional tests). At first I thought “hm… cool”. But after spending the first few hours writing tests for views I started to feel very stupid, and the whole thing felt very awkward and unnecessary.

Views are too unstructured and change too often for it to be worth keeping it all tested, and most of the time you’re not testing Ruby code, but HTML, and I don’t think that’s what tests are for. If your controllers are well tested, views should do OK.

As convinced as I am about this, I was feeling a little guilty to just ditch testing like that, so I searched for some supporting opinion and found this post. It agrees with me, so it must be right :) The comments are also interesting. The main idea is that you should just test if the views render without errors and get on with life. Now I just have to find out how to test that little thing.
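For what it’s worth, the “just check that it renders” idea can be sketched without any framework at all. The snippet below is only an illustration using plain ERB from Ruby’s standard library, not RSpec’s or Rails’ own API: the helper name `renders_without_errors?` is made up, Rails view helpers are not available in it, and the instance variables are stubbed by hand.

```ruby
require 'erb'

# Hypothetical helper: renders an ERB template string with the given
# values exposed as instance variables, and reports whether rendering
# finished without raising.
def renders_without_errors?(template_source, locals = {})
  context = Object.new
  locals.each { |name, value| context.instance_variable_set("@#{name}", value) }
  ERB.new(template_source).result(context.instance_eval { binding })
  true
rescue StandardError, SyntaxError
  false
end

good = "<h1><%= @title.upcase %></h1>"
bad  = "<h1><%= @title.upcase %></h1><%= @missing.name %>"

puts renders_without_errors?(good, title: "hello") # prints true
puts renders_without_errors?(bad,  title: "hello") # prints false
```

The point is only that the check is “did it blow up?”, not “is the HTML exactly this”, which is what keeps it cheap to maintain.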

Yak Shaving: optimizing brain usage for code snippets

So you checked out how TextMate has all those wonderful snippets for every possible piece of code you could think of (and how now Emacs also does!), but you’re having second thoughts on whether it pays off to memorize all those little abbreviations and their meanings?

Not anymore!

With the cute one-liner below, you can fire up irb on your snippets directory and see instantly the TOP 5 winners in characters-saved / characters-typed !! Pretty neat huh? ;)

Here’s an example:

$ cd ~/.emacs.d/yasnippet/snippets/text-mode/ruby-mode
$ irb
irb(main):019:0> Dir['*'].map {|f| [f,File.read(f).reject{|s| s =~ /^#/}.join.size.to_f/f.size]}.sort {|a,b| b.last <=> a.last}.first(5).each {|f| puts "####" + f.first,File.read(f.first), "\n" *2}
####w
#name : attr_writer ...
# --
attr_writer :${attr_names}

####r
#name : attr_reader ...
# --
attr_reader :${attr_names}

####mm
#name : def method_missing ... end
# --
def method_missing(method, *args)
$0
end

####am
#name : alias_method new, old
# --
alias_method :${new_name}, :${old_name}

####bm
#name : Benchmark.bmbm(...) do ... end
# --
Benchmark.bmbm(${1:10}) do |x|
$0
end

=> [["w", 26.0], ["r", 26.0], ["mm", 21.0], ["am", 19.5], ["bm", 19.5]]

It would probably be nice also to consider how much typing it saves you by mirroring variable names and stuff… And also how frequent that particular construct actually is in your code… Come to think of it, this one-liner is pretty useless, but at least picking an arbitrary 5 or 6 snippets to add to your dynamic cheatsheet is better than trying to randomly memorize them.

In case you didn’t catch it, here it is in full color (and we discover WordPress doesn’t use a full parser for syntax highlighting, what a shame :P):

Dir['*'].map {|f| [f,File.read(f).reject{|s| s =~ /^#/}.join.size.to_f/f.size]}.sort {|a,b| b.last <=> a.last}.first(5).each {|f| puts "####" + f.first,File.read(f.first), "\n" *2}
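One caveat: calling `reject` directly on the result of `File.read` only works on Ruby 1.8, where String was still Enumerable. Here’s a more readable sketch of the same ranking that should also run on newer Rubies. The method name `snippet_savings` is made up for illustration, and it keeps the same assumptions as the one-liner: the file name is the abbreviation, and lines starting with `#` are metadata.

```ruby
# Ranks snippet files by characters produced per character typed.
# Assumes the file name is the abbreviation you type and that lines
# starting with '#' are snippet metadata, not expanded text.
def snippet_savings(dir, top = 5)
  files = Dir.glob(File.join(dir, '*')).select { |path| File.file?(path) }
  ranked = files.map do |path|
    abbreviation = File.basename(path)
    body = File.readlines(path).reject { |line| line.start_with?('#') }.join
    [abbreviation, body.size.to_f / abbreviation.size]
  end
  # Highest ratio first, then keep only the top entries
  ranked.sort_by { |_, ratio| -ratio }.first(top)
end

# Usage (path taken from the session above):
# snippet_savings(File.expand_path('~/.emacs.d/yasnippet/snippets/text-mode/ruby-mode'))
```

It returns the same kind of `[name, ratio]` pairs that irb echoed at the end of the session, just without printing the snippet bodies.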

Rails vs SCM: resolving conflicts between local and upstream Migrations

If you’re working on a local branch of a Rails project for long enough, you’re bound to run into this irritating problem: you create a new migration, it gets the smallest unique number from the ones you got from upstream, BUT, before you get the chance to commit it, someone does it first, and in your next update (svn up || git pull) you have that tangled migration mess.
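To make the collision concrete, here’s a tiny sketch with made-up file names. The helper `colliding_migrations` is hypothetical and only shows how two migrations end up sharing a number; it doesn’t touch the database or the file system.

```ruby
# Groups migration file names by their numeric prefix and keeps only
# the prefixes that are used by more than one file.
def colliding_migrations(file_names)
  file_names.group_by { |name| name[/\d+/] }
            .select { |_, names| names.size > 1 }
end

# Made-up example: 002 exists twice, once local and once from upstream
files = %w[
  001_create_users.rb
  002_add_email_to_users.rb
  002_create_posts.rb
  003_add_index_to_posts.rb
]

p colliding_migrations(files)
```

Here only the `002` prefix is reported, with both files listed under it, and there’s no way to tell from the names alone which of the two is yours.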

This little rake task might help you out. Warning: it assumes that all your local migrations have already been run, and that *none* of the new migrations from upstream have been run.

The code is definitely not very DRY and doesn’t take much advantage of Rake (I’m pretty n00b on Rake), so I accept suggestions/patches :)

To use it, just throw it in your lib/tasks folder and call it using “rake db:migrate:fast_forward”.

The next (and easy) step is making it receive an SCM parameter (git/svn) so it’ll use the proper “mv” command.


require 'pp' # Kernel#pp is used below; older Rubies don't load it by default

namespace :db do
  namespace :migrate do
    desc <<STR
Resolves conflicts between local and upstream migrations.

This task assumes the following scenario:
During your local development, you've created migrations and run rake db:migrate;
Then, you updated from upstream (svn update || git svn rebase), and ended up with
pairs of migrations with the same number: one is the local you created, and the
other is the one from upstream that someone committed before you.

Besides that, there might be some other non-overlapping migrations *after* the
overlapping zone that are *also* local (you had more local migrations than new ones
that came from upstream on the update).

This task takes *all your local migrations*, **reverts them** (in reverse order),
and moves them (in order) to the end of the line.

After that you can run rake db:migrate again and it'll run first the migrations
from upstream, and yours last.
STR

    task :fast_forward => :environment do
      migrator = ActiveRecord::Migrator.new(:down, 'db/migrate')
      puts "Looking for migrations with repeated numbers"
      all_migrations = Dir['db/migrate/*'].sort
      pairs = all_migrations.group_by{|migration| migration =~ /(\d+)/; $1}.
        select {|number, migrations| 1 < migrations.size && migrations.size < 3}
      pairs = pairs.sort {|x, y| x[0] <=> y[0]}

      # Pick the range of (local) migrations that will be slid to the end
      migrations_to_move = []
      # First the ones that overlap (disambiguated by user)
      pairs.map{|pair| pair[1]}.each do |mig1, mig2|
        begin
          puts "\n[1]\t#{mig1}"
          puts "[2]\t#{mig2}"
          puts "\nWhich one is part of the range to be slid to the end of the list?"
          option = STDIN.gets.to_i
        end until option == 1 || option == 2

        migrations_to_move << (option == 1 ? mig1 : mig2)

      end
      # Then the (local) ones past the overlap zone
      unless pairs.empty?
        idx_last_overlapping_migration = all_migrations.index(pairs.last[1].last)
        migrations_to_move += all_migrations[idx_last_overlapping_migration+1..-1]
        # Assumes all (and only) the local ones past the overlap zone have already been run
        migrations_to_move.reject! { |m| m =~ /(\d+)/; $1.to_i > migrator.current_version }
      end

      migrations_to_move.first =~ /(\d+)/
      schema_version = $1.to_i # set_schema_version subtracts one

      # Slide the selected range to the end of the list
      upstream_migrations = all_migrations - migrations_to_move
      upstream_migrations.last =~ /(\d+)/
      next_number = $1.to_i + 1

      new_names = migrations_to_move.map { |migration|
        migration =~ /(\d+)(.*)/
        name_migration_to_move = $2

        new_name = 'db/migrate/' + ("%03d" % next_number) + name_migration_to_move
        next_number += 1
        new_name
      }

      # Confirm and execute
      unless migrations_to_move.empty?
        pp "Latest upstream migrations", upstream_migrations.last(5)
        pp "These are your local migrations: ", migrations_to_move
        pp "They will be reverted and renamed to: ", new_names
        puts "And the new schema version will be: #{schema_version-1}"

        begin
          puts "\nShould I proceed? [Y/n] "
          option = STDIN.gets.strip.downcase
        end until option == 'y' || option == 'n'

        if option == 'y'
          # Revert
          migrations_to_move.reverse.each do |migration|
            require migration
            migration_class = migrator.send(:migration_class, *(migrator.send(:migration_version_and_name, migration).reverse))
            migration_class.down
          end
          migrator.send(:set_schema_version, schema_version)
          # Move to end of line
          migrations_to_move.zip(new_names) do |old_name, new_name|
            File.rename old_name, new_name
          end
        end
      else
        puts "No overlapping migrations. You can safely run rake db:migrate."
      end
    end
  end
end
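The SCM parameter mentioned before the listing could look roughly like the sketch below. This is not part of the task as posted: the helper name `rename_command` and the `SCM` variable are assumptions for illustration. It only builds the command, so the rename gets recorded by git or svn instead of being a bare `File.rename`.

```ruby
# Hypothetical helper: returns the rename command (as an argument list)
# for the given SCM, so the move is tracked by version control.
def rename_command(scm, old_name, new_name)
  case scm.to_s
  when 'git' then ['git', 'mv', old_name, new_name]
  when 'svn' then ['svn', 'mv', old_name, new_name]
  else raise ArgumentError, "unknown SCM: #{scm.inspect}"
  end
end

# Inside the task, the File.rename call could then become something like:
#   system(*rename_command(ENV['SCM'] || 'svn', old_name, new_name))
```

Rake exposes `NAME=value` arguments through `ENV`, so under that assumption the call would be “rake db:migrate:fast_forward SCM=git”.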

Update: Just after writing this, a friend told me about the Git Migration Buddy. It is git specific and seems to handle multiple branches better. Mine is kinda 1-n (main (svn in my case) repo syncing with multiple local branches). There’s the enhanced_migrations plugin that supposedly stops the problem at the root, having timestamps instead of increasing numbers for migrations. Zach in the comments also mentions a great solution he’s coming up with: a post-checkout hook to change database.yml and have a different db for each branch (dunno if it works too well with big dbs, but it’s a great idea nonetheless).
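To see why timestamps sidestep the problem, here’s a minimal sketch of timestamp-prefixed names. The helper `timestamped_migration_name` is made up, and the `YYYYMMDDHHMMSS` UTC format is an assumption for illustration; I haven’t checked which exact format enhanced_migrations uses.

```ruby
# Hypothetical helper: builds a migration file name prefixed with a UTC
# timestamp instead of a sequential number.
def timestamped_migration_name(description, now = Time.now.utc)
  "#{now.strftime('%Y%m%d%H%M%S')}_#{description}.rb"
end

puts timestamped_migration_name('add_email_to_users', Time.utc(2008, 5, 17, 14, 30, 5))
# prints 20080517143005_add_email_to_users.rb
```

Two people would have to create a migration in the very same second to collide, which is a lot less likely than both of them grabbing the next free number.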

Fast-forwarding through screencasts without ant voice

One thing that always stopped me from using screencasts as a viable learning tool is that they usually take too long. And most of the time it’s the guy moving around, or saying “uh”, “er…”, or doing stuff I already know. And when I tried skipping a few seconds ahead I usually skipped the very few meaty bits and had to go back and listen to them again. So what I did sometimes was increasing the playback speed (by pressing the ]-key on mplayer), but that made the guy speak as if breathing helium.

Not anymore!

MPlayer (from svn) has a fantastic new audio filter called scaletempo. It basically lets you change the playback speed without changing the sound pitch. The guy speaks faster, but in the same tone. Isn’t that amazing?! So, here’s how to do it (you really need the svn version as of now; 1.0rc2 won’t do it) on Ubuntu:

First, we need to install the dependencies for compiling the new package (without installing the package itself):

sudo apt-get build-dep mplayer

Now the usual checkout, compile and install:

cd /tmp
svn checkout svn://svn.mplayerhq.hu/mplayer/trunk mplayer
cd mplayer
./configure
make
sudo make install

And that’s it! Now you just open your videos like this:

mplayer screencast.ogm -af scaletempo

and use the keys [ and ] to adjust playback speed at will.

Edit: from my experience, you can speed speech up to 1.5x-1.75x without losing quality. That means you can watch a 1-hour video in 34-40 minutes!
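The arithmetic behind that estimate is just the duration divided by the speed factor. A quick sketch (the helper name `watch_minutes` is made up):

```ruby
# Returns how many minutes it takes to watch a video of the given
# length when played back at the given speed factor.
def watch_minutes(duration_minutes, speed)
  duration_minutes / speed.to_f
end

[1.5, 1.75].each do |speed|
  puts format('%.2fx -> %.1f min', speed, watch_minutes(60, speed))
end
# prints:
# 1.50x -> 40.0 min
# 1.75x -> 34.3 min
```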

FISL 8.0

I’m here at the 8th International Free Software Forum (FISL in pt-BR). It’s the last day and it’s been a great experience so far. I’ll make a few highlights so I can archive what’s going on in my head right now for future reference.

Most importantly, it was my first time here or in any community event, and getting in contact with a lot of the culture and the people has left me really enthusiastic about all things Free Software. I met a few big shots I had never heard of, met some of the small people who are very passionate and do a lot of work, and also lots of people like me who do nothing but wish they’d do a lot more (or anything at all).

The big shots get you hyped up and the small workers give you the self-assurance that you too can actually contribute.

Most people find it important to be part of a group, and this is a very good group to be part of. It’s active fun, it helps people, and it might help you professionally also =)

Well, here’s a short summary of the stuff I’ve been to:

1st day (12/04):

- LiVES is a Video Editing System (Gabriel Finch). Seems like I might have found an alternative for the completely unusable Cinelerra. Demo was quite nice. Have to try that out.

- Liberating Java – The Story So Far (Simon Phipps). This guy is a good speaker. Explained the whole story, seems like things are going well. There was even a video with Stallman giving a thumbs up for Sun’s initiative. Too bad that’s not true in the hardware front as well (more on that later).

- X.org: Projects and People (Keith Packard). Keith left no doubt as to the fact that all X.org developers are big drunkards. He proved it with pictures. He also showed some neat stuff they’ve been working on, and raised awareness about the pains they have to go through to get X to work on modern video cards from stupidass manufacturers who don’t release documentation for their hardware (like nVidia and ATi). Intel, on the other hand, not only releases such documentation, but is also actively helping in the development of X.org (Keith works for them). That seems smart, since they not only get respect from the community but have also become the first testing platform for new features in X: those features work on Intel’s cards first.

- KDE: State of the Union (Aaron Seigo). This was a great talk on the new features of KDE 4.0 and all the cool backend stuff people have been working on. Aaron is a really captivating guy who even got me to start considering learning C++. On the second day he gave a more developer-oriented talk on how nice and cool it is to develop applications for KDE 4.0 and that was also very interesting. After that we sat around at the LinuxChix booth and he gave a few more demos and discussed community topics with me and some other guys. After I’m through with the Google SoC (oh, yes, I was accepted! :-D but that’s for another post), I’m going to really start looking into KDE development. Seems like they have some pretty nice Ruby bindings ;-)

Well, I have a talk to attend now, so this basically sums it up for day one. More to come soon.