bdunagan
fill the void

Rack Tip: ActiveRecord with Logging

Using ActiveRecord in Rack is one line of Ruby code, and setting up database logging outside of Rails is just as easy. As a minor extension of this excellent tidbit, you simply instantiate the Logger object with a file path. Here’s the code:

require 'active_record'
require 'active_support'
ActiveRecord::Base.logger = Logger.new("log/db.log")
# ActiveRecord::Base.establish_connection(...)
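
For a quick look at what ends up in that file, here is a minimal sketch using only Ruby's standard Logger. The temporary path and the formatter are assumptions made for illustration; ActiveRecord writes its SQL statements at the debug level, so the logger's level has to allow debug output for them to appear.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical location for this sketch; the post itself uses "log/db.log".
log_path = File.join(Dir.mktmpdir, "db.log")

logger = Logger.new(log_path)
# SQL statements are logged at debug level, so keep the level at DEBUG.
logger.level = Logger::DEBUG
# Optional: a compact one-line format (an illustrative choice, not a requirement).
logger.formatter = proc do |severity, time, _progname, msg|
  "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{severity} #{msg}\n"
end

# Stand-in for a statement that ActiveRecord would log.
logger.debug("SELECT 1")

# With ActiveRecord loaded, the same object can be assigned as in the post:
# ActiveRecord::Base.logger = logger
puts File.read(log_path)
```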

Rails Tip: Locking Cron Jobs

I really like using the Ruby whenever gem for managing cron jobs in Rails. But recently I needed to ensure there was only one instance of a cron job running. If it took a bit too long, I didn’t want another instance starting up in parallel. Enter flock.

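task :do_stuff_for_a_while => :environment do
  # Ensure there is only one instance running. Keep a reference to the file
  # so the lock is held until the task exits.
  lock_file = File.open(__FILE__)
  # Use `next` rather than `return` to leave a task block early.
  next unless lock_file.flock(File::LOCK_EX | File::LOCK_NB)

  # Do some stuff.
end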
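
The same idea can be wrapped in a small helper. This is only a sketch: with_exclusive_lock and the lock file path are hypothetical names, and it assumes a platform where flock is backed by the flock(2) system call, so that a second handle on the same file cannot take the lock while the first one holds it.

```ruby
require 'tmpdir'

# Hypothetical helper: runs the block only if the exclusive lock is free.
# Returns true if the block ran, false if another handle holds the lock.
def with_exclusive_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |file|
    # With LOCK_NB, flock returns false instead of waiting for the lock.
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end
  # Closing the file (done by File.open's block form) releases the lock.
end

# Usage with a hypothetical lock file.
lock_path = File.join(Dir.mktmpdir, "do_stuff.lock")
ran = with_exclusive_lock(lock_path) { puts "doing some stuff" }
puts "skipped, another instance is running" unless ran
```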

Converting MySQL from LATIN1 to UTF8

MySQL defaults to “latin1” as its character set, but at some point, most people want to migrate to “utf8”. I realize that there are dozens of posts about how people handled this, and yet, not a single one of those worked completely for me.

I wanted MySQL to use “utf8” for the character set and “utf8_unicode_ci” (not “utf8_general_ci”) for the collation, and I wanted it all to work on RDS with ActiveRecord on Rails and Rack.

Here are my steps:

  1. use mysqldump to extract the old data as “latin1”
  2. use sed to replace “latin1” with “utf8” in the dump file
  3. create the new database with the right parameters: character set utf8 collate utf8_unicode_ci
  4. use mysql --default-character-set=utf8 to pipe the converted dump into the new database

Here is my code:

# Dump the old database as latin1, because ironically, mysqldump defaults to utf8.
mysqldump --default-character-set=latin1 db > db.dump

# If you need to convert a MySQL dump from one character set to another, use iconv.
# Write to a temporary file first: redirecting onto the input file would truncate it before iconv reads it.
iconv -f LATIN1 -t UTF-8 < db.dump > db.dump.tmp && mv db.dump.tmp db.dump

# If you've been running mysqldump without parameters on a latin1 instance, you can convert the dump from UTF8 to latin1 to correct it.
iconv -f UTF-8 -t LATIN1 < db.dump > db.dump.tmp && mv db.dump.tmp db.dump

# Rewrite the dump to say 'utf8' and 'utf8_unicode_ci' in all the right places.
sed -e 's/SET NAMES latin1/SET NAMES utf8/g' -i db.dump
sed -e 's/CHARSET=latin1/CHARSET=utf8 COLLATE=utf8_unicode_ci/g' -i db.dump

# Create a new database with the correct parameters.
create database db character set utf8 collate utf8_unicode_ci;
# Verify it.
show create database db;

# Pipe the converted database dump into MySQL.
mysql -h hostname --default-character-set=utf8 -u root -p db < db.dump
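
To see why the two iconv directions above matter, here is a small Ruby illustration of the byte-level difference. It is a sketch built on an example string of my own choosing, not part of the migration itself: it shows a latin1 conversion, the garbled text you get when UTF-8 bytes are treated as latin1 and converted a second time, and how converting back to latin1 undoes that.

```ruby
# Example string (an assumption for illustration): "café" with U+00E9.
original = "caf\u00E9"

# latin1 stores the accented letter as the single byte 0xE9.
latin1 = original.encode("ISO-8859-1")
puts latin1.bytes.map { |b| format("%02X", b) }.join(" ")

# UTF-8 stores the same letter as the two bytes 0xC3 0xA9.
puts original.bytes.map { |b| format("%02X", b) }.join(" ")

# Double encoding: UTF-8 bytes mislabelled as latin1, then converted again.
double_encoded = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts double_encoded

# The repair mirrors "iconv -f UTF-8 -t LATIN1": convert back to latin1,
# then read the resulting bytes as the UTF-8 they really are.
repaired = double_encoded.encode("ISO-8859-1").force_encoding("UTF-8")
puts repaired == original
```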

To verify the character set and collation, you can always query the MySQL variables:

show variables like 'collation%';
show variables like 'character%';

Amazon RDS

For those using Amazon’s AWS RDS for their MySQL instance, you have to create a parameter group with “utf8” values. I’d guess you could just modify the current parameter group then apply it, but I haven’t verified that.

# Create a parameter group.
rds-create-db-parameter-group utf8 -e mysql5.1 -d utf8

# Modify the parameter group's values
rds-modify-db-parameter-group utf8 \
    --parameters="name=character_set_server, value=utf8, method=immediate" \
    --parameters="name=character_set_client, value=utf8, method=immediate" \
    --parameters="name=character_set_results,value=utf8,method=immediate" \
    --parameters="name=collation_server, value=utf8_unicode_ci, method=immediate" \
    --parameters="name=collation_connection, value=utf8_unicode_ci, method=immediate"

# Check the parameter group's values.
rds-describe-db-parameters utf8 --source=User

# Push this new parameter group to your instance.
rds-modify-db-instance rds-db --db-parameter-group-name utf8

# Reboot the instance (necessary: http://aws.amazon.com/articles/Amazon-RDS/2935).
rds-reboot-db-instance rds-db

ActiveRecord in Rails and Rack

ActiveRecord supports an :encoding option in its parameters for ActiveRecord::Base.establish_connection. The option tells the connection to execute SET NAMES as soon as the connection is established, thereby telling the server which character set the client wants.

However, I also wanted to specify the collation. When I added :encoding to ActiveRecord::Base.establish_connection, collation_connection (from the MySQL variables, not connection.collation) remained as “utf8_general_ci”. Some people have indicated that you can specify :collation in database.yml for the establish_connection call, but that never worked for me. I think MySQL bug #34980 prevented it. Others indicated that you can simply add ActiveRecord::Base.connection.execute("set collation_connection='utf8_unicode_ci'") at the bottom of environment.rb for Rails; that also never worked for me. To specify the collation in Rails, I used a before_filter in application_controller.rb. See my code below.

# Rack: add a statement right after establishing the connection.
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :host     => "host",
  :username => "username",
  :password => "password",
  :database => "database",
  :encoding => "utf8",
  :reconnect => true
)
ActiveRecord::Base.connection.execute("SET collation_connection='utf8_unicode_ci'")

# Rails: add a before_filter in application_controller.rb
before_filter :set_database_collation
def set_database_collation
  ActiveRecord::Base.connection.execute("set collation_connection='utf8_unicode_ci'")
end
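
Since the Rack and Rails snippets issue the same statement, it could live in one small helper. The sketch below is only an illustration: set_connection_collation, COLLATION_NAME, and RecordingConnection are hypothetical names, not part of ActiveRecord. The helper accepts any object that responds to execute, which is the only method the snippets above rely on, and the stand-in connection lets it run without a database.

```ruby
# Hypothetical pattern: allow only plain collation names before interpolating.
COLLATION_NAME = /\A[a-z0-9_]+\z/

# Hypothetical helper: sends the SET statement through the given connection.
def set_connection_collation(connection, collation = "utf8_unicode_ci")
  unless collation =~ COLLATION_NAME
    raise ArgumentError, "unexpected collation name: #{collation.inspect}"
  end
  connection.execute("SET collation_connection = '#{collation}'")
end

# Stand-in for a database connection; it only records the SQL it receives.
class RecordingConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

connection = RecordingConnection.new
set_connection_collation(connection)
puts connection.statements.last
```

With a real connection, the call would be set_connection_collation(ActiveRecord::Base.connection), placed wherever the snippets above run their execute call.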

Pointers

I pieced my steps together from the following helpful links:


Sanitizing POST params in Rack

Rack is a handy way to get Ruby up and running on a web server, but it’s picky about input. Recently, I tried to post a URL containing an ampersand (&) to a Rack instance, and Rack parsed the data incorrectly because it treats every ampersand as a separator between parameters.

When I can control the input, I can simply use percent encoding to escape the ampersand (%26). But for dealing with malformed input, Rack needs to rewrite the POST data before processing it.

# Escape the ampersand in the POST data.
rack_input = env["rack.input"].read
rack_input = rack_input.gsub("&","%26")
params = Rack::Utils.parse_query(rack_input, "&")
params["post_data"] = Rack::Utils.unescape(params["post_data"])
env["rack.input"] = StringIO.new(Rack::Utils.build_query(params))
# Parse the request.
req = Rack::Request.new(env)

Thanks to Pivotal Labs for the crucial bits of code.
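
To illustrate the parsing problem and the percent-encoding fix described above, here is a minimal sketch. It uses Ruby's standard URI module rather than Rack itself, on the assumption that both split form data at ampersands in the same way; the URL and the post_data parameter value are hypothetical.

```ruby
require 'uri'

# Hypothetical URL being posted as the value of "post_data".
url = "http://example.com/?a=1&b=2"

# Unescaped: the parser splits the body at the ampersand inside the URL,
# so part of the URL ends up as a separate parameter.
raw_body = "post_data=#{url}"
broken = URI.decode_www_form(raw_body)

# Percent-encoded: the ampersand is sent as %26 and the value stays whole.
escaped_body = "post_data=#{URI.encode_www_form_component(url)}"
intact = URI.decode_www_form(escaped_body)

p broken
p intact
```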


Compiling PowerPC with LLVM GCC 4.2 in Xcode 4.1 on 10.7 Lion

There have been a number of excellent articles describing how to restore various older technologies to Xcode 4 (10.4 SDK, 10.5 SDK, GCC 4.0, PPC). Thanks to mecki for doing such a good job detailing the steps. The only trouble is those steps didn’t work for me in one case: PowerPC support. Luckily, it’s easy to restore.

Here is the short C code I compiled to test the process:

#include <stdio.h>
int main(void) { printf("Hello world\n"); return 0; }

I compiled the C code with LLVM GCC 4.2 (Xcode 4’s default compiler), specifying the PowerPC architecture with the 10.6 SDK. Then I used lipo to check the binary’s architecture.

# Compile with PPC architecture.
/Developer/usr/bin/llvm-gcc-4.2 -arch ppc -isysroot /Developer/SDKs/MacOSX10.6.sdk main.c -o Hello
# Check the binary's architecture.
lipo -info Hello
# Should see: "Non-fat file: Hello is architecture: ppc7400"

I installed Xcode 4.1 on my 10.7 system and then copied files from Xcode 3.2.5 on my 10.6 system into /Xcode3 on my 10.7 system. Here are the files I included:

/Xcode3/usr/bin/as (51 KB)
/Xcode3/usr/libexec/gcc/darwin/ppc/as (466 KB)
/Xcode3/usr/llvm-gcc-4.2/* (130 MB)

Here are the shell commands that I used to finally get LLVM GCC 4.2 to compile with PowerPC:

cd /Developer/usr/llvm-gcc-4.2/bin
sudo ln -s /Xcode3/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin10-llvm-gcc-4.2 powerpc-apple-darwin11-llvm-gcc-4.2

One last hurdle was Xcode 4’s build settings. I had to specify “ppc” not only in “Valid Architectures” but also in “Architectures”, along with $(ARCHS_STANDARD_32_64_BIT).