Wednesday, 21 October 2009

Creating a background task for Rails

First you will need somewhere to store your settings. Create a file called example.yml in the config directory and a file called example.rb in the config/initializers directory. A minimal example.yml file looks like this:

development:
  pid_file: /var/run/example.pid
  sleep_for: 10

# Repeat the same settings for production and test

The example.rb initializer should look like this:

EXAMPLE = YAML.load_file("#{RAILS_ROOT}/config/example.yml")[RAILS_ENV]
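To see what that one-liner produces outside Rails, here is a sketch in plain Ruby. It assumes the YAML is held in a string instead of being read from config/example.yml, and load_settings is a hypothetical helper. Indexing the parsed file by environment name returns just that environment's section; the helper also raises when the section is missing, where the one-liner above would quietly set EXAMPLE to nil.

```ruby
require 'yaml'

# Hypothetical inline copy of config/example.yml (values from the post)
EXAMPLE_YML = <<~YML
  development:
    pid_file: /var/run/example.pid
    sleep_for: 10
  production:
    pid_file: /var/run/example.pid
    sleep_for: 10
YML

# Return the settings hash for one environment, failing loudly if the
# file has no section for it (hypothetical helper, not part of Rails)
def load_settings(yaml_text, env)
  settings = YAML.load(yaml_text)[env]
  raise KeyError, "no settings for #{env.inspect}" unless settings.is_a?(Hash)
  settings
end

settings = load_settings(EXAMPLE_YML, 'production')
puts settings['sleep_for'] # prints 10
```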

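You can now access all your settings from the EXAMPLE hash, which, being a constant, is visible throughout the application. Next we need the script that does the work. Its skeleton lives in the script directory (as script/example.rb, the path the init script below runs) and reuses the set_pid code from a previous post: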
def set_pid(filename)
  # Code from the previous post
end

# Log to log/example.log, rotating the file daily
logger = Logger.new("#{RAILS_ROOT}/log/example.log", 'daily')
logger.formatter = Logger::Formatter.new

# Refuse to start if another copy already holds the pid file
if set_pid(EXAMPLE['pid_file'])
  logger.info "Example started, polling every #{EXAMPLE['sleep_for']} seconds"
else
  logger.error "Example is already running"
  exit
end

$continue = true

# The init script's stop sends SIGTERM: finish the current pass, then exit the loop
trap(:SIGTERM) { $continue = false }

while $continue
  # Do your stuff here

  sleep EXAMPLE['sleep_for']
end

logger.info "Shutting down"
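A note on the loop above: because $continue is only checked between passes, a TERM from the init script can take up to a full sleep_for interval (plus the current pass) to take effect. If that matters, one option is to sleep in short slices and re-check the flag. The sketch below does not use Rails; run_worker and its step parameter are hypothetical names, a local variable replaces $continue, and the block stands in for "Do your stuff here".

```ruby
# Sketch only: run_worker and step are hypothetical names.
# Runs the block every sleep_for seconds until SIGTERM arrives and
# returns how many passes were made.
def run_worker(sleep_for, step = 0.5)
  running = true
  trap('TERM') { running = false } # same idea as the $continue flag
  passes = 0

  while running
    yield # "Do your stuff here"
    passes += 1

    # Sleep in short slices so the flag is re-checked every `step` seconds
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + sleep_for
    while running
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      break if remaining <= 0
      sleep([step, remaining].min)
    end
  end

  passes
end
```

In the script above, the while loop would then become run_worker(EXAMPLE['sleep_for']) { ... }, followed by the same "Shutting down" log line.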

Finally, we need to make sure the process starts when the machine boots. To do that, put this script (example_ctl) in /etc/init.d and link it into the runlevels you want (on Debian-based systems, for example, with update-rc.d):
#!/bin/sh

PID=/var/run/example.pid

# Find the name of the script
NAME=`basename $0`

script_result=0

start()
{
    su --login example_user --command "cd /home/application/path/current ; /usr/bin/env ruby script/runner -e production script/example.rb &"

    echo "Starting $NAME service: OK"
}

stop()
{
    if [ -r $PID ]
    then
        kill `cat $PID`
        STATUS="OK"
    else
        STATUS="FAILED"
        script_result=1
    fi

    echo "Stopping $NAME service: $STATUS"
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
esac

exit $script_result
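One caveat with restart: kill only sends the signal, so start can run while the old process is still finishing its last pass and still holds the pid file. Depending on how the previous post's set_pid treats a live pid, the new copy may then log "Example is already running" and exit. The sketch below, in Ruby to match the rest of the post, shows a stop that waits for the process to actually exit. stop_service and timeout are hypothetical names, and it assumes the pid file holds a single numeric process id. Unlike the shell version, it also reports FAILED when the pid file points at a process that is no longer running.

```ruby
# Sketch only: stop_service and timeout are hypothetical names.
# Mirrors the init script's stop(): send TERM to the pid in pid_file, then
# wait (up to timeout seconds) until that process has really exited.
# Returns "OK" or "FAILED" like the shell version.
def stop_service(pid_file, timeout = 10)
  return 'FAILED' unless File.readable?(pid_file)

  pid = File.read(pid_file).to_i
  return 'FAILED' unless pid > 0

  begin
    Process.kill('TERM', pid)
  rescue Errno::ESRCH, Errno::EPERM
    return 'FAILED' # no such process, or not ours to signal
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      Process.kill(0, pid) # signal 0 only checks that the process exists
    rescue Errno::ESRCH
      return 'OK' # gone, so it is safe to start a new copy
    end
    return 'FAILED' if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
    sleep 0.1
  end
end
```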

Wednesday, 14 October 2009

Scraping a user's Twitter

I needed to get all the URLs from a user's tweets. This is what I came up with:

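#!/usr/bin/env ruby

require 'rubygems'

require 'open-uri'
require 'json'
require 'hpricot'

# Fetch one page of a user's timeline, print the links found in each
# tweet, and return the path of the next page (nil if there is no link)
def get_page(url)
  response = JSON.parse(open(url).read)

  # The '#timeline' entry holds the HTML for the list of tweets
  doc = Hpricot(response['#timeline'])
  doc.search('li').each do |l|
    status = l.attributes['id']
    l.search("span.entry-content/a.web").each do |c|
      puts "#{status} = #{c.attributes['href']}"
    end
  end

  # The '#pagination' entry holds the link to the next page, if any
  doc = Hpricot(response['#pagination'])
  next_link = doc.search('a').first
  next_link ? next_link.attributes['href'] : nil
end

user = 'twitter-user'

next_page = get_page("http://twitter.com/#{user}?page=1&format=json")

while next_page
  # So we don't get banned
  sleep 1
  next_page = get_page("http://twitter.com#{next_page}")
end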
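The loop at the end is a general pattern: fetch a page, take its "next" link, stop when there is none, and pause between requests. Here is a sketch that separates that driver from the fetching so it can be exercised without the network. crawl, delay and the fake pages are hypothetical, and the block stands in for get_page.

```ruby
# Sketch only: crawl and delay are hypothetical names.
# Calls the block with each URL; the block returns the next URL, or nil
# when there are no more pages. Sleeps `delay` seconds between requests.
def crawl(first_url, delay = 1)
  url = first_url
  pages = 0
  while url
    url = yield(url)
    pages += 1
    sleep delay if url # only pause when another request is coming
  end
  pages
end

# Usage with made-up pages instead of real HTTP requests
FAKE_PAGES = {
  '/user?page=1' => '/user?page=2',
  '/user?page=2' => '/user?page=3',
  '/user?page=3' => nil
}

visited = []
count = crawl('/user?page=1', 0) do |url|
  visited << url
  FAKE_PAGES[url]
end
puts count # prints 3
```

Passing a delay of 0 keeps the example fast; against a real site you would keep the one-second pause from the script above.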