Grab Conference with Hpricot

So you slept through a few sessions of conference (or you were distracted by the computer or stuck in the bathroom with a potty training toddler). The natural thing to do is to listen to sessions one at a time on your mp3 player over the next week or two. But it would be too much work to go to the archives website every day to download the appropriate session, then sync your mp3 player. No, you need to download them all now. But clicking on each one is tedious at best.

Enter a script. Now I’ve done something like this in the past with just sed and grep, but this time I thought I’d skip the sed trial and error and practice some Hpricot. It was quick and painless:

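#!/usr/bin/env ruby
# strip_urls.rb: print the href of every link on the conference page
require 'rubygems'
require 'hpricot'
require 'open-uri'
CONFERENCE_URL = "http://www.lds.org/conference/sessions/display/0,5239,49-1-851,00.html"

# open-uri lets open() fetch a URL (on Ruby 3.0+ use URI.open instead)
doc = Hpricot(open(CONFERENCE_URL))
# doc/"a" selects every <a> element; compact drops anchors without an href
puts((doc/"a").map { |a| a['href'] }.compact.join("\n"))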

That little script prints the URL of every link on the page. Maybe there’s already a program that does that. urlview might be it, but an impatient Google search didn’t turn up an answer in time (and urlview isn’t installed on my laptop). Save the script as strip_urls.rb, make it executable, and continue on with the grep magic:

./strip_urls.rb | grep mp3 | grep -v Complete | xargs wget
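
If you’d rather skip grep too, the same filtering could be folded into Ruby. This is only a sketch using the standard library: the helper name session_mp3_urls and its base_url argument are made up for illustration and are not part of the original script. It also resolves relative links against the page URL, which the pipeline above does not do.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): mirrors
# `grep mp3 | grep -v Complete` on a list of href strings.
def session_mp3_urls(hrefs, base_url)
  hrefs.compact                                        # skip anchors with no href
       .select { |href| href.include?('mp3') }         # same as: grep mp3
       .reject { |href| href.include?('Complete') }    # same as: grep -v Complete
       .map    { |href| URI.join(base_url, href).to_s } # make relative links absolute
       .uniq                                           # don't download a file twice
end
```

In the script, (doc/"a").map { |a| a['href'] } would supply hrefs and CONFERENCE_URL would be base_url; print the result with puts and pipe it straight to xargs wget. Note that URI.join raises URI::InvalidURIError on a malformed href, so a messy page may need a rescue.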

And you’re on your way.

