Nov 20, 2008
Our first frost should come any time now, and I want to have warning of it so we can rescue our tomatoes. Well, I have a link to the NWS Area Forecast Discussion in my bookmarks which I try to read every day, but some days I forget. What I need is a feed. But as far as I can tell only Honolulu is cool enough to get a feed for AFDs. Weird. So what I needed was to create a feed from an existing web page.
I thought I would find a website that offers a service like this but I didn’t (in a few short minutes of searching). I found websites that came halfway, but they were very complicated to set up and/or they didn’t display the body of the page, only a link. The whole point is that I want to read it in my feed reader!
So I whipped up this Ruby code (standard libs only, no gems required):
#! /usr/bin/ruby
require 'erb'
require 'open-uri'
require 'ostruct'
# configuration
channel = OpenStruct.new(:url => "http://www.srh.noaa.gov/data/EPZ/AFDEPZ",
                         :title => "EPZAFD",
                         :description => "National Weather Service Area Forecast Discussion, El Paso TX/Santa Teresa NM")
item = OpenStruct.new(:url => channel.url,
                      :title => "Area Forecast Discussion",
                      :date => Time.now)
# fetch the page (URI.open is needed on Ruby 3.0+, where plain open no longer handles URLs)
afd = URI.open(item.url)
# prefer the server's Last-Modified time, so the item date only changes when the page does
item.date = afd.last_modified unless afd.last_modified.nil?
item.body = "<pre>" + afd.read + "</pre>"
# emit
include ERB::Util
template = ERB.new <<EOF
Content-Type: application/rss+xml

<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title><%= h channel.title %></title>
<link><%= h channel.url %></link>
<description><%= h channel.description %></description>
<lastBuildDate><%= h Time.now.rfc822 %></lastBuildDate>
<generator>feedme</generator>
<item>
<title><%= h item.title %></title>
<description><%= h item.body %></description>
<pubDate><%= h item.date.rfc822 %></pubDate>
<guid><%= h "#{channel.url}?date=#{item.date.iso8601}" %></guid>
</item>
</channel>
</rss>
EOF
puts template.result(binding)
Nothing overly fancy here. I use open-uri to fetch the page, extract the Last-Modified header (if it exists) and shoehorn it into an ERB template for the RSS.
In this case I just made it executable, slapped a Content-Type header before the output, and called it as a CGI. You could just as well run a cron job to update a file on disk (in which case, remove the Content-Type header from the template).
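For the cron variant, a minimal sketch might look like the following. The `write_feed` helper, the example path, and the write-then-rename step are assumptions for illustration, not part of the script above; the rename is there so that a feed reader polling the file never picks up a half-written feed.

```ruby
# Hypothetical helper for the cron approach: save the rendered feed to disk.
# The xml argument is the template output WITHOUT the Content-Type line.
def write_feed(path, xml)
  tmp = "#{path}.tmp"
  File.write(tmp, xml)    # write the whole document to a scratch file first
  File.rename(tmp, path)  # then swap it into place in a single step
  path
end

# Example call (the path is made up):
#   write_feed("/var/www/feeds/afd.xml", rss_xml)
```

The web server then serves the file like any other static document, so the script never needs to run as a CGI.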
Once I found the pure text version of the AFD, it was just a matter of slapping it between <pre> tags, but if you had some actual screen scraping to do you might want to look at Hpricot, which makes that really easy. In particular, I could have used the URL http://www.crh.noaa.gov/product.php?site=NWS&issuedby=EPZ&product=AFD&format=txt&version=1&glossary=1 and done
...
require 'hpricot'
...
item.body = (doc/"#content").to_html
which is in fact how I started out. But this page doesn’t have a Last-Modified header which means my feed reader would always show it as a new item (every time the cron job updated, or every time I hit the CGI script, either way). Luckily I found the text-only URL that doesn’t have this problem.
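If a page offers no Last-Modified header and there is no alternative URL, one possible workaround (not what the script above does) is to build the guid from a digest of the page body, so it only changes when the text does. This is a sketch under that assumption; `stable_guid` is a made-up name, and it only helps with readers that key on the guid rather than on pubDate.

```ruby
require 'digest/sha1'
require 'time'

# Hypothetical helper: use the Last-Modified time when the server sent one,
# otherwise fall back to a SHA-1 digest of the page body.
def stable_guid(url, body, last_modified = nil)
  if last_modified
    "#{url}?date=#{last_modified.iso8601}"
  else
    "#{url}?sha1=#{Digest::SHA1.hexdigest(body)}"
  end
end
```

With this, fetching an unchanged page twice yields the same guid both times, while any edit to the page yields a new one.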
no comments | tags: afd, cgi, erb, feed, feedme, frost, html, nws, open, rss, ruby, tech, uri, xml
Feb 28, 2008
Have you ever thrown together a simple static webpage, only to find down the road that you want to add an RSS feed? What are your options? Maintain an ugly XML file by hand, or migrate to a big slow messy CMS. Yeah, no fun.
Sars is a simple domain-specific language for RSS. This:
# This is a YAML stream (http://yaml.org) but you don't need to know much YAML
# to get the hang of it.
# There are multiple "documents". The first document is the channel information:
---
title: Foo News Feed
link: http://example.com/foo/
description: News for the Foo Project
webmaster: you@example.com
# The second and subsequent documents are items. The first line is the title,
# the second line is the date, and the rest is the item description (Markdown).
# Because line endings are important, don't forget the pipe character.
--- |
Really Exciting Title
2/28/08 12:00
This is where I pontificate
about the really exciting fish
that is sitting on my plate.

Here's a [download link](http://example.com/foo/foo-1.0.tar.gz).
--- |
Another Item
2/28/08 12:04
You know, it doesn't really matter what order you put them in, since they each
have dates.
becomes this:
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Foo News Feed</title>
<link>http://example.com/foo/</link>
<description>News for the Foo Project</description>
<lastBuildDate>Thu, 28 Feb 2008 12:07:32 -0700</lastBuildDate>
<generator>yaml2rss</generator>
<webMaster></webMaster>
<item>
<title>Really Exciting Title</title>
<description><p>This is where I pontificate
about the really exciting fish
that is sitting on my plate.</p>
<p>Here's a <a href="http://example.com/foo/foo-1.0.tar.gz">download link</a>.</p></description>
<pubDate>Thu, 28 Feb 2008 12:00:00 -0700</pubDate>
<guid>http://example.com/foo//2008-02-28T12:00:00-07:00</guid>
</item>
<item>
<title>Another Item</title>
<description><p>You know, it doesn't really matter what order you put them in, since they each
have dates.</p></description>
<pubDate>Thu, 28 Feb 2008 12:04:00 -0700</pubDate>
<guid>http://example.com/foo//2008-02-28T12:04:00-07:00</guid>
</item>
</channel>
</rss>
Any questions?
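For the curious, the input side of a format like this can be read with nothing but the standard library. This is a rough sketch, not Sars's actual code: `parse_feed_source` and `Item` are made-up names, the date format string is inferred from the sample above, sorting oldest-first is a guess based on the sample output, and Markdown rendering is left out because it needs a gem. The sketch also assumes the item text is indented under each `--- |` line.

```ruby
require 'yaml'
require 'time'

# Hypothetical container for one parsed feed item.
Item = Struct.new(:title, :date, :body)

# Split a multi-document YAML stream into channel info plus items.
def parse_feed_source(text)
  # The first document is the channel mapping; the rest are block-scalar strings.
  channel, *raw_items = YAML.load_stream(text)
  items = raw_items.map do |raw|
    # Line 1 is the title, line 2 the date, everything after is the body.
    title, date, body = raw.split("\n", 3)
    Item.new(title.strip, Time.strptime(date.strip, "%m/%d/%y %H:%M"), body.to_s)
  end
  # Source order is irrelevant because every item carries its own date.
  [channel, items.sort_by(&:date)]
end
```

The channel comes back as a plain hash (`channel["title"]`, `channel["link"]`, and so on), and each item body is still raw Markdown ready to be handed to whatever renderer you like.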
1 comment | tags: dsl, markdown, rss, ruby, sars, src, text, yaml
Aug 14, 2007
Steve Dibb and I are on the same page about
full-content RSS feeds. It’s good to see some empirical evidence supporting
common sense. And it really is common sense: why do people subscribe to RSS
feeds? So they don’t have to manually visit every site they want to keep
abreast of. So what RSS subscriber would want a site to make them click
through to read it? As with so many things in life, it comes down to treating
your readers right (or customers/users/friends/family/etc).
That said, my apologies for when I recently tried to make a spoiler-free book
review of the latest Harry Potter by using the usually-annoying click-through
trick, and my feed generator ignored me.
1 comment | tags: rss
Jun 14, 2006
I’m giving Flock another look tonight and I’m liking what I’m seeing. The RSS reader is at least as good as Safari’s. I’ve been stuck in Safari for months now, even though I might rather use Firefox, because I was addicted to its RSS reading, so this is great. Flock looks better in OS X than Firefox, although it’s not quite as nicely integrated as Safari.
I have a couple of major gripes. It imported my Safari settings only after I explicitly asked it to, and then it failed to do anything useful with my RSS feeds from Safari. Talk about a wet blanket. So I figured out how to create OPML from Safari and tried to import it. Flock failed to import it. Strike two. So then I decided to try the built-in blogging fun, and I’m not impressed with the stupid editor. Composing blog entries in what is basically the same interface as Netscape Composer ten years past is not my idea of cutting edge. I should have the option of starting out in source mode, but more importantly (since editing HTML is even less fun than using a composer-like interface) I should be able to use Markdown or Textile or whatever I want. I’ll still be using flog for writing my posts.
Nevertheless, Flock looks promising indeed, and I may end up using it if I can get my RSS feeds imported.
technorati tags:flock, safari, rss, blog
Update: there’s something else right there: my blog has tags, I don’t want tags in the post. Sheesh.
no comments | tags: blog, flock, rss, safari