The Fugue Counterpoint by Hans Fugal

20 Nov 2008 (0 comments)

Feed Me

Our first frost should come any time now, and I want to have warning of it so we can rescue our tomatoes. Well, I have a link to the NWS Area Forecast Discussion (AFD) in my bookmarks, which I try to read every day, but some days I forget. What I need is a feed. But as far as I can tell, only Honolulu is cool enough to get a feed for AFDs. Weird. So what I needed was to create a feed from an existing web page.

I thought I would find a website that offers a service like this but I didn't (in a few short minutes of searching). I found websites that came halfway, but they were very complicated to set up and/or they didn't display the body of the page, only a link. The whole point is that I want to read it in my feed reader!

So I whipped up this Ruby code (standard libs only, no gems required):

#! /usr/bin/ruby
require 'erb'
require 'open-uri'
require 'ostruct'
require 'time'    # Time#rfc822 and Time#iso8601

# configuration
channel = OpenStruct.new(:url => "http://www.srh.noaa.gov/data/EPZ/AFDEPZ",
                         :title => "EPZAFD",
                         :description => "National Weather Service Area Forecast Discussion, El Paso TX/Santa Teresa NM")
item = OpenStruct.new(:url => channel.url,
                      :title => "Area Forecast Discussion",
                      :date => Time.now)

# fetch the page (URI.open; a bare Kernel#open on a URL no longer works in Ruby 3.0+)
afd = URI.open(item.url)
item.date = afd.last_modified unless afd.last_modified.nil?
item.body = "<pre>" + afd.read + "</pre>"

# emit
include ERB::Util
template = ERB.new <<EOF
Content-Type: application/rss+xml

<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title><%= h channel.title %></title>
      <link><%= h channel.url %></link>
      <description><%= h channel.description %></description>
      <lastBuildDate><%= h Time.now.rfc822 %></lastBuildDate>
      <generator>feedme</generator>
      <item>
         <title><%= h item.title %></title>
         <description><%= h item.body %></description>
         <pubDate><%= h item.date.rfc822 %></pubDate>
         <guid isPermaLink="false"><%= h "#{channel.url}?date=#{item.date.iso8601}" %></guid>
      </item>
   </channel>
</rss>
EOF
puts template.result(binding)

Nothing overly fancy here. I use open-uri to fetch the page, take the date from the Last-Modified header (if it exists), and shoehorn everything into an ERB template for the RSS.

In this case I just made it executable, put a Content-Type header at the top of the output, and called it as a CGI script. You could just as well run a cron job to update a file on disk (in which case, remove the Content-Type line from the template).
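If you want one script that works both ways, it can decide at run time whether to emit the header. A minimal sketch, assuming the Content-Type line has been removed from the ERB template and that the web server sets GATEWAY_INTERFACE, as CGI/1.1 servers do (RFC 3875); cgi_wrap is a hypothetical helper name:

```ruby
# Hypothetical helper: prepend the CGI header only when running under a web
# server. CGI/1.1 servers set GATEWAY_INTERFACE (RFC 3875); cron does not.
def cgi_wrap(xml, env = ENV)
  return xml unless env['GATEWAY_INTERFACE']
  # A CGI response is header lines, a blank line, then the body.
  "Content-Type: application/rss+xml\r\n\r\n" + xml
end

# Usage, assuming the template no longer contains the Content-Type line:
#   print cgi_wrap(template.result(binding))
```

Under cron you would then just redirect the script's output to a file the web server already serves.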

Once I found the pure text version of the AFD, it was just a matter of slapping it between <pre> tags. If you have some actual screen scraping to do, though, you might want to look at Hpricot, which makes that really easy. In particular, I could have used the URL http://www.crh.noaa.gov/product.php?site=NWS&issuedby=EPZ&product=AFD&format=txt&version=1&glossary=1 and done

...
require 'hpricot'
...
doc = Hpricot(afd.read)                # parse the fetched HTML page
item.body = (doc/"#content").to_html   # keep only the element with id="content"

which is in fact how I started out. But that page doesn't send a Last-Modified header, so the item date falls back to Time.now, the guid changes on every run, and my feed reader would show it as a new item every time (whether that's each cron update or each hit on the CGI script). Luckily, I found the text-only URL, which doesn't have this problem.
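The duplicate-item problem is easier to see by isolating how the script builds its guid. A small sketch; afd_guid is a hypothetical helper that mirrors the guid expression in the template, with last_modified standing in for open-uri's last_modified value (nil when the header is missing):

```ruby
require 'time'

# Mirrors the template's guid: channel URL plus the item date in ISO 8601.
# The date comes from Last-Modified when present, otherwise from "now".
def afd_guid(channel_url, last_modified, now = Time.now)
  date = last_modified || now
  "#{channel_url}?date=#{date.iso8601}"
end

url    = "http://www.srh.noaa.gov/data/EPZ/AFDEPZ"
issued = Time.utc(2008, 11, 20, 9, 30)

# With a Last-Modified header, two fetches of the same discussion agree...
same = afd_guid(url, issued, Time.utc(2008, 11, 20, 10)) ==
       afd_guid(url, issued, Time.utc(2008, 11, 20, 11))

# ...without one, every run mints a fresh guid, so the reader sees a "new" item.
differs = afd_guid(url, nil, Time.utc(2008, 11, 20, 10)) !=
          afd_guid(url, nil, Time.utc(2008, 11, 20, 11))

puts same, differs
```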

17 Nov 2008 (3 comments)

Clojure DSP Longing

I often find myself longing to be able to use Clojure, a very enticing lispy language that runs on the JVM.

I could possibly be using it right now in my dissertation research. It has the promise of dynamic languages, functional programming, almost-as-cool-as-Erlang concurrency, JVM performance, and Java library soup. It could be so awesome. A few months ago I started briefly down this road, unaware that…

Clojure sucks. Not generally, but it sucks for DSP. More specifically, Java, and therefore Clojure, has no real support for complex numbers. To do serious DSP, you need native syntactic, semantic, and performance support for complex numbers. Java has none of the above. Older versions of C didn't have syntactic or semantic support either, but the performance of using arrays was plenty fast. Not so in Java, at least not to the extent necessary to make up for the missing syntactic and semantic support.
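To make "syntactic and semantic support" concrete: without a native complex type, C-style code stores samples as interleaved real/imaginary pairs in a flat array and spells out every operation by hand. A Ruby sketch of that bookkeeping, contrasted with Ruby's own built-in Complex class; interleaved_mul and native_mul are hypothetical names, and this illustrates the notational burden only, not performance:

```ruby
# Complex samples stored C-style as one flat array: [re0, im0, re1, im1, ...].
# Without a native complex type, each multiply is written out by hand.
def interleaved_mul(a, b)
  unless a.size == b.size && a.size.even?
    raise ArgumentError, 'expected equal-length arrays of re/im pairs'
  end
  out = Array.new(a.size)
  (0...a.size).step(2) do |i|
    ar, ai = a[i], a[i + 1]
    br, bi = b[i], b[i + 1]
    out[i]     = ar * br - ai * bi # real part
    out[i + 1] = ar * bi + ai * br # imaginary part
  end
  out
end

# With a native complex type, the same operation reads like the math.
def native_mul(a, b)
  a.zip(b).map { |x, y| x * y }
end

x = [1, 2, 3, -1] # 1+2i, 3-1i
y = [3, 4, 0, 1]  # 3+4i, 0+1i
p interleaved_mul(x, y) # => [-5, 10, 1, 3]
p native_mul([Complex(1, 2), Complex(3, -1)], [Complex(3, 4), Complex(0, 1)])
```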

So someday, when I'm writing general purpose code again and not high performance DSP code, I will have an opportunity to use Clojure, and I think that will make me very happy. By then the book will be out of beta. The community will be in full swing. There will be awesome libraries. Children will play in pristine parks with formerly-ravenous ravens.

In the meantime, if anyone sees the scene change, do let me know.

7 Apr 2008 (0 comments)

Grab Conference with Hpricot

So you slept through a few sessions of conference (or you were distracted by the computer or stuck in the bathroom with a potty-training toddler). The natural thing to do is to listen to the sessions one at a time on your mp3 player over the next week or two. But it would be too much work to go to the archives website every day to download the appropriate session and then sync your mp3 player. No, you need to download them all now. But clicking on each one is tedious at best.

Enter a script. Now I've done something like this in the past with just sed and grep, but this time I thought I'd skip the sed trial and error and practice some Hpricot. It was quick and painless:

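#!/usr/bin/env ruby
# strip_urls.rb: print every link target on the conference page, one per line
require 'rubygems'
require 'hpricot'
require 'open-uri'
CONFERENCE_URL="http://www.lds.org/conference/sessions/display/0,5239,49-1-851,00.html"
doc = Hpricot(URI.open(CONFERENCE_URL))
puts((doc/"a").map { |a| a['href'] }.compact)   # skip <a> tags without an href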

That little script prints every link URL on a web page. Maybe there's a program that does that already; maybe it's urlview, but an impatient Google search didn't find it in time (and urlview isn't installed on my laptop). Now you continue with the grep magic:

./strip_urls.rb | grep mp3 | grep -v Complete | xargs wget

And you're on your way.
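The grep half of the pipeline can also live in Ruby, which leaves room to resolve relative links (wget needs absolute URLs). A sketch, assuming, as the pipeline above does, that the wanted files contain "mp3" and the whole-session bundles contain "Complete"; session_mp3s is a hypothetical helper fed the hrefs the Hpricot script extracts:

```ruby
require 'uri'

# Ruby stand-in for `grep mp3 | grep -v Complete`, plus link resolution.
# hrefs: raw href attributes (nil for <a> tags without one).
# Steps: drop nils, keep mp3 links, drop "Complete" bundles,
# resolve relative links against the page URL, remove duplicates.
def session_mp3s(page_url, hrefs)
  hrefs.compact
       .select { |h| h.include?('mp3') }
       .reject { |h| h.include?('Complete') }
       .map { |h| URI.join(page_url, h).to_s }
       .uniq
end

# Usage with the Hpricot script's document:
#   puts session_mp3s(CONFERENCE_URL, (doc/"a").map { |a| a['href'] })
```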

27 Mar 2008 (1 comment)

Rails Sessions

I was doing some maintenance on my blog, and was devastated to find that Typo was taking 225 megabytes of resident RAM. Yikes! After some creative debugging and digging, I figured out it was due to sessions. Typo now stores sessions in the database, so the maintenance cron job I had for deleting old sessions never touched them. (Ha! Had you going for a second!)

Well, I could write a cron job to run a script to clean the sessions out of the db, like:

#!/bin/sh
sqlite3 /path/to/typo/db/production.db 'delete from sessions'

Ok, that's a bit extreme, but you get the idea. But when I deleted the sessions this way, memory usage didn't drop at all until I restarted the server, which seems unnecessary. So instead I changed Typo's configuration to use a different session store. I commented out this line in config/environment.rb:

-  config.action_controller.session_store = :active_record_store
+  #config.action_controller.session_store = :active_record_store

Then I restarted the server and fired up a browser. "Huh, that's odd… no sessions in tmp/sessions or /tmp or anywhere I can see. No, they're not in the database…" What I was seeing didn't match what Google kept telling me: that the default session store was PStore, aka the file system. But apparently that changed recently in Rails, and now the default is CookieStore. From the ActionController::Base documentation:

Sessions are stored in a browser cookie that's cryptographically signed, but unencrypted, by default. This prevents the user from tampering with the session but also allows him to see its contents.

Do not put secret information in session!
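What "signed, but unencrypted" means in practice: the cookie carries the session data in the clear (merely Base64-encoded) next to an HMAC of that data. Anyone can decode it; only someone holding the secret can produce a valid signature. The following is a simplified model in that spirit, not Rails's actual CookieStore code; the payload--digest layout, JSON serialization, HMAC-SHA1, the secret, and the helper names are all assumptions for illustration:

```ruby
require 'json'
require 'openssl'

SECRET = 'example secret, at least thirty characters long' # hypothetical

# Serialize, Base64-encode (strict, no newlines), and append an HMAC.
def sign_session(session, secret = SECRET)
  data = [JSON.generate(session)].pack('m0')
  "#{data}--#{OpenSSL::HMAC.hexdigest('SHA1', secret, data)}"
end

# Returns the session hash, or nil if the signature does not match.
def verify_session(cookie, secret = SECRET)
  data, digest = cookie.split('--', 2)
  return nil unless data && digest
  expected = OpenSSL::HMAC.hexdigest('SHA1', secret, data)
  # Real code should use a constant-time comparison here.
  return nil unless expected == digest
  JSON.parse(data.unpack1('m0'))
end

# No secret needed to read the contents, hence "do not put secret
# information in session".
def peek_session(cookie)
  JSON.parse(cookie.split('--', 2).first.unpack1('m0'))
end
```

Tampering is caught because a forged payload no longer matches its digest, but reading the payload needs nothing at all.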

Well, a quick grep -ri session app lib told me that Typo wasn't storing anything secret in the session, so I decided the default was alright with me. Now I don't have to set up any session cleanup script at all. Sweet.

Now, don't stop there. You should set your session key and secret while you're
hanging out in config/environment.rb. Add the following lines in the same
place as the line you commented out above:

config.action_controller.session = {
  :session_key => '_something_unique_session',
  :secret      => 'get this from rake secret'   # paste the full rake secret output
}

28 Feb 2008 (1 comment)

Simple RSS

Have you ever thrown together a simple static webpage, only to find down the road that you want to add an RSS feed? What are your options? Maintain an ugly XML file by hand, or migrate to a big slow messy CMS. Yeah, no fun.

Sars is a simple domain-specific language for RSS. This:

# This is a YAML stream (http://yaml.org) but you don't need to know much YAML
# to get the hang of it.
# There are multiple "documents". The first document is the channel information:
---
title: Foo News Feed
link: http://example.com/foo/
description: News for the Foo Project
webmaster: you@example.com

# The second and subsequent documents are items. The first line is the title,
# the second line is the date, and the rest is the item description (Markdown).
# Because line endings are important, don't forget the pipe character.
--- |
Really Exciting Title
2/28/08 12:00
This is where I pontificate
about the really exciting fish
that is sitting on my plate.

Here's a [download link](http://example.com/foo/foo-1.0.tar.gz).

--- |
Another Item
2/28/08 12:04
You know, it doesn't really matter what order you put them in, since they each
have dates.

becomes this:

<?xml version="1.0"?>
<rss version="2.0">
<channel>
    <title>Foo News Feed</title>
    <link>http://example.com/foo/</link>
    <description>News for the Foo Project</description>
    <lastBuildDate>Thu, 28 Feb 2008 12:07:32 -0700</lastBuildDate>
    <generator>yaml2rss</generator>
    <webMaster></webMaster>

    <item>
        <title>Really Exciting Title</title>
        <description>&lt;p&gt;This is where I pontificate
about the really exciting fish
that is sitting on my plate.&lt;/p&gt;

&lt;p&gt;Here's a &lt;a href=&quot;http://example.com/foo/foo-1.0.tar.gz&quot;&gt;download link&lt;/a&gt;.&lt;/p&gt;</description>
        <pubDate>Thu, 28 Feb 2008 12:00:00 -0700</pubDate>
        <guid>http://example.com/foo//2008-02-28T12:00:00-07:00</guid>
    </item>

    <item>
        <title>Another Item</title>
        <description>&lt;p&gt;You know, it doesn't really matter what order you put them in, since they each
have dates.&lt;/p&gt;</description>
        <pubDate>Thu, 28 Feb 2008 12:04:00 -0700</pubDate>
        <guid>http://example.com/foo//2008-02-28T12:04:00-07:00</guid>
    </item>

</channel>
</rss>
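For the curious, here is roughly how one item document could become an <item>. This is a sketch, not Sars's code: item_xml is a hypothetical helper, it assumes the month/day/year date format used in the example, interprets dates in local time, and skips the Markdown step (which would need a Markdown library), so the body is escaped as plain text:

```ruby
require 'erb'
require 'time'

# Turn one item document (title line, date line, body) into an RSS <item>.
def item_xml(text)
  title, date, body = text.split("\n", 3)
  time = Time.strptime(date.strip, '%m/%d/%y %H:%M') # e.g. "2/28/08 12:04"
  <<~XML
    <item>
      <title>#{ERB::Util.h(title)}</title>
      <description>#{ERB::Util.h(body.to_s.strip)}</description>
      <pubDate>#{time.rfc822}</pubDate>
    </item>
  XML
end

puts item_xml("Another Item\n2/28/08 12:04\nYou know, it doesn't really matter.\n")
```

Because every item carries its own pubDate, feed readers sort by date, which is why the order of documents in the file doesn't matter.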

Any questions?

28 Feb 2008 (2 comments)

Crème Rappel v2.2

I've released yet again. Go to the web page for the details. Crème Rappel now has its own RSS feed, so I'll shut up about it here now.

21 Feb 2008 (1 comment)

Crème Rappel v2

In the spirit of release early, rewrite often, I have released Crème Rappel version 2. Version 1 was a shell script that combined Growl and at. Then Apple released 10.5.2 not half a week later and broke at altogether. Sick of fighting with launchd and other Apple superiority complexes, I set about nurturing my own superiority complex and rewriting Crème Rappel to be completely independent of at.

Of course, that's getting too heavy for a shell script, so I moved to Ruby. One thing I didn't want was to require a daemon to be running. Daemons can fail or forget to start up, and that means I couldn't truly trust the tool. The recent at debacle is just another case in point. So instead, I wrote Crème to fork a process that sleeps until the moment of truth, then fires off the reminder. It turns out the obvious function for the job, sleep(), is a poor choice here: when you suspend the laptop, it appears you also suspend time, and the sleep doesn't count the time spent suspended. In fact, every timer I tried had the same problem, including one I thought would not: setitimer(). If you don't believe me, try this simple experiment:

date; sleep 30; date

Put the laptop to sleep for a substantial time while that sleep is running. When you resume, notice that you still have to wait for all 30 seconds of awake time to tick by, even though it has actually been a minute plus since you issued the command. So instead I sleep in one-second intervals, checking the clock each time.
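The workaround fits in a few lines of Ruby. This is a sketch of the idea, not Crème Rappel's actual code; sleep_until and its step parameter are hypothetical names:

```ruby
# Nap in short steps and re-check the wall clock (Time.now) each time.
# A single long sleep stops counting while the laptop is suspended, but the
# wall clock is correct again after resume, so the deadline is still honored.
def sleep_until(deadline, step = 1)
  loop do
    remaining = deadline - Time.now # seconds left, as a Float
    break if remaining <= 0
    sleep([remaining, step].min)    # never nap past the deadline
  end
end

# Usage: fire a reminder at a fixed wall-clock time.
#   sleep_until(Time.now + 30 * 60)
#   puts "Time's up!"
```

The cost is one wakeup per step, which is negligible for a reminder tool and buys correctness across suspends.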

This is not just a backpedaling rewrite, though. It also adds more flexible and easy-to-type timespecs, and a spiffy website.
If you give it a try and it doesn't work, or you struggle with the documentation, please do drop me a line so I can fix it. I want it to be worth
every bit of bandwidth that you paid for it.

5 Nov 2007 (1 comment)

X-Sendfile

I'm writing a little photo gallery of my own, because everything out there stinks. But sending big image files in Rails (using send_file and send_data) is slow, mostly because you tie up a whole Rails process just feeding data to the web. Web servers like Apache, Lighttpd, and Mongrel are good at serving static files; let them do it.

That's the idea behind X-Sendfile. If you send an X-Sendfile header with the path of the file you want to send, then a supporting webserver will do the dirty work and do it fast, and you can get on with serving other requests.
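Concretely, the application's job shrinks to naming the file in a response header and sending an empty body. A framework-neutral Ruby sketch of those headers; x_sendfile_headers is a hypothetical helper, the header name is a parameter because lighttpd 1.4.x wants X-LIGHTTPD-send-file instead (as discussed next), and passing an absolute path is an assumption about what the front-end server expects:

```ruby
# Hypothetical helper: headers that ask the front-end server to send the file.
# The application then returns an empty body instead of the file's bytes.
def x_sendfile_headers(path, content_type, header: 'X-Sendfile')
  raise ArgumentError, "not a file: #{path}" unless File.file?(path)
  {
    header                => File.expand_path(path),
    'Content-Type'        => content_type,
    'Content-Disposition' => %(inline; filename="#{File.basename(path)}")
  }
end
```

In a Rails 1.2 controller this would presumably be something like headers.update(x_sendfile_headers(path, 'image/jpeg')) followed by render :nothing => true, and that empty render is exactly where the Content-Length trouble comes from.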

That's the theory anyway, but there are some bumps in the road. First, AFAICT Mongrel doesn't support X-Sendfile. This is fine when Mongrel is running behind an Apache proxy, which does, but it kind of throws a wet blanket on development and on apachephobes like myself. Ok, apachephobe might be a bit strong, but I don't want to set up that monster on my laptop just for some Rails development. So Mongrel's out. Correct me if I'm wrong.

Lighttpd supposedly invented X-Sendfile, but 1.4.x and earlier don't seem to support it. Instead, you have to use the header X-LIGHTTPD-send-file. Also, it doesn't work unless Content-Length is properly set (or perhaps absent). This is bad news for Rails users, since a bug in Rails sets the Content-Length header to the length of the rendered content, which is not the file. If you do render :nothing => true, the content is a single space character, so Content-Length is 1, and lighttpd defiantly refuses to correct it. So you either have to work around the Rails bug, or upgrade to lighttpd 1.5.x (now in release candidate), which supposedly works (I haven't tested it; I can't get it to compile on Leopard).

I say bug in Rails, but frankly I'm more inclined to consider this bad behavior on the part of lighttpd. In that vein, here is a patch for lighttpd 1.4.18 that enables both the X-LIGHTTPD-send-file and X-Sendfile headers with Rails 1.2.3, which has the Content-Length resetting behavior. It makes lighttpd set the Content-Length on its own. Thanks to stbuehler for the patch.

--- src/mod_fastcgi.c.orig      2007-11-05 13:52:47.000000000 -0700
+++ src/mod_fastcgi.c   2007-11-05 13:55:17.000000000 -0700
@@ -2530,22 +2530,28 @@
                }

                if (host->allow_xsendfile &&
-                                   NULL != (ds = (data_string *) array_get_element(con->response.headers, "X-LIGHTTPD-send-file"))) {
+                                   ((NULL != (ds = (data_string *) array_get_element(con->response.headers, "X-LIGHTTPD-send-file")))
+                                     || (NULL != (ds = (data_string *) array_get_element(con->response.headers, "X-Sendfile"))))) {
                    stat_cache_entry *sce;

                                         if (HANDLER_ERROR != stat_cache_get_entry(srv, con, ds->value, &sce)) {
-                                               /* found */
-                                                con->parsed_response &= ~HTTP_CONTENT_LENGTH;
-
+                                               data_string *dcls = data_string_init();
+                                                /* found */
                        http_chunk_append_file(srv, con, ds->value, 0, sce->st.st_size);
                        hctx->send_content_body = 0; /* ignore the content */
                        joblist_append(srv, con);
-                                       }
-                                        else
-                                        {
-                                               log_error_write(srv, __FILE__, __LINE__, "sb",
-                                                       "send-file error: couldn't get stat_cache entry for:",
-                                                       ds->value);
+
+                                               buffer_copy_string_len(dcls->key, "Content-Length", sizeof("Content-Length")-1);
+                                               buffer_copy_long(dcls->value, sce->st.st_size);
+                                               dcls = (data_string*) array_replace(con->response.headers, (data_unset *)dcls);
+                                               if (dcls) dcls->free((data_unset*)dcls);
+
+                                               con->parsed_response |= HTTP_CONTENT_LENGTH;
+                                               con->response.content_length = sce->st.st_size;
+                                       } else {
+                                               log_error_write(srv, __FILE__, __LINE__, "sb",
+                                                       "send-file error: couldn't get stat_cache entry for:",
+                                                       ds->value);
                                         }
                }
--- src/response.c.orig 2007-11-05 14:08:26.000000000 -0700
+++ src/response.c      2007-11-05 14:04:49.000000000 -0700
@@ -59,7 +59,8 @@
    ds = (data_string *)con->response.headers->data[i];

    if (ds->value->used && ds->key->used &&
-                   0 != strncmp(ds->key->ptr, "X-LIGHTTPD-", sizeof("X-LIGHTTPD-") - 1)) {
+                   0 != strncmp(ds->key->ptr, "X-LIGHTTPD-", sizeof("X-LIGHTTPD-") - 1) &&
+                   0 != strncmp(ds->key->ptr, "X-Sendfile", sizeof("X-Sendfile") - 1)) {
            if (buffer_is_equal_string(ds->key, CONST_STR_LEN("Date"))) have_date = 1;
            if (buffer_is_equal_string(ds->key, CONST_STR_LEN("Server"))) have_server = 1;

Then, you need to configure your lighttpd server. Run script/server lighttpd once to generate config/lighttpd.conf, and add this bit to the fastcgi.server section:

    "allow-x-send-file" => "enable"

Finally, use it, either by setting the X-Sendfile header manually or by using the Rails x_send_file plugin (I recommend the latter).


3 Nov 2007 (3 comments)

Backticks 2.0

When you've been bitten by spaces and other odd characters in filenames as
often as I have, you begin to get not a little bit paranoid. Beginners often
make this mistake (ruby syntax, but the same thing happens in bash, perl,
python, etc.):

foo = `file #{filename}`

They quickly learn to do this instead:

foo = `file "#{filename}"`

This works fine until they come across a filename containing a double quote,
or any other character that is still special to the shell inside double
quotes ($, `, and \).

What's needed is something akin to what can be done with exec and system.
With those two, you can do something like this and it doesn't matter what
crazy filename is thrown at it, you won't have any trouble:

system 'file', filename

But exec and system don't return stdout like backticks do. The former
doesn't return at all (it replaces the current process), and the latter
returns only whether the command succeeded. Here is an idiom that works like
glorified array-parameter backticks in Ruby:

def backtick(cmd,*args)
  IO.popen('-') {|f| f ? f.read : exec(cmd,*args)}
end

Everything you need to understand that code can be found in ri IO.popen and
ri Kernel.exec. If you can think of a better name than backtick, do let me
know. Now our code becomes paranoid-friendly:

foo = backtick('file',filename)

While we're on the subject, a very handy method is Open3.popen3, which is a
bit of overkill for this glorified-backticks problem but could very well
simplify your life (or mine) in the future.
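On newer Rubies the same trick needs no fork at all. A sketch using only the standard library, assuming Ruby 1.9 or later, where IO.popen accepts an argument array and Open3.capture2 exists; backtick_with_status is a hypothetical name:

```ruby
require 'open3'

# IO.popen with an argument array runs the command directly, with no shell
# in between, just like system('file', filename). Unlike IO.popen('-'), it
# does not need fork, which is unavailable on Windows.
def backtick(cmd, *args)
  IO.popen([cmd, *args], &:read)
end

# Open3.capture2 also skips the shell when given separate arguments,
# and returns the exit status alongside stdout.
def backtick_with_status(cmd, *args)
  out, status = Open3.capture2(cmd, *args)
  [out, status.success?]
end

filename = %q(it's "quoted" $HOME `and` spaced.txt)
p backtick('echo', filename) # the shell metacharacters arrive untouched
```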

3 Sep 2007 (0 comments)

QtRuby Arrives

I've extolled the virtues of QtRuby before. As GUI toolkits go, Qt is the best (IMHO), and as languages go, Ruby is tops (again, IMHO), so naturally QtRuby has potential. QtRuby's biggest stumbling block in the past has been installation: easy on Linux, but deep voodoo on OS X and perhaps even worse on Windows. That has finally changed.

qt4-qtruby now uses a cmake-based build system. I installed Qt4 from source on my MacBook, then followed the instructions to install qt4-qtruby and gave it a spin. Everything worked perfectly out of the box.

I can't speak for Windows, but QtRuby on Linux and OS X is now practical. (It has been since June or perhaps earlier, but I just got around to trying it out.) Celebrate!
