Gnuplot in Action
One of the oldest and most universally useful tools we have is gnuplot. It is also one of the least understood and most underutilized.
I can hear you now. "What do I need gnuplot for? I don't make graphs." Well that's exactly the problem. Everyone who works with data should be making graphs, and lots of them. Do you write programs that manipulate data? You need gnuplot. Do you want to evaluate performance or traffic on your website? You need gnuplot. Do you want to impress your friends with cool graphs of the growth rates of yeast and bacteria in sourdough or your weight loss and percent body fat? You need gnuplot.
I've been using gnuplot for years. I scraped up enough gnuplot skillz to make basic graphs and it has been invaluable. But I knew gnuplot could do more than I knew how to make it do, and whenever I tried to do something advanced it was only with great pain that I succeeded. Often I failed. Let's face it, gnuplot can be a bear to learn. Why? Well, mostly because of the documentation. Not that there isn't any, almost the contrary. There's a lot of documentation, but it's very much reference documentation. What the world has been lacking is a good introduction to gnuplot that isn't afraid to get nitty-gritty where it needs to, but doesn't just parrot the abundant but obscure documentation that's already out there.
We no longer need to wait. The book is called Gnuplot in Action by Philipp Janert, and it is an absolutely fantastic book. Really, I can't say enough good about it.
Janert walks the fine line between cheesy tutorial and dense reference with the skill of a circus acrobat. The writing is approachable, yet chock full of useful information. Nothing is rushed, but it doesn't plod. The text is sprinkled with beautiful graphs that expand your imagination and open your eyes to the possibilities of gnuplot.
In chapter 2, "Essential Gnuplot", the impatient reader is given a whirlwind tour of gnuplot basics. After just 11 pages you will know everything you need to know for 90% of the graphs you will ever need to create. In fact, you'll know more than I knew when I began reading it—I learned a couple things that I kick myself for never having discovered on my own.
Chapter 3 goes into more detail on dealing with data, and in that chapter I learned a ton. Several of the things I learned in this chapter have saved me numerous hours this semester alone. Chapter 4 picks up the remaining miscellany.
In part 2, all those nagging questions of polish are addressed. This is where I used to spend the most time banging my head against the wall, searching, plodding through various newsgroup threads. "How do I get this or that to look just right?" These types of questions are hard to find answers to in search engines. Janert takes us by the hand and explains each and every question I've ever had and a few I hadn't yet dared to have. Truly beautiful graphs are now within my grasp. What's more, it no longer seems like an exercise in pain but a simple recipe for success. After Janert explains these techniques they seem plain as the nose on your face, yet he's not condescending.
Part 3 dives into the deep dark secrets of gnuplot. 3D plots, color, multiplots, different coordinate systems, fitting, terminals, and a dozen other things you didn't even know that you didn't want to know. No doubt you'll skim this section the first time and come back to it when you need those dark magic tidbits.
Part 4 is arguably the most important part of this book, or perhaps second after part 1. Part 4 is a crash course on graphical analysis. What kinds of graphs you can create, when you should and shouldn't use them, how not to lie with graphs (and how to pick out people lying with graphs), and most importantly, how to go from raw data that you don't understand to organized data that you do understand and have pretty graphs to demonstrate to boot. All with practical examples that you can tweak for your own use.
Finally, there's a gnuplot reference in the appendix. This is a deluxe package and has everything you need to become a gnuplot guru. I am thrilled that this book is coming to dispel the darkness surrounding gnuplot.
I really have no cons to speak of, other than that the prerelease PDF I had access to had some minor problems, the sort I would expect to be resolved in the final stages of editing. I don't have experience with other Manning books, but having seen prerelease versions of books from other publishers I'd say the current copy is par for the course. I'm certain they'll fix those things up and have an outstanding PDF in the end. I recommend springing for the dead tree version though, as I expect the reference at the end of the book and the examples throughout will be more accessible next to your computer instead of on the screen. (You already use quite a bit of real estate running gnuplot and/or editing a gnuplot file and displaying graphs.)
Gnuplot and Missing Data
Let's say you have some data you want to plot with gnuplot:
2007-08-16 119.02264
2007-08-17 120.20198
2007-08-18 121.29
2007-08-19 120.65557
2007-08-20 119.92982
Further suppose you'd like to plot both the weight (those are kilograms) and the BMI (weight/height^2 in kg/m^2). One way to do that is this gnuplot snippet:
set xdata time
set timefmt '%Y-%m-%d'
set datafile missing "?"
plot 'data' using 1:2 title 'Weight' with lines,\
'data' using 1:($2/3.7249) title 'BMI' with lines
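The magic number 3.7249 is the height squared. Here is the same plot with the height pulled out into a variable, as a small sketch: I'm assuming a height of 1.93 m (1.93 × 1.93 = 3.7249), and the variable name height is just for illustration.

```gnuplot
set xdata time
set timefmt '%Y-%m-%d'
set datafile missing "?"

# Assumption: height in metres, chosen so that height**2 equals 3.7249.
height = 1.93

# BMI = weight / height^2; ** is gnuplot's exponentiation operator.
plot 'data' using 1:2 title 'Weight' with lines,\
     'data' using 1:($2/height**2) title 'BMI' with lines
```

With the first data point, that works out to 119.02264 / 3.7249, or a BMI of about 31.95.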
So far so good. Now, what if you have a missing data point? Well, in this contrived example you would leave that line out and all would be well. But let's say the file has other columns whose data isn't missing, so you can't just drop the line and you end up with a missing data point:
2007-08-16 119.02264
2007-08-17 120.20198
2007-08-18 ?
2007-08-19 120.65557
2007-08-20 119.92982
If you use the same snippet above to plot this data, you get an interesting result. The first plot (weight) is just what you'd expect. It just skips that missing data point, connecting the data from 8/17 and 8/19. But the second plot (BMI) instead leaves a gap between 8/17 and 8/19. This oddly inconsistent behavior is unexpected to mere mortals, but not to gnuplot developers. I quote from help missing:
set datafile missing "?"
set style data lines
plot '-'
   1 10
   2 20
   3 ?
   4 40
   5 50
   e
plot '-' using 1:2
   1 10
   2 20
   3 ?
   4 40
   5 50
   e
plot '-' using 1:($2)
   1 10
   2 20
   3 ?
   4 40
   5 50
   e
The first plot will recognize only the first datum in the "3 ?" line. It will use the single-datum-on-a-line convention that the line number is "x" and the datum is "y", so the point will be plotted (in this case erroneously) at (2,3).
The second plot will correctly ignore the middle line. The plotted line will connect the points at (2,20) and (4,40).
The third plot will also correctly ignore the middle line, but the plotted line will not connect the points at (2,20) and (4,40).
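One way to get the BMI line to bridge the gap the same way the weight line does is to drop the offending lines before gnuplot ever sees them. This is a minimal sketch of my own workaround, not something from the help text. It assumes a Unix-like system with awk on the PATH and a gnuplot build that can read data from a pipe (the "< command" form of a file name).

```gnuplot
set xdata time
set timefmt '%Y-%m-%d'
set datafile missing "?"

# Assumption: awk is available and this gnuplot supports piped input.
# awk passes through only the lines whose second field is not "?",
# so neither plot ever encounters the missing value. Other plots that
# read different columns can still use the unfiltered 'data' file.
plot "< awk '$2 != \"?\"' data" using 1:2 title 'Weight' with lines,\
     "< awk '$2 != \"?\"' data" using 1:($2/3.7249) title 'BMI' with lines
```

Because the filter is applied per plot clause rather than to the file itself, the line for 2007-08-18 stays in the data file for any other columns that do have values that day.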
Sourdough Critter Growth Rates
Fellow sourdough freaks, I give you a graph of the growth rates of the two primary sourdough critters, Lactobacillus sanfranciscensis (the sourdough bacteria) and Candida milleri (the wild yeast), by temperature (Celsius), as described in the sourdough FAQ and, I believe, also in this paper by Gänzle et al. There are graphs in the paper, but they're not superimposed. My imagination is imperfect, so I made graphs based on the curve equation given.

The take-home lesson is that the bacteria grow faster than the yeast at higher temperatures, so if you want more sour, proof at a higher temperature. Rumor has it that "flavor" develops better at cooler temperatures, though, so you might be trading "flavor" (whatever that is) for more sour. I think they mean the elusive bread faeries that seem to visit when you retard your dough in the fridge or in autolyse or something.