The Fugue Counterpoint by Hans Fugal


Sensible Graphs with Cacti

I love Cacti. It's an excellent tool for visualizing interesting statistics like bandwidth usage, CPU and load average, memory usage, etc. It's relatively straightforward to set up, if slightly klunky, and it takes a lot of guesswork out of questions that are otherwise difficult to answer. (I should note here that Cacti is a sort of front-end to RRDtool which does all the hard work as far as the visualization is concerned.)

But some of the default graphs that come with Cacti are absolute rubbish. I took it upon myself to fix the two worst offenders this week: the load average graph and the memory usage graph. Let's compare, shall we?

Here's the default load average graph:

[Image: default load average graph]

This graph is just plain wrong. It stacks the load averages one on top of the other, which makes it impossible to read the 5- and 15-minute averages directly and makes things look worse than they are. If that textual explanation went over your head, compare it with this repaired load average graph and all will be made clear:

[Image: my load average graph]

Wow, you can actually see how the averages are, well, averages. Funny thing about proper graphs.
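To put numbers on the stacking problem, here is a small Python sketch. The load values are made up for illustration, and stacked_tops is a hypothetical helper, not anything from Cacti or RRDtool. It shows how far above its real value each average is drawn once the series are piled on top of each other.

```python
# Illustrative load averages (made-up numbers, not from a real host).
load_1min = [0.80, 1.20, 0.60]
load_5min = [0.70, 0.90, 0.75]
load_15min = [0.65, 0.70, 0.72]


def stacked_tops(*series):
    """Return the height at which each series is drawn when every
    series is stacked on top of the ones listed before it."""
    tops = []
    running = [0.0] * len(series[0])
    for values in series:
        running = [r + v for r, v in zip(running, values)]
        tops.append(list(running))
    return tops


tops = stacked_tops(load_1min, load_5min, load_15min)
names = ("1 min", "5 min", "15 min")
actuals = (load_1min, load_5min, load_15min)
for name, actual, drawn in zip(names, actuals, tops):
    # Only the first series is drawn at its true height.
    print(name, "actual:", actual, "drawn at:", [round(d, 2) for d in drawn])
```

With stacking, the 15 minute line sits at the sum of all three averages, which is why the default graph makes a load of well under 1 look like a load above 2.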

This change is simple enough to do yourself so I won't provide a template download in the interest of expanding your mind (hopefully without exploding your skull). Right after I show you my pretty memory usage graph, that is.

First, let's see the default memory usage graph:

[Image: default memory usage graph]

If you can tell what that graph is saying at a glance, you're better than I. This one doesn't so much lie as beat around the bush. The vital information is there, if you know how to read it. The key is that the stacked areas add up to the RAM that is available for programs to consume (free+buffers+cache), so the smaller the area of the graph, the less memory you have available. It also doesn't show swap. Swap is available on another graph (also in terms of free swap, not swap used), but on a separate graph you miss out on the relative comparison.

Here's the memory graph I came up with:

[Image: my memory usage graph]

I think it is self-explanatory and that it has all the information you could ask of a memory usage graph presented in the clearest possible way. Maybe I'm a bit biased, but you have to admit it's better.

So how do we modify and create graphs in Cacti for fun and profit? Let's begin with the load average graph. No, scratch that. Let's begin with some terminology.

Cacti has graph templates that define what the graph will look like. We'll spend a lot of time creating and modifying those. It also has data templates that tell it how to get the data (e.g. the SNMP OID or the script to run). You use a data template to create a data source, which actually fetches and stores that data, and you use a graph template to create a graph that is associated with a device (host) and its data sources. Data sources are usually created automatically when you create a graph. There's one more oddball thing called a CDEF, which is basically a rudimentary RPN calculator whose expressions you have to define ahead of time in the most excruciatingly painful way. But we'll need a couple for the memory usage graph.

SNMP stands for Simple Network Management Protocol, which naturally means that it's the antithesis of simple and that it is mostly used for monitoring instead of management (though you can indeed use it for management, which is way beyond the scope here). The short of it is, you have devices that talk SNMP and you can get info about interesting things that you'd like to graph with Cacti over the network. If you have a Linux box, it can be made to talk SNMP by installing Net-SNMP and configuring it.

SNMP version 3 is a complicated mess to configure because you have to have a PhD in network security to understand its authentication schemes (in which case you might conclude that it's not secure enough). Versions 1 and 2c are both sufficient for my needs, and from our point of view they're essentially identical and simple enough to explain. I'll assume you use version 2c. There's a cleartext password for read-only access and optionally one for read-write access (for that management thing that we don't do). In order to keep things (anti)simple, they're not called passwords but rather "community strings". The default community strings for when you really can't be bothered to change them are "public" and "private", and most SNMP devices come with these defaults preset. What's that? You didn't realize you had several (dozens of?) devices on your network just waiting for some bored employee to start playing with their settings from the comfort of his workstation because you didn't change the default read-write community string? Well, you do.

Here's the snmpd config file I use, which I don't mind sharing because the only way you can get to it is over my LAN or my VPN, and it's read-only anyway and I have no secrets about my host stats.

rocommunity  yoursecrethere
syslocation  "Las Cruces"
sysservices 79

If you can't figure out how to tweak the configuration file included with your distro (which is no doubt hundreds of lines long with loads of comments), you can replace it with something like that and you'll be up and running with SNMP version 2c.
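Before pointing Cacti at the host, it can help to check that snmpd answers at all. The sketch below only builds an snmpget command line for the Net-SNMP command-line tools; it does not run anything. The host address 192.0.2.10 and the helper name snmpget_command are placeholders made up for this example, and the OIDs are the numeric UCD-SNMP-MIB ones for load average and memory.

```python
import shlex

# Numeric UCD-SNMP-MIB OIDs served by Net-SNMP's snmpd.
UCD_OIDS = {
    "load_1min": ".1.3.6.1.4.1.2021.10.1.3.1",
    "load_5min": ".1.3.6.1.4.1.2021.10.1.3.2",
    "load_15min": ".1.3.6.1.4.1.2021.10.1.3.3",
    "mem_total": ".1.3.6.1.4.1.2021.4.5.0",
    "mem_free": ".1.3.6.1.4.1.2021.4.6.0",
}


def snmpget_command(host, community, names):
    """Build (but do not run) an SNMP v2c snmpget command line."""
    oids = [UCD_OIDS[name] for name in names]
    return ["snmpget", "-v2c", "-c", community, host] + oids


# Placeholder host and the community string from the config above.
cmd = snmpget_command("192.0.2.10", "yoursecrethere",
                      ["load_1min", "load_5min", "load_15min"])
print(shlex.join(cmd))
```

Paste the printed line into a shell on the Cacti box, or hand cmd to subprocess.run, to see whether the host replies. If it times out, Cacti won't be able to talk to it either.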

Ok, now you can install Cacti. Then create a device using the ucd/net SNMP device template for the host you want to monitor (you don't technically have to do that with localhost but you'd have to modify my graphs to use the non-SNMP data sources). When the device is created and it says it was able to connect to it ok, then you can create graphs for the device. Go ahead and create the "ucd/net - Load Average" graph. Then you'll no doubt dash over to the graphs "tab" and be totally dismayed that the graph seems broken. Fear not, it'll show up once it's had some time to gather data (check back in 5 minutes).

In the meantime we can go fix the load average graph template. Any changes we make will apply to the graph we just created as well as any new graphs we create with that template. Go to "Graph templates" on the left then find the graph of interest and click on its name. Take a moment to familiarize yourself with this page, then click on the 5 minute average item to edit it. Here you change the graph item type from STACK to LINE1. I also changed the color to 002ABF which shows up better. Do the same for the 15 minute average item (LINE1, I left the color alone). Now go refresh your graph and you'll see the changes. Et voilà, you are a Cacti graph template hacker. At this point you may feel the irresistible urge to change the colors of some of the more ugly but functional graphs, and I won't hinder you. I'll wait right here.
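For the curious, here is roughly what that template change means at the RRDtool level, sketched in Python. The graph_item helper is hypothetical, the variable names a, b and c are assumed, and every color except 002ABF is a placeholder rather than Cacti's actual default. The AREA, LINE1 and :STACK pieces are ordinary rrdtool graph syntax.

```python
def graph_item(kind, vname, color, legend, stack=False):
    """Format one rrdtool graph element such as AREA or LINE1."""
    item = f'{kind}:{vname}#{color}:"{legend}"'
    return item + ":STACK" if stack else item


# Roughly what the default template produces (colors are placeholders).
default_items = [
    graph_item("AREA", "a", "EACC00", "1 Minute Average"),
    graph_item("AREA", "b", "EA8F00", "5 Minute Average", stack=True),
    graph_item("AREA", "c", "FF0000", "15 Minute Average", stack=True),
]

# After the edit: the 5 and 15 minute items become plain lines.
fixed_items = [
    graph_item("AREA", "a", "EACC00", "1 Minute Average"),
    graph_item("LINE1", "b", "002ABF", "5 Minute Average"),
    graph_item("LINE1", "c", "FF0000", "15 Minute Average"),
]

for before, after in zip(default_items, fixed_items):
    print(before, "->", after)
```

Dropping :STACK is the whole fix: each line is then drawn at its own value instead of on top of the previous item.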

Ok, the memory usage graph is a bit more work. I won't take you through it step by step but I'll point out a couple of gotchas that I encountered when creating it. First, I realize that others have made memory usage graphs and provided them on forums and such to download. After the third one failed to work I decided it was better to just make my own. Hopefully mine will work for you—I put a bit of effort into making sure it would import cleanly.

There's actually a reason why the memory usage graphs are so backwards: because most devices provide total and free stats but not used stats. Obviously they expect you to calculate used yourself. So directly graphing the bits provided by SNMP was the easy way out.

We, on the other hand, have chosen the path of pain. We need to calculate memory used (which is total-(free+cache+buffers)). We could do this with a script but that's sticky and not very portable (depending on the target distro, version of Cacti, etc.). The better thing is to use a CDEF. If you click on graph management the CDEFs link is revealed. We want a CDEF that calculates (total-free-cache-buffers)*1024 (the sources are in kilobytes). Now, a CDEF uses a positional reference system. The first data source used by your graph is a, the second is b, and so on. So the CDEF string will look something like d,a,-,b,-,c,-,1024,*. But here's where things get dodgy: it's hard to know what order the data sources will settle on until after you've created the graph. If you create the graph in the right order (no shuffling) and you realize that the AVERAGE and MAX consolidation functions create separate data sources (but not LAST), and who-knows-what other pitfalls, then you can be confident ahead of time. Or, you can just create the template, create a graph using the template, and look at the graph debug output to figure out which source is which.
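If RPN isn't second nature, a tiny evaluator makes the CDEF string easier to check by hand. This is a minimal Python sketch, not RRDtool's implementation: it only knows the four arithmetic operators, and it assumes the data sources settled in the order a = buffers, b = cache, c = free, d = total (the order I give in the comments below). The kilobyte figures are made up.

```python
import operator

# Only the four arithmetic operators; real RRDtool RPN has many more.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_cdef(expr, sources):
    """Evaluate a comma-separated RPN string against single sample values."""
    stack = []
    for token in expr.split(","):
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in sources:
            stack.append(float(sources[token]))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: " + expr)
    return stack[0]


# Made-up sample values in kilobytes; letter order is an assumption.
sources = {"a": 100_000, "b": 900_000, "c": 500_000, "d": 4_000_000}
used_bytes = eval_cdef("d,a,-,b,-,c,-,1024,*", sources)
print(used_bytes)
```

Reading left to right, the string pushes total, subtracts buffers, cache and free in turn, then multiplies by 1024 to convert kilobytes to bytes. If a letter points at the wrong data source, this is the arithmetic that silently goes wrong.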

So now you create a new graph template and, referring to a template similar to what you want, you fill in all the right fields, leave most at their defaults, add graph items, tweak and refresh a sample graph using your template a gazillion times, go back and forth with the CDEFs getting things right, then create new (temporary) graphs to make sure it works.
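When going back and forth with the CDEFs, the graph debug output is the quickest way to see which letter ended up on which data source. Here is a hedged Python sketch that pulls that mapping out of pasted debug text. The sample paths and the host name are invented, and letter_map is a hypothetical helper, not part of Cacti.

```python
import re

# Matches DEF lines but not CDEF lines in rrdtool graph debug output.
DEF_LINE = re.compile(
    r'(?<!C)DEF:(?P<letter>[a-z]+)="(?P<path>[^"]+)":(?P<ds>\w+):(?P<cf>[A-Z]+)'
)


def letter_map(debug_output):
    """Map each DEF letter to (data source name, consolidation function)."""
    return {
        m["letter"]: (m["ds"], m["cf"])
        for m in DEF_LINE.finditer(debug_output)
    }


# Invented sample in the shape of Cacti's graph debug output.
sample = r'''
DEF:a="/var/lib/cacti/rra/host_mem_buffers_1.rrd":mem_buffers:AVERAGE \
DEF:b="/var/lib/cacti/rra/host_mem_cache_2.rrd":mem_cache:AVERAGE \
DEF:c="/var/lib/cacti/rra/host_mem_free_3.rrd":mem_free:AVERAGE \
DEF:d="/var/lib/cacti/rra/host_mem_total_4.rrd":mem_total:AVERAGE \
CDEF:cdefa=d,a,-,b,-,c,-,1024,* \
'''

mapping = letter_map(sample)
for letter, (ds, cf) in sorted(mapping.items()):
    print(letter, ds, cf)
```

If the letters in your CDEF don't show up in this mapping at all, the graph is missing data sources and RRDtool will reject the expression.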

Luckily for you, if all you want is a cool memory graph, I did all this for you. Download and import my memory usage graph template, create a graph, and in a day or so you'll have a memory usage graph as pretty as mine. Oh, alright, I'll provide a load average template for you as well.

Comments (37) Trackbacks (2)
  1. in your template cacti_graph_template_ucdnet_-_memory_usage_hans.xml is an error in CDEF’s custom string definition…

  2. Hi,
    I have memory graph but it is showing NAN value for “Swap Current”. by any chance can you tell me what is wrong and why it is not showing any value for the “Swap”?
    Papia Rahman

  3. what kind of definition “CDEF” will be expecting?

  4. Hi,
    I have imported the template that you have provided on your site. But when I try to plot a graph using your template it does not reflect memory usage properly. I created a new host with the Local Linux host template, and I also tried creating a host with the Generic SNMP Host template and applied your template, but it shows only Buffer memory & Swap memory. It does not show Used, Cache & Free Memory. What am I missing to get the graphs?

  5. I have received a few emails as well from people who have tried this template. All I can say is that this sort of thing in cacti is a bit fragile and I can’t know where you went wrong. I did my best to make the template as portable as possible, but it’s going to take some thought and effort on your part unless you’re quite lucky.

    That said, I’m happy to go on the clock at my going rate and help anyone out, if you can give me (temporary) access to your cacti installation.

  6. Hi,
    I have imported your graph. But it only show me the Swap and Free field and rest 3 gives -NaN. I configured this to pull info of a Solaris x86 box. Any idea why this happening?? Thank you.

  7. The problem with the template xml is that the data source for graph template item 1-4, 17-22 falls away.

  9. I think there is something wrong with the CDEF:s. Even if i change the rrd file behind the data, i get no graphs.

  10. I don’t know what you’re trying to say about things falling away, Daniel.

    Cacti templates in general are very fragile. Frequently while testing these there would be a problem with CDEFs or data sources even on the same system they originated on. I did due diligence to make sure they were generic, and tested on several other cacti installations to make sure they were generic.

    If they don’t work for you, the problem is most likely on your end. Which isn’t to say you’ve done anything wrong, it’s just the nature of the beast. I can’t troubleshoot it over email or blog comments, but if you want to hire me to get in your installation and troubleshoot do drop me a line. Otherwise, stick at it—I’m sure you’ll figure it out. Feel free to leave notes on what you had to do here for posterity. And if you do find something wrong with my templates, I accept patches.

  11. I mean that the data sources for those items hadn’t been registered. It would help me a lot if you could post the names of those.
    That cacti is fragile is sadly something I have seen lots of times and I’m beginning to think about switching to zenoss or something else.

  12. Here’s an excerpt from my graph, when I select “Graph Debug”

    DEF:a="/var/lib/cacti/rra/gwythaint_mem_buffers_145.rrd":mem_buffers:AVERAGE \
    DEF:b="/var/lib/cacti/rra/gwythaint_mem_buffers_145.rrd":mem_buffers:MAX \
    DEF:c="/var/lib/cacti/rra/gwythaint_mem_cache_146.rrd":mem_cache:AVERAGE \
    DEF:d="/var/lib/cacti/rra/gwythaint_mem_cache_146.rrd":mem_cache:MAX \
    DEF:e="/var/lib/cacti/rra/gwythaint_mem_free_147.rrd":mem_free:AVERAGE \
    DEF:f="/var/lib/cacti/rra/gwythaint_mem_free_147.rrd":mem_free:MAX \
    DEF:g="/var/lib/cacti/rra/gwythaint_mem_total_148.rrd":mem_total:MAX \
    DEF:h="/var/lib/cacti/rra/gwythaint_swap_free_149.rrd":swap_free:MAX \
    DEF:i="/var/lib/cacti/rra/gwythaint_swap_total_150.rrd":swap_total:MAX \
    DEF:j="/var/lib/cacti/rra/gwythaint_swap_free_149.rrd":swap_free:AVERAGE \

  13. well my graph look like this:
    DEF:a="/var/www/html/cacti/rra/mysql_optest_mem_buffers_301.rrd":mem_buffers:AVERAGE \
    DEF:b="/var/www/html/cacti/rra/mysql_optest_mem_cache_302.rrd":mem_cache:AVERAGE \
    DEF:c="/var/www/html/cacti/rra/mysql_optest_mem_free_303.rrd":mem_free:AVERAGE \
    DEF:d="/var/www/html/cacti/rra/mysql_optest_swap_free_1338.rrd":swap_free:AVERAGE \
    DEF:e="/var/www/html/cacti/rra/mysql_optest_swap_total_1339.rrd":swap_total:AVERAGE \

    i have tried to restore your settings by checking in the database, but failed.
    What I want is for you to go into “Graph templates” and choose your memory usage template. There you will see graph template items. If you click the items you will see what data sources are behind the items.
    I want the name of the data sources for the items 1-4, 17-22.

  14. There aren’t any data sources for items 1-4, 17-22. They are calculated by CDEF from the data sources for the other items. That’s because there is no memory used data source, or swap used data source. (Which is why the cacti default graphs are so silly—they took the easy road.)

    Order is important for the CDEFs, which just go by letters. So when you create the graph make sure your data sources are, in this order: mem_buffers, mem_cache, mem_free, mem_total, swap_free, swap_total.

  15. RRDTool Command:

    /usr/bin/rrdtool graph - \
    --imgformat=PNG \
    --start=-86400 \
    --end=-300 \
    --title="mda-vm1a - Memory Usage" \
    --base=1000 \
    --height=120 \
    --width=500 \
    --alt-autoscale-max \
    --lower-limit=0 \
    --vertical-label="" \
    --slope-mode \
    --font TITLE:12: \
    --font AXIS:8: \
    --font LEGEND:10: \
    --font UNIT:8: \
    DEF:a="/var/www/cacti/rra/mda-vm1a_mem_buffers_88.rrd":mem_buffers:AVERAGE \
    DEF:b="/var/www/cacti/rra/mda-vm1a_mem_cache_89.rrd":mem_cache:AVERAGE \
    CDEF:cdefa=g,a,-,c,-,e,-,1024,* \
    CDEF:cdefe=a,1024,* \
    CDEF:cdefi=b,1024,* \
    CDEF:cdefbc=a,1024,* \
    CDEF:cdefbg=i,h,-,1024,* \
    CDEF:cdefcc=a,1024,* \
    CDEF:cdefcd=a,1024,* \
    AREA:cdefa#FF4105FF:"Used" \
    GPRINT:cdefa:LAST:" Current\:%8.2lf%s" \
    GPRINT:cdefa:AVERAGE:"Average\:%8.2lf%s" \
    GPRINT:cdefa:MAX:"Max\:%8.2lf%s\n" \
    AREA:cdefe#00FF004C:"Buffers":STACK \
    GPRINT:cdefe:LAST:"Current\:%8.2lf%s" \
    GPRINT:cdefe:AVERAGE:"Average\:%8.2lf%s" \
    GPRINT:cdefe:MAX:"Max\:%8.2lf%s\n" \
    AREA:cdefi#0000FF4C:"Cache":STACK \
    GPRINT:cdefi:LAST:" Current\:%8.2lf%s" \
    GPRINT:cdefi:AVERAGE:"Average\:%8.2lf%s" \
    GPRINT:cdefi:MAX:"Max\:%8.2lf%s\n" \
    AREA:cdefbc#FF39324C:"Free":STACK \
    GPRINT:cdefbc:LAST:" Current\:%8.2lf%s" \
    GPRINT:cdefbc:AVERAGE:"Average\:%8.2lf%s" \
    GPRINT:cdefbc:MAX:"Max\:%8.2lf%s\n" \
    AREA:cdefbg#0000007F:"Swap":STACK \
    GPRINT:cdefbg:LAST:" Current\:%8.2lf%s" \
    GPRINT:cdefbg:AVERAGE:"Average\:%8.2lf%s" \
    GPRINT:cdefbg:MAX:"Max\:%8.2lf%s" \

    RRDTool Says:

    ERROR: invalid rpn expression in: g,a,-,c,-,e,-,1024,*

    Maybe this could be fixed in your template? It’s Greek to me, so I don’t have an answer unfortunately :-)

  16. You’re looking in the right place – you see that the problem is the lack of DEFs. You only have two. This isn’t something that is handled in the template, but rather you hooking up the data sources to the graph.

  17. Sorry, I have no idea what you just said :-) How does one “hook up a data source to a graph”? It sounds like there are some extra steps to using your template than simply specifying it under Graph Management – Graph Template Selection. What do I need to do to get the graphs drawn with your template?

  18. Unfortunately there’s no substitute here for a thorough understanding of Cacti concepts. Some quality time with the manual will be of great help.

  19. How can I run cacti verbose query locally in my cacti box command console?
    I want to make a script and place it in the cronjob so I wouldn’t need to update it manually, by clicking on verbose query in cacti web interface, whenever I change any port names on the switch. Any one knows how?


  20. According to one of the cacti developers:

    “The given CDEF refers to other DEFs (data sources) by position (in this case: by letters >= c). But those letters do not show up in that graph statement.
    We recently fixed an error with 087b, where 087b unfortunately created a lot more DEFs than required. If the given template was tailored to 087b, it should be changed to 087d by the original author”

    Is there any chance this can get fixed? The whole purpose of a template is so that people who use cacti do not have to get a Ph.D in atomic rocket surgery to use it.

  21. That would explain the pain and frustration we’re all feeling, thanks. I will happily post an updated template for it here if someone sends me one that works for 087d. When I end up upgrading to 087d I will do it myself – but when exactly I do that is unknown but probably at least October. (Ubuntu 9.04 appears to still have 087b, but if someone sends me a .deb or a link to a .deb for 087d that works with Ubuntu 8.10 or 9.04 I may be able to do it sooner.)

  22. The article shows the correct cdef to use with 0.8.7d, it’s just not correct in the template. Editing the cdefs as follows seems to do the trick:

    Memory Used (Hans)
    item1: d,a,-,b,-,c,-

    Swap Used (Hans)
    item1: e,f,-

  23. Thanks Billy. I’ve made that change blindly, if someone can redownload it and verify that it imports cleanly in 087d that would be great.

  24. Hey Hans,

    The updated templates work perfectly with 087d.

  25. Really simple change for the Load Avg graph. Never really thought of doing that way, always accepted the default. Thanks for the tip.

  26. The memory graph is a bit strange. First, the cdef for the swap should be f,e,- otherwise it will become a negative value; it did here at least. Secondly, item #24 in the graph template should be removed, because it serves no purpose and breaks the graph (it adds a stack with swap free). I’ve tried this in 087d btw.

  27. I confirm these changes work in 87d

    Memory Used (Hans)
    item1: d,a,-,b,-,c,-

    Swap Used (Hans)
    item1: f,e,-
    (not e,f,- otherwise swap is negative amount)

    • I confirm the:
      “Swap Used (Hans)
      item1: f,e,-
      (not e,f,- otherwise swap is negative amount)”
      In 087g it is negative with e,f,- and works properly with f,e,-

  28. Hello Hans,

    Thanks for your awesome Graph Templates!

    Regards, Coert from South Africa

  29. I’ve been exposed to cacti but haven’t tried configuring it yet. Good to know the graph is modifiable for usability specially for those who are new to it. I for one was misled at first on the stacked graph.

    Thanks a lot :)

  30. I have been finding on my cacti install that I get NaN for any new graph I add (including these ones) and need to go management->data sources then load each relevant source for the new graph and hit save without changing anything. I had to do this originally, and just do it again with these templates. Now things look nice and useful on my new server. Thanks for the templates!

  31. Great work, these are very nice touches and thanks for sharing! They worked fine on my cacti server – I only got NaN before data was collected. If you get NaN try waiting 30 minutes or so.

  32. Excellent description, many thanks. There was one thing else I had to do: in Ubuntu 10.04 with Cacti 0.8.7e, the load average graph includes a fourth black line which is the total of all the other data sources:

    Item #7 (No task): Total LINE1 000000

    You just need to delete this by clicking the ‘X’ on the right to get the correct finished graphs.

  33. Nice template Hans, thanks.

    If anybody is using this to monitor a system with more than 100GB of memory and getting NaN values, you just need to alter the “Data Source Item Maximum Value” in the data template to the size you need, then either delete or use rrdtool to tune your rrd files.

  34. rrdtool tune xxx.rrd --maximum dataSource_name:200000000

  35. I am struggling to setup cacti in one of my solaris 10 servers…. pls help

  36. Very nice write-up. I absolutely appreciate this website. Keep writing!
