I've been looking for tools to monitor my ARM server farm, and the best contender so far is Ganglia, a tool for monitoring large clusters of computers and logging their performance over time. You can see an online demo of the Ganglia web UI used to monitor Wikipedia's servers.
Ganglia has three main components: the monitor daemon (gmond), the meta daemon (gmetad), and the PHP web front end.
The monitor needs to be installed and configured on each node that's being monitored. The meta daemon needs to be installed on a master node, where it will collect data from the monitors on the other nodes. Data is transferred using UDP multicast. The multicast addresses used can be specified in the Ganglia configuration files.
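For reference, the multicast settings live in the channel sections of gmond.conf. The address and port below are Ganglia's stock defaults as far as I know, so check them against your own file before relying on them:

```
udp_send_channel {
  mcast_join = 239.2.11.71   # multicast group that metrics are sent to
  port = 8649
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71   # multicast group to listen on
  port = 8649
  bind = 239.2.11.71
}
```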
To begin with, I'm just going to install Ganglia on a single Banana Pi. I want to use the Ganglia web front end, so I need Apache with PHP5 support:
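A minimal sketch of that step, assuming a Debian-based image that still ships PHP5 packages. The package names are my assumption rather than a verified list:

```shell
sudo apt-get update
# apache2 is the web server; libapache2-mod-php5 lets Apache run PHP5 pages
sudo apt-get install apache2 php5 libapache2-mod-php5 php5-json
```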
I'm not completely sure if php5-json is needed, but it's installed on my server. Now install the necessary Ganglia components:
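On Debian-based systems the three components are packaged separately. Assuming the standard Debian package names, something like this should pull them in:

```shell
# ganglia-monitor = gmond, gmetad = meta daemon, ganglia-webfrontend = PHP web UI
sudo apt-get install ganglia-monitor gmetad ganglia-webfrontend
```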
The configuration file for the Ganglia meta daemon is /etc/ganglia/gmetad.conf, and the configuration file for the monitor is /etc/ganglia/gmond.conf. I used Leafpad to edit the monitor configuration file:
and I set the name of the cluster:
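The cluster name goes in the cluster section of gmond.conf. In this sketch "banana_pi" is a made-up name, so substitute your own; the other fields are shown with placeholder values:

```
cluster {
  name = "banana_pi"       # hypothetical cluster name
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
```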
Copy the Ganglia sample Apache configuration file to Apache's 'sites-available' directory, and enable the new configuration:
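A sketch of those two commands, assuming the Debian ganglia-webfrontend package, which ships its sample configuration as /etc/ganglia-webfrontend/apache.conf. This form suits Apache 2.4, which expects a .conf suffix; on Apache 2.2 I'd copy the file to sites-available/ganglia without the suffix:

```shell
# Copy the sample site definition into Apache's list of available sites
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-available/ganglia.conf
# Enable it (creates the symlink in sites-enabled)
sudo a2ensite ganglia
```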
Restart the Ganglia monitor, the Ganglia meta daemon and Apache:
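Assuming the Debian service names (ganglia-monitor, gmetad and apache2), the restarts would look something like this:

```shell
sudo service ganglia-monitor restart
sudo service gmetad restart
sudo service apache2 restart
```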
Now visit http://<your Banana Pi's IP address>/ganglia/ in a web browser, and you should see something like this:
One of the benefits of Ganglia is that I don't need a web server on each node that I want to monitor, just on the control node.
The downside of Ganglia is that it accumulates a significant amount of data over time. The RRDtool database that stores the data has an averaging function, which reduces the space needed for older data (at the cost of resolution), but the database still occupies quite a lot of space. My Ganglia master node will need its own hard disk.
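To keep an eye on how much space the database is taking, a small helper like the one below can be run on the master node from time to time. It's only a sketch: rrd_usage is a name I made up, and the default path assumes gmetad's usual rrd_rootdir of /var/lib/ganglia/rrds, which may differ on your system.

```shell
# Report how much disk space Ganglia's RRD files are using.
# The default path is an assumption; pass a different directory as $1 if needed.
rrd_usage() {
    dir="${1:-/var/lib/ganglia/rrds}"
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "no RRD directory at $dir"
    fi
}

rrd_usage
```

Calling rrd_usage with a path, for example rrd_usage /mnt/disk/rrds, checks a different directory.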
See also: http://ganglia.info/
Ganglia can be used to monitor groups of clusters. Ganglia-monitor needs to be installed and configured on each node that's being monitored. The Ganglia meta daemon (gmetad) runs on a master node and connects to the monitor processes to collect system information. A group of clusters monitored by a single Ganglia server is called a grid.
I installed the following packages on the Ganglia master node:
The next step is to edit /etc/ganglia/gmetad.conf:
Ganglia uses different ports to distinguish between different clusters. I'm using port 8650 for the database cluster, port 8655 for a small cluster of control nodes, and port 8656 for the cluster of cluster servers. I've set up three data_source lines which specify the cluster names, refresh interval, and a list of hosts in each cluster:
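A data_source line takes a name, an optional polling interval in seconds, and a list of host:port pairs. The ports below are the ones mentioned above; the cluster names, the 60-second interval and the IP addresses are placeholders I've invented for illustration:

```
# data_source "name" [polling interval] host:port [host:port ...]
data_source "database" 60 192.168.0.10:8650 192.168.0.11:8650
data_source "control"  60 192.168.0.20:8655
data_source "servers"  60 192.168.0.30:8656 192.168.0.31:8656
```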
I uncommented the gridname directive and set it to "ARM_Farm":
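In gmetad.conf that is a single line:

```
gridname "ARM_Farm"
```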
I also edited the list of trusted servers to include each IP address:
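The trusted_hosts directive in gmetad.conf takes a space-separated list. The addresses here are placeholders, not the ones from my network:

```
# Hosts allowed to query gmetad (hypothetical addresses)
trusted_hosts 127.0.0.1 192.168.0.10 192.168.0.20 192.168.0.30
```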
Next I copied the Ganglia Apache configuration file to Apache's config directories and enabled it:
Restart Apache, the monitor, and server processes:
Install ganglia-monitor on each client:
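The clients only need the monitor daemon, so assuming the Debian package name this comes down to one command per node:

```shell
sudo apt-get install ganglia-monitor
```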
Enter the name of the cluster:
Edit the port number for the udp send and recv channels, and for the tcp connection port:
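The port appears in three sections of /etc/ganglia/gmond.conf. This sketch uses port 8655 (the control node cluster) as the example and leaves the multicast address at what I believe is the stock default; a node in another cluster would use that cluster's port in all three places:

```
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8655              # cluster-specific port
  ttl = 1
}

udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8655              # must match the send channel
  bind = 239.2.11.71
}

tcp_accept_channel {
  port = 8655              # port that gmetad polls
}
```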
Restart the ganglia-monitor process:
There are some situations where UDP multicasting won't work. The most common reason is that some Ethernet switches don't support it. Unicasting can be used instead. You need to comment out the lines with the mcast_join address, set the bind IP address in the recv channel to the server's IP address, and set the host IP address in the send channel (also to the server's IP address).
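As a sketch, a unicast version of the two UDP channels in gmond.conf might look like this. The address 192.168.0.20 stands in for the server's IP address and 8655 for the cluster's port; both are placeholders:

```
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = 192.168.0.20      # send metrics directly to the server
  port = 8655
  ttl = 1
}

udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8655
  bind = 192.168.0.20      # listen on the server's own address
}
```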
You may have to wait a minute or two before all the nodes show up in the UI, but when you visit Ganglia's URL (http://<your Banana Pi's IP address>/ganglia/) in your browser, you should eventually see the Ganglia grid overview (see the screenshot above).
Note that you can use the links in the top left-hand corner of the page to move between different views and select individual nodes.
Detailed information is available about each node's resources. These graphs show a node's CPU usage:
These graphs show information about a node's network utilization: