Raspberry Pi, Nagios, and a Classic

Started by midnite_andy, August 29, 2013, 10:51:42 PM

midnite_andy

Hi Everyone,
I will be demonstrating my custom monitoring project here. I finally received a Pi at MidNite's engineering lab and I have some tricks to show off.  I am using a Classic that we keep in the lab for development.

Nagios is the industry standard in IT infrastructure monitoring! I will use Nagios, which is free and open source, to monitor a Classic installation over the local network or the internet.
http://www.nagios.org

Of course we are working with a Raspberry Pi here:
http://www.raspberrypi.org/

midnite_andy

I am starting on a fresh Pi with default Raspbian (wheezy) software. Here is how to install Apache and Nagios.

pi@raspberrypi ~ $ sudo apt-get install nagios3

Nagios will ask to install the Apache web server also.  The installer will ask for the nagiosadmin password.  After the software is installed, I determine my Raspberry Pi's IP address and then open the default web page on another machine on my LAN.

pi@raspberrypi ~ $ ifconfig
eth0      Link encap:Ethernet  HWaddr b8:27:eb:73:45:6c 
          inet addr:192.168.1.169  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:64571 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18094 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:46243726 (44.1 MiB)  TX bytes:1516661 (1.4 MiB)

lo        Link encap:Local Loopback 
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:102 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9852 (9.6 KiB)  TX bytes:9852 (9.6 KiB)

Here are the screenshots of http://192.168.1.169 and http://192.168.1.169/nagios3 (user: nagiosadmin / pass: *****)

midnite_andy

I am going to dive right into the good stuff before we go back and clean things up.  At MidNite we have a Classic charge controller in our engineering lab.

I am going to use Nagios to monitor this Classic.  In steps:
1. Write Modbus caching program in Python.
2. Install and configure Modbus caching program as a Cron job.
3. Configure Nagios alerts
4. Install and configure Nagios graphs

Let me explain the Modbus caching program.  The Classic allows only one Modbus network connection to be used by monitoring software.  In order to allow multiple programs running on the Pi access to the Modbus register data, my Modbus caching program will connect to the Classic at a regular interval and download a list of desired registers to be stored locally.  When a program wants access to Classic data, the cache file can be used in place of connecting to the Classic.  The cache is updated in a Cron script at a configurable interval.  Any number of Classic charge controllers can be cached, with each one having a different cache file.
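To make the caching idea concrete, here is a minimal Python sketch of the approach. It is not the actual modcache.py. The function names are made up for illustration, the "Failed" status string is an assumption, and read_register stands in for whatever does the real Modbus read (for example a pymodbus client holding the single connection to the Classic).

```python
import time


def parse_register_list(text):
    """Parse 'NAME,REGISTER' lines into a dict of name -> register number."""
    registers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, number = line.split(",", 1)
        registers[name.strip()] = int(number)
    return registers


def build_cache(registers, read_register, now=None):
    """Read every register once and return the cache as a dict of strings.

    read_register is a hypothetical callable (register number -> int).
    In a real program it would wrap one Modbus TCP connection.
    """
    try:
        cache = {name: str(read_register(number))
                 for name, number in registers.items()}
        cache["Status"] = "Success"
    except Exception:
        # Assumption: any read failure invalidates the whole snapshot.
        cache = {"Status": "Failed"}
    cache["Time"] = repr(time.time() if now is None else now)
    return cache


def format_cache(cache):
    """Render the cache as sorted 'KEY,VALUE' lines, one per entry."""
    return "".join("%s,%s\n" % (key, cache[key]) for key in sorted(cache))
```

The point of the design is that only this one program ever talks to the Classic. Everything else on the Pi reads the text file it writes.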

This is Python code and will require the pymodbus module.  I am now going to install the modcache program in the folder /usr/local/modcache and get it configured to run as a Cron job on the Pi.

I will copy the files in one at a time to show how they are set up and used.

Install the pymodbus module by following the directions on its web page, or install the distribution package:
https://code.google.com/p/pymodbus/

pi@raspberrypi ~ $ sudo apt-get install python-pymodbus

Now create the directory, copy in the Python files, and make them executable.

pi@raspberrypi ~ $ sudo mkdir /usr/local/modcache

Modcache is the program that does the caching of the registers.
pi@raspberrypi ~ $ sudo cp ./work/modcache.py /usr/local/modcache
pi@raspberrypi ~ $ sudo chmod +x /usr/local/modcache/modcache.py

Readmod is a program that reads the cached registers.
pi@raspberrypi ~ $ sudo cp ./work/readmod.py /usr/local/modcache
pi@raspberrypi ~ $ sudo chmod +x /usr/local/modcache/readmod.py

Now I need to set up the Cron script for Modcache.  To make this easy to call from Cron, I have created a Bash script to take care of some details.

File: modcache_engclassic.sh
#!/bin/bash
MCHOST=192.168.1.62
MODCACHEPATH=/usr/local/modcache/modcache.py
CACHEDIR=/usr/local/modcache
$MODCACHEPATH --cachefile $CACHEDIR/$MCHOST.csv --registerfile $CACHEDIR/$MCHOST.reg --ip $MCHOST


This Bash script would be copied with the IP address changed if I wanted to monitor a second (or more) Classic.  The IP address of the Classic here at MidNite's engineering lab is 192.168.1.62.  Just copy that puppy in there.

pi@raspberrypi ~ $ sudo cp ./work/modcache_engclassic.sh /usr/local/modcache/
pi@raspberrypi ~ $ sudo chmod +x /usr/local/modcache/modcache_engclassic.sh

I need a list of registers to grab.  Here is your most basic register list to be extended later:
File: 192.168.1.62.reg
BATT_VOLTS,4114
PV_VOLTS,4115
BATT_WATTS,4118
INFO_FLAGS_L,4129
INFO_FLAGS_H,4130
BATT_TEMP,4131
FET_TEMP,4132
PCB_TEMP,4133


pi@raspberrypi ~ $ sudo cp ~/work/192.168.1.62.reg /usr/local/modcache/

Let's try running modcache_engclassic.sh to see what the cache file looks like.

pi@raspberrypi /usr/local/modcache $ sudo ./modcache_engclassic.sh
pi@raspberrypi /usr/local/modcache $ cat 192.168.1.62.csv
BATT_TEMP,266
BATT_VOLTS,511
BATT_WATTS,0
FET_TEMP,309
INFO_FLAGS_H,12800
INFO_FLAGS_L,12292
PCB_TEMP,372
PV_VOLTS,1578
Status,Success
Time,1377731093.080432


The cache file is called 192.168.1.62.csv and it contains the registers that were grabbed, including battery temperature and battery volts in tenths of a unit (so BATT_VOLTS 511 is 51.1 V).  The Status entry is “Success” to indicate that modcache.py was able to get data from the Classic.  The Time entry is a timestamp that allows checking the age of the cache file as well as whether the data is valid.
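As a rough illustration of how a consumer might use the Status and Time entries, here is a small Python sketch. It is not the actual readmod.py. The function names and the maximum-age parameter are assumptions, and the divide-by-ten scaling only applies to registers that really are stored in tenths.

```python
import time


def parse_cache(text):
    """Parse 'KEY,VALUE' lines from a cache file into a dict of strings."""
    cache = {}
    for line in text.splitlines():
        if "," in line:
            key, value = line.split(",", 1)
            cache[key.strip()] = value.strip()
    return cache


def cache_is_usable(cache, max_age_seconds, now=None):
    """True only if the last poll succeeded and the cache is recent enough."""
    if cache.get("Status") != "Success":
        return False
    try:
        written = float(cache["Time"])
    except (KeyError, ValueError):
        return False
    now = time.time() if now is None else now
    return 0 <= now - written <= max_age_seconds


def scaled(cache, name, divisor=10.0):
    """Convert a raw register stored in tenths to a float in whole units."""
    return int(cache[name]) / divisor
```

With the cache file shown above, scaled(cache, "BATT_VOLTS") gives 51.1. A reasonable max_age_seconds would be somewhat longer than the polling interval, so that one missed poll does not immediately flag the data as stale.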

Now I install the modcache_engclassic.sh in Cron to do the regular monitoring.

pi@raspberrypi ~ $ sudo vim /etc/crontab

Add the following line with comments to the file /etc/crontab and save:

# Check Modbus devices every 4 minutes
# We set Nagios to check every 5 (that way Nagios stays a little behind the updates)
*/4 *   * * *   root    /usr/local/modcache/modcache_engclassic.sh

Now restart Cron so it picks up the changes:
pi@raspberrypi ~ $ sudo service cron restart

Now you should see that the /usr/local/modcache/192.168.1.62.csv file's timestamp is updated every 4 minutes with new values.  The next task is to set up the Nagios check scripts to check the Modbus registers in the cache file for warnings or faults.  I chose to write these in Bash.

Copy in the new files.

pi@raspberrypi ~ $ sudo cp ./work/check_mc_bit.sh /usr/local/modcache
pi@raspberrypi ~ $ sudo cp ./work/check_mc_perfdata.sh /usr/local/modcache
pi@raspberrypi ~ $ sudo cp ./work/check_mc_scale.sh /usr/local/modcache
pi@raspberrypi ~ $ sudo cp ./work/check_mc.sh /usr/local/modcache
pi@raspberrypi ~ $ sudo cp ./work/nagios_modcache.cfg /usr/local/modcache
pi@raspberrypi ~ $ sudo chmod +x /usr/local/modcache/*.sh

A Nagios check script is a script that verifies that something is working properly and then returns 0 for OK, 1 for Warning, 2 for Critical, and 3 for Unknown.  Our check scripts will check for the presence of faults at the Classic and report those to Nagios.
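The real check scripts here are written in Bash, but the same return-code convention can be sketched in Python. This is only an illustration under some assumptions: that INFO_FLAGS_H and INFO_FLAGS_L are the high and low 16-bit halves of one 32-bit flag word, and that a set bit should be reported as Critical. The bit number and label are placeholders, not documented Classic flags.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def combine_flags(low, high):
    """Join two 16-bit registers into one 32-bit value (assumed layout)."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def check_bit(cache, bit, label):
    """Return (exit_code, message) for one flag bit in the cached registers."""
    if cache.get("Status") != "Success":
        return UNKNOWN, "UNKNOWN - cache not valid"
    try:
        flags = combine_flags(int(cache["INFO_FLAGS_L"]),
                              int(cache["INFO_FLAGS_H"]))
    except (KeyError, ValueError):
        return UNKNOWN, "UNKNOWN - flag registers missing from cache"
    if flags & (1 << bit):
        # Assumption: a set bit means a fault worth a Critical alert.
        return CRITICAL, "CRITICAL - %s is set" % label
    return OK, "OK - %s is clear" % label
```

A plugin built on this would print the message on one line and then call sys.exit() with the code. Nagios reads the exit status to decide the service state.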

We also have to define the Nagios command formats, which are in the nagios_modcache.cfg file.  In order for Nagios to use this file, we have to create a symbolic link to it in the Nagios plugins configuration directory.

pi@raspberrypi ~ $ sudo ln -s /usr/local/modcache/nagios_modcache.cfg /etc/nagios-plugins/config/modcache.cfg

Finally, we can set up the Classic in the Nagios hosts list and list all of the checks we want to perform.

pi@raspberrypi ~ $ sudo cp ./work/midnite_devices.cfg /usr/local/modcache
pi@raspberrypi ~ $ sudo ln -s /usr/local/modcache/midnite_devices.cfg /etc/nagios3/conf.d/midnite_devices.cfg

Reload the Nagios configuration files.

pi@raspberrypi /etc/nagios3/conf.d $ sudo service nagios3 reload

OK, here is the screenshot.  Things are not perfect yet, and there is more to be done.

Westbranch

Looks good, Andy. Can you make a parallel posting explaining all the new acronyms?  You started losing me with Nagios, then Cron and Bash...


thanks
KID FW1811 560W >C&D 24V 900Ah AGM
CL150 29032 FW V.2126-NW2097-GP2133 175A E-Panel WBjr, 3Px4s 140W > 24V 900Ah AGM,
2 Cisco WRT54GL i/c DD-WRT Rtr, NetGr DS104Hub
Cotek ST1500 Inv  want a 24V  ROSIE Inverter
OmniCharge3024  Eu1/2/3000iGens
West Chilcotin 1680+W to come

midnite_andy

Now to install the nagiosgraph plugin.  Download the latest source files from http://sourceforge.net/projects/nagiosgraph/files/

Install a required package:
pi@raspberrypi ~/work/nagiosgraph/nagiosgraph-1.4.4 $ sudo apt-get install librrds-perl

Unpack the nagiosgraph-1.4.4.tar.gz file in a temporary directory.  Run the install script to check for the required dependencies.

pi@raspberrypi ~/work/nagiosgraph/nagiosgraph-1.4.4 $ ./install.pl --check-prereq
checking required PERL modules
  Carp...1.20
  CGI...3.52
  Data::Dumper...2.130_02
  File::Basename...2.82
  File::Find...1.19
  MIME::Base64...3.13
  POSIX...1.24
  RRDs...1.4007
  Time::HiRes...1.972101
checking optional PERL modules
  GD... ***FAIL***
checking nagios installation
  found nagios at /usr/sbin/nagios3
checking web server installation
  found apache at /usr/sbin/apache2

It looks like we have everything but the optional Perl modules, so run the nagiosgraph installation script.  The default values should work for all of the settings, but do select “y” at the end to modify the Nagios and Apache configurations so that we don't have to do that manually.

pi@raspberrypi ~/work/nagiosgraph/nagiosgraph-1.4.4 $ ./install.pl --layout debian

Oops, looks like we need one more program:
pi@raspberrypi ~ $ sudo apt-get install bc

Also, set the perfdata processing interval at the end of /etc/nagios3/nagios.cfg.  Nagios reads this value in seconds, so 5 means the perfdata file is processed every 5 seconds:
service_perfdata_file_processing_interval=5

Now I can restart Nagios and Apache:
pi@raspberrypi ~ $ sudo service nagios3 restart
pi@raspberrypi ~ $ sudo service apache2 restart

I have a problem: the plots aren't showing.  I am going to work on the settings to see if I can figure it out.

Change the following lines in /etc/nagios3/nagios.cfg:
check_external_commands=1
process_performance_data=1


Run the following commands to set up the external command processing:
pi@raspberrypi ~ $ sudo service nagios3 stop
pi@raspberrypi ~ $ sudo dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
pi@raspberrypi ~ $ sudo dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
pi@raspberrypi ~ $ sudo service nagios3 start

Success!  My battery temperature is now showing up in the plots.
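For anyone wondering how a value gets from a check into the plots: a plugin that wants its value graphed appends performance data after a pipe character in its output line, in the form label=value;warn;crit. Here is a minimal Python sketch of building such a line. The function name, label, and threshold numbers are hypothetical and not taken from my check scripts.

```python
def perfdata_line(status_text, label, value, unit="", warn="", crit=""):
    """Build one line of plugin output: human text, a pipe, then perfdata."""
    perf = "%s=%s%s;%s;%s" % (label, value, unit, warn, crit)
    return "%s | %s" % (status_text, perf)


# Example with made-up thresholds of 40 (warning) and 50 (critical).
line = perfdata_line("OK - battery temperature 26.6", "batt_temp", 26.6,
                     warn=40, crit=50)
print(line)
```

With process_performance_data=1, Nagios hands the part after the pipe to the perfdata processing that nagiosgraph hooks into.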

midnite_andy

Thanks for the interest Westbranch, I will try to put in some more explanation after I finish the big stuff.
-Andy

TomW

Quote from: Westbranch on August 29, 2013, 11:24:24 PM
Looks good, Andy. Can you make a parallel posting explaining all the new acronyms?  You started losing me with Nagios, then Cron and Bash...


thanks

I am not Andy, nor do I play him on TV..

Been using Linux for a couple decades and Debian and variants for most of that time.  I never used Nagios but I can address a few of the "new acronyms".

CRON is a system in Linux that runs programs at a specified time. Very handy and I use it quite regularly.

bash is a Linux "shell interpreter": it processes your commands / scripts. There are several shells, but it seems bash is the most common.

Just what I can help with now.

Andy, have you experienced any Classic communication lockouts running Nagios on the PI? I ask because Ross and I are hunting them down here. I stopped using my PI over ethernet to the Classic and am testing our theory that it is either the PI, or the PI and the Classic together, that gets errors that lock out communications.  Been running Ross' script on an X86 laptop Ubuntu install with no errors for a few days, where the PI doing the logging would have been locked out of the Classic and the Local App would have been blocked also.

Tom
Do NOT mistake me for any kind of "expert".

( ͡° ͜ʖ ͡°)


24 Trina 310 watt modules, SMA SunnyBoy 7.7 KW Grid Tie inverter.

I thought that they were angels, but much to my surprise, We climbed aboard their starship and headed for the skies

midnite_andy

TomW,
I just got this up and running yesterday.  I have not seen any unexpected behavior regarding the Classic.  I suspect that the problem is simply that too many connections are being opened with the Classic at one time.  It can be a bit tricky, which is why I created the modcache.py register caching program.  I will keep you updated.

Today I have changed the following line in /etc/nagios3/nagios.cfg:
command_check_interval=15s

Here are some screenshots of the Pi's web page from this morning:

TomW

Quote from: midnite_andy on August 30, 2013, 12:43:55 PM
TomW,
I just got this up and running yesterday.  I have not seen any unexpected behavior regarding the Classic.  I suspect that the problem is simply that too many connections are being opened with the Classic at one time.  It can be a bit tricky, which is why I created the modcache.py register caching program.  I will keep you updated.

Andy;

Well, I am installing nagios3 and will try running it from the PI and see what develops. If it works OK I can eliminate the PI / Classics as the issue.

Thanks.

Tom

midnite_andy

Good luck!  I would consider this an advanced level DIY project.  I have included all my files, you will need to change a few things here and there for your system.

TomW

Quote from: midnite_andy on August 30, 2013, 01:42:35 PM
Good luck!  I would consider this an advanced level DIY project.  I have included all my files, you will need to change a few things here and there for your system.

Andy,

Well, typical of a Debian-derived system, the non-repository application installation failed. Lots of files not found and not anywhere on the system. Too "advanced" for today to test the theory of the issues with ethernet comms. Took forever to run the apt installation process, too.

Nice idea but, as you say, not exactly "easy" on my install of wheezy. And it changed where my web directory lives. Dirty trick, that one.

Back to the laptop. :o

Tom

midnite_andy

A few more changes:

Here is the new text in my /etc/nagios3/conf.d/services_nagios3.cfg file:
# check that web services are running
define service {
        host_name                       localhost
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

# check that ssh services are running
define service {
        host_name                       localhost
        service_description             SSH
        check_command                   check_ssh
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}


Here is my new /etc/nagios3/conf.d/hostgroups_nagios2.cfg file:

# Some generic hostgroup definitions

# A list of your Raspberry Pi servers.
define hostgroup {
        hostgroup_name  pi-servers
        alias           Raspberry Pi Servers
        members         localhost
}




zoneblue

Can you say something about why you chose nagios? Is it something like mango?

Tom, what about using stephenvs python script as a backend for your raspberry test. I was thinking of trying it on the cubie to rule out newmodbus. Python's not my thing, but I figure I could hack it enough to dump the registers between 4100 and 4300 easily enough. Just not enough hours in the day.

apt-get install python-pymodbus, then run the python script out of cron.

https://github.com/continuumsecurity/IslandManager/blob/master/scripts/classic.py
6x300W CSUN, ground mount, CL150Lite, 2V/400AhToyo AGM,  Outback VFX3024E, Steca Solarix PL1100
http://www.zoneblue.org/cms/page.php?view=off-grid-solar

RossW

Quote from: zoneblue on August 31, 2013, 03:09:04 AM
Tom, what about using stephenvs python script as a backend for your raspberry test. I was thinking of trying it on the cubie to rule out newmodbus.

Tom said to me in IRC 8 hrs ago:

"well, after 4 days and no commerrors I tentatively pronounce your script fine on X86 based hardware"

3600W on 6 tracking arrays.
7200W on 2 fixed array.
Midnite Classic 150
Outback Flexmax FM80
16 x LiFePO4 600AH cells
16 x LiFePO4 300AH cells
Selectronics SP-PRO 481 5kW inverter
Fronius 6kW AC coupled inverter
Home-brew 4-cyl propane powered 14kVa genset
2kW wind turbine

midnite_andy

Hi zoneblue,
I chose Nagios because it is easily extensible through its plugin interface, supports plotting through nagiosgraph, and is free.  I have used Mango in the past and it is a great product that is more directly suited to this kind of monitoring.  We may have to do a Mango posting and get together a Mango script for all of the Classic registers.

Have a look at the scripts I made called modcache.py and readmod.py.  I attached the files a few postings up in a zip file.  You can see how I do exactly what you are talking about regarding pymodbus and Cron.