Configuration

During installation a default configuration file, sysusage.cfg, is generated. The default settings are good enough to report essential information about your system, but if you want to monitor specific processes, queue directories or devices you must edit this file by hand.

Here is the format of the configuration file and all its directives. There are three main sections: the first sets the general parameters of the application, the second sets the parameters related to SMTP or Nagios notification when a threshold is exceeded, and the last configures all the types of system information you may want to monitor. Optional PLUGIN and REMOTE sections can be added for custom plugins and remote hosts.

Full sample of configuration file:

        [GENERAL]
        DEBUG       = 0
        DATA_DIR    = /usr/local/sysusage/rrdfiles
        PID_DIR     = /etc
        DEST_DIR    = /var/www/htdocs/sysusage
        SAR_BIN     = /usr/bin/sar
        UPTIME      = /usr/bin/uptime
        HOSTNAME    = /bin/hostname
        INTERVAL    = 60
        SKIP        = 12:00/14:00 20:00/06:00
        HDDTEMP_BIN = /usr/local/sbin/hddtemp
        SENSORS_BIN = /usr/bin/sensors
        DAEMON      = 0
        GRAPH_WIDTH = 550
        GRAPH_HEIGHT= 200
        FLAMING     = 0
        HIRES       = 0
        LINE_SIZE   = 2
        PROC_QSIZE  = 4
        RESRC_URL   =
        SSH_BIN     = /usr/bin/ssh
        SSH_OPTION  = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
        SSH_USER    =
        SSH_IDENTITY=

        [ALARM]
        WARN_MODE   = 0
        ALARM_PROG  = /usr/local/sysusage/bin/sysusagewarn
        SMTP        = localhost
        FROM        = root@localhost
        TO          = root@localhost
        NAGIOS      = /usr/local/nagios/bin/submit_check_result
        UPPER_LEVEL = 1
        LOWER_LEVEL = 2
        URL         =

        [MONITOR]
        load:threshold_max_value
        cpuall:threshold_max_value
        mem:threshold_max_value
        swap:threshold_max_value
        share:threshold_max_value
        sock:threshold_max_value
        socktw:threshold_max_value
        io:threshold_max_value
        file:threshold_max_value
        page:threshold_max_value
        pcrea:threshold_max_value
        pswap:threshold_max_value
        net:threshold_max_value
        err:threshold_max_value
        disk:threshold_max_value
        proc:proc_name:threshold_max_value:threshold_min_value
        tproc:proc_name:threshold_max_value:threshold_min_value
        queue:path_queue_dir:threshold_max_value
        hddtemp:device:threshold_max_value
        dev:device(alias):threshold_max_value
        dev:device(alias):rpm_speed:raid_type:nb_disk
        work:threshold_max_value
        sensors:pattern:threshold_max_value

        [PLUGIN testplug]
        title:Sysusage Test plugin
        menu:Database
        enable:no
        program:/usr/local/sysusage/plugins/plugin-sample.pl
        minThreshold:0
        maxThreshold:10
        verticallabel:Number of seconds
        label1:Total seconds
        label2:
        label3:
        legend1:seconds
        legend2:
        legend3:
        remote:yes

        [REMOTE hostname1]
        enable:no
        ssh_user:monitor
        ssh_identity:/home/monitor/.ssh/id_rsa
        #ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
        #ssh_command:
        remote_sysusage:/usr/local/sysusage/bin/rsysusage

Section GENERAL

DEBUG = 0|1: This option sets debug mode. If set to 1, sysusage and sysusagegraph just show what they would do but don't create or send anything.
DATA_DIR = /path/to/rrdfiles: This option sets the output directory for all RRDTOOL databases.
PID_DIR = /path/to/piddir: sysusage and sysusagegraph store the pid of the running process in a file in this directory to prevent simultaneous runs.
DEST_DIR = /path/to/html_output: Sets the path to the directory where all HTML and graph files should be created.
SAR_BIN = /path/to/sar_binary: sysusage uses sar, part of the sysstat distribution, to grab system information, so it needs to know where the binary is.
UPTIME = /path/to/uptime_binary: sysusagegraph reports the current uptime of the system using the uptime command. This sets the path to the uptime binary.
HOSTNAME = /path/to/hostname_binary: All scripts of the Sysusage distribution need to know the name of the host. They use the hostname command for that.
INTERVAL = pull_interval_in_second: All RRDTOOL inputs use the given interval, in seconds, to store monitored values. Graph construction also uses this interval to render things properly. By default Sysusage uses an interval of 60 seconds to give better statistics. You can change this but it is not recommended. If you change it, adjust your crontab to the same value. The value must be between 10 and 300 seconds. If you want an interval under one minute you must run sysusage in daemon mode. See DAEMON below.
SKIP = HH:MM/HH:MM HH:MM/HH:MM ...: You can define here time ranges during which no monitoring is done. The value is a list of begin_time/end_time pairs separated by spaces or tabs. Let's say you don't want to monitor the host during the night for some good reason; you can write it like this: 20:00/06:00
HDDTEMP_BIN = /path/to/hddtemp_binary: You can monitor your hard drive temperature if you have installed the hddtemp utility. This sets the path to the hddtemp binary.
SENSORS_BIN = /path/to/sensors_binary: You can monitor your device temperature if you have installed the lm_sensors utility. This sets the path to the sensors binary.
DAEMON = 0 | 1: You can monitor your system below the crond limit of 1 minute by running sysusage in daemon mode with an INTERVAL between 10 and 60 seconds.
GRAPH_WIDTH and GRAPH_HEIGHT: These are useful if you want to resize the graphs. Default is a width of 550 pixels and a height of 200.
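
As an illustration of the SKIP syntax described above, here is a minimal Python sketch of how such a value could be parsed and tested against the current time. It is not taken from the sysusage source: the function names parse_skip and is_skipped are hypothetical, and treating the end time as exclusive is an assumption. The point it shows is that a range such as 20:00/06:00 wraps past midnight.

```python
from datetime import time


def parse_skip(value):
    """Parse 'HH:MM/HH:MM HH:MM/HH:MM ...' into a list of (begin, end) times.

    Hypothetical helper, not part of sysusage.
    """
    ranges = []
    for item in value.split():  # split() handles both spaces and tabs
        begin, end = item.split("/")
        begin_h, begin_m = map(int, begin.split(":"))
        end_h, end_m = map(int, end.split(":"))
        ranges.append((time(begin_h, begin_m), time(end_h, end_m)))
    return ranges


def is_skipped(now, ranges):
    """Return True if 'now' falls inside one of the ranges.

    Assumption: the begin time is inclusive and the end time exclusive.
    """
    for begin, end in ranges:
        if begin <= end:
            # Range within a single day, e.g. 12:00/14:00
            if begin <= now < end:
                return True
        else:
            # Range wrapping past midnight, e.g. 20:00/06:00
            if now >= begin or now < end:
                return True
    return False


ranges = parse_skip("12:00/14:00 20:00/06:00")
print(is_skipped(time(23, 30), ranges))
print(is_skipped(time(15, 0), ranges))
```

With the sample value from the configuration file, 23:30 falls in the night range while 15:00 is outside both ranges.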

FLAMING

This is for fun: if you want a random flaming effect on graphs with a single dataset, set this directive to 1. Disabled by default. Not used with the JQuery graph renderer.

HIRES

Adds an hourly graph to give a finer granularity of the data. This is disabled by default. Set it to any integer from 1 to 23 (inclusive) to show data from the past N hours up to now. Not used with the JQuery graph renderer, as the Javascript library lets you zoom to the resolution you want.

LINE_SIZE

By default the graph line size is 1. If you want graphs with a thicker line, set it to 2. This is an rrd graph limitation (1 or 2). Not used with the JQuery graph renderer.

PROC_QSIZE

Number of remote sysusage call processes that may run simultaneously. Default is 4, but it can be up to 15 or more depending on the hardware configuration. One per core is the lowest value you should consider.

RESRC_URL

Image, javascript and css resources are by default searched for in the DEST_DIR directory, so that in the HTML view they all stay in the current main directory. You may want to place those resources in another directory or in another location. Using this directive you can set any FQDN, absolute or relative URL for these resources.

SSH_IDENTITY

Used to set the default identity file to connect to all remote hosts without a password. If undefined, sysusage will use the ssh system default value. You may want to use the default value unless you know exactly what you are doing.

SSH_OPTION

Used to set the default ssh options. The default corresponds to passwordless authentication:

        -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey

with a five-second connection timeout. You may want to increase this timeout on very slow network links.

Do not change this value unless you know exactly what you are doing.

SSH_BIN

Path to the ssh command is set here at install time.

SSH_USER

Used to define the default ssh user that will be used to connect to all remote hosts.

Section ALARM

WARN_MODE = 0|1: Used to disable/enable alert messages when a threshold is exceeded.
ALARM_PROG = /path/to/sysusagewarn: Used to set the path to the external program responsible for sending alarm messages. You can change it to your own; just take a look at the sysusagewarn usage to see what command line options are used by sysusage.
SMTP = smtp.server.net: Name or IP address of the SMTP server to contact. Default is none, in which case no SMTP message is sent.
FROM = sender@localhost: Sender email address to use in the SMTP message.
TO = destination@localhost: Destination email address where the alarm message will be sent. You can set multiple recipients by using a comma-separated list of recipient addresses.
NAGIOS = /usr/local/nagios/bin/submit_check_result: Path to the external nsca program used to send check messages to Nagios. Setting this will activate Nagios check reports. See the end of this file for how to configure Nagios.
UPPER_LEVEL = 1: Nagios check level to send when a high threshold limit is reached. Default is 1 => WARNING.
LOWER_LEVEL = 2: Nagios check level to send when a low threshold limit is reached. Default is 2 => CRITICAL.
URL = Url of Sysusage report: Used to override the default URL of the SysUsage report, http://host.dom/sysusage/, especially if you have a special port or a different path. Example: http://hostname.domain:9080/Reports/Sysusage/
SKIP = HH:MM/HH:MM HH:MM/HH:MM ...: You can define here time ranges during which alarm notices will not be sent. The value is a list of begin_time/end_time pairs separated by spaces or tabs. Let's say you don't want to receive notices during the night for some good reason; you can write it like this: 20:00/06:00

Section MONITOR

This section has two different formats. The first one is used to specify most of the monitoring targets:

        type:threshold_max

        type:threshold_max(attempt)

type

        Type of system information you may want to monitor. It can take
        one of the following values:

        load   => monitor load average
        cpu    => monitor cpu(s) user/nice/system usage
               => monitor cpu(s) total/iowait usage
               => monitor cpu(s) steal/guest usage
        cpuall => will only monitor global cpu usage, unlike the cpu
                  type that will generate extra reports per cpu.
        cswch  => monitor context switches usage
        intr   => monitor number of interrupts per second
        mem    => monitor memory usage
        share  => monitor Posix shared memory usage (/dev/shm)
        swap   => monitor swap usage
        work   => monitor amount of memory needed for current workload
        sock   => monitor number of open sockets
        socktw => monitor number of sockets in TIME_WAIT state
        io     => monitor I/O request and block usage
        page   => monitor I/O page usage
        pswap  => monitor I/O page swap usage
        pcrea  => monitor number of processes created per second
        file   => monitor number of open files
        net    => monitor I/O network bytes on all network interfaces
        err    => monitor bad packets, drops and collisions on interfaces
        disk   => monitor disk space usage
        tcp    => monitor number of tcp connections and segments

threshold_max

This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.

attempt

You can delay the call to the alarm program when a threshold is exceeded by specifying the number of consecutive exceeding attempts required before the command is called. Put the number of attempts in parentheses right after the min and/or max threshold value. This setting is optional for both threshold values, and the default is to send the alarm immediately.

Special cases

There's a special case for 'disk' usage monitoring that allows some mount points to be excluded. This is useful if you have hard links or some special devices you don't need to monitor. The exclusion field is a semicolon (;) separated list of mount points to exclude from monitoring:

        disk:ThresholdMax:exclusion

Ex: disk:90:/home/mondo_image;/home/smb_mountpoint

You can use regexp in your excluded path.
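
To make the exclusion syntax concrete, the following Python sketch splits a disk directive and filters a list of mount points. It is an illustration only: the function names are hypothetical, the optional (attempt) suffix is not handled, and matching each pattern anywhere in the path (unanchored) is an assumption about how sysusage applies the regexp.

```python
import re


def parse_disk_directive(line):
    """Split 'disk:ThresholdMax:exclusion' into (threshold, [patterns]).

    Hypothetical helper; does not handle a 'ThresholdMax(attempt)' suffix.
    """
    parts = line.strip().split(":", 2)
    if parts[0] != "disk" or len(parts) < 2:
        raise ValueError("not a disk directive: %r" % line)
    threshold = int(parts[1])
    # The exclusion field is optional; patterns are separated by ';'
    patterns = parts[2].split(";") if len(parts) > 2 and parts[2] else []
    return threshold, patterns


def monitored_mounts(mounts, patterns):
    """Keep only the mount points that match none of the exclusion patterns.

    Assumption: patterns are unanchored regular expressions.
    """
    return [m for m in mounts if not any(re.search(p, m) for p in patterns)]


threshold, patterns = parse_disk_directive(
    "disk:90:/home/mondo_image;/home/smb_mountpoint"
)
mounts = ["/", "/home", "/home/mondo_image", "/home/smb_mountpoint"]
print(threshold, monitored_mounts(mounts, patterns))
```

Because the patterns are treated as unanchored, an exclusion such as /home would also exclude every mount point below /home in this sketch.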

The other directive with a special syntax is 'dev'. It is constructed as follows:

        dev:device(alias):rpm_speed:raid_type:nb_disk

where device is sda, sdb or any device name (without the /dev/ prefix), and the alias in parentheses is the name that must be displayed in the user interface instead of the device name. For example:

	dev:sdc(ASM disk1):
	dev:sdb(/data):

If you plan to use the I/O workload report, SysUsage needs to know the speed of the disk (RPM), the raid type (0,1,5,10) and the number of disks in the raid array to calculate the IOPS. For example, if we have 7200 RPM disks with 2 disks in raid 1, we will write something like this:

	dev:sdc(ASM disk1):7200:1:2

I/O workload is the relation between the TPS (transfers per second) and the IOPS (I/O operations per second) of a device. If the tps returned by sysstat reaches the maximum theoretical IOPS, your storage subsystem is saturated. Here is the equation to calculate the maximum theoretical IOPS:

        d = number of disks
        dIOPS = IOPS per disk
        %r = % of read workload
        %w = % of write workload
        F = raid factor

        IOPS = (d * dIOPS) / (%r + (F * %w))

This gives the theoretical maximum IOPS for a RAID set (excluding caching, of course): the product of the number of disks and the IOPS per disk, divided by the sum of the %read workload and the product of the raid factor and the %write workload. %read and %write are calculated from the following equations:

        %r = rd_sec / (rd_sec + wr_sec)
        %w = wr_sec / (rd_sec + wr_sec)

This IOPS monitoring is based on Nick Anderson's excellent article "Analyzing I/O performance in Linux".
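
The equation above can be worked through with a short Python sketch. Two inputs are not defined in this document and are assumptions here: the raid factor table (the commonly quoted write penalties of 1, 2, 4 and 2 for raid 0, 1, 5 and 10) and the per-disk IOPS figure (75 is a typical rule-of-thumb value for a 7200 RPM disk). The function name is hypothetical, and treating an idle device as a pure read workload is an arbitrary choice to avoid dividing by zero.

```python
# Assumed raid write penalties, keyed by raid type. Not defined in this
# document; these are commonly quoted rule-of-thumb values.
RAID_FACTOR = {0: 1, 1: 2, 5: 4, 10: 2}


def max_theoretical_iops(nb_disk, disk_iops, raid_type, rd_sec, wr_sec):
    """Compute IOPS = (d * dIOPS) / (%r + (F * %w)).

    rd_sec and wr_sec are the read and write sector rates reported for the
    device; only their ratio matters here.
    """
    total = rd_sec + wr_sec
    if total == 0:
        # Idle device: treat it as a pure read workload (arbitrary choice).
        read_frac, write_frac = 1.0, 0.0
    else:
        read_frac = rd_sec / total
        write_frac = wr_sec / total
    factor = RAID_FACTOR[raid_type]
    return (nb_disk * disk_iops) / (read_frac + factor * write_frac)


# dev:sdc(ASM disk1):7200:1:2 with an assumed 75 IOPS per disk and a
# 70% read / 30% write workload.
print(round(max_theoretical_iops(2, 75, 1, 700, 300), 2))
```

For this example the denominator is 0.7 + 2 * 0.3 = 1.3, so the two disks together top out at roughly 115 IOPS under these assumptions.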

The second format is used to monitor running processes, hard drive temperatures or queue directories. It has the following format:

        type:target:threshold_max_value:threshold_min_value

        type:target:threshold_max_value(attempt):threshold_min_value(attempt)

type

Type of system information you may want to monitor. It can take these different values:

        proc    => monitor number of running processes
        tproc   => monitor number of running threads
        queue   => monitor number of files in a directory
        dev     => monitor CPU usage per device (ex: sda)
        hddtemp => monitor hard drive temperature
        sensors => monitor device (cpu temp, fan speed, etc.)

target

If type is 'proc' or 'tproc', the target represents the name of the process or thread to monitor. You can use a regexp as target to match exactly the required process. The number of running processes is obtained with the system command line:

        ps -e -o command | grep -E "target" | grep -v grep | wc -l

or for thread monitoring (tproc):

        ps -eL -o command | grep -E "target" | grep -v grep | wc -l

so you can replace the word target with the regexp to match and check that it returns the right number of processes.
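
When testing a pattern without access to the target host, the counting logic of the pipeline above can be approximated in Python. This is only a sketch: the function name is hypothetical, the list of command lines stands in for the output of ps, and Python's re syntax is close to but not identical to the extended regular expressions used by grep -E.

```python
import re


def count_matching(commands, target):
    """Count command lines matching 'target', ignoring lines containing 'grep'.

    Mirrors: grep -E "target" | grep -v grep | wc -l
    """
    pattern = re.compile(target)
    return sum(
        1 for cmd in commands
        if pattern.search(cmd) and "grep" not in cmd
    )


# Sample command lines standing in for the output of 'ps -e -o command'
commands = [
    "/usr/sbin/httpd -k start",
    "/usr/sbin/httpd -k start",
    "grep -E httpd",
    "/usr/sbin/sshd -D",
]
print(count_matching(commands, "httpd"))
```

Note that, as with grep -v grep, any monitored process whose command line contains the word grep is excluded from the count.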

If type is 'queue', the target represents the full path of the directory to monitor. Sysusage will find and count any regular file in the target directory and will not descend into subdirectories.

If type is 'hddtemp', the target represents the hard drive device to monitor, ex: /dev/sda. You can try it with the following command line:

        hddtemp -n /dev/sda

This should return the current temperature detected on the hard drive.

If type is 'dev', the target represents the device name to monitor, ex: sda. Do not prefix it with /dev/, this will not work. You may want to change the device name shown in the graphical menu; this is possible by adding a device alias enclosed in parentheses.

For example, let's say you're monitoring some EMCpower SAN devices. Using sar, the reported devices are dev120-48 and dev120-64. First find which partitions are mapped to these devices (by reading /proc/partitions). In this example the devices are mounted as /cache1 and /cache2, so we want to see these mount points instead of the device numbers in the graphical menu. Adding the following lines to your sysusage.cfg file will do the job:

        dev:dev120-48(/cache1):
        dev:dev120-64(/cache2):

The threshold_max value is the maximum percentage of CPU used for this device before an alarm is sent.

If type is 'sensors', the target represents the pattern to match to obtain temperature or fan speed information in the sensors program output. See chapter SENSORS for more information.

threshold_max

This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.

threshold_min

This is the minimum threshold value. Any value equal to or lower than this one will generate an SMTP and/or Nagios alert if you have enabled it. The min threshold should probably only be used with the 'proc' monitoring type. If you set it to 0 you will be warned if any of the monitored processes are down.

attempt

For example, a load average monitoring defined like this:

        load:12(3)

will send an alarm when the system load average has exceeded 12 on three consecutive attempts at the defined interval. If the interval is 60 seconds, the alarm will be sent up to 180 seconds after the threshold was first exceeded.
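
The attempt mechanism can be sketched in Python as a counter of consecutive exceeding samples. This is an illustration under stated assumptions, not the sysusage implementation: the function names are hypothetical, and the counter is assumed to reset as soon as one sample falls back under the threshold.

```python
import re


def parse_threshold(token):
    """Parse '12(3)' into (12.0, 3); a bare '12' gives (12.0, 1).

    An attempt count of 1 means the alarm is sent immediately.
    """
    match = re.fullmatch(r"\s*([0-9.]+)\s*(?:\((\d+)\))?\s*", token)
    if not match:
        raise ValueError("invalid threshold: %r" % token)
    return float(match.group(1)), int(match.group(2) or 1)


def first_alarm_index(values, threshold_max, attempts):
    """Return the index of the sample that triggers the alarm, or None.

    Assumption: the streak resets when a sample is under the threshold.
    """
    streak = 0
    for index, value in enumerate(values):
        # 'equal or greater' counts as exceeding, as described above
        streak = streak + 1 if value >= threshold_max else 0
        if streak >= attempts:
            return index
    return None


threshold, attempts = parse_threshold("12(3)")
# One load average sample per interval
samples = [13, 14, 11, 12, 15, 13]
print(first_alarm_index(samples, threshold, attempts))
```

In this sample series the first two exceeding values do not trigger anything because the third sample (11) breaks the streak; the alarm fires on the last sample, which is the third consecutive value at or above 12.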

Section PLUGIN

This part enables the use of custom plugins. You can call any program or script provided that it returns up to 3 numbers separated by a space character. See the plugins/ directory for sample scripts.

The section header must include a name made of alphanumeric characters, which is used to create the target file, for example:

        [PLUGIN testplug1] or [PLUGIN testplug2]

The section allows the following configuration directives. Each one is a directive name followed by ':' or '=' and a value.

enable: Used to temporarily disable the plugin monitoring. Default is 'yes' (enabled). To disable it, write enable:no
program: Used to set the path to the program or script to execute as a plugin. This program must print 1 to 3 numbers separated by a space character to STDOUT, depending on the number of reports you want. So each plugin can have 1, 2 or 3 graphed data series.
title: Used to set the title of the report page and the index link. Default is "Sysusage plugin".
menu: Used to store the plugin under a submenu of the plugins menu. Default is to store the plugin under the "Others" submenu.
maxthreshold: This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.
minthreshold: This is the minimum threshold value. Any value equal to or lower than this one will generate an SMTP and/or Nagios alert if you have enabled it.
verticallabel: Used to set the vertical label of the graph.
label1, label2, label3: Used to show a legend for each graphed data series; label1 is for the first returned value, label2 for the second and label3 for the last. If only one value is returned, just omit the other labels.
legend1, legend2, legend3: Used to set the units for the Current, Avg and Max values.
remote: This directive must be set to 'no' to prevent execution of the plugin program by an ssh call to sysusage in a remote context. It is activated by default ('yes').
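
The output contract of the program directive above (1 to 3 numbers on STDOUT, separated by a space) can be illustrated with a small Python plugin. This is a hypothetical example, not the plugin-sample.pl shipped in the plugins/ directory: it reports how many seconds it takes to list a directory, and the six-decimal formatting is an arbitrary choice.

```python
#!/usr/bin/env python3
import os
import time


def measure_seconds(path="."):
    """Return the time, in seconds, needed to list a directory."""
    start = time.perf_counter()
    os.listdir(path)
    return time.perf_counter() - start


def format_values(values):
    """Format 1 to 3 numbers as a single space-separated line."""
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must return 1 to 3 values")
    return " ".join("%.6f" % v for v in values)


if __name__ == "__main__":
    # One graphed value: only label1 and legend1 need to be set
    print(format_values([measure_seconds()]))
```

A plugin returning two or three values would pass a longer list to format_values and set label2/label3 accordingly.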

Section REMOTE

This part allows you to run sysusage on remote hosts from a central server. It uses ssh to execute sysusage on the destination host with the -r option, which forces sysusage to print all results to stdout instead of writing anything to local data files. As sysusage is run by a cron job or in daemon mode, it cannot authenticate interactively to the remote host, so you must give an ssh user and an identity file with the corresponding configuration options.

This section must include the name or the ip address of the remote host that will be used to create the target data directory, for example:

        [REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]

The section allows the following configuration directives. Each one is a directive name followed by ':' or '=' and a value.

Once you have installed sysusage on all remote hosts and exchanged the SSH keys between the central host and all remote hosts, most of the time you just have to set the ssh_user directive to get it working. Use the remote_sysusage directive if the sysusage perl script is not installed in the same place as on the central server.

enable

Used to enable/disable the remote host monitoring. Default is 'yes' (enabled). Set it to 'enable=no' to disable it.

ssh_user

Used to define the ssh user allowed to connect to the remote host. By default the value of the SSH_USER configuration option in the GENERAL section is used.

ssh_identity

Used to set the identity file to connect to the remote host without a password. By default the value of the SSH_IDENTITY configuration option in the GENERAL section is used. Usually this is the private key that you've generated using ssh-keygen, most of the time the file $HOME/.ssh/id_rsa. You may want to use the default value unless you know exactly what you are doing.

ssh_options

Used to override the default ssh options, which are:

        -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey

The default options are set by the SSH_OPTION configuration option in the GENERAL section. You may want to use the default value unless you know exactly what you are doing.

ssh_command

You can override the complete ssh command using this directive. It replaces the ssh command, the ssh options, the ssh user and the host part. The sysusage remote command is not replaced. You may want to use the default value unless you know exactly what you are doing.

remote_sysusage

Use it to set the path to the rsysusage command that must be used on the remote host. SysUsage will automatically add the -r option to trigger remote execution mode.
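
To show how the REMOTE directives relate to each other, here is a Python sketch that assembles a command line from them. It is a guess at the general shape, not the way sysusage actually builds its call: the function name is hypothetical, and passing the identity file with ssh's -i flag and the user as user@host are assumptions. What it reflects from the text above is that ssh_command replaces the ssh binary, options, user and host part, while the remote sysusage command and its -r option are always appended.

```python
import shlex

# Default taken from the SSH_OPTION directive of the GENERAL section
DEFAULT_SSH_OPTIONS = (
    "-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey"
)


def build_remote_command(host, remote_sysusage, ssh_bin="/usr/bin/ssh",
                         ssh_options=DEFAULT_SSH_OPTIONS, ssh_user="",
                         ssh_identity="", ssh_command=""):
    """Return the argument list for a remote sysusage call (hypothetical)."""
    if ssh_command:
        # ssh_command replaces the binary, options, user and host part
        prefix = shlex.split(ssh_command)
    else:
        prefix = [ssh_bin] + shlex.split(ssh_options)
        if ssh_identity:
            prefix += ["-i", ssh_identity]  # assumed way to pass the key
        prefix.append("%s@%s" % (ssh_user, host) if ssh_user else host)
    # The remote command is never replaced; -r selects remote mode
    return prefix + [remote_sysusage, "-r"]


print(build_remote_command(
    "hostname1",
    "/usr/local/sysusage/bin/rsysusage",
    ssh_user="monitor",
    ssh_identity="/home/monitor/.ssh/id_rsa",
))
```

Running the resulting command by hand as the monitoring user is a quick way to check that passwordless authentication works before enabling the section.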