Configuration
During installation, a default configuration file, sysusage.cfg, is generated. The default settings are enough to report essential information about your system, but to monitor specific processes, queue directories or devices you must edit this file by hand.
Here is the format of the configuration file and all its directives. There are three main sections: the first sets the general parameters of the application, the second sets the parameters for SMTP or Nagios notification when a threshold is exceeded, and the third configures the types of system information you want to monitor. Optional PLUGIN and REMOTE sections can be added as well.
Full sample configuration file:
[GENERAL]
DEBUG = 0
DATA_DIR = /usr/local/sysusage/rrdfiles
PID_DIR = /etc
DEST_DIR = /var/www/htdocs/sysusage
SAR_BIN = /usr/bin/sar
UPTIME = /usr/bin/uptime
HOSTNAME = /bin/hostname
INTERVAL = 60
SKIP = 12:00/14:00 20:00/06:00
HDDTEMP_BIN = /usr/local/sbin/hddtemp
SENSORS_BIN = /usr/bin/sensors
DAEMON = 0
GRAPH_WIDTH = 550
GRAPH_HEIGHT= 200
FLAMING = 0
HIRES = 0
LINE_SIZE = 2
PROC_QSIZE = 4
RESRC_URL =
SSH_BIN = /usr/bin/ssh
SSH_OPTION = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
SSH_USER =
SSH_IDENTITY=

[ALARM]
WARN_MODE = 0
ALARM_PROG = /usr/local/sysusage/bin/sysusagewarn
SMTP = localhost
FROM = root@localhost
TO = root@localhost
NAGIOS = /usr/local/nagios/bin/submit_check_result
UPPER_LEVEL = 1
LOWER_LEVEL = 2
URL =

[MONITOR]
load:threshold_max_value
cpuall:threshold_max_value
mem:threshold_max_value
swap:threshold_max_value
share:threshold_max_value
sock:threshold_max_value
socktw:threshold_max_value
io:threshold_max_value
file:threshold_max_value
page:threshold_max_value
pcrea:threshold_max_value
pswap:threshold_max_value
net:threshold_max_value
err:threshold_max_value
disk:threshold_max_value
proc:proc_name:threshold_max_value:threshold_min_value
tproc:proc_name:threshold_max_value:threshold_min_value
queue:path_queue_dir:threshold_max_value
hddtemp:device:threshold_max_value
dev:device(alias):threshold_max_value
dev:device(alias):rpm_speed:raid_type:nb_disk
work:threshold_max_value
sensors:pattern:threshold_max_value

[PLUGIN testplug]
title:Sysage Test plugin
menu:Database
enable:no
program:/usr/local/sysusage/plugins/plugin-sample.pl
minThreshold:0
maxThreshold:10
verticallabel:Number of seconds
label1:Total seconds
label2:
label3:
legend1:seconds
legend2:
legend3:
remote:yes

[REMOTE hostname1]
enable:no
ssh_user:monitor
ssh_identity:/home/monitor/.ssh/id_rsa
#ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
#ssh_command:
remote_sysusage:/usr/local/sysusage/bin/rsysusage
Section GENERAL
- DEBUG = 0|1
  Enables debug mode. When set to 1, sysusage and sysusagegraph only show what they would do; they do not create or send anything.
- DATA_DIR = /path/to/rrdfiles
  Sets the output directory for all RRDTOOL databases.
- PID_DIR = /path/to/piddir
  Directory where sysusage and sysusagegraph store a file holding the PID of the running process, to prevent simultaneous runs.
- DEST_DIR = /path/to/html_output
  Sets the path to the directory where all HTML and graph files are created.
- SAR_BIN = /path/to/sar_binary
  sysusage uses sar, part of the sysstat distribution, to collect system information. Sets the path to the sar binary.
- UPTIME = /path/to/uptime_binary
  sysusagegraph reports the current system uptime using the uptime command. Sets the path to the uptime binary.
- HOSTNAME = /path/to/hostname_binary
  All scripts of the Sysusage distribution need the name of the host, which they obtain with the hostname command. Sets the path to the hostname binary.
- INTERVAL = pull_interval_in_second
  All RRDTOOL databases store monitored values at this interval, in seconds, and graph construction also uses it to render data properly. By default Sysusage uses an interval of 60 seconds, which gives better statistics. You can change it, but this is not recommended; if you do, adjust your crontab to the same value. The value must be between 10 and 300 seconds. To go below one minute you must run sysusage in daemon mode (see DAEMON below).
- SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
  Defines time ranges during which no monitoring is done. The value is a list of begin_time/end_time pairs separated by spaces or tabs. For example, if you have a good reason not to monitor the host during the night, write: 20:00/06:00
- HDDTEMP_BIN = /path/to/hddtemp_binary
  You can monitor hard drive temperature if the hddtemp utility is installed. Sets the path to the hddtemp binary.
- SENSORS_BIN = /path/to/sensors_binary
  You can monitor device temperatures if the lm_sensors utility is installed. Sets the path to the sensors binary.
- DAEMON = 0 | 1
  Running sysusage in daemon mode (DAEMON = 1) lets you monitor the system below cron's one-minute limit, with an INTERVAL between 10 and 60 seconds.
- GRAPH_WIDTH and GRAPH_HEIGHT
  Use these to resize the graphs. The default is 550 pixels wide by 200 pixels high.
- FLAMING = 0|1
  Just for fun: set this directive to 1 to get a random flaming effect on graphs with only one dataset. Disabled by default. Not used with the jQuery graph renderer.
- HIRES = number_of_hours
  Adds an hourly graph for finer data granularity. Disabled by default. Set it to an integer from 1 to 23 to show data from the past N hours up to now. Not used with the jQuery graph renderer, as the JavaScript library lets you zoom into any resolution you want.
- LINE_SIZE = 1|2
  The default graph line size is 1; set it to 2 for thicker lines. This is an RRD graph limitation (1 or 2). Not used with the jQuery graph renderer.
- PROC_QSIZE = number_of_processes
  Number of remote sysusage call processes run simultaneously. The default is 4, but it can go up to 15 or more depending on the hardware. One per CPU core is a reasonable minimum to consider.
- RESRC_URL = url
  By default, image, JavaScript and CSS resources are looked up in the DEST_DIR directory, so that in the HTML view they all stay in the main output directory. If you want to place these resources in another directory or on another host, set this directive to any FQDN, absolute or relative URL.
- SSH_IDENTITY = /path/to/identity_file
  Sets the default identity file used to connect to all remote hosts without a password. If undefined, sysusage uses the ssh system default. Keep the default value unless you know exactly what you are doing.
- SSH_OPTION = ssh_options
  Sets the default ssh options, which correspond to passwordless authentication:
    -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
  with a five-second connection timeout. You may want to increase this timeout on very slow network links. Otherwise, do not change this value unless you know exactly what you are doing.
- SSH_BIN = /path/to/ssh_binary
  Path to the ssh command, set at install time.
- SSH_USER = ssh_user
  Defines the default ssh user used to connect to all remote hosts.
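The INTERVAL and DAEMON rules described above can be expressed as a small validity check. This is a minimal Python sketch, not code from SysUsage; the function name check_interval is hypothetical, and it only encodes the two rules stated in this section: 10 to 300 seconds, and daemon mode for anything under one minute.

```python
def check_interval(interval: int, daemon: bool) -> None:
    """Raise ValueError if INTERVAL/DAEMON break the documented rules.

    Hypothetical helper: it encodes only the constraints described in
    the GENERAL section, not SysUsage's own validation code.
    """
    # INTERVAL must lie between 10 and 300 seconds.
    if not 10 <= interval <= 300:
        raise ValueError(f"INTERVAL={interval}: must be between 10 and 300 seconds")
    # cron cannot run a job more often than once a minute,
    # so sub-minute intervals require DAEMON = 1.
    if interval < 60 and not daemon:
        raise ValueError(f"INTERVAL={interval}: values under 60 seconds need DAEMON = 1")


check_interval(60, daemon=False)  # default cron setup: accepted
check_interval(15, daemon=True)   # sub-minute polling in daemon mode: accepted
```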
Section ALARM
- WARN_MODE = 0|1
  Disables (0) or enables (1) alert messages when a threshold is exceeded.
- ALARM_PROG = /path/to/sysusagewarn
  Sets the path to the external program responsible for sending alarm messages. You can replace it with your own; look at the sysusagewarn usage to see which command-line options sysusage passes to it.
- SMTP = smtp.server.net
  Name or IP address of the SMTP server to contact. The default is none, in which case no SMTP message is sent.
- FROM = sender@localhost
  Sender email address used in the SMTP message.
- TO = destination@localhost
  Destination email address for alarm messages. You can set multiple recipients as a comma-separated list of addresses.
- NAGIOS = /usr/local/nagios/bin/submit_check_result
  Path to the external nsca program used to send check messages to Nagios. Setting this activates Nagios check reports. See the end of this file for how to configure Nagios.
- UPPER_LEVEL = 1
  Nagios check level sent when a high threshold limit is reached. The default is 1 (WARNING).
- LOWER_LEVEL = 2
  Nagios check level sent when a low threshold limit is reached. The default is 2 (CRITICAL).
- URL = URL of the Sysusage report
  Overrides the default URL of the SysUsage report, http://host.dom/sysusage/, for example if you use a special port or a different path. Example: http://hostname.domain:9080/Reports/Sysusage/
- SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
  Defines time ranges during which alarm notices are not sent. The value is a list of begin_time/end_time pairs separated by spaces or tabs. For example, if you have a good reason not to receive notices during the night, write: 20:00/06:00
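Both SKIP directives, in GENERAL and in ALARM, use the same begin_time/end_time format, and a range such as 20:00/06:00 wraps past midnight. The Python sketch below shows one way to test whether a given time falls in such a list. It is illustrative only: parse_skip and is_skipped are hypothetical names, and treating the begin time as inclusive and the end time as exclusive is an assumption, since this document does not specify the boundary behavior.

```python
from datetime import time


def parse_skip(value: str) -> list[tuple[time, time]]:
    """Parse 'HH:MM/HH:MM HH:MM/HH:MM ...' into (begin, end) pairs."""
    ranges = []
    for item in value.split():  # split() with no argument handles spaces and tabs
        begin, end = item.split("/")
        bh, bm = (int(part) for part in begin.split(":"))
        eh, em = (int(part) for part in end.split(":"))
        ranges.append((time(bh, bm), time(eh, em)))
    return ranges


def is_skipped(now: time, ranges: list[tuple[time, time]]) -> bool:
    """True if `now` is in a skip range (assumed: begin inclusive, end exclusive)."""
    for begin, end in ranges:
        if begin <= end:
            # Same-day range, e.g. 12:00/14:00.
            if begin <= now < end:
                return True
        # Range wrapping past midnight, e.g. 20:00/06:00.
        elif now >= begin or now < end:
            return True
    return False


ranges = parse_skip("12:00/14:00 20:00/06:00")
print(is_skipped(time(23, 30), ranges))  # inside 20:00/06:00
print(is_skipped(time(9, 0), ranges))    # outside every range
```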
Section MONITOR
This section accepts two entry formats. The first one is used for most monitoring targets:
type:threshold_max
or
type:threshold_max(attempt)
- type
  Type of system information you want to monitor. It can take the following values:

    load   => monitor load average
    cpu    => monitor cpu(s) user/nice/system usage
           => monitor cpu(s) total/iowait usage
           => monitor cpu(s) steal/guest usage
    cpuall => monitor only global cpu usage, unlike the cpu type which generates extra reports per cpu
    cswch  => monitor context switches
    intr   => monitor number of interrupts per second
    mem    => monitor memory usage
    share  => monitor POSIX shared memory usage (/dev/shm)
    swap   => monitor swap usage
    work   => monitor amount of memory needed for the current workload
    sock   => monitor number of open sockets
    socktw => monitor number of sockets in TIME_WAIT state
    io     => monitor I/O requests and block usage
    page   => monitor I/O page usage
    pswap  => monitor I/O page swap usage
    pcrea  => monitor number of processes created per second
    file   => monitor number of open files
    net    => monitor I/O network bytes on all network interfaces
    err    => monitor bad packets, drops and collisions on interfaces
    disk   => monitor disk space usage
    tcp    => monitor number of TCP connections and segments
- threshold_max
  Maximum threshold value. Any value equal to or greater than this one generates an SMTP and/or Nagios alert, if you have enabled them.
- attempt
  You can delay the call to the alarm program by specifying how many consecutive times the threshold must be exceeded before the command is called. Put the number of attempts in parentheses immediately after the threshold value. This setting is optional, and by default the alarm is sent immediately.
- Special cases
  There is a special case for 'disk' usage monitoring that allows some mount points to be excluded. This is useful if you have hard links or special devices you don't need to monitor. The syntax is:

    disk:ThresholdMax:exclusion

  where exclusion is a semicolon (;) separated list of mount points to exclude from monitoring. For example:

    disk:90:/home/mondo_image;/home/smb_mountpoint

  You can use regular expressions in excluded paths.

  The other directive with a special syntax is 'dev'. It is built as follows:

    dev:device(alias):rpm_speed:raid_type:nb_disk

  where device is sda, sdb or any other device name (without the /dev/ prefix), and the alias in parentheses is the name displayed in the user interface instead of the device name. For example:

    dev:sdc(ASM disk1):
    dev:sdb(/data):

  If you plan to use the I/O workload report, SysUsage needs the disk speed (RPM), the RAID type (0, 1, 5 or 10) and the number of disks in the RAID array to calculate IOPS. For example, with 7200 RPM disks, 2 of them in RAID 1, write:

    dev:sdc(ASM disk1):7200:1:2

  I/O workload is the relationship between TPS (transfers per second) and IOPS (I/O operations per second) of a device. If the tps reported by sysstat reaches the maximum theoretical IOPS, your storage subsystem is saturated. The maximum theoretical IOPS is calculated as follows:

    d     = number of disks
    dIOPS = IOPS per disk
    %r    = % of read workload
    %w    = % of write workload
    F     = RAID factor

    IOPS  = (d * dIOPS) / (%r + (F * %w))

  This gives the theoretical maximum IOPS for a RAID set (excluding caching, of course): the product of the number of disks and the IOPS per disk, divided by the sum of the read share and the product of the RAID factor and the write share. %r and %w are calculated from the sectors read and written:

    %r = rd_sec / (rd_sec + wr_sec)
    %w = wr_sec / (rd_sec + wr_sec)

  This IOPS monitoring follows Nick Anderson's excellent article, Analyzing I/O performance in Linux.
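The IOPS equation above, together with the dev entry that feeds it, can be checked with a short Python sketch. It is an illustration, not SysUsage's implementation: parse_dev and max_iops are hypothetical names, and two inputs are assumptions because this document does not give them: the RAID factors (1 for RAID 0, 2 for RAID 1 and 10, 4 for RAID 5, the usual write penalties) and the figure of about 75 IOPS for one 7200 RPM disk, a common rule of thumb.

```python
import re

# Assumption: usual RAID write penalties; this document lists the RAID
# types (0, 1, 5, 10) but not their factors.
RAID_FACTOR = {0: 1, 1: 2, 5: 4, 10: 2}

# dev:device(alias):rest, where the (alias) part is optional.
DEV_RE = re.compile(r"^dev:([^(:]+)(?:\(([^)]*)\))?:(.*)$")


def parse_dev(entry: str) -> dict:
    """Split a 'dev' MONITOR entry into its fields (hypothetical helper)."""
    m = DEV_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"not a dev entry: {entry!r}")
    device, alias, rest = m.groups()
    info = {"device": device, "alias": alias or device}
    fields = rest.split(":") if rest else []
    if len(fields) == 3:
        # rpm_speed:raid_type:nb_disk, used by the I/O workload report
        info["rpm"], info["raid_type"], info["nb_disk"] = map(int, fields)
    elif len(fields) == 1:
        info["threshold_max"] = float(fields[0])
    elif fields:
        raise ValueError(f"unexpected dev fields: {entry!r}")
    return info


def max_iops(nb_disk: int, disk_iops: float, raid_factor: float,
             rd_sec: float, wr_sec: float) -> float:
    """Theoretical max IOPS: (d * dIOPS) / (%r + F * %w)."""
    total = rd_sec + wr_sec
    if total == 0:
        # Assumption: with no I/O at all, treat the workload as pure reads.
        r, w = 1.0, 0.0
    else:
        r, w = rd_sec / total, wr_sec / total
    return (nb_disk * disk_iops) / (r + raid_factor * w)


dev = parse_dev("dev:sdc(ASM disk1):7200:1:2")
# Assumption: about 75 IOPS for one 7200 RPM disk.
limit = max_iops(dev["nb_disk"], 75, RAID_FACTOR[dev["raid_type"]],
                 rd_sec=600, wr_sec=400)
tps = 80.0  # hypothetical tps value reported by sar for this device
print(f"{dev['alias']}: max {limit:.1f} IOPS, workload {100 * tps / limit:.0f}%")
```

With 2 disks, 60% reads and a RAID 1 factor of 2, the denominator is 0.6 + 2 × 0.4 = 1.4, so the limit is 150 / 1.4, about 107 IOPS; a sustained tps near that value would indicate saturation.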
The second format is used to monitor running processes, hard drive temperature or queue directories. It has the following format:
type:target:threshold_max_value:threshold_min_value
or
type:target:threshold_max_value(attempt):threshold_min_value(attempt)
- type
  Type of system information you want to monitor. It can take the following values:

    proc    => monitor number of running processes
    tproc   => monitor number of running threads
    queue   => monitor number of files in a directory
    dev     => monitor CPU usage per device (ex: sda)
    hddtemp => monitor hard drive temperature
    sensors => monitor devices (cpu temperature, fan speed, etc.)
- target
  If type is 'proc' or 'tproc', target is the name of the process or thread to monitor. You can use a regexp as target to match exactly the required processes. The number of running processes is obtained with this command line:

    ps -e -o command | grep -E "target" | grep -v grep | wc -l

  or, for thread monitoring (tproc):

    ps -eL -o command | grep -E "target" | grep -v grep | wc -l

  so you can replace the word target with your regexp and check that it returns the right number of processes.

  If type is 'queue', target is the full path of the directory to monitor. Sysusage counts every regular file in the target directory and does not descend into subdirectories.

  If type is 'hddtemp', target is the hard drive device to monitor, e.g. /dev/sda. You can test it with the following command line:

    hddtemp -n /dev/sda

  which should print the current temperature detected on the hard drive.

  If type is 'dev', target is the device name to monitor, e.g. sda. Do not add /dev/ in front of it; that will not work. To change the device name shown in the graphic menu, add a device alias enclosed in parentheses.
  For example, say you are monitoring some EMCpower SAN devices, which sar reports as dev120-48 and dev120-64. By reading /proc/partitions you find which partitions are mapped to these devices; in this example they are mounted as /cache1 and /cache2, so you want to see these mount points instead of device numbers in the graphical menu. Adding the following lines to your sysusage.cfg file does the job:

    dev:dev120-48(/cache1):
    dev:dev120-64(/cache2):

  The threshold_max value is the maximum percentage of CPU used for this device before an alarm is sent.

  If type is 'sensors', target is the pattern to match in the sensors program output to obtain temperature or fan speed information. See the SENSORS chapter for more information.
- threshold_max
  Maximum threshold value. Any value equal to or greater than this one generates an SMTP and/or Nagios alert, if you have enabled them.
- threshold_min
  Minimum threshold value. Any value equal to or lower than this one generates an SMTP and/or Nagios alert, if you have enabled them. A minimum threshold is mostly useful with the 'proc' monitoring type: if you set it to 0, you are warned when no matching process is running.
- attempt
  You can delay the call to the alarm program by specifying how many consecutive times a threshold must be exceeded before the command is called. Put the number of attempts in parentheses immediately after the min and/or max threshold value. This setting is optional for both thresholds, and by default the alarm is sent immediately.
For example, a load average entry defined like this:
load:12(3)
sends an alarm when the system load average exceeds 12 for three consecutive checks at the defined interval. With a 60-second interval, the alarm is sent up to 180 seconds after the threshold is first exceeded.
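The attempt mechanism amounts to counting consecutive threshold crossings and resetting the count as soon as a value falls back within limits. The Python sketch below illustrates this with the same comparisons the document gives (greater than or equal for the maximum, lower than or equal for the minimum). It is not SysUsage's code: parse_threshold and ThresholdWatcher are hypothetical names, and whether SysUsage repeats the alarm on every later sample or sends it only once is not stated here; the sketch assumes it repeats.

```python
import re
from typing import Optional

# "12" or "12(3)": a threshold value with an optional attempt count.
THRESHOLD_RE = re.compile(r"^(\d+(?:\.\d+)?)(?:\((\d+)\))?$")


def parse_threshold(text: str) -> tuple[Optional[float], int]:
    """'12(3)' -> (12.0, 3); '12' -> (12.0, 1); '' -> (None, 1)."""
    text = text.strip()
    if not text:
        return None, 1
    m = THRESHOLD_RE.match(text)
    if m is None:
        raise ValueError(f"bad threshold: {text!r}")
    return float(m.group(1)), int(m.group(2) or 1)


class ThresholdWatcher:
    """Count consecutive threshold hits and report when an alarm is due."""

    def __init__(self, max_value=None, max_attempts=1, min_value=None, min_attempts=1):
        self.max_value, self.max_attempts = max_value, max_attempts
        self.min_value, self.min_attempts = min_value, min_attempts
        self._over = 0   # consecutive samples >= max_value
        self._under = 0  # consecutive samples <= min_value

    def check(self, value: float) -> Optional[str]:
        """Return 'max' or 'min' when an alarm should be sent, else None."""
        if self.max_value is not None and value >= self.max_value:
            self._over += 1
        else:
            self._over = 0
        if self.min_value is not None and value <= self.min_value:
            self._under += 1
        else:
            self._under = 0
        # Assumption: once the count is reached, the alarm repeats on every
        # further sample that still crosses the threshold.
        if self._over and self._over >= self.max_attempts:
            return "max"
        if self._under and self._under >= self.min_attempts:
            return "min"
        return None


kind, raw = "load:12(3)".split(":", 1)
watcher = ThresholdWatcher(*parse_threshold(raw))
for load in (13.0, 14.5, 15.2):
    print(kind, load, watcher.check(load))  # 'max' only on the third sample
```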
Section PLUGIN
This section enables custom plugins. You can call any program or script, provided it prints up to 3 numbers separated by a space character. See the plugins/ directory for sample scripts.
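Since SysUsage only reads the program's output, a plugin can be written in any language. The Python script below is a hypothetical minimal plugin, not one of the bundled samples: it prints the 1, 5 and 15 minute load averages simply because they are an easy source of three numbers (the built-in load type already covers them). The helper name format_result is an assumption made for this example.

```python
#!/usr/bin/env python3
"""Hypothetical SysUsage plugin: prints the 1, 5 and 15 minute load averages."""
import os


def format_result(values) -> str:
    """Join 1 to 3 numbers with single spaces, as the PLUGIN section expects."""
    values = list(values)
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must print between 1 and 3 numbers")
    # Round to two decimals; str() keeps plain decimal notation for typical values.
    return " ".join(str(round(v, 2)) for v in values)


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    print(format_result(os.getloadavg()))
```

A PLUGIN section would point its program directive at the script (the path is up to you) and set label1 to label3 for the three printed values.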
Each PLUGIN section must include a name, made of alphanumeric characters, which is used to create the target file. For example:
[PLUGIN testplug1] or [PLUGIN testplug2]
The section accepts the following configuration directives, each written as a directive name followed by ':' or '=' and a value.
- enable
  Temporarily disables monitoring with this plugin. The default is 'yes' (enabled). To disable it, write enable:no
- program
  Sets the path to the program or script executed as the plugin. It must print 1 to 3 numbers to STDOUT, separated by a space character, one per report you want; each plugin can therefore graph 1, 2 or 3 data series.
- title
  Sets the title of the report page and of the index link. The default is "Sysusage plugin".
- menu
  Places the plugin under a submenu of the plugins menu. By default the plugin goes under the "Others" submenu.
- maxthreshold
  Maximum threshold value. Any value equal to or greater than this one generates an SMTP and/or Nagios alert, if you have enabled them.
- minthreshold
  Minimum threshold value. Any value equal to or lower than this one generates an SMTP and/or Nagios alert, if you have enabled them.
- verticallabel
  Sets the vertical label of the graph.
- label1, label2, label3
  Set a legend for each graphed value: label1 for the first returned value, label2 for the second and label3 for the third. If the plugin returns only one value, omit the other labels.
- legend1, legend2, legend3
  Set the units shown with the Current, Avg and Max values of each graphed value.
- remote
  Set this to 'no' to prevent the plugin program from being executed when sysusage is called over ssh in a remote context. The default is 'yes' (enabled).
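Because directives accept either ':' or '=', reading a PLUGIN section only requires splitting each line at the first separator after the directive name. The Python sketch below does this and applies the defaults listed above. It is a hypothetical illustration, not SysUsage's parser: parse_plugin_section and active_labels are invented names, and treating directive names as case-insensitive is an assumption, made because the sample configuration writes minThreshold and maxThreshold while this list writes minthreshold and maxthreshold.

```python
import re

# Defaults stated in the PLUGIN directive list.
PLUGIN_DEFAULTS = {"enable": "yes", "title": "Sysusage plugin", "menu": "Others", "remote": "yes"}

# A directive name followed by ':' or '=' and a (possibly empty) value.
DIRECTIVE_RE = re.compile(r"^([A-Za-z_]\w*)\s*[:=]\s*(.*)$")


def parse_plugin_section(lines):
    """Return the directives of one [PLUGIN name] section as a dict."""
    conf = dict(PLUGIN_DEFAULTS)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = DIRECTIVE_RE.match(line)
        if m is None:
            raise ValueError(f"bad directive: {line!r}")
        # Assumption: directive names are case-insensitive.
        conf[m.group(1).lower()] = m.group(2).strip()
    return conf


def active_labels(conf):
    """Non-empty label1..label3 values, one per graphed number."""
    return [conf[k] for k in ("label1", "label2", "label3") if conf.get(k)]


section = ["title: DB response time", "menu=Database", "maxThreshold:10",
           "label1:Total seconds", "label2:", "enable:no"]
conf = parse_plugin_section(section)
enabled = conf["enable"].lower() != "no"
print(conf["title"], enabled, active_labels(conf))
```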
Section REMOTE
This section lets you run sysusage on remote hosts from a central server. It uses ssh to execute sysusage on the destination host with the -r option, which makes sysusage print all results to stdout instead of writing to local data files. Because sysusage runs from a cron job or in daemon mode, it cannot authenticate interactively to the remote host, so you must provide an ssh user and an identity file with the corresponding configuration options.
Each REMOTE section must include the name or IP address of the remote host, which is used to create the target data directory. For example:
[REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]
The section accepts the following configuration directives, each written as a directive name followed by ':' or '=' and a value.
Once you have installed sysusage on all remote hosts and exchanged SSH keys between the central host and the remote hosts, you usually only need to set the ssh_user directive to get it working. Use the remote_sysusage directive if the sysusage Perl script is not installed in the same location as on the central server.
- enable
  Enables or disables monitoring of this remote host. The default is 'yes' (enabled). Set enable=no to disable it.
- ssh_user
  Defines the ssh user allowed to connect to the remote host. By default, the value of the SSH_USER option in the GENERAL section is used.
- ssh_identity
  Sets the identity file used to connect to the remote host without a password. By default, the value of the SSH_IDENTITY option in the GENERAL section is used. This is usually the private key you generated with ssh-keygen, most often $HOME/.ssh/id_rsa. Keep the default value unless you know exactly what you are doing.
- ssh_options
  Overrides the default ssh options, which are:
    -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
  The default options are set by the SSH_OPTION configuration option in the GENERAL section. Keep the default value unless you know exactly what you are doing.
- ssh_command
  Overrides the complete ssh command: it replaces the ssh command, the ssh options, the ssh user and the host part. The remote sysusage command itself is not replaced. Keep the default value unless you know exactly what you are doing.
- remote_sysusage
  Sets the path to the rsysusage command to run on the remote host. SysUsage automatically adds the -r option to enable remote execution mode.
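Putting the REMOTE directives and their GENERAL defaults together, the command run for each host can be sketched as follows. This is a hypothetical Python illustration, not SysUsage's code: build_remote_command is an invented name, the argument order and the use of ssh's standard -i flag for the identity file are assumptions, and the default remote path is the one from the sample configuration.

```python
import shlex

# Default remote path taken from the sample configuration.
DEFAULT_REMOTE = "/usr/local/sysusage/bin/rsysusage"


def build_remote_command(host: str, remote: dict, general: dict) -> list:
    """Assemble the argv used to run sysusage on `host` (hypothetical ordering)."""
    if remote.get("ssh_command"):
        # ssh_command replaces the ssh binary, options, user and host part.
        argv = shlex.split(remote["ssh_command"])
    else:
        argv = [general.get("SSH_BIN") or "ssh"]
        # Per-host ssh_options override the GENERAL SSH_OPTION value.
        argv += shlex.split(remote.get("ssh_options") or general.get("SSH_OPTION", ""))
        identity = remote.get("ssh_identity") or general.get("SSH_IDENTITY")
        if identity:
            argv += ["-i", identity]  # ssh's identity-file flag
        user = remote.get("ssh_user") or general.get("SSH_USER")
        argv.append(f"{user}@{host}" if user else host)
    # The remote sysusage command is never replaced; -r selects remote mode.
    argv += [remote.get("remote_sysusage") or DEFAULT_REMOTE, "-r"]
    return argv


general = {"SSH_BIN": "/usr/bin/ssh",
           "SSH_OPTION": "-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey"}
remote = {"ssh_user": "monitor", "ssh_identity": "/home/monitor/.ssh/id_rsa"}
print(shlex.join(build_remote_command("hostname1", remote, general)))
```

The resulting list could be passed to subprocess.run(argv, capture_output=True, text=True) to collect the remote results from stdout.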
Copyright (c) 2003-2017 Gilles Darold - All rights reserved. (GPL v3).