OpenSM Performance manager HOWTO
================================

Introduction
============

OpenSM now includes a performance manager which collects Port counters
from the subnet and stores them internally in OpenSM.

Some of the features of the performance manager are:

    1) Collect port data and error counters per v1.2 spec and store in
       64-bit internal counts.
    2) Automatic reset of counters when they reach approximately 3/4
       full. (While not guaranteeing that counts will not be missed,
       this does keep counts incrementing as best as possible given the
       current hardware limitations.)
    3) Basic warnings in the OpenSM log on "critical" errors like symbol
       errors.
    4) Automatically detects "outside" resets of counters and adjusts to
       continue collecting data.
    5) Can be run when OpenSM is in standby or inactive states.

Known issues are:

    1) Data counters will be lost on high data rate links. Sweeping the
       fabric fast enough for a DDR link is not practical.
    2) Default partition support only.


Setup and Usage
===============

Using the Performance Manager consists of 3 steps:

    1) compiling in support for the perfmgr (Optionally: the console
       socket as well)
    2) enabling the perfmgr and console in opensm.conf
    3) retrieving data which has been collected.
       3a) using console to "dump data"
       3b) using a plugin module to store the data to your own
           "database"

Step 1: Compile in support for the Performance Manager
------------------------------------------------------

Because of the performance manager's experimental status, it is not
enabled at compile time by default. (This will hopefully soon change as
more people use it and confirm that it does not break things... ;-) The
configure option is "--enable-perf-mgr".

At this time it is really best to enable the console socket option as
well. OpenSM can be run in an "interactive" mode, but with the console
socket option turned on one can also make a connection to a running
OpenSM. The console option is "--enable-console-socket".
This option requires the use of tcp_wrappers to ensure security. Please
be aware of your configuration for tcp_wrappers, as the commands
presented in the console can affect the operation of your subnet.

The following configure line turns on the performance manager as well
as the console:

    ./configure --enable-perf-mgr --enable-console-socket


Step 2: Enable the perfmgr and console in opensm.conf
-----------------------------------------------------

Turning the Performance Manager on is easy: set the following options
in the opensm.conf config file. (Default location is
/usr/local/etc/opensm/opensm.conf)

    # Turn it all on.
    perfmgr TRUE

    # sweep time in seconds
    perfmgr_sweep_time_s 180

    # Dump file to dump the events to
    event_db_dump_file /var/log/opensm_port_counters.log

Also enable the console socket and, if desired, configure the port for
it to listen on.

    # console [off|local|socket]
    console socket

    # Telnet port for console (default 10000)
    console_port 10000

As noted above, you also need to set up tcp_wrappers to prevent
unauthorized users from connecting to the console.[*]

    [*] As an alternative you can use the loopback mode, but I noticed
        when writing this (OpenSM v3.1.10; OFED 1.3) that there are some
        bugs in specifying the loopback mode in the opensm.conf file.
        Look for this to be fixed in newer versions.

    [**] Also you could use "local", but this is only useful if you run
         OpenSM in the foreground of a terminal. As OpenSM is usually
         started as a daemon, I left this out as an option.


Step 3: retrieve data which has been collected
----------------------------------------------

Step 3a: Using console dump function
------------------------------------

The console command "perfmgr dump_counters" will dump counters to the
file specified in the opensm.conf file.
In the example above this is "/var/log/opensm_port_counters.log".

Example output is below:

"SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
     symbol_err_cnt       : 0
     link_err_recover     : 0
     link_downed          : 0
     rcv_err              : 0
     rcv_rem_phys_err     : 0
     rcv_switch_relay_err : 2
     xmit_discards        : 0
     xmit_constraint_err  : 0
     rcv_constraint_err   : 0
     link_integrity_err   : 0
     buf_overrun_err      : 0
     vl15_dropped         : 0
     xmit_data            : 470435
     rcv_data             : 405956
     xmit_pkts            : 8954
     rcv_pkts             : 6900
     unicast_xmit_pkts    : 0
     unicast_rcv_pkts     : 0
     multicast_xmit_pkts  : 0
     multicast_rcv_pkts   : 0


Step 3b: Using a plugin module
------------------------------

If you want a more automated method of retrieving the data, OpenSM
provides a plugin interface to extend OpenSM. The header file is
osm_event_plugin.h. The functions you register with this interface will
be called when data is collected. You can then use that data as
appropriate.

An example plugin can be configured at compile time using the
"--enable-default-event-plugin" option on the configure line. This
plugin is very simple. It logs "events" received from the performance
manager to a log file. I don't recommend using this directly, but
rather using it as a template to create your own plugin.
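
If writing a plugin is more than you need, the dump file from Step 3a
can also be post-processed by a script. The following is a minimal
Python sketch, not part of OpenSM. It assumes the dump file keeps
exactly the layout of the example output in Step 3a (one quoted header
line per port, followed by "name : value" lines). The function names
parse_dump and nonzero_errors are made up for illustration, and
treating the first twelve counters of that example as the "error"
counters is my own assumption.

```python
import re
import sys

# Header line, e.g.:
# "SW1 wopr ISR9024D (MLX4 FW)" 0x8f10400411f56 port 1 (Since Mon May 12 13:27:14 2008)
HEADER_RE = re.compile(
    r'^"(?P<name>.*)"\s+(?P<guid>0x[0-9a-fA-F]+)\s+port\s+(?P<port>\d+)'
    r'\s+\(Since\s+(?P<since>[^)]*)\)\s*$'
)

# Counter line, e.g.: "     rcv_switch_relay_err : 2"
COUNTER_RE = re.compile(r'^\s*(?P<key>\w+)\s*:\s*(?P<value>\d+)\s*$')

# Assumption: these names, taken from the example output, are the
# error counters; everything else is a data or packet counter.
ERROR_COUNTERS = (
    "symbol_err_cnt", "link_err_recover", "link_downed", "rcv_err",
    "rcv_rem_phys_err", "rcv_switch_relay_err", "xmit_discards",
    "xmit_constraint_err", "rcv_constraint_err", "link_integrity_err",
    "buf_overrun_err", "vl15_dropped",
)


def parse_dump(text):
    """Parse dump text into a list of per-port records (dicts)."""
    records = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A header line starts a new per-port record.
            current = {
                "name": header.group("name"),
                "guid": int(header.group("guid"), 16),
                "port": int(header.group("port")),
                "since": header.group("since"),
                "counters": {},
            }
            records.append(current)
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            key = counter.group("key")
            current["counters"][key] = int(counter.group("value"))
    return records


def nonzero_errors(record):
    """Return the error counters of one record that are not zero."""
    counters = record["counters"]
    return {
        name: counters[name]
        for name in ERROR_COUNTERS
        if counters.get(name, 0) != 0
    }


if __name__ == "__main__":
    # Usage: python parse_counters.py /var/log/opensm_port_counters.log
    with open(sys.argv[1]) as dump_file:
        for rec in parse_dump(dump_file.read()):
            errors = nonzero_errors(rec)
            if errors:
                print(rec["name"], hex(rec["guid"]), "port", rec["port"], errors)
```

Run against the dump file named in opensm.conf, this prints one line for
each port that has at least one nonzero error counter. Lines that match
neither pattern are skipped, so if your OpenSM version writes a
different layout the script will simply report less, not fail.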