OpenSM Release Notes 3.2
=============================

Repo: git://git.openfabrics.org/~sashak/management.git

This document describes the contents of the OpenSM 3.2 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release

This document includes the following sections:
1 This Overview section (describing new features and software dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Bug Fixes
5 Main Verification Flows
6 Qualified Software Stacks and Devices

1.1 Major New Features

OpenSM provides an optional unicast routing cache (enabled by the '-A'
or '--ucast_cache' option). When enabled, the unicast routing cache
prevents routing recalculation (a heavy task in a large cluster) when
no topology change was detected during the heavy sweep, or when the
topology change does not require a new routing calculation, e.g. when
one or more CAs/RTRs/leaf switches go down, or when one or more of these
nodes come back after being down.

Routing chaining is the ability to configure the order in which routing
algorithms are applied in OpenSM, e.g. '-R ftree,updn,minhop': try
ftree routing first; if ftree fails, try updn; if updn fails, try
minhop.

* IPv6 Solicited Node Multicast addresses consolidation
When this mode is used (enabled with the --consolidate_ipv6_snm_req
option), OpenSM will map all IPv6 Solicited Node Multicast address join
requests into a single Multicast group with address ff10:601b::1:ff00:0.
This saves the limited MLID space. This IBA noncompliant feature is very
useful with large clusters (roughly 1024 nodes and more).

* OpenSM sweep state machine rework
The huge and buggy OpenSM sweep state machine was fully rewritten in a
safer and more effective synchronous manner.

* Multi lid routing balancing for updn/minhop routing algorithms
When LMC > 0 is used, OpenSM will ensure that the routing paths to a
port's multiple LIDs are generated via different switches and, when
possible, different chassis.

* Preserve base lid routes when LMC > 0
When LMC > 0 is used, OpenSM will preserve routing paths for base LIDs
as they would be with LMC = 0. In this way traffic on each LID level is
not affected by LMC changes.

* Ordered routing paths balancing
This adds the ability to predefine the port order in which routing path
balancing is performed by OpenSM. It can improve performance
dramatically (40-50%) for applications with a known communication
pattern. Activated with the --guid_routing_order_file command line
option.

* Unified OpenSM configuration
There is now a "conventional" config file instead of the hidden option
cache file (opensm.opts). OpenSM will look for this file in a default
place (consult the man page for the exact value), or the file name can
be specified with the '-F' command line option. There is also an option
('-c') to generate config

* Query remote SMs during light sweep
The master OpenSM will periodically query remote standby SMs to catch
their possible state changes and react accordingly (as required by the
IBA spec).

* Predefined port ids for Up/Down algorithm
This is useful as an Up/Down fine tuning tool: the algorithm will use
predefined port IDs instead of GUIDs for its decision about direction.
Activated with the --ids_guid_file command line option.

* Improved plugin API version 2
OpenSM now provides plugins with access to all data structures. This
makes it possible to implement powerful multi-purpose plugins. All
OpenSM header files are now installed, and specific configuration/build
options are exported via the generated osm_config.h header file.

* Many code improvements, optimizations and cleanups

* Automatic daily snapshots generation
This is not a "feature", but simplifies the access to recent OpenSM
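
The routing chaining described above ('-R ftree,updn,minhop') boils down to trying each engine in order and stopping at the first one that succeeds. The C sketch below illustrates only that control flow; the engine table, the stub callbacks and route_with_chain() are hypothetical names, not OpenSM's internal routing engine API, and what OpenSM does when every engine in the chain fails is not covered here.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical engine descriptor: a name plus a callback that returns 0 on
 * success and nonzero on failure (not OpenSM's real structures). */
typedef int (*build_routes_fn)(void *subnet);

typedef struct {
    const char *name;
    build_routes_fn build_routes;
} routing_engine;

/* Stubs: pretend ftree fails (e.g. the fabric is not a fat tree) and the
 * other engines succeed. */
static int ftree_stub(void *subnet)  { (void)subnet; return -1; }
static int updn_stub(void *subnet)   { (void)subnet; return 0; }
static int minhop_stub(void *subnet) { (void)subnet; return 0; }

static const routing_engine engines[] = {
    { "ftree",  ftree_stub  },
    { "updn",   updn_stub   },
    { "minhop", minhop_stub },
};

/* Look up an engine by name; returns NULL for unknown names. */
static const routing_engine *find_engine(const char *name)
{
    for (size_t i = 0; i < sizeof(engines) / sizeof(engines[0]); i++)
        if (strcmp(engines[i].name, name) == 0)
            return &engines[i];
    return NULL;
}

/* Walk a comma-separated chain such as "ftree,updn,minhop" in order and
 * return the name of the first engine that succeeds, or NULL if every
 * engine in the chain fails or is unknown. */
static const char *route_with_chain(const char *chain, void *subnet)
{
    char buf[256];
    char *save = NULL;

    assert(chain != NULL);
    strncpy(buf, chain, sizeof(buf) - 1);  /* strtok_r() modifies its input */
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok_r(buf, ",", &save); tok != NULL;
         tok = strtok_r(NULL, ",", &save)) {
        const routing_engine *e = find_engine(tok);
        if (e != NULL && e->build_routes(subnet) == 0)
            return e->name;
    }
    return NULL;
}
```

With these stubs, route_with_chain("ftree,updn,minhop", NULL) skips the failing ftree stub and returns "updn".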

1.2 Minor New Features

* Clean up cl_qlock_pool memory allocator - speed up memory allocations
* Support for a configurable size of the pending MADs pool (via the
OSM_UMAD_MAX_PENDING environment variable)
* Set packet lifetime to the subnet timeout option rather than the default
* Enforce routing path rebalancing on switch reconnection
* In the Up/Down routing algorithm, compare GUID values in host byte order
* Add 'switchbalance' and 'lidbalance' commands to the OpenSM console
* Respond to the new trap 144 node description update flag
* Add the '--connect_roots' command line option. This preserves
connectivity between root nodes in the Up/Down routing algorithm
* Set SL in the IPoIB MCast groups in accordance with the QoS policy
* Dump auto-detected root node GUIDs in the Up/Down routing algorithm
* Unify OpenSM dumpers code
* Unify various GUID file parsers - add a generic nodenamemap style parser
* When root node GUIDs were provided in a file, update the list on each
* During ./configure, show values of configuration dirs and files
* Make the prefix routes config file name configurable
* Add a Performance Manager HOWTO to the docs and the dist
* Support separate SA and SM keys as clarified in IBA 1.2.1
* Remove AM_MAINTAINER_MODE in ./configure
* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) the default
* Build osm_perfmgr_db.* content only when PerfMgr is enabled
* Move PerfMgr event_db_dump_file to the common OpenSM dump dir
* Allow space separated strings as values in the OpenSM config
* Support for multiple event plugins
* Add '--version' command line option
* Add '--create-config <file-name>' command line option
* Speed up and simplify logging code
* Speed up multicast processing in the SA DB
* In log messages, convert unicast LIDs from hex to decimal format and
GIDs from hex to IPv6 address format
* Handle all possible ports in the "ignore-guids" file
* Add 'reroute' console command
* Remove many install-exec-hook rules from Makefiles
* Some cleanups in LASH routing algorithm code
* In Makefiles, remove -rpath and explicit -lpthread, -ldl from LDFLAGS
(move to configurator)
* Install all OpenSM header files
* Improve locking in SM Info receiver
* Add new OSM_EVENT_ID_SUBNET_UP event for plugins
* Redo lex and yacc file generation in the conventional way
* Add a missing Node Description check on light sweep
* Move vendor specific compilation defines from command to generated
* Provide a useful error message when log file opening fails
* Add generated osm_config.h file with OpenSM specific defines
* Display port number in decimal in log messages
* Replace osm_vendor_select.h with the generated osm_config.h
* Unify options listing in OpenSM usage message
* LFT buffers handling simplification
* Add 'dump_conf' console command
* OpenSM performs a sweep on SIGCONT (coming out of suspend)
* When our SM is in Standby state and its priority is increased
(via console command), notify the master SM by sending Trap 144
* When entering standby state (after discovery), notify the master SM
* Support more PortInfo:CapabilityMask bits
* When babbling port policy is on, disable the port with the least hop
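
The log format change listed above (unicast LIDs in decimal, GIDs in IPv6 address notation) can be reproduced with standard POSIX calls, since a 16-byte GID has the same size and textual form as an IPv6 address. This is a sketch, not OpenSM's logging code; gid_to_str() and lid_to_str() are hypothetical helpers, and the GID is assumed to be stored as raw bytes in network byte order.

```c
#include <arpa/inet.h>   /* inet_ntop(), INET6_ADDRSTRLEN */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET6, socklen_t */

/* Print a 16-byte GID (raw bytes, network byte order) in IPv6 address
 * notation, e.g. fe80::2:c903:0:1234. Returns buf, or NULL if buf is too
 * small. */
static const char *gid_to_str(const uint8_t gid[16], char *buf, socklen_t len)
{
    assert(gid != NULL && buf != NULL);
    return inet_ntop(AF_INET6, gid, buf, len);
}

/* Print a unicast LID (host byte order) in decimal instead of hex.
 * Returns the snprintf() result. */
static int lid_to_str(uint16_t lid_ho, char *buf, size_t len)
{
    return snprintf(buf, len, "%u", (unsigned)lid_ho);
}
```

For example, the GID bytes fe 80 00 00 00 00 00 00 00 02 c9 03 00 00 12 34 are printed as fe80::2:c903:0:1234, and LID 0x1a is printed as 26.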

1.3 Library API Changes

1.4 Software Dependencies

OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g.
IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox
VAPI stacks. The qualified driver versions are provided in Table 2,
"Qualified IB Stacks".

Also, building the QoS manager policy file parser requires flex and
either bison or byacc to be installed.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions
are listed in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
There is no way to manage Service access by Keys.

* No SM to SM SMDB synchronization:
This puts the burden of re-registering services, multicast groups, and
inform-info on the client application (or IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
The following section lists all the IB compliance statements which
OpenSM does not support. Please refer to the IB specification for detailed
information regarding each compliance statement.

* C14-22 (Authentication):
M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
SubnSet method. As a work-around, an OpenSM option is provided for
defining the protect bits.

* C14-67 (Authentication):
On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
the SM shall generate a SubnGetResp if the M_Key matches, or
silently drop the packet if M_Key does not match.

* C15-0.1.23.4 (Authentication):
InformInfoRecords shall always be provided with the QPN set to 0,
except for the case of a trusted request, in which case the actual
subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
If there is no permission to forward, the subscription should be
removed and no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
GUIDInfo - SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
If the SM discovers that it is missing an M_Key to update CA/RT/SW,
it should notify the higher level.

* C14-62.1.1.12 (Initialization):
PortInfo:M_Key - Set the M_Key to a node based random value.

* C14-62.1.1.13 (Initialization):
PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
If the JoinState is SendOnlyNonMember = 1 (only), then the endport
should join as sender only.

* o15-0.1.8 (Multicast):
If a request to create an MCG has fields that cannot be met,
return ERR_REQ_INVALID (currently SL and FlowLabelTClass are ignored).

* C15-0.1.8.6 (SA-Query):
Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
Reject ServiceRecord create, modify or delete if the given
ServiceP_Key does not match the one included in the ServiceGID port
and the port that sent the request.

* C15-0.1.14 (Services):
Provide means to associate service name and ServiceKeys.

4 Bug Fixes
-----------

* Set SA attribute offset to 0 when no records are returned
* Send trap 64 only after new ports are in ACTIVE state
* Fix in sending client reregistration bit
* Fix default OpenSM SM (and SA) Key byte order
* Fix in sending Multicast groups creation/deletion notification (Traps
66 and 67)
* Don't start up automatically on SuSE based systems
* opensm/osm_console.c: fix seg fault when running "portstatus ca" in
* opensm: fix potential core dumps where osm_node_get_physp_ptr can
* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value
* opensm: switch LFTs incremental update fix
* opensm/osm_state_mgr.c: fix segmentation fault
* opensm: eliminate some potential NULL pointer dereferences
* opensm/osm_console.c: fix guid parsing
* opensm: fix off by 1 issue with max_lid and max_multicat_lid_ho
* opensm: fix potentially wrong port_guid initialization
* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define
* opensm: fix snprintf() usage
* opensm/osm_sa_lft_record: validate LFT block number
* opensm/osm_sa_lft_record: pass block parameter in host byte order
* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST
* opensm/osm_sa_class_port_info.c: fix out of bounds array access
* osmtest/osmt_service.c: fix out of bounds array access
* osmtest: fix qpn encoding in osmtest_informinfo_request()
* opensm/osm_vendor_mlx_sa.c: handle attribute offset of 0
* opensm: fix segfault corner case when osm_console_init fails
* opensm/console: close console socket on cleanup path
* opensm/osm_ucast_lash: fix buffer overflow
* opensm: fix broken IPv6 SNM consolidation code
* opensm/osm_sa_lft_record.c: fix block number encoding byte order
* opensm/osm_sa: fix memory leak in SA responder
* opensm/osm_mcast_mgr: fix memory leak
* opensm: fix QoS config parsing bugs
* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid
* opensm: log_max_size config parameter in MB
* opensm/osm_ucast_lash: fix extra memory allocations
* opensm: fix race in main OpenSM flow
* opensm/ftree: fix GUID check against cn_guid_file
* opensm/ftree: save FLT buffers memory allocations
* opensm/osm_sa_link_record.c: prevent potential endless recursion
* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is
* opensm: fix QoS config bug
* opensm: don't reassign zeroed params from config file
* Other less critical or visible bugs were also fixed.

5 Main Verification Flows
-------------------------

OpenSM verification is run using the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
simulate clusters, inject errors and verify OpenSM's capability to
respond and bring up the network correctly.
* small cluster regression testing - where the SM is used on back-to-back
or single switch configurations. The regression includes
multiple OpenSM dedicated tests.
* cluster testing - when we run OpenSM to set up a large cluster, perform
hand-off, reboots and reconnects, verify routing correctness and SA
responsiveness at the ULP level (IPoIB and SDP).

osmtest is an automated verification tool used for OpenSM
testing. Its verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path

* Service Record:
- Register new service
- Register another service (with a lease period)
- Register another service (with service p_key set to zero)
- Get all services by name
- Delete the first service
- Delete the third service
- Add bad flows of get/delete of a non-valid service
- Add / Get same service with different data
- Add / Get / Delete by different component mask values (services
by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
- Query of existing Groups (IPoIB)
- BAD Join with insufficient comp mask (o15.0.1.3)
- Create given MGID=0 (o15.0.1.4)
- Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
- Create BAD MGID=0xFA. (o15.0.1.6)
- Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
- New MGID with invalid join state (o15.0.1.9)
- Retry of existing MGID - See JoinState update (o15.0.1.11)
- BAD RATE when connecting to existing MGID (o15.0.1.13)
- Partial JoinState delete request - removing FullMember (o15.0.1.14)
- Full Delete of a group (o15.0.1.14)
- Verify Delete by trying to Join deleted group (o15.0.1.14)
- BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)

* GUIDInfoRecord:
- All GUIDInfoRecords in subnet are obtained

* MultiPathRecord:
- Perform some compliant and noncompliant MultiPathRecord requests
- Validation is via status in responses and IB analyzer

* PKeyTableRecord:
- Perform some compliant and noncompliant PKeyTableRecord queries
- Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
- Perform some compliant and noncompliant LinearForwardingTableRecord queries
- Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
- Send a trap and wait for report
- Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP, and
check the rate of responses as well as their validity.

5.2 IB Management Simulator OpenSM Test Flows

The simulator provides the ability to simulate SM handling of virtual
topologies that are not limited by actual lab equipment availability.
OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

Up to 12 links from the fabric are randomly selected to drop packets
at drop rates up to 90%. The SM is required to succeed in bringing the
fabric up. The resulting routing is verified to be correct as well.

Using LMC = 2, the fabric is initialized with LIDs. Faults such as
zero LIDs, duplicated LIDs and non-aligned (to LMC) LIDs are
randomly assigned to various nodes, and other errors are randomly
output to the guid2lid cache file. The SM sweep is run 5 times, and
after each iteration a complete verification is made to ensure that all
LIDs that could possibly be maintained are kept, and that all nodes
were assigned a legal LID range.
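
The LID faults injected in this flow (zero, duplicated and non-aligned LIDs) correspond to simple arithmetic checks. Below is a hypothetical C sketch of those checks, not the simulator's or OpenSM's verification code: with LMC = 2 a port owns 2^2 = 4 consecutive LIDs, so its base LID must be nonzero and a multiple of 4, and the whole range must stay within the unicast LID space 0x0001-0xBFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UCAST_LID_MIN 0x0001u  /* LID 0 is reserved */
#define UCAST_LID_MAX 0xBFFFu  /* 0xC000 and above are multicast LIDs */

/* A port with a given LMC owns 2^LMC consecutive LIDs starting at its base
 * LID. The range is legal if the base LID is nonzero, aligned to 2^LMC,
 * and the whole range stays inside the unicast LID space. */
static bool lid_range_is_legal(uint16_t base_lid, uint8_t lmc)
{
    assert(lmc <= 7);                 /* LMC is a 3-bit field */
    uint32_t count = 1u << lmc;

    if (base_lid < UCAST_LID_MIN)
        return false;                 /* zero LID */
    if ((base_lid & (count - 1)) != 0)
        return false;                 /* not aligned to LMC */
    return (uint32_t)base_lid + count - 1 <= UCAST_LID_MAX;
}

/* Two ports using the same LMC have duplicated (overlapping) LIDs if
 * their LID ranges intersect. */
static bool lid_ranges_overlap(uint16_t a, uint16_t b, uint8_t lmc)
{
    uint32_t count = 1u << lmc;
    return (uint32_t)a < (uint32_t)b + count &&
           (uint32_t)b < (uint32_t)a + count;
}
```

With LMC = 2, base LID 4 is legal while 0 (zero LID) and 5 (non-aligned) are not, and ports based at 4 and 6 would share LIDs 6 and 7.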

Nodes randomly join the 0xc000 group, and eventually the
resulting routing is verified for completeness and adherence to
Up/Down routing rules.

The complete osmtest flow as described in the previous section is run on
the simulated fabrics.

This flow merges fabric, LID and stability issues with continuous
PathRecord, ServiceRecord and Multicast Join/Leave activity to
stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
were added to the test such that both existing and non-existing nodes
perform them in random order.

5.3 OpenSM Regression

Using a back-to-back or single switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included
tests are:

* Stress Testing: Flood the SA with queries from multiple channel
adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, through randomly
dropping SMP packets, used to test OpenSM adaptation to an unstable
network and to verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
possible single component masks. To do that, the test examines the
entire set of records the SA can provide, classifies them by their
field values, then selects every field (using a component mask and a
value) and verifies that the response matches the expected set of
records. A random selection using multiple component mask bits is also
performed.
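
The single component mask check in the SA Query Test can be sketched as follows. The record layout (three integer fields, mask bit i selecting field i), the sa_query_fn callback type and both stub queries are hypothetical; real SA records and component masks are defined by the IBA specification. The reference stub reuses the same matcher, so it passes by construction, while the stub that ignores the mask shows the kind of bug the check catches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFIELDS  3   /* hypothetical: fields per record */
#define MAX_RECS 16  /* hypothetical: capacity of the test database */

/* Hypothetical record; component mask bit i selects fields[i]. */
typedef struct {
    uint32_t fields[NFIELDS];
} record;

/* A query writes the indices of matching records into out[] and returns
 * how many it wrote. */
typedef size_t (*sa_query_fn)(const record *db, size_t n, uint32_t mask,
                              const record *tmpl, size_t *out);

/* Every field selected by mask must equal the template's value. */
static bool rec_matches(const record *r, uint32_t mask, const record *tmpl)
{
    for (int f = 0; f < NFIELDS; f++)
        if ((mask & (1u << f)) && r->fields[f] != tmpl->fields[f])
            return false;
    return true;
}

/* Reference stub: returns exactly the matching records. */
static size_t good_query(const record *db, size_t n, uint32_t mask,
                         const record *tmpl, size_t *out)
{
    size_t cnt = 0;
    for (size_t i = 0; i < n; i++)
        if (rec_matches(&db[i], mask, tmpl))
            out[cnt++] = i;
    return cnt;
}

/* Buggy stub: ignores the component mask and returns every record. */
static size_t ignore_mask_query(const record *db, size_t n, uint32_t mask,
                                const record *tmpl, size_t *out)
{
    (void)db; (void)mask; (void)tmpl;
    for (size_t i = 0; i < n; i++)
        out[i] = i;
    return n;
}

/* For every field, and every value that field takes in db, issue a query
 * with a single-bit component mask and check that the response is exactly
 * the expected set of records. */
static bool verify_single_masks(const record *db, size_t n, sa_query_fn query)
{
    size_t got[MAX_RECS];

    assert(n <= MAX_RECS);
    for (int f = 0; f < NFIELDS; f++) {
        uint32_t mask = 1u << f;
        for (size_t v = 0; v < n; v++) {   /* db[v] supplies the value */
            size_t ngot = query(db, n, mask, &db[v], got);
            size_t nexp = 0;

            for (size_t i = 0; i < n; i++) {
                if (!rec_matches(&db[i], mask, &db[v]))
                    continue;
                nexp++;
                bool found = false;        /* expected record in response? */
                for (size_t k = 0; k < ngot; k++)
                    if (got[k] == i)
                        found = true;
                if (!found)
                    return false;
            }
            if (ngot != nexp)              /* no extra records allowed */
                return false;
        }
    }
    return true;
}
```

Requiring every expected record to appear and the response size to equal the expected count rules out both missing and extra records.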

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through
the IB interface. The test procedure includes:

* Hand-off between 2 or 3 SMs while performing:
- Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery

6 Qualified Software Stacks and Devices
---------------------------------------

Note that OpenSM version 3.2.1 and earlier used a value of 1 in host
byte order for the default SM_Key, so there is a compatibility issue
with these earlier versions of OpenSM when the 3.2.2 or later version
is running on a little endian machine. This affects SM handover as well
as SA queries (the saquery tool in infiniband-diags).
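
The SM_Key compatibility note above comes down to byte order: a key that travels in network (big-endian) byte order but is filled with a host-order value of 1 is serialized differently on a little-endian machine. This C sketch compares the two serializations; the function names are hypothetical and are not part of OpenSM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Wire bytes when a 64-bit key is copied straight from host memory, i.e.
 * the value is kept in host byte order (the pre-3.2.2 default SM_Key). */
static void key_bytes_host_order(uint64_t key, uint8_t out[8])
{
    memcpy(out, &key, sizeof(key));
}

/* Wire bytes in network (big-endian) byte order, the same on any CPU. */
static void key_bytes_network_order(uint64_t key, uint8_t out[8])
{
    assert(out != NULL);
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(key & 0xff);
        key >>= 8;
    }
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host (e.g. x86), the host-order copy of 1 puts 0x01 in the first byte on the wire while the network-order encoding puts it in the last byte, so the two OpenSM versions see different default keys; on a big-endian host both encodings are identical.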

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Device                              | FW versions
------------------------------------|-------------------------------
InfiniScale                         | fw-43132 5.2.000 (and later)
InfiniScale III                     | fw-47396 0.5.000 (and later)
InfiniScale IV                      | fw-48436 7.1.000 (and later)
InfiniHost                          | fw-23108 3.5.000 (and later)
InfiniHost III Lx                   | fw-25204 1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218 5.3.000 (and later)
ConnectX IB                         | fw-25408 2.3.000 (and later)

Device  | Model
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note 1: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
QP0 and QP1. However, it does support it as a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS is supported by ConnectX. The QoS-enabled FW release is
2_5_000 and later.

Switches: QoS is supported by InfiniScale III.
Any InfiniScale III FW that is supported by OpenSM supports QoS.