OpenSM Release Notes 3.1.11
===========================

Version: OpenFabrics Enterprise Distribution (OFED) 1.3
Repo:    git://git.openfabrics.org/~ofed_1_3/management.git (release)
         git://git.openfabrics.org/~sashak/management.git (development)

1 Overview
----------

This document describes the contents of the OpenSM OFED 1.3 release.
OpenSM is an InfiniBand compliant Subnet Manager and Administration,
and runs on top of OpenIB. The OpenSM version for this release is
3.1.11.

This document includes the following sections:
1 This Overview section (describing new features and software
  dependencies)
2 Known Issues And Limitations
3 Unsupported IB Compliance Statements
4 Bug Fixes
5 Main Verification Flows
6 Qualified Software Stacks and Devices

1.1 Major New Features

* QoS manager (experimental)
  This QoS manager implementation follows the IBA QoS Annex. A highly
  configurable QoS policy is parsed from the OpenSM QoS policy file.
  Valid QoS parameters are reported in SA PathRecord and
  MultiPathRecord responses. In addition, a simple per-ULP QoS level
  configuration is supported.

* Performance Manager
  When enabled, it collects fabric port counters and can log them or
  pass them to an external program through the event plugin interface.
  It handles counter overflow, supports LID/QP redirection, and works
  when OpenSM is in the master, standby, or inactive state.
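
The counter overflow handling mentioned above can be illustrated with a small sketch. It assumes that a port counter stops at its maximum value (all ones) instead of wrapping. The name `counter_delta` and the 32-bit width are hypothetical choices for illustration. This is not OpenSM's Performance Manager code.

```python
# Hypothetical sketch of performance-manager-style counter overflow handling.
# Assumption: port counters saturate (stick at all ones) rather than wrap.
COUNTER_MAX_32 = 0xFFFFFFFF  # maximum value of a 32-bit counter


def counter_delta(previous: int, current: int,
                  max_value: int = COUNTER_MAX_32) -> tuple[int, bool]:
    """Return (delta, needs_clear) for two successive counter readings.

    When the counter has saturated, delta is only a lower bound and
    needs_clear tells the caller to reset the counter on the port.
    """
    if current < previous:
        # The counter was cleared (e.g. by a port reset) since the last
        # sample, so only the value accumulated after the clear is known.
        return current, False
    return current - previous, current >= max_value
```

For example, `counter_delta(100, 250)` returns `(150, False)`. A reading of `0xFFFFFFFF` sets `needs_clear` to `True`, which signals that later deltas would be meaningless until the counter is cleared.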

* Dimension Order routing (DOR) algorithm
  The DOR unicast routing algorithm is based on the Min Hop algorithm,
  but avoids port equalization except for redundant links between the
  same two switches. This provides deadlock-free routes for hypercubes
  when the fabric is cabled as a hypercube, and for meshes when it is
  cabled as a mesh (see the OpenSM man page for details).
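
The following sketch shows why a fixed dimension order matters. It routes on an n-dimensional hypercube whose switches are numbered so that neighbors differ in exactly one address bit. The sketch always corrects the lowest differing dimension first, so every route crosses dimensions in the same increasing order. On a hypercube, that fixed order rules out cyclic channel dependencies.

This is a generic illustration of dimension-order routing, not OpenSM's DOR engine. As described above, OpenSM derives its routes from Min Hop port selection on a suitably cabled fabric. The function name `dor_hypercube_path` is hypothetical.

```python
def dor_hypercube_path(src: int, dst: int, dimensions: int) -> list[int]:
    """Return the switches visited from src to dst, fixing address bits
    strictly from the lowest dimension to the highest (e-cube order)."""
    size = 1 << dimensions
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("switch id outside the hypercube")
    path = [src]
    node = src
    for dim in range(dimensions):
        bit = 1 << dim
        if (node ^ dst) & bit:
            node ^= bit  # cross the link along dimension `dim`
            path.append(node)
    return path
```

For example, `dor_hypercube_path(6, 1, 3)` visits `[6, 7, 5, 1]`, correcting dimensions 0, 1, and 2 in that order. The hop count equals the number of differing address bits, so each route is also a minimum-hop route.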

* Routing improvements
  The existing routing algorithms (the default Min Hop, Up/Down, and
  LASH) and LID matrix generation were sped up. The fat-tree routing
  engine can now work with topologies that are not pure fat trees.

* Multiple IB routers support
  OpenSM can now maintain a configurable table that maps subnet
  prefixes to routers. The SA will report paths to these routers when
  an SA PathRecord query is issued with

Nodes can be named in this config file. These names are used for
logging and by the QoS configuration.

Proper support for PKey index in GSI queries.

* Incremental LFT, PKey, SL2VL, and VLArbitration table updates
  OpenSM fetches these tables only during the first heavy sweep and
  then maintains them internally.
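
Incremental updates help because a switch's linear forwarding table is written in blocks of 64 entries, one LinearForwardingTable SMP per block. As a result, only the blocks that actually changed need to be sent. The sketch below compares a cached table with a newly computed one and returns the indices of the differing blocks.

The function name and the use of 0xFF as a padding "no route" port value are assumptions for illustration. They are not taken from OpenSM internals.

```python
LFT_BLOCK_SIZE = 64  # LFT entries carried by one LinearForwardingTable block
NO_ROUTE = 0xFF      # assumed "no route" port value, used only for padding


def changed_lft_blocks(cached: list[int], new: list[int]) -> list[int]:
    """Return indices of LFT blocks whose port entries differ.

    cached: table last fetched or written by the SM (index = LID).
    new:    freshly computed table (index = LID).
    """
    length = max(len(cached), len(new))
    # Pad the shorter table so both cover the same LID range.
    old = cached + [NO_ROUTE] * (length - len(cached))
    cur = new + [NO_ROUTE] * (length - len(new))
    changed = []
    for start in range(0, length, LFT_BLOCK_SIZE):
        end = start + LFT_BLOCK_SIZE
        if old[start:end] != cur[start:end]:
            changed.append(start // LFT_BLOCK_SIZE)
    return changed
```

Suppose only the route for LID 70 changes in a 128-entry table. In that case `changed_lft_blocks` returns `[1]`, so a single block is rewritten instead of both.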

* Fast port and switch detector
  When a port or switch is reset externally so quickly that the sweep
  does not see the device as disconnected, OpenSM detects the reset
  from the changed port states and handles it accordingly.

* Duplicated GUIDs/port moving detector
  OpenSM detects ports that move during fabric discovery and does not
  report duplicated GUIDs in this case.

* Multicast rerouting speedup
  OpenSM now calculates and sets up multicast forwarding tables for all
  altered multicast groups together, rather than separately for each
  one.
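
The difference between per-request and batched rerouting can be sketched as follows. Repeated join/leave events are first collapsed into a set of altered groups. Each altered group's forwarding tables are then rebuilt once.

`process_mcast_changes` and the `rebuild_group` callback are hypothetical names. The sketch only shows the batching idea, not OpenSM's multicast manager.

```python
from typing import Callable, Iterable


def process_mcast_changes(events: Iterable[int],
                          rebuild_group: Callable[[int], None]) -> int:
    """Rebuild multicast forwarding state once per altered group.

    events: MLIDs of groups touched by join/leave requests, one per request.
    rebuild_group: recomputes and programs the tables for one MLID.
    Returns the number of rebuilds performed.
    """
    altered = set(events)          # many requests for one group -> one entry
    for mlid in sorted(altered):   # deterministic processing order
        rebuild_group(mlid)
    return len(altered)
```

With the events `[0xC000, 0xC001, 0xC000, 0xC000]`, only two rebuilds run instead of four.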

OpenSM can dynamically load various plugin modules.

* Many generic improvements

1.2 Minor New Features

* Daemon mode can be activated with the -B option.

* Multiple scopes are supported for IPoIB multicast groups in the
  partition config.

* Loopback connection handling
  A loopback connection is no longer interpreted as a duplicated GUID.

* Connect root nodes option for the Up/Down routing engine
  When this option is specified, Up/Down will create routing paths between

* Dump and log filenames changed from osm* to opensm*.

* Loopback console support
  A socket console that accepts only local connections.

* Configurable config directory (default: /etc/opensm) and configurable
  default OpenSM config filenames.

* Option to force SDR link speed
  An option was added to opensm.opts to force the link speed. Currently,
  only forcing SDR link speed is supported. This option is not supported
  as a

Building and RPM packaging were improved and simplified.

* Handle "babbling" ports
  When a babbling port (a port that generates traps at a high rate) is
  detected, OpenSM disables the port, which should stop the trap flood.
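
One plausible way to decide that a port is "babbling" is to count its traps within a sliding time window, as the sketch below does. The class name, the thresholds (`max_traps`, `window`), and keying ports by (node GUID, port number) are all hypothetical choices. The criteria OpenSM actually uses are not described in these notes.

```python
from collections import defaultdict, deque


class BabblingPortDetector:
    """Flag a port once it reports more than max_traps traps in window seconds."""

    def __init__(self, max_traps: int = 10, window: float = 1.0):
        self.max_traps = max_traps
        self.window = window
        # (node GUID, port number) -> timestamps of recent traps
        self._recent = defaultdict(deque)

    def record_trap(self, guid: int, port_num: int, now: float) -> bool:
        """Record one trap at time `now`; return True if the port should be disabled."""
        times = self._recent[(guid, port_num)]
        times.append(now)
        # Forget traps older than the sliding window (the newest always stays).
        while now - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_traps
```

Consider `max_traps=3` with a one-second window. The fourth trap from the same port within about 0.3 seconds makes `record_trap` return `True`. A port that sends one trap every two seconds is never flagged.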

1.3 Library API Changes

1.4 Software Dependencies

OpenSM depends on the installation of one of the following: OFED 1.3,
OFED 1.2, OFED 1.1, OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution),
OpenIB gen1 (e.g. IBGD distribution), or the Mellanox VAPI stack. The
qualified driver versions are provided in Table 2, "Qualified IB Stacks".

Building the QoS manager policy file parser also requires flex and
either bison or byacc.

1.5 Supported Devices Firmware

The main task of OpenSM is to initialize InfiniBand devices. The
qualified devices and their corresponding firmware versions are listed
in Table 3.

2 Known Issues And Limitations
------------------------------

* No Service / Key associations:
  There is no way to manage service access by keys.

* No SM to SM SMDB synchronization:
  This puts the burden of re-registering services, multicast groups, and
  inform-info on the client application (or the IB access layer core).

3 Unsupported IB Compliance Statements
--------------------------------------
This section lists all the IB compliance statements that OpenSM does
not support. Refer to the IB specification for detailed information
on each compliance statement.

* C14-22 (Authentication):
  M_Key, M_KeyProtectBits, and M_KeyLeasePeriod shall be set in one
  SubnSet method. As a workaround, an OpenSM option is provided for
  defining the protect bits.

* C14-67 (Authentication):
  On SubnGet(SMInfo) and SubnSet(SMInfo), if M_Key is not zero, the SM
  shall generate a SubnGetResp if the M_Key matches, or silently drop
  the packet if the M_Key does not match.

* C15-0.1.23.4 (Authentication):
  InformInfoRecords shall always be provided with the QPN set to 0,
  except for the case of a trusted request, in which case the actual
  subscriber QPN shall be returned.

* o13-17.1.2 (Event-FWD):
  If there is no permission to forward, the subscription should be
  removed and no further forwarding should occur.

* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
  GUIDInfo - the SM should enable assigning Port GUIDInfo.

* C14-44 (Initialization):
  If the SM discovers that it is missing an M_Key to update CA/RT/SW,
  it should notify the higher level.

* C14-62.1.1.12 (Initialization):
  PortInfo:M_Key - set the M_Key to a node-based random value.

* C14-62.1.1.13 (Initialization):
  PortInfo:P_KeyProtectBits - set according to an optional policy.

* C14-62.1.1.24 (Initialization):
  SwitchInfo:DefaultPort - should be configured for random FDB.

* C14-62.1.1.32 (Initialization):
  RandomForwardingTable should be configured.

* o15-0.1.12 (Multicast):
  If the JoinState is SendOnlyNonMember = 1 (only), then the endport
  should join as sender only.

* o15-0.1.8 (Multicast):
  If a request to create an MCG specifies fields that cannot be met,
  return ERR_REQ_INVALID (currently SL and FlowLabelTClass are
  ignored).

* C15-0.1.8.6 (SA-Query):
  Respond to SubnAdmGetTraceTable - this is an optional attribute.

* C15-0.1.13 (Services):
  Reject ServiceRecord create, modify, or delete if the given
  ServiceP_Key does not match the one included in the ServiceGID port
  and the port that sent the request.

* C15-0.1.14 (Services):
  Provide means to associate service name and ServiceKeys.

4 Bug Fixes
-----------

The following is a list of bugs that were fixed. Note that other, less
critical or less visible bugs were also fixed.

* osm_ucast_ftree.c: do load-leveling of non-CN routes

* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches

* lash: fix possible segfault in osm_get_lash_sl()

* osm_ucast_ftree.c: fix coredump in fat-tree routing

* osm_sa_slvl_record: fix overflow crash

* Break multicast rerouting requests processing when heavy sweep is

* updn: report fallback properly

* Fix incorrect identification of routing engine used

* Don't zero base LID when invalid value is received

* lash: fix wrong allocation size

* Fix broken logic in 'process world' part of LinkRecord processing

* Fix lmc_mask bit order in osm_sa_link_record.c

* Add missing comparison by to_lid/from_lid in LinkRecord processing

* Fix broken logic when scanning subnet for PIR request

* No interactive games in daemon mode

* Fix memory leak in node description

* Fix PortInfo update issues for switch port 0

* Changed method_mask type in user_mad interface in accordance with

* Use umad_get_issm_path() in osm_vendor_set_sm()

* Fix usage of uninitialized variables

* osm_ucast_ftree.c: fix possible NULL pointer segfault

* osm_mcast_mgr.c: fix possible NULL pointer segfault

* TrapRepress was failing for mkey != 0

* IB_PR_COMPMASK was used in MPR

* Set hop limit when creating IPoIB multicast groups

* Fix outstanding MAD counters tracking on the error paths.

* Report new ports before handing over mastership

* Fix opvls and neighbormtu when remote port is invalid.

* Bug in code trying to set vl_arb_high_limit when PortInfo.base_lid

* Protect SMInfo response against port moving issue.

5 Main Verification Flows
-------------------------

OpenSM verification consists of the following activities:
* osmtest - a stand-alone program
* ibmgtsim (IB management simulator) based - a set of flows that
  simulate clusters, inject errors, and verify OpenSM's ability to
  respond and bring up the network correctly.
* small cluster regression testing - the SM is used on back-to-back
  or single-switch configurations. The regression includes multiple
  OpenSM dedicated tests.
* cluster testing - OpenSM is used to set up a large cluster; hand-offs,
  reboots, and reconnects are performed, and routing correctness and SA
  responsiveness are verified at the ULP level (IPoIB and SDP).

osmtest is an automated verification tool used for OpenSM testing. Its
verification flows are described in the list below.

* Inventory File: Obtain and verify all port info, node info, link and path

  - Register new service
  - Register another service (with a lease period)
  - Register another service (with service p_key set to zero)
  - Get all services by name
  - Delete the first service
  - Delete the third service
  - Bad flows of get/delete of a non-valid service
  - Add / Get same service with different data
  - Add / Get / Delete by different component mask values (services
    by Name & Key / Name & Data / Name & Id / Id only)

* Multicast Member Record:
  - Query of existing Groups (IPoIB)
  - BAD Join with insufficient comp mask (o15.0.1.3)
  - Create given MGID=0 (o15.0.1.4)
  - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
  - Create BAD MGID=0xFA (o15.0.1.6)
  - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
  - New MGID with invalid join state (o15.0.1.9)
  - Retry of existing MGID - See JoinState update (o15.0.1.11)
  - BAD RATE when connecting to existing MGID (o15.0.1.13)
  - Partial JoinState delete request - removing FullMember (o15.0.1.14)
  - Full Delete of a group (o15.0.1.14)
  - Verify Delete by trying to Join deleted group (o15.0.1.14)
  - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
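
Several of the cases above depend on two encodings from the IBA multicast definitions:

- A multicast GID starts with 0xFF. Its second byte packs four flag bits and a four-bit scope, where 0x2 is link-local.
- The MCMemberRecord JoinState uses bit 0 for FullMember, bit 1 for NonMember, and bit 2 for SendOnlyNonMember.

The sketch below decodes the MGID prefix and applies a partial JoinState delete. The helper names are hypothetical, and this is neither osmtest nor OpenSM code.

```python
FULL_MEMBER = 0x1
NON_MEMBER = 0x2
SEND_ONLY_NON_MEMBER = 0x4
LINK_LOCAL_SCOPE = 0x2


def parse_mgid_prefix(mgid: bytes) -> tuple[int, int]:
    """Return (flags, scope) decoded from a 16-byte multicast GID."""
    if len(mgid) != 16 or mgid[0] != 0xFF:
        raise ValueError("not a multicast GID")
    return mgid[1] >> 4, mgid[1] & 0x0F


def remove_join_state(current: int, to_remove: int) -> int:
    """Clear the requested JoinState bits; 0 means the port left the group."""
    return current & ~to_remove & 0x7
```

Take the MGID 0xFF12A01C,FE800000,00000000,12345678 used above: `parse_mgid_prefix` yields flags 0x1 and link-local scope 0x2. An MGID starting with 0xFA is rejected. Removing FullMember from a FullMember | SendOnlyNonMember membership leaves only SendOnlyNonMember.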

  - All GUIDInfoRecords in subnet are obtained

  - Perform some compliant and noncompliant MultiPathRecord requests
  - Validation is via status in responses and IB analyzer

  - Perform some compliant and noncompliant PKeyTableRecord queries
  - Validation is via status in responses and IB analyzer

* LinearForwardingTableRecord:
  - Perform some compliant and noncompliant LinearForwardingTableRecord
    queries
  - Validation is via status in responses and IB analyzer

* Event Forwarding: Register for trap forwarding using reports
  - Send a trap and wait for report
  - Unregister non-existing

* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
  disconnecting/connecting ports) and wait for report, then unregister.

* Stress Test: send PortInfoRecord queries, both single and RMPP, and
  check the rate of responses as well as their validity.

5.2 IB Management Simulator OpenSM Test Flows

The simulator makes it possible to exercise the SM on virtual
topologies that are not limited by the availability of lab equipment.
OpenSM was simulated bringing up clusters of up to 10,000 nodes. Daily
regressions use smaller clusters (16 and 128 nodes).

The following test flows are run on the IB management simulator:

Up to 12 links from the fabric are randomly selected to drop packets
at drop rates up to 90%. The SM is required to succeed in bringing the
fabric up. The resulting routing is verified to be correct as well.

Using LMC = 2, the fabric is initialized with LIDs. Faults such as zero
LIDs, duplicated LIDs, and LIDs not aligned to the LMC are randomly
assigned to various nodes, and other errors are randomly written to the
guid2lid cache file. The SM sweep is run 5 times. After each iteration,
a complete verification ensures that all LIDs that could possibly be
maintained are kept, and that all nodes were assigned a legal LID
range.
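
The LID checks in this flow follow from how LMC works. A port with LMC = n owns 2^n consecutive LIDs, and its base LID must be a multiple of 2^n. With LMC = 2, each port therefore gets 4 LIDs starting at a multiple of 4.

The sketch below checks whether a base LID is legal for a given LMC and rounds a LID up to the next aligned base. It assumes the unicast LID range 0x0001-0xBFFF. The helper names are hypothetical.

```python
UNICAST_LID_MIN = 0x0001
UNICAST_LID_MAX = 0xBFFF  # 0xC000 and above are multicast LIDs


def lid_range_is_legal(base_lid: int, lmc: int) -> bool:
    """True if base_lid starts an aligned, in-range block of 2**lmc LIDs."""
    count = 1 << lmc
    return (base_lid >= UNICAST_LID_MIN
            and base_lid % count == 0                 # aligned to the LMC
            and base_lid + count - 1 <= UNICAST_LID_MAX)


def next_aligned_base(lid: int, lmc: int) -> int:
    """Round lid up to the next base LID aligned for this LMC (never LID 0)."""
    count = 1 << lmc
    return max(count, (lid + count - 1) // count * count)
```

For example, `lid_range_is_legal(6, 2)` is `False` because 6 is not a multiple of 4. `next_aligned_base(5, 2)` returns 8, the nearest legal base at or above 5.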

Nodes randomly join the 0xc000 group, and the resulting routing is
eventually verified for completeness and adherence to Up/Down routing
rules.

The complete osmtest flow as described above is run on the simulated
fabrics.

This flow merges fabric, LID, and stability issues with continuous
PathRecord, ServiceRecord, and Multicast Join/Leave activity to stress
the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
operations were added to the test so that both existing and
nonexistent nodes perform them in random order.

5.3 OpenSM Regression

Using a back-to-back or single-switch connection, the following set of
tests is run nightly on the stacks described in Table 2. The included

* Stress Testing: Flood the SA with queries from multiple channel
  adapters to check the robustness of the entire stack up to the SA.

* Dynamic Changes: Dynamic topology changes, produced by randomly
  dropping SMP packets, are used to test OpenSM's adaptation to an
  unstable network and to verify DB correctness.

* Trap Injection: This flow injects traps to the SM and verifies that it
  handles them gracefully.

* SA Query Test: This test exhaustively checks the SA responses to all
  possible single-field component masks. To do that, the test examines
  the entire set of records the SA can provide and classifies them by
  their field values. It then selects every field (using a component
  mask and a value) and verifies that the response matches the expected
  set of records. A random selection using multiple component mask bits
  is also performed.
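
The expected answers in the SA Query Test can be derived directly from the record set. Each component mask bit selects one record field, and a record matches when every selected field equals the value in the query template. The sketch below computes the expected set for one query and enumerates all single-bit queries.

The field names and their bit positions are hypothetical. They do not reproduce the component mask layout of any real SA attribute.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

Record = Dict[str, Any]


def select_records(records: Sequence[Record], component_mask: int,
                   template: Record, field_order: Sequence[str]) -> List[Record]:
    """Return records that match template on every field selected by the mask.

    field_order[i] names the field controlled by component mask bit i.
    """
    selected = [name for bit, name in enumerate(field_order)
                if component_mask & (1 << bit)]
    return [r for r in records
            if all(r[name] == template[name] for name in selected)]


def single_bit_expectations(records: Sequence[Record],
                            field_order: Sequence[str]
                            ) -> Iterator[Tuple[int, Record, List[Record]]]:
    """Yield (mask, template, expected records) for every field and value."""
    for bit, name in enumerate(field_order):
        for value in sorted({r[name] for r in records}):
            template = {name: value}
            yield 1 << bit, template, select_records(
                records, 1 << bit, template, field_order)
```

An SA response to the same mask and template can then be compared against the yielded expected list. A component mask of 0 selects every record.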

Cluster testing is usually run before a distribution release. It
involves real hardware setups of 16 to 32 nodes (or more if a beta site
is available). Each test is validated by running all-to-all ping through
the IB interface. The test procedure includes:

* Hand-off between 2 or 3 SMs while performing:

  - Switch power cycles (disconnecting the SMs)

* Unresponsive port detection and recovery

* osmtest from multiple nodes

* Trap injection and recovery

6 Qualified Software Stacks and Devices
---------------------------------------

Table 2 - Qualified IB Stacks
=============================

Stack                                    | Version
-----------------------------------------|--------------------------
OpenIB Gen2 (IBG2 distribution)          | 1.0
OpenIB Gen1 (IBGD distribution)          | 1.8.0
VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later

Table 3 - Qualified Devices and Corresponding Firmware
======================================================

Device                              | FW Version
------------------------------------|-------------------------------
InfiniScale                         | fw-43132 5.2.000 (and later)
InfiniScale III                     | fw-47396 0.5.000 (and later)
InfiniHost                          | fw-23108 3.5.000 (and later)
InfiniHost III Lx                   | fw-25204 1.2.000 (and later)
InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later)
InfiniHost III Ex (MemFree Mode)    | fw-25218 5.3.000 (and later)
ConnectX IB                         | fw-25408 2.3.000 (and later)

Device  | Model
--------|-----------------------------------------------------------
iPath   | QHT6040 (PathScale InfiniPath HT-460)
iPath   | QHT6140 (PathScale InfiniPath HT-465)
iPath   | QLE6140 (PathScale InfiniPath PE-880)

Note 1: OpenSM does not run on an IBM Galaxy (eHCA), because the eHCA
does not expose QP0 and QP1. However, OpenSM does support the eHCA as
a device on the subnet.

Note 2: QoS firmware and Mellanox devices

HCAs: QoS is supported by ConnectX. The current FW release does not
      support QoS. A QoS-enabled FW release (2_5_000) is planned for
      May. Anyone who needs QoS-enabled FW before the official release
      should contact a Mellanox FAE.

Switches: QoS is supported by InfiniScale III. Any InfiniScale III FW
          that is supported by OpenSM supports QoS.