1 OpenSM Release Notes 3.0.13
2 =============================
4 Version: OpenFabrics Enterprise Distribution (OFED) 1.2
5 Repo: git://git.openfabrics.org/~ofed_1_2/management.git (release)
6 git://git.openfabrics.org/~halr/management.git (development)
11 This document describes the contents of the OpenSM OFED 1.2 release.
12 OpenSM is an InfiniBand compliant Subnet Manager and Administration,
13 and runs on top of OpenIB. The OpenSM version for this release
16 This document includes the following sections:
17 1 This Overview section (describing new features and software
19 2 Known Issues And Limitations
20 3 Unsupported IB compliance statements
22 5 Main Verification Flows
23 6 Qualified software stacks and devices
25 1.1 Major New Features
27 * Routing improvements
28 Two routing algorithms have been added, along with performance
29 improvements to the existing routing algorithms. The two new
30 routing algorithms are FAT tree and LASH. See the opensm man page
31 for additional details.
33 * SA Optional Record support now "virtually" complete
34 Includes SA InformInfo improvements and InformInfoRecord support in
35 addition to support for the remaining SA optional records
36 (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord
37 support was improved to include all SMs found.
39 * SA database dump/restore
40 OpenSM now includes the ability to dump and restore the SA database.
41 This allows for all SA registrations (multicast, services, and events)
42 to be saved and restored later.
44 In verbose mode, OpenSM will dump the SA DB (existing multicast groups,
45 services and InformInfo) into a dump file named "opensm-sa.dump",
46 located under the standard OpenSM dump directory (/var/log by default).
48 If the -S option is specified and an SA DB dump file name is provided,
49 OpenSM will try to restore the SA database from this file. If the restore
50 is successful, OpenSM won't ask for client reregistration at subnet bring-up.
52 * Modular routing for multicast
53 In conjunction with SA database dump/restore, there is the ability to
54 dump and load switch lid matrices (min hops tables) which are used
55 for multicast route calculation.
57 * IB router enablement
58 OpenSM now supports router ports properly (in terms of PortInfo handling).
59 There is also some experimental support for IB routers which is enabled
60 via the ROUTER_EXP compile flag. This support includes SA PathRecord and
61 MCMemberRecord support for off subnet GIDs.
63 * Socket support added to console
64 OpenSM console now supports remote in addition to local access.
65 Remote access is currently via telnet.
67 1.2 Minor New Features:
69 * Change output format of DR path from hex to decimal port numbers
72 The OpenSM log can now be rotated while OpenSM is running (without
73 stopping and restarting OpenSM). This is accomplished via SIGUSR1.
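
The SIGUSR1 rotation described above follows a common daemon pattern: an external tool renames the log file, then signals the process so that it reopens the original path. The following is a minimal Python sketch of that pattern, not OpenSM's implementation (OpenSM is written in C); the class and function names are hypothetical.

```python
import os
import signal


class RotatableLog:
    """Append-only log that can reopen its file after an external rename."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a")

    def write(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()

    def reopen(self):
        # After the file has been renamed, the old handle still points at
        # the renamed file; reopening the path creates a fresh log file.
        self._fh.close()
        self._fh = open(self.path, "a")

    def close(self):
        self._fh.close()


def install_rotation_handler(log):
    """Reopen `log` whenever the process receives SIGUSR1 (Unix only)."""
    signal.signal(signal.SIGUSR1, lambda signum, frame: log.reopen())
```

With such a handler installed, rotation amounts to renaming the current log file and then sending SIGUSR1 to the running process (for example with `kill -USR1 <pid>`).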
75 * Support scope for IPoIB multicast groups in partition config
77 * Dump filename changed from subnet.lst to osm-subnet.lst
78 Default temp directory for non Windows platforms was previously changed
79 from /tmp to /var/log.
81 * Add option to force SDR link speed
82 Add option to opensm.opts to force link speed. Currently, only forcing
83 to SDR link speed is supported. This option is not supported as a
86 1.3 Library API Changes
90 1.4 Software Dependencies
92 OpenSM depends on the installation of either OFED 1.2, OFED 1.1,
93 OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
94 distribution), or Mellanox VAPI stacks. The qualified driver versions
95 are provided in Table 2, "Qualified IB Stacks".
97 1.5 Supported Devices Firmware
99 The main task of OpenSM is to initialize InfiniBand devices. The
100 qualified devices and their corresponding firmware versions
101 are listed in Table 3.
103 2 Known Issues And Limitations
104 ------------------------------
106 * No Service / Key associations:
107 There is no way to manage Service access by Keys.
109 * No SM to SM SMDB synchronization:
110 Puts the burden of re-registering services, multicast groups, and
111 inform-info on the client application (or IB access layer core).
113 * No "port down" event handling:
114 Changing the switch port through which OpenSM connects to the IB
115 fabric may cause incorrect operation. Please restart OpenSM whenever
116 such a connectivity change is made.
118 * Changing connections during SM operation:
119 Under some conditions the SM can get confused by a change in
120 cabling (moving a cable from one switch port to the other) and
121 momentarily see this as having the same GUID appear connected
122 to two different IB ports. Under some conditions, when the SM fails to
123 get the corresponding change event it might mistakenly report this case
124 as a "duplicated GUID" case and abort. It is advisable to double-check
125 the syslog after each such change in connectivity and restart
126 OpenSM if it has exited. The same error ("duplicated GUID") will
127 also appear with a loopback plug.
129 3 Unsupported IB Compliance Statements
130 --------------------------------------
131 The following section lists all the IB compliance statements which
132 OpenSM does not support. Please refer to the IB specification for detailed
133 information regarding each compliance statement.
135 * C14-22 (Authentication):
136 M_Key, M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
137 SubnSet method. As a work-around, an OpenSM option is provided for
138 defining the protect bits.
140 * C14-67 (Authentication):
141 On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
142 the SM shall generate a SubnGetResp if the M_Key matches, or
143 silently drop the packet if M_Key does not match.
145 * C15-0.1.23.4 (Authentication):
146 InformInfoRecords shall always be provided with the QPN set to 0,
147 except for the case of a trusted request, in which case the actual
148 subscriber QPN shall be returned.
150 * o13-17.1.2 (Event-FWD):
151 If no permission to forward, the subscription should be removed and
152 no further forwarding should occur.
154 * C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
155 GUIDInfo - SM should enable assigning Port GUIDInfo.
157 * C14-44 (Initialization):
158 If the SM discovers that it is missing an M_Key to update CA/RT/SW,
159 it should notify the higher level.
161 * C14-62.1.1.12 (Initialization):
162 PortInfo:M_Key - Set the M_Key to a node based random value.
164 * C14-62.1.1.13 (Initialization):
165 PortInfo:P_KeyProtectBits - set according to an optional policy.
167 * C14-62.1.1.24 (Initialization):
168 SwitchInfo:DefaultPort - should be configured for random FDB.
170 * C14-62.1.1.32 (Initialization):
171 RandomForwardingTable should be configured.
173 * o15-0.1.12 (Multicast):
174 If the JoinState is SendOnlyNonMember = 1 (only), then the endport
175 should join as sender only.
177 * o15-0.1.8 (Multicast):
178 If a request to create an MCG contains fields that cannot be met,
179 return ERR_REQ_INVALID (OpenSM currently ignores SL and FlowLabelTClass).
181 * C15-0.1.8.6 (SA-Query):
182 Respond to SubnAdmGetTraceTable - this is an optional attribute.
184 * C15-0.1.13 (Services):
185 Reject ServiceRecord create, modify or delete if the given
186 ServiceP_Key does not match the one included in the ServiceGID port
187 and the port that sent the request.
189 * C15-0.1.14 (Services):
190 Provide means to associate service name and ServiceKeys.
195 The following is a list of bugs that were fixed. Note that other less critical
196 or visible bugs were also fixed.
198 * osm_sminfo_rcv.c: Add SMInfo self query check. OpenSM can query
199 itself for SMInfo occasionally due to a port moving during the subnet
200 discovery process. Don't create remote SM entry in this case to
203 * osm_ucast_updn.c: Two similar bugs in up/down routing fixed.
204 8-bit integers were used as indexes when scanning subnet, which
205 in one case caused OpenSM to crash when ranking "path" is longer
206 than 256 switches, and in the other case, caused OpenSM to go into
207 an infinite loop when fabric has more than 256 roots.
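
The infinite-loop case above can be illustrated with a small simulation. This is a Python sketch that emulates an 8-bit index by masking with 0xFF; it is not the OpenSM code, and the function name and the `max_steps` guard are hypothetical additions so that the demonstration terminates.

```python
def count_with_u8_index(num_items, max_steps=1000):
    """Walk num_items entries using an index truncated to 8 bits.

    Returns the number of loop steps taken, or None if the loop is still
    running after max_steps (meaning it would never terminate).
    """
    i = 0
    steps = 0
    while i < num_items:
        i = (i + 1) & 0xFF  # emulate uint8_t wrap-around: 255 + 1 -> 0
        steps += 1
        if steps >= max_steps:
            return None
    return steps
```

For 200 items the loop finishes after 200 steps. For 300 items the index wraps from 255 back to 0 and can never reach the bound, which is why a wider index type is needed.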
209 * osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req,
210 handle master GUID port not found properly
212 * osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return
213 IB_NOT_FOUND rather than IB_ERROR when can't route to LID from switch
215 * osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND
216 rather than IB_ERROR when can't route to LID from switch
218 * osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to
221 * osm_vendor_ibumad: Termination crash fix
222 When OpenSM is terminated, the umad_receiver thread keeps running even
223 after the structures are destroyed and freed, which causes random (but
224 easily reproducible) crashes. The reason is that osm_vendor_delete() does
225 not take care of thread termination. This patch adds receiver thread
226 cancellation (using pthread_cancel() and pthread_join()) and takes care
227 to leave all mutexes unlocked upon termination. There is also a minor
228 termination code consolidation: the osm_vendor_port_close() function.
230 * osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port
232 * osm_matrix.h: Fix segfault with up/down and root nodes file
234 * osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit
236 * osm_vendor_ibumad.c: Close umad port in osm_vendor_delete
238 * osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues
239 with using MTU/rate/PktLife explicitly ignoring selectors
241 OpenSM just uses the resulting path MTU/rate/pkt-life and fails the
242 query even though the selector might be allowing for selecting an
245 After this fix, the following results are obtained for a case of
246 path allowing maximal 2K MTU.
249 ------------------------------------------------------------
250 MTU greater than ... 256 (0x01) -> equal to ....... 2K
251 MTU less than ...... 256 (0x41) -> NO PATHS
252 MTU equal to ....... 256 (0x81) -> equal to ....... 256
253 MTU largest possible 256 (0xc1) -> equal to ....... 2K
254 MTU greater than ... 512 (0x02) -> equal to ....... 2K
255 MTU less than ...... 512 (0x42) -> equal to ....... 256
256 MTU equal to ....... 512 (0x82) -> equal to ....... 512
257 MTU largest possible 512 (0xc2) -> equal to ....... 2K
258 MTU greater than ... 1K (0x03) -> equal to ....... 2K
259 MTU less than ...... 1K (0x43) -> equal to ....... 512
260 MTU equal to ....... 1K (0x83) -> equal to ....... 1K
261 MTU largest possible 1K (0xc3) -> equal to ....... 2K
262 MTU greater than ... 2K (0x04) -> NO PATHS
263 MTU less than ...... 2K (0x44) -> equal to ....... 1K
264 MTU equal to ....... 2K (0x84) -> equal to ....... 2K
265 MTU largest possible 2K (0xc4) -> equal to ....... 2K
266 MTU greater than ... 4K (0x05) -> NO PATHS
267 MTU less than ...... 4K (0x45) -> equal to ....... 2K
268 MTU equal to ....... 4K (0x85) -> NO PATHS
269 MTU largest possible 4K (0xc5) -> equal to ....... 2K
270 ============================================================
272 With enable_quirks (when one of the ends is a Tavor device):
273 ------------------------------------------------------------
274 MTU greater than ... 256 (0x01) -> equal to ....... 1K
275 MTU less than ...... 256 (0x41) -> NO PATHS
276 MTU equal to ....... 256 (0x81) -> equal to ....... 256
277 MTU largest possible 256 (0xc1) -> equal to ....... 2K
278 MTU greater than ... 512 (0x02) -> equal to ....... 1K
279 MTU less than ...... 512 (0x42) -> equal to ....... 256
280 MTU equal to ....... 512 (0x82) -> equal to ....... 512
281 MTU largest possible 512 (0xc2) -> equal to ....... 2K
282 MTU greater than ... 1K (0x03) -> NO PATHS
283 MTU less than ...... 1K (0x43) -> equal to ....... 512
284 MTU equal to ....... 1K (0x83) -> equal to ....... 1K
285 MTU largest possible 1K (0xc3) -> equal to ....... 2K
286 MTU greater than ... 2K (0x04) -> NO PATHS
287 MTU less than ...... 2K (0x44) -> equal to ....... 1K
288 MTU equal to ....... 2K (0x84) -> equal to ....... 2K
289 MTU largest possible 2K (0xc4) -> equal to ....... 2K
290 MTU greater than ... 4K (0x05) -> NO PATHS
291 MTU less than ...... 4K (0x45) -> equal to ....... 1K
292 MTU equal to ....... 4K (0x85) -> NO PATHS
293 MTU largest possible 4K (0xc5) -> equal to ....... 2K
294 ============================================================
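
The selector behavior in the first table above (a path allowing at most 2K MTU) can be reproduced with a few lines of logic. The sketch below is written in Python for illustration and is not taken from the OpenSM sources; the function names are hypothetical. It assumes the byte shown in parentheses in the table carries the selector in its top two bits and the MTU code (1 = 256 through 5 = 4K) in its low six bits.

```python
MTU_NAMES = {1: "256", 2: "512", 3: "1K", 4: "2K", 5: "4K"}


def resolve_mtu(field, path_mtu):
    """Return the MTU code granted for a request, or None for "NO PATHS".

    field    -- selector/MTU byte as printed in the table (e.g. 0x41)
    path_mtu -- largest MTU code the path supports (4 means 2K)
    """
    selector = (field >> 6) & 0x3   # top two bits
    requested = field & 0x3F        # low six bits
    if selector == 0:               # greater than
        return path_mtu if path_mtu > requested else None
    if selector == 1:               # less than
        best = min(path_mtu, requested - 1)
        return best if best >= 1 else None
    if selector == 2:               # equal to
        return requested if requested <= path_mtu else None
    return path_mtu                 # largest possible


def describe(field, path_mtu=4):
    """Format the result the way the table's right-hand column does."""
    code = resolve_mtu(field, path_mtu)
    return "NO PATHS" if code is None else MTU_NAMES[code]
```

For example, `describe(0x44)` ("less than 2K") yields "1K", and `describe(0x85)` ("equal to 4K") yields "NO PATHS", matching the corresponding rows. The enable_quirks table is not modeled here.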
296 * osm_pkey_rcv.c: rwlock double release fix
297 When a port is removed from the subnet but a previously requested pkey
298 table block is received afterwards, the lock is released twice.
299 This leads to deadlocks later, when another MAD processor tries to
300 acquire the same lock.
302 * osm_sa_informinfo.c: Fix InformInfoRecord searches
304 * Better SA MCMemberRecord leave locking
305 Hold the lock while processing multicast group leave requests
306 (MCMemberRecord). This prevents a race with multicast group join
307 requests, where the requests can be reordered during processing.
309 * osm_sa_informinfo.c: Conformance changes for subscribe component
311 * osm_sa_path_record.c: Handle LID 0 as error
313 * Fix comparing InformInfo records
314 1. The received InformInfo struct was modified before dumping it.
315 2. The function that compares InformInfo structures was just
316 comparing the whole memory allocated for it, including reserved
317 fields. Fixed to compare more selectively.
319 As for QPN, from the IB spec, table 119 InformInfo:
320 QPN : Ignored except when subscribe=0 (an unsubscribe
321 request). Queue pair to which Report()s were sent as
322 a result of a corresponding subscription. If no
323 subscription for this Report() with this QPN exists,
324 the request to unsubscribe performs no action and
325 produces GetResp() with status indicating an invalid
328 * osm_trap_rcv.c: Reduce repeated trap messages so log doesn't fill
331 * osm_helper.c: Fix stack smashing detected problem in osm_dump_service_record
333 * Fix permission on db files directory
334 When creating the directory that stores the db files (guid2lid), create it
335 with reasonable permissions (currently 777 decimal = octal 01411) and don't do
338 * Fix node_desc.description as string usages
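
The "777 decimal = octal 01411" remark in the db files directory fix above is easy to verify. The short Python sketch below shows what mode bits result when the decimal literal 777 is used where the octal mode 0777 was presumably intended; the helper name is hypothetical.

```python
import stat

DECIMAL_MODE = 777   # the decimal literal noted in the fix description
OCTAL_MODE = 0o777   # rwxrwxrwx, the mode such a literal usually intends


def describe_dir_mode(mode):
    """Render permission bits the way `ls -l` would show them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)
```

Here `oct(DECIMAL_MODE)` is `0o1411`, which renders as `dr----x--t`: read-only for the owner, execute-only for group and others, plus the sticky bit. That is not a reasonable mode for a directory the owner must write to.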
340 5 Main Verification Flows
341 -------------------------
343 OpenSM verification is run using the following activities:
344 * osmtest - a stand-alone program
345 * ibmgtsim (IB management simulator) based - a set of flows that
346 simulate clusters, inject errors and verify OpenSM capability to
347 respond and bring up the network correctly.
348 * small cluster regression testing - where the SM is used on back to
349 back or single switch configurations. The regression includes
350 multiple OpenSM dedicated tests.
351 * cluster testing - when we run OpenSM to set up a large cluster, perform
352 hand-off, reboots and reconnects, verify routing correctness and SA
353 responsiveness at the ULP level (IPoIB and SDP).
357 osmtest is an automated verification tool used for OpenSM
358 testing. Its verification flows are described in the list below.
360 * Inventory File: Obtain and verify all port info, node info, link and path
364 - Register new service
365 - Register another service (with a lease period)
366 - Register another service (with service p_key set to zero)
367 - Get all services by name
368 - Delete the first service
369 - Delete the third service
370 - Bad flows: get/delete of an invalid service
371 - Add / Get same service with different data
372 - Add / Get / Delete by different component mask values (services
373 by Name & Key / Name & Data / Name & Id / Id only )
375 * Multicast Member Record:
376 - Query of existing Groups (IPoIB)
377 - BAD Join with insufficient comp mask (o15.0.1.3)
378 - Create given MGID=0 (o15.0.1.4)
379 - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
380 - Create BAD MGID=0xFA. (o15.0.1.6)
381 - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
382 - New MGID with invalid join state (o15.0.1.9)
383 - Retry of existing MGID - See JoinState update (o15.0.1.11)
384 - BAD RATE when connecting to existing MGID (o15.0.1.13)
385 - Partial JoinState delete request - removing FullMember (o15.0.1.14)
386 - Full Delete of a group (o15.0.1.14)
387 - Verify Delete by trying to Join deleted group (o15.0.1.14)
388 - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
391 - All GUIDInfoRecords in subnet are obtained
394 - Perform some compliant and noncompliant MultiPathRecord requests
395 - Validation is via status in responses and IB analyzer
398 - Perform some compliant and noncompliant PKeyTableRecord queries
399 - Validation is via status in responses and IB analyzer
401 * LinearForwardingTableRecord:
402 - Perform some compliant and noncompliant LinearForwardingTableRecord queries
403 - Validation is via status in responses and IB analyzer
405 * Event Forwarding: Register for trap forwarding using reports
406 - Send a trap and wait for report
407 - Unregister non-existing
409 * Trap 64/65 Flow: Register to Trap 64-65, create traps (by
410 disconnecting/connecting ports) and wait for report, then unregister.
412 * Stress Test: send PortInfoRecord queries, both single and RMPP and
413 check for the rate of responses as well as their validity.
416 5.2 IB Management Simulator OpenSM Test Flows:
418 The simulator provides the ability to simulate the SM handling of virtual
419 topologies that are not limited to actual lab equipment availability.
420 OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
421 regressions use smaller clusters (16 and 128 nodes).
423 The following test flows are run on the IB management simulator:
426 Up to 12 links from the fabric are randomly selected to drop packets
427 at drop rates up to 90%. The SM is required to succeed in bringing the
428 fabric up. The resulting routing is verified to be correct as well.
431 Using LMC = 2 the fabric is initialized with LIDs. Faults such as
432 zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
433 randomly assigned to various nodes and other errors are randomly
434 output to the guid2lid cache file. The SM sweep is run 5 times and
435 after each iteration a complete verification is made to ensure that all
436 LIDs that could possibly be maintained are kept, as well as that all nodes
437 were assigned a legal LID range.
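
The fault classes named above (zero LID, duplicated LID, LIDs not aligned to LMC) can be expressed as simple checks. The following Python sketch is only an illustration of what a "legal LID range" means under a given LMC; it is not the simulator's verification code, and the function names are hypothetical. It relies on the InfiniBand rules that a port with LMC = n answers to 2^n consecutive LIDs starting at a base LID aligned to 2^n, and that unicast LIDs end at 0xBFFF.

```python
from collections import Counter

UNICAST_MAX = 0xBFFF  # highest unicast LID; 0xC000 and above are multicast


def lid_range(base_lid, lmc):
    """LIDs a port answers to, given its base LID and LMC."""
    return range(base_lid, base_lid + (1 << lmc))


def is_legal_base_lid(base_lid, lmc):
    """Non-zero, aligned to 2**lmc, and entirely within the unicast range."""
    span = 1 << lmc
    return (
        base_lid != 0
        and base_lid % span == 0
        and base_lid + span - 1 <= UNICAST_MAX
    )


def duplicated_lids(base_lids, lmc):
    """LIDs claimed by more than one port, in ascending order."""
    counts = Counter(lid for base in base_lids for lid in lid_range(base, lmc))
    return sorted(lid for lid, n in counts.items() if n > 1)
```

With LMC = 2, a base LID of 8 covers LIDs 8 through 11, while a base LID of 6 is rejected as non-aligned.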
440 Nodes randomly join the 0xc000 group and eventually the
441 resulting routing is verified for completeness and adherence to
442 Up/Down routing rules.
445 The complete osmtest flow as described in the previous table is run on
446 the simulated fabrics.
449 This flow merges fabric, LID and stability issues with continuous
450 PathRecord, ServiceRecord and Multicast Join/Leave activity to
451 stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
452 were added to the test such that both existing and non-existing nodes
453 perform them in random order.
455 5.3 OpenSM Regression
457 Using a back-to-back or single switch connection, the following set of
458 tests is run nightly on the stacks described in table 2. The included
461 * Stress Testing: Flood the SA with queries from multiple channel
462 adapters to check the robustness of the entire stack up to the SA.
464 * Dynamic Changes: Dynamic Topology changes, through randomly
465 dropping SMP packets, used to test OpenSM adaptation to an unstable
466 network & verify DB correctness.
468 * Trap Injection: This flow injects traps to the SM and verifies that it
469 handles them gracefully.
471 * SA Query Test: This test exhaustively checks the SA responses to all
472 possible single component masks. To do that the test examines the
473 entire set of records the SA can provide, classifies them by their
474 field values and then selects every field (using component mask and a
475 value) and verifies that the response matches the expected set of records.
476 A random selection using multiple component mask bits is also performed.
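
The single-component-mask check in the SA Query Test can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual regression test: records are plain dictionaries, bit N of the component mask selects the N-th field name, and `sa_query` stands in for the SA under test. All names are hypothetical.

```python
def sa_query(records, fields, component_mask, template):
    """Return records matching `template` on every field whose mask bit is set."""
    selected = [
        name for bit, name in enumerate(fields) if component_mask & (1 << bit)
    ]
    return [
        r for r in records if all(r[name] == template[name] for name in selected)
    ]


def check_single_component_queries(records, fields, query=sa_query):
    """Try every single-bit mask with every observed value; list mismatches."""
    failures = []
    for bit, name in enumerate(fields):
        # Classify the full record set by the values this field takes.
        by_value = {}
        for r in records:
            by_value.setdefault(r[name], []).append(r)
        # Each (field, value) query must return exactly that class.
        for value, expected in by_value.items():
            got = query(records, fields, 1 << bit, {name: value})
            if got != expected:
                failures.append((name, value))
    return failures
```

An empty result means every single-field query returned exactly the records with that field value. A query function that ignores the mask, for instance, would be reported for every field that takes more than one value.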
480 Cluster testing is usually run before a distribution release. It
481 involves real hardware setups of 16 to 32 nodes (or more if a beta site
482 is available). Each test is validated by running all-to-all ping through the IB
483 interface. The test procedure includes:
487 * Hand-off between 2 or 3 SM's while performing:
489 - Switch power cycles (disconnecting the SM's)
491 * Unresponsive port detection and recovery
493 * osmtest from multiple nodes
495 * Trap injection and recovery
501 Table 2 - Qualified IB Stacks
502 =============================
505 -----------------------------------------|--------------------------
509 OpenIB Gen2 (IBG2 distribution) | 1.0
510 OpenIB Gen1 (IBGD distribution) | 1.8.0
511 VAPI (Mellanox InfiniBand HCA Driver) | 3.2 and later
513 Table 3 - Qualified Devices and Corresponding Firmware
514 ======================================================
518 --------|-----------------------------------------------------------
519 MT43132 | InfiniScale - fw-43132 5.2.0 (and later)
520 MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)
521 MT23108 | InfiniHost - fw-23108 3.3.2 (and later)
522 MT25204 | InfiniHost III Lx - fw-25204 1.0.1i (and later)
523 MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)
524 MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)
528 --------|-----------------------------------------------------------
529 iPath | QHT6040 (PathScale InfiniPath HT-460)
530 iPath | QHT6140 (PathScale InfiniPath HT-465)
531 iPath | QLE6140 (PathScale InfiniPath PE-880)
533 Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
534 QP0 and QP1. However, it does support it as a device on the subnet.