hselasky [Thu, 16 May 2019 18:24:51 +0000 (18:24 +0000)]
MFC r347319:
Flush command workqueue when command completion is triggered in mlx5core.
Avoid race for command completion when triggering a command completions event.
Serialize operation by queueing all commands on the same work queue.
This can happen when healthcare triggers.
hselasky [Thu, 16 May 2019 18:24:05 +0000 (18:24 +0000)]
MFC r347318:
Make command timeout way shorter in mlx5core.
The command timeout is terribly long, whole two hours. Make it 60s so if
things do go wrong, the user gets feedback in relatively short time, so
they can take corrective actions and/or investigate using tools and such.
hselasky [Thu, 16 May 2019 18:16:08 +0000 (18:16 +0000)]
MFC r347308:
Extend the counters framework in mlx5en(4).
Allow more macro arguments and split the variable type and name into
separate arguments. This allows simple and powerful copy and extraction
of values from IFC based structures into SYSCTLs with the use of a single
macro.
hselasky [Thu, 16 May 2019 18:13:02 +0000 (18:13 +0000)]
MFC r347306:
Implement reading PCI power status in mlx5core.
Implement a watchdog as part of the healtcare subsystem which
reads the PCI power status during startup and upon the PCI
power status change event and store it into the core device
structure. This value is then exported to user-space via a
read-only SYSCTL. A dmesg print has been added to inform
the admin about the PCI power status.
hselasky [Thu, 16 May 2019 18:11:25 +0000 (18:11 +0000)]
MFC r347304:
Always return success for RoCE modify port in mlx5ib.
CM layer calls ib_modify_port() regardless of the link layer.
For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
hselasky [Thu, 16 May 2019 18:00:37 +0000 (18:00 +0000)]
MFC r347293:
Use software counters for rx_packets and rx_bytes in mlx5en(4).
The physical- and virtual- port counters might not reflect the amount
of data received after address filtering. Use the software counters
instead for rx_packets and rx_bytes to know exactly how much data
was received.
hselasky [Thu, 16 May 2019 17:35:39 +0000 (17:35 +0000)]
MFC r347279:
Fix netstat counters mapping in mlx5en(4).
The current mapping of driver counters to netstat counters is wrong.
For example, a single jabber packet, will cause the Ierrs counter to
count three times.
The work for mapping the hardware and software counters to their right
place in netstat counters were already done in Linux, take that as is
to the FreeBSD driver.
hselasky [Thu, 16 May 2019 17:34:54 +0000 (17:34 +0000)]
MFC r347278:
Fix endless loop in ipoib_poll().
ib_req_notify_cq may return negative value which will indicate a
failure. In the case of uncorrectable error, we will end up in an
endless loop. Fix that, by going to another loop with poll_more
only if there is anything left to poll.
hselasky [Thu, 16 May 2019 17:30:55 +0000 (17:30 +0000)]
MFC r347272:
Query and cache PCAM, MCAM registers on initialization in mlx5core.
On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.
hselasky [Thu, 16 May 2019 17:30:20 +0000 (17:30 +0000)]
MFC r347271:
Implement PCAM, MCAM access register commands in mlx5core.
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.
PCAM and MCAM registers will provide information regarding firmware
support for different features, in order to avoid cases where new driver
combined with old firmware results in syndromes (for ex. PCIe counters
before this patchset).
hselasky [Thu, 16 May 2019 17:28:30 +0000 (17:28 +0000)]
MFC r347268:
Add Fast teardown support to mlx5core.
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been
scattered to memory
hselasky [Thu, 16 May 2019 17:24:21 +0000 (17:24 +0000)]
MFC r347264:
Configure firmware to use RX hash format in mini CQE in mlx5en(4).
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.
While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.
hselasky [Thu, 16 May 2019 17:23:36 +0000 (17:23 +0000)]
MFC r347263:
Disable CQE zipping by default in mlx5en(4).
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).
hselasky [Thu, 16 May 2019 17:22:57 +0000 (17:22 +0000)]
MFC r347262:
Split mlx5e_update_stats_work() in mlx5en(4).
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.
hselasky [Thu, 16 May 2019 17:22:11 +0000 (17:22 +0000)]
MFC r347261:
Implement fast close of RX channel in mlx5en(4).
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.
hselasky [Thu, 16 May 2019 17:17:12 +0000 (17:17 +0000)]
MFC r347255:
Fix tx_jumbo_packets counter in mlx5en(4).
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.
TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.
hselasky [Thu, 16 May 2019 17:15:41 +0000 (17:15 +0000)]
MFC r347253:
Protect from infinite sw-reset loop in mlx5core.
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.
hselasky [Thu, 16 May 2019 17:13:21 +0000 (17:13 +0000)]
MFC r347250:
Add temperature warning event to log in mlx5core.
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
hselasky [Thu, 16 May 2019 17:12:38 +0000 (17:12 +0000)]
MFC r347249:
Correctly define the interface state bits in mlx5en(4).
While at it remove unused interface state bits. This also fixes and issue
during shutdown:
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
hselasky [Thu, 16 May 2019 17:03:16 +0000 (17:03 +0000)]
MFC r347188:
Disabling a PCI device should only disable busmaster in the LinuxKPI.
As Linux comment for this function point:
Signal to the system that the PCI device is not in use by the system
anymore. This only involves disabling PCI bus-mastering, if active.
hselasky [Thu, 16 May 2019 17:01:39 +0000 (17:01 +0000)]
MFC r347185:
Allow controlling pr_debug at runtime in the LinuxKPI.
Turning on pr_debug at compile time make it non-optional at runtime.
This often means that the amount of the debugging is unbearable.
Allow developer to turn on pr_debug output only when needed.
ian [Thu, 16 May 2019 15:32:03 +0000 (15:32 +0000)]
MFC r346968, r346973
r346968:
Update the manpage text to show the output generated by the first-stage
bootloader these days (x86 instead of i386).
r346973:
Add a paragraph that mentions gptboot having an interactive mode, and
direct the user to the boot(8) manpage, which provides the details on that.
tuexen [Thu, 16 May 2019 11:20:02 +0000 (11:20 +0000)]
MFC r346402:
When a checksum has to be computed for a received IPv6 packet because it
is requested by the application using the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM on a raw socket, ensure that the packet contains enough
bytes to contain the checksum at the specified offset.
tuexen [Thu, 16 May 2019 11:18:50 +0000 (11:18 +0000)]
MFC r346401:
Avoid a buffer overwrite in rip6_output() when computing the checksum
as requested by the user via the IPPROTO_IPV6 level socket option
IPV6_CHECKSUM. The check if there are enough bytes in the packet to
store the checksum at the requested offset was wrong by 1.
tuexen [Thu, 16 May 2019 11:09:53 +0000 (11:09 +0000)]
MFC r346182:
When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option,
ensure that the ip_hl field is valid. Furthermore, ensure that the complete
IPv4 header is contained in the first mbuf. Finally, move the length checks
before relying on them when accessing fields of the IPv4 header.