FreeBSD/FreeBSD.git
5 years agosx: retire SX_NOADAPTIVE
mjg [Wed, 5 Dec 2018 16:43:03 +0000 (16:43 +0000)]
sx: retire SX_NOADAPTIVE

The flag has not been used by anything for years, and supporting it requires an
explicit read from the lock when entering the slow path.

Flag value is left unused on purpose.

Sponsored by: The FreeBSD Foundation

5 years agoRemove redundant declaration after r341517.
hselasky [Wed, 5 Dec 2018 15:56:44 +0000 (15:56 +0000)]
Remove redundant declaration after r341517.

MFC after: 1 week
Sponsored by: Mellanox Technologies

5 years agoFix build of LinuxKPI on some platforms after r341518.
hselasky [Wed, 5 Dec 2018 15:53:34 +0000 (15:53 +0000)]
Fix build of LinuxKPI on some platforms after r341518.

MFC after: 1 week
Sponsored by: Mellanox Technologies

5 years agoFix LINT build after r341572.
hselasky [Wed, 5 Dec 2018 15:42:31 +0000 (15:42 +0000)]
Fix LINT build after r341572.

MFC after: 1 week
Sponsored by: Mellanox Technologies

5 years agonetmap.h: include stdatomic.h
vmaffione [Wed, 5 Dec 2018 15:38:52 +0000 (15:38 +0000)]
netmap.h: include stdatomic.h

The stdatomic.h header exports atomic_thread_fence(), which
can be used to implement the nm_stst_barrier() macro needed
by netmap.
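
A minimal sketch of how a store-store barrier can be built on
atomic_thread_fence(). The names example_stst_barrier(), example_ring and
example_publish() are hypothetical, and the choice of memory_order_release
is an assumption made for illustration, not a statement about what netmap.h
actually does.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the barrier macro; the memory order is an
 * assumption made for this sketch. */
static inline void example_stst_barrier(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Hypothetical ring: a producer fills a slot and then publishes it by
 * advancing head. */
struct example_ring {
	uint32_t slots[4];
	_Atomic uint32_t head;
};

static void example_publish(struct example_ring *r, uint32_t value)
{
	uint32_t h = atomic_load_explicit(&r->head, memory_order_relaxed);

	r->slots[h % 4] = value;
	/* Order the slot store before the store that publishes it. */
	example_stst_barrier();
	atomic_store_explicit(&r->head, h + 1, memory_order_relaxed);
}
```

A consumer would need a matching acquire fence after reading head before
it reads the slot.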

MFC after: 3 days

5 years agomlx4/mlx5: Updated driver version to 3.5.0
slavash [Wed, 5 Dec 2018 14:25:34 +0000 (14:25 +0000)]
mlx4/mlx5: Updated driver version to 3.5.0

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Implement backpressure indication.
slavash [Wed, 5 Dec 2018 14:25:03 +0000 (14:25 +0000)]
mlx5en: Implement backpressure indication.

The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. When the upper layers, typically the socket layer, have obtained such
a tag, they can then query the destination driver queue for the current
amount of space available in the send queue.

A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx5en(4) driver is detached. The channels structure is persistent
during the lifetime of the mlx5en(4) driver it belongs to and can therefore be accessed
without any need for synchronization.

The mlx5e_snd_tag structure was extended to contain a type field, because two
different tag types now end up in the driver and need to be distinguished.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Improve configuration of HW LRO.
slavash [Wed, 5 Dec 2018 14:24:33 +0000 (14:24 +0000)]
mlx5en: Improve configuration of HW LRO.

In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space and the ifconfig(8) LRO capability must be set. Any other combination of
settings will disable HW LRO.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Count all transmitted and received bytes.
slavash [Wed, 5 Dec 2018 14:24:02 +0000 (14:24 +0000)]
mlx5en: Count all transmitted and received bytes.

Add counters for all transmitted and received bytes. Previously only
transmitted and received packets were counted. Fix the description of the RX LRO
counters while at it.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Statically allocate and free the channel structure(s).
slavash [Wed, 5 Dec 2018 14:23:31 +0000 (14:23 +0000)]
mlx5en: Statically allocate and free the channel structure(s).

By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path and also reduce the chance of use-after-free
issues in the transmit fast path.

This change is also a requirement for implementing
backpressure support.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Fix race in mlx5e_ethtool_debug_stats().
slavash [Wed, 5 Dec 2018 14:23:01 +0000 (14:23 +0000)]
mlx5en: Fix race in mlx5e_ethtool_debug_stats().

Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.

Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Add support for IFM_10G_LR and IFM_40G_ER4 media types.
slavash [Wed, 5 Dec 2018 14:22:30 +0000 (14:22 +0000)]
mlx5en: Add support for IFM_10G_LR and IFM_40G_ER4 media types.

Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Don't set rate on SQs when the SQ is already stopped.
slavash [Wed, 5 Dec 2018 14:21:59 +0000 (14:21 +0000)]
mlx5en: Don't set rate on SQs when the SQ is already stopped.

This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Fix for inlining issues in transmit path
slavash [Wed, 5 Dec 2018 14:21:28 +0000 (14:21 +0000)]
mlx5en: Fix for inlining issues in transmit path

1) Don't exceed the driver's own hardcoded TX inline limit.

The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the driver's own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.

2) Make sure the mlx5_query_min_inline() function returns an error code.

3) Header inlining is required when using TSO.

4) Catch failure to compute inline header size for TSO.

5) Add support for UDP when computing inline header size.

6) Fix for inlining issues with regard to DSCP.

Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to work around a hardware bug extracting
the DSCP field from the IPv4/v6 header.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Remove the DRBR and associated logic in the transmit path.
slavash [Wed, 5 Dec 2018 14:20:57 +0000 (14:20 +0000)]
mlx5en: Remove the DRBR and associated logic in the transmit path.

The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. There is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5en: Implement support for bandwidth limiting by ratio, ETS.
slavash [Wed, 5 Dec 2018 14:20:26 +0000 (14:20 +0000)]
mlx5en: Implement support for bandwidth limiting by ratio, ETS.

Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer between 1 and 100 inclusive.

Implement the needed firmware commands and SYSCTLs through mlx5en(4).

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5fpga: Add set and query connect/disconnect FPGA
slavash [Wed, 5 Dec 2018 14:19:55 +0000 (14:19 +0000)]
mlx5fpga: Add set and query connect/disconnect FPGA

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5fpga: IOCTL for FPGA temperature measurement
slavash [Wed, 5 Dec 2018 14:19:23 +0000 (14:19 +0000)]
mlx5fpga: IOCTL for FPGA temperature measurement

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5fpga: Support MorseQ board
slavash [Wed, 5 Dec 2018 14:18:52 +0000 (14:18 +0000)]
mlx5fpga: Support MorseQ board

Added and supported new enum "morseQ = 4" for fpga_id field

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5fpga_tools initial code import.
slavash [Wed, 5 Dec 2018 14:17:22 +0000 (14:17 +0000)]
mlx5fpga_tools initial code import.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5fpga: Initial code import.
slavash [Wed, 5 Dec 2018 14:11:20 +0000 (14:11 +0000)]
mlx5fpga: Initial code import.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5ib: Set default active width and speed when querying port.
slavash [Wed, 5 Dec 2018 13:49:11 +0000 (13:49 +0000)]
mlx5ib: Set default active width and speed when querying port.

Make sure the active width and speed are set in case the
translate_eth_proto_oper() function doesn't recognize the
current port operation mask.

Linux commit:
7672ed33c4c15dbe9d56880683baaba4227cf940

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5ib: Make sure the congestion work timer does not escape the drain procedure.
slavash [Wed, 5 Dec 2018 13:48:39 +0000 (13:48 +0000)]
mlx5ib: Make sure the congestion work timer does not escape the drain procedure.

If the mlx5_ib_read_cong_stats() function is running when mlx5ib is unloaded,
the timer can still be pending after the delayed work has been cancelled,
because this function unconditionally restarts the timer. To fix this, simply
loop on the delayed work cancel procedure as long as it returns non-zero.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5ib: Fix null pointer dereference in mlx5_ib_create_srq
slavash [Wed, 5 Dec 2018 13:48:10 +0000 (13:48 +0000)]
mlx5ib: Fix null pointer dereference in mlx5_ib_create_srq

Although "create_srq_user" does overwrite "in.pas" on some paths, it
also contains at least one feasible path which does not overwrite it.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5ib: Fix sign extension in mlx5_ib_query_device
slavash [Wed, 5 Dec 2018 13:47:41 +0000 (13:47 +0000)]
mlx5ib: Fix sign extension in mlx5_ib_query_device

"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.
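
The effect can be reproduced in isolation. The sketch below uses hypothetical
helper names and is not the driver code. The first function emulates the
signed 32-bit intermediate with explicit casts (shifting into the sign bit of
an int directly is undefined behaviour, and the unsigned-to-signed conversion
used here is implementation-defined but wraps on common compilers). The
second shows the usual remedy of widening before the shift.

```c
#include <stdint.h>

/* Emulates the problematic pattern: the shifted 16-bit value lands in a
 * signed 32-bit intermediate, which is sign-extended when widened. */
static uint64_t pack_minor_sign_extended(uint16_t minor)
{
	int32_t as_int = (int32_t)((uint32_t)minor << 16);

	return (uint64_t)(int64_t)as_int;
}

/* Widening to 64 bits before the shift keeps the upper bits zero. */
static uint64_t pack_minor_fixed(uint16_t minor)
{
	return (uint64_t)minor << 16;
}
```

For a minor revision of 0x8000 the two functions differ only in the upper
32 bits of the result. For values up to 0x7FFF they agree.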

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fix driver version location
slavash [Wed, 5 Dec 2018 13:47:10 +0000 (13:47 +0000)]
mlx5: Fix driver version location

Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fixes to allow command polling mode to exist alongside event mode.
slavash [Wed, 5 Dec 2018 13:46:39 +0000 (13:46 +0000)]
mlx5: Fixes to allow command polling mode to exist alongside event mode.

A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handles
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fix wrong size allocation for QoS ETC TC register
slavash [Wed, 5 Dec 2018 13:46:09 +0000 (13:46 +0000)]
mlx5: Fix wrong size allocation for QoS ETC TC register

The driver allocates the wrong size (due to a wrong struct name) when issuing
a query/set request to the NIC's register.

Linux commit:
d14fcb8d877caf1b8d6bd65d444bf62b21f2070c

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Add software tx_jumbo_packets counter
slavash [Wed, 5 Dec 2018 13:45:37 +0000 (13:45 +0000)]
mlx5: Add software tx_jumbo_packets counter

This counter will represent transmitted packets which have more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.
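
As a rough illustration of the accumulation, assuming the per-range hardware
counters have already been read into an array (the function name and the
array layout are hypothetical, not taken from the driver):

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the per-size-range counters into one software counter. */
static uint64_t sum_jumbo_counters(const uint64_t *range_counters, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += range_counters[i];
	return total;
}
```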

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Implement support for configuring PCIe packet write ordering via a sysctl.
slavash [Wed, 5 Dec 2018 13:45:08 +0000 (13:45 +0000)]
mlx5: Implement support for configuring PCIe packet write ordering via a sysctl.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Extend vector argument to u64.
slavash [Wed, 5 Dec 2018 13:44:38 +0000 (13:44 +0000)]
mlx5: Extend vector argument to u64.

Else the MLX5_TRIGGERED_CMD_COMP flag will be masked away.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Add global control to disable firmware reset, for all mlx5 devices.
slavash [Wed, 5 Dec 2018 13:44:08 +0000 (13:44 +0000)]
mlx5: Add global control to disable firmware reset, for all mlx5 devices.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fix use-after-free in self-healing flow
slavash [Wed, 5 Dec 2018 13:43:37 +0000 (13:43 +0000)]
mlx5: Fix use-after-free in self-healing flow

When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Move hw.mlx5 node definition to mlx5_core.
slavash [Wed, 5 Dec 2018 13:43:07 +0000 (13:43 +0000)]
mlx5: Move hw.mlx5 node definition to mlx5_core.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Convert some spaces into tabs and use device_printf() instead of printf().
slavash [Wed, 5 Dec 2018 13:42:36 +0000 (13:42 +0000)]
mlx5: Convert some spaces into tabs and use device_printf() instead of printf().

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Add SRQ fixes from Linux
slavash [Wed, 5 Dec 2018 13:42:06 +0000 (13:42 +0000)]
mlx5: Add SRQ fixes from Linux

Combine multiple fixes from Linux to SRQ.
Linux commits:
c73b791 IB/mlx5: Assign SRQ type earlier
0fd27a8 IB/mlx5: Fix out-of-bound access
c2b37f7 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
d63c467 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fix for potential memory leaks.
slavash [Wed, 5 Dec 2018 13:41:37 +0000 (13:41 +0000)]
mlx5: Fix for potential memory leaks.

Make sure allocated data gets freed in error cases.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Discard unused return values.
slavash [Wed, 5 Dec 2018 13:41:06 +0000 (13:41 +0000)]
mlx5: Discard unused return values.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Raise fatal IB event when sys error occurs
slavash [Wed, 5 Dec 2018 13:40:36 +0000 (13:40 +0000)]
mlx5: Raise fatal IB event when sys error occurs

All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Linux commit:
aba462134634b502d720e15b23154f21cfa277e5

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx5: Fix integer overflow while resizing CQ
slavash [Wed, 5 Dec 2018 13:40:05 +0000 (13:40 +0000)]
mlx5: Fix integer overflow while resizing CQ

The user can provide a very large cqe_size, which will cause an integer
overflow.
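
A generic guard against this class of bug, shown as a sketch and not as the
code of the referenced commit. The helper name checked_buf_size() and its
parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and stores entries * cqe_size in *out when the product
 * fits in a size_t. Returns false and leaves *out untouched otherwise. */
static bool checked_buf_size(size_t entries, size_t cqe_size, size_t *out)
{
	if (cqe_size != 0 && entries > SIZE_MAX / cqe_size)
		return false;
	*out = entries * cqe_size;
	return true;
}
```

The division-based test rejects oversized requests before the multiplication
is performed, so the product itself can never wrap.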

Linux commit:
28e9091e3119933c38933cb8fc48d5618eb784c8

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Optimise reception of small packets.
slavash [Wed, 5 Dec 2018 13:39:35 +0000 (13:39 +0000)]
mlx4en: Optimise reception of small packets.

Copy small packets like TCP ACKs into a new mbuf
reusing the existing mbuf to receive a new ethernet
frame. This avoids wasting buffer space for
small sized packets.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4: Make sure default VNET is set when adding a new interface.
slavash [Wed, 5 Dec 2018 13:39:05 +0000 (13:39 +0000)]
mlx4: Make sure default VNET is set when adding a new interface.

Adding an interface might be done outside the device_attach() routine
and will then cause a panic, due to the VNET not being defined.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Remove duplicate statistics variable assignment.
slavash [Wed, 5 Dec 2018 13:38:35 +0000 (13:38 +0000)]
mlx4en: Remove duplicate statistics variable assignment.

The "priv->pkstats.rx_dropped" is written twice in a row.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Add support for receiving all data using one or more MCLBYTES sized mbufs.
slavash [Wed, 5 Dec 2018 13:32:46 +0000 (13:32 +0000)]
mlx4en: Add support for receiving all data using one or more MCLBYTES sized mbufs.
Also when the MTU is greater than MCLBYTES.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Add support for netdump.
slavash [Wed, 5 Dec 2018 13:32:15 +0000 (13:32 +0000)]
mlx4en: Add support for netdump.

Implement the needed callback functions and support for polling the driver.

Differential Revision: https://reviews.freebsd.org/D15259
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Remove the DRBR and associated logic in the transmit path.
slavash [Wed, 5 Dec 2018 13:31:45 +0000 (13:31 +0000)]
mlx4en: Remove the DRBR and associated logic in the transmit path.

The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. There is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4en: Add driver version to sysctl desc
slavash [Wed, 5 Dec 2018 13:31:14 +0000 (13:31 +0000)]
mlx4en: Add driver version to sysctl desc

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4: Add board identifier and firmware version to sysctl
slavash [Wed, 5 Dec 2018 13:30:48 +0000 (13:30 +0000)]
mlx4: Add board identifier and firmware version to sysctl

In the last mlx4 update (r325841) we lost the sysctl to show the
firmware version for mlx4 devices.
Add both the board identifier and the firmware version under the
sys.device.mlx4_core0.hw sysctl node.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4core: Add checks for invalid port numbers.
slavash [Wed, 5 Dec 2018 13:30:16 +0000 (13:30 +0000)]
mlx4core: Add checks for invalid port numbers.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4: Zero initialize device capabilities to avoid use of uninitialized fields.
slavash [Wed, 5 Dec 2018 13:29:46 +0000 (13:29 +0000)]
mlx4: Zero initialize device capabilities to avoid use of uninitialized fields.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agomlx4core: Avoid multiplication overflow by casting multiplication.
slavash [Wed, 5 Dec 2018 13:29:16 +0000 (13:29 +0000)]
mlx4core: Avoid multiplication overflow by casting multiplication.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoopensm: Use precision specifier for scanf
slavash [Wed, 5 Dec 2018 13:28:46 +0000 (13:28 +0000)]
opensm: Use precision specifier for scanf

If the user inputs a string longer than the buffer, stack
memory will be corrupted.
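
A small sketch of the technique, using a maximum field width in the
conversion (what the commit title calls a precision specifier). The buffer
size and the function name are made up for illustration and are not taken
from opensm.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* Parse one whitespace-delimited token into a fixed-size buffer. */
static int read_name(const char *input, char name[NAME_LEN])
{
	/* "%15s" stores at most 15 characters plus the terminating NUL,
	 * so the width must stay at NAME_LEN - 1. */
	return sscanf(input, "%15s", name);
}
```

Without the width, "%s" would copy the whole token regardless of the size
of the destination.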

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agolibibverbs: Fix memory leak in ibv_read_sysfs_file().
slavash [Wed, 5 Dec 2018 13:28:17 +0000 (13:28 +0000)]
libibverbs: Fix memory leak in ibv_read_sysfs_file().

Testing packetdrill using valgrind resulted in finding a memory leak in
ibv_read_sysfs_file(). The attached patch fixes it.

Submitted by: tuexen@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agokrping: Fix for memory leak in error case.
slavash [Wed, 5 Dec 2018 13:27:48 +0000 (13:27 +0000)]
krping: Fix for memory leak in error case.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoipoib: Notify on modify QP failure only when relevant
slavash [Wed, 5 Dec 2018 13:27:17 +0000 (13:27 +0000)]
ipoib: Notify on modify QP failure only when relevant

Modify QP can fail, and in some cases that is acceptable, for example when
moving from the RST to the ERR state. All other failures are not acceptable
and a message should be printed to the log.

The current code prints on all failures, so many messages like
"Failed to modify QP to ERROR state" appear, even when the failure is
permitted by the state machine of the QP object.

Linux commit:
5dc78ad1904db597bdb4427f3ead437aae86f54c

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoipoib: increase the non-cm queue length
slavash [Wed, 5 Dec 2018 13:26:47 +0000 (13:26 +0000)]
ipoib: increase the non-cm queue length

When a packet needs fragmentation, it might generate more than 3 fragments.
With a queue length of 3, all fragments are generated faster than the
queue is drained, which effectively drops the fourth and later fragments on
the floor.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoipoib: Don't do a light flush when MTU is unchanged.
slavash [Wed, 5 Dec 2018 13:26:17 +0000 (13:26 +0000)]
ipoib: Don't do a light flush when MTU is unchanged.

When changing the MTU of ibX network interfaces, check that the MTU was really
changed before requesting an update of the multicast rules. Else we might go
into an infinite loop joining and leaving ibX multicast groups towards the
opensm master interface.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoipoib: correct setting MTU from inside ipoib(4).
slavash [Wed, 5 Dec 2018 13:25:47 +0000 (13:25 +0000)]
ipoib: correct setting MTU from inside ipoib(4).

It is not enough to set ifnet->if_mtu to change the interface MTU.
The system saves the MTU for the route in the radix tree, and the route cache
keeps the interface MTU as well. Since addition of the multicast group causes
recalculation of the MTU, even bringing the interface up changes the MTU from
4042 to 1500, which makes the system configuration inconsistent. Worse,
ip_output() prefers the route MTU over the interface MTU, so large packets are
not fragmented and are dropped on the floor.

Fix it for ipoib(4) using the same approach (or hack) as was applied
for if_tun/if_tap in r339012.  Thanks to bz@ for giving the hint.

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Fix clearing of bound device interface.
slavash [Wed, 5 Dec 2018 13:25:13 +0000 (13:25 +0000)]
ibcore: Fix clearing of bound device interface.

Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally, else link local addresses won't work.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: ip6_dev_find() needs to know the scope ID.
slavash [Wed, 5 Dec 2018 13:24:43 +0000 (13:24 +0000)]
ibcore: ip6_dev_find() needs to know the scope ID.

Else the wrong network device can be returned for link-local addresses.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Fix sleeping in atomic when RoCE is used
slavash [Wed, 5 Dec 2018 13:24:12 +0000 (13:24 +0000)]
ibcore: Fix sleeping in atomic when RoCE is used

A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Linux commit:
c76161181193985087cd716fdf69b5cb6cf9ee85

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Add missing unref of netdevice.
slavash [Wed, 5 Dec 2018 13:23:44 +0000 (13:23 +0000)]
ibcore: Add missing unref of netdevice.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Fix loopback with rdma-cm.
slavash [Wed, 5 Dec 2018 13:23:14 +0000 (13:23 +0000)]
ibcore: Fix loopback with rdma-cm.

Trying to validate loopback fails because rtalloc1() resolves system
local addresses to the loopback network interface, lo0. Fix this by
explicitly checking for loopback during validation of the source
and destination network address. If the source address belongs to
a local network interface and is equal to the destination address,
there is no need to run the destination address through rtalloc1().

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Make sure all VNETs are scanned for VLAN interfaces.
slavash [Wed, 5 Dec 2018 13:22:43 +0000 (13:22 +0000)]
ibcore: Make sure all VNETs are scanned for VLAN interfaces.

The master network interface and the VLANs may reside in different VNETs.
Make sure that all VNETs are searched when scanning for GID entries.

Submitted by:   netapp
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Always check return value from ib_init_ah_from_wc().
slavash [Wed, 5 Dec 2018 13:22:07 +0000 (13:22 +0000)]
ibcore: Always check return value from ib_init_ah_from_wc().

This prevents code from accepting RoCEv1 connections when
only RoCEv2 is enabled and vice versa.

Linux commit:
0c4386ec77cfcd0ccbdbe8c2e67dd3a49b2a4c7f

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Add missing check for failure.
slavash [Wed, 5 Dec 2018 13:21:20 +0000 (13:21 +0000)]
ibcore: Add missing check for failure.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Fix an array index check
slavash [Wed, 5 Dec 2018 13:20:51 +0000 (13:20 +0000)]
ibcore: Fix an array index check

The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch prevents Coverity from
reporting the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127
(byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class)
(which evaluates to 127).
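
The shape of the check can be sketched as follows. The EXAMPLE_ and example_
names are hypothetical stand-ins, not the ibcore definitions. Only the table
size of 80 is taken from the text above.

```c
#include <stddef.h>

#define EXAMPLE_MAX_MGMT_CLASS 80	/* number of entries in the table */

struct example_class_table {
	void *method_table[EXAMPLE_MAX_MGMT_CLASS];
};

/* Bound the index by the size of the array actually being indexed,
 * not by an unrelated, larger constant. */
static void *example_lookup(const struct example_class_table *t,
    unsigned int idx)
{
	if (idx >= EXAMPLE_MAX_MGMT_CLASS)
		return NULL;
	return t->method_table[idx];
}
```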

Linux commit:
2fe2f378dd45847d2643638c07a7658822087836

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Check ib_find_pkey() return value.
slavash [Wed, 5 Dec 2018 13:20:22 +0000 (13:20 +0000)]
ibcore: Check ib_find_pkey() return value.

Linux commit:
d3a2418ee36a59bc02e9d454723f3175dcf4bfd9

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Add support for IB_SPEED_HDR in sysfs rate printout.
slavash [Wed, 5 Dec 2018 13:19:52 +0000 (13:19 +0000)]
ibcore: Add support for IB_SPEED_HDR in sysfs rate printout.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Don't access invalid port.
slavash [Wed, 5 Dec 2018 13:19:21 +0000 (13:19 +0000)]
ibcore: Don't access invalid port.

The port number in the listen_id_priv has been observed to be zero which
means no port has been selected. The current code lacks a check for invalid
port number.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Discard unused error codes.
slavash [Wed, 5 Dec 2018 13:18:50 +0000 (13:18 +0000)]
ibcore: Discard unused error codes.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agoibcore: Make sure GID index variable gets initialized.
slavash [Wed, 5 Dec 2018 13:18:20 +0000 (13:18 +0000)]
ibcore: Make sure GID index variable gets initialized.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agolinuxkpi: Really check if PCI is offline
slavash [Wed, 5 Dec 2018 13:17:45 +0000 (13:17 +0000)]
linuxkpi: Really check if PCI is offline

Currently we always return false for the PCI offline query.
Try to read the PCI config; if the return value is 0xffff, the PCI device
is probably offline.
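
A minimal sketch of this probe, under stated assumptions: the message does not say which config register is read, so the vendor ID at offset 0 is assumed here, and the reader callback and both stub readers are hypothetical stand-ins for a real config-space accessor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: returns a 16-bit value from PCI config space. */
typedef uint16_t (*example_cfg_read16_t)(void *dev, unsigned int reg);

/* Assumption: probe the vendor ID, which sits at offset 0. */
#define EXAMPLE_REG_VENDOR 0x00

/* Reads from a device that has dropped off the bus return all ones. */
static bool
example_pci_is_offline(void *dev, example_cfg_read16_t read16)
{
	return (read16(dev, EXAMPLE_REG_VENDOR) == 0xffff);
}

/* Demonstration stubs only: a responding device and a missing one. */
static uint16_t
example_read_present(void *dev, unsigned int reg)
{
	(void)dev;
	(void)reg;
	return (0x15b3); /* arbitrary non-0xffff vendor ID for the demo */
}

static uint16_t
example_read_absent(void *dev, unsigned int reg)
{
	(void)dev;
	(void)reg;
	return (0xffff);
}
```

The result is a heuristic, which matches the "probably" in the message: 0xffff is what a read returns when nothing answers on the bus.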

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agolinuxkpi: properly implement netif_carrier_ok().
slavash [Wed, 5 Dec 2018 13:17:15 +0000 (13:17 +0000)]
linuxkpi: properly implement netif_carrier_ok().

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agolinuxkpi: Fix for use-after-free when tearing down character devices.
slavash [Wed, 5 Dec 2018 13:16:39 +0000 (13:16 +0000)]
linuxkpi: Fix for use-after-free when tearing down character devices.

Make sure we hold a reference on the character device for every opened file
to prevent the character device from being freed prematurely.
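
The idea can be sketched with a plain reference count. All names below are hypothetical and the counter is not atomic, so this only illustrates the lifetime rule, not the LinuxKPI implementation: each open file takes a reference, and the device memory is released only when the last reference is dropped.

```c
#include <stdlib.h>

/* Hypothetical device object; 'freed' exists only to observe the demo. */
struct example_cdev {
	unsigned int refs; /* one for the owner plus one per open file */
	int *freed;        /* set to 1 when the object is released */
};

static struct example_cdev *
example_cdev_create(int *freed)
{
	struct example_cdev *cd = malloc(sizeof(*cd));

	if (cd == NULL)
		return (NULL);
	cd->refs = 1; /* the owner's reference */
	cd->freed = freed;
	*freed = 0;
	return (cd);
}

/* Drop one reference; free the object when none remain. */
static void
example_cdev_rele(struct example_cdev *cd)
{
	if (--cd->refs == 0) {
		*cd->freed = 1;
		free(cd);
	}
}

/* Every open file holds its own reference for as long as it is open. */
static void
example_file_open(struct example_cdev *cd)
{
	cd->refs++;
}

static void
example_file_close(struct example_cdev *cd)
{
	example_cdev_rele(cd);
}
```

If the owner drops its reference while a file is still open, the object survives until that file is closed, which is the ordering the fix is meant to guarantee.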

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agolinuxkpi: implement idr_is_empty() and ida_is_empty().
slavash [Wed, 5 Dec 2018 13:15:57 +0000 (13:15 +0000)]
linuxkpi: implement idr_is_empty() and ida_is_empty().

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies

5 years agonetmap: align codebase to the current upstream (760279cfb2730a585)
vmaffione [Wed, 5 Dec 2018 11:57:16 +0000 (11:57 +0000)]
netmap: align codebase to the current upstream (760279cfb2730a585)

Changelist:
  - Replace netmap passthrough host support with a more general
    mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop.
    No kernel threads are needed to use this feature: the application
    is required to spawn a thread (or a process) and issue a
    SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The
    kernel loop is executed by the ioctl implementation, which returns
    to userspace only when a different thread calls SYNC_KLOOP_STOP
    or the netmap file descriptor is closed.
  - Update the if_ptnet driver to cope with the new data structures,
    and prune all the obsolete ptnetmap code.
  - Add support for "null" netmap ports, useful to allocate netmap_if,
    netmap_ring and netmap buffers to be used by specialized applications
    (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
  - Various fixes and code refactoring.

Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D18015

5 years agoAllow bootstrapping libopenbsd on Linux
arichardson [Wed, 5 Dec 2018 10:58:02 +0000 (10:58 +0000)]
Allow bootstrapping libopenbsd on Linux

The getdtablecount.c file won't compile on Linux but it seems like none of
the bootstrap tools actually need it.

Reviewed By: emaste, brooks
Differential Revision: https://reviews.freebsd.org/D14244

5 years agoFix newvers.sh with BUILD_WITH_STRICT_TMPPATH=1
arichardson [Wed, 5 Dec 2018 10:57:57 +0000 (10:57 +0000)]
Fix newvers.sh with BUILD_WITH_STRICT_TMPPATH=1

newvers.sh runs mkfifo which did not exist before this change.
However, I didn't notice before because it is run from a function
where a missing command does not cause a noticeable failure.

Reviewed By: emaste, markj
Differential Revision: https://reviews.freebsd.org/D18377

5 years agoTidy up arm64 reloc_jmpslots() implementation.
mmel [Wed, 5 Dec 2018 10:30:53 +0000 (10:30 +0000)]
Tidy up arm64 reloc_jmpslots() implementation.
- don't relocate jump slots multiple times (if LD_BIND_NOW is defined).
- process only R_AARCH64_JUMP_SLOT here, other relocation types are handled
  by reloc_plt().

MFC after: 1 week

5 years agoImplement arm64 version of __tls_get_addr().
mmel [Wed, 5 Dec 2018 10:23:38 +0000 (10:23 +0000)]
Implement arm64 version of __tls_get_addr().

MFC after: 1 week

5 years agoFix style(9).
mmel [Wed, 5 Dec 2018 10:22:14 +0000 (10:22 +0000)]
Fix style(9).
Not a functional change.

MFC after: 1 week

5 years agoEnsure that cylinder-group check-hashes are properly updated when first
mckusick [Wed, 5 Dec 2018 06:31:50 +0000 (06:31 +0000)]
Ensure that cylinder-group check-hashes are properly updated when first
creating them and when correcting them when they are found to be corrupted.

Reported by:  Don Lewis (truckman@)
Sponsored by: Netflix

5 years agoRemove MD __sys_* private symbols.
brooks [Wed, 5 Dec 2018 00:46:09 +0000 (00:46 +0000)]
Remove MD __sys_* private symbols.

No references to any of these exist in the tree. The list was also
erratic with different architectures exporting different things
(arm64 and riscv exported none).

Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18425

5 years agoaltq: manual cleanup after r341507
vangyzen [Tue, 4 Dec 2018 23:53:42 +0000 (23:53 +0000)]
altq: manual cleanup after r341507

Remove a file that became practically empty.
Fix indentation.

Like r341507, I do not plan to MFC, but anyone else can.

5 years agoaltq: remove ALTQ3_COMPAT code
vangyzen [Tue, 4 Dec 2018 23:46:43 +0000 (23:46 +0000)]
altq: remove ALTQ3_COMPAT code

This code has apparently never compiled on FreeBSD since its
introduction in 2004 (r130365).  It has certainly not compiled
since 2006, when r164033 added #elsif [sic] preprocessor directives.
The code was left in the tree to reduce the diff from upstream (KAME).
Since that upstream is no longer relevant, remove the long-dead code.

This commit is the direct result of:

    unifdef -m -UALTQ3_COMPAT sys/net/altq/*

A later commit will do some manual cleanup.

I do not plan to MFC this.  If that would help you, go for it.

5 years agoext2fs.4: basic updates.
pfg [Tue, 4 Dec 2018 22:51:13 +0000 (22:51 +0000)]
ext2fs.4: basic updates.

Starting with FreeBSD 12 we fully support writing ext4 filesystems.
Mention some features that we don't support while here.

MFC after: 3 days

5 years agoRegen after r341495: Remove NOARGS from oaccept.
brooks [Tue, 4 Dec 2018 21:57:26 +0000 (21:57 +0000)]
Regen after r341495: Remove NOARGS from oaccept.

5 years agoRemove NOARGS from oaccept.
brooks [Tue, 4 Dec 2018 21:56:45 +0000 (21:56 +0000)]
Remove NOARGS from oaccept.

This was in the original patch, but lost in a rebase.

Reported by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15816

5 years agoAnother attempt to fix issue with the DIOCGDELETE ioctl(2) not
sobomax [Tue, 4 Dec 2018 21:48:56 +0000 (21:48 +0000)]
Another attempt to fix issue with the DIOCGDELETE ioctl(2) not
handling slightly out-of-bound requests properly (r340187).
Perform the range check here rather than relying on g_delete_data() to DTRT.

g_delete_data() would always return success for requests
starting at the byte just past the provider's media boundary.
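
A range check of this kind might look like the sketch below. It is not the committed code: the function name, the unsigned 64-bit types, and the EINVAL return value are assumptions made for illustration. It rejects a request that starts at or past the media boundary, and one that runs past it, without overflowing the arithmetic.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate a delete request against the media size.
 * Returns 0 when [offset, offset + length) lies inside the media.
 */
static int
example_check_delete_range(uint64_t offset, uint64_t length,
    uint64_t mediasize)
{
	/* A request starting at the boundary itself is already out of range. */
	if (offset >= mediasize)
		return (EINVAL);
	/* Written as a subtraction so offset + length cannot wrap around. */
	if (length > mediasize - offset)
		return (EINVAL);
	return (0);
}
```

The first test covers the case from the message: a request starting exactly one byte past the last valid offset is refused up front instead of being handed on.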

MFC after: 4 weeks

5 years agoOnly build gnu/lib/csu when MK_BSD_CRTBEGIN is off.
andrew [Tue, 4 Dec 2018 18:51:28 +0000 (18:51 +0000)]
Only build gnu/lib/csu when MK_BSD_CRTBEGIN is off.

We were still building it from Makefile.inc1. Disable it there so we don't
try to build the GNU crtbegin/crtend when the BSD version was asked for.

PR: 233733
Reported by: lwhsu
Reviewed by: emaste
MFC with: r339738
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18428

5 years agoAlways treat firmware request and response sizes as unsigned.
gordon [Tue, 4 Dec 2018 18:28:25 +0000 (18:28 +0000)]
Always treat firmware request and response sizes as unsigned.

This fixes an incomplete bounds check on the guest-supplied request
size where a very large request size could be interpreted as a negative
value and not be caught by the bounds check.
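
The class of bug described here can be shown in a few lines. This is an illustrative sketch only: the limit and the function names are hypothetical and are not taken from bhyve. The first function mimics a check done on a signed value, the second keeps the size unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_REQUEST 4096u /* hypothetical buffer limit */

/*
 * Flawed check: the guest-supplied size is reinterpreted as signed, so a
 * very large value becomes negative and compares below the limit.
 * (The conversion is implementation-defined; it wraps on the usual
 * two's-complement compilers.)
 */
static bool
example_size_ok_signed(uint32_t guest_size)
{
	int32_t size = (int32_t)guest_size;

	return (size <= (int32_t)EXAMPLE_MAX_REQUEST);
}

/* Fixed check: the size stays unsigned, so large values are rejected. */
static bool
example_size_ok_unsigned(uint32_t guest_size)
{
	return (guest_size <= EXAMPLE_MAX_REQUEST);
}
```

A request size such as 0xfffffff0 passes the signed variant on common platforms but is refused by the unsigned one.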

Submitted by: jhb
Reported by: Reno Robert
Approved by: so
Security: FreeBSD-SA-18:14.bhyve
Security: CVE-2018-17160

5 years agoRegen after r341474: Normalize COMPAT_43 syscall declarations.
brooks [Tue, 4 Dec 2018 16:49:14 +0000 (16:49 +0000)]
Regen after r341474: Normalize COMPAT_43 syscall declarations.

5 years agoNormalize COMPAT_43 syscall declarations.
brooks [Tue, 4 Dec 2018 16:48:47 +0000 (16:48 +0000)]
Normalize COMPAT_43 syscall declarations.

Have ogetkerninfo, ogetpagesize, ogethostname, osethostname, and oaccept
declare o<foo>_args structs rather than non-compat ones. Due to a
failure to use NOARGS in most cases this adds only one new declaration.

No changes required in freebsd32 as only ogetpagesize() is implemented
and it has a 32-bit specific implementation.

Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15816

5 years agoFix args cross-threading between gptboot(8) and loader(8) with zfs support.
ian [Tue, 4 Dec 2018 16:43:50 +0000 (16:43 +0000)]
Fix args cross-threading between gptboot(8) and loader(8) with zfs support.

When loader(8) is built with zfs support enabled, it assumes that any extarg
data present is a zfs_boot_args struct, but if the first-stage loader was
gptboot(8) the extarg data is actually a geli_boot_args struct.  Luckily,
zfsboot(8) and gptzfsboot(8) have always passed KARGS_FLAGS_ZFS along with
KARGS_FLAGS_EXTARG, so we can use KARGS_FLAGS_ZFS to decide whether the
extarg data is a zfs_boot_args struct.

To avoid similar problems in the future, gptboot(8) now passes a new
KARGS_FLAGS_GELI to indicate that extarg data is geli_boot_args.  In
loader(8), if neither KARGS_FLAGS_ZFS nor KARGS_FLAGS_GELI is set but
extarg data is present (which will be the case for gptboot compiled before
this change), we now check for the known size of the geli_boot_args struct
passed by the older versions of gptboot as a way of confirming what type of
extarg data is present.

In a semi-related tidying up, since loader's main() has already decided
what type of extarg data is present and set the global 'zargs' var
accordingly, don't repeat the check in extract_currdev, just check whether
zargs is NULL or not.
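
The decision order described above can be sketched roughly as follows. The flag bit values, the legacy struct size, and all identifiers prefixed with EX_ or example_ are hypothetical; only the order of the checks follows the message. What the loader does when nothing matches is not stated, so the sketch simply reports the data as unknown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined in the boot code. */
#define EX_KARGS_FLAGS_EXTARG 0x1u
#define EX_KARGS_FLAGS_ZFS    0x2u
#define EX_KARGS_FLAGS_GELI   0x4u

/* Hypothetical size of the geli args struct passed by older gptboot. */
#define EX_LEGACY_GELI_ARGS_SIZE 64u

enum example_extarg_kind {
	EX_EXTARG_NONE,
	EX_EXTARG_ZFS,
	EX_EXTARG_GELI,
	EX_EXTARG_UNKNOWN
};

/* Decide what kind of extarg data the first-stage loader passed. */
static enum example_extarg_kind
example_classify_extarg(uint32_t flags, size_t extarg_size)
{
	if ((flags & EX_KARGS_FLAGS_EXTARG) == 0)
		return (EX_EXTARG_NONE);
	if ((flags & EX_KARGS_FLAGS_ZFS) != 0)
		return (EX_EXTARG_ZFS);
	if ((flags & EX_KARGS_FLAGS_GELI) != 0)
		return (EX_EXTARG_GELI);
	/* Older gptboot sets no type flag: fall back to the known size. */
	if (extarg_size == EX_LEGACY_GELI_ARGS_SIZE)
		return (EX_EXTARG_GELI);
	return (EX_EXTARG_UNKNOWN);
}
```

Checking the explicit flags first means the size comparison is only a fallback for first-stage loaders built before the new flag existed.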

X-MFC after: a few days, along with prior related changes.

5 years agoAdd ability to request listing and deleting only for dynamic states.
ae [Tue, 4 Dec 2018 16:12:43 +0000 (16:12 +0000)]
Add ability to request listing and deleting only for dynamic states.

This is useful when net.inet.ip.fw.dyn_keep_states is enabled but some
states must be deleted after the rules are reloaded. A new '-D' flag is
added for this purpose.

Retire the '-e' flag, since there cannot be expired states in the sense
that this flag historically had.

Also add a "verbose" mode for listing dynamic states. It is enabled with
the '-v' flag and adds more information to the states list, which can be
useful for debugging.

Obtained from: Yandex LLC
MFC after: 2 months
Sponsored by: Yandex LLC

5 years agoReimplement how net.inet.ip.fw.dyn_keep_states works.
ae [Tue, 4 Dec 2018 16:01:25 +0000 (16:01 +0000)]
Reimplement how net.inet.ip.fw.dyn_keep_states works.

Turning this feature on allows dynamic states to be kept when the parent
rule is deleted. But it works only when the default rule is
"allow from any to any".

Now, when a rule with a dynamic opcode is about to be deleted and
net.inet.ip.fw.dyn_keep_states is enabled, the existing states reference
the named objects corresponding to this rule, and also reference the rule.
When ipfw_dyn_lookup_state() finds a state for a deleted parent rule,
it returns the pointer to the deleted rule, which is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.

A refcnt field was added to struct ip_fw to hold the reference, and a
next pointer was added to allow iterating over rules without damaging
their content when deleted rules are chained.

Named objects are referenced only when states are going to be deleted, so
that the kidx of named objects can be reused when new parent rules are
installed.

The ipfw_dyn_get_count() function was modified: it now also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.

ipfw_free_rule() was changed to be global, since a dynamic state can now
free a rule when the state expires and the reference counter becomes 1.

The external actions subsystem was also modified, since external actions
can be deregistered and instances can be destroyed. In these cases deleted
rules that are referenced by orphaned states must be modified to prevent
access to freed memory. The ipfw_dyn_reset_eaction() and
ipfw_reset_eaction_instance() functions were added for this purpose.

Obtained from: Yandex LLC
MFC after: 2 months
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17532

5 years agoggated: do not expose stack data in sendfail()
emaste [Tue, 4 Dec 2018 15:25:15 +0000 (15:25 +0000)]
ggated: do not expose stack data in sendfail()

admbugs: 590
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD

5 years agoAdd assertion to check that named object has correct type.
ae [Tue, 4 Dec 2018 15:12:28 +0000 (15:12 +0000)]
Add assertion to check that named object has correct type.

Obtained from: Yandex LLC
MFC after: 1 week

5 years agoRestore /var/crash permissions to 0750, as declared in mtree file. After
garga [Tue, 4 Dec 2018 12:34:22 +0000 (12:34 +0000)]
Restore /var/crash permissions to 0750, as declared in mtree file. After
r337337 it changed to 0755.

Reviewed by: loos
Approved by: loos
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D18355