MFC r366573: Add DSCP support for network QoS to iscsi initiator.
Allow the DSCP codepoint also to be configurable
for the traffic in the direction from the initiator
to the target, such that writes and any requests
are also treated in the appropriate QoS class.
MFC r366570: Stop sending tiny new data segments during SACK recovery
Consider the currently in-use TCP options when
calculating the amount of new data to be injected during
SACK loss recovery. That addresses the effect that very small
(new) segments could be injected on partial ACKs while
still performing a SACK loss recovery.
MFC r366569: Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt)
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.
Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header with
vlan ID = 0 to carry the priority setting
MFC r366566;r366567: Extend netstat to display TCP stack and detailed congestion state
Upstreaming the "-c" option used to show detailed per-connection
congestion control state for TCP sessions.
This is one summary patch, which adds the relevant variables into
xtcpcb. As previous "spare" space is used, these changes are ABI
compatible (an older version of netstat will simply not show
the newly available data from newer kernels, and a newer version
of netstat will only show zeroed data querying older kernels.
Ryan Moeller [Fri, 23 Oct 2020 10:48:14 +0000 (10:48 +0000)]
MFC r366771:
bhyve: Update TX descriptor base address and host mapping on change
bhyve sometimes segfaults when using an e1000 NIC with a Windows guest.
We are only updating our tdba and cached host mapping when the low address
register is written and when tx is set enabled, but not when the high address
or length registers are written. It is observed that Windows 10 is occasionally
enabling tx first then writing the registers in the order low, high, len. This
leaves us with a bogus base address and mapping, which causes a segfault later
when we try to copy from a descriptor that has unpredictable garbage in a
pointer.
Updating the address and mapping when any of those registers change seems to fix
that particular issue.
Marcin Wojtas [Thu, 22 Oct 2020 17:31:41 +0000 (17:31 +0000)]
MFC r362574
Fix AccessWidth and BitWidth parsing in SPCR table
The ACPI Specification defines a Generic Address Structure (GAS),
which is used to describe UART controller register layout in the
SPCR table. The driver responsible for parsing it (uart_cpu_acpi)
wrongly associates the Access Size field to the uart_bas's regshft
and the register BitWidth to the regiowidth - according to
the definitions it should be opposite.
This problem remained hidden most likely because the majority of platforms
use 32-bit registers (BitWidth) which are accessed with the according
size (Dword). However on Marvell Armada 8k / Cn913x platforms,
the 32-bit registers should be accessed with Byte granulity, which
unveiled the issue.
This patch fixes above by proper values assignment and slightly improved
parsing.
Note that handling of the AccessWidth set to EFI_ACPI_6_0_UNDEFINED is
needed to work around a buggy SPCR table on EC2 x86 "bare metal" instances.
Brooks Davis [Thu, 22 Oct 2020 16:27:24 +0000 (16:27 +0000)]
MFC r366731:
physio: Don't store user addresses in bio_data
Only assign the address from the iovec to bio_data if it is a kernel
address. This was the single place where bio_data stored (however
briefly) a userspace pointer.
Brooks Davis [Wed, 21 Oct 2020 16:04:57 +0000 (16:04 +0000)]
MFC r366671:
libgssapi: modernize static string array use
Use designated initializers to document positions in the arrays rather
than requiring counting. Use nitems() rather than rolling it by hand to
count elements.
Also, passify a Clang 12 warning about suspcious string concatenation
within an array initializer by adding parentheses.
Alexander Motin [Wed, 21 Oct 2020 00:46:53 +0000 (00:46 +0000)]
MFC r366707: Use RTD3 Entry Latency value as shutdown timeout.
This field was not in specs when the driver was written, but now there
are SSDs with the reported latency of 10s, where hardcoded value of 5s
seems to be not enough sometimes, causing shutdown timeout messages.
MFC r366682:
Join to AllHosts multicast group again when adding an existing IPv4 address.
When SIOCAIFADDR ioctl configures an IPv4 address that is already exist,
it removes old ifaddr. When this IPv4 address is only one configured on
the interface, this also leads to leaving from AllHosts multicast group.
Then an address is added again, but due to the bug, this doesn't lead
to joining to AllHosts multicast group.
MFC r366681:
Add IPv4 fragments reassembling to NAT64LSN.
NAT64LSN requires the presence of upper level protocol header
in a IPv4 datagram to find corresponding state to make translation.
Now it will be handled automatically by nat64lsn instance.
MFC r366669:
Implement more RCU list functions in the LinuxKPI.
This also fixes a bug in the existing list_add_rcu() where the
prev->prev pointer was updated to the new element instead of
next->prev. Currently this function is not widely used.
MFC r366518:
Properly cleanup driver during remove_one() in mlx5core.
Cleanup all host resources, SYSCTLs, MSIX vectors and memory used
by the host and only leave the device allocated memory behind, if any,
because it may still be in use, when the PCI remove function is called.
Else future probe calls may fail due to SYSCTLs already existing.
MFC r366432:
Populate the acquire context field of a ww_mutex in the LinuxKPI.
Bump the FreeBSD version to force recompilation of external kernel modules.
Alexander Motin [Mon, 19 Oct 2020 20:42:01 +0000 (20:42 +0000)]
MFC r360546, r360547 (by imp): Various improvements to this man page:
o Be consistent about device-id and namespace-id
o Use consistent arg markup for these
o document you can use disk names too
o document nsid command better
o document the idenntify command
o add a couple of examples.
Alexander Motin [Mon, 19 Oct 2020 20:40:03 +0000 (20:40 +0000)]
MFC r352671 (by imp): Size is unsigned, so remove the test entirely.
The kernel won't crash if you have a bad value and I'd rather not have
nvmecontrol know the internal details about how the nvme driver limits
the transfer size.
Alexander Motin [Mon, 19 Oct 2020 20:39:00 +0000 (20:39 +0000)]
MFC r352665 (by imp):
After my comnd changes, the number of threads and size weren't set. In
addition, the flags are optional, but were made to be mandatory. Set
these things, as well as santiy check the specified size.
Alexander Motin [Mon, 19 Oct 2020 20:37:04 +0000 (20:37 +0000)]
MFC r352212 (by imp):
Assume all the short args have optional args so allocate space for the
':'. It's slightly wasteful, but much easier (and the savings in bytes
at runtime would be tiny, but the code to do it larger).
Andriy Gapon [Mon, 19 Oct 2020 07:03:04 +0000 (07:03 +0000)]
MFC r365402: musb/allwinner: add support for configuring phy as well as device mode
At least on Orange Pi PC Plus even the host mode does not work without
enabling the phy and setting it to the host mode.
The driver will now parse dr_mode property and will try to configure
itself and the phy accordingly.
OTG mode is not supported yet, so it is treated as the device / peripheral
mode.
The phy is enabled -- powered on -- only for the host mode.
The device mode requires support from a phy driver, e.g., aw_usbphy on
Allwinner platform.
aw_usbphy does not support the device mode, so it cannnot work yet.
MFC r366568:
Fix EINVAL message when CPU binding information is requested for IRQ.
`cpuset -g -x N` along with requested information always prints
message `cpuset: getdomain: Invalid argument'. The EINVAL is returned
from kern_cpuset_getdomain(), since it doesn't expect CPU_LEVEL_WHICH
and CPU_WHICH_IRQ parameters.
To fix the error, do not call cpuset_getdomain() when `-x' is specified.
Alexander Motin [Tue, 13 Oct 2020 17:50:01 +0000 (17:50 +0000)]
MFC r355083 (by rlibby): sysctl: wire old buf before output with sysctl lock
Several sysctl sysctls output to a user buffer while holding a
non-sleepable lock that protects the sysctl topology. They need to wire
the output buffer, or else they may try to sleep on a page fault.
Andriy Gapon [Mon, 12 Oct 2020 11:34:09 +0000 (11:34 +0000)]
MFC r363021 by manu: twsi: Fix for > Allwinner A20
Every revision of twsi after the A20 have a bug where we need to
write again the control register after each interrupts. We also need
to add some delay before writing to this register, a simple read of the
same register does the job so do that.
Also fix the case when we have finish sending all the bytes, it only worked
for 1 byte transfer (the same kind that we do for talking to the PMIC on A20
boards).
While here add more debug messages and rework some of them.
This was tested by talking to a AT23C32 eeprom and a DS3231 RTC from an
H3 and A20 board.
MFC r366206: Add DSCP support for network QoS to iscsi target.
In order to prioritize iSCSI traffic across a network,
DSCP can be used. In order not to rely on "ipfw setdscp"
or in-network reclassification, this adds the dscp value
directly to the portal group (where TCP sessions are accepted).
Allan Jude [Fri, 9 Oct 2020 23:02:09 +0000 (23:02 +0000)]
MFC r364787:
ZFS: whitelist zstd and encryption in the loader
Please note that neither zstd nor encryption is
supported by the loader at this instant. This
change makes it safe to use those features in
one's root pool, but not in one's root dataset.
Allan Jude [Fri, 9 Oct 2020 22:42:04 +0000 (22:42 +0000)]
MFC r364787:
ZFS: whitelist zstd and encryption in the loader
Please note that neither zstd nor encryption is
supported by the loader at this instant. This
change makes it safe to use those features in
one's root pool, but not in one's root dataset.
Warner Losh [Fri, 9 Oct 2020 21:01:53 +0000 (21:01 +0000)]
MFC: r366216 imp
Fix video on PCI heuristic
The video on PCI heuristic was broken. It was supposed to infer a
video device when the last element of the path was a PCI DEVICE PATH
node. However, the last node in the device path is an END node, so
this heuristic never fired.
This leads, among other things, to bhyve only producing output in the
serial connection once we leave the boot loader. This restores the
dual headed boot on bhyve + UEFI (as we did in 11.2), but will favor
serial in the absence of other config which may be a change from 11.2.
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D26572
MFC r366150: TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out
full segments, and avoid sending partial segments by
comparing against the static t_maxseg variable.
That value does not consider tcp options like timestamps,
while the initial window calculation is using
the correct dynamic tcp_maxseg() function.
Due to this interaction, the last, full size segment
is considered too short and not sent out immediately.
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in depth explanation for
the rationale around this.
As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.
Kyle Evans [Thu, 8 Oct 2020 12:56:23 +0000 (12:56 +0000)]
MFC r366466: crunchgen: fix MK_AUTO_OBJ logic after r364166
r364166 converted echo -n `/bin/pwd` to a raw pwd invocation, leaving a
trailing newline at the end of path. This caused a later stat() of it to
erroneously fail and the fallback to MK_AUTO_OBJ=no logic proceeded as
unexpected.
Harry Schmalzbauer bissected the resulting build failure he experienced
(stable/12 host, -HEAD build) down to r365887. This change is mostly
unrelated, except it switches the build to bootstrapped crunchgen - clue!
I then bissected recent crunchgen changes going back a bit since we wouldn't
observe the failure immediately with -CURRENT in most configurations, which
landed me on r364166. After many intense head-scratching minutes and printf
debugging, I realized that the newline was the difference. This is where our
tale ends.
Navdeep Parhar [Mon, 5 Oct 2020 19:45:11 +0000 (19:45 +0000)]
MFC r366354:
cxgbe(4): validate largest_rx_cluster and safest_rx_cluster.
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page. Anything else could lead to a
panic on interface up.
Navdeep Parhar [Mon, 5 Oct 2020 19:22:28 +0000 (19:22 +0000)]
MFC r366247:
cxgbe(4): Avoid unnecessary work in the firmware during netmap tx.
Bind the netmap tx queues to a special '0xff' scheduling class which
makes the firmware skip some processing related to rate limiting on the
outgoing traffic. Future firmwares will do this automatically.
Navdeep Parhar [Mon, 5 Oct 2020 18:45:32 +0000 (18:45 +0000)]
MFC r366242:
cxgbe(4): fixes for netmap operation with only some queues active.
- Only active netmap receive queues should be in the RSS lookup table.
- The RSS table should be restored for NIC operation when the last
active netmap queue is switched off, not the first one.
- Support repeated netmap ON/OFF on a subset of the queues. This works
whether the the queues being enabled and disabled are the only ones
active or not. Some kring indexes have to be reset in the driver for
the second case.