rmacklem [Mon, 22 May 2017 21:41:34 +0000 (21:41 +0000)]
MFC: r317931
Fix mount_nfs so that it doesn't create mounttab entries for NFSv4 mounts.
The NFSv4 protocol doesn't use the Mount protocol, so it doesn't make sense
to add an entry for an NFSv4 mount to /var/db/mounttab. Also, r308871
modified umount so that it doesn't remove any entry created by mount_nfs.
rmacklem [Mon, 22 May 2017 19:34:37 +0000 (19:34 +0000)]
MFC: r317906
Fix the client side krpc from doing TCP reconnects for ERESTART from sosend().
When sosend() replies ERESTART in the client side krpc, it indicates that
the RPC message hasn't yet been sent and that the send queue is full or
locked while a signal is posted for the process.
Without this patch, this would result in a RPC_CANTSEND reply from
clnt_vc_call(), which would cause clnt_reconnect_call() to create a new
TCP transport connection. For most NFS servers, this wasn't a serious problem,
although it did imply retries of outstanding RPCs, which could possibly
have missed the DRC.
For an NFSv4.1 mount to AmazonEFS, this caused a serious problem, since
AmazonEFS often didn't retain the NFSv4.1 session and would reply with
NFS4ERR_BAD_SESSION. This implies to the client a crash/reboot which
requires open/lock state recovery.
Three options were considered to fix this:
- Return the ERESTART all the way up to the system call boundary and then
have the system call redone. This is fraught with risk, due to convoluted
code paths, asynchronous I/O RPCs etc. cperciva@ worked on this, but it
is still a work in prgress and may not be feasible.
- Set SB_NOINTR for the socket buffer. This fixes the problem, but makes
the sosend() completely non interruptible, which kib@ considered
inappropriate. It also would break forced dismount when a thread
was blocked in sosend().
- Modify the retry loop in clnt_vc_call(), so that it loops for this case
for up to 15sec. Testing showed that the sosend() usually succeeded by
the 2nd retry. The extreme case observed was 111 loop iterations, or
about 100msec of delay.
This third alternative is what is implemented in this patch, since the
change is:
- localized
- straightforward
- forced dismount is not broken by it.
This patch has been tested by cperciva@ extensively against AmazonEFS.
davidcs [Mon, 22 May 2017 19:22:06 +0000 (19:22 +0000)]
MFC r318382
1. Move Rx Processing to fp_taskqueue(). With this CPU utilization for
processing interrupts drops to around 1% for 100G and under 1% for
other speeds.
2. Use sysctls for TRACE_LRO_CNT and TRACE_TSO_PKT_LEN
3. remove unused mtx tx_lock
4. bind taskqueue kernel thread to the appropriate cpu core
5. when tx_ring is full, stop further transmits till at least 1/16th of
the Tx Ring is empty. In our case 1K entries. Also if there are
rx_pkts to process, put the taskqueue thread to sleep for 100ms,
before enabling interrupts.
6. Use rx_pkt_threshold of 128.
gjb [Mon, 22 May 2017 16:07:17 +0000 (16:07 +0000)]
MFC r307469 (imp):
Allow root_rw_mount to be both lower and upper case. Before, if it was
upper case, you'd wind up with a read-only filesystem when you should
sometimes.
MSDOS and Windows GNU grep uses -u to mean "print byte offsets as if
running on an UNIX system." The option has no effect on systems that
do not use CRLF line endings.
asomers [Mon, 22 May 2017 15:12:49 +0000 (15:12 +0000)]
MFC r318189:
vdev_geom may associate multiple vdevs per g_consumer
vdev_geom.c currently uses the g_consumer's private field to point to a
vdev_t. That way, a GEOM event can cause a change to a ZFS vdev. For
example, when you remove a disk, the vdev's status will change to REMOVED.
However, vdev_geom will sometimes attach multiple vdevs to the same GEOM
consumer. If this happens, then geom events will only be propagated to one
of the vdevs.
Fix this by storing a linked list of vdevs in g_consumer's private field.
Fix the output of very large rebind, renew and lease time options in
lease file.
Some routers set very large values for rebind time (Netgear) and these
are erroneously reported as negative in the leasefile. This was due to a
wrong printf format specification of %ld for an unsigned long on 32-bit
platforms.
They would overflow a signed 32-bit time_t on 32 bit architectures. This
was taken care of, but a compiler optimisation makes this behave
erratically. This could be resolved by adding a -fwrapv flag, but
instead we can check the value before adding the current timestamp to
it.
hselasky [Mon, 22 May 2017 08:17:07 +0000 (08:17 +0000)]
MFC r318531:
mlx4: Use the CQ quota for SRIOV when creating completion EQs
When creating EQs to handle CQ completion events for the PF or for
VFs, we create enough EQE entries to handle completions for the max
number of CQs that can use that EQ.
When SRIOV is activated, the max number of CQs a VF (or the PF) can
obtain is its CQ quota (determined by the Hypervisor resource
tracker). Therefore, when creating an EQ, the number of EQE entries
that the VF should request for that EQ is the CQ quota value (and not
the total number of CQs available in the firmware).
Under SRIOV, the PF, also must use its CQ quota, because the resource
tracker also controls how many CQs the PF can obtain.
Using the firmware total CQs instead of the CQ quota when creating EQs
resulted wasting MTT entries, due to allocating more EQEs than were
needed.
ngie [Mon, 22 May 2017 06:24:43 +0000 (06:24 +0000)]
MFC r317594:
usb(4): manpage cleanup
1. Wrap at <80 columns for readability when editing. Rewrap some lines
prematurely wrapped to better fit in <80 columns and not waste
vertical space.
2. Fix SEE ALSO sorting (sort by section first, then manpage name).
3. Tweak the compound device description slightly by adding soft stops
via commas.
ngie [Mon, 22 May 2017 06:07:09 +0000 (06:07 +0000)]
MFC r315793:
intro(3): fix markup
- Use `Em` with `.It` macro when referring to other libraries, instead of
`Xr`.
- Use `.Em` instead of `.Xr` when referring to libraries.
- Remove commented out lines.
jhibbits [Sat, 20 May 2017 05:12:32 +0000 (05:12 +0000)]
MFC r314370,r318130,r318167:
DTrace related fixes for PowerPC.
r314370:
Unbreak kernel breakpoints, broken for ~4 years now
r318130:
Fix the encoded instruction for FBT traps on powerpc
r318167:
Fix stack tracing in dtrace for powerpc
gjb [Sat, 20 May 2017 01:04:19 +0000 (01:04 +0000)]
Document r316613, C standard library has been updated to make use
of reallocarray(3).
Document r318121, system libraries have been updated to make use
of reallocarray(3).
Document r315282, GNU __nonnull__ attribute have been replaced with
the more benign Clang nullability attributes.
Submitted by: pfg
Sponsored by: The FreeBSD Foundation
gjb [Sat, 20 May 2017 01:04:18 +0000 (01:04 +0000)]
Document r316613, C standard library has been updated to make use
of reallocarray(3).
Document r318121, system libraries have been updated to make use
of reallocarray(3).
Submitted by: pfg
Sponsored by: The FreeBSD Foundation
jhb [Fri, 19 May 2017 23:01:55 +0000 (23:01 +0000)]
MFC 318360: Fix p_endcopy.
For p_endcopy to work correctly, it must be the name of the next element
in struct proc after the end of the copy region, not the name of the
last element in the copy region. Currently, the last element
(p_elf_flags) is not being copied. In addition, the p_xexit and
p_xsig fields should not be copied on fork, so move them out of the
copy region.
Note that for stable/11 the fix is a bit simpler than in HEAD as it
merely consists of formally moving p_xexit and p_xsig out of the
copy region by shrinking the end of the copy region.
gjb [Fri, 19 May 2017 20:11:35 +0000 (20:11 +0000)]
Add a 'Userland Debugging' section.
Document r306786, core dumps now include the process ID (PID) and
command line arguments.
Document r304499, ptrace(2) now supports events for vfork(2).
Submitted by: jhb
Sponsored by: The FreeBSD Foundation
gjb [Fri, 19 May 2017 20:11:34 +0000 (20:11 +0000)]
Document r306520, PCI pass through with bhyve resets functions via
FLR.
Document r306471, PCI pass through with bhyve supports more dynamic
configurations.
Submitted by: jhb
Sponsored by: The FreeBSD Foundation
gjb [Fri, 19 May 2017 20:11:33 +0000 (20:11 +0000)]
Document r306660, Virtual Function devices on T4 and T5 adapters.
Document r306661, TCP Offload Engine on Chelsio T4+ adapters.
Document r306664, cxgbev(4) addition.
Document r309560, several cxgbe(4) and cxgbev(4) updates.
Submitted by: jhb
Sponsored by: The FreeBSD Foundation
hselasky [Fri, 19 May 2017 12:51:13 +0000 (12:51 +0000)]
MFC r313555:
Flexible and asymmetric allocation of EQs and MSI-X vectors for PF/VFs.
Previously, the mlx4 driver queried the firmware in order to get the
number of supported EQs. Under SRIOV, since this was done before the
driver notified the firmware how many VFs it actually needs, the
firmware had to take into account a worst case scenario and always
allocated four EQs per VF, where one was used for events while the
others were used for completions. Now, when the firmware supports the
asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (-->
MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query
the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X
vectors per function. Moreover, when running in the new
firmware/driver mode, the limitation that the number of EQs should be
a power of two is lifted.
Obtained from: Linux (dual BSD/GPLv2 licensed)
Submitted by: Dexuan Cui @ microsoft . com
Differential Revision: https://reviews.freebsd.org/D8867
Sponsored by: Mellanox Technologies
hselasky [Fri, 19 May 2017 12:35:23 +0000 (12:35 +0000)]
MFC r313556:
Change mlx4 QP allocation scheme.
When using Blue-Flame, BF, the QPN overrides the VLAN, CV, and SV
fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7
unset.
The current ethernet driver code reserves a TX QP range with 256b
alignment.
This is wrong because if there are more than 64 TX QPs in use, QPNs >=
base + 65 will have bits 6/7 set.
This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.
The new mechanism introduced here will support reservation for "Eth
QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:
1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
and request "BF enabled QPs" if BF is supported for the function
2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0] - number of QPs
b. param1[31-24] - flags controlling QPs reservation
Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits
6 and 7 unset in order to be used in Ethernet.
Bits 24-30 of the flags are currently reserved.
When a function tries to allocate a QP, it states the required
attributes for this QP. Those attributes are considered "best-effort".
If an attribute, such as Ethernet BF enabled QP, is a must-have
attribute, the function has to check that attribute is supported
before trying to do the allocation.
In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those
attributes and masks out unsupported attributes as well. In order to
notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP
command. This command's mailbox is filled by the PF, which notifies
which QP allocation attributes it supports.
Obtained from: Linux (dual BSD/GPLv2 licensed)
Submitted by: Dexuan Cui @ microsoft . com
Differential Revision: https://reviews.freebsd.org/D8868
Sponsored by: Mellanox Technologies
vangyzen [Fri, 19 May 2017 00:33:48 +0000 (00:33 +0000)]
MFC r318354 (by cem)
Correct page frame mask constant used in pmap_change_attr_locked
This was introduced in r290156. It's present in 11.0, but not any 10.x
release unless someone decided to MFC it.
It affects ordinary pages right above the DMAP limit, which is effectively
system memory rounded up to a 1 GB (3rd level superpage) boundary (or up to
a minimum of 4 GB, on small systems).
gjb [Fri, 19 May 2017 00:00:38 +0000 (00:00 +0000)]
Update stable/11 from 11.0-STABLE to 11.1-PRERELEASE, marking the
official start of the code slush.
Set the default mdoc(7) version to 11.1, and update the clang(1)
TARGET_TRIPLE to reflect 11.1. While here, add missing FreeBSD
major versions to mdoc(7).
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
rpokala [Thu, 18 May 2017 23:41:18 +0000 (23:41 +0000)]
Persistently store NIC's hardware MAC address, and add a way to retrive it
jhb pointed out that (struct ifnet) is part of the network driver KBI, and
thus the offsets of internal fields must not change. Therefore, move the new
"if_hw_addr" field to the end, and consume one of the "if_pspare"s; that's
what they're there for. The new field replaces the *last* element of that
array; that way, offsetof(if_pspare) and offsetof(if_ispare) are unchanged
compared to before r318397.
wulf [Thu, 18 May 2017 21:23:39 +0000 (21:23 +0000)]
MFC r317811:
Fix triple-finger taps reported as double-finger for Elan hw v.4 touchpads
Wait for all advertised head packets after status packet have been received.
This fixes rare but quite annoying issue in Elan hw v.4 touchpads support
when triple-finger taps are reported as double-finger taps under several
circumstances.
MFC r317812:
Reduce default tap_min_queue size for Elan touchpads
Elan hw v.4 touchpads often sends touchpad release packet right after
touchpad touch one. Most probably this happens due to PS/2 limited bandwith.
Reducing of tap_min_queue size to 1 makes multifinger tap detection
more reliable in this case.
MFC r317813:
Adjust Elantech palm width threshold to nearly match synaptics defaults
MFC r317814:
psm(4): reduce cursor jumping on palm detection
This is done with discarding pointer movements rather then mouse packets
MFC r317815:
Enable palm detection on two finger touches for multitouch trackpads.
MFC r317816:
Report 3-rd and 4-th fingers as first finger for Elan hw v.2 and v.3 as
Linux does. It should not affect gesture processing in current state as it
ignores finger coords on 3-finger tap detection but it should make evdev
reports looking more Linux-alike.
MFC r317817:
Set predefined logical touchpad sizes for several ancient Elan hw v.2
models. This change is based on Linux driver.
Determine logical trace size. It used for calculation of touch sizes
in surface units for MT-protocol type B evdev reports.
MFC r317818:
psm(4): Remove sys/libkern.h header inclusion
It is already included via sys/systm.h
MFC r317819:
Reduce synaptics touch sensitivity
Increase hw.psm.synaptics.min_pressure default value from 16 to 32
to nearly match Linux driver (30-35 hysteresis loop).
This makes libinput tap detection more reliable.
marius [Thu, 18 May 2017 21:00:50 +0000 (21:00 +0000)]
MFC: r318282
- Unlike as in the PCI case, when attached to ACPI, Intel Bay Trail
and Braswell eMMC and SDXC controllers share the same IDs. Like in
the PCI case, Braswell eMMC needs the SDHCI_QUIRK_DATA_TIMEOUT_1MHZ
quirk (see r311794 for the corresponding change to the sdhci(4) PCI
PCI front-end), though. However, due to the shared ACPI IDs, this
is trickier to do.
- Intel Apollo Lake eMMC and SDXC controllers are affected by the
APL18 ("Using 32-bit Addressing Mode With SD/eMMC Controller May
Lead to Unpredictable System Behavior") silicon bug. When this
erratum hits, typically both SDHCI and XHCI controllers wedge.
According to Intel, using ADMA2 with 64-bit addressing and 96-bit
descriptors serves as a workaround. Until such times when sdhci(4)
has ADMA2 support, flag DMA as broken for affected interfaces.
This turns out to work around the problem, too, at the cost of
performance.
- In the sdhci(4) ACPI front-end, probe the Intel Apollo Lake eMMC
and SDXC controllers, too.
marius [Thu, 18 May 2017 20:46:20 +0000 (20:46 +0000)]
MFC: r315598
o Add support for eMMC DDR bus speed mode up to 52 MHz to sdhci(4)
and mmc(4). Given that support for DDR52 is not denoted by SDHCI
capability registers, availability of that timing is indicated by
a new quirk SDHCI_QUIRK_MMC_DDR52 and only enabled for Intel SDHCI
controllers so far.
Compared to 50 MHz at SDR high speed typically yielding ~45 MB/s
read throughput with the eMMC chips tested, read performance goes
up to ~80 MB/s at DDR52.
As a side-effect, this change also fixes communication with some
eMMC devices at SDR high speed mode due to the signaling voltage
and UHS bits in the SDHCI controller no longer being left in an
inappropriate state.
o In sdhci(4), add two tunables hw.sdhci.quirk_clear as well as
hw.sdhci.quirk_set, which (when hooked up in the front-end)
allow to set/clear sdhci(4) quirks for debugging and testing
purposes. However, especially for SDHCI controllers on the
PCI bus which have no specific support code so far and, thus,
are picked up as generic SDHCI controllers, hw.sdhci.quirk_set
allows for setting the necessary quirks (if required).
o In mmc(4), check and handle the return values of some more
function calls instead of assuming that everything went right.
In case failures actually are not problematic, indicate that
by casting the return value to void.
trasz [Thu, 18 May 2017 20:37:47 +0000 (20:37 +0000)]
MFC r317803:
Enable automounting of exFAT media.
With fstyp(8) being updated to detect exfat in base r312003, it seems
like a good time to add support for auto-mounting SDXC cards -- which
use exfat by default.
The user will need to locally compile and install sysutils/fusefs-exfat
for this to succeed; logs a message to that effect when not installed.
emaste [Thu, 18 May 2017 17:40:30 +0000 (17:40 +0000)]
MFC LLD changes and enable LLD as /usr/bin/ld on arm64 by default
MFC r316629: do not require binutils port when using lld as ld
r279908 added logic to Makefile.inc1 to automatically set
CROSS_BINUTILS_PREFIX for architectures not supported by the in-tree
binutils: arm64 when first introduced, and later riscv64 as well.
LLVM's LLD linker is now included in the base system, and is enabled by
default for arm64 and capable of linking world and kernel. Thus, avoid
automatically setting CROSS_BINUTILS_PREFIX and requiring the binutils
port if WITH_LLD_IS_LD is true.
--
MFC r317608: revert r313473 (Disable LLD_IS_LD option combinations that fail)
r316647 corrected the build of tblgen and libllvm as dependencies for
LLD so undo the temporary seat-belt.
We still want to extend the build infrastructure to automatically detect
the case where the host LLD can be used instead of building a bootstrap
LLD, and likely extend libllvmminimal to meet LLD's needs for cases
where the build includes LLD but not Clang.
--
MFC r316684: Make WITHOUT_TOOLCHAIN imply WITHOUT_LLD.
LLD is a toolchain component.
--
MFC r316647: Introduce LLD_BOOTSTRAP to control lld as bootstrap linker
Add WITH_LLD_BOOTSTRAP and WITHOUT_LLD_BOOTSTRAP knobs, similar to the
Clang bootstrap knobs.
Reviewed by: andrew
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10793