tuexen [Thu, 31 May 2018 16:14:45 +0000 (16:14 +0000)]
MFC r333382:
When reporting ERROR or ABORT chunks, don't use more data
than is guaranteed to be contiguous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.
MFC r333386:
Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.
tuexen [Thu, 31 May 2018 16:00:03 +0000 (16:00 +0000)]
MFC r333186:
Send an ICMPv6 PacketTooBig message in case of forwarding a packet which
is too big for the outgoing interface and no firewall is involved.
This problem was introduced in
https://svnweb.freebsd.org/changeset/base/324996
Thanks to Irene Ruengeler for finding the bug and testing the fix.
rwlock: diff-reduction of runlock compared to sx sunlock
==
Undo LOCK_PROFILING pessimisation after r313454 and r313455
With the option used to compile the kernel both sx and rw shared ops would
always go to the slow path which added avoidable overhead even when the
facility is disabled.
Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.
Restore old behaviour of going there only when profiling is enabled.
This change is a no-op for kernels without LOCK_PROFILING (which is the
default).
==
sx: fix adaptive spinning broken in r327397
The condition was flipped.
In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.
For instance make -s -j 128 buildkernel:
before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total
==
locks: fix a corner case in r327399
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.
The bug sometimes manifested itself in stalls during -j 128 package builds.
Refactor the code to fix the bug; while here, remove some of the gratuitous
differences between rw and sx locks.
==
sx: don't do an atomic op in upgrade if it cannot succeed
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.
This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
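The idea can be sketched in userland with C11 atomics. This is only a model, not the kernel's sx(9) code: the lock-word layout (MODEL_WAITERS, MODEL_EXCL, MODEL_SHARERS_SHIFT) and the name model_try_upgrade() are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-word layout: bit 0 = waiters, bit 1 = exclusive,
 * remaining bits = number of sharers. */
#define MODEL_WAITERS       ((uintptr_t)0x1)
#define MODEL_EXCL          ((uintptr_t)0x2)
#define MODEL_SHARERS_SHIFT 2
#define MODEL_SHARERS(w)    ((w) >> MODEL_SHARERS_SHIFT)

/* The caller is assumed to hold the lock shared. */
static bool
model_try_upgrade(_Atomic uintptr_t *lock)
{
	uintptr_t v = atomic_load_explicit(lock, memory_order_relaxed);

	for (;;) {
		/* Plain read only: with other sharers present the upgrade
		 * cannot succeed, so do not dirty the cache line. */
		if ((v & MODEL_EXCL) != 0 || MODEL_SHARERS(v) != 1)
			return (false);
		/* Swap the sole sharer for exclusive ownership, preserving
		 * whatever waiters bit was observed. */
		uintptr_t newv = MODEL_EXCL | (v & MODEL_WAITERS);
		if (atomic_compare_exchange_weak_explicit(lock, &v, newv,
		    memory_order_acquire, memory_order_relaxed))
			return (true);
		/* The failed exchange reloaded v (for example because
		 * waiters showed up); re-evaluate instead of failing. */
	}
}
```

With one sharer the word flips to exclusive; with two sharers the function returns false after a single read and leaves the word untouched.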
==
mtx: tidy up recursion handling in thread lock
Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not change, thus the
check can be avoided. In particular this avoids a lock read for the
non-recursing case which found out the lock was changed.
While here avoid an irq trip if this happens.
==
locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.
Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
So note this is still a significant hit.
LOCK_PROFILING results are not affected.
==
rw: whack avoidable re-reads in try_upgrade
==
locks: extend speculative spin waiting for readers to drain
Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.
Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.
Nonetheless, in the majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.
The current approach is pretty simple: snatch the number of current readers
and perform that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.
Given the previously noted problem of not knowing whether spinning makes
any sense to begin with, the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in-between re-reads of the lock is typically 16384 or 32678. It was
selected as the limit of total spins. The lower bound is set to the
already present 10000 so as not to change it for smaller machines.
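A rough userland model of the approach described above, assuming hypothetical names throughout (model_reader_spin_limit(), model_spin_for_readers(), model_fake_sharers()); the real code reads the lock word and executes a CPU pause instruction, neither of which is reproduced here.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_OLD_LIMIT 10000u	/* the pre-existing limit from the message */

/* Total pause budget: the writer-spin upper limit, but never below 10000. */
static unsigned
model_reader_spin_limit(unsigned writer_delay_max)
{
	return (writer_delay_max > MODEL_OLD_LIMIT ?
	    writer_delay_max : MODEL_OLD_LIMIT);
}

/* Stand-in for a CPU pause instruction; only a compiler barrier here. */
static void
model_pause(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Fake lock for demonstration: *arg readers, one leaves after every read. */
static unsigned
model_fake_sharers(void *arg)
{
	unsigned *readers = arg;
	unsigned seen = *readers;

	if (*readers > 0)
		(*readers)--;
	return (seen);
}

/* Returns true if the readers drained, false if the caller should sleep. */
static bool
model_spin_for_readers(unsigned (*read_sharers)(void *), void *lock,
    unsigned limit, unsigned *spent)
{
	unsigned total = 0;

	while (total < limit) {
		unsigned n = read_sharers(lock);

		if (n == 0) {
			*spent = total;
			return (true);
		}
		if (n > limit - total)
			n = limit - total;	/* never exceed the budget */
		/* Pause once per reader currently seen, then re-check. */
		for (unsigned i = 0; i < n; i++)
			model_pause();
		total += n;
	}
	*spent = total;
	return (false);
}
```

The budget clamp is a choice made for this sketch so that the total never overshoots the limit; the message itself only says the total is limited.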
Bumping the limit reduces system time by a few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).
Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.
==
fix uninitialized variable warning in reader locks
r334261: Guard against error when given -t "*..."
r334262: Eliminate ANSI dimming in developer mode
r334359: Fix "-t test" for post-processing profiles
Bump FreeBSD_version directly in stable/11 for ports IGNORE (as in r334290)
Reviewed by: gjb
Approved by: re (gjb)
Sponsored by: Smule, Inc.
jhb [Tue, 29 May 2018 13:54:34 +0000 (13:54 +0000)]
MFC 333606: Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame. The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns, confusing debuggers. Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.
royger [Tue, 29 May 2018 07:51:24 +0000 (07:51 +0000)]
MFC r334027: xen-blkback: do not use state 3
Linux will not connect to a backend that's in state 3
(XenbusStateInitialised), it needs to be in state 2
(XenbusStateInitWait) for Linux to attempt to connect to the
backend.
marius [Thu, 24 May 2018 23:11:25 +0000 (23:11 +0000)]
MFC: r333955
- Unbreak booting sparc64 kernels after the metadata unification in
r329190 (MFCed to stable/11 in r332150); sparc64 kernels are always
64-bit but with that revision in place, the loader was treating them
as 32-bit ones.
- In order to reduce the likelihood of this kind of breakage in the
future, #ifdef out md_load() on sparc64 and make md_load_dual() -
which is currently local to metadata.c anyway - static.
- Make md_getboothowto() - also local to metadata.c - static.
- Get rid of the unused DTB pointer on sparc64.
ae [Thu, 24 May 2018 11:02:21 +0000 (11:02 +0000)]
MFC r333986:
Remove check for matching the rulenum, ruleid and rule pointer from
dyn_lookup_ipv[46]_state_locked(). These checks are remnants of code that
was not ready to be committed, and they are there by accident.
Due to a race, these checks can lead to the creation of duplicate states
when concurrent threads try to add state at the same time for two
packets of the same flow, but in reverse directions and matched by
different parent rules.
Reported by: lev
MFC r334039:
Restore the ability to keep states after parent rule deletion.
This feature is disabled by default and was removed when the dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences: when the dyn_keep_states sysctl variable is enabled, the
dyn_match_ipv[46]_state() function doesn't match child states of a deleted
rule, and thus they are kept alive until they expire. The
ipfw_dyn_lookup_state() function checks whether the state was orphaned, and
if so, it returns a pointer to default_rule and its position in the rules
map. The main visible difference is that orphaned states still have the same
rule number that they had before the parent rule was deleted, because now a
state has many fields related to the rule and changing them all atomically to
point to default_rule seems hard enough.
Reported by: <lantw44 at gmail.com>
Approved by: re (kib)
ken [Mon, 21 May 2018 18:59:34 +0000 (18:59 +0000)]
MFC r333492:
------------------------------------------------------------------------
r333492 | ken | 2018-05-11 08:50:26 -0600 (Fri, 11 May 2018) | 10 lines
Clear out the entire structure, not just the size of a pointer to it.
sys/dev/ocs/ocs_os.c:
In ocs_thread_create(), use sizeof(*thread) (instead of
sizeof(thread)) as the size argument to memset so that we clear
out the entire thread structure instead of just a few bytes of it.
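The difference between the two sizeof spellings can be shown in isolation. The structure below is a hypothetical stand-in, not the driver's real thread type, and the model_* helpers exist only to expose the two sizes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's thread structure. */
struct model_thread {
	void *handle;
	char  name[64];
	int   flags;
};

/* Size of the pointer itself: what the old code passed to memset(). */
static size_t
model_buggy_size(struct model_thread *thread)
{
	return (sizeof(thread));
}

/* Size of the pointed-to structure: what the fix passes. */
static size_t
model_fixed_size(struct model_thread *thread)
{
	return (sizeof(*thread));
}

/* Clears every byte of the structure. */
static void
model_thread_init(struct model_thread *thread)
{
	memset(thread, 0, sizeof(*thread));
}
```

On a typical LP64 system the pointer-sized spelling covers 8 bytes, so everything past the first member would keep whatever garbage it held.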
dim [Sun, 20 May 2018 16:03:21 +0000 (16:03 +0000)]
MFC r333715:
Pull in r322325 from upstream llvm trunk (by Matthias Braun):
PeepholeOpt cleanup/refactor; NFC
- Less unnecessary use of `auto`
- Add early `using RegSubRegPair(AndIdx) =` to avoid countless
`TargetInstrInfo::` qualifications.
- Use references instead of pointers where possible.
- Remove unused parameters.
- Rewrite the CopyRewriter class hierarchy:
- Pull out uncoalescable copy rewriting functionality into
PeepholeOptimizer class.
- Use an abstract base class to make it clear that rewriters are
independent.
- Remove unnecessary \brief in doxygen comments.
- Remove unused constructor and method from ValueTracker.
- Replace UseAdvancedTracking of ValueTracker with DisableAdvCopyOpt
use.
Even though upstream marked this as "No Functional Change", it does
contain some functional changes, and these fix a compiler hang for one
particular source file in the devel/godot port.
gjb [Fri, 18 May 2018 14:57:58 +0000 (14:57 +0000)]
MFC r315733, r315737, r315740, r330054:
r315733 (imp):
Implement ttys onifexists in init.
Implement a new init(8) option in /etc/ttys. If this option is present
on the entry in /etc/ttys, the entry will be active if and only if it
exists. If the name starts with a '/', it will be considered an
absolute path. If not, it will be a path relative to /dev.
This allows one to turn off video console getty that aren't present
(while running a getty on them even when they aren't the system
console). Likewise with serial ports.
It differs from onifconsole in only requiring the device exist rather
than it be listed as one of the system consoles.
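The path rule described above could look roughly like this. It is a sketch, not the init(8) source: model_tty_path() and model_tty_exists() are hypothetical names, and the only behaviour taken from the message is "leading '/' means absolute, otherwise relative to /dev" plus an existence check with stat(2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Build the path to check: absolute names as-is, others under /dev.
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int
model_tty_path(const char *name, char *buf, size_t buflen)
{
	int n;

	if (name[0] == '/')
		n = snprintf(buf, buflen, "%s", name);
	else
		n = snprintf(buf, buflen, "/dev/%s", name);
	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}

/* True if the device node (or any file at that path) exists. */
static bool
model_tty_exists(const char *name)
{
	char path[1024];
	struct stat sb;

	if (model_tty_path(name, path, sizeof(path)) != 0)
		return (false);
	return (stat(path, &sb) == 0);
}
```

An entry would then be treated as active only when model_tty_exists() returns true for its name.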
r315737 (ngie):
Unbreak world by adding sys/stat.h for stat(2)
r315740 (imp):
Simplify the code a little.
r330054 (trasz):
Improve missing tty handling in init(8). This removes a check that did
nothing - it was checking for ENXIO, which, with devfs, is no longer
returned - and was badly placed anyway, and replaces it with similar
one that works, and is done just before starting getty, instead of being
done when rereading ttys(5).
From the practical point of view, this makes init(8) handle disappearing
terminals (eg /dev/ttyU*) gracefully, without unnecessary getty restarts
and resulting error messages.
Reported by: Bart Ender, Andre Albsmeier
PR: 228315
Blocks: 11.2-BETA2
Approved by: re (marius)
Sponsored by: The FreeBSD Foundation
ae [Fri, 18 May 2018 10:17:13 +0000 (10:17 +0000)]
MFC r333497:
Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsulated packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 to properly handle errors that this packet may cause.
marius [Thu, 17 May 2018 21:22:19 +0000 (21:22 +0000)]
MFC: r333613
The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
manu [Thu, 17 May 2018 17:00:07 +0000 (17:00 +0000)]
MFC r333737:
release: arm: Format FAT partition as FAT16
r332674 raised the size of the FAT partition from 2MB to 41MB for some
boards. But we format them in FAT12 and this size appears to be too big
for FAT12 and some SoC bootrom cannot cope with that.
Format the msdosfs partition as FAT16.
sbruno [Thu, 17 May 2018 16:32:38 +0000 (16:32 +0000)]
MFC r333499
Add deprecation notice for vxge.
This driver was merged to HEAD one week prior to Exar publicly
announcing they had left the Ethernet market. It is not known to be used
and has various code quality issues spotted by Brooks and Hiren. Retire
it in preparation for FreeBSD 12.0.
ae [Thu, 17 May 2018 10:01:47 +0000 (10:01 +0000)]
MFC r333458:
Fix the printing of rule comments.
Change uint8_t type of opcode argument to int in the print_opcode()
function. Use negative value to print the rest of opcodes, because
zero value is O_NOP, and it can't be used for this purpose.
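Why a wider, signed argument is needed can be modelled with a small filter. Nothing below is ipfw source: the enum values other than zero for the no-op opcode and the function model_count_printed() are hypothetical, and the function only counts which entries a call would print.

```c
#include <stdint.h>

/* Hypothetical opcode numbering; only "no-op is zero" comes from the text. */
enum model_opcode {
	MODEL_O_NOP = 0,
	MODEL_O_IP_SRC = 1,
	MODEL_O_IP_DST = 2
};

/*
 * Count how many stored opcodes a call would print.
 * opcode >= 0: only entries equal to opcode.
 * opcode <  0: every entry ("print the rest").
 * Taking an int keeps the negative sentinel distinct from all 256
 * values a uint8_t opcode can hold, including zero.
 */
static int
model_count_printed(const uint8_t *ops, int nops, int opcode)
{
	int printed = 0;

	for (int i = 0; i < nops; i++)
		if (opcode < 0 || ops[i] == opcode)
			printed++;
	return (printed);
}
```

Had the parameter stayed uint8_t, passing -1 would arrive as 255 and be treated as an ordinary opcode value rather than as "everything".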
jhb [Wed, 16 May 2018 21:04:19 +0000 (21:04 +0000)]
MFC 332891,332892: Fixes for atomic_*cmpset() on arm.
332891:
Fix some harmless type mismatches in the ARM atomic_cmpset implementations.
The return value of atomic_cmpset() and atomic_fcmpset() is an int (which
is really a bool) that has the values 0 or 1. Some of the inlines were
using the type being operated on (e.g. uint32_t) as either the return type
of the function, or the type of a local 'ret' variable used to hold the
return value. Fix all of these to just use plain 'int'. Due to C promotion
rules and the fact that the value can only be 0 or 1, these should all be
harmless.
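The return-value convention can be modelled in userland with C11 atomics. These are not the ARM inlines: model_cmpset_32() and model_fcmpset_32() are hypothetical wrappers, and the sketch assumes the usual fcmpset behaviour of writing the observed value back through the pointer on failure.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Returns a plain int that is 0 or 1, whatever the operand width. */
static int
model_cmpset_32(_Atomic uint32_t *dst, uint32_t expect, uint32_t newval)
{
	return (atomic_compare_exchange_strong(dst, &expect, newval) ? 1 : 0);
}

/* Same result type; on failure *expect is updated to the value seen. */
static int
model_fcmpset_32(_Atomic uint32_t *dst, uint32_t *expect, uint32_t newval)
{
	return (atomic_compare_exchange_strong(dst, expect, newval) ? 1 : 0);
}
```

Because the result is only ever 0 or 1, declaring it as the operand type (uint32_t, for instance) changes nothing observable, which matches the "harmless" assessment in the message.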
332892:
Implement 32-bit atomic_fcmpset() in userland for armv4/v5.
- Add an implementation of atomic_fcmpset_32() using RAS for armv4/v5.
This fixes recent world breakage due to use of atomic_fcmpset() in
userland.
- While here, be more careful to not expose wrapper macros for 64-bit
atomic_*cmpset to userland for armv4/v5 as only 32-bit cmpset is
implemented.
This has been reviewed, but not runtime-tested, but should fix the arm.arm
and arm.armeb worlds that have been broken for a while.
r331340:
cxgbe(4): Tunnel congestion drops on a port should be cleared when the
stats for that port are cleared.
r331342:
cxgbe(4): Do not read MFG diags information from custom boards.
r331472:
cxgbe(4): Always initialize requested_speed to a valid value.
This fixes an avoidable EINVAL when the user tries to disable AN after
the port is initialized but l1cfg doesn't have a valid speed to use.
r332050:
cxgbe(4): Always display an error message if SIOCSIFFLAGS will leave
IFF_UP and IFF_DRV_RUNNING out of sync. ifhwioctl in the kernel pays no
attention to the return code from the driver ioctl during SIOCSIFFLAGS
so these messages are the only indication that the ioctl was called but
failed.
r333276:
cxgbe(4): Update all firmwares to 1.19.1.0.
r333448:
cxgbe(4): Disable write-combined doorbells by default.
This had been the default behavior but was changed accidentally as part
of the recent iw_cxgbe+OFED overhaul. Fix another bug in that change
while here: the global knob affects all the adapters in the system and
should be left alone by per-adapter code.
ae [Tue, 15 May 2018 11:43:05 +0000 (11:43 +0000)]
MFC r333244:
Immediately propagate EACCES error code to application from tcp_output.
In r309610 and r315514 the behavior of handling EACCES was changed, and
tcp_output() now returns zero when EACCES happens. The reason for this
change was a concern that applications that use TCP-MD5 would be
affected by changes in project/ipsec.
TCP-MD5 code returns EACCES when the security association for a given
connection is not configured. But pfil(9) can return the same error code,
and this change has affected connections blocked by pfil(9). E.g. an
application doesn't return immediately when a SYN segment is blocked;
instead it waits until several retries have failed.
Actually, for a TCP-MD5 application it doesn't matter whether it gets EACCES
after the first SYN or after several tries. Security associations must be
configured before initiating a TCP connection.
I left the EACCES in the switch() to show that it has special handling.
Reported by: Andreas Longwitz <longwitz at incore dot de>
Approved by: re (marius)
hselasky [Tue, 15 May 2018 09:40:52 +0000 (09:40 +0000)]
MFC r333362:
Fix for missing network interface address event when adding the default IPv6
based link-local address.
The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.
Approved by: re (marius)
Reviewed by: network (ae), kib
Differential Revision: https://reviews.freebsd.org/D13407
Sponsored by: Mellanox Technologies
gonzo [Tue, 15 May 2018 02:26:50 +0000 (02:26 +0000)]
MFC r331906:
Approved by: re (gjb)
Fix accidental USB port resets by GPIO on Zynq/Zedboard boards
The Zynq/Zedboard GPIO driver attempts to tri-state all GPIO pins on
boot up but the order in which I reset the hardware can cause the pins
to be briefly held low before being tri-stated. This is a problem on
boards that use GPIO pins to reset devices.
In particular, the Zybo and ZC-706 boards use a GPIO pin as a USB PHY
reset. If U-boot enables the USB port before booting the kernel, the
GPIO driver attach causes a glitch on the USB PHY reset and the USB
port loses power. My fix is to have the GPIO driver leave the pins in
whatever configuration U-boot placed them.
PR: 225713
Submitted by: Thomas Skibo <thoma555-bsd@yahoo.com>
r329188: Use tabs in io.d, fix alignment issues, remove extra newlines
r329334: Add errno definitions to /usr/lib/dtrace/errno.d
r329353: Add inline to errno.d for translating int to string
r329914: Updates and enhancements to io.d to aid DTrace scripting
r329995: Updates and enhancements to signal.d to aid DTrace scripting
r329996: Consistent casing for fallback SIGCHLD (s/Unknown/unknown/)
r330559: Introduce dwatch(1) as a tool for making DTrace more useful
r330560: Bump dwatch(1) internal version from 1.0-beta-91 to 1.0
r330672: Fix display of wrong pid from dtrace_sched(4)
r332865: Add `-dev' option to aid debugging of profiles
r332866: Add profile for send(2)/recv(2) syscalls
r332867: Remove the line used to demonstrate `-dev' option
r333513: Bugfix, usage displayed with `-1Q'
r333514: Separate default values so `-[BK] num' don't affect usage
r333515: Simplify info message test
r333516: Export ARGV to profiles loaded via load_profile()
r333517: Allow `-E code' to override profile EVENT_DETAILS
r333518: Expose process for ip/tcp/udp
r333519: Refactor sendrecv profile
gjb [Mon, 14 May 2018 17:43:43 +0000 (17:43 +0000)]
MFC r333473:
Add a special GCE_LICENSE variable to Makefile.gce, which when set,
will include license metadata in the resultant GCE image.
GCE_LICENSE is unset by default, as it primarily pertains to images
produced by the FreeBSD Project, but for downstream FreeBSD consumers,
it can be set in the make(1) environment in the format of:
The "license" is not a license, per se, but metadata that
is required by the GCE marketplace. For the FreeBSD Project, the
license name is simply 'freebsd', with the description of 'FreeBSD'.
Approved by: re (marius)
Sponsored by: The FreeBSD Foundation
trasz [Mon, 14 May 2018 15:35:54 +0000 (15:35 +0000)]
MFC r333493:
Set kldxref_enable="YES" for ARM images. Without it, the images are missing
the /boot/kernel/linker.hints file, which breaks loading some of the modules
with dependencies, eg cfiscsi.ko.
This is a minimal fix for ARM images, in order to safely MFC it before
11.2-RELEASE. Afterwards, however, I believe we should actually just change
the default (as in, etc/defaults/rc.conf). The reason is that it's required
for every image that's being cross-built, as kldxref(1) cannot handle files
for non-native architectures. For the one that is not - amd64 - having it
on by default doesn't change anything - the script is a no-op if linker.hints
already exists.
The long-term solution would be to rewrite kldxref(1) to handle other
architectures, and generate linker.hints at build time.
Approved by: re (marius@)
Sponsored by: DARPA, AFRL
jtl [Sat, 12 May 2018 01:55:24 +0000 (01:55 +0000)]
r285910 attempted to make shutdown() be POSIX compliant by returning
ENOTCONN when shutdown() is called on unconnected sockets. This change was
slightly modified by r316874, which returns ENOTCONN in the case of an
unconnected datagram socket, but still runs the shutdown code for the
socket. This specifically supports the case where the user-space code is
using the shutdown() call to wake up another thread blocked on the socket.
In PR 227259, a user is reporting that they have code which is using
shutdown() to wake up another thread blocked on a stream listen socket. This
code is failing, while it used to work on FreeBSD 10 and still works on
Linux.
It seems reasonable to add another exception to support something users are
actually doing, which used to work on FreeBSD 10, and still works on Linux.
And, it seems like it should be acceptable to POSIX, as we still return
ENOTCONN.
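From the application side, code relying on this pattern has to tolerate the ENOTCONN result. A minimal sketch follows; the helper names model_shutdown_ok() and model_wake_blocked_thread() are hypothetical, and treating ENOTCONN as acceptable is this sketch's reading of the behaviour described above, not a statement about any particular program.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* ENOTCONN still counts as "shutdown was processed" for wakeup purposes. */
static bool
model_shutdown_ok(int rc, int err)
{
	return (rc == 0 || (rc == -1 && err == ENOTCONN));
}

/* Ask the kernel to shut down fd so a thread blocked on it wakes up. */
static bool
model_wake_blocked_thread(int fd)
{
	int rc = shutdown(fd, SHUT_RDWR);

	return (model_shutdown_ok(rc, rc == -1 ? errno : 0));
}
```

Any other error, such as EBADF for an invalid descriptor, is still reported as a failure.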
This is a direct commit to stable/11. The listen socket code changed
substantially in head, and the code change there will be substantially
more complex. In the meantime, it seems to make sense to commit this
trivial fix to stable/11 given the fact that users appear to depend on
this behavior, this appears to have been an unintended change in stable/11,
and we did not announce the change.
PR: 227259
Reviewed by: ed
Approved by: re (gjb)
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D15021
Tested by: Eric Masson (emss at free.fr)
gjb [Fri, 11 May 2018 21:46:53 +0000 (21:46 +0000)]
Create a sun7i-a20-bananapi.dtb hard link to bananapi.dtb to fix
a boot failure on the Banana Pi SoC.
This is a direct commit to stable/11, as the sun7i-a20-bananapi.dtb
file exists in head, but appears to have been part of a larger
rework of dtb-related files that may have larger consequences than
hard link creation. Note: creating a hard link to dtb files was
an original fix in 12-CURRENT beforehand, introduced in r319603.
Approved by: re (marius)
Sponsored by: The FreeBSD Foundation
gjb [Thu, 10 May 2018 23:58:33 +0000 (23:58 +0000)]
Rename stable/11 from PRERELEASE to BETA1 as part of the 11.2-RELEASE
cycle.
Update the default pkg(8) repository to the 'quarterly' branch to
prevent further 11.2 builds from downgrading packages when invoking
'pkg upgrade' for the duration of the cycle.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
sbruno [Wed, 9 May 2018 16:14:12 +0000 (16:14 +0000)]
MFC r333019 r333046 r333085 r333086 r333132
smartpqi(4):
- Microsemi SCSI driver for PQI controllers.
- Found on newer model HP servers.
- Restrict to AMD64 only as per developer request.
The driver provides support for the new generation of PQI controllers
from Microsemi. This driver is the first SCSI driver to implement the
PQI queuing model and it will replace the aacraid driver for Adaptec
Series 9 controllers. HARDWARE Controllers supported by the driver include:
HPE Gen10 Smart Array Controller Family
OEM Controllers based on the Microsemi Chipset.
emaste [Wed, 9 May 2018 14:50:32 +0000 (14:50 +0000)]
MFC r332966: Add deprecation notice for lmc(4)
We intend to remove support before FreeBSD 12 is branched. These are
available only as 32-bit PCI devices. The driver has an ambiguous
license and I have not been successful in contacting the driver's author
in order to address this.
The planned deprecation has been announced on -current and -stable; if
we receive feedback that the driver is still useful and we are able to
resolve the license issue this deprecation notice can be reverted.
Relnotes: Yes
Approved by: re
Sponsored by: The FreeBSD Foundation
emaste [Wed, 9 May 2018 14:38:07 +0000 (14:38 +0000)]
MFC r332446: switch i386 memstick installer images to MBR
Some BIOSes have trouble booting from GPT in non-UEFI mode. This is
commonly reported with Lenovo laptops, including my x220. As we do not
currently support booting FreeBSD/i386 via UEFI there's no reason to
prefer GPT.
The "vestigial swap partition" was added in r265017 to work around an
issue with loader's GPT support, so we should not need it when using
MBR.
We may want to make the same change to amd64, although the issue there is
mitigated by such systems booting via UEFI in the common case.
PR: 227422
Approved by: re
Relnotes: Yes
Sponsored by: The FreeBSD Foundation