marius [Thu, 7 Jun 2018 15:03:47 +0000 (15:03 +0000)]
MFC: r334443 (by cem@)
dhclient(8): allow to supersede interface-mtu option
In some cases broken DHCP servers might send an invalid MTU value, so allow
the use of 'supersede' in dhclient.conf to override it. When the superseded
value is 0, the MTU value is not updated at all.
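The rule described above can be sketched in C as follows. This is only an
illustration of the decision, not code from dhclient(8): the helper name
demo_effective_mtu() and its parameters are hypothetical, and the
dhclient.conf side is assumed to be a statement along the lines of
"supersede interface-mtu 0;".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from dhclient: pick the MTU to apply.
 *   current    - MTU the interface has right now
 *   offered    - interface-mtu value sent by the DHCP server
 *   superseded - true if dhclient.conf supersedes interface-mtu
 *   forced     - the superseding value (0 means "leave the MTU alone")
 */
unsigned
demo_effective_mtu(unsigned current, unsigned offered, bool superseded,
    unsigned forced)
{
	assert(current > 0);
	if (!superseded)
		return (offered);	/* trust the server's value */
	/* A superseding value of 0 keeps the current MTU untouched. */
	return (forced == 0 ? current : forced);
}
```

For example, demo_effective_mtu(1500, 576, true, 0) keeps the interface at
1500 even though the server offered 576.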
dim [Thu, 7 Jun 2018 09:03:42 +0000 (09:03 +0000)]
MFC r334445:
Resolve conflicts between macros in fenv.h and ieeefp.h
This is a follow-up to r321483, which disabled -Wmacro-redefined for
some lib/msun tests.
If an application included both fenv.h and ieeefp.h, several macros such
as __fldcw(), __fldenv() were defined in both headers, with slightly
different arguments, leading to conflicts.
Fix this by putting all the common macros in the machine-specific
versions of ieeefp.h. Where needed, update the arguments in places
where the macros are invoked.
This also slightly reduces the differences between the amd64 and i386
versions of ieeefp.h.
hselasky [Thu, 7 Jun 2018 07:44:54 +0000 (07:44 +0000)]
MFC r334423:
Implement idr_is_empty() in the LinuxKPI and make idr_remove() API compatible
with upstream Linux by returning the pointer to the removed element.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies
pfg [Wed, 6 Jun 2018 22:29:21 +0000 (22:29 +0000)]
MFC r333311:
msdosfs: use vfs_timestamp() to generate timestamps instead of getnanotime().
Most filesystems, with the notable exceptions of msdosfs and autofs, use
only vfs_timestamp() to read the current time. This has the benefit of
configurable granularity (using the vfs.timestamp_precision sysctl).
rmacklem [Wed, 6 Jun 2018 22:02:20 +0000 (22:02 +0000)]
MFC: r333580
Fix a slow leak of session structures in the NFSv4.1 server.
For a fairly rare case of a client doing an ExchangeID after a hard reboot,
the old confirmed clientid still exists, but some clients use a new
co_verifier. For this case, the server was not freeing up the sessions on
the old confirmed clientid.
This patch fixes this case. It also adds two LIST_INIT() macros, which are
actually no-ops, since the structure is malloc()d with M_ZERO so the pointer
is already set to NULL.
It should have minimal impact, since the only way I could exercise this
code path was by doing a hard power cycle (pulling the plug) on a machine
running Linux with an NFSv4.1 mount on the server.
Originally spotted during testing of the ESXi 6.5 client.
tuexen [Wed, 6 Jun 2018 20:03:35 +0000 (20:03 +0000)]
MFC r334532:
Don't overflow a buffer if we receive an INIT or INIT-ACK chunk
without a RANDOM parameter but with a CHUNKS or HMAC-ALGO parameter.
Please note that sending this combination violates the specification.
Thanks to Ronald E. Crane for reporting the issue for the userland
stack.
rmacklem [Wed, 6 Jun 2018 01:21:33 +0000 (01:21 +0000)]
MFC: r334396
Strengthen locking for the NFSv4.1 server DestroySession operation.
If a client did a DestroySession on a session while it was still in use,
the server might try to use the session structure after it is free'd.
I think the client has violated RFC5661 if it does this, but this patch
makes DestroySession block all other nfsd threads so no thread could
be using the session when it is free'd. After the DestroySession, nfsd
threads will not be able to find the session. The patch also adds a check
for nd_sessionid being set, although if that was not the case it would have
been all 0s and unlikely to have a false match.
This might fix the crashes described in PR#228497 for the FreeNAS server.
rmacklem [Mon, 4 Jun 2018 20:47:37 +0000 (20:47 +0000)]
MFC: r334252
Fix the sleep event for layout recall.
The sleep for I/O completion during an NFSv4.1 pNFS layout recall used
the wrong event value and could leave the "[nfscl]" thread hung
for the mount.
This patch changes the event to the correct value.
This bug will only affect NFSv4.1 pnfs mounts and only when the server
does a layout recall callback, so it won't affect many. Without the patch,
a mount without the "pnfs" option will avoid the problem.
Found during testing of the pNFS server.
rmacklem [Mon, 4 Jun 2018 20:21:51 +0000 (20:21 +0000)]
MFC: r333592
Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.
In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only effect of not having this patch is that Linux clients don't
reclaim opens and locks after a server reboot, which means they lose
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.
delphij [Mon, 4 Jun 2018 05:38:22 +0000 (05:38 +0000)]
MFC r333098:
Don't bail out from the check if readboot() returns !FSFATAL.
This can happen when the fsinfo signature is invalid, and the
user has chosen to fix it, in which case the code would return
FSBOOTMOD (not FSOK but not FSFATAL either).
All other (fatal) cases would return FSFATAL.
Obtained from: Android Open Source Project
Obtained from: https://android.googlesource.com/platform/external/fsck_msdos/+/d8775a29ea7eac2e5f1504dd21da3725b93b3036
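The difference between the two checks can be sketched as below. This is a
minimal illustration, assuming the earlier code treated every result other
than FSOK as a reason to stop; the DEMO_ names and their numeric values are
placeholders for this sketch and are not copied from fsck_msdosfs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative status codes; the values are assumptions for this sketch. */
enum {
	DEMO_FSOK = 0,		/* nothing wrong */
	DEMO_FSBOOTMOD = 1,	/* boot block was modified (repaired) */
	DEMO_FSFATAL = 16	/* unrecoverable error */
};

/* Assumed old behaviour: anything other than FSOK ends the check. */
bool
demo_bail_old(int status)
{
	assert(status >= 0);
	return (status != DEMO_FSOK);
}

/* Behaviour described in the commit: only a fatal result ends the check. */
bool
demo_bail_new(int status)
{
	assert(status >= 0);
	return (status == DEMO_FSFATAL);
}
```

With the newer predicate, a repaired fsinfo signature (DEMO_FSBOOTMOD here)
lets the remaining checks run instead of aborting.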
dim [Sun, 3 Jun 2018 17:17:45 +0000 (17:17 +0000)]
MFC r334432:
Fix build of stand with base gcc
* Make autoboot() a static function in stand/common/boot.c, so it does
not shadow local variables in gptboot.c and zfsboot.c.
* Remove -Winline from the Makefiles for gptboot, gptzfsboot and
zfsboot, as gcc will always fail to inline some functions, and there
is nothing we can do about it.
* For gcc <= 4.2.1, silence -Wuninitialized for isoboot, as it produces
a false positive warning.
* Remove deprecated and unnecessary -mcpu=i386 flag from stand/defs.mk,
as there is already a -march=i386 flag further in the file.
MFC r333650, r333652, r333682, r334406, r334409-r334410, and r334489.
r333650:
cxgbe(4): Claim some more T5 and T6 boards.
r333652:
cxgbe(4): Add support for two more flash parts.
r333682:
cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.
Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"
r334406:
cxgbe(4): Consider all supported speeds when building the ifmedia list
for a port. Fix other related issues while here:
- Require port lock for access to link_config.
- Allow 100Mbps operation by tracking the speed in Mbps. Yes, really.
- New port flag to indicate that the media list is immutable. It will
be used in future refinements.
This also fixes a bug where the driver reports incorrect media with
recent firmwares.
r334409:
cxgbe(4): Implement ifm_change callback.
r334410:
cxgbe(4): Use ifm for ifmedia just like the rest of the kernel.
No functional change.
r334489:
cxgbe(4): Include full duplex mediaopt in media that can be reported as
active. Always report full duplex in active media.
gjb [Thu, 31 May 2018 23:55:59 +0000 (23:55 +0000)]
MFC r334068 (phil):
Import libxo-0.9.0:
- Add xo_format_is_numeric() with improved logic to decide if format
strings are numeric, so json output quotes them
- Convert docs to sphinx/rst
- update tests
PR: 221676
Approved by: re (marius)
Sponsored by: The FreeBSD Foundation
marius [Thu, 31 May 2018 23:48:27 +0000 (23:48 +0000)]
Akin to r302691 in head, synchronize the build stripping for the disc1
image with that of the bootonly image (but similarly modulo games
and groff(1)) as the amd64 disc1 image is overflowing. This also
removes the redundant MK_LLDB.
This is a direct commit to stable/11 rather than a MFC of r302691 as
the disc1 image stripping previously has been directly modified
in stable/11 by r303027.
gjb [Thu, 31 May 2018 20:01:58 +0000 (20:01 +0000)]
MFC r334310, r334337:
r334310 (imp):
Teach ufs_module.c about bsd labels and probe 'a' partition.
If the check for a UFS partition at offset 0 on the disk fails, check
to see if there's a BSD disklabel at block 1 (standard) or at offset
512 (install images assume 512 sector size). If found, probe for UFS
on the 'a' partition.
This fixes UEFI booting images from a BSD labeled MBR slice when the
'a' partition isn't at offset 0. This is a stop-gap fix since we plan
on removing boot1.efi in FreeBSD 12. We can't easily do that for 11.2,
however, hence the short MFC window.
r334337 (emaste):
switch amd64 memstick installer images to MBR
A good number of BIOSes have trouble booting from GPT in non-UEFI
mode.
With this change amd64 memsticks remain dual-mode (booting from either
UEFI or CSM); the partitioning type is just switched from GPT to MBR.
PR: 227954
Note, there are two changes specific to stable/11 where there is code
that had diverged from head and never merged back. The two changes are
an include in stand/efi/boot1/ufs_module.c, replacing sys/disk/bsd.h
with sys/disklabel.h and replacing BSD_MAGIC with DISKMAGIC in the
same file. The latter two are direct commits to stable/11 in order to
avoid unexpected regressions at this point of the 11.2 cycle. Thank
you to imp@ for pointing out what changes needed to be made.
tuexen [Thu, 31 May 2018 16:48:08 +0000 (16:48 +0000)]
MFC r333176:
Fix in the documentation that the default hop limit is not 30, but
the value of the sysctl variable net.inet6.ip6.hlim.
This is true since
https://svnweb.freebsd.org/base?view=revision&revision=122574
The default of 30 (which was correct up to r122574) was incorrectly
documented in
https://svnweb.freebsd.org/base?view=revision&revision=130268
Thanks to Timo Voelker for making me aware of the inconsistency
between the code and the documentation.
tuexen [Thu, 31 May 2018 16:14:45 +0000 (16:14 +0000)]
MFC r333382:
When reporting ERROR or ABORT chunks, don't use more data
than is guaranteed to be contiguous.
Thanks to Felix Weinrank for finding and reporting this bug
by fuzzing the usrsctp stack.
MFC r333386:
Fix two typos reported by N. J. Mann, which were introduced in
https://svnweb.freebsd.org/changeset/base/333382 by me.
tuexen [Thu, 31 May 2018 16:00:03 +0000 (16:00 +0000)]
MFC r333186:
Send an ICMPv6 PacketTooBig message in case of forwarding a packet which
is too big for the outgoing interface and no firewall is involved.
This problem was introduced in
https://svnweb.freebsd.org/changeset/base/324996
Thanks to Irene Ruengeler for finding the bug and testing the fix.
rwlock: diff-reduction of runlock compared to sx sunlock
==
Undo LOCK_PROFILING pessimisation after r313454 and r313455
With the option used to compile the kernel both sx and rw shared ops would
always go to the slow path which added avoidable overhead even when the
facility is disabled.
Furthermore the increased time spent doing uncontested shared lock acquire
would be bogusly added to total wait time, somewhat skewing the results.
Restore old behaviour of going there only when profiling is enabled.
This change is a no-op for kernels without LOCK_PROFILING (which is the
default).
==
sx: fix adaptive spinning broken in r327397
The condition was flipped.
In particular heavy multithreaded kernel builds on zfs started suffering
due to nested sx locks.
For instance make -s -j 128 buildkernel:
before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total
==
locks: fix a corner case in r327399
If there were exactly rowner_retries/asx_retries (by default: 10) transitions
between read and write state and the waiters still did not get the lock, the
next owner -> reader transition would result in the code correctly falling
back to turnstile/sleepq where it would incorrectly think it was waiting
for a writer and decide to leave turnstile/sleepq to loop back. From this
point it would take ts/sq trips until the lock gets released.
The bug sometimes manifested itself in stalls during -j 128 package builds.
Refactor the code to fix the bug, while here remove some of the gratuitous
differences between rw and sx locks.
==
sx: don't do an atomic op in upgrade if it cannot succeed
The code already pays the cost of reading the lock to obtain the waiters
flag. Checking whether there is more than one reader is not a problem and
avoids dirtying the line.
This also fixes a small corner case: if waiters were to show up between
reading the flag and upgrading the lock, the operation would fail even
though it should not. No correctness change here though.
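The idea can be sketched with C11 atomics as follows. This is not the sx(9)
implementation: the lock word layout (DEMO_WAITERS, DEMO_WRITER, reader count
in the upper bits) and the function name are assumptions made up for the
example. It shows the two points from the text: bail out without any atomic
read-modify-write when more than one reader is present, and retry rather than
fail when only the waiters flag changed underneath us.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock word layout, chosen only for this sketch. */
#define DEMO_WAITERS		((uintptr_t)0x1)
#define DEMO_WRITER		((uintptr_t)0x2)
#define DEMO_READER_SHIFT	4
#define DEMO_ONE_READER		((uintptr_t)1 << DEMO_READER_SHIFT)
#define DEMO_READERS(x)		((x) >> DEMO_READER_SHIFT)

/* Try to turn the caller's shared hold into an exclusive one. */
bool
demo_try_upgrade(_Atomic uintptr_t *lock)
{
	uintptr_t x;

	assert(lock != NULL);
	x = atomic_load_explicit(lock, memory_order_relaxed);
	for (;;) {
		/* Not the sole reader: give up without dirtying the line. */
		if (DEMO_READERS(x) != 1)
			return (false);
		/* Keep the waiters flag exactly as it was last observed. */
		if (atomic_compare_exchange_weak_explicit(lock, &x,
		    DEMO_WRITER | (x & DEMO_WAITERS),
		    memory_order_acquire, memory_order_relaxed))
			return (true);
		/* x now holds the fresh value; re-evaluate and retry. */
	}
}
```

A failed compare-and-swap reloads the lock word, so a waiter arriving in the
window only costs one more loop iteration.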
==
mtx: tidy up recursion handling in thread lock
Normally after grabbing the lock it has to be verified we got the right one
to begin with. However, if we are recursing, it must not change thus the
check can be avoided. In particular this avoids a lock read for non-recursing
case which found out the lock was changed.
While here avoid an irq trip if this happens.
==
locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.
Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
So note this is still a significant hit.
LOCK_PROFILING results are not affected.
==
rw: whack avoidable re-reads in try_upgrade
==
locks: extend speculative spin waiting for readers to drain
Now that 10 years have passed since the original limit of 10000 was
committed, bump it a little bit.
Spinning waiting for writers is semi-informed in the sense that we always
know if the owner is running and base the decision to spin on that.
However, no such information is provided for read-locking. In particular
this means that it is possible for a write-spinner to completely waste cpu
time waiting for the lock to be released, while the reader holding it was
preempted and is now waiting for the spinner to go off cpu.
Nonetheless, in the majority of cases it is an improvement to spin instead of
instantly giving up and going to sleep.
The current approach is pretty simple: snatch the number of current readers
and perform that many pauses before checking again. The total number of
pauses to execute is limited to 10k. If the lock is still not free by
that time, go to sleep.
Given the previously noted problem of not knowing whether spinning makes
any sense to begin with, the new limit has to remain rather conservative.
But at the very least it should also be related to the machine. Waiting
for writers uses parameters selected based on the number of activated
hardware threads. The upper limit of pause instructions to be executed
in-between re-reads of the lock is typically 16384 or 32768. It was
selected as the limit of total spins. The lower bound is set to
already present 10000 as to not change it for smaller machines.
Bumping the limit reduces system time by a few % during benchmarks like
buildworld, buildkernel and others. Tested on 2 and 4 socket machines
(Broadwell, Skylake).
Figuring out how to make a more informed decision while not pessimizing
the fast path is left as an exercise for the reader.
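The spinning scheme described in this commit can be sketched in userland C as
follows. It is an illustration under stated assumptions, not the kernel code:
the reader count comes from a hypothetical callback, demo_cpu_pause() stands
in for the real pause instruction, and demo_spin_limit() only models the
"at least 10000, otherwise the writer-spin maximum" rule from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SPIN_FLOOR	10000u	/* the pre-existing limit, kept as a floor */

/* Total pause budget: the writer-spin maximum, but never below the floor. */
unsigned
demo_spin_limit(unsigned writer_pause_max)
{
	return (writer_pause_max > DEMO_SPIN_FLOOR ?
	    writer_pause_max : DEMO_SPIN_FLOOR);
}

/* Stand-in for a CPU pause; only a compiler barrier in this sketch. */
static void
demo_cpu_pause(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Read the reader count, pause that many times, check again.  Returns true
 * if the readers drained within 'limit' pauses, false if the caller should
 * go to sleep.  '*spent' (optional) receives the number of pauses executed.
 */
bool
demo_spin_for_readers(unsigned (*readers)(void *), void *ctx, unsigned limit,
    unsigned *spent)
{
	unsigned i, n, total = 0;
	bool drained = false;

	assert(readers != NULL);
	for (;;) {
		n = readers(ctx);
		if (n == 0) {
			drained = true;
			break;
		}
		if (total >= limit)
			break;
		if (n > limit - total)	/* never exceed the budget */
			n = limit - total;
		for (i = 0; i < n; i++)
			demo_cpu_pause();
		total += n;
	}
	if (spent != NULL)
		*spent = total;
	return (drained);
}

/* Example reader source: reports the count, then lets one reader leave. */
unsigned
demo_draining_readers(void *ctx)
{
	unsigned *count = ctx;
	unsigned now = *count;

	if (*count > 0)
		(*count)--;
	return (now);
}
```

Note that the sketch has the same blind spot the commit message points out:
nothing tells the spinner whether the readers are actually running.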
==
fix uninitialized variable warning in reader locks
r334261: Guard against error when given -t "*..."
r334262: Eliminate ANSI dimming in developer mode
r334359: Fix "-t test" for post-processing profiles
Bump FreeBSD_version directly in stable/11 for ports IGNORE (as in r334290)
Reviewed by: gjb
Approved by: re (gjb)
Sponsored by: Smule, Inc.
jhb [Tue, 29 May 2018 13:54:34 +0000 (13:54 +0000)]
MFC 333606: Make the common interrupt entry point labels local labels.
Kernel debuggers depend on symbol names to find stack frames with a
trapframe rather than a normal stack frame. The labels used for the
shared interrupt entry point for the PTI and non-PTI cases did not
match the existing patterns confusing debuggers. Add the '.L' prefix
to mark these symbols as local so they are not visible in the symbol
table.
royger [Tue, 29 May 2018 07:51:24 +0000 (07:51 +0000)]
MFC r334027: xen-blkback: do not use state 3
Linux will not connect to a backend that's in state 3
(XenbusStateInitialised), it needs to be in state 2
(XenbusStateInitWait) for Linux to attempt to connect to the
backend.
marius [Thu, 24 May 2018 23:11:25 +0000 (23:11 +0000)]
MFC: r333955
- Unbreak booting sparc64 kernels after the metadata unification in
r329190 (MFCed to stable/11 in r332150); sparc64 kernels are always
64-bit but with that revision in place, the loader was treating them
as 32-bit ones.
- In order to reduce the likelihood of this kind of breakage in the
future, #ifdef out md_load() on sparc64 and make md_load_dual() -
which is currently local to metadata.c anyway - static.
- Make md_getboothowto() - also local to metadata.c - static.
- Get rid of the unused DTB pointer on sparc64.
ae [Thu, 24 May 2018 11:02:21 +0000 (11:02 +0000)]
MFC r333986:
Remove check for matching the rulenum, ruleid and rule pointer from
dyn_lookup_ipv[46]_state_locked(). These checks are remnants of not
ready to be committed code, and they are there by accident.
Because of a race, these checks can lead to the creation of duplicate states
when concurrent threads try to add state at the same time for two
packets of the same flow, but in reverse directions and matched by
different parent rules.
Reported by: lev
MFC r334039:
Restore the ability to keep states after parent rule deletion.
This feature is disabled by default and was removed when dynamic states
implementation changed to be lockless. Now it is reimplemented with small
differences - when dyn_keep_states sysctl variable is enabled,
dyn_match_ipv[46]_state() function doesn't match child states of deleted
rule, and thus they are kept alive until they expire. ipfw_dyn_lookup_state()
function does check that state was not orphaned, and if so, it returns
pointer to default_rule and its position in the rules map. The main visible
difference is that orphaned states still have the same rule number that
they had before the parent rule was deleted, because a state now has many fields
related to rule and changing them all atomically to point to default_rule
seems hard enough.
Reported by: <lantw44 at gmail.com>
Approved by: re (kib)
ken [Mon, 21 May 2018 18:59:34 +0000 (18:59 +0000)]
MFC r333492:
------------------------------------------------------------------------
r333492 | ken | 2018-05-11 08:50:26 -0600 (Fri, 11 May 2018) | 10 lines
Clear out the entire structure, not just the size of a pointer to it.
sys/dev/ocs/ocs_os.c:
In ocs_thread_create(), use sizeof(*thread) (instead of
sizeof(thread)) as the size argument to memset so that we clear
out the entire thread structure instead of just a few bytes of it.
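The class of bug fixed here is easy to reproduce outside the driver. The
sketch below uses a made-up struct demo_thread (not the ocs driver's type)
to show the corrected idiom: sizeof(*thread) is the size of the structure,
while sizeof(thread) would only be the size of the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure standing in for the driver's thread type. */
struct demo_thread {
	char	name[32];
	void	*arg;
	int	state;
};

/* Clear the whole structure the pointer refers to. */
void
demo_thread_clear(struct demo_thread *thread)
{
	assert(thread != NULL);
	/* sizeof(*thread): the object, not the pointer to it. */
	memset(thread, 0, sizeof(*thread));
}

/* Helper for checking the result: true if every byte is zero. */
bool
demo_all_zero(const void *p, size_t len)
{
	const unsigned char *b = p;
	size_t i;

	for (i = 0; i < len; i++)
		if (b[i] != 0)
			return (false);
	return (true);
}
```

On common 64-bit platforms sizeof(thread) would be 8, so the buggy form
would have cleared only the first few bytes of the structure.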
dim [Sun, 20 May 2018 16:03:21 +0000 (16:03 +0000)]
MFC r333715:
Pull in r322325 from upstream llvm trunk (by Matthias Braun):
PeepholeOpt cleanup/refactor; NFC
- Less unnecessary use of `auto`
- Add early `using RegSubRegPair(AndIdx) =` to avoid countless
`TargetInstrInfo::` qualifications.
- Use references instead of pointers where possible.
- Remove unused parameters.
- Rewrite the CopyRewriter class hierarchy:
- Pull out uncoalescable copy rewriting functionality into
PeepholeOptimizer class.
- Use an abstract base class to make it clear that rewriters are
independent.
- Remove unnecessary \brief in doxygen comments.
- Remove unused constructor and method from ValueTracker.
- Replace UseAdvancedTracking of ValueTracker with DisableAdvCopyOpt
use.
Even though upstream marked this as "No Functional Change", it does
contain some functional changes, and these fix a compiler hang for one
particular source file in the devel/godot port.
gjb [Fri, 18 May 2018 14:57:58 +0000 (14:57 +0000)]
MFC r315733, r315737, r315740, r330054:
r315733 (imp):
Implement ttys onifexists in init.
Implement a new init(8) option in /etc/ttys. If this option is present
on the entry in /etc/ttys, the entry will be active if and only if it
exists. If the name starts with a '/', it will be considered an
absolute path. If not, it will be a path relative to /dev.
This allows one to turn off video console getty that aren't present
(while running a getty on them even when they aren't the system
console). Likewise with serial ports.
It differs from onifconsole in only requiring the device exist rather
than it be listed as one of the system consoles.
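The name resolution rule above (absolute if the name starts with '/',
otherwise relative to /dev) followed by an existence test can be sketched as
below. This is not the init(8) source; the function names are hypothetical
and the use of stat(2) for the existence test is an assumption based on the
follow-up commit that adds sys/stat.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Build the path for a ttys entry name: names starting with '/' are used
 * as-is, anything else is taken relative to /dev.
 * Returns 0 on success, -1 if the result does not fit in 'buf'.
 */
int
demo_tty_path(const char *name, char *buf, size_t len)
{
	int n;

	assert(name != NULL && buf != NULL);
	n = snprintf(buf, len, "%s%s", name[0] == '/' ? "" : "/dev/", name);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/* An "onifexists"-style test: the entry is active only if the path exists. */
bool
demo_tty_exists(const char *name)
{
	char path[1024];
	struct stat sb;

	if (demo_tty_path(name, path, sizeof(path)) != 0)
		return (false);
	return (stat(path, &sb) == 0);
}
```

For instance, "ttyu0" resolves to "/dev/ttyu0", while "/dev/ttyv1" is left
unchanged.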
r315737 (ngie):
Unbreak world by adding sys/stat.h for stat(2)
r315740 (imp):
Simplify the code a little.
r330054 (trasz):
Improve missing tty handling in init(8). This removes a check that did
nothing - it was checking for ENXIO, which, with devfs, is no longer
returned - and was badly placed anyway, and replaces it with similar
one that works, and is done just before starting getty, instead of being
done when rereading ttys(5).
From the practical point of view, this makes init(8) handle disappearing
terminals (e.g. /dev/ttyU*) gracefully, without unnecessary getty restarts
and resulting error messages.
Reported by: Bart Ender, Andre Albsmeier
PR: 228315
Blocks: 11.2-BETA2
Approved by: re (marius)
Sponsored by: The FreeBSD Foundation
ae [Fri, 18 May 2018 10:17:13 +0000 (10:17 +0000)]
MFC r333497:
Apply the change from r272770 to if_ipsec(4) interface.
It is guaranteed that if_ipsec(4) interface is used only for tunnel
mode IPsec, i.e. decrypted and decapsulated packet has its own IP header.
Thus we can consider it as new packet and clear the protocols flags.
This allows ICMP/ICMPv6 to properly handle errors that this packet may cause.
marius [Thu, 17 May 2018 21:22:19 +0000 (21:22 +0000)]
MFC: r333613
The broken DDR52 support of Intel Bay Trail eMMC controllers rumored
in the commit log of r321385 has been confirmed via the public VLI54
erratum. Thus, stop advertising DDR52 for these controllers.
Note that this change should hardly make a difference in practice as
eMMC chips from the same era as these SoCs most likely support HS200
at least, probably even up to HS400ES.
manu [Thu, 17 May 2018 17:00:07 +0000 (17:00 +0000)]
MFC r333737:
release: arm: Format FAT partition as FAT16
r332674 raised the size of the FAT partition from 2MB to 41MB for some
boards. But we format them as FAT12, and this size appears to be too big
for FAT12; some SoC boot ROMs cannot cope with that.
Format the msdosfs partition as FAT16.
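A rough calculation shows why 41MB is awkward for FAT12. The sketch below
uses the cluster-count thresholds from the Microsoft FAT specification (fewer
than 4085 clusters is FAT12, fewer than 65525 is FAT16) and, as a
simplification, ignores the reserved and FAT sectors. The function names are
made up for the example, and whether a given boot ROM rejects large clusters
is an assumption, not something the commit states.

```c
#include <assert.h>
#include <stdint.h>

/* FAT variant implied by a cluster count, per the FAT specification. */
int
demo_fat_type(uint32_t clusters)
{
	if (clusters < 4085)
		return (12);
	if (clusters < 65525)
		return (16);
	return (32);
}

/*
 * Smallest power-of-two sectors-per-cluster value that keeps a volume of
 * 'data_sectors' sectors within the FAT12 cluster limit.
 */
uint32_t
demo_min_spc_fat12(uint32_t data_sectors)
{
	uint32_t spc = 1;

	assert(data_sectors > 0);
	while (data_sectors / spc >= 4085)
		spc *= 2;
	return (spc);
}
```

With 512-byte sectors, 41MB is 83968 sectors, so FAT12 would need at least
32 sectors (16KB) per cluster under these assumptions, whereas FAT16 can use
much smaller clusters for the same partition.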
sbruno [Thu, 17 May 2018 16:32:38 +0000 (16:32 +0000)]
MFC r333499
Add deprecation notice for vxge.
This driver was merged to HEAD one week prior to Exar publicly
announcing they had left the Ethernet market. It is not known to be used
and has various code quality issues spotted by Brooks and Hiren. Retire
it in preparation for FreeBSD 12.0.