mav [Thu, 28 Nov 2019 18:28:35 +0000 (18:28 +0000)]
Fix use-after-free in case of L2ARC prefetch failure.
In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
to read data from the original storage device. Unfortunately pointer
to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
some other read try to bump the ZIO priority, it will crash.
The problem is reproducible by corrupting L2ARC content and reading
some data with prefetch if l2arc_noprefetch tunable is changed to 0.
With the default setting the issue is probably not reproducible now.
tuexen [Thu, 28 Nov 2019 12:50:25 +0000 (12:50 +0000)]
Really ignore the SCTP association identifier on 1-to-1 style sockets
as requiresd by the socket API specification.
Thanks to Inaki Baz Castillo, who found this bug running the userland
stack with valgrind and reported the issue in
https://github.com/sctplab/usrsctp/issues/408
jeff [Thu, 28 Nov 2019 07:49:25 +0000 (07:49 +0000)]
Garbage collect the mostly unused us_keg field. Use appropriately named
union members in vm_page.h to store the zone and slab. Remove some nearby
dead code.
imp [Thu, 28 Nov 2019 05:40:15 +0000 (05:40 +0000)]
Also turn of teken for RB_MULTIPLE
RB_MULTIPLE without RB_SERIAL set is valid, and means 'Video first, then serial'
to the kernel (so kernel messages go to both, but /etc/rc uses video console
(this should be fixed, btw, but another day)). Check for RB_MULTIPLE as well as
RB_SERIAL where we want to to serial things. This means we'll use the old code
for emulation in these situations, which is likely best since we're outputing to
both and the old code is ligher weight allowing both to keep up w/o weird
scrolling things.
mav [Thu, 28 Nov 2019 02:40:12 +0000 (02:40 +0000)]
Make DMAR allow Intel NTB device to access its own BAR0.
I have no good explanation why it happens, but I found that in B2B mode
at least Xeon v4 NTB leaks accesses to its configuration memory at BAR0
originated from the link side to its host side. DMAR predictably blocks
those, making access to remote scratchpad registers in B2B mode impossible.
This change creates identity mapping in DMAR covering the BAR0 addresses,
making the NTB work fine with DMAR enabled. It seems like allowing single
4KB range at 32KB offset may be enough, but I don't see a reason to be so
specific.
rmacklem [Thu, 28 Nov 2019 02:05:31 +0000 (02:05 +0000)]
Add a cap on credential lifetime for Kerberized NFS.
The kernel RPCSEC_GSS code sets the credential (called a client) lifetime
to the lifetime of the Kerberos ticket, which is typically several hours.
As such, when a user's credentials change such as being added to a new group,
it can take several hours for this change to be recognized by the NFS server.
This patch adds a sysctl called kern.rpc.gss.lifetime_max which can be set
by a sysadmin to put a cap on the time to expire for the credentials, so that
a sysadmin can reduce the timeout.
It also fixes a bug, where time_uptime is added twice when GSS_C_INDEFINITE
is returned for a lifetime. This has no effect in practice, sine Kerberos
never does this.
cy [Thu, 28 Nov 2019 00:46:33 +0000 (00:46 +0000)]
Include fin, the packet information structure (fr_info_t), in the
l4sums DTrace probe, making more information available for the diagnosis
of IPv6 checksum errors.
cem [Thu, 28 Nov 2019 00:46:03 +0000 (00:46 +0000)]
auditd(8): fix long-standing uninitialized memory use bug
The bogus use could lead to an infinite loop depending on how fast the
audit_warn script to execute.
By fixing read(2) interruptibility, d060887 (r335899) revealed another bug
in auditd_wait_for_events. When read is interrupted by SIGCHLD,
auditd_reap_children will always return with errno set to ECHILD. But
auditd_wait_for_events checks errno after that point, expecting it to be
unchanged since read. As a result, it calls auditd_handle_trigger with bogus
stack garbage. The result is the error message "Got unknown trigger 48." Fix
by simply ignoring errno at that point; there's only one value it could've
possibly had, thanks to the check up above.
The best part is we've had a fix for this for like 18 months and just never
merged it. Merge it now.
PR: 234209
Reported by: Marie Helene Kvello-Aune <freebsd AT mhka.no> (2018-12)
Submitted by: asomers (2018-07)
Reviewed by: me (in OpenBSM)
Obtained from: OpenBSM
X-MFC-With: r335899
Security: ¯\_(ツ)_/¯
Differential Revision: https://github.com/openbsm/openbsm/pull/45
chs [Thu, 28 Nov 2019 00:37:43 +0000 (00:37 +0000)]
As part of creating a snapshot, set fs->fs_fmod to 0 in the snapshot image
because nothing ever changes this field for read-only mounts and we want
to verify that it is still 0 when we unmount.
jeff [Thu, 28 Nov 2019 00:19:09 +0000 (00:19 +0000)]
Implement a sysctl tree for uma zones to assist in debugging and provide
more statistcs than are exported via the ABI stable vmstat interface.
Rename uz_count to uz_bucket_size because even I was confused by the
name after returning to the source years later.
alc [Wed, 27 Nov 2019 20:33:49 +0000 (20:33 +0000)]
There is no reason why we need to pin the underlying thread to its current
processor in pmap_invalidate_{all,page,range}(). These functions are using
an instruction that broadcasts the TLB invalidation to every processor, so
even if a thread migrates in the middle of one of these functions every
processor will still perform the required TLB invalidations.
markj [Wed, 27 Nov 2019 20:32:53 +0000 (20:32 +0000)]
iwm(4): Remove _mvm from the namespace.
This was inherited from iwlwifi, which drives devices supported by both
iwn(4) and iwm(4) in FreeBSD. In iwm(4) _mvm is meaningless, so remove
it. OpenBSD made the same change a long time ago. No functional change
intended.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
rlibby [Wed, 27 Nov 2019 19:49:55 +0000 (19:49 +0000)]
uma: trash memory when ctor/dtor supplied too
On INVARIANTS kernels, UMA has a use-after-free detection mechanism.
This mechanism previously required that all of the ctor/dtor/uminit/fini
arguments to uma_zcreate() be NULL in order to function. Now, it only
requires that uminit and fini be NULL; now, the trash ctor and dtor will
be called in addition to any supplied ctor or dtor.
Also do a little refactoring for readability of the resulting logic.
This enables use-after-free detection for more zones, and will allow for
simplification of some callers that worked around the previous
restriction (see kern_mbuf.c).
tuexen [Wed, 27 Nov 2019 19:32:29 +0000 (19:32 +0000)]
Plug two mbuf leaks during INIT-ACK handling.
One leak happens when there is not enough memory to allocate the
the resources for streams. The other leak happens if the are
unknown parameters in the received INIT-ACK chunk which require
reporting and the INIT-ACK requires sending an ABORT due to illegal
parameter combinations.
Hopefully this fixes
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19083
andrew [Wed, 27 Nov 2019 16:52:46 +0000 (16:52 +0000)]
Support kernels larger than EFI_STAGING_SIZE in loader.efi
With a very large kernel or module the staging area may be too small to
hold it. When this is the case try to allocate more space before failing
in the efi copyin/copyout/readin functions.
rlibby [Wed, 27 Nov 2019 01:54:39 +0000 (01:54 +0000)]
witness: sleepable rm locks are not sleepable in read mode
There are two classes of rm lock, one "sleepable" and one not. But even
a "sleepable" rm lock is only sleepable in write mode, and is
non-sleepable when taken in read mode.
Warn about sleepable rm locks in read mode as non-sleepable locks. Do
this by defining a new lock operation flag, LOP_NOSLEEP, to indicate
that a lock is non-sleepable despite what the LO_SLEEPABLE flag would
indicate, and defining a new witness lock instance flag, LI_SLEEPABLE,
to track the product of LO_SLEEPABLE and LOP_NOSLEEP on the lock
instance.
mjg [Wed, 27 Nov 2019 01:20:55 +0000 (01:20 +0000)]
cache: fix numcache accounting on entry
. entries are never created and .. can reuse existing entries,
meaning the early count bump is both spurious and leading to
overcounting in certain cases.
jeff [Wed, 27 Nov 2019 00:39:23 +0000 (00:39 +0000)]
Use atomics in more cases for object references. We now can completely
omit the object lock if we are above a certain threshold. Hold only a
single vnode reference when the vnode object has any ref > 0. This
allows us to only lock the object and vnode on 0-1 and 1-0 transitions.
jeff [Tue, 26 Nov 2019 22:17:02 +0000 (22:17 +0000)]
Refactor uma_zalloc_arg(). It is a mess of gotos and code which doesn't
make sense after many partial refactors. Attempt to make a smaller cache
footprint for the fast path.
Summary:
Various paths through hypot(x, y) will multiply x and y by a power of
two, perform the calculation in a range where IEEE-754 provides greater
precision, then undo the multiplication to determine the true result.
Undoing that multiplication is implemented as t1*w, where t1=2**k.
2**k is often computed by taking the high word of 1.0, then adding k<<20
(for doubles or long doubles) or k<<23 (for floats) to it, then
overwriting that high word. But when k is negative this left-shifts a
negative value -- and that's undefined behavior in many editions of C
and C++.
This patch should fix all hypot implementations to compute 2**k without
triggering this particular bit of undefined behavior.
Test Plan: I've only very lightly tested out the hypot(double, double)
change, in SpiderMonkey's JavaScript engine, for consistency with prior
behavior. The other functions' changes have more or less only been
eyeballed. Careful examination appreciated! Do note, however, that an
error in any of these changes would most likely produce a value that is
incorrect by a factor of two, so any mistake would most likely be
glaring if invoked.
emaste [Tue, 26 Nov 2019 20:46:20 +0000 (20:46 +0000)]
stop building arm LINT-V5 kernel
r354290 removed arm.arm from universe, but arm.arm kernels were still
found and built during the kernel stage. r354934 tagged armv5 kernel
configs as NO_UNIVERSE, but LINT-V5 remained. Stop building it as well.
Leave the clean rule in place for now so folks don't end up with a stale
LINT-V5.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22560
mmel [Tue, 26 Nov 2019 17:56:39 +0000 (17:56 +0000)]
Finish implementation of RK3299 clocks.
- implement of all but mmc clocks. MMC clocks will be added later by own commit.
- use 'link' clock type for external clocks.
- use macros for initialization of structure's named members.
MFC after: 3 weeks
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D22441
scottl [Tue, 26 Nov 2019 17:25:49 +0000 (17:25 +0000)]
Revert r355021. In my haste to grep for Giant, I missed that it was in
conditional ifdefs for this driver. We will consider removing those ifdefs
in the future.
np [Tue, 26 Nov 2019 05:54:25 +0000 (05:54 +0000)]
cxgbe(4): Allow the driver to specify multiple FECs that the firmware
should try in order to link up with the peer.
Various FEC variables within the driver can now have multiple bits set
instead of being powers of 2. 0 and -1 in the user knobs still mean no
FEC and auto (driver decides) respectively for backward compatibility,
but no-FEC and auto now have their own bits in the internal
representation. There is a new bit that can be set to request the FEC
recommended by the cable/transceiver module.
Add sysctls to display link related capabilities of the local side as
well as the link partner.
Note that all this needs a new firmware and the documentation for the
driver FEC knobs will be updated after that firmware is added to the
driver.
ian [Mon, 25 Nov 2019 19:59:53 +0000 (19:59 +0000)]
Allow opt-out of automatic ntpd leapfile checking/fetching.
When a system has no internet connection, or when it is configured to obtain
ntpd leapfiles from some source other than the internet, or even when the
sysadmin has decided for some reason to customize ntp.conf to eliminate use
of the leapfile, the rc.d/ntpd script emits various error messages related
to the file.
This change allows setting the rc var ntp_db_leapfile to NONE to disable all
automatic processing related to that file in rc.d/ntpd.
chs [Mon, 25 Nov 2019 19:31:38 +0000 (19:31 +0000)]
In ffs_freefile(), use a separate variable to hold the inode number within
the cg rather than reusuing "ino" for this purpose. This reduces the diff
for an upcoming change that improves handling of I/O errors.
oshogbo [Mon, 25 Nov 2019 18:33:21 +0000 (18:33 +0000)]
procdesc: allow to collect status through wait(1) if process is traced
The debugger like truss(1) depends on the wait(2) syscall. This syscall
waits for ALL children. When it is waiting for ALL child's the children
created by process descriptors are not returned. This behavior was
introduced because we want to implement libraries which may pdfork(1).
The behavior of process descriptor brakes truss(1) because it will
not be able to collect the status of processes with process descriptors.
To address this problem the status is returned to parent when the
child is traced. While the process is traced the debugger is the new parent.
In case the original parent and debugger are the same process it means the
debugger explicitly used pdfork() to create the child. In that case the debugger
should be using kqueue()/pdwait() instead of wait().
Add test case to verify that. The test case was implemented by markj@.
emaste [Mon, 25 Nov 2019 18:18:28 +0000 (18:18 +0000)]
remove armv6 LLVM workaround introduced in r341812
r341812 enabled only arm target support in LLVM on arm and armv6,
because ld.bfd 2.17.50 lacked support for range extensions required for
linking such large binaries/libraries. r341812 indicated that the
workaround should be removed once the userland can be linked by lld.
r354289 switched armv6 to use lld by default, so remove the workaround
on armv6. The workaround remains in place for arm (v5), and will
presumably be removed when arm is retired.
rlibby [Mon, 25 Nov 2019 07:38:27 +0000 (07:38 +0000)]
sysctl sysctls: wire old buf before output with sysctl lock
Several sysctl sysctls output to a user buffer while holding a
non-sleepable lock that protects the sysctl topology. They need to wire
the output buffer, or else they may try to sleep on a page fault.
dougm [Mon, 25 Nov 2019 02:19:47 +0000 (02:19 +0000)]
Where 'current' is used to index over vm_map entries, use
'entry'. Where 'entry' is used to identify the starting point for
iteration, use 'first_entry'. These are the naming conventions used in
most of the vm_map.c code. Where VM_MAP_ENTRY_FOREACH can be used, do
so. Squeeze a few lines to fit in 80 columns. Where lines are being
modified for these reasons, look to remove style(9) violations.
bz [Sun, 24 Nov 2019 23:21:47 +0000 (23:21 +0000)]
Allow kernel to compile without BPF.
r297816 added some bpf magic for VIMAGE unconditionally which no longer
allows kernels to compile without bpf (but with other networking).
Add the missing ifdef checks and allow a kernel to compile without bpf
again.
PR: 242136
Reported by: dave mischler.com
MFC after: 2 weeks
ian [Sun, 24 Nov 2019 21:08:56 +0000 (21:08 +0000)]
When doing ARM stack unwinding as part of stack_save(9), do not search
loaded modules (pass 0/false for the can_lock arg). Searching the unwind
info in modules acquires an exclusive sxlock, and the stack(9) functions can
be called in a context where unbounded sleeps are forbidden (such as from
the witness checkorder code).
Just ignoring the existence of modules in stack_save() is not ideal, so I'm
looking for a better solution, but this commit will make it possible to boot
an ARM kernel with WITNESS enabled again, until I get something better.
wulf [Sun, 24 Nov 2019 20:47:40 +0000 (20:47 +0000)]
Linux epoll: Register events with zero event mask
Such an events are legal and should be interpreted as EPOLLERR | EPOLLHUP.
Register a disabled kqueue event in that case as we do not support EPOLLHUP yet.
Required by Linux Steam client.
PR: 240590
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22516
wulf [Sun, 24 Nov 2019 20:44:14 +0000 (20:44 +0000)]
Linux epoll: Check both read and write kqueue events existence in EPOLL_CTL_ADD
Linux epoll EPOLL_CTL_ADD op handler should always check registration
of both EVFILT_READ and EVFILT_WRITE kevents to deceide if supplied
file descriptor fd is already registered with epoll instance.
wulf [Sun, 24 Nov 2019 20:41:47 +0000 (20:41 +0000)]
Linux epoll: Don't deregister file descriptor after EPOLLONESHOT is fired
Linux epoll does not remove descriptor after one-shot event has been triggered.
Set EV_DISPATCH kqueue flag rather then EV_ONESHOT to get the same behavior.
Required by Linux Steam client.
PR: 240590
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22513
kib [Sun, 24 Nov 2019 19:12:23 +0000 (19:12 +0000)]
Record part of the owner struct thread pointer into busy_lock.
Record as much bits from curthread into busy_lock as fits. Low bits
for struct thread * representation are zero due to struct and zone
alignment, and they leave space for busy flags (perhaps except
statically allocated thread0). Upper bits are not very interesting
for assert, and in most practical situations recorded value should
allow to manually identify the owner with certainity.
Assert that unbusy is performed by the owner, except few places where
unbusy is done in io completion handler. For this case, add
_unchecked variants of asserts and unbusy primitives.
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22298
kib [Sun, 24 Nov 2019 19:06:38 +0000 (19:06 +0000)]
tmpfs: resolve deadlock between rename and unmount.
Top-level kern_renameat() increases the writecount on the mount point,
which, together with tmpfs unmount suspending the mount, already
ensures that unmount cannot proceed while rename unlocks and relocks
all operated vnodes.
Remove vfs_busy() call from tmpfs_rename() which was done while
holding a vnode lock, creating the deadlock. The only intent of the
busy operation seems to be the prevention of unmount, which is already
ensured.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
imp [Sun, 24 Nov 2019 15:37:19 +0000 (15:37 +0000)]
Don't need giant for these drivers dev nodes.
Also, Giant isn't required to busy / unbusy a device, so drop that too while I'm
here. It's not done elsewhere in the tree and in the future will likely be
handled by a node lock to ensure consistency. Leave Giant in place for attach
and removing childing, as that's actually still needed, even if imperfect.
Remove stale comment about contigmalloc taking Giant and calling w/o the lock
held. Neither of these is still true.
imp [Sun, 24 Nov 2019 15:37:14 +0000 (15:37 +0000)]
Hoist locking giant back up into the ioctl handler
Move the locking back into the ioctl handler. This "fixes" the race where we hve
a hot plug event just after the dropping of Giant in pci_find_dbsf, assuming the
driver doesn't then call anything that drops and picks up Giant again... It's a
little safer since don't think it doesn't, but we lack the tools to know for
sure.
imp [Sun, 24 Nov 2019 15:24:05 +0000 (15:24 +0000)]
Fix leak in state machine for commands.
When we get a device departed message from the firmware, we send a TARGET_REST
to the device to let the firmware know we're done and as part of the recovery
process. This will abort all the commands. While the documentation says the IOC
is responsible for writing the completion message for all the commands pending
with an aborted status, we sometimes have queued commands for the target that
haven't been completed so are in the INQUEUE state. So, when we later complete
the pending CCB as aborted, these commands are freed and we hit the "state not
busy" panic.
Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do
that here as well. In talking to Ken, Scott and Justin, they recommended a
series of tests to see if this is 100% safe. Those tests are ongoing, but
preliminary tests suggest this is safe as we see no duplicate completions when
we hit this case at work. We have a machine that has a dodgy powersupply which
usually doesn't apply power to a few drives, but sometimes does when the machine
is under heavy load so we get a rash of the connect / disconnect messages over
half an hour. Without this change, we'd see state not busy panic. With this
change, the drives just annoyingly come and go without affecting the rest of the
machine, but without a complete error injection test suite, it's hard to know if
all edge cases are now covered or not.
lwhsu [Sun, 24 Nov 2019 15:03:35 +0000 (15:03 +0000)]
Fix gcc build
We have -Werror=strict-overflow so gcc complains:
In file included from /tmp/obj/workspace/src/amd64.amd64/tmp/usr/include/bitstring.h:36:0,
from /workspace/src/tests/sys/sys/bitstring_test.c:34:
/workspace/src/tests/sys/sys/bitstring_test.c: In function 'bit_ffc_at_test':
/workspace/src/sys/sys/bitstring.h:239:5: error: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Werror=strict-overflow]
if (_start >= _nbits) {
^
Disable assuming overflow of signed integer will never happen by specifying
-fno-strict-overflow
jhibbits [Sun, 24 Nov 2019 04:35:29 +0000 (04:35 +0000)]
rtld/powerpc: Fix _rtld_bind_start for powerpcspe
Summary:
We need to save off the full 64-bit register, not just the low 32 bits,
of all registers getting saved off in _rtld_bind_start. Additionally,
we need to save off the other SPE registers (SPEFSCR and accumulator),
so that their program state is not affected by the PLT resolver.
imp [Sat, 23 Nov 2019 23:57:26 +0000 (23:57 +0000)]
Add a warning about Giant Locked devices
Add a warning when a device registers with devfs and requests
D_NEEDGIANT. The warning says the device will go away before
13.0. This is needed to flush out the devices in the tree that are
still Giant locked. This warning, or some variant of it, should have
gone into the tree a long time ago...
The intention is to require all devices be converted to not use
automatic giant in this way, or remove any such devices that remain
that we don't have the hardware to test a conversion of.
kbd so far is the only device that can't leave the tree, yet needs
something sensible done to avoid the auto giant lock (even if it is
just doing the wrapping itself). There may be others added to this
list... Any discussions of this topic will take place on arch@.
imp [Sat, 23 Nov 2019 23:44:00 +0000 (23:44 +0000)]
We don't even need Giant here. It isn't protecting anything internal
to geom, and nothing we call requires it to be held. It's left over
from a time when the latter wasn't the case. Retire it.
imp [Sat, 23 Nov 2019 23:43:52 +0000 (23:43 +0000)]
Push Giant down one layer
The /dev/pci device doesn't need GIANT, per se. However, one routine
that it calls, pci_find_dbsf implicitly does. It walks a list that can
change when PCI scans a new bus. With hotplug, this means we could
have a race with that scanning. To prevent that, take out Giant around
scanning the list.
However, given that we have places in the tree that drop giant, if
held when we call into them, the whole use of Giant to protect newbus
may be less effective that we desire, so add a comment about why we're
talking it out, and we'll address the issue when we lock newbus with
something other than Giant.
bdragon [Sat, 23 Nov 2019 21:18:55 +0000 (21:18 +0000)]
[PowerPC] Use QEMU-compatible version of SPE accumulator save
Switch from "evaddumiaaw 0,0" to "evmwumiaa 0,0,0" when persisting the
accumulator. This has the benefit of actually being implemented in QEMU
as it is the form Linux uses for the same task.
Both instructions are functionally equivilent, as we are using them for
their side effect of copying the accumulator to GPRs rather than for the
actual math operation that they are performing.
dim [Sat, 23 Nov 2019 19:35:09 +0000 (19:35 +0000)]
libclang_rt: enable on powerpc*
Summary:
Enable on powerpc64 and in lib/libclang_rt/Makefile change
MACHINE_CPUARCH to MACHINE_ARCH because on powerpc64
MACHINE_ARCH==MACHINE_CPUARCH so the 32-bit library overwrites 64-bit
library during installworld.
This patch doesn't enable any other libclang_rt libraries because they
need to be separately ported.
I have verified that games/julius (which fails on powerpc64 elfv2
without this change because of no libclang_rt profiling library) builds.
Test Plan: Ship it, test on powerpc and powerpcspe