andrew [Thu, 22 Mar 2018 15:32:57 +0000 (15:32 +0000)]
Enter into the EFI environment before dereferencing the runtime services
pointer. This may be within the EFI address space and not the FreeBSD
kernel address space.
imp [Thu, 22 Mar 2018 15:11:53 +0000 (15:11 +0000)]
Revert r331298
Normally, shutdown_nice() just signals init. However, sometimes it
calls kern_reboot directly. For that case, r331298 dropped the Giant
lock before calling it. This turns out to be incorrect for the more
common case where init exists and we just signal it. Restore the old
behavior. The direct call to kern_reboot() doesn't sync buffers to the
disk, so should work with Giant held, so we don't need to drop locks
here for that.
hselasky [Thu, 22 Mar 2018 12:26:27 +0000 (12:26 +0000)]
Clear old MSIX IRQ numbers in the LinuxKPI.
When disabling the MSIX IRQ vectors for a PCI device through the
LinuxKPI, make sure any old MSIX IRQ numbers are no longer visible to
the linux_pci_find_irq_dev() function else IRQs can be requested from
the wrong PCI device.
kevans [Thu, 22 Mar 2018 11:57:59 +0000 (11:57 +0000)]
Partially revert r328780
efi.4th was added to ObsoleteFiles and disconnected from the build, but not
removed from hte repo. We've since found a mild use for it that makes some
amount of sense, so partially revert r328780 and bring it back to life.
jtl [Thu, 22 Mar 2018 09:40:08 +0000 (09:40 +0000)]
Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.
The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.
It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.
You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.
This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.
There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.
glebius [Thu, 22 Mar 2018 05:07:57 +0000 (05:07 +0000)]
Fix LINT-NOINET build initializing local to false. This is
a dead code, since for NOINET build isipv6 is always true,
but this dead code makes it compilable.
kevans [Thu, 22 Mar 2018 04:16:14 +0000 (04:16 +0000)]
forthloader: Don't break BIOS boots...
I thought I tested this scenario, but clearly I failed to. =(
BIOS boots won't have efi-autoresizecons, so trying to use it as a forth
word fails during include. Use evaluate on "efi-autoresizecons" as a string
instead to move any potential errors to runtime- safely after we've already
checked that we're booting UEFI.
emaste [Thu, 22 Mar 2018 01:00:55 +0000 (01:00 +0000)]
Correct signedness bug in drm_modeset_ctl
drm_modeset_ctl() takes a signed in from userland, does a boundscheck,
and then uses it to index into a structure and write to it. The
boundscheck only checks upper bound, and never checks for nagative
values. If the int coming from userland is negative [after conversion]
it will bypass the boundscheck, perform a negative index into an array
and write to it, causing memory corruption.
Note that this is in the "old" drm driver; this issue does not exist
in drm2.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: cem
MFC after: 1 day
Sponsored by: The FreeBSD Foundation
cem [Wed, 21 Mar 2018 23:52:37 +0000 (23:52 +0000)]
getentropy(3): Fallback to kern.arandom sysctl on older kernels
On older kernels, when userspace program disables SIGSYS, catch ENOSYS and
emulate getrandom(2) syscall with the kern.arandom sysctl (via existing
arc4_sysctl wrapper).
Special care is taken to faithfully emulate EFAULT on NULL pointers, because
sysctl(3) as used by kern.arandom ignores NULL oldp. (This was caught by
getentropy(3) ATF tests.)
emaste [Wed, 21 Mar 2018 23:51:14 +0000 (23:51 +0000)]
Fix kernel memory disclosure in drm_infobufs
drm_infobufs() has a structure on the stack, fills it out and copies it
to userland. There are 2 elements in the struct that are not filled out
and left uninitialized. This will leak uninitialized kernel stack data
to userland.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 1 day
Security: Kernel memory disclosure (798)
emaste [Wed, 21 Mar 2018 23:26:42 +0000 (23:26 +0000)]
Fix kernel memory disclosure in ibcs2_getdents
ibcs2_getdents() copies a dirent structure to userland. The ibcs2
dirent structure contains a 2 byte pad element. This element is never
initialized, but copied to userland none-the-less.
Note that ibcs2 has not built on HEAD since r302095.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 3 days
Security: Kernel memory disclosure (803)
markj [Wed, 21 Mar 2018 21:15:43 +0000 (21:15 +0000)]
Elide the object lock in the common case in vfs_vmio_unwire().
The object lock was only needed when attempting to free B_DIRECT
buffer pages, and for testing for invalid pages (and freeing them
if so). Handle the latter by instead moving invalid pages near the head
of the inactive queue, where they will be reclaimed quickly.
jhb [Wed, 21 Mar 2018 21:13:26 +0000 (21:13 +0000)]
Ensure thread library is initialized in pthread_testcancel().
Call _thr_check_init() before reading curthread in pthread_testcancel().
If a constructor in a library creates a semaphore via sem_init() and
then waits for it via sem_wait(), the program can core dump in
_pthread_testcancel() called from sem_wait(). This is because the
semaphore implementation lives in libc, so the library's constructors
can be run before libthr's constructors.
glebius [Wed, 21 Mar 2018 20:59:30 +0000 (20:59 +0000)]
The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections
from entering the TIME_WAIT state. However, it omits sending the ACK for the
FIN, which results in RST. This becomes a bigger deal if the sysctl
net.inet.tcp.blackhole is 2. In this case RST isn't send, so the other side of
the connection (also local) keeps retransmitting FINs.
To fix that in tcp_twstart() we will not call tcp_close() immediately. Instead
we will allocate a tcptw on stack and proceed to the end of the function all
the way to tcp_twrespond(), to generate the correct ACK, then we will drop the
last PCB reference.
While here, make a few tiny improvements:
- use bools for boolean variable
- staticize nolocaltimewait
- remove pointless acquisiton of socket lock
kevans [Wed, 21 Mar 2018 20:36:57 +0000 (20:36 +0000)]
UEFI: Ditch console mode setting, choose optimal GOP mode later in boot
boot1 is too early to be deciding a good resolution. Console modes don't map
cleanly/predictably to actual screen resolutions, and GOP does not reflect
the actual screen resolution after a console mode change. Rip it out.
Add an efi-autoresizecons command to loader to choose an optimal screen
resolution based on the current environment. We'll explicitly execute this
later, preferably before we draw anything of value but after we load config
and pick up any tunables we may need to decide where we're going.
This method also allows us to actually pass the correct framebuffer
information on to the kernel.
UGA autoresizing is not implemented because it doesn't have the kind of mode
enumeration that GOP does. If an interested person with relevant hardware
could get in contact, we can take a look at implementing UGA autoresize.
This effectively "fixes" the breakage caused by r327058, but doesn't
actually set the resolution correctly until the interpreter calls
efi-autoresizcons. The lualoader version of this has been included for
reference; the forth equivalent will follow.
Reviewed by: imp (with some hestitation), manu
Differential Revision: https://reviews.freebsd.org/D14788
csjp [Wed, 21 Mar 2018 17:22:42 +0000 (17:22 +0000)]
Document the limitations associated with using the audit syscalls
from jailed process. These might get implemented in jails in the
future, but for now they are not supported.
Discussed on: freebsd-security@
Reviewed by: brueffer@
MFC after: 2 weeks
cem [Wed, 21 Mar 2018 16:18:14 +0000 (16:18 +0000)]
Import Blake2 algorithms (blake2b, blake2s) from libb2
The upstream repository is on github BLAKE2/libb2. Files landed in
sys/contrib/libb2 are the unmodified upstream files, except for one
difference: secure_zero_memory's contents have been replaced with
explicit_bzero() only because the previous implementation broke powerpc
link. Preferential use of explicit_bzero() is in progress upstream, so
it is anticipated we will be able to drop this diff in the future.
sys/crypto/blake2 contains the source files needed to port libb2 to our
build system, a wrapped (limited) variant of the algorithm to match the API
of our auth_transform softcrypto abstraction, incorporation into the Open
Crypto Framework (OCF) cryptosoft(4) driver, as well as an x86 SSE/AVX
accelerated OCF driver, blake2(4).
Optimized variants of blake2 are compiled for a number of x86 machines
(anything from SSE2 to AVX + XOP). On those machines, FPU context will need
to be explicitly saved before using blake2(4)-provided algorithms directly.
Use via cryptodev / OCF saves FPU state automatically, and use via the
auth_transform softcrypto abstraction does not use FPU.
The intent of the OCF driver is mostly to enable testing in userspace via
/dev/crypto. ATF tests are added with published KAT test vectors to
validate correctness.
shurd [Wed, 21 Mar 2018 15:57:36 +0000 (15:57 +0000)]
Update copyright per Matthew Macy
"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."
andrew [Wed, 21 Mar 2018 15:17:54 +0000 (15:17 +0000)]
Use a table to find the endpoint configuration
On the Allwinner SoCs we need to set a custom endpoint configuration. To
allow for this use a table to store the configuration so the attachment
can override it.
kevans [Wed, 21 Mar 2018 15:09:47 +0000 (15:09 +0000)]
lualoader: Clear up some possible naming confusion
In the original lualoader project, 'escapef' and 'escapeb' were chosen for
'escape fg' and 'escape bg'. We've carried on this naming convention, and as
our use of attributes grow the likeliness of 'escapeb'/'resetb' being
confused upon glance for 'escape bold'/'reset bold' increases.
Fix this by renaming these four functions to {escape,reset}{fg,bg} rather
than {escape,reset}{f,b} for clarity.
imp [Wed, 21 Mar 2018 14:47:03 +0000 (14:47 +0000)]
These interrupts call shutdown_nice() which should be called Giant
unlocked. Rather than dropping it in the interrupt handler, mark these
handlers as MPSAFE.
kib [Wed, 21 Mar 2018 10:26:39 +0000 (10:26 +0000)]
Move sysinit and sysuninit linker sets in the data (writeable) section.
Both sets are sorted in place, and with the introduction of read-only
permissions on the amd64 kernel text, the sorting override depended on
CR0.WP turned off. Make it correct by moving the sets into writeable
part of the KVA, also fixing boot on machines where hand-off from BIOS
to OS occurs with CR0.WP set.
Based on submission by: Peter Lei <peter.lei@ieee.org>
MFC after: 1 week
kevans [Wed, 21 Mar 2018 03:07:16 +0000 (03:07 +0000)]
lualoader: Add primitive hook module, use it to untangle bogus reference
See: comments in the hook module about intended usage, as well as the
introduced use for config.reloaded.
Use the newly introduced hook module to define a "config.reloaded" hook.
This is currently used to register core's clearKernelCache as a reload hook
to avoid a circular dependency and fix this functionality- it didn't
actually work out, and it isn't immediately obvious how it slipped into src.
Other hook types will be introduced into the core lualoader as useful hook
points are identified.
cem [Wed, 21 Mar 2018 01:15:45 +0000 (01:15 +0000)]
Implement getrandom(2) and getentropy(3)
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).
getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.
getentropy(3) is added as a compatibility shim for the OpenBSD API.
truss(1) support is included.
Tests for both system calls are provided. Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage. Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below. (They
pass, of course.)
PR: 194204
Reported by: David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with: cperciva, delphij, jhb, markj
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D14500
jamie [Tue, 20 Mar 2018 23:08:42 +0000 (23:08 +0000)]
Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position. This makes
them similar to the say "jailsys" options are represented.
Loop through either kind of option array with a structure pointer rather
then an integer index.
melifaro [Tue, 20 Mar 2018 22:57:06 +0000 (22:57 +0000)]
Use count(9) api for the bpf(4) statistics.
Currently each bfp descriptor uses u64 variables to maintain its counters.
On interfaces with high packet rate this leads to unnecessary contention
and inaccurate reporting.
sevan [Tue, 20 Mar 2018 22:41:26 +0000 (22:41 +0000)]
Extend the description of ALTQ to call it a system which is a framework in
altq(4) to match altq(9). This makes preserving the history section as the
author of ALTQ easier in the history section, rather than calling it a framework
in the description & a system in the history.
Add a history section to altq(4) and extend the history section in altq(9)
imp [Tue, 20 Mar 2018 22:07:45 +0000 (22:07 +0000)]
Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.
imp [Tue, 20 Mar 2018 22:01:18 +0000 (22:01 +0000)]
Push down Giant one layer. In the days of yore, back when Penitums
were the new kids on the block and F00F hacks were all the rage, one
needed to take out Giant to do anything moderately complicated with
the VM, mappings and such. So the pccard / cardbus code held Giant for
the entire insertion or removal process.
Today, the VM is MP safe. The lock is only needed for dealing with
newbus things. Move locking and unlocking Giant to be only around
adding and probing devices in pccard and cardbus.
jhb [Tue, 20 Mar 2018 21:00:45 +0000 (21:00 +0000)]
Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header. The standard header for
userland is <stdarg.h>. Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.
kevans [Tue, 20 Mar 2018 20:05:11 +0000 (20:05 +0000)]
lualoader: Reset attributes and color scheme with color.highlight()
Previously, we sent a CSI 0m sequence to reset attributes, which also reset
the color scheme if the terminal defaults didn't match what we're expecting.
Go all-in and reset the color scheme, too, just in case.
emaste [Tue, 20 Mar 2018 19:28:52 +0000 (19:28 +0000)]
Make linuxulator fn declaration match definition
I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's
declaration (only), while bringing this change over from git and
encountering a conflict.
kib [Tue, 20 Mar 2018 17:47:29 +0000 (17:47 +0000)]
Disable write protection around patching of XSAVE instruction in the
context switch code.
Some BIOSes give control to the OS with CR0.WP already set, making the
kernel text read-only before cpu_startup().
Reported by: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
kib [Tue, 20 Mar 2018 17:43:50 +0000 (17:43 +0000)]
Provide KPI for handling of rw/ro kernel text.
This is a pure syntax patch to create an interface to enable and later
restore write access to the kernel text and other read-only mapped
regions. It is in line with e.g. vm_fault_disable_pagefaults() by
allowing the nesting.
Discussed with: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
jhb [Tue, 20 Mar 2018 17:05:23 +0000 (17:05 +0000)]
Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec. In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.
This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.
Reported by: Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by: bz
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14763
kib [Tue, 20 Mar 2018 16:17:55 +0000 (16:17 +0000)]
Check for wrap-around in vm_phys_alloc_seg_contig().
It is possible to provide insane values for size in contigmalloc(9)
request, which usually not reaches the phys allocator due to failing
KVA allocation. But with the forthcoming 4/4 i386, where 32bit
architecture has almost 4G KVA, contigmalloc(1G) is not unreasonable
outright and KVA might be available sometimes.
Then, the calculation of pa_end could wrap around, depending on the
physical address, and the checks in vm_phys_alloc_seg_contig() would
pass while the iteration in the loop after the 'done' label goes out
of the vm_page_array bounds.
Fix it by detecting the wrap.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14767
markj [Tue, 20 Mar 2018 15:51:05 +0000 (15:51 +0000)]
Drop KTR_CONTENTION.
It is incomplete, has not been adopted in the other locking primitives,
and we have other means of measuring lock contention (lock_profiling,
lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and
free up a precious KTR class index.
jhb [Tue, 20 Mar 2018 15:44:17 +0000 (15:44 +0000)]
Add support for MIPS to LLVM's libunwind.
This is originally based on a patch from David Chisnall for soft-float
N64 but has since been updated to support O32, N32, and hard-float ABIs.
The soft-float O32, N32, and N64 support has been committed upstream.
The hard-float changes are still in review upstream.
Enable LLVM_LIBUNWIND on mips when building with a suitable (C+11-capable)
toolchain. This has been tested with external GCC for all ABIs and
O32 and N64 with clang.
andrew [Tue, 20 Mar 2018 13:35:20 +0000 (13:35 +0000)]
Check if the gettime runtime service is valid.
The U-Boot efi runtime service expects us to set the address map before
calling any runtime services. It will then remap a few functions to their
runtime version. One of these is the gettime function. If we call into
this without having set a runtime map we get a page fault.
Add a check to see if this is valid in efi_init() so we don't try to use
the possibly invalid pointer.
imp [Tue, 20 Mar 2018 03:37:14 +0000 (03:37 +0000)]
Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.
Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.
imp [Tue, 20 Mar 2018 03:37:09 +0000 (03:37 +0000)]
Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is a
256, which is the max that will fit into a 4k page.
imp [Tue, 20 Mar 2018 03:36:51 +0000 (03:36 +0000)]
Note: this isn't a general thing. It only affects u-boot-based arm64
systems. Make sure the note says that specific case only. Also,
provide a recipe to do it.
jhibbits [Tue, 20 Mar 2018 02:01:30 +0000 (02:01 +0000)]
Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a
pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to
intmax_t, then gcc whines when it gets cast to a pointer. Casting through
uintptr_t silences this warning.
cem [Tue, 20 Mar 2018 00:16:24 +0000 (00:16 +0000)]
blacklist: Fix minor memory leak in configuration parsing error case
Ordinarily, the continue clause of the for-loop would free 'line.' In this
case we instead return early, missing the free. Add an explicit free to
avoid the leak.
gonzo [Tue, 20 Mar 2018 00:03:49 +0000 (00:03 +0000)]
[ofw] fix errneous checks for OF_finddevice(9) return value
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.