Gleb Smirnoff [Wed, 21 Mar 2018 20:59:30 +0000 (20:59 +0000)]
The net.inet.tcp.nolocaltimewait=1 optimization prevents local TCP connections
from entering the TIME_WAIT state. However, it omits sending the ACK for the
FIN, which results in an RST. This becomes a bigger deal if the sysctl
net.inet.tcp.blackhole is 2. In this case the RST isn't sent, so the other side of
the connection (also local) keeps retransmitting FINs.
To fix that, in tcp_twstart() we will not call tcp_close() immediately. Instead
we will allocate a tcptw on the stack and proceed to the end of the function all
the way to tcp_twrespond(), to generate the correct ACK, then we will drop the
last PCB reference.
While here, make a few tiny improvements:
- use bools for boolean variables
- staticize nolocaltimewait
- remove pointless acquisition of socket lock
Kyle Evans [Wed, 21 Mar 2018 20:36:57 +0000 (20:36 +0000)]
UEFI: Ditch console mode setting, choose optimal GOP mode later in boot
boot1 is too early to be deciding a good resolution. Console modes don't map
cleanly/predictably to actual screen resolutions, and GOP does not reflect
the actual screen resolution after a console mode change. Rip it out.
Add an efi-autoresizecons command to loader to choose an optimal screen
resolution based on the current environment. We'll explicitly execute this
later, preferably before we draw anything of value but after we load config
and pick up any tunables we may need to decide where we're going.
This method also allows us to actually pass the correct framebuffer
information on to the kernel.
UGA autoresizing is not implemented because it doesn't have the kind of mode
enumeration that GOP does. If an interested person with relevant hardware
could get in contact, we can take a look at implementing UGA autoresize.
This effectively "fixes" the breakage caused by r327058, but doesn't
actually set the resolution correctly until the interpreter calls
efi-autoresizecons. The lualoader version of this has been included for
reference; the forth equivalent will follow.
Reviewed by: imp (with some hesitation), manu
Differential Revision: https://reviews.freebsd.org/D14788
Document the limitations associated with using the audit syscalls
from a jailed process. These might get implemented in jails in the
future, but for now they are not supported.
Discussed on: freebsd-security@
Reviewed by: brueffer@
MFC after: 2 weeks
Conrad Meyer [Wed, 21 Mar 2018 16:18:14 +0000 (16:18 +0000)]
Import Blake2 algorithms (blake2b, blake2s) from libb2
The upstream repository is on github BLAKE2/libb2. Files landed in
sys/contrib/libb2 are the unmodified upstream files, except for one
difference: secure_zero_memory's contents have been replaced with
explicit_bzero() only because the previous implementation broke powerpc
link. Preferential use of explicit_bzero() is in progress upstream, so
it is anticipated we will be able to drop this diff in the future.
sys/crypto/blake2 contains the source files needed to port libb2 to our
build system, a wrapped (limited) variant of the algorithm to match the API
of our auth_transform softcrypto abstraction, incorporation into the Open
Crypto Framework (OCF) cryptosoft(4) driver, as well as an x86 SSE/AVX
accelerated OCF driver, blake2(4).
Optimized variants of blake2 are compiled for a number of x86 machines
(anything from SSE2 to AVX + XOP). On those machines, FPU context will need
to be explicitly saved before using blake2(4)-provided algorithms directly.
Use via cryptodev / OCF saves FPU state automatically, and use via the
auth_transform softcrypto abstraction does not use FPU.
The intent of the OCF driver is mostly to enable testing in userspace via
/dev/crypto. ATF tests are added with published KAT test vectors to
validate correctness.
Stephen Hurd [Wed, 21 Mar 2018 15:57:36 +0000 (15:57 +0000)]
Update copyright per Matthew Macy
"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."
Andrew Turner [Wed, 21 Mar 2018 15:17:54 +0000 (15:17 +0000)]
Use a table to find the endpoint configuration
On the Allwinner SoCs we need to set a custom endpoint configuration. To
allow for this use a table to store the configuration so the attachment
can override it.
Kyle Evans [Wed, 21 Mar 2018 15:09:47 +0000 (15:09 +0000)]
lualoader: Clear up some possible naming confusion
In the original lualoader project, 'escapef' and 'escapeb' were chosen for
'escape fg' and 'escape bg'. We've carried on this naming convention, and as
our use of attributes grows, the likelihood of 'escapeb'/'resetb' being
confused at a glance for 'escape bold'/'reset bold' increases.
Fix this by renaming these four functions to {escape,reset}{fg,bg} rather
than {escape,reset}{f,b} for clarity.
Warner Losh [Wed, 21 Mar 2018 14:47:03 +0000 (14:47 +0000)]
These interrupts call shutdown_nice() which should be called Giant
unlocked. Rather than dropping it in the interrupt handler, mark these
handlers as MPSAFE.
Move sysinit and sysuninit linker sets into the data (writeable) section.
Both sets are sorted in place, and with the introduction of read-only
permissions on the amd64 kernel text, the sorting override depended on
CR0.WP being turned off. Make it correct by moving the sets into the writeable
part of the KVA, also fixing boot on machines where hand-off from BIOS
to OS occurs with CR0.WP set.
Based on submission by: Peter Lei <peter.lei@ieee.org>
MFC after: 1 week
Kyle Evans [Wed, 21 Mar 2018 03:07:16 +0000 (03:07 +0000)]
lualoader: Add primitive hook module, use it to untangle bogus reference
See: comments in the hook module about intended usage, as well as the
introduced use for config.reloaded.
Use the newly introduced hook module to define a "config.reloaded" hook.
This is currently used to register core's clearKernelCache as a reload hook
to avoid a circular dependency and fix this functionality; it didn't
actually work out, and it isn't immediately obvious how it slipped into src.
Other hook types will be introduced into the core lualoader as useful hook
points are identified.
Conrad Meyer [Wed, 21 Mar 2018 01:15:45 +0000 (01:15 +0000)]
Implement getrandom(2) and getentropy(3)
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).
getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.
getentropy(3) is added as a compatibility shim for the OpenBSD API.
truss(1) support is included.
Tests for both system calls are provided. Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage. Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below. (They
pass, of course.)
PR: 194204
Reported by: David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with: cperciva, delphij, jhb, markj
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D14500
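A minimal usage sketch of the two interfaces described above. It assumes the Linux-compatible prototypes, ssize_t getrandom(void *, size_t, unsigned int) and int getentropy(void *, size_t), are declared in <sys/random.h> (some systems declare getentropy in <unistd.h>). The helper names fill_random and make_key are hypothetical and not part of the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len random bytes using getrandom(2),
 * retrying on short reads and EINTR. Returns 0 on success, -1 on error.
 * No file descriptor is opened and no path is accessed.
 */
int
fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        p += (size_t)n;
        len -= (size_t)n;
    }
    return (0);
}

/* Hypothetical helper: fetch a small 16-byte key with getentropy(3). */
int
make_key(unsigned char key[16])
{
    return (getentropy(key, 16));
}
```

getentropy(3) is the simpler call for small, fixed-size requests; the getrandom(2) loop is the more general form for larger buffers.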
Jamie Gritton [Tue, 20 Mar 2018 23:08:42 +0000 (23:08 +0000)]
Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position. This makes
them similar to the way "jailsys" options are represented.
Loop through either kind of option array with a structure pointer rather
than an integer index.
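The representation described above can be sketched as follows. This is an illustration only: the structure name, flag values, option names, and apply_bool_option() are assumptions made for the example, not the kernel's actual table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag values; not the real jail flags. */
#define OPT_PERSIST 0x01u
#define OPT_DEVFS   0x02u

/* One entry carries the flag together with both spellings of its name. */
struct bool_option {
    unsigned int flag;      /* bit represented by this option */
    const char  *name;      /* name that sets the flag */
    const char  *noname;    /* name that clears the flag */
};

static const struct bool_option bool_options[] = {
    { OPT_PERSIST, "persist", "nopersist" },
    { OPT_DEVFS,   "devfs",   "nodevfs" },
};

/*
 * Apply a named option to *flags. Returns 0 on success, or -1 if the name
 * matches neither form of any option.
 */
int
apply_bool_option(unsigned int *flags, const char *name)
{
    const struct bool_option *bo;
    const struct bool_option *end =
        bool_options + sizeof(bool_options) / sizeof(bool_options[0]);

    assert(flags != NULL && name != NULL);
    /* Walk the table with a structure pointer rather than an index. */
    for (bo = bool_options; bo < end; bo++) {
        if (strcmp(name, bo->name) == 0) {
            *flags |= bo->flag;
            return (0);
        }
        if (strcmp(name, bo->noname) == 0) {
            *flags &= ~bo->flag;
            return (0);
        }
    }
    return (-1);
}
```

Because each entry holds its own flag, the order of the table no longer has to match the bit positions.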
Currently each bpf descriptor uses u64 variables to maintain its counters.
On interfaces with a high packet rate this leads to unnecessary contention
and inaccurate reporting.
Sevan Janiyan [Tue, 20 Mar 2018 22:41:26 +0000 (22:41 +0000)]
Extend the description of ALTQ in altq(4) to call it a system which is a
framework, to match altq(9). This makes it easier to preserve the history
section as the author of ALTQ wrote it, rather than calling it a framework
in the description & a system in the history.
Add a history section to altq(4) and extend the history section in altq(9).
Warner Losh [Tue, 20 Mar 2018 22:07:45 +0000 (22:07 +0000)]
Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.
Warner Losh [Tue, 20 Mar 2018 22:01:18 +0000 (22:01 +0000)]
Push down Giant one layer. In the days of yore, back when Pentiums
were the new kids on the block and F00F hacks were all the rage, one
needed to take out Giant to do anything moderately complicated with
the VM, mappings and such. So the pccard / cardbus code held Giant for
the entire insertion or removal process.
Today, the VM is MP safe. The lock is only needed for dealing with
newbus things. Move locking and unlocking Giant to be only around
adding and probing devices in pccard and cardbus.
John Baldwin [Tue, 20 Mar 2018 21:00:45 +0000 (21:00 +0000)]
Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header. The standard header for
userland is <stdarg.h>. Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.
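For reference, a small userland variadic function that relies only on the standard header. The function sum_ints() is a made-up example and does not come from the tree.

```c
#include <assert.h>
#include <stdarg.h>     /* standard header; <machine/stdarg.h> is kernel-only */

/* Sum `count` int arguments that follow the count itself. */
int
sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return (total);
}
```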
Kyle Evans [Tue, 20 Mar 2018 20:05:11 +0000 (20:05 +0000)]
lualoader: Reset attributes and color scheme with color.highlight()
Previously, we sent a CSI 0m sequence to reset attributes, which also reset
the color scheme if the terminal defaults didn't match what we're expecting.
Go all-in and reset the color scheme, too, just in case.
Ed Maste [Tue, 20 Mar 2018 19:28:52 +0000 (19:28 +0000)]
Make linuxulator fn declaration match definition
I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's
declaration (only), while bringing this change over from git and
encountering a conflict.
Disable write protection around patching of XSAVE instruction in the
context switch code.
Some BIOSes give control to the OS with CR0.WP already set, making the
kernel text read-only before cpu_startup().
Reported by: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
This is a pure syntax patch to create an interface to enable and later
restore write access to the kernel text and other read-only mapped
regions. It is in line with e.g. vm_fault_disable_pagefaults() by
allowing the nesting.
Discussed with: Peter Lei <peter.lei@ieee.org>
Reviewed by: jtl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14768
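The nesting property mentioned above can be illustrated with a userland simulation. The sketch assumes an interface in which the disable call returns the previous state and the restore call takes it back; the sim_* names and the wp_bit variable are stand-ins for this example, not the kernel's code, and no real CR0 access is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CR0.WP bit; a real kernel would read and write CR0. */
static bool wp_bit = true;

/* Clear the simulated bit and return its previous state. */
bool
sim_disable_wp(void)
{
    bool old = wp_bit;

    wp_bit = false;
    return (old);
}

/* Restore exactly the state observed by the matching disable call. */
void
sim_restore_wp(bool old)
{
    /* Restoring only makes sense while protection is currently off. */
    assert(!wp_bit);
    wp_bit = old;
}

/* Report whether the simulated write protection is on. */
bool
sim_wp_enabled(void)
{
    return (wp_bit);
}
```

Since each caller restores the value it saw, an inner disable/restore pair leaves protection off for the outer caller, and only the outermost restore turns it back on.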
John Baldwin [Tue, 20 Mar 2018 17:05:23 +0000 (17:05 +0000)]
Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec. In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.
This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.
Reported by: Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by: bz
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14763
Check for wrap-around in vm_phys_alloc_seg_contig().
It is possible to provide insane values for size in a contigmalloc(9)
request, which usually does not reach the phys allocator due to failing
KVA allocation. But with the forthcoming 4/4 i386, where the 32bit
architecture has almost 4G of KVA, contigmalloc(1G) is not unreasonable
outright and KVA might be available sometimes.
Then, the calculation of pa_end could wrap around, depending on the
physical address, and the checks in vm_phys_alloc_seg_contig() would
pass while the iteration in the loop after the 'done' label goes out
of the vm_page_array bounds.
Fix it by detecting the wrap.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14767
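A sketch of the kind of wrap detection described above, using a hypothetical 32-bit physical address type. The names paddr32_t and range_ok() are assumptions for illustration and do not reproduce vm_phys_alloc_seg_contig().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t paddr32_t;     /* hypothetical 32-bit physical address */

/*
 * Return true if [pa_start, pa_start + size) ends at or below `limit`
 * without the end address wrapping around.
 */
bool
range_ok(paddr32_t pa_start, paddr32_t size, paddr32_t limit)
{
    paddr32_t pa_end = pa_start + size;     /* may wrap modulo 2^32 */

    assert(size > 0);
    if (pa_end < pa_start)      /* wrapped: reject before the bounds check */
        return (false);
    return (pa_end <= limit);
}
```

Without the wrap test, a start of 0xC0000000 and a size of 1G would compute an end of 0, which passes any upper-bound comparison.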
Mark Johnston [Tue, 20 Mar 2018 15:51:05 +0000 (15:51 +0000)]
Drop KTR_CONTENTION.
It is incomplete, has not been adopted in the other locking primitives,
and we have other means of measuring lock contention (lock_profiling,
lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and
free up a precious KTR class index.
John Baldwin [Tue, 20 Mar 2018 15:44:17 +0000 (15:44 +0000)]
Add support for MIPS to LLVM's libunwind.
This is originally based on a patch from David Chisnall for soft-float
N64 but has since been updated to support O32, N32, and hard-float ABIs.
The soft-float O32, N32, and N64 support has been committed upstream.
The hard-float changes are still in review upstream.
Enable LLVM_LIBUNWIND on mips when building with a suitable (C++11-capable)
toolchain. This has been tested with external GCC for all ABIs and
O32 and N64 with clang.
Andrew Turner [Tue, 20 Mar 2018 13:35:20 +0000 (13:35 +0000)]
Check if the gettime runtime service is valid.
The U-Boot efi runtime service expects us to set the address map before
calling any runtime services. It will then remap a few functions to their
runtime version. One of these is the gettime function. If we call into
this without having set a runtime map we get a page fault.
Add a check to see if this is valid in efi_init() so we don't try to use
the possibly invalid pointer.
Warner Losh [Tue, 20 Mar 2018 03:37:14 +0000 (03:37 +0000)]
Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
fails badly beyond that due to truncation.
Also, add an assert to make sure we don't send too many DSM TRIM
entries in one request.
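The 2TB boundary follows from the arithmetic: with 512-byte blocks (an assumption of this sketch), 2^32 blocks cover exactly 2 TiB, and a 32-bit conversion keeps only the low 32 bits of the LBA. The helper names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u    /* assumed logical block size */

/* What a 32-bit conversion keeps of a 64-bit LBA: only the low 32 bits. */
uint64_t
lba_after_32bit_cast(uint64_t lba)
{
    return ((uint32_t)lba);
}

/* Byte offset of an LBA, for relating LBAs to device capacity. */
uint64_t
lba_to_bytes(uint64_t lba)
{
    assert(lba <= UINT64_MAX / SECTOR_SIZE);    /* guard the multiply */
    return (lba * SECTOR_SIZE);
}
```

An LBA just past the boundary, such as 0x100000010, is silently turned into 0x10, so the TRIM would land near the start of the device.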
Warner Losh [Tue, 20 Mar 2018 03:37:09 +0000 (03:37 +0000)]
Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is
256, which is the max that will fit into a 4k page.
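The figure of 256 can be checked from the size of one DSM range, which NVMe defines as 16 bytes. The structure and function names below are illustrative and are not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* Layout of one DSM range: 4 + 4 + 8 = 16 bytes. */
struct dsm_range {
    uint32_t attributes;    /* context attributes */
    uint32_t length;        /* length in logical blocks */
    uint64_t starting_lba;  /* first logical block of the range */
};

/* Number of ranges that fit in one 4k page. */
size_t
max_ranges_per_page(void)
{
    assert(sizeof(struct dsm_range) == 16);
    return (PAGE_BYTES / sizeof(struct dsm_range));
}
```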
Warner Losh [Tue, 20 Mar 2018 03:36:51 +0000 (03:36 +0000)]
Note: this isn't a general thing. It only affects u-boot-based arm64
systems. Make sure the note says that specific case only. Also,
provide a recipe to do it.
Justin Hibbits [Tue, 20 Mar 2018 02:01:30 +0000 (02:01 +0000)]
Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a
pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to
intmax_t, then gcc whines when it gets cast to a pointer. Casting through
uintptr_t silences this warning.
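The cast pattern looks roughly like this in isolation. The function names are hypothetical; only the intermediate uintptr_t step reflects the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a pointer into an intmax_t argument slot. */
intmax_t
ptr_to_arg(void *p)
{
    return ((intmax_t)(uintptr_t)p);
}

/*
 * Narrow it back. Going through uintptr_t first makes the width change
 * explicit, so the final cast is between same-sized types.
 */
void *
arg_to_ptr(intmax_t arg2)
{
    /* The value must have originated from a pointer. */
    assert((intmax_t)(uintptr_t)arg2 == arg2);
    return ((void *)(uintptr_t)arg2);
}
```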
Conrad Meyer [Tue, 20 Mar 2018 00:16:24 +0000 (00:16 +0000)]
blacklist: Fix minor memory leak in configuration parsing error case
Ordinarily, the continue clause of the for-loop would free 'line.' In this
case we instead return early, missing the free. Add an explicit free to
avoid the leak.
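The shape of the bug can be reproduced in a few lines. This sketch is not the blacklist parser: the helpers, the '!' error marker, and the allocation counter are invented for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;     /* outstanding buffers, for demonstration only */

static char *
dup_line(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL) {
        strcpy(p, s);
        live_allocs++;
    }
    return (p);
}

static void
free_line(char *p)
{
    if (p != NULL) {
        free(p);
        live_allocs--;
    }
}

/* Returns 0 if every line parses, -1 on a bad line or allocation failure. */
int
parse_lines(const char **lines, size_t n)
{
    char *line = NULL;
    size_t i;

    assert(lines != NULL || n == 0);
    /* The continue clause frees 'line' after every normal iteration. */
    for (i = 0; i < n && (line = dup_line(lines[i])) != NULL;
        free_line(line), i++) {
        if (line[0] == '!') {
            /* An early return skips the continue clause: free here. */
            free_line(line);
            return (-1);
        }
    }
    return (i == n ? 0 : -1);
}

/* Number of buffers that were allocated but not yet freed. */
int
outstanding(void)
{
    return (live_allocs);
}
```

Removing the explicit free_line() call before the early return leaves one buffer outstanding whenever a bad line is seen.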
[ofw] fix erroneous checks for OF_finddevice(9) return value
OF_finddevice returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked the return value to be equal to 0 or
less than or equal to 0, which is also wrong because phandle_t is an
unsigned type. Most of these checks were for negative cases that were never
triggered, so there was no impact on functionality.
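The difference between the two styles of check is easy to demonstrate. The typedef below assumes a 32-bit unsigned phandle_t, and the function and macro names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t phandle_t;     /* assumed: an unsigned 32-bit handle */

#define PHANDLE_INVALID ((phandle_t)-1)

/* The failure value is a large positive number, never a negative one. */
static_assert((phandle_t)-1 > 0, "phandle_t must be unsigned");

/* Wrong: an unsigned value is never negative, so this only catches 0. */
bool
lookup_failed_buggy(phandle_t node)
{
    return (node <= 0);
}

/* Right: compare against the documented failure value. */
bool
lookup_failed(phandle_t node)
{
    return (node == PHANDLE_INVALID);
}
```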
Matt Joras [Mon, 19 Mar 2018 22:43:27 +0000 (22:43 +0000)]
Fix initialization of eventhandler mutex.
mtx_init does not do a copy of the name string it is passed. The
eventhandler code incorrectly passed the parameter string directly to
mtx_init instead of using the copy it makes. This was an existing
problem with the code that I dutifully copied over in my changes in r325621.
Reported by: Anton Rang <rang AT acm.org>
Reviewed by: rstone, markj
Approved by: rstone (mentor)
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14764
Kristof Provost [Mon, 19 Mar 2018 21:13:25 +0000 (21:13 +0000)]
pf: Fix memory leak in DIOCRADDTABLES
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.
Eric Joyner [Mon, 19 Mar 2018 20:55:05 +0000 (20:55 +0000)]
ixgbe(4): Update shared code, add support for X552 1G, fix bug
This patch will:
- Update ixgbe shared code
- Add support for Intel(R) Ethernet Connection X552 1000BASE-T
- Add error handling for link state check preventing VF from stopping traffic
after changing PF's MTU value
John Baldwin [Mon, 19 Mar 2018 19:09:15 +0000 (19:09 +0000)]
Revert r318180 and re-enable AIO tests on md(4) by default.
The 'physio' fast-path used by AIO requests on md(4) devices is not
gated on the unsafe_aio knob. Prior to r327755, some AIO requests could
fail the fast-path and fall back to the slow-path (requests for devices
not supporting unmapped I/O and requests which failed with EFAULT during
the fast-path). However, those cases now return a suitable error rather
than using the slow-path.
Lawrence Stewart [Mon, 19 Mar 2018 16:37:47 +0000 (16:37 +0000)]
Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.
More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn
The implementation introduces four new sysctls:
- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
enable ABE for ECN-enabled TCP connections.
- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
multiplicative window decrease factor, specified as a percentage, applied to
the congestion window in response to a loss-based or ECN-based congestion
signal respectively. They default to the values specified in the draft i.e.
beta=50 and beta_ecn=80.
- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
non-zero to enable the use of standard beta (50% by default) when repairing
loss during an ECN-signalled congestion recovery episode. It enables a more
conservative congestion response and is provided for the purposes of
experimentation as a result of some discussion at IETF 100 in Singapore.
The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.
Submitted by: Tom Jones <tj@enoti.me>
Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D11616
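As a worked example of the multiplicative decrease described above, using the default percentages from the commit message. This is a simplified sketch: the function names are hypothetical, and the real module applies further rules (such as minimum window sizes) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BETA_LOSS 50u   /* default newreno.beta, percent */
#define BETA_ECN  80u   /* default newreno.beta_ecn, percent */

/* Scale a congestion window by a percentage. */
uint32_t
scale_cwnd(uint32_t cwnd, uint32_t beta_pct)
{
    assert(beta_pct <= 100);
    /* A 64-bit intermediate avoids overflow for large windows. */
    return ((uint32_t)(((uint64_t)cwnd * beta_pct) / 100));
}

/* New window after a congestion signal, with or without ABE enabled. */
uint32_t
cwnd_after_signal(uint32_t cwnd, bool ecn_signal, bool abe_enabled)
{
    uint32_t beta = (ecn_signal && abe_enabled) ? BETA_ECN : BETA_LOSS;

    return (scale_cwnd(cwnd, beta));
}
```

With ABE enabled, a 10000-byte window shrinks to 8000 bytes on an ECN signal but to 5000 bytes on loss; with ABE disabled both cases yield 5000 bytes.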
Kyle Evans [Mon, 19 Mar 2018 16:16:12 +0000 (16:16 +0000)]
Move /boot/overlays to /boot/dtb/overlays
The former is fairly vague; these are FDT overlays to be applied to the
running system, so /boot/dtb is a sensible location to put it without
cluttering up /boot/dtb even further if desired.
Kyle Evans [Mon, 19 Mar 2018 15:48:31 +0000 (15:48 +0000)]
lualoader: Setup default color scheme if we're using colors
The console may have been set for different colors before lualoader kicks
in; notably, a black-on-white color scheme is not necessarily what we're
expecting.
While here, make color.default() a composition of color.escape() instead of
rewriting the escape sequence to make it more obvious what it's achieving: a
white-on-black color scheme with no attributes set.
Kyle Evans [Mon, 19 Mar 2018 15:27:53 +0000 (15:27 +0000)]
Add note to UPDATING about UEFI changes requiring loader(8) update
These problems have only been observed with boards using U-Boot (e.g. ARM)
where virtual addresses are already set in the memory map by the firmware
and the firmware is expecting a call to SetVirtualAddressMap to be made.
I refrain from mentioning this in the note because this could also be the
case on some not-yet-tested firmware on amd64 and it's not a bad
recommendation for the general case.
Ed Maste [Mon, 19 Mar 2018 15:11:10 +0000 (15:11 +0000)]
linux*_sysvec.c: rationalize whitespace and comments
There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.
Warner Losh [Sun, 18 Mar 2018 18:50:48 +0000 (18:50 +0000)]
Don't add links or cleanfiles for NO_OBJ case, in addition to not
creating them. Move them under the if after the all: target. They are
just defines, so it doesn't really matter where we have them.
Ian Lepore [Sun, 18 Mar 2018 18:37:47 +0000 (18:37 +0000)]
Add support for 4K and 32K erase block sizes. Many of the supported chips
have these flags set in the ident table, but there was no code to support
using the smaller erase sizes.
Ian Lepore [Sun, 18 Mar 2018 17:47:57 +0000 (17:47 +0000)]
Make all internal routines return an int error status, and check the
status at all call points. Combine the get_status and wait_for_ready
routines, since waiting for ready is the only reason to ever get status.
Ian Lepore [Sun, 18 Mar 2018 17:25:23 +0000 (17:25 +0000)]
Add sc_parent to the softc and use it in place of device_get_parent() calls
all over the place. Also pass the softc as the arg to all the internal
functions instead of passing a device_t and calling device_get_softc() in
each function.