yuripv [Sat, 27 Oct 2018 21:24:28 +0000 (21:24 +0000)]
Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).
wulf [Sat, 27 Oct 2018 20:22:41 +0000 (20:22 +0000)]
evdev: Use console lock as evdev lock for all supported keyboard drivers.
Now evdev part of keyboard drivers does not take any locks if corresponding
input/eventN device node is not opened by userland consumers.
Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
alc [Sat, 27 Oct 2018 17:49:46 +0000 (17:49 +0000)]
Eliminate typically pointless calls to vm_fault_prefault() on soft, copy-
on-write faults. On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighoring
pages. For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the the case of soft, copy-on-write faults, the effort very
rarely pays off. (See the review for some specific data.)
eugen [Sat, 27 Oct 2018 17:21:13 +0000 (17:21 +0000)]
rcorder(8): add support for /etc/rc.resume, so it calls "rcorder -k resume"
and runs scripts containing "KEYWORD: resume" with single "resume" argument.
Working example is the port sysutils/cpupdate that defines
extra_commands="resume" to reload CPU microcode cleared
by suspend/resume sequence.
This change does nothing for a system having no scripts with KEYWORD: resume.
eugen [Sat, 27 Oct 2018 16:41:34 +0000 (16:41 +0000)]
mount_msdosfs: do not fail mounts requiring locale name conversion table
that is already present in a kernel statically.
For example, the command "mount_msdosfs -L ru_RU.KOI8-R" fails with error
"mount_msdosfs: msdosfs_iconv: File exists" for a kernel having
options LIBICONV and MSDOSFS_ICONV. After this change, it mounts successfully.
eugen [Sat, 27 Oct 2018 16:14:42 +0000 (16:14 +0000)]
Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.
This change does not affect ABI but changes KBI. No MFC planned.
cem [Sat, 27 Oct 2018 15:09:35 +0000 (15:09 +0000)]
random(4): Match enabled sources mask to build options
r287023 and r334450 added build option mechanisms to permanently disable
spammy and/or low quality entropy sources.
Follow-up those changes by updating the 'enabled' sources mask to match.
When sources are compile-time disabled, represent them as disabled in the
source mask, and prevent users from modifying that, like pure sources.
(Modifying the mask bit would have no effect, but users might think it did
if it was not prevented.)
eugen [Sat, 27 Oct 2018 07:32:26 +0000 (07:32 +0000)]
ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".
Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:
ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0
np [Sat, 27 Oct 2018 05:26:09 +0000 (05:26 +0000)]
cxgbetool(8): Add a subaction (tcbrss <n>) that can be used with "pass"
action to distribute traffic using the half of the VI's RSS indirection
table.
The value specified should either be the start of the VI's RSS slice
(available at dev.<ifname>.<inst>.rss_base since r339700) or the
midpoint (rss_base + rss_size/2). The traffic that hits the filter will
use the first or second half of the indirection table respectively.
The indirection table can be populated in different ways to achieve
different kinds of traffic/load distributions. For example, r339749
allows a netmap interface to have half the rx queues in the first half
of the table and the rest in the other.
kevans [Sat, 27 Oct 2018 04:10:42 +0000 (04:10 +0000)]
lualoader: Always return a proper dictionary for blacklist
If module_blacklist isn't specified, we have an empty blacklist; effectively
the same as if module_blacklist="" were specified in loader.conf(5).
This was reported when switching to a BE that predated the module_blacklist
introduction, but the problem is valid all the same and likely to be tripped
over in other scenarios.
delphij [Sat, 27 Oct 2018 03:37:14 +0000 (03:37 +0000)]
Restore backward compatibility for "attach" verb.
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.
Restore backward compatibility by treating the absense of these two
values as seeing the default value supplied by userland.
imp [Fri, 26 Oct 2018 23:44:50 +0000 (23:44 +0000)]
Fix pointer arithmetic
Pointer math to find the size in bytes only works with char types.
Use correct pointer math to determine if we have enough of a header to
look at or not.
imp [Fri, 26 Oct 2018 23:08:22 +0000 (23:08 +0000)]
Ensure we have a full EFI_DEVICE_PATH header before we try to look at
its length. Some BIOSes pad the length of the device path to an even
amount. When we had a device path that was somehow an odd length, we'd
wind up having 1 byte left that we were bogusly interpreting as a full
device path. We'd then dereference 2 bytes into that to get a length
of the node, which had undefined (and quite undesired) effects.
will take the value from Boot0001.bin file and then decode it as if it
were a load-option. This is useful for debugging handling of such
variables that may be hanging the boot for some people.
markj [Fri, 26 Oct 2018 21:17:06 +0000 (21:17 +0000)]
Don't set NFSv4 ACL inheritance flags on non-directories.
They only make sense in the context of directory ACLs, and attempting
to set them on regular files results in errors, causing a recursive
setfacl invocation to abort.
This is derived from patches by Shawn Webb <shawn.webb@hardenedbsd.org>
and Mitchell Horne <mhorne063@gmail.com>.
PR: 155163
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15061
cem [Fri, 26 Oct 2018 21:03:57 +0000 (21:03 +0000)]
Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of). See the Differential URL for more detail.
The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).
cem [Fri, 26 Oct 2018 20:55:01 +0000 (20:55 +0000)]
Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'. Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.
I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.
cem [Fri, 26 Oct 2018 20:53:01 +0000 (20:53 +0000)]
rijndael (AES): Avoid leaking sensitive data on kernel stack
Noticed this investigating Fortuna. Remove useless duplicate stack copies
of sensitive contents when possible, or if not possible, be sure to zero
them out when we're finished.
cem [Fri, 26 Oct 2018 20:07:46 +0000 (20:07 +0000)]
poll: Unify userspace pollfd pointer name
Some of the poll code used 'fds' and some used 'ufds' to refer to the
uap->fds userspace pointer that was passed around to subroutines. Some of
the poll code used 'fds' to refer to the kernel memory pollfd arrays, which
seemed unnecessarily confusing.
Unify on 'ufds' to refer to the userspace pollfd array.
Additionally, 'bits' is not an accurate description of the kernel pollfd
array in kern_poll, so rename that to 'kfds'. Finally, clean up some logic
with mallocarray() and nitems().
cem [Fri, 26 Oct 2018 19:53:59 +0000 (19:53 +0000)]
dumpon(8): Provide seatbelt against weak RSA keys
The premise of dumpon -k foo.pem is that dump contents will be confidential
except to anyone holding the corresponding RSA private key.
This guarantee breaks down when weak RSA keys are used. Small RSA keys
(e.g. 512 bits) can be broken on a single personal computer in tractible
time. Marginal RSA keys (768 bits) can be broken by EC2 and a few dollars.
Even 1024 bit keys can probably be broken by sophisticated and wealthy
attackers.
NIST SP800-57 (2016) recommends a minimum of 2048 bit RSA keys, and
estimates this provides 112 bits of security.
It would also be good to protect users from weak values of 'e' (i.e., 3) and
perhaps sanity check that their public key .pem does not accidentally
contain their private key as well. These considerations are left as future
work.
Reviewed by: markj, darius AT dons.net.au (previous version)
Discussed with: bjk
Differential Revision: https://reviews.freebsd.org/D17678
dteske [Fri, 26 Oct 2018 19:16:17 +0000 (19:16 +0000)]
Add blank line after each item in "ngctl ls -l"
The output of "ngctl ls -l" is hard to read. To make it easier, add a blank
line after each listed item much how traditional "ls -l" does when listing
the contents of multiple directories.
brooks [Fri, 26 Oct 2018 17:59:25 +0000 (17:59 +0000)]
Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.
The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.
Unlike r339174 this change supports both places FIODGNAME is handled.
imp [Fri, 26 Oct 2018 16:03:30 +0000 (16:03 +0000)]
Redo r339563: Remove joy(4) driver.
This driver was marked as gone in 12. We're at 13 now. Remove it.
Data from nycbug's dmesg cache shows only one potential user,
suggesting it never was used much. However, even though this device
has been obsolete for 15 years at least, sys/joystick.h is included in
a number of graphics packages still, so that remains. A full exprun
is needed before that can be removed.
imp [Fri, 26 Oct 2018 14:27:37 +0000 (14:27 +0000)]
Put a workaround in for command timeout malfunctioning
At least one NVMe drive has a bug that makeing the Command Time Out
PCIe feature unreliable. The workaround is to disable this
feature. The driver wouldn't deal correctly with a timeout anyway.
Only do this for drives that are known bad.
br [Fri, 26 Oct 2018 12:27:07 +0000 (12:27 +0000)]
o Add pmap lock around pmap_fault_fixup() to ensure other thread will not
modify l3 pte after we loaded old value and before we stored new value.
o Preset A(accessed), D(dirty) bits for kernel mappings.
imp [Fri, 26 Oct 2018 04:10:32 +0000 (04:10 +0000)]
Revert r339563.
I held the mistaken belief this was completely unused. While the
driver is unused and likely not relevant for a long time,
sys/joystick.h lives on in maybe half a dozen ports, even though
hardware to use it hasn't been widely used in maybe 15 years.
np [Thu, 25 Oct 2018 22:55:18 +0000 (22:55 +0000)]
cxgbe(4): Add a knob to split the rx queues for a netmap enabled
interface into two groups. Filters can be used to match traffic
and distribute it across a group.
kib [Thu, 25 Oct 2018 22:16:34 +0000 (22:16 +0000)]
Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory. Supposedly the interface is similar
to the same proposed Linux flags.
Reviewed by: jilles (code, previous version of manpages), 0mp (manpages)
Discussed with: allanjude, emaste, jonathan
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17547
mm [Thu, 25 Oct 2018 21:44:17 +0000 (21:44 +0000)]
MFV r339640,339641,339644:
Sync libarchive with vendor
Relevant vendor changes:
PR #1013: Add missing h_base offset when performing absolute seeks in
xar decompression
PR #1061: Add support for extraction of RAR v5 archives
PR #1066: Fix out of bounds read on empty string filename for gnutar, pax
and v7tar
PR #1067: Fix temporary file path buffer overflow in tests
IS #1068: Correctly process and verify integer arguments passed to
bsdcpio and bsdtar
PR #1070: Don't default XAR entry atime/mtime to the current time
andrew [Thu, 25 Oct 2018 17:39:41 +0000 (17:39 +0000)]
Implement a BSD licensed crtbegin/crtend
These are needed for .ctors/.dtors and .jcr handling. The former needs
all the function pointers to be called in the correct order from the
.init/.fini section. The latter just needs to call a gcj specific function
if it exists with a pointer to the start of the .jcr section.
This is currently disabled until __dso_handle support is added.
imp [Thu, 25 Oct 2018 17:17:11 +0000 (17:17 +0000)]
Update comment for AMI00[12]0 override.
The AML is even stupider than always returning 0. It will only return
non-zero for an OS that reports itself as "Windows 2015", at least
on the Threadripper board's AML that I've examined.
Those AMLs also suggest we may need this quirk for AMI0030 as well.
There may be other cases where we need to override the _STA in a
generic way, so we should consider writing code to do that.
0mp [Thu, 25 Oct 2018 15:41:19 +0000 (15:41 +0000)]
efivar(3): Fix some typos and improve style
- Fix some typos.
- Remove redundant semicolons from the synopsis section.
- Stylize variable names and types with Vt and Va respectively.
- Use a list to present non-implemented functions.
- Sort the order of the sections.
- Add a history section.
- Use Nm when "libefivar" is mentioned.
np [Thu, 25 Oct 2018 14:37:26 +0000 (14:37 +0000)]
cxgbe(4): Allow "pass" filters to distribute matching traffic using a
subset of a VI's RSS indirection table.
This makes it possible to make groups out of rx queues and steer
different kinds of traffic to different groups. For example, an
interface with 8 rx queues could have all non-TCP traffic delivered to
queues 0-3 and all TCP traffic to queues 4-7.
Note that it is already possible for filters to steer traffic to a
particular queue or to distribute it using the full indirection table
(much like normal rx does).
brooks [Thu, 25 Oct 2018 04:10:41 +0000 (04:10 +0000)]
Deprecate a number of less used 10 and 10/100 Ethernet devices.
The current deprecated list is: ae, bm, cs, de, dme, ed, ep, ex, fe,
pcn, sf, sn, tl, tx, txp, vx, wb, xe
The list as refined as part of FCP-0101. Per the FCP, devices may be
removed from the deprecation list if enough users are found or they are
converted to iflib.
kevans [Thu, 25 Oct 2018 02:14:35 +0000 (02:14 +0000)]
lualoader: Improve module loading diagnostics
Some fixes:
- Maintain historical behavior more accurately w.r.t verbose_loading;
verbose_loading strictly prints "${module_name...}" and later "failed!"
or "ok" based on load success
- With or without verbose_loading, dump command_errbuf on load failure.
This usually happens prior to ok/failed if we're verbose_loading
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17694
kevans [Thu, 25 Oct 2018 02:04:01 +0000 (02:04 +0000)]
Update lualoader test script a little bit
Use userboot.so from the test directory if possible, fall back to .OBJDIR.
This avoids a problem that we've had since userboot coexistence was added,
where userboot.so alone no longer exists in the .OBJDIR but is instead just
a link installed later.
markj [Wed, 24 Oct 2018 16:41:47 +0000 (16:41 +0000)]
Use a vm_domainset iterator in keg_fetch_slab().
Previously, it used a hand-rolled round-robin iterator. This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.
Discussed with: jeff
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17420