Mark Johnston [Tue, 30 Oct 2018 17:57:40 +0000 (17:57 +0000)]
Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
first allocation attempt, not after. Fix this by switching
uma_prealloc() to use a vm_domainset iterator, which addresses the
secondary issue of using a signed domain identifier in round-robin
iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
excluding empty domains; otherwise we may frequently specify an empty
domain when calling in to the page allocator, wasting CPU time.
Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
total page counts for now: some vm_phys segments are created before
the SRAT is parsed and are thus always identified as being in domain 0
even when they are not. Then, when bootstrap pages are freed, they
are added to a domain that we had previously thought was empty. Until
this is corrected, we simply exclude them from the per-domain page
count.
Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17704
Bjoern A. Zeeb [Tue, 30 Oct 2018 15:46:30 +0000 (15:46 +0000)]
Introduce an EXPERIMENTAL option for both src.conf(5) and the kernel.
In the last decade(s) we have seen both short term or long term projects
committed to the tree which were considered or even marked "experimental".
While out-of-tree development has become easier than it used to be in
CVS times, there still is a need to have the code shipping with HEAD but
not enabled by default.
While people may think about VIMAGE as one of the recent larger, long term
projects, early protocol implementations (before they are standardised)
are others. (Free)BSD historically was one of the operating systems
which would have running code at early stages and help develop and
influence standardisation and the industry.
Give developers an opportunity to be more pro-active for early adoption
or running large scale code changes stumbling over each others but not
the user's feet. I have not added the option to NOTES in order to avoid
breaking supported option builds, which require constant compile testing.
Remove useless call to access(2) from tzcode. Quoting OpenBSD:
> Remove doaccess variable and access(2) call since this interfers with
> applications like zdump(8) because pledge(2) doesn't allow access(2) to
> /usr/share/zoneinfo.
>
> millert@ better described why this call can go away:
>
> "This looks like an attempt to do access checks based on the real uid instead
> of the effective uid. Basically for setuid programs we don't want to allow a
> user to set TZ to a path they should not be able to otherwise access.
>
> However, we already have a check for issetugid() above so I think the doaccess
> bits can just be removed and we can rely on open()."
>
> After discussion with tb@, deraadt@ and millert@, this was also OK'ed by them
Eric van Gyzen [Tue, 30 Oct 2018 14:54:15 +0000 (14:54 +0000)]
Always stop the scheduler when entering kdb
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.
Marcelo Araujo [Tue, 30 Oct 2018 09:53:57 +0000 (09:53 +0000)]
Allow changing lagg(4) MTU.
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.
If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Reviewed by: mav
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D17576
Andrew Turner [Tue, 30 Oct 2018 09:43:26 +0000 (09:43 +0000)]
Run the csu tests on a DSO. This builds the tests into a shared library,
then runs these from the base test programs. With this we can check
crtbeginS.o and crtendS.o are working as expected.
Yuri Pankov [Tue, 30 Oct 2018 02:37:23 +0000 (02:37 +0000)]
Connect libc/tests/time to the build, adding test cases for strptime()
issues fixed recently, and disabling the failing ones (mostly due to TZ
parsing differences with NetBSD).
Justin Hibbits [Tue, 30 Oct 2018 00:47:40 +0000 (00:47 +0000)]
powerpc/mpc85xx: Reset the PCIe bus on attach
It seems if a Radeon card is already initialized by u-boot, it won't be
reinitialized by the kernel, and the DRM module will fail to attach. This
steals the reset code from mips/octopci.c to blindly reset the bus on attach.
This was tested on a AmigaOne X5000/20, such that it can be booted from the
local video console, and get a video console in FreeBSD.
John Baldwin [Tue, 30 Oct 2018 00:23:37 +0000 (00:23 +0000)]
Permit local kernel modules to be built as part of a kernel build.
Add support for "local" modules. By default, these modules are
located in LOCALBASE/sys/modules (where LOCALBASE defaults to
/usr/local). Individual modules can be built along with a kernel by
defining LOCAL_MODULES to the list of modules. Each is assumed to be
a subdirectory containing a valid Makefile. If LOCAL_MODULES is not
specified, all of the modules present in LOCALBASE/sys/modules are
built and installed along with the kernel.
This means that a port that installs a kernel module can choose to
install its source along with a suitable Makefile to
/usr/local/sys/modules/<foo>. Future kernel builds will then include
that kernel module using the kernel configuration's opt_*.h headers
and install it into /boot/kernel along with other kernel-specific
modules.
This is not trying to solve the issue of folks running GENERIC release
kernels, but is instead aimed at folks who build their own kernels.
For those folks this ensures that kernel modules from ports will
always be using the right KBI, etc. This includes folks running any
KBI-breaking kernel configs (such as PAE).
There are still some kinks to be worked out with cross-building (we
probably shouldn't include local modules in cross-built kernels by
default), but this is a sufficient starting point.
John Baldwin [Tue, 30 Oct 2018 00:19:44 +0000 (00:19 +0000)]
Make battery emptying rate available as sysctl variable.
Curiously, the in-kernel routines always use the design voltage to
convert from mA to mW, but acpiconf in userland uses the current
voltage. As a result, this can report a different mW rate than
acpiconf.
Alex Richardson [Mon, 29 Oct 2018 21:08:34 +0000 (21:08 +0000)]
Fix get_maxfds() in jevents
If RLIM_INFINITY == -1ULL (such as on macOS) the min() call will result
in a value of less than 1 being returned. This causes nftw() to fail
with EINVAL.
While touching this file also fix includes to work on Linux/macOS and don't
declare snprintf since it may have different attributes in the system
headers there.
Alex Richardson [Mon, 29 Oct 2018 21:08:02 +0000 (21:08 +0000)]
rtld: set obj->textsize correctly
With lld-generated binaries the first PT_LOAD will usually be a read-only
segment unless you pass --no-rosegment. For those binaries the textsize is
determined by the next PT_LOAD. To allow both LLD and bfd 2.17 binaries to
be parsed correctly use the end of the last PT_LOAD that is marked as
executable instead.
I noticed that the value was wrong while adding some debug prints for some rtld
changes for CHERI binaries. `obj->textsize` only seems to be used by PPC so the
effect is untested. However, the value before was definitely wrong and the new
result matches the phdrs.
Glen Barber [Mon, 29 Oct 2018 20:53:05 +0000 (20:53 +0000)]
Set OPTIONS_UNSET in the argument list to env(1), and add
AVAHI to the list. This fixes the textproc/docproj build
seemingly following FLAVORS being added.
Specifically, the problem with the dependency chain here is:
- textproc/docproj depends on print/cups, which sets AVAHI=on
by default;
- net/avahi-app depends on devel/gobject-introspection, which
requires python3+;
- graphics/netpbm depends on graphics/mesa-libs, which can
only be built with python2.7;
- textproc/docproj depends on a number of graphics ports for
font rendering, etc.
MFC after: 3 days
MFC before: 12.0-BETA3
Sponsored by: The FreeBSD Foundation
Stephen Hurd [Mon, 29 Oct 2018 14:36:03 +0000 (14:36 +0000)]
Drain grouptaskqueue of the gtask before detaching it.
taskqgroup_detach() would remove the task even if it was running or
enqueued, which could lead to panics (see D17404). With this change,
taskqgroup_detach() drains the task and sets a new flag which prevents the
task from being scheduled again.
I've added grouptask_block() and grouptask_unblock() to allow control
over the flag from other locations as well.
Kyle Evans [Mon, 29 Oct 2018 02:58:30 +0000 (02:58 +0000)]
lualoader: Fix try_include error handling
The previous iteration of try_include attempted to be 'friendly' and error()
out if we hit an error that wasn't ENOENT. This was semi-OK, but fragile as
it relied on pattern matching the error message.
Move the responsibility for handling failure to the caller. Following
a common lua pattern, we'll return the return value of the underlying
require() on success, or false and an error message.
Devin Teske [Sun, 28 Oct 2018 19:29:07 +0000 (19:29 +0000)]
Fix dialog autosizing to accomodate for hline
dialog will conditionally ignore the --hline option if not enough space
was available to accomodate for the text width. Traditionally the width
of the widget had to be 10 wider than the text. Recent updates to dialog
have changed the requirement to be at least 12 wider than the hline text
else the hline text is not rendered at the bottom of the widget.
Devin Teske [Sun, 28 Oct 2018 18:32:47 +0000 (18:32 +0000)]
Fix jail examples in jib, jng, README
The provided example jail configs do not work for multiple interfaces.
Multiple interfaces need to be specified as a comma separated list or
using multiple += lines in jail.conf. In the given example, a space-
separated string is used, which doesn't work with multiple interfaces.
Also added a note to the README about VIMAGE being built-in by default
on amd64 in FreeBSD 12, with appropriate instructions for loading the
necessary netgraph ether module (ng_ether) since it is neither built-
in nor autoloads.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Reported by: Ryan Moeller <ryan@freqlabs.com>
MFC after: 3 days
Sponsored by: Smule, Inc.
Differential Revision: https://reviews.freebsd.org/D17697
Warner Losh [Sun, 28 Oct 2018 02:58:22 +0000 (02:58 +0000)]
Note that the kenrel doesn't keep track daylight savings time, nor
timezone offset. These values are generally zero.
While one still theoreticall could set these values, that's almost
never done. Users wishing to have an offset between the time of day
clock hardware and UTC use adjkerntz(8) instead.
localtime(3) should be used to find these values for the current
timezone.
Yuri Pankov [Sat, 27 Oct 2018 23:31:42 +0000 (23:31 +0000)]
localedef: define characters in "space" class also as "print", except
for the known conflicts ("control" characters can't be "print"able).
POSIX doesn't explicitly forbid this, and actually includes <space>
character in "print".
Yuri Pankov [Sat, 27 Oct 2018 21:24:28 +0000 (21:24 +0000)]
Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).
evdev: Use console lock as evdev lock for all supported keyboard drivers.
Now evdev part of keyboard drivers does not take any locks if corresponding
input/eventN device node is not opened by userland consumers.
Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
Alan Cox [Sat, 27 Oct 2018 17:49:46 +0000 (17:49 +0000)]
Eliminate typically pointless calls to vm_fault_prefault() on soft, copy-
on-write faults. On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighoring
pages. For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the the case of soft, copy-on-write faults, the effort very
rarely pays off. (See the review for some specific data.)
Eugene Grosbein [Sat, 27 Oct 2018 17:21:13 +0000 (17:21 +0000)]
rcorder(8): add support for /etc/rc.resume, so it calls "rcorder -k resume"
and runs scripts containing "KEYWORD: resume" with single "resume" argument.
Working example is the port sysutils/cpupdate that defines
extra_commands="resume" to reload CPU microcode cleared
by suspend/resume sequence.
This change does nothing for a system having no scripts with KEYWORD: resume.
Eugene Grosbein [Sat, 27 Oct 2018 16:41:34 +0000 (16:41 +0000)]
mount_msdosfs: do not fail mounts requiring locale name conversion table
that is already present in a kernel statically.
For example, the command "mount_msdosfs -L ru_RU.KOI8-R" fails with error
"mount_msdosfs: msdosfs_iconv: File exists" for a kernel having
options LIBICONV and MSDOSFS_ICONV. After this change, it mounts successfully.
Eugene Grosbein [Sat, 27 Oct 2018 16:14:42 +0000 (16:14 +0000)]
Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.
This change does not affect ABI but changes KBI. No MFC planned.
Conrad Meyer [Sat, 27 Oct 2018 15:09:35 +0000 (15:09 +0000)]
random(4): Match enabled sources mask to build options
r287023 and r334450 added build option mechanisms to permanently disable
spammy and/or low quality entropy sources.
Follow-up those changes by updating the 'enabled' sources mask to match.
When sources are compile-time disabled, represent them as disabled in the
source mask, and prevent users from modifying that, like pure sources.
(Modifying the mask bit would have no effect, but users might think it did
if it was not prevented.)
Eugene Grosbein [Sat, 27 Oct 2018 07:32:26 +0000 (07:32 +0000)]
ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".
Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:
ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0
Navdeep Parhar [Sat, 27 Oct 2018 05:26:09 +0000 (05:26 +0000)]
cxgbetool(8): Add a subaction (tcbrss <n>) that can be used with "pass"
action to distribute traffic using the half of the VI's RSS indirection
table.
The value specified should either be the start of the VI's RSS slice
(available at dev.<ifname>.<inst>.rss_base since r339700) or the
midpoint (rss_base + rss_size/2). The traffic that hits the filter will
use the first or second half of the indirection table respectively.
The indirection table can be populated in different ways to achieve
different kinds of traffic/load distributions. For example, r339749
allows a netmap interface to have half the rx queues in the first half
of the table and the rest in the other.
Kyle Evans [Sat, 27 Oct 2018 04:10:42 +0000 (04:10 +0000)]
lualoader: Always return a proper dictionary for blacklist
If module_blacklist isn't specified, we have an empty blacklist; effectively
the same as if module_blacklist="" were specified in loader.conf(5).
This was reported when switching to a BE that predated the module_blacklist
introduction, but the problem is valid all the same and likely to be tripped
over in other scenarios.
Xin LI [Sat, 27 Oct 2018 03:37:14 +0000 (03:37 +0000)]
Restore backward compatibility for "attach" verb.
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.
Restore backward compatibility by treating the absense of these two
values as seeing the default value supplied by userland.
Warner Losh [Fri, 26 Oct 2018 23:44:50 +0000 (23:44 +0000)]
Fix pointer arithmetic
Pointer math to find the size in bytes only works with char types.
Use correct pointer math to determine if we have enough of a header to
look at or not.
Warner Losh [Fri, 26 Oct 2018 23:08:22 +0000 (23:08 +0000)]
Ensure we have a full EFI_DEVICE_PATH header before we try to look at
its length. Some BIOSes pad the length of the device path to an even
amount. When we had a device path that was somehow an odd length, we'd
wind up having 1 byte left that we were bogusly interpreting as a full
device path. We'd then dereference 2 bytes into that to get a length
of the node, which had undefined (and quite undesired) effects.
will take the value from Boot0001.bin file and then decode it as if it
were a load-option. This is useful for debugging handling of such
variables that may be hanging the boot for some people.
Mark Johnston [Fri, 26 Oct 2018 21:17:06 +0000 (21:17 +0000)]
Don't set NFSv4 ACL inheritance flags on non-directories.
They only make sense in the context of directory ACLs, and attempting
to set them on regular files results in errors, causing a recursive
setfacl invocation to abort.
This is derived from patches by Shawn Webb <shawn.webb@hardenedbsd.org>
and Mitchell Horne <mhorne063@gmail.com>.
PR: 155163
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15061
Conrad Meyer [Fri, 26 Oct 2018 21:03:57 +0000 (21:03 +0000)]
Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of). See the Differential URL for more detail.
The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).
Conrad Meyer [Fri, 26 Oct 2018 20:55:01 +0000 (20:55 +0000)]
Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'. Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.
I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.