kevans [Fri, 24 Jan 2020 16:43:02 +0000 (16:43 +0000)]
caroot: use bsd.obj.mk, not bsd.prog.mk
This directory stages certdata into .OBJDIR and processes it, but does not
actually build a prog-shaped object; bsd.obj.mk provides the minimal support
that we actually need, an .OBJDIR and descent into subdirs. This is
admittedly the nittiest of nits.
emaste [Fri, 24 Jan 2020 14:41:51 +0000 (14:41 +0000)]
Tag NLS aliases with pkgbase package
POSIX and en_US.US_ASCII are aliases (symlinks) to the C locale. They were
not previously tagged with a pkgbase package. Add the tag so that they are
handled correctly on pkgbase-installed/updated systems.
[This is r356990 reapplied with a corrected commit message.]
dougm [Fri, 24 Jan 2020 07:48:11 +0000 (07:48 +0000)]
Most uses of vm_map_clip_start follow a call to vm_map_lookup. Define
an inline function vm_map_lookup_clip_start that invokes them both and
use it in places that invoke both. Drop a couple of local variables
made unnecessary by this function.
mjg [Fri, 24 Jan 2020 07:47:44 +0000 (07:47 +0000)]
vfs: allow v_usecount to transition 0->1 without the interlock
There is nothing to do but to bump the count even during said transition.
There are 2 places which can do it:
- vget only does this after locking the vnode, meaning there is no change in
contract versus inactive or reclamation
- vref only ever did it with the interlock held which did not protect against
either (that is, it would always succeed)
VCHR vnodes retain special casing due to the need to maintain dev use count.
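A minimal userland sketch of the idea, using C11 atomics rather than the kernel's own primitives. The names toy_vnode, toy_vref and toy_vrele are hypothetical and this is not the vfs code; it only shows a use count whose 0->1 transition is an ordinary atomic increment with no lock taken.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a vnode; only the use count is modelled. */
struct toy_vnode {
	atomic_uint usecount;
};

/*
 * Bump the use count with a single atomic add.  The 0->1 transition is
 * not special-cased and takes no lock.  Returns the new count.
 */
static unsigned int
toy_vref(struct toy_vnode *vp)
{
	return (atomic_fetch_add(&vp->usecount, 1) + 1);
}

/* Drop one use; returns the new count. */
static unsigned int
toy_vrele(struct toy_vnode *vp)
{
	unsigned int old;

	old = atomic_fetch_sub(&vp->usecount, 1);
	assert(old > 0);	/* dropping a use that was never taken is a bug */
	return (old - 1);
}
```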
mjg [Fri, 24 Jan 2020 07:45:59 +0000 (07:45 +0000)]
vfs: stop handling VI_OWEINACT in vget
vget is almost always called with LK_SHARED, meaning the flag (if present) is
almost guaranteed to get cleared. Stop handling it in the first place and
instead let the thread which wanted to do inactive handle the bumped usecount.
Reviewed by: jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23184
mjg [Fri, 24 Jan 2020 07:44:25 +0000 (07:44 +0000)]
vfs: stop unlocking the vnode upfront in vput
Doing so runs into races with filesystems which make half-constructed vnodes
visible to other users, while depending on the chain vput -> vinactive ->
vrecycle to be executed without dropping the vnode lock.
Impediments for making this work got cleared up (notably vop_unlock_post now
does not do anything and lockmgr stops touching the lock after the final
write). Stacked filesystems keep vhold/vdrop across unlock, which arguably can
now be eliminated.
Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23344
kevans [Fri, 24 Jan 2020 02:18:09 +0000 (02:18 +0000)]
Drop "All Rights Reserved" from all libbe/bectl files
I sent out an e-mail on 2020/01/21 with a plan to do this to Kyle, Rob, and
Wes; all parties have responded in the affirmative that it's OK to drop it
from these files.
cem [Fri, 24 Jan 2020 01:39:29 +0000 (01:39 +0000)]
random(3): Abstract routines into _r versions on explicit state
The existing APIs simply pass the implicit global state to the _r variants.
No functional change.
Note that these routines are not exported from libc and are not intended to be
exported. If someone wished to export them from libc (which I would
discourage), they should first be modified to match the inconsistent parameter
type / order of the glibc public interfaces of the same names.
I know Ravi will ask, so: the eventual goal of this series is to replace
rand(3) with the implementation from random(3) (D23290). However, I'd like to
wait a bit longer on that one to see if more feedback emerges.
cem [Thu, 23 Jan 2020 23:52:57 +0000 (23:52 +0000)]
cpufreq(4): Fix missing MODULE_DEPEND on hwpstate_intel
DRIVER_MODULE does not actually define a MODULE_VERSION, which is required
to satisfy a MODULE_DEPENDency. Declare one explicitly in
hwpstate_intel(4).
kp [Thu, 23 Jan 2020 22:13:41 +0000 (22:13 +0000)]
pf: Apply kif flags to new group members
If we have a 'set skip on <ifgroup>' rule this flag is set on the group
kif, but must also be set on all members. pfctl does this when the rules
are set, but if groups are added afterwards we must also apply the flags
to the new member. If not, new group members will not be skipped until
the rules are reloaded.
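The behaviour described above can be sketched in userland roughly as follows. All names here (toy_kif, toy_group, TOY_SKIP, toy_group_add) are hypothetical and the structures are far simpler than pf's; the point is only that the group's flags are copied to a member at the moment it joins.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SKIP	0x1	/* hypothetical stand-in for the skip flag */
#define TOY_MAX_MEMBERS	8

struct toy_kif {
	unsigned int flags;
};

struct toy_group {
	struct toy_kif kif;			/* the group's own kif */
	struct toy_kif *members[TOY_MAX_MEMBERS];
	size_t nmembers;
};

/*
 * Add an interface to a group.  The group's flags are applied to the new
 * member immediately, so it does not have to wait for a ruleset reload.
 * Returns 0 on success, -1 if the group is full.
 */
static int
toy_group_add(struct toy_group *g, struct toy_kif *ifp)
{
	assert(g != NULL && ifp != NULL);
	if (g->nmembers >= TOY_MAX_MEMBERS)
		return (-1);
	g->members[g->nmembers++] = ifp;
	ifp->flags |= g->kif.flags;
	return (0);
}
```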
kib [Thu, 23 Jan 2020 17:08:33 +0000 (17:08 +0000)]
Fix r356919.
Instead of waiting for pc_curthread which is overwritten by
init_secondary_tail(), wait for non-NULL pc_curpcb, to be set by the
first context switch.
Assert that pc_curpcb is not set too early.
Reported and tested by: rlibby
Reviewed by: markj, rlibby
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23330
markj [Thu, 23 Jan 2020 16:45:10 +0000 (16:45 +0000)]
vm_map_submap(): Avoid unnecessary clipping.
A submap can only be created from an entry spanning the entire request
range. In particular, if vm_map_lookup_entry() returns false or the
returned entry contains "end".
Since the only use of submaps in FreeBSD is for the static pipe and
execve argument KVA maps, this has no functional effect.
markj [Thu, 23 Jan 2020 16:24:51 +0000 (16:24 +0000)]
Set td_oncpu before dropping the thread lock during a switch.
After r355784 we no longer hold a thread's thread lock when switching it
out. Preserve the previous synchronization protocol for td_oncpu by
setting it together with td_state, before dropping the thread lock
during a switch.
Reported and tested by: pho
Reviewed by: kib
Discussed with: jeff
Differential Revision: https://reviews.freebsd.org/D23270
markj [Thu, 23 Jan 2020 16:07:27 +0000 (16:07 +0000)]
arm64: Don't enable interrupts in init_secondary().
Doing so can cause deadlocks or panics during boot, if an interrupt
handler accesses uninitialized per-CPU scheduler structures. This seems
to occur frequently when running under QEMU or AWS. The idle threads
are set up to release a spinlock section and enable interrupts in
fork_exit(), so there is no need to enable interrupts earlier.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23328
emaste [Thu, 23 Jan 2020 14:11:02 +0000 (14:11 +0000)]
Apply r355819 to sparc64 - fix assertion failure after r355784
From r355819:
Repeat the spinlock_enter/exit pattern from amd64 on other architectures
to fix an assert violation introduced in r355784. Without this
spinlock_exit() may see owepreempt and switch before reducing the
spinlock count. amd64 had been optimized to do a single critical
enter/exit regardless of the number of spinlocks which avoided the
problem and this optimization had not been applied elsewhere.
This is completely untested - I have no obsolete Sparc hardware - but
someone did try testing recent changes on sparc64 (PR 243534).
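A toy model of the ordering problem quoted from r355819, under the assumption that the problematic exit path left the critical section before decrementing the spinlock count. The structure and function names are hypothetical; the real per-architecture spinlock_exit() also restores interrupt state, which is omitted here.

```c
#include <assert.h>

struct toy_thread {
	int spinlock_count;	/* nested spinlock sections */
	int critnest;		/* critical section nesting */
	int owepreempt;		/* a preemption is pending */
	int count_at_switch;	/* spinlock_count seen at "switch", -1 if none */
};

/* Leaving the outermost critical section performs any pending switch. */
static void
toy_critical_exit(struct toy_thread *td)
{
	assert(td->critnest > 0);
	if (--td->critnest == 0 && td->owepreempt) {
		td->owepreempt = 0;
		td->count_at_switch = td->spinlock_count;
	}
}

/* Problematic ordering: the switch can happen before the decrement. */
static void
toy_spinlock_exit_late(struct toy_thread *td)
{
	toy_critical_exit(td);
	td->spinlock_count--;
}

/* amd64-style ordering: decrement first, one critical exit at the end. */
static void
toy_spinlock_exit_early(struct toy_thread *td)
{
	td->spinlock_count--;
	if (td->spinlock_count == 0)
		toy_critical_exit(td);
}
```

In the first variant the simulated switch observes a non-zero spinlock count, which is the condition the assertion in the kernel objects to; in the second it observes zero.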
avg [Thu, 23 Jan 2020 11:05:03 +0000 (11:05 +0000)]
vmxnet3: add support for RSS kernel option
We observe at least one problem: if a UDP socket is connect(2)-ed, then a
received packet that matches the connection cannot be matched to the
corresponding PCB because of an incorrect flow ID. That was observed for DNS
requests from the libc resolver. We got this problem because FreeBSD
r343291 enabled code that can set rsstype of received packets to values
other than M_HASHTYPE_OPAQUE_HASH. Earlier that code was under 'ifdef
notyet'.
The essence of this change is to use the system-wide RSS key instead of
some historic hardcoded key when the software RSS is enabled and it is
configured to use Toeplitz algorithm (the default).
In all other cases, the driver reports the opaque hash type for received
packets while still using Toeplitz algorithm with the internal key.
avg [Thu, 23 Jan 2020 10:13:56 +0000 (10:13 +0000)]
virtio_scsi: use max target ID plus one as the initiator ID
This bus does not really have a concept of the initiator ID, so use
a guaranteed dummy one that won't conflict with any real target.
This change fixes a problem with virtio_scsi on GCE where disks get
sequential target IDs starting from one. If there are seven or more
disks, then a disk with the target ID of seven would not be discovered
by FreeBSD as that ID was reserved as the initiator ID -- see
scsi_scan_bus().
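The effect can be illustrated with a toy scan loop. This is not scsi_scan_bus(); toy_scan_finds() is hypothetical and the maximum target ID of 15 used in the usage note is an arbitrary assumption. It only shows why a target sharing the initiator's ID is never probed, and why max target ID plus one cannot collide.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy bus scan: probe every target ID from 0 to max_target, skipping the
 * ID reserved for the initiator.  Returns true if `wanted` gets probed.
 */
static bool
toy_scan_finds(unsigned int max_target, unsigned int initiator_id,
    unsigned int wanted)
{
	unsigned int id;

	assert(wanted <= max_target);
	for (id = 0; id <= max_target; id++) {
		if (id == initiator_id)
			continue;	/* never probe our own ID */
		if (id == wanted)
			return (true);
	}
	return (false);
}
```

With an assumed max_target of 15, toy_scan_finds(15, 7, 7) is false (the disk at target 7 is hidden), while toy_scan_finds(15, 16, 7) is true because the initiator ID lies outside the scanned range.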
melifaro [Thu, 23 Jan 2020 09:14:28 +0000 (09:14 +0000)]
Fix epoch-related panic in ipdivert, ensuring in_broadcast() is called
within epoch.
Simplify gigantic div_output() by splitting it into 3 functions,
handling preliminary setup, remote "ip[6]_output" case and
local "netisr" case. Leave original indenting in most parts to ease
diff comparison. Indentation will be fixed by a followup commit.
Reported by: Nick Hibma <nick at van-laarhoven.org>
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D23317
rlibby [Thu, 23 Jan 2020 04:56:38 +0000 (04:56 +0000)]
uma: fix zone domain overlaying pcpu cache with disabled cpus
UMA zone structures have two arrays at the end which are sized according
to the machine: an array of CPU count length, and an array of NUMA
domain count length. The CPU counting was wrong in the case where some
CPUs are disabled (when mp_ncpus != mp_maxid + 1), and this caused the
second array to be overlaid with the first.
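A sketch of this class of layout bug, with made-up sizes and hypothetical helper names; the real UMA structures and offset macros differ. It assumes per-CPU slots are indexed by CPU ID (0 through mp_maxid) while the start of the second array is computed from some CPU count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; the real UMA structures differ. */
#define TOY_HDR_SIZE	64	/* fixed part of the zone structure */
#define TOY_CACHE_SIZE	32	/* one per-CPU cache slot */

/* Byte offset of the per-CPU cache slot for a given CPU ID. */
static size_t
toy_cache_offset(unsigned int cpuid)
{
	return (TOY_HDR_SIZE + (size_t)cpuid * TOY_CACHE_SIZE);
}

/* Byte offset of the per-domain array, given the CPU slot count used. */
static size_t
toy_domain_offset(unsigned int ncpu_slots)
{
	assert(ncpu_slots > 0);
	return (TOY_HDR_SIZE + (size_t)ncpu_slots * TOY_CACHE_SIZE);
}

/* True if the slot for the highest CPU ID reaches into the domain array. */
static bool
toy_overlaps(unsigned int ncpu_slots, unsigned int maxid)
{
	return (toy_cache_offset(maxid) + TOY_CACHE_SIZE >
	    toy_domain_offset(ncpu_slots));
}
```

For example, with two enabled CPUs whose IDs are 0 and 3, sizing by the CPU count (2) makes the slot for CPU 3 land inside the domain array, whereas sizing by mp_maxid + 1 (4) does not.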
rlibby [Thu, 23 Jan 2020 04:56:34 +0000 (04:56 +0000)]
uma: report leaks more accurately
Previously UMA had some false negatives in the leak report at keg
destruction time, where it only reported leaks if there were free items
in the slab layer (rather than allocated items), which notably would not
be true for single-item slabs (large items). Now, report a leak if
there are any allocated pages, and calculate and report the number of
allocated items rather than free items.
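The difference between the two checks can be sketched as below. The toy_keg fields and function names are hypothetical and the arithmetic is a simplified assumption about how allocated items could be derived from page and slab counts; it is not the UMA code.

```c
#include <assert.h>

struct toy_keg {
	unsigned int pages;		/* pages currently allocated */
	unsigned int pages_per_slab;
	unsigned int items_per_slab;
	unsigned int free_items;	/* free items sitting in slabs */
};

/* Old-style check: only notices a leak when some free items remain. */
static int
toy_leak_old(const struct toy_keg *k)
{
	return (k->free_items != 0);
}

/* New-style check: any allocated page at destruction time is a leak. */
static int
toy_leak_new(const struct toy_keg *k)
{
	return (k->pages != 0);
}

/* Number of items still allocated, for the report. */
static unsigned int
toy_allocated_items(const struct toy_keg *k)
{
	unsigned int slabs;

	assert(k->pages_per_slab != 0);
	slabs = k->pages / k->pages_per_slab;
	return (slabs * k->items_per_slab - k->free_items);
}
```

A keg of single-item slabs with one item still allocated has no free items at all, so the old check stays silent while the new one reports one leaked item.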
jeff [Thu, 23 Jan 2020 03:36:50 +0000 (03:36 +0000)]
Block the thread lock in sched_throw() and use cpu_switch() to unblock
it. The introduction of lockless switch in r355784 created a race to
re-use the exiting thread that was only possible to hit on a hypervisor.
glebius [Thu, 23 Jan 2020 01:30:50 +0000 (01:30 +0000)]
DEVICE_POLLING is an alternative to network interrupts and also
needs to enter epoch. Assert that in the netisr_poll() and do
the work for the idle poll routine.
glebius [Thu, 23 Jan 2020 01:27:58 +0000 (01:27 +0000)]
Enter network epoch in iflib rxeof task.
In upcoming changes ether_input() is going to be changed not
to enter the network epoch. Entering it is going to be the
responsibility of the network interrupt; in the case of iflib,
its taskqueue.
jhb [Wed, 22 Jan 2020 21:21:24 +0000 (21:21 +0000)]
Remove support for auto-selecting an external binutils.
All of the in-tree architectures not supported by in-tree binutils are
supported by lld, so the condition is now always false. It also
didn't fully work since the external binutils are installed into a
directory that uses the host's OS version, not the target OS version.
kp [Wed, 22 Jan 2020 21:01:19 +0000 (21:01 +0000)]
pfsync: Ensure we enter network epoch before calling ip_output
As of r356974 calls to ip_output() require us to be in the network epoch.
That wasn't the case for the calls done from pfsyncintr() and
pfsync_defer_tmo().
mav [Wed, 22 Jan 2020 20:36:45 +0000 (20:36 +0000)]
Update route MTUs for bridge, lagg and vlan interfaces.
Those interfaces may implicitly change their MTU on addition of parent
interface in addition to normal SIOCSIFMTU ioctl path, where the route
MTUs are updated normally.
emaste [Wed, 22 Jan 2020 18:55:36 +0000 (18:55 +0000)]
Tag etc/termcap with package=runtime
/etc/termcap is a symlink to /usr/share/misc/termcap, which is in the
runtime package. Tag the symlink with the same package so that it is
handled correctly on pkgbase-installed/updated systems.
emaste [Wed, 22 Jan 2020 18:40:19 +0000 (18:40 +0000)]
Tag NLS aliases with package=runtime
POSIX and en_US.US_ASCII are aliases (symlinks) to the C locale. They were
not previously tagged with a pkgbase package. Add the tag so that they are
handled correctly on pkgbase-installed/updated systems.
glebius [Wed, 22 Jan 2020 17:19:53 +0000 (17:19 +0000)]
Plug possible calls into ip6?_output() without network epoch from SCTP by
bluntly adding epoch entrance into the macro that SCTP uses to call
ip6?_output(). This definitely will introduce several epoch recursions.
bz [Wed, 22 Jan 2020 15:06:59 +0000 (15:06 +0000)]
Fix NOINET kernels after r356983.
All gotos to the label are within the #ifdef INET section, which leaves
us with an unused label. Cover the label under #ifdef INET as well to
avoid the warning and compile time error.
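A small standalone illustration of the pattern, assuming a hypothetical function toy_output() unrelated to the code touched by the commit. INET is deliberately left undefined to mimic a NOINET build; because the label is guarded by the same #ifdef as its only goto, no unused-label warning is produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function; INET is intentionally not defined in this file. */
static int
toy_output(const char *buf, int len)
{
	int error;

	assert(buf != NULL);
	error = 0;
#ifdef INET
	if (len < 0) {
		error = -1;
		goto done;
	}
#endif
	(void)len;		/* len is otherwise unused when INET is off */

#ifdef INET
done:				/* only referenced from the INET block above */
#endif
	return (error);
}
```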
melifaro [Wed, 22 Jan 2020 13:53:18 +0000 (13:53 +0000)]
Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.
It introduces the net.inet.icmp.redirtimeout sysctl, which sets
an expiration time for redirected routes. It defaults to 10 minutes,
analogous to net.inet6.icmp6.redirtimeout.
Implementation uses separate file, route_temporal.c, as route.c is already
bloated with tons of different functions.
Internally, expiration is implemented as a per-rnh callout scheduled when
route with non-zero rt_expire time is added or rt_expire is changed.
It does not add any overhead when no temporal routes are present.
Callout traverses entire routing tree under wlock, scheduling expired routes
for deletion and calculating the next time it needs to be run. The rationale
for such an implementation is the following: typically workloads requiring a
large number of routes have redirects turned off already, while systems with
a small number of routes will not incur large overhead during tree traversal.
This change also fixes netstat -rn display of route expiration time, which
has been broken since the conversion from kread() to sysctl.
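The callout's single pass can be sketched over a flat array instead of the routing tree. Everything here (toy_route, toy_expire_routes, the deleted flag) is hypothetical and locking is omitted; it only shows expiring due routes and computing the next run time in one traversal, with 0 meaning that nothing needs to be scheduled.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct toy_route {
	time_t expire;	/* 0 means "never expires" */
	int deleted;
};

/*
 * Walk every route once.  Mark routes whose expiry time has passed as
 * deleted, and return the earliest expiry time still pending, or 0 if no
 * temporal routes remain (so no timer needs to be re-armed).
 */
static time_t
toy_expire_routes(struct toy_route *rt, size_t n, time_t now)
{
	time_t next = 0;
	size_t i;

	assert(rt != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (rt[i].deleted || rt[i].expire == 0)
			continue;
		if (rt[i].expire <= now) {
			rt[i].deleted = 1;
			continue;
		}
		if (next == 0 || rt[i].expire < next)
			next = rt[i].expire;
	}
	return (next);
}
```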
glebius [Wed, 22 Jan 2020 06:10:41 +0000 (06:10 +0000)]
Make in_pcbladdr() require network epoch entered by its callers. Together
with this widen network epoch coverage up to tcp_connect() and udp_connect().
Revisions from r356974 and up to this revision cover D23187.
glebius [Wed, 22 Jan 2020 06:03:45 +0000 (06:03 +0000)]
The network epoch changes in the TCP stack, combined with old r286227,
mean that removal of a PCB no longer needs ipi_lock in any form. The
ipi_list_lock is sufficient.
glebius [Wed, 22 Jan 2020 05:58:29 +0000 (05:58 +0000)]
Relax locking requirements for in_pcballoc(). All pcbinfo fields
modified by this function are protected by the PCB list lock that is
acquired inside the function.
This could have been done even before epoch changes, after r286227.
bdragon [Wed, 22 Jan 2020 02:06:34 +0000 (02:06 +0000)]
[PowerPC] libc backwards compatibility shim for auxv change
As part of the FreeBSD powerpc* flag day (1300070), the auxv numbering was
changed to match every other platform.
See D20799 for more details on that change.
While the kernel and rtld were adapted, libc was not, so old dynamic
binaries broke for reasons other than the ABI change on powerpc64.
Since it's possible to support nearly everything regarding old binaries by
adding compatibility code to libc (as besides rtld, it is the main point
where auxv is digested), we might as well provide compatibility code.
The only unhandled case remaining should be "new format libraries that call
elf_aux_info() which are dynamically linked to by old-format binaries",
which should be quite rare.
kevans [Tue, 21 Jan 2020 22:02:53 +0000 (22:02 +0000)]
posix_spawn: mark error as volatile
In the case of an error, the RFSPAWN'd thread will write back to psa->error
with the correct exit code. Mark this as volatile as the return value is
being actively dorked up for erroneous exits on !x86.
This fixes the following tests, tested on aarch64 (only under qemu, at the
moment):
cy [Tue, 21 Jan 2020 20:21:52 +0000 (20:21 +0000)]
Fix build when WITHOUT_WPA_SUPPLICANT_EAPOL option used.
The build failure was discovered by Michael Dexter's recent Build Options
Survey run, at https://callfortesting.org/results/bos-2020-01-16/\
WITHOUT_WPA_SUPPLICANT_EAPOL-small.txt.
Reported by: Michael Dexter <editor@callfortesting.org> via emaste
MFC after: 2 weeks