ZoL issues:
Improved dnode allocation #6564
Clean up large dnode code #6262
Fix dnode_hold() freeing dnode behavior #8172
Fix dnode allocation race #6414, #6439
Partial: Raw sends must be able to decrease nlevels #6821, #6864
Remove unnecessary txg syncs from receive_object() Closes #7197
This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos. It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().
This commit reverts r351077 that was a revert of r351074 and r351076 and
restores those changes. Required atomic operations should be available
now on all platforms where we build ZFS.
Emmanuel Vadot [Mon, 7 Oct 2019 08:11:49 +0000 (08:11 +0000)]
arm: dts: ti: Fix mmc3 instance by setting it to disabled
DTS Import of Linux 5.3 added a patch that rework the L3 mmc instance
in the AM335x SoC but removed the status = 'disabled' on the node.
This cause the kernel to probe the device even if the board doesn't
have this mmc used and since we don't correctly activate the clock
for this module we panic with an external data abort.
Beaglebone(s) don't have this device anyway so simply disabling it.
Patch for the DTS was sent upstream.
https://patchwork.kernel.org/patch/11176921/
Andriy Gapon [Mon, 7 Oct 2019 08:00:54 +0000 (08:00 +0000)]
ZFS: unconditionally use atomic_swap_64
Previously, the code used a plain store on platforms that lacked
atomic_swap_64 and possibly some other platforms as the condition worked
only if atomic_swap_64 was a macro.
Andriy Gapon [Mon, 7 Oct 2019 07:54:34 +0000 (07:54 +0000)]
ZFS: add emulation of atomic_swap_64 and atomic_load_64
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations for
those platforms using a mutex. That is not entirely correct and it's
very efficient. Besides, the loads are plain loads, so torn values are
possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code, and
atomic_load_64 that can be used to prevent torn reads.
Andriy Gapon [Mon, 7 Oct 2019 07:37:42 +0000 (07:37 +0000)]
align use of cp15_pmccntr_get with its availability
According to ian, the only armv6 cpu we support is the 1176, so this
change is effectively a no-op.
The change is just to make the code more self-consistent.
The issue was noticed by a standalone module build for armv6.
Kyle Evans [Mon, 7 Oct 2019 03:28:11 +0000 (03:28 +0000)]
Revert r352557: powerpc/loader: Install ubldr without stripping
This was committed due to what was later diagnosed as an msdosfs bug
preventing in-place strip. This bug was fixed in r352564, and we agreed to
keep the workaround in for a bit to allow the driver fix a suitable amount
of propagation time for folks building/installing powerpc/ubldr, seeing as
how we were not in any hurry to revert.
Justin Hibbits [Mon, 7 Oct 2019 03:05:32 +0000 (03:05 +0000)]
loader/powerpc64: Fix HV check for CAS usage
Logic was backwards. The function returns true if it *is* running as a
hypervisor, whereas we want to only call the CAS utility if we're running as a
guest.
Randall Stewart [Sun, 6 Oct 2019 22:29:02 +0000 (22:29 +0000)]
Brad Davis identified a problem with the new LRO code, VLAN's
no longer worked. The problem was that the defines used the
same space as the VLAN id. This commit does three things.
1) Move the LRO used fields to the PH_per fields. This is
safe since the entire PH_per is used for IP reassembly
which LRO code will not hit.
2) Remove old unused pace fields that are not used in mbuf.h
3) The VLAN processing is not in the mbuf queueing code. Consequently
if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing
for now until rack_bbr_common is updated to handle the VLAN properly.
Mateusz Guzik [Sun, 6 Oct 2019 22:14:32 +0000 (22:14 +0000)]
vfs: add optional root vnode caching
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.
Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646
Toomas Soome [Sun, 6 Oct 2019 18:38:58 +0000 (18:38 +0000)]
loader.efi: for text mode, use STM to scroll the whole screen
Since local UEFI console is implemented on top of framebuffer,
we need to avoid redrawing the whole screen ourselves, but let
Simple Text Mode to do the scroll for us.
Michael Tuexen [Sun, 6 Oct 2019 08:47:10 +0000 (08:47 +0000)]
Plumb an mbuf leak in a code path that should not be taken. Also avoid
that this path is taken by setting the tail pointer correctly.
There is still bug related to handling unordered unfragmented messages
which were delayed in deferred handling.
This issue was found by OSS-Fuzz testing the usrsctp stack and reported in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17794
Kyle Evans [Sun, 6 Oct 2019 04:19:49 +0000 (04:19 +0000)]
Re-add ALLOW_MIPS_SHARED_TEXTREL, sprinkle it around
Diff partially stolen from CheriBSD; these bits need -Wl,-z,notext in order
to build in an LLVM world. They are needed for all flavors/sizes of MIPS.
This will eventually get fixed in LLVM, but it's unclear when.
Yuri Pankov [Sat, 5 Oct 2019 22:17:54 +0000 (22:17 +0000)]
Mark "private use area" characters as printable.
At least some of the characters in E000-F8FF range are used by Powerline
fonts, and having no attributes for these ranges in UnicodeData.txt
other than "Other, Private Use" it should be safe to mark all of them as
printable. Some actually were before r340491, so this fixes the
regression introduced there as well.
Kyle Evans [Sat, 5 Oct 2019 21:52:06 +0000 (21:52 +0000)]
Remove the remnants of SI_CHEAPCLONE
SI_CHEAPCLONE was introduced in r66067 for use with cloned bpfs. It was
later also used in tty, tun, tap at points. The rough timeline for being
removed in each of these is as follows:
- r181690: bpf switched to use cdevpriv API by ed@
- r181905: ed@ rewrote the TTY later to be mpsafe
- r204464: kib@ removes it from tun/tap, declaring it unused
I've not yet been able to dig up any other consumers in the intervening 9
years. It is no longer set on any devices in the tree and leaves an
interesting situation in make_dev_sv where we're ok with the device already
being set SI_NAMED.
Kyle Evans [Sat, 5 Oct 2019 21:44:18 +0000 (21:44 +0000)]
kern_conf: fully initialize cloned devices with make_dev_args, too
Attempting to initialize si_drv{1,2} with mda_si_drv{1,2} does not work if
you are operating on cloned devices.
clone_create must be called prior to the make_dev* family to create/return
the device on the clonelist as needed. This device is later returned early
in newdev(), prior to si_drv{0,1,2} initialization.
This patch simply breaks out of the loop if we've found a device and
finishes init.
Michael Tuexen [Sat, 5 Oct 2019 13:28:01 +0000 (13:28 +0000)]
Fix a use after free bug when removing remote addresses.
This bug was found by OSS-Fuzz and reported in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=18004
Michael Tuexen [Sat, 5 Oct 2019 12:34:50 +0000 (12:34 +0000)]
Plumb an mbuf leak found by Mark Wodrich from Google by fuzz testing the
userland stack and reporting it in:
https://github.com/sctplab/usrsctp/issues/396
Michael Tuexen [Sat, 5 Oct 2019 09:46:11 +0000 (09:46 +0000)]
Fix the adding of padding to COOKIE-ECHO chunks.
Thanks to Mark Wodrich who found this issue while fuzz testing the
usrsctp stack and reported the issue in
https://github.com/sctplab/usrsctp/issues/382
Alan Somers [Sat, 5 Oct 2019 03:19:53 +0000 (03:19 +0000)]
ZFS: fix several of the "zpool create" tests
* Remove zpool_create_013_neg. FreeBSD doesn't have an equivalent of
Solaris's metadevices. GEOM would be the equivalent, but since all geoms
are the same from ZFS's perspective, this test would be redundant with
zpool_create_012_neg
* Remove zpool_create_014_neg. FreeBSD does not support swapping to regular
files.
* Remove zpool_create_016_pos. This test is redundant with literally every
other test that creates a disk-backed pool.
* s:/etc/vfstab:/etc/fstab in zpool_create_011_neg
* Delete the VTOC-related portion of zpool_create_008_pos. FreeBSD doesn't
use VTOC.
* Replace dumpadm with dumpon and swap with swapon in multiple tests.
* In zpool_create_015_neg, don't require "zpool create -n" to fail. It's
reasonable for that variant to succeed, because it doesn't actually open
the zvol.
* Greatly simplify zpool_create_012_neg. Make it safer, too, but not
interfering with the system's regular swap devices.
* Expect zpool_create_011_neg to fail (PR 241070)
* Delete some redundant cleanup steps in various tests
* Remove some unneeeded ATF timeout specifications. The default is fine.
Eric van Gyzen [Fri, 4 Oct 2019 21:39:11 +0000 (21:39 +0000)]
Add CTLFLAG_STATS to all COUNTER_U64* sysctl OIDs
CTLFLAG_STATS identifies a sysctl OID as statistical or informational,
as opposed to a configurable/tunable OID that changes behavior.
This can be used, for example, to verfiy that the kyua tests do not
modify configurable OIDs when allow_sysctl_side_effects is true.
Add CTLFLAG_STATS to all COUNTER_U64* OIDs.
I will add the flag to more OIDs in a few subsequent commits, to
facilitate MFC. The flag should be added to many more OIDs. I plan to
add it those that my test found and some nearby that looked obvious.
Conrad Meyer [Fri, 4 Oct 2019 18:38:47 +0000 (18:38 +0000)]
nvdimm(4): Add nvdimm_e820 pseudo-bus
nvdimm_e820 is a newbus pseudo driver that looks for "legacy" e820 PRAM
spans and creates ordinary-looking SPA devfs nodes for them
(/dev/nvdimm_spaN).
As these legacy regions lack real NFIT SPA regions and namespace
definitions, they must be administratively sliced up externally using
device.hints. This is similar in purpose to the Linux memmap= mechanism.
It is assumed that systems with working NFIT tables will not have any use
for this driver, and that that will be the prevailing style going forward,
so if there are no explicit hints provided, this driver does not
automatically create any devices.
The registers in ilumos and FreeBSD have a different number.
In the illumos, last 32-bits register defined is SS an in FreeBSD is GS.
While translating register we should comper it to the highest one.
Kyle Evans [Fri, 4 Oct 2019 13:43:07 +0000 (13:43 +0000)]
tuntap(4): loosen up tunclose restrictions
Realistically, this cannot work. We don't allow the tun to be opened twice,
so it must be done via fd passing, fork, dup, some mechanism like these.
Applications demonstrably do not enforce strict ordering when they're
handing off tun devices, so the parent closing before the child will easily
leave the tun/tap device in a bad state where it can't be destroyed and a
confused user because they did nothing wrong.
Concede that we can't leave the tun/tap device in this kind of state because
of software not playing the TUNSIFPID game, but it is still good to find and
fix this kind of thing to keep ifconfig(8) up-to-date and help ensure good
discipline in tun handling.
Kirk McKusick [Fri, 4 Oct 2019 05:28:36 +0000 (05:28 +0000)]
Update ffs_getcg() function to accept a flags parameter to be passed
to breadn_flags() in preparation for later need when doing forcible
unmount when disk dies or is removed.
Certs can be easily examined after installation with `certctl list`, and
certctl blacklist will accept the hashed filename as output by list or as
seen in /etc/ssl/certs
No objection from: secteam
Relnotes: Definite maybe
Kyle Evans [Thu, 3 Oct 2019 20:45:52 +0000 (20:45 +0000)]
certctl(8): let one blacklist based on hashed filenames
It seems reasonable to allow, for instance:
$ certctl list
# reviews output -- ah, yeah, I don't trust that one
$ certctl blacklist ce5e74ef.0
$ certctl rehash
We can unambiguously determine what cert "ce5e74ef.0" refers to, and we've
described it to them in `certctl list` output -- I see little sense in
forcing another level of filesystem inspection to determien what cert file
this physically corresponds to.
Michael Tuexen [Thu, 3 Oct 2019 20:39:17 +0000 (20:39 +0000)]
Cleanup sctp_asconf_error_response() and ensure that the parameter
is padded as required. This fixes the followig bug reported by
OSS-Fuzz for the usersctp stack:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17790
During readdir() we guarantee that the tn_dir.tn_parent does not go
away, but it might be replaced by a parallel rename. Read tn_parent
only once, then use the cached value.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Michael Tuexen [Thu, 3 Oct 2019 18:36:54 +0000 (18:36 +0000)]
Add missing input validation. This could result in reading from
uninitialized memory.
The issue was found by OSS-Fuzz for usrsctp and reported in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17780
John Baldwin [Thu, 3 Oct 2019 18:24:41 +0000 (18:24 +0000)]
Restore description of packets dropped due to full reassembly queue.
r265408 renamed tcps_rcvmemdrop to tcps_rcvreassfull and gave it a more
specific description. r279122 (libxo-ification) reverted that change.
This commit brings it back, but with a small tweak to the description.
Kyle Evans [Thu, 3 Oct 2019 17:54:00 +0000 (17:54 +0000)]
if_tuntap: create /dev aliases when a tuntap device gets renamed
Currently, if you do:
$ ifconfig tun0 create
$ ifconfig tun0 name wg0
$ ls -l /dev | egrep 'wg|tun'
You will see tun0, but no wg0. In fact, it's slightly more annoying to make
the association between the new name and the old name in order to open the
device (if it hadn't been opened during the rename).
Register an eventhandler for ifnet_arrival_events and catch interface
renames. We can determine if the ifnet is a tun easily enough from the
if_dname, which matches the cevsw.d_name from the associated tuntap_driver.
Some locking dance is required because renames don't require the device to
be opened, so it could go away in the middle of handling the ioctl, but as
soon as we've verified this isn't the case we can attempt to busy the tun
and either bail out if the tun device is dying, or we can proceed with the
rename.
We only create these aliases on a best-effort basis. Renaming a tun device
to "usbctl", which doesn't exist as an ifnet but does as a /dev, is clearly
not that disastrous, but we can't and won't create a /dev for that.
Kyle Evans [Thu, 3 Oct 2019 17:46:27 +0000 (17:46 +0000)]
if_tuntap: add a busy/unbusy mechanism, replace destroy OPEN check
A future commit will create device aliases when a tuntap device is renamed
so that it's still easily found in /dev after the rename. Said mechanism
will want to keep the tun alive long enough to either realize that it's
about to go away or complete the alias creation, even if the alias is about
to get destroyed.
While we're introducing it, using it to prevent open devices from going away
makes plenty of sense and keeps the logic on waking up tun_destroy clean, so
we don't have multiple places trying to cv_broadcast unless it's still in
use elsewhere.
Andriy Gapon [Thu, 3 Oct 2019 11:23:10 +0000 (11:23 +0000)]
add ability to set watchdog timeout for a shutdown
This change allows to specify a watchdog(9) timeout for a system
shutdown. The timeout is activated when the watchdogd daemon is
stopped. The idea is to a prevent any indefinite hang during late
stages of the shutdown. The feature is implemented in rc.d/watchdogd,
it builds upon watchdogd -x option.
Note that the shutdown timeout is not actiavted when the watchdogd
service is individually stopped by an operator. It is also not
activated for the 'shutdown' to the single-user mode. In those cases it
is assumed that the operator knows what they are doing and they have
means to recover the system should it hang.
Significant subchanges and implementation details:
- the argument to rc.shutdown, completely unused before, is assigned to
rc_shutdown variable that can be inspected by rc scripts
- init(8) passes "single" or "reboot" as the argument, this is not
changed
- the argument is not mandatory and if it is not set then rc_shutdown is
set to "unspecified"
- however, the default jail management scripts and jail configuration
examples have been updated to pass "jail" to rc.shutdown, just in case
- the new timeout can be set via watchdogd_shutdown_timeout rc option
- for consistency, the regular timeout can now be set via
watchdogd_timeout rc option
- watchdogd_shutdown_timeout and watchdogd_timeout override timeout
specifications in watchdogd_flags
- existing configurations, where the new rc options are not set, should
keep working as before
I am not particularly wed to any of the implementation specifics.
I am open to changing or removing any of them as long as the provided
functionality is the same (or very close) to the proposed one.
For example, I think it can be implemented without using watchdogd -x,
by means of watchdog(1) alone. In that case there would be a small
window between stopping watchdogd and running watchdog, but I think that
that is acceptable.
Andriy Gapon [Thu, 3 Oct 2019 11:08:45 +0000 (11:08 +0000)]
ZFS: add bookmark renaming
The feature is implemented as an extension of the existing
ZFS_IOC_RENAME ioctl. Both the userland and the DSL interfaces support
renaming only a single bookmark at a time. As of now, there is no ZCP
interface to the new functionality. I am going to add it once the DSL
interface passes a test of time.
This change picks up support for zfs_ioc_namecheck_t::ENTITY_NAME that
was added to ZoL as part of Redacted Send/Receive feature by Paul
Dagnelie <pcd@delphix.com>. This is needed to allow a bookmark name in
zc_name.
Gleb Smirnoff [Thu, 3 Oct 2019 02:32:55 +0000 (02:32 +0000)]
- Remove the compile time limit for number of links a ng_bridge node
can handle. Instead using an array on node private data, use per-hook
private data.
- Use NG_NODE_FOREACH_HOOK() to traverse through hooks instead of array.
John Baldwin [Wed, 2 Oct 2019 21:49:39 +0000 (21:49 +0000)]
Fix the EMBEDFS_FORMAT helper variable for riscv64.
It was defined with the wrong MACHINE_ARCH previously. This permits
using an MFS image without defining MD_ROOT_SIZE which has various
benefits (one being that the build is able to treat the MFS image as
a dependency and properly re-link the kernel with the new image when
building with NO_CLEAN).
Colin Percival [Wed, 2 Oct 2019 21:35:39 +0000 (21:35 +0000)]
Switch EC2 AMIs from using the dual-dhclient script to using the new
dual-dhclient-daemon daemon. This makes it possible to stop/restart
the dhclients.
Ed Maste [Wed, 2 Oct 2019 21:01:23 +0000 (21:01 +0000)]
simplify path handling in sysctl_try_reclaim_vnode
MAXPATHLEN / PATH_MAX includes space for the terminating NUL, and namei
verifies the presence of the NUL. Thus there is no need to increase the
buffer size here.
The sysctl passes the string excluding the NUL, so req->newlen equal to
PATH_MAX is too long.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21876
Conrad Meyer [Wed, 2 Oct 2019 19:13:35 +0000 (19:13 +0000)]
nvdimm: Fix error path mis-free
Regression introduced in r343629 when malloc result was renamed from spa to
spa_mapping and the 'spa' name was instead used to iterate a table, but the
free() target was not updated.
Kyle Evans [Wed, 2 Oct 2019 17:18:18 +0000 (17:18 +0000)]
Override the TLS model when building mips64 binaries and static libraries
GCC uses "dynamic" TLS models when -fpic or -fPIC is explicitly
specified on the command line (which is only true for shared libraries).
It uses "static" (or "exec") TLS models otherwise. In particular, GCC
does _not_ use dynamic TLS models when PIC is implicitly enabled (which
it is on MIPS), only if a PIC flag is explicitly provided.
llvm uses "dynamic" TLS models if PIC is enabled either via a PIC flag
or if it is implicily enabled (as on MIPS64). This means that llvm on
MIPS64 always uses "dynamic" TLS models. However, dynamic TLS models
do not work for static binaries and libraries as the __tls_get_addr
function they invoke is only defined in rtld.
Written by: jhb
Reviewed by: arichardson
Differential Revision: https://reviews.freebsd.org/D21699
Kyle Evans [Wed, 2 Oct 2019 17:15:38 +0000 (17:15 +0000)]
clang: use -mxgot for 32-bit mips
Various bits in usr.bin/clang/* will fail to compile without -mxgot due to
truncated relocations. -mxgot entails a speed penalty, but I suspect we
don't care as much about compiler performance in 32-bit mips land.
Kyle Evans [Wed, 2 Oct 2019 17:06:28 +0000 (17:06 +0000)]
Provide generic sub-word atomic *cmpset
Provide *cmpset_{8,16} as wrappers around atomic_fcmpset_32. Initial users
will be mips and sparc64, and perhaps parts of powerpc.
This are not for general consumption; machine/atomic.h should include this
header as needed to provide atomic_{,f}cmpset_{8,16} and machine/atomic.h
should provide acq_ and rel_ variants.
Mark Johnston [Wed, 2 Oct 2019 15:45:49 +0000 (15:45 +0000)]
Disallow fcntl(F_READAHEAD) when the vnode is not a regular file.
The mountpoint may not have defined an iosize parameter, so an attempt
to configure readahead on a device file can lead to a divide-by-zero
crash.
The sequential heuristic is not applied to I/O to or from device files,
and posix_fadvise(2) returns an error when v_type != VREG, so perform
the same check here.
Reported by: syzbot+e4b682208761aa5bc53a@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21864
Kyle Evans [Wed, 2 Oct 2019 15:19:39 +0000 (15:19 +0000)]
libusb: LIBUSB_DEBUG environment variable override of libusb_set_debug
The debug level generally just controls verbosity of libusb for debugging
libusb devices/usage. We allow the environment to set the debug level
independent of the application, but the application will always override
this if it explicitly sets the debug level.
Changing the environment is easy, but patching the software to change the
debug level isn't necessarily easy or possible. Further, there's this
write-only debug_fixed variable that would seem to imply that the debug
level should be fixed, but it isn't currently used. Change the logic to use
strtol() so we can detect real 0 vs. conversion failure, then honor
debug_fixed in libusb_set_debug.
Reviewed by: hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21877
Kyle Evans [Wed, 2 Oct 2019 15:13:40 +0000 (15:13 +0000)]
mips: fcmpset: do not spin on sc failure
For ll/sc architectures, atomic(9) allows failure modes where *old == val
due to write failure and callers should compensate for this. Do not retry on
failure, just leave 0 in ret and fail the operation if we couldn't sc it.
This lets the caller determine if it should retry or not.