Brandon Bergren [Fri, 21 Aug 2020 03:31:01 +0000 (03:31 +0000)]
[PowerPC] Fix translation-related crashes during startup
After spending a lot of time trying to track down what was going on, I have
isolated the "black screen" failures when using boot1 to boot a G4 machine.
It turns out we were replacing the traps before installing the temporary
BAT entry for the bottom of physical memory. That meant that until the MMU
was bootstrapped, the cached translations were the only thing keeping us
from losing.
Throwing boot1 into the mix was affecting execution flow enough to cause us
to hit an uncached page and crash.
Fix this by properly setting up the initial BAT entry at the same time we
are replacing the OpenFirmware traps, so we can continue executing in
segment 0 until the rest of the DMAP has been set up.
A second thing discovered while researching this is that we were entering a
BAT region for segment 16. It turns out this range was a) considered part
of KVA, and b) has firmware mappings with varying attributes.
If we ever accessed an unmapped page in segment 16, it would cause a BAT
entry to be installed for the whole segment, which would bypass the
existing mappings until it was flushed out again.
Instead, translate the OFW memory attributes into VM memory attributes and
install the ranges into the kernel address space properly.
Gleb Smirnoff [Thu, 20 Aug 2020 20:31:47 +0000 (20:31 +0000)]
When we have a command returned by zfs_nextboot() that is longer
than command in the loader.conf, the latter needs to be nul terminated,
otherwise garbage trailer left from zfs_nextboot() will be passed to
parse_cmd() together with loader.conf command.
While here, reset cmd to empty string if read() returns error.
Mark Johnston [Thu, 20 Aug 2020 19:28:19 +0000 (19:28 +0000)]
Enable creation of static userspace probes in incremental builds.
To define USDT probes, dtrace -G makes use of relocations for undefined
symbols: the target address is overwritten with NOPs and the location is
recorded in the DOF section of the output object file. To avoid link
errors, the original relocation is destroyed. However, this means that
the same input object file cannot be processed multiple times, as
happens during incremental rebuilds. Instead, only set the relocation
type to NONE, so that all information required to reconstruct USDT
probes is preserved.
Reported by: bdrewery
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 20 Aug 2020 19:27:49 +0000 (19:27 +0000)]
Remove non-FreeBSD ifdefs from dt_link.c.
This file is too complicated as it is and has diverged a fair bit from
illumos due to toolchain differences, so just drop unused code
(including SPARC support).
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Dimitry Andric [Thu, 20 Aug 2020 18:50:46 +0000 (18:50 +0000)]
Bump kldxref's MAXSEGS to 16, to stop complaints about the kernel
supposedly having too many segments, when lld 11 links it. Such kernels
should load just fine.
Note that we may still do some tweaking of our kernel linker scripts, to
lower the number of segments, although the exact benefit is not entirely
clear.
The AMD's Ryzen 3 3200g XHCI controllers apparently need the evaluate
control endpoint context command, but we don't need to issue this
command when the bMaxPacketSize is received after the read of the USB
device descriptor, because this part should be handled automatically.
Warner Losh [Thu, 20 Aug 2020 17:35:47 +0000 (17:35 +0000)]
Remove the long obsolete ufm driver.
It was a driver for a USB FM tuner that was available in the market in 2002. I
wrote the driver in 2003. I've not used it since 2005 or so, so it's time to
retire this driver. No userland code ever interfaced to the special device it
created. There's no user base: the last bug I received on this driver was in
2004.
Warner Losh [Thu, 20 Aug 2020 16:52:48 +0000 (16:52 +0000)]
Use names suggested by kib@ in review D25969, move call for unmount to not call
with vnode locked, use NOWAIT alloc and only report when we don't overflow.
These changes were accidentally omitted from r364402, except for the not
reporting on overflow. They were lumped in with a debugging commit in my tree
that I omitted w/o realizing this.
Other issues from the review are pending some other changes I need to do first.
Emmanuel Vadot [Thu, 20 Aug 2020 12:50:49 +0000 (12:50 +0000)]
libsa: smbios: Parse the chassis type and export it as smbios.chassis.type
It can be useful to know what type of machine we are running on for desktop
related thing.
It also allow us to support all the DMI variable that linux driver can fetch.
MFC after: 1 week
Sponsored by: Sponsored-by: The FreeBSD Foundation
Mateusz Guzik [Thu, 20 Aug 2020 10:06:50 +0000 (10:06 +0000)]
cache: don't use cache_purge_negative when renaming
It avoidably scans (and evicts) unrelated entries. Instead take
advantage of passed componentname and perform a hash lookup
for the exact one.
Sample data from buildworld probed on cache_purge_negative extended
to count both scanned and evicted entries on each call are below.
At most it has to evict 1.
Pedro F. Giffuni [Thu, 20 Aug 2020 05:08:49 +0000 (05:08 +0000)]
extfs: remove redundant little endian conversion.
The XTIME_TO_NSEC macro already calls the htole32(), so there is no need
to call it twice. This code does nothing on LE platforms and affects only
nanosecond and birthtime fields so it's difficult to notice on regular use.
Rick Macklem [Thu, 20 Aug 2020 03:53:18 +0000 (03:53 +0000)]
Add MSG_TLSAPPDATA to lib/libsysdecode/mktables.
I have no idea what this does (and until now that it even existed), but
apparently it needs this entry changed for the MSG_TLSAPPDATA, since
it is kernel only.
Alan Somers [Thu, 20 Aug 2020 01:31:21 +0000 (01:31 +0000)]
zfs: fix EIO accessing dataset after resuming interrupted receive
ZFS unmounts a dataset while receiving into it and remounts it afterwards.
But if ZFS is resuming an incomplete receive, it screws up and ends up with
a dataset that is mounted, but returns EIO for every access. This commit
fixes that condition.
While the vulnerable code also exists in OpenZFS, the problem is not
reproducible there. Apparently OpenZFS doesn't unmount the destination
dataset during receive, like FreeBSD does.
Mark Johnston [Thu, 20 Aug 2020 00:52:53 +0000 (00:52 +0000)]
Use pmap_mapbios() to map ACPI tables on amd64 and i386.
The ACPI table-mapping code used pmap_kenter_temporary() to create
mappings, which in turn uses the fixed-size crashdump map. Moreover,
the code was not verifying that the table fits in this map, so when
mapping large tables we could clobber adjacent mappings. This use of
pmap_kenter_temporary() appears to predate support in pmap_mapbios() for
creating early mappings, but that restriction no longer applies.
PR: 248746
Reviewed by: kib, mav
Tested by: gallatin, Curtis Villamizar <curtis@ipv6.occnc.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26125
Rick Macklem [Wed, 19 Aug 2020 23:42:33 +0000 (23:42 +0000)]
Add the MSG_TLSAPPDATA flag to indicate "return ENXIO" for non-application TLS
data records.
The kernel RPC cannot process non-application data records when
using TLS. It must to an upcall to a userspace daemon that will
call SSL_read() to process them.
This patch adds a new flag called MSG_TLSAPPDATA that the kernel
RPC can use to tell sorecieve() to return ENXIO instead of a non-application
data record, when that is what is at the top of the receive queue.
I put the code in #ifdef KERN_TLS/#endif, although it will build without
that, so that it is recognized as only useful when KERN_TLS is enabled.
The alternative to doing this is to have the kernel RPC re-queue the
non-application data message after receiving it, but that seems more
complicated and might introduce message ordering issues when there
are multiple non-application data records one after another.
I do not know what, if any, changes will be required to support TLS1.3.
Andrew Gallatin [Wed, 19 Aug 2020 17:59:06 +0000 (17:59 +0000)]
TCP: remove special treatment for hardware (ifnet) TLS
Remove most special treatment for ifnet TLS in the TCP stack, except
for code to avoid mixing handshakes and bulk data.
This code made heroic efforts to send down entire TLS records to
NICs. It was added to improve the PCIe bus efficiency of older TLS
offload NICs which did not keep state per-session, and so would need
to re-DMA the first part(s) of a TLS record if a TLS record was sent
in multiple TCP packets or TSOs. Newer TLS offload NICs do not need
this feature.
At Netflix, we've run extensive QoE tests which show that this feature
reduces client quality metrics, presumably because the effort to send
TLS records atomically causes the server to both wait too long to send
data (leading to buffers running dry), and to send too much data at
once (leading to packet loss).
Warner Losh [Wed, 19 Aug 2020 17:10:09 +0000 (17:10 +0000)]
Document the VFS FS events
MOUNT notifies when a filesystem is mounted
REMOUNT notifies when a filesystem is mounted again
UNMOUNT notifies when a filesystem is unmounted
These events are asynchronous to the actual state of the event (though the data
is recorded at a time when it is stable). The mount event is reported after the
filesystem is mounted. However, in the interim it may be unmounted by another
agent. Likewise, umount is called just before the mountpoint is finished tearing
down. It may be remounted (or maybe if the process scheduling is wonky and devd
gets to run before the last few steps are complete).
Dimitry Andric [Wed, 19 Aug 2020 17:05:30 +0000 (17:05 +0000)]
Fix the mips64 world build after r364284.
Linking the full version of clang 11 results in errors similar to:
lld: error: /usr/src/contrib/llvm-project/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:736:(.text._ZN5clang4ento22CreateAnalysisConsumerERNS_16CompilerInstanceE+0xE0): relocation R_MIPS_CALL16 out of range: 48920 is not in [-32768, 32767]; references operator new(unsigned long)
Add -mxgot to the compilation flags for llvm libraries to work around
this error. This may be too big of a hammer, but it can always be
refined later.
Alexander Motin [Wed, 19 Aug 2020 16:09:36 +0000 (16:09 +0000)]
Remove some noisy ACPI tables messages from verbose dmesg.
Those messages were printed hundreds of times during boot, often multiple
times for each table. We already print information about the tables in
more organized form once to not duplicate it when random ACPI drivers are
attaching.
Avoid evaluating the XHCI control endpoint context.
The XHCI specification says that the XHCI controller should detect
reception of the USB device descriptors, and automatically update
the max packet size in the control endpoint context.
Warner Losh [Wed, 19 Aug 2020 02:18:11 +0000 (02:18 +0000)]
Three typos:
Amiga is a proper noun
Condition is traditionally spelled starting with 'c'
Some, but not all, of the over/under-voltage instances were hyphenated.
Since they are all adverb phrases, they all need to be hyphenated.
Robert Wing [Wed, 19 Aug 2020 00:09:39 +0000 (00:09 +0000)]
bectl(8): Fix output of `bectl list` for the 'Mountpoint' column.
Currently, the output of `bectl list` doesn't align the 'Mountpoint' column
correctly when the 'mounted' property of a boot environment dataset is longer
than the default column width.
Set the 'Mountpoint' column width to the boot environment dataset with the
longest 'mounted' property or to the default width, whichever is greater.
Mateusz Guzik [Tue, 18 Aug 2020 22:04:22 +0000 (22:04 +0000)]
linux: add sysctl compat.linux.use_emul_path
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
Conrad Meyer [Tue, 18 Aug 2020 20:59:10 +0000 (20:59 +0000)]
gdb(4): Support empty qSupported queries
Technically a client may send a qSupported query without specifying any
client features. We should respond with our supported list in that case
instead of bailing with error.
Mariusz Zaborski [Tue, 18 Aug 2020 19:48:04 +0000 (19:48 +0000)]
zfs: add an option to the bootloader to rewind the ZFS checkpoint
The checkpoints are another way of keeping the state of ZFS.
During the rewind, the pool has to be exported.
This makes checkpoints unusable when using ZFS as root.
Add the option to rewind the ZFS checkpoint at the boot time.
If checkpoint exists, a new option for rewinding a checkpoint will appear in
the bootloader menu.
We fully support boot environments.
If the rewind option is selected, the boot loader will show a list of
boot environments that existed before the checkpoint.
Since cubic calculates cwnd based on absolute
time, retaining RFC3465 (ABC) once-per-window updates
can lead to dramatic changes of cwnd in the convex
region. Updating cwnd for each incoming ack minimizes
this delta, preventing unintentional line-rate bursts.
Mark Johnston [Tue, 18 Aug 2020 14:17:14 +0000 (14:17 +0000)]
Fix handling of ancillary data on non-AF_UNIX Linux sockets.
After r340674, the "continue" would restart the loop without having
updated clen, resulting in an infinite loop. Restore the old behaviour
of simply ignoring all control messages on such sockets, since we
currently only implement handling for AF_UNIX-specific messages.
Reported by: syzkaller
Reviewed by: tijl
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26093
Andriy Gapon [Tue, 18 Aug 2020 12:14:01 +0000 (12:14 +0000)]
iicmux: fix a sign error in comparison
Because pcell_t is unsigned both sides of the comparison were treated as
such.
Because of that error iicmux created all possible sub-buses even if only
a subset was defined in the device tree.
Code was checking for NETMAP_{SW,HW}_RING in req->nr_ringid which
had already been masked by NETMAP_RING_MASK. Therefore, the comparisons
always failed and set NR_REG_ALL_NIC. Check against the original nmr
structure.
Peter Grehan [Tue, 18 Aug 2020 07:08:17 +0000 (07:08 +0000)]
Allow guest device MMIO access from bootmem memory segments.
Recent versions of UEFI have moved local APIC timer initialization into
the early SEC phase which runs out of ROM, prior to self-relocating
into RAM. This results in a hypervisor exit.
Currently bhyve prevents instruction emulation from segments that aren't
marked as "sysmem" aka guest RAM, with the vm_gpa_hold() routine failing.
However, there is no reason for this restriction: the hypervisor already
controls whether EPT mappings are marked as executable.
Warner Losh [Tue, 18 Aug 2020 06:18:18 +0000 (06:18 +0000)]
Document that PC Card will likely be removed before 13.
This was discussed in arch@ a while ago. Most of the 16-bit drivers that it
relied on have been removed. There's only a few other drivers remaining that
support it, and those are very rare the days (even the once ubiquitious wi(1)
is now quite rare).
Indvidual drivers will be handled separately before pccard itself is removed.
Warner Losh [Tue, 18 Aug 2020 06:07:34 +0000 (06:07 +0000)]
Modernize a bit.
Remove PC Card specific information. It's of little value these days and on
the way out after most of its drivers have been removed.
Use iwn instead of wi device.
John Baldwin [Mon, 17 Aug 2020 20:11:43 +0000 (20:11 +0000)]
Add a USE_GCC_TOOLCHAINS knob to make universe.
This uses GCC toolchains instead of LLVM on architectures supported by
GCC. Currently this uses GCC 6 on aarch64, amd64, i386, and mips.
Although this does also try to use GCC 6 on powerpc, it is always
skipped for now since a powerpc-gcc6 package is not available and the
structure of make universe makes it hard to skip a subset of arches
for a target. This should be short-lived as freebsd-gcc9 does include
a powerpc-gcc9 package so powerpc should work once this switches to
GCC 9.