Ed Maste [Sun, 10 May 2020 16:11:19 +0000 (16:11 +0000)]
Add pkgbase METALOG parse/check tool
`metalog.lua` is a script that reads METALOG file created by pkgbase
(make packages) and generates reports about the installed system
and issues.
This was developed as part of Yang's W2020 University of Waterloo co-
operative education term with the FreeBSD Foundation. kevans provided
some initial review; we will iterate on it in the tree.
Submitted by: Yang Wang <2333@outlook.jp>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24563
Emmanuel Vadot [Sun, 10 May 2020 14:09:30 +0000 (14:09 +0000)]
qnlx: Do not redifines types.
r360870 added linux/slab.h into liunx/bitmap.h and this include linux/types.h
The qlnx driver is redefining some of those types so remove them and add an
explicit linux/types.h include.
Pointy hat: manu
Reported by: Austin Shafer <ashafer@badland.io>
Last user of rtalloc1() KPI has been eliminated in rS360631.
As kernel is now fully switched to use new routing KPI defined in
rS359823, remove old lookup functions.
Toomas Soome [Sat, 9 May 2020 06:25:20 +0000 (06:25 +0000)]
loader: vdev_read() can corrupt memory
When reading less than sector size but from sector boundary,
the vdev_read() will read full sector into the provided buffer
and therefore corrupting memory past buffer end.
Kyle Evans [Sat, 9 May 2020 02:01:29 +0000 (02:01 +0000)]
installworld: attempt a certctl rehash at the tail end
This can be run as root or normal user with no problem; if they hadn't
twisted the WITHOUT_CAROOT knob, we'll attempt to use the host certctl to
rehash the DESTDIR. This would allow one to build systems WITHOUT_OPENSSL +
WITH_CAROOT with a populated /etc/ssl that they can then use with an
appropriate *ssl from somewhere else.
Cross-builds are fine because this will always use the host certctl, or just
nag if it's missing and it wasn't a WITHOUT_CAROOT build.
Alan Somers [Fri, 8 May 2020 23:00:02 +0000 (23:00 +0000)]
fusefs: fix two small bugs in the tests' expectations
These two errors have been present since the tests' introduction.
Coincidentally every test (I think there's only one) that cares about that
field also works when the field's value is 0.
Embed dst sockaddr into rtentry and remove rte packet counter
Currently each rtentry has dst&gateway allocated separately from another zone,
bloating cache accesses.
Current 'struct rtentry' has 12 "mandatory" radix pointers in the beginning,
leaving 4 usable pointers/32 bytes in the first 2 cache lines (amd64).
Fields needed for the datapath are destination sockaddr and rt_nhop.
So far it doesn't look like there is other routable addressing protocol other
than IPv4/IPv6/MPLS, which uses keys longer than 20 bytes.
With that in mind, embed dst into struct rtentry, making the first 24 bytes
of rtentry within 128 bytes. That is enough to make IPv6 address within first
128 bytes.
It is still pretty easy to add code for supporting separately-allocated dst,
however it doesn't make a lot of sense in having such code without a use case.
As rS359823 moved the gateway to the nexthop structure, the dst embedding change
removes the need for any additional allocations done by rt_setgate().
Lastly, as a part of cleanup, remove counter(9) allocation code, as this field
is not used in packet processing anymore.
Adrian Chadd [Fri, 8 May 2020 17:01:33 +0000 (17:01 +0000)]
[net80211] Use the unicast key when transmitting DWDS AP multicast frames.
I'm still not sure whether this is the full solution, but here goes.
I have a two node DWDS setup - a main AP with the ethernet bridge uplink
and a satellite AP in the back of the house. They're both AR9344+AR9580
dual band 11n APs.
The problem was that multicast frames was not going from the DWDS AP to
the DWDS STA. Unicast frames are fine, and multicast frames from the
DWDS STA to AP are fine.
Now, multicast and unicast frames from the STA -> AP are just transmitted
using the unicast key. That's fine. However, the AP -> STA multicast
frames by default are transmitted using the current default / multicast
key, the shared one between all STAs in a BSS. Now, the DWDS implementation
ignores non WDS frames - it only allows about 4 address frames outside
of management / EAPOL frames! - so the STA side ignores the normal multicast
frames.
Instead, the AP side uses ieee80211_dwds_mcast() to send multicast frames
to each WDS VAP that was created as part of the "dynamic" part of DWDS.
This should be queuing them individually to each node instead of using
the normal multicast send path; and this is how they should get turned into
4-addr WDS frames.
HOWEVER, ieee80211_encap() was trying to use the default TX key to queue
them rather than the unicast key that's already setup. Since this synthetic
node doesn't have the default TX key setup, transmission fails. Things
would be fine in WEP and in open mode because in both cases you would
have static keys (or no keys) setup. It just fails in WPA mode.
This resolves the issue. AP DWDS multicast is now sent using the unicast
key just like in STA mode and I'm pretty sure the STA mode side will stil
work fine (as it's a STA VAP with a DWDS flag..)
Mark Johnston [Fri, 8 May 2020 14:38:48 +0000 (14:38 +0000)]
Reinitialize thread0's stack base after enabling XSAVE.
Otherwise the initial call to set_top_of_stack(), which occurs before
fpuinit() sets the correct value for cpu_max_ext_state_size, leaves the
stack base at an incorrect location. Then, when the full area is
zeroed, we end up erroneously zeroing part of the following page.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24754
Alan Somers [Fri, 8 May 2020 02:42:15 +0000 (02:42 +0000)]
Fix the sys.geom.class.multipath.misc.fail_on_error test on stable/12
This test uses a gnop feature (delay probability) that isn't available on
stable/12. But it's unnecessary; the test works fine without it. Removing
it simplifies the test and, once MFCed, will allow it to pass on stable/12.
getMainExecutable: Fix hand-rolled AT_EXECPATH for older FreeBSD
Once we hit AT_NULL, we need to bail out of the loop; not just the
enclosing switch. This fixes basic usage (e.g. `cc --version`) when
AT_EXECPATH isn't present on older branches (e.g. under
emu-user-static, at the moment), where we would previously run off
the end of ::environ.
Randall Stewart [Thu, 7 May 2020 20:29:38 +0000 (20:29 +0000)]
When in the SYN-SENT state bbr and rack will not properly send an ACK but instead start the D-ACK timer. This
causes so_reuseport_lb_test to fail since it slows down how quickly the program runs until the timeout occurs
and fails the test
Sponsored by: Netflix inc.
Differential Revision: https://reviews.freebsd.org/D24747
Brandon Bergren [Thu, 7 May 2020 19:32:49 +0000 (19:32 +0000)]
[PowerPC] kernel ifunc support for powerpc*, fix ppc64 relocation oddities.
This is a general cleanup of the relocatable kernel support on powerpc,
needed to enable kernel ifuncs.
* Fix some relocatable issues in the kernel linker, and change to using
a RELOCATABLE_KERNEL #define instead of #ifdef __powerpc__ for parts that
other platforms can use in the future if they wish to have ET_DYN kernels.
* Get rid of the DB_STOFFS hack now that the kernel is relocated to the DMAP
properly across the board on powerpc64.
* Add powerpc64 and powerpc32 ifunc functionality.
* Allow AIM64 virtual mode OF kernels to run from the DMAP like other AIM64
by implementing a virtual mode restart. This fixes the runtime address on
PowerMac G5.
* Fix symbol relocation problems on post-relocation kernels by relocating
the symbol table.
* Add an undocumented method for supplying kernel symbols on powernv and
other powerpc machines using linux-style kernel/initrd loading -- If
you pass the kernel in as the initrd as well, the copy resident in initrd
will be used as a source for symbols when initializing the debugger.
This method is subject to removal once we have a better way of doing this.
ps: extend the non-standard option -d (tree view) to work with -p
Initially it seemed that there were multiple possible ways to do it.
Processing option -p could conditionally add selected processes and
their descendants to the list for further work, but it is not guaranteed
to know whether the -d option has been used or not, and it also doesn't
have access to the process list just yet.
There is also descendant_sort() which has access to all possibly needed
information, but serves the purely post-processing purpose of sorting
output.
Then there is the loop that uses invocation information and full process
list to create a list of processes for final display. It seems the most
natural place to implement this, but indeterminate state of the process
list and volatility of the final list that is being created obstruct
adding an elegant search for all elements of process descendancy trees.
So I opted for adding another loop, just before the one I mentioned
above. For all selected processes it conditionally adds direct
descendants to the end of this list of selected processes.
Andriy Gapon [Thu, 7 May 2020 13:11:32 +0000 (13:11 +0000)]
gpioiic_attach: fix a NULL pointer crash on hints-based systems
The attach method uses GPIO_GET_BUS() to get a "newbus" device
that provides a pin. But on hints-based systems a GPIO controller
driver might not be fully initialized yet and it does not know gpiobus
hanging off it. Thus, GPIO_GET_BUS() cannot be called yet.
The reason is that controller drivers typically create a child gpiobus
using gpiobus_attach_bus() and that leads to the following call chain:
gpiobus_attach_bus() -> gpiobus_attach() ->
bus_generic_attach(gpiobus) -> gpioiic_attach().
So, gpioiic_attach() is called before gpiobus_attach_bus() returns.
I observed this bug with nctgpio driver on amd64.
I think that the problem was introduced in r355276.
The fix is to avoid calling GPIO_GET_BUS() from the attach method.
Instead, we know that on hints-based systems only the parent gpiobus can
provide the pins.
Nothing is changed for FDT-based systems.
Marcin Wojtas [Thu, 7 May 2020 11:28:39 +0000 (11:28 +0000)]
Optimize ENA Rx refill for low memory conditions
Sometimes, especially when there is not much memory in the system left,
allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
and it is not guaranteed that it'll succeed. In that situation, the
fallback will work, but if the refill needs to take a place for a lot of
descriptors at once, the time spent in m_getjcl looking for memory can
cause system unresponsiveness due to high priority of the Rx task. This
can also lead to driver reset, because Tx cleanup routine is being
blocked and timer service could detect that Tx packets aren't cleaned
up. The reset routine can further create another unresponsiveness - Rx
rings are being refilled there, so m_getjcl will again burn the CPU.
This was causing NVMe driver timeouts and resets, because network driver
is having higher priority.
Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
more space than 9KB will be needed.
However, 9KB jumbo clusters can still cause issues, so by default the
page size mbuf cluster will be used for the Rx descriptors. This can have a
small (~2%) impact on the throughput of the device, so to restore
original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to
"1" in "/boot/loader.conf" file.
As a part of this patch (important fix), the version of the driver
was updated to v2.1.2.
Randall Stewart [Thu, 7 May 2020 10:46:02 +0000 (10:46 +0000)]
NF has an internal option that changes the tcp_mcopy_m routine slightly (has
a few extra arguments). Recently that changed to only have one arg extra so
that two ifdefs around the call are no longer needed. Lets take out the
extra ifdef and arg.
Jessica Clarke [Wed, 6 May 2020 23:31:30 +0000 (23:31 +0000)]
virtio: Support MMIO bus for all devices
The bus is independent of the device, so all devices can be attached to
either a PCI bus or an MMIO bus. For example, QEMU's virtio-rng-device
gives the MMIO variant of virtio-rng-pci, and is now detected.
Jessica Clarke [Wed, 6 May 2020 23:28:51 +0000 (23:28 +0000)]
virtio_mmio: Support non-transitional version 2 devices
The non-legacy virtio MMIO specification drops the use of PFNs and
replaces them with physical addresses. Whilst many implementations are
so-called transitional devices, also implementing the legacy
specification, TinyEMU[1] does not. Device-specific configuration
registers have also changed to being little-endian, and must be accessed
using a single aligned access for registers up to 32 bits, and two
32-bit aligned accesses for 64-bit registers.
John Baldwin [Wed, 6 May 2020 22:15:09 +0000 (22:15 +0000)]
Deprecate ubsec(4) for FreeBSD 13.0.
With the removal of in-tree consumers of DES, Triple DES, and
MD5-HMAC, the only algorithm this driver still supports is SHA1-HMAC.
This is not very useful as a standalone algorithm (IPsec AH-only with
SHA1 would be the only user).
This driver has also not been kept up to date with the original driver
in OpenBSD which supports a few more cards and AES-CBC on newer cards.
The newest card currently supported by this driver was released in
2005.
Dimitry Andric [Wed, 6 May 2020 19:10:39 +0000 (19:10 +0000)]
Merge commit 4ca2cad94 from llvm git (by Justin Hibbits):
[PowerPC] Add clang -msvr4-struct-return for 32-bit ELF
Summary:
Change the default ABI to be compatible with GCC. For 32-bit ELF
targets other than Linux, Clang now returns small structs in
registers r3/r4. This affects FreeBSD, NetBSD, OpenBSD. There is no
change for 32-bit Linux, where Clang continues to return all structs
in memory.
Add clang options -maix-struct-return (to return structs in memory)
and -msvr4-struct-return (to return structs in registers) to be
compatible with gcc. These options are only for PPC32; reject them on
PPC64 and other targets. The options are like -fpcc-struct-return and
-freg-struct-return for X86_32, and use similar code.
To actually return a struct in registers, coerce it to an integer of
the same size. LLVM may optimize the code to remove unnecessary
accesses to memory, and will return i32 in r3 or i64 in r3:r4.
Ed Maste [Wed, 6 May 2020 18:38:40 +0000 (18:38 +0000)]
binutils: disconnect objdump from the build
The in-tree binutils is old and will not be updated. It does not support
all archs supported by FreeBSD, and for the archs it does support not all
CPU features are supported.
Other tools have migrated to copyfree alternatives. Although llvm-objdump
is nearly a drop-in replacement for GNU objdump it is missing a few options
and has some differences in output format. For now just remove GNU objdump;
ports and developers can use a contemporary, maintained version from ports
or packages. We can revisit installing llvm-objdump as objdump in the
future.
PR: 212319 [exp-run]
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7338
Dimitry Andric [Wed, 6 May 2020 18:13:00 +0000 (18:13 +0000)]
In r358396 I merged llvm upstream commit 2e24219d3, which fixed "error:
unsupported relocation on symbol" when assembling arm 'adr' pseudo
instructions. However, the upstream commit did not take big-endian arm
into account.
Applying the same changes to the big-endian handling is straightforward,
thanks to Andrew Turner and Peter Smith for the hint. This will also be
submitted upstream.
MFC after: immediately, since this fix is meant for stable/11
Mark Johnston [Wed, 6 May 2020 15:01:06 +0000 (15:01 +0000)]
Simplify arm64's pmap_bootstrap() a bit.
locore constructs an L2 page mapping the kernel and preloaded data
starting a KERNBASE (the same as VM_MIN_KERNEL_ADDRESS on arm64).
initarm() and pmap_bootstrap() use the preloaded metadata to
tell it where it can start allocating from.
pmap_bootstrap() currently iterates over the L2 page to find the last
valid entry, but doesn't do anything with the result. Remove the loop
and zap some now-unused local variables.
Rick Macklem [Wed, 6 May 2020 00:44:03 +0000 (00:44 +0000)]
Delete unused function newnfs_trimleading.
The NFS function called newnfs_trimleading() has not been used by the
code in long time. To give you a clue, it still had a K&R style function
declaration.
Delete it, since it is just cruft, as a part of the NFS mbuf handling
cleanup in preparation for adding ext_pgs mbuf support.
The ext_pgs mbuf support for the build/send side is needed by
nfs-over-tls.
Conrad Meyer [Tue, 5 May 2020 17:55:45 +0000 (17:55 +0000)]
pwcache.3: Explicitly document OOM condition
The pwcache functions allocate memory, and may return NULL pointers if that
allocation fails and the corresponding uid or gid was not found in the local
password database. Document this behavior.
Michael Tuexen [Tue, 5 May 2020 17:52:44 +0000 (17:52 +0000)]
Fix the computation of the numbers of entries of the mapping array to
look at when generating a SACK. This was wrong in case of sequence
numbers wrap arounds.
Thanks to Gwenael FOURRE for reporting the issue for the userland stack:
https://github.com/sctplab/usrsctp/issues/462
MFC after: 3 days
Andriy Gapon [Tue, 5 May 2020 12:14:11 +0000 (12:14 +0000)]
acpi_video: try our best to work on systems without non-essential methods
Only _BCL and _BCM methods seem to be essential to the driver's
operation. If _BQC is missing then we can assume that the current
brightness is whatever we set by the last _BCM invocation. If _DCS or
_DGS is missing the we can make assumptions as well.
The change is based on a patch suggested by Anthony Jenkins
<Scoobi_doo@yahoo.com> in PR 207086.
PR: 207086
Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com (earlier version)
Reviewed by: manu
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24653
Rick Macklem [Tue, 5 May 2020 00:58:03 +0000 (00:58 +0000)]
Revert r360514, to avoid unnecessary churn of the sources.
r360514 prepared the NFS code for changes to handle ext_pgs mbufs on
the receive side. However, at this time, KERN_TLS does not pass
ext_pgs mbufs up through soreceive(). As such, as this time, only
the send/build side of the NFS mbuf code needs to handle ext_pgs mbufs.
Revert r360514 since the rather extensive changes required for receive
side ext_pgs mbufs are not yet needed.
This avoids unnecessary churn of the sources.
John Baldwin [Tue, 5 May 2020 00:02:04 +0000 (00:02 +0000)]
Initial support for bhyve save and restore.
Save and restore (also known as suspend and resume) permits a snapshot
to be taken of a guest's state that can later be resumed. In the
current implementation, bhyve(8) creates a UNIX domain socket that is
used by bhyvectl(8) to send a request to save a snapshot (and
optionally exit after the snapshot has been taken). A snapshot
currently consists of two files: the first holds a copy of guest RAM,
and the second file holds other guest state such as vCPU register
values and device model state.
To resume a guest, bhyve(8) must be started with a matching pair of
command line arguments to instantiate the same set of device models as
well as a pointer to the saved snapshot.
While the current implementation is useful for several uses cases, it
has a few limitations. The file format for saving the guest state is
tied to the ABI of internal bhyve structures and is not
self-describing (in that it does not communicate the set of device
models present in the system). In addition, the state saved for some
device models closely matches the internal data structures which might
prove a challenge for compatibility of snapshot files across a range
of bhyve versions. The file format also does not currently support
versioning of individual chunks of state. As a result, the current
file format is not a fixed binary format and future revisions to save
and restore will break binary compatiblity of snapshot files. The
goal is to move to a more flexible format that adds versioning,
etc. and at that point to commit to providing a reasonable level of
compatibility. As a result, the current implementation is not enabled
by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
for userland builds, and the kernel option BHYVE_SHAPSHOT.
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai
Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz
Relnotes: yes
Sponsored by: University Politehnica of Bucharest
Sponsored by: Matthew Grooms (student scholarships)
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D19495
Randall Stewart [Mon, 4 May 2020 23:02:58 +0000 (23:02 +0000)]
This fixes two issues found by ankitraheja09@gmail.com
1) When BBR retransmits the syn it was messing up the snd_max
2) When we need to send a RST we might not send it when we should
Michael Tuexen [Mon, 4 May 2020 22:02:49 +0000 (22:02 +0000)]
Enter the net epoch before calling the output routine in TCP BBR.
This was only triggered when setting the IPPROTO_TCP level socket
option TCP_DELACK.
This issue was found by runnning an instance of SYZKALLER.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24690
Randall Stewart [Mon, 4 May 2020 20:28:53 +0000 (20:28 +0000)]
This commit brings things into sync with the advancements that
have been made in rack and adds a few fixes in BBR. This also
removes any possibility of incorrectly doing OOB data the stacks
do not support it. Should fix the skyzaller crashes seen in the
past. Still to fix is the BBR issue just reported this weekend
with the SYN and on sending a RST. Note that this version of
rack can now do pacing as well.
Randall Stewart [Mon, 4 May 2020 20:19:57 +0000 (20:19 +0000)]
Adjust the fb to have a way to ask the underlying stack
if it can support the PRUS option (OOB). And then have
the new function call that to validate and give the
correct error response if needed to the user (rack
and bbr do not support obsoleted OOB data).
Sponsoered by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D24574
Brooks Davis [Mon, 4 May 2020 17:16:30 +0000 (17:16 +0000)]
Set LG_VADDR to 48 on RISC-V.
The Sv48 PTE format is the largest currently defined address space for
RISC-V. It makes no sense to define a larger size and doing so (at
least for 64-bits) forces rtrees down a slow path.
After converting routing subsystem customers to use nexthop objects
defined in r359823, some fields in struct rtentry became unused.
This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry
along with the code initializing and updating these fields.
Cleanup of the remaining fields will be addressed by D24669.
This commit also changes the implementation of the RTM_CHANGE handling.
Old implementation tried to perform the whole operation under radix WLOCK,
resulting in slow performance and hacks like using RTF_RNH_LOCKED flag.
New implementation looks up the route nexthop under radix RLOCK, creates new
nexthop and tries to update rte nhop pointer. Only last part is done under
WLOCK.
In the hypothetical scenarious where multiple rtsock clients
repeatedly issue RTM_CHANGE requests for the same route, route may get
updated between read and update operation. This is addressed by retrying
the operation multiple (3) times before returning failure back to the
caller.
[evdev] Add AT translated set1 scancodes for F-unlocked F1-12 keys.
"F lock" is a switch between two sets of scancodes for function keys F1-F12
found on some Logitech and Microsoft PS/2 keyboards [1]. When "F lock" is
pressed, then F1-F12 act as function keys and produce usual keyscans for
these keys. When "F lock" is depressed, F1-F12 produced the same keyscans
but prefixed with E0.
Some laptops use [2] E0-prefixed F1-F12 scancodes for non-standard keys.
Alexander Motin [Sun, 3 May 2020 16:14:55 +0000 (16:14 +0000)]
Add session locking in cfiscsi_ioctl_handoff().
While there, remove ifdef around cs_target check in cfiscsi_ioctl_list().
I am not sure why this ifdef was added, but without this check code will
crash below on NULL dereference.