Rick Macklem [Mon, 15 Feb 2021 02:16:58 +0000 (18:16 -0800)]
getdirentries.2: fix for NFS mounts
It was reported that getdirentries(2) was
returning dirents with d_off set to 0 for an NFS
mount.
This is believed to be correct behaviour at
this time (it may change for some NFS mounts
in the future), but is inconsistent with what the
getdirentries(2) man page says.
Toomas Soome [Sun, 21 Feb 2021 10:32:18 +0000 (12:32 +0200)]
loader: autoload_font will hung loader when there is no local console
If we start with console set to comconsole, the local
console (vidconsole, efi) is never initialized and attempt to
use the data can render the loader hung.
Mark Johnston [Mon, 22 Feb 2021 15:03:37 +0000 (10:03 -0500)]
m_uiotombuf_nomap(): Stop clearing PG_ZERO in newly allocated pages
The caller should not be passing M_ZERO in the first place, so PG_ZERO
will not be preserved by the page allocator and clearing it accomplishes
nothing.
Reviewed by: gallatin, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28808
Mark Johnston [Thu, 25 Feb 2021 23:49:47 +0000 (18:49 -0500)]
pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system. However, if PG_PS is set, there is no next level.
Reported by: rpokala
Reviewed by: kib
Tested by: rpokala
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28922
Address two incorrect calculations and enhance readability of PRR code
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
Mark Johnston [Wed, 24 Feb 2021 15:08:53 +0000 (10:08 -0500)]
iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
PR: 253583
Reviewed by: gallatin, erj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
Lutz Donnerhacke [Wed, 27 Jan 2021 20:19:14 +0000 (21:19 +0100)]
netgraph/ng_car: Add color marking code
Chained policing should be able to reuse the classification of
traffic. A new mbuf_tag type is defined to handle gereral QoS
marking. A new subtype is defined to track the color marking.
Handle partial data re-sending on ktls/sendfile on FreeBSD
Add a handler for EBUSY sendfile error in addition to
EAGAIN. With EBUSY returned the data still can be partially
sent and user code has to be notified about it, otherwise it
may try to send data multiple times.
Robert Watson [Tue, 16 Feb 2021 15:19:05 +0000 (15:19 +0000)]
Reimplemen FreeBSD/arm64 dtrace_gethrtime() to use the system timer.
dtrace_gethrtime() is the high-resolution nanosecond timestemp used
for the DTrace 'timestamp' built-in variable. The new implementation
uses the EL0 cycle counter and frequency registers in ARMv8-A. This
replaces a previous implementation that relied on an
instrumentation-safe implementation of getnanotime(), which provided
only timer resolution.
Robert Wing [Wed, 17 Feb 2021 09:22:23 +0000 (00:22 -0900)]
automount(8): fix absolute path when creating a mountpoint
When executing automount(8), it will attempt to create the directory where an
autofs filesystem is to be mounted. Explicity set the root path for this
directory to "/".
This fixes the issue where the directory being created was being treated as a
relative path instead of an absolute path (as expected).
Mark Johnston [Thu, 18 Feb 2021 15:50:57 +0000 (10:50 -0500)]
arm64: Include NUMA locality info in the CPU topology
The scheduler uses this topology to try and preserve locality when
migrating threads between CPUs and when performing work stealing.
Ensure that on NUMA systems it will at least take the NUMA topology into
account.
Mark Johnston [Mon, 22 Feb 2021 20:50:09 +0000 (15:50 -0500)]
vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import(). This
results in a boot-time hang on such platforms.
Lutz Donnerhacke [Tue, 26 Jan 2021 15:50:04 +0000 (16:50 +0100)]
netgraph/ng_vlan_rotate: IEEE 802.1ad VLAN manipulation netgraph type
This node is part of an A10-NSP (L2-BSA) development.
Carrier networks tend to stack three or more tags for internal
purposes and therefore hiding the service tags deep inside of the
stack. When decomposing such an access network frame, the processing
order is typically reversed: First distinguish by service, than by
other means.
This new netgragh node allows to bring the relevant VLAN in front (to
the out-most position). This way other netgraph nodes (like ng_vlan)
can operate on this specific type.
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
Dimitry Andric [Mon, 22 Feb 2021 20:01:09 +0000 (21:01 +0100)]
Fix possibly unitialized variables in __cxa_demangle_gnu3()
After 0ee0dbfb0d26cf4bc37f24f12e76c7f532b0f368 where I imported a more
recent libcxxrt snapshot, the variables 'rtn' and 'has_ret' could in
some cases be used while still uninitialized. Most obviously this would
lead to a jemalloc complaint about a bad free(), aborting the program.
Fix this by initializing a bunch variables in their declarations. This
change has also been sent upstream, with some additional changes to be
used in their testing framework.
Warner Losh [Thu, 11 Feb 2021 23:06:30 +0000 (16:06 -0700)]
efibootmgr: Check for efi supported after parsing args
Move the check for efi variables being supported to after parsing the args. This
allows '-h' to produce both as a normal user as well as on all systems.
Warner Losh [Wed, 17 Feb 2021 22:08:19 +0000 (15:08 -0700)]
uart: only use MSI on devices that advertise 1 MSI vector
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
Warner Losh [Fri, 19 Feb 2021 22:34:25 +0000 (15:34 -0700)]
boot: remove gptboot.efifat, it never should have been
conical hat reduction: Make sure we also remove gotboot.efifat. It was created,
briefly, and shouldn't have existed in the first place. Kill it at the same
place we kill boot1.efifat.
Mark Johnston [Fri, 19 Feb 2021 22:08:34 +0000 (17:08 -0500)]
iflib: Fix detach of pseudo interfaces
In commit 38bfc6dee33b we added an IFDI_DETACH() call to
iflib_pseudo_deregister() since it looked like it was missing. One is
present in the error-handling path of iflib_pseudo_register(). However,
the detach actually comes from the DEVICE_DETACH() method for the
above-mentioned device_t, so now we're calling IFDI_DETACH() twice when
destroying a pseudo interface.
Fix the problem by not calling IFDI_DETACH() from the device detach
routine. This way we can ensure that iflib de-initialization always
happens in a consistent order. It also ensures that you can't do silly
things like "devctl detach <pseudo ifnet>", which would previously
detach the driver without tearing down the corresponding ifnet.
PR: 253541
Reviewed by: erj
Fixes: 38bfc6dee33b
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28774
Mitchell Horne [Thu, 28 Jan 2021 17:49:47 +0000 (13:49 -0400)]
arm64: extend struct db_reg to include watchpoint registers
The motivation is to provide access to these registers from userspace
via ptrace(2) requests PT_GETDBREGS and PT_SETDBREGS.
This change breaks the ABI of these particular requests, but is
justified by the fact that the intended consumers (debuggers) have not
been taught to use them yet. Making this change now enables active
upstream work on lldb to begin using this interface, and take advantage
of the hardware debugging registers available on the platform.
PR: 252860
Reported by: Michał Górny (mgorny@gentoo.org)
Reviewed by: andrew, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Fri, 5 Feb 2021 21:46:48 +0000 (17:46 -0400)]
arm64: handle watchpoint exceptions from EL0
This is a prerequisite to allowing the use of hardware watchpoints for
userspace debuggers.
This is also a slight departure from the x86 behaviour, since `si_addr`
returns the data address that triggered the watchpoint, not the
address of the instruction that was executed. Otherwise, there is no
straightforward way for the application to determine which watchpoint
was triggered. Make a note of this in the siginfo(3) man page.
Reviewed by: jhb, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Tue, 9 Feb 2021 18:29:38 +0000 (14:29 -0400)]
arm64: validate breakpoint registers
In particular, we want to disallow setting breakpoints on kernel
addresses from userspace. The control register fields are validated or
ignored as appropriate.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Kyle Evans [Wed, 20 Jan 2021 17:53:05 +0000 (11:53 -0600)]
504ebd612ec: kern: sonewconn: set so_options before pru_attach()
Protocol attachment has historically been able to observe and modify
so->so_options as needed, and it still can for newly created sockets. 779f106aa169 moved this to after pru_attach() when we re-acquire the
lock on the listening socket.
Restore the historical behavior so that pru_attach implementations can
consistently use it. Note that some pru_attach() do currently rely on
this, though that may change in the future. D28265 contains a change to
remove the use in TCP and IB/SDP bits, as resetting the requested linger
time on incoming connections seems questionable at best.
This does move the assignment out from under the head's listen lock, but
glebius notes that head won't be going away and applications cannot
assume any specific ordering with a race between a connection coming in
and the application changing socket options anyways.
TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in
exactly the same form that it appears here modulo slightly different
context. It used to be the case that there was a single pr_usrreq
method with requests dispatched to it; these exact two lines appeared in
tcp_usrreq's PRU_ATTACH handling.
The only purpose of this that I can find is to cause surprising behavior
on accepted connections. Newly-created sockets will never hit these
paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is
set on a listening socket and inherited, one would expect the timeout to
be inherited rather than changed arbitrarily like this -- noting that
SO_LINGER is nonsense on a listening socket beyond inheritance, since
they cannot be 'connected' by definition.
Neither Illumos nor Linux reset the timer like this based on testing and
inspection of Illumos, and testing of Linux.
Fix divide-by-zero panic when ASLR is enabled and superpages disabled
When locating the anonymous memory region for a vm_map with ASLR
enabled, we try to keep the slid base address aligned on a superpage
boundary to minimize pagetable fragmentation and maximize the potential
usage of superpage mappings. We can't (portably) do this if superpages
have been disabled by loader tunable and pagesizes[1] is 0, and it
would be less beneficial in that case anyway.
hkbd: Fix handling of keyboard ErrorRollOver reports
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
ukbd: Fix handling of keyboard ErrorRollOver reports
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
hidraw: Make HIDIOCGRDESCSIZE ioctl return report descriptor size
defined by hardware rather than cached one to match HIDIOCGRDESC ioctl.
This fixes errors reported by hid-tools being run against /dev/hidraw#
device node belonging to driver which overloads report descriptor.
ofwfb: fix incorrect colors on powerpc* and add new tunable parameters
- Implements little-endian support (powerpc64le)
- Adds 'hw.ofwfb.physaddr' kernel parameter so user can manually
provide correct address if it's not detected correctly
- Adds 'hw.ofwfb.argb32_pixel' so user can set it manually if
colors are inverted due to incorrect pixel format (default = 1)
- Automatically selects RGBA32 pixel format if NVidia graphic adapter
is detected (sets hw.ofwfb.argb32_pixel=0)
Machines equipped with NVidia graphic adapters tend to use RGBA32
pixel format. By default ARGB32 pixel format is used, proved to work
on machines equipped with ATI graphic adapter and the onboard adapter
used on Talos II and Blackbird machines from Raptor Computing Systems.
Mitchell Horne [Wed, 20 Jan 2021 15:07:53 +0000 (11:07 -0400)]
cgem: improve usage of busdma(9) KPI
BUS_DMA_NOCACHE should only be used when one needs to guarantee the
created mapping has uncached memory attributes, usually as a result
of buggy hardware. Normal use cases should pass BUS_DMA_COHERENT, to
create an appropriate mapping based on the flags passed to
bus_dma_tag_create().
This should have no functional change, since the DMA tags in this driver
are created without the BUS_DMA_COHERENT flag.
Reported by: mmel
Reviewed by: mmel, Thomas Skibo <thomas-bsd@skibo.net>
Thomas Skibo [Mon, 11 Jan 2021 20:58:12 +0000 (16:58 -0400)]
ddb: fix show devmap output on 32-bit arm
The output has been broken since 1b6dd6d772ca. Casting to uintmax_t
before the call to printf is necessary to ensure that 32-bit addresses
are interpreted correctly.
Yannis Planus [Thu, 28 Jan 2021 13:59:07 +0000 (14:59 +0100)]
pf: duplicate frames only once when using dup-to pf rule
When using DUP-TO rule, frames are duplicated 3 times on both output
interfaces and duplication interface. Add a flag to not duplicate a
duplicated frame.
Inspired by a patch from Miłosz Kaniewski milosz.kaniewski at gmail.com
https://lists.freebsd.org/pipermail/freebsd-pf/2015-November/007886.html
Toomas Soome [Thu, 28 Jan 2021 07:45:47 +0000 (09:45 +0200)]
loader: unload command should reset tg_kernel_supported in gfx_state
While loading kernel, we check if vt/vbe backend support is included in
kernel and set the tg_kernel_supported flag in gfx_state. unload
command needs to reset this flag to allow next load to perform
this check with new kernel.
Dimitry Andric [Wed, 27 Jan 2021 21:28:43 +0000 (22:28 +0100)]
Fix loader detection of vbefb support on !amd64
On i386, after 6c7a932d0b8baaaee16eca0ba061bfa6e0e57bfd, the vbefb vt
driver was no longer detected by the loader, if any kernel module was
loaded after the kernel itself.
This was caused by the parse_vt_drv_set() function being called multiple
times, resetting the detection flag. (It was called multiple times,
becuase i386 .ko files are shared objects like the kernel proper, while
this is not the case on amd64.)
Fix this by skipping the set_vt_drv_set lookup if vbefb was already
detected.
Mark Johnston [Tue, 16 Feb 2021 14:30:21 +0000 (09:30 -0500)]
linux: Unmap the VDSO page when unloading
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO. When destroying the VDSO object, we failed to
destroy the mapping and free KVA. Fix this.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28696
Roger Pau Monné [Wed, 20 Jan 2021 18:40:51 +0000 (19:40 +0100)]
xen-blkback: fix leak of grant maps on ring setup failure
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
Alexander Motin [Sun, 24 Jan 2021 19:23:04 +0000 (14:23 -0500)]
Exclude reserved iSCSI Initiator Task Tag.
RFC 7143 (11.2.1.8):
An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a
task by the initiator. The only instance in which it may be seen on
the wire is in a target-initiated NOP-In PDU (Section 11.19) and in
the initiator response to that PDU, if necessary.
Alexander Motin [Sun, 24 Jan 2021 18:58:29 +0000 (13:58 -0500)]
Exclude reserved iSCSI Target Transfer Tag.
RFC 7143 (11.7.4):
The Target Transfer Tag values are not specified by this protocol,
except that the value 0xffffffff is reserved and means that the
Target Transfer Tag is not supplied.
Alexander Motin [Wed, 17 Feb 2021 02:15:28 +0000 (21:15 -0500)]
cxgbe(4): Save proper zone index on low memory in refill_fl().
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones. But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx. It caused problems
with the later use of that cluster, including memory and/or data
corruption.
While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: np, jhb
Differential Revision: https://reviews.freebsd.org/D28716