Don't access a user buffer directly from the kernel.
The handle_string callback for the ENCIOC_SETSTRING ioctl was passing
a user pointer to memcpy(). Fix by using copyin() instead.
For ENCIOC_GETSTRING ioctls, the handler was storing the user pointer
in a CCB's data_ptr field where it was indirected by other code. Fix
this by allocating a temporary buffer (which ENCIOC_SETSTRING already
did) and copying the result out to the user buffer after the CCB has
been processed.
These two sysctls were added to support UFS softupdates journalling
with snapshots. However, the changes to fsck to use them were never
committed and there have never been any in-tree uses of these sysctls.
More details from Kirk:
When journalling got added to soft updates, its journal rollback freed
blocks that it thought were no longer in use. But it does not take
snapshots into account (i.e., if a snapshot is still using it, then it
cannot be freed). So I added the needed logic to fsck by having the
free go through the kernel's blkfree code so it could grab blocks that
were still needed by snapshots. That is done using the setbufoutput
hack. I never got that code working reliably, so it is still sitting
in my work directory. Which also explains why you still cannot take
snapshots on filesystems running with journalling...
In looking over my use of this feature, and in particular the troubles
I was having with it, I conclude that it may be better to extract the
code from the kernel that handles freeing blocks claimed by snapshots
and putting it into fsck directly. My original intent was that it is
complex and at the time changing, so only having to maintain it in one
place was appealing. But at this point it has not changed in years and
the hacks like setinode and setbufoutput to be able to use the kernel
code is sufficiently ugly, that I am leaning towards just extracting
it.
Handle non-dtrace-triggered kernel breakpoint traps in mips.
If DTRACE is enabled at compile time, all kernel breakpoint traps are
first given to dtrace to see if they are triggered by a FBT probe.
Previously if dtrace didn't recognize the trap, it was silently
ignored breaking the handling of other kernel breakpoint traps such as
the debug.kdb.enter sysctl. This only returns early from the trap
handler if dtrace recognizes the trap and handles it.
The current situation results in intermittent breakage if data gets split up
with the sign bit set on the data1 half of it, as PAIR32TO64 will then:
data1 | (data2 << 32) -> resulting in data1 getting sign-extended when it's
implicitly widened and clobbering the result. AFAICT, there's no compelling
reason for these to be signed.
This was most exposed by flakiness in the kqueue timer tests under compat32
after the ABSTIME test got switched over to using a better clock and
microseconds.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24518
Factor out the kmem contig page alloc and reclamation code.
kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each
other's page allocation and reclamation logic. Place it in a single
function to make it easier to add additional consumers. No functional
change intended.
Reviewed by: jeff, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24475
Correctly set up the initial TCP congestion window
in all cases, by adjust snd_una right after the
connection initialization, to include the one byte
in sequence space occupied by the SYN bit.
This does not change the regular ACK processing,
while making the BYTES_THIS_ACK macro to work properly.
This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.
Add some prose and a diagram describing the layout of the cipher IV
for AES-CTR and AES-GCM and how it relates to the ESP IV stored in the
packet after the ESP header. Also, remove an XXX comment about the
initial block counter value used for AES-CTR in esp_output as the
current code matches the RFC (and the equivalent code in esp_input
didn't have the XXX comment).
The sole in-tree user of this flag has been retired, so remove this
complexity from all drivers. While here, add a helper routine drivers
can use to read the current request's IV into a local buffer. Use
this routine to replace duplicated code in nearly all drivers.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24450
This is the only place that uses CRYPTO_F_IV_GENERATE. All crypto
drivers currently duplicate the same boilerplate code to handle this
case. Doing the generation directly removes complexity from drivers.
It also simplifies support for separate input and output buffers.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24449
dim [Mon, 20 Apr 2020 19:16:10 +0000 (19:16 +0000)]
Merge commit 64b31d96d from llvm git (by Nemanja Ivanovic):
[PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without
FPCVT
We call the function that attempts to reuse the conversion without
checking whether the target matches the constraints that the callee
expects. This patch adds the check prior to the call.
This should fix 'Assertion failed: ((Op.getOpcode() == ISD::FP_TO_SINT
|| Subtarget.hasFPCVT()) && "i64 FP_TO_UINT is supported only with
FPCVT"), function LowerFP_TO_INTForReuse, file
/usr/src/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp, line 7276'
when building the devel/libslang2 port (and a few others) for PowerPC64.
cem [Mon, 20 Apr 2020 18:01:45 +0000 (18:01 +0000)]
acpi_ec(4): Do not probe "successfully" if an error occurred
All of the 'goto out;' cases in this probe routine without explicit
initialization of 'ret' indicate error cases and were clearly intended
to use the initial definition of 'ret' with ENXIO. However, 'ret' was
accidentally squashed by reuse for a subroutine call near the beginning
of probe.
Use a different variable for the subroutine status to preserve ENXIO ret
for the 'goto out's as a minimal solution to the panic reported at attach
for now.
dim [Mon, 20 Apr 2020 17:39:51 +0000 (17:39 +0000)]
Merge commit ce5173c0e from llvm git (by Reid Kleckner):
Use FinishThunk to finish musttail thunks
FinishThunk, and the invariant of setting and then unsetting
CurCodeDecl, was added in 7f416cc42638 (2015). The invariant didn't
exist when I added this musttail codepath in ab2090d10765 (2014).
Recently in 28328c3771, I started using this codepath on non-Windows
platforms, and users reported problems during release testing
(PR44987).
The issue was already present for users of EH on i686-windows-msvc,
so I added a test for that case as well.
This should fix 'Assertion failed: (!empty() && "popping exception stack
when not empty"), function popTerminate, file
/usr/src/contrib/llvm-project/clang/lib/CodeGen/CGCleanup.h, line 583'
when building the net-p2p/libtorrent-rasterbar
Change kern.evdev.rcpt_mask from 3 to 12 by default. This makes us much
more evdev-friendly, and will prevent everyone using xorg and wayland with
evdev devices (the default) from needing to change this locally.
powerpc32 still uses the old value for the keyboard part, becaues the adb
keyboard driver used there is not evdev compatible.
This matches GNU diff(1) behavior and, more importantly, eliminates any
source of confusion if multiple formatting options are specified.
Note that the committed diff differs slightly from the submitted: I've
modified it so that we initialize diff_format to something that isn't an
accepted format option so that we can also reject --normal -c and -c
--normal, which would've otherwise been accepted because the default was
--normal. After option parsing we default it to D_NORMAL if it's still
unset.
Allow namespace-id specification where it makes sense.
It makes tool more convenient to not require user to explicitly convert
namespace device name into controller device name. There should be no
changes to already existing syntaxes.
Handle trashed queue pointers in vm_page_acquire_unlocked().
vm_page_acquire_unlocked() relies on type-stability of vm_page
structures and assumes that the listq linkage pointers always point to a
vm_page or are NULL. QUEUE_MACRO_DEBUG_TRASH breaks that assumption, so
add an explicit check for a trashed queue pointer before dereferencing.
Reported and tested by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24472
tests: kqueue: fix some issues with now() on ILP32 platforms
There were ultimately two separate problems here:
- a 32-bit long cannot represent microseconds since 1970 (noted by ian)
- time_t is 32-bit on i386, so now() was wrong anyways even with the correct
return type.
For the first, just explicitly use a uint64_t for now() and all of the
callers. For the second, we need to explicitly cast tv_sec to uint64_t
before it gets multiplied in the SEC_TO_US macro. Casting this instance
rather than generally in the macro was arbitrarily chosen simply because all
other uses are converting small relative time values.
The tests now pass on i386, at least; presumably other ILP32 will be fine
now as well.
cem [Sun, 19 Apr 2020 23:53:47 +0000 (23:53 +0000)]
vmm(4): Bump VM_MAX_MEMMAPS for vmgenid
As a short term solution for the problem reported by Shawn Webb re: r359950,
bump the maximum number of memmaps per VM. This structure is 40 bytes, and the
additional four (fixed array embedded in the struct vm) members increase the
size of struct vm by 3%.
(The vast majority of struct vm is the embedded struct vcpu array, which
accounts for 84% of the size -- over 4 kB.)
Conditionally install Kerberos rc files based on MK_KERBEROS_SCRIPTS
instead of MK_KERBEROS. The reason for this change is some users
prefer to build FreeBSD WITHOUT_KERBEROS, wanting to retain the
Kerberos rc scripts to start/stop MIT Kerberos or Heimdal from ports.
PR: 197337
Reported by: Adam McDougall <ebay at looksharp.net>
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D24252
bridge tests: Ensure that bridges in different jails get different MAC addresses
We used to have a problem where bridges created in different vnet jails
would end up having the same mac address. This is now fixed by
including the jail name as a seed for the mac address generation, but we
should verify that it doesn't regress.
Both DIOCCHANGEADDR and DIOCADDADDR take a struct pf_pooladdr from
userspace. They failed to validate the dyn pointer contained in its
struct pf_addr_wrap member structure.
This triggered assertion failures under fuzz testing in
pfi_dynaddr_setup(). Happily the dyn variable was overruled there, but
we should verify that it's set to NULL anyway.
ifa_grouplookup() uses the data loaded in ifa_load() (through is_a_group()), so
we must call ifa_load() before we can rely on any of the data it populates.
According to the upstream man page (which we don't install), none of
libauditd's symbols are intended to be public. Also, I can't find any
evidence for a port that uses libauditd. Therefore, we should treat it like
other such libraries and use PRIVATELIB.
pmap_bootstrap() expects the kernel's physical load address, but we have
been providing the start of physical memory. This had the nice effect of
protecting the memory used by the SBI runtime firmware, but now that we
have alternate means of achieving that, we should provide the correct
value. This will free up any memory between the SBI firmware and the
kernel for allocation.
The device tree may contain a "reserved-memory" node, whose purpose is
to communicate sections of physical memory that should not be used for
general allocations. Add the logic to parse and exclude these regions.
The particular motivation for this is protection of the SBI runtime
firmware. Currently, there is no mechanism through which the SBI
can communicate the details of its reserved memory region(s) to
a supervisor payload. There has been some discussion recently on how
this can be achieved [1], and it seems that the path going forward
will be to add an entry to the reserved-memory node.
This hasn't caused any issues for us yet, since we exclude all physical
memory below the kernel's load address from being allocated, and on all
currently supported platforms this covers the SBI firmware region. This
will change in another commit, so as a safety measure, ensure that the
lowest 2MB of memory is excluded if this region has not been reported.
Replace our hand-rolled functions with the generic ones provided by
kern/subr_physmem.c. This greatly simplifies the initialization of
physical memory regions and kernel globals.
Tested by: nick
Differential Revision: https://reviews.freebsd.org/D24154
The arm_physmem interface found in arm's MD code provides a convenient
set of routines for adding/excluding physical memory regions and
initializing important kernel globals such as Maxmem, realmem,
phys_avail[], and dump_avail[]. It is especially convenient for FDT
systems, since we can use FDT parsing functions and pass the result
directly to one of these physmem routines. This interface is already in
use on arm and arm64, and can be used to simplify this early
initialization on RISC-V as well.
This requires only a couple trivial changes:
- Move arm_physmem_kernel_addr to arm/machdep.c. It is unused on arm64,
and manipulated entirely in arm MD code.
- Convert arm32_btop/arm64_btop to atop. This is equivalently defined
on all architectures.
- Drop the "arm" prefix.
Add missing feature descriptions to hci_features2str().
The list of possible features in hccontrol/features2str() is incomplete.
Refer to "Bluetooth Core Specification 5.2 Vol. 2 Part C. 3.3 Feature Mask Definition".
Unconditionally use ether_gen_addr() to generate bridge mac addresses. This
function is now less likely to generate duplicate mac addresses across jails.
The old hand rolled hostid based code adds no value.
ethersubr: Make the mac address generation more robust
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.
Mitigate this problem by including the jail name in the mac address.
The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.
It is true for zfs, but it is not for e.g. vnode or buffer pagers.
When fixing bogus pages, fix them in both places. Rely on the fact
that pa[0] must have been invalid so it cannot be bogus.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Replace all instances of the typedef mbuf_t with "struct mbuf *".
The typedef mbuf_t was used for the Mac OS/X port of the code long ago.
Since this port is no longer used and the use of mbuf_t obscures what
the code does (and is not consistent with style(9)), it is no longer needed.
This patch replaces all instances of mbuf_t with "struct mbuf *", so that
it is no longer used.
This patch should not result in any semantic change.
cem [Fri, 17 Apr 2020 20:20:03 +0000 (20:20 +0000)]
xen-locore: Silence DWARF2 section warning
Silence the "DWARF2 can only represent one section per compilation unit"
warning in amd64 GENERIC builds by disabling Clang's debuginfo generation for
this assembler file (-g0). The message is replaced by a warning from
ctfconvert that there is no debuginfo to convert (future work).
The file contains some metadata (several ELF notes) and some code. The code
does not appear to have anything that debuginfo would aid.
I looked at the generated debuginfo (readelf -w xen-locore.o) prior to this
change, and the metadata that would be disabled are things like associated
between binary offset and code line number (not especially useful with a
disassembler), and label metadata for the entry points (not especially useful
as this is already in the symbol table).
tty: convert tty_lock_assert to tty_assert_locked to hide lock type
A later change, currently being iterated on in D24459, will in-fact change
the lock type to an sx so that TTY drivers can sleep on it if they need to.
Committing this ahead of time to make the review in question a little more
palatable.
tty_lock_assert() is unfortunately still needed for now in two places to
make sure that the tty lock has not been recursed upon, for those scenarios
where it's supplied by the TTY driver and possibly a mutex that is allowed
to recurse.
Use the right type for 64-bit coprocessor registers.
The use of "int" here caused the compiler to believe that it needs to
insert a "sll $n, $n, 0" to sign extend as part of the implicit cast
to uint64_t.