Alan Somers [Sun, 19 Apr 2020 02:20:39 +0000 (02:20 +0000)]
libauditd: make it a PRIVATELIB
According to the upstream man page (which we don't install), none of
libauditd's symbols are intended to be public. Also, I can't find any
evidence for a port that uses libauditd. Therefore, we should treat it like
other such libraries and use PRIVATELIB.
pmap_bootstrap() expects the kernel's physical load address, but we have
been providing the start of physical memory. This had the nice effect of
protecting the memory used by the SBI runtime firmware, but now that we
have alternate means of achieving that, we should provide the correct
value. This will free up any memory between the SBI firmware and the
kernel for allocation.
The device tree may contain a "reserved-memory" node, whose purpose is
to communicate sections of physical memory that should not be used for
general allocations. Add the logic to parse and exclude these regions.
The particular motivation for this is protection of the SBI runtime
firmware. Currently, there is no mechanism through which the SBI
can communicate the details of its reserved memory region(s) to
a supervisor payload. There has been some discussion recently on how
this can be achieved [1], and it seems that the path going forward
will be to add an entry to the reserved-memory node.
This hasn't caused any issues for us yet, since we exclude all physical
memory below the kernel's load address from being allocated, and on all
currently supported platforms this covers the SBI firmware region. This
will change in another commit, so as a safety measure, ensure that the
lowest 2MB of memory is excluded if this region has not been reported.
Replace our hand-rolled functions with the generic ones provided by
kern/subr_physmem.c. This greatly simplifies the initialization of
physical memory regions and kernel globals.
Tested by: nick
Differential Revision: https://reviews.freebsd.org/D24154
The arm_physmem interface found in arm's MD code provides a convenient
set of routines for adding/excluding physical memory regions and
initializing important kernel globals such as Maxmem, realmem,
phys_avail[], and dump_avail[]. It is especially convenient for FDT
systems, since we can use FDT parsing functions and pass the result
directly to one of these physmem routines. This interface is already in
use on arm and arm64, and can be used to simplify this early
initialization on RISC-V as well.
This requires only a couple trivial changes:
- Move arm_physmem_kernel_addr to arm/machdep.c. It is unused on arm64,
and manipulated entirely in arm MD code.
- Convert arm32_btop/arm64_btop to atop. This is equivalently defined
on all architectures.
- Drop the "arm" prefix.
Add missing feature descriptions to hci_features2str().
The list of possible features in hccontrol/features2str() is incomplete.
Refer to "Bluetooth Core Specification 5.2 Vol. 2 Part C. 3.3 Feature Mask Definition".
Unconditionally use ether_gen_addr() to generate bridge mac addresses. This
function is now less likely to generate duplicate mac addresses across jails.
The old hand rolled hostid based code adds no value.
ethersubr: Make the mac address generation more robust
If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.
Mitigate this problem by including the jail name in the mac address.
The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.
It is true for zfs, but it is not for e.g. vnode or buffer pagers.
When fixing bogus pages, fix them in both places. Rely on the fact
that pa[0] must have been invalid so it cannot be bogus.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Rick Macklem [Fri, 17 Apr 2020 21:17:51 +0000 (21:17 +0000)]
Replace all instances of the typedef mbuf_t with "struct mbuf *".
The typedef mbuf_t was used for the Mac OS/X port of the code long ago.
Since this port is no longer used and the use of mbuf_t obscures what
the code does (and is not consistent with style(9)), it is no longer needed.
This patch replaces all instances of mbuf_t with "struct mbuf *", so that
it is no longer used.
This patch should not result in any semantic change.
Silence the "DWARF2 can only represent one section per compilation unit"
warning in amd64 GENERIC builds by disabling Clang's debuginfo generation for
this assembler file (-g0). The message is replaced by a warning from
ctfconvert that there is no debuginfo to convert (future work).
The file contains some metadata (several ELF notes) and some code. The code
does not appear to have anything that debuginfo would aid.
I looked at the generated debuginfo (readelf -w xen-locore.o) prior to this
change, and the metadata that would be disabled are things like associated
between binary offset and code line number (not especially useful with a
disassembler), and label metadata for the entry points (not especially useful
as this is already in the symbol table).
tty: convert tty_lock_assert to tty_assert_locked to hide lock type
A later change, currently being iterated on in D24459, will in-fact change
the lock type to an sx so that TTY drivers can sleep on it if they need to.
Committing this ahead of time to make the review in question a little more
palatable.
tty_lock_assert() is unfortunately still needed for now in two places to
make sure that the tty lock has not been recursed upon, for those scenarios
where it's supplied by the TTY driver and possibly a mutex that is allowed
to recurse.
John Baldwin [Fri, 17 Apr 2020 18:24:47 +0000 (18:24 +0000)]
Use the right type for 64-bit coprocessor registers.
The use of "int" here caused the compiler to believe that it needs to
insert a "sll $n, $n, 0" to sign extend as part of the implicit cast
to uint64_t.
John Baldwin [Fri, 17 Apr 2020 18:19:13 +0000 (18:19 +0000)]
Don't try to copyout() to a kernel buffer.
The handle_string callback for the ENCIOC_GET_ENCNAME and
ENCIOC_GETENCID ioctls tries to copy the size of the generated string
out to userland. However, the callback only has access to the kernel
copy of the structure populated by copyin(). The copyout() call
simply overwrites the value in the kernel's copy preventing the
subsequent overflow prevention logic from working.
Fix this by instead doing a copyout() of the updated length in the
caller after the callback returns.
Mark Johnston [Fri, 17 Apr 2020 16:55:14 +0000 (16:55 +0000)]
Always compile minidump_machdep.c on arm.
It is not logically dependent on "device mem", and an arm kernel
compiled without that device fails to link since the minidumpsys()
symbol is referenced by kern_dump.c.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Userspace may pass a negative ps_len value to us, which causes an
assertion failure in malloc().
Treat negative values as zero, i.e. return the required size.
jail(8): improve manual and usage information with more clear
description for "jail -e" mode to show that it does not take
additional jail name argument.
Reported by: David Marec <david.marec@davenulle.org>
MFC after: 3 days
Move M_RPC malloc type into XDR. Both RPC and XDR libraries use
this type, but since RPC depends on XDR (not vice versa) we need
it defined in XDR to make the module loadable without RPC.
tests: kqueue: use a more precise timer for the NOTE_ABSTIME test
Originally noticed while attempting to run the kqueue tests under
qemu-user-static, this apparently just happens sometimes when running in a
jail in general -- the timer will fire off "too early," but it's really just
the result of imprecise measurements (noted by cem).
Kicking this over to NOTE_USECONDS still tests the correct thing while
allowing it to work more consistently; a basic sanity test reveals that we
often end up coming in just less than 200 microseconds after the timer
fired off.
Rick Macklem [Fri, 17 Apr 2020 02:21:46 +0000 (02:21 +0000)]
Add a sanity check for nes_numsecflavor to the NFS server.
Ryan Moeller reported crashes in the NFS server that appear to be
caused by stack corruption in nfsrv_compound(). It appears that
the stack got corrupted just after a NFSv4.1 Lookup that crosses
a server mount point.
Although it is just a "theory" at this point, the most obvious way
the stack could get corrupted would be if nfsvno_checkexp() somehow
acquires an export with a bogus nes_numsecflavor value. This would
cause the copying of the secflavors to run off the end of the array,
which is allocated on the stack below where the corruption occurs.
This sanity check is simple to do and would stop the stack corruption
if the theory is correct. Otherwise, doing the sanity check seems to
be a reasonable safety belt to add to the code.
cdir may have simply failed to resolve (e.g. fget_cap failure in namei
leading to NULL dp passed to AUDIT_ARG_UPATH*_VP); restore the pre-rS358191
behavior of setting cpath[0] = '\0' and bailing out instead of panicking.
This was found by inadvertently running the libc/c063 tests with auditing
enabled, resulting in a panic.
Reviewed by: mjg (committed version actually his)
Differential Revision: https://reviews.freebsd.org/D24445
Adrian Chadd [Thu, 16 Apr 2020 23:31:39 +0000 (23:31 +0000)]
[sh] Fix a "may be unused" warning on mips-gcc
mips-gcc for mips32 was complaining that c was potentially used before
being set. Setting it to 0 before calling fdgetsc() looks like the right
thing to do in this instance; there's an explicit check for c == 0 later
on.
Tested: mips-gcc mips32 build, running /bin/sh on mips32
Adrian Chadd [Thu, 16 Apr 2020 23:29:49 +0000 (23:29 +0000)]
[libsa] Fix typecast of pointer for st_dev
This code was trying to use a pointer value for st_dev, which is definitely
not a pointer. Instead, cast to uintptr_t so it becomes a non-pointer value
before casting it.
Colin Percival [Thu, 16 Apr 2020 21:56:52 +0000 (21:56 +0000)]
Alert devd when acpi_video brightness changes
On my Dell Latitude 7390 laptop, the brightness hotkeys
(Fn+<up/down arrow>) send ACPI notifications which acpi_video
handles by adjusting its brightness setting; but ACPI does not
actually do anything with the backlight.
Announcing brightness changes via devd makes it possible to close
the loop by triggering the intel_backlight utility to perform the
required backlight adjustment.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24424
Avoid calling protocol drain routines more than once per reclamation event.
mb_reclaim() calls the protocol drain routines for each protocol in each
domain. Some protocols exist in more than one domain and share drain
routines. In the case of SCTP, it also uses the same drain routine for
its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain.
On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls
sctp_drain() four times. On systems with INET and INET6 defined,
mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree
caller of the pr_drain protocol entry.
Eliminate this duplication by ensuring that each pr_drain routine is only
specified for one protocol entry in one domain.
Name each pcib driver uniquely.
This remove the warning printed at each arm boot :
module_register: cannot register simplebus/pcib from kernel; already loaded from kernel
One of the goals of the new routing KPI defined in r359823 is to
entirely hide`struct rtentry` from the consumers. It will allow to
improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
struct rtentry field accesses.
Additionally, with the followup multipath changes, single rtentry can point
to multiple nexthops.
With that in mind, convert rti_filter callback used when traversing the
routing table to accept pair (rt, nhop) instead of nexthop.
Name each ehci driver uniquely.
This remove the warning printed at each arm boot :
module_register: cannot register simplebus/ehci from kernel; already loaded from kernel
A similar fix was done in r333074 but imx_ehci wasn't renamed and generic_ehci wasn't
present at that time.
Scott Long [Thu, 16 Apr 2020 04:17:06 +0000 (04:17 +0000)]
Add support for some IOCFacts fields that are available with mpr (12Gb)
controllers. It's ugly due to the single codebase for mpr and mps and
not being able to share their respective headers.
Scott Long [Thu, 16 Apr 2020 03:28:28 +0000 (03:28 +0000)]
Add a small hack to the ioctl header files so that both mpr and mps can
be included. This isn't a great solution, but fixing it correctly is a
bigger task and this is the lesser of the existing evils.
prison0's hostuuid will get set by the hostid rc script, either after
generating it and saving it to /etc/hostid or by simply reading /etc/hostid.
Some things (e.g. arbitrary MAC address generation) may use the hostuuid as
a factor in early boot, so providing a way to read /etc/hostid (if it's
available) and using it before userland starts up is desirable. The code is
written such that the preload doesn't *have* to be /etc/hostid, thus not
assuming that there will be newline at the end of the buffer or even the
exact shape of the newline. White trailing whitespace/non-printables
trimmed, the result will be validated as a valid uuid before it's used for
early boot purposes.
The preload can be turned off with hostuuid_load="NO" in /boot/loader.conf,
just as other preloads; it's worth noting that this is a 37-byte file, the
overhead is believed to be generally minimal.
It doesn't seem necessary at this time to be concerned with kern.hostid.
One does wonder if we should consider validating hostuuids coming in
via jail_set(2); some bits seem to care about uuid form and we bother
validating format of smbios-provided uuid and in-fact whatever uuid comes
from /etc/hostid.
Stop attempting to use FADT legacy hardware flag, it is almost always
a lie.
Instead, unless user explicitly disabled the calibration, calibrate
against 8254 ISA clock. Then, obtain the rough value of the expected
TSC frequency from CPUID leafs 0x15/0x16 or even from the CPU
marketing name string. If calibration results look unbelievably bogus
comparing with CPUID leafs report, use CPUID one.
Intel does not recommend to use CPUID leaf 0x16 for the value of the
system time frequency, indeed the error there might be up to 1% which
e.g. makes ntpd give up. If ISA clock is present, we win, if not, we
get some frequency that allows the machine to boot without enormous
delay.
Next improvement would be to use HPET for re-calibration if we decided
that ISA clock gives bogus results, after HPETs are enumerated. This
is a much bigger change since we probably would need to re-evaluate
some constants depending on TSC frequency.
Reviewed by: emaste, jhb, scottl
Tested by: scottl
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D24426
Rick Macklem [Wed, 15 Apr 2020 21:27:52 +0000 (21:27 +0000)]
Fix the NFSv4.2 extended attribute support for remove extended attrbute.
I missed the "atomic" field of the RemoveExtendedAttribute operation's
reply when I implemented it. It worked between FreeBSD client and server,
since it was missed for both, but it did not conform to RFC 8276.
This patch adds the field for both client and server.
Thanks go to Frank for doing interoperability testing of the extended
attribute support against patches for Linux.
Submitted by: Frank van der Linden <fllinden@amazon.com>
Reported by: Frank van der Linden <fllinden@amazon.com>
Default moea64_bpvo_pool_size 327680 was insufficient for initial
memory mapping at boot time on systems with, for example, 64G and
no huge pages enabled.
Submitted by: Andre Silva <afscoelho@gmail.com>
Reviewed by: jhibbits, alfredo
Approved by: jhibbits (mentor)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24102
Brooks Davis [Wed, 15 Apr 2020 20:23:55 +0000 (20:23 +0000)]
Export argc, argv, envc, envv, and ps_strings in auxargs.
This simplifies discovery of these values, potentially with reducing the
number of syscalls we need to make at runtime. Longer term, we wish to
convert the startup process to pass an auxargs pointer to _start() and
use that rather than walking off the end of envv. This is cleaner,
more C-friendly, and for systems with strong bounds (e.g. CHERI)
necessary.
Brooks Davis [Wed, 15 Apr 2020 20:19:59 +0000 (20:19 +0000)]
Introduce an AUXARGS_ENTRY_PTR() macro.
As the name implys, it uses the a_ptr member of the auxarg entry (except
in compat32 where it uses a_val). This is more correct and required for
systems where a_val is not the same size or hardware type as a_ptr (e.g.
CHERI).