Simon J. Gerraty [Fri, 15 Jan 2021 01:33:05 +0000 (17:33 -0800)]
pkgfs_open: follow symlinks
Caller is not interested in symlinks follow them.
Throw an error if too many links encountered.
Reviewed by: stevek
Sponsored by: Juniper Networks
--This line, and those below, will be ignored--
> Description of fields to fill in above: 76 columns --|
> PR: If a GNATS PR is affected by the change.
> Differential Revision: https://reviews.freebsd.org/D### (*full* phabric URL needed).
> Submitted by: If someone else sent in the change.
> Reviewed by: If someone else reviewed your modification.
> Approved by: If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after: N [day[s]|week[s]|month[s]]. Request a reminder email.
> MFH: Ports tree branch name. Request approval for merge.
> Relnotes: Set to 'yes' for mention in release notes.
> Security: Vulnerability reference (one per line) or description.
> Sponsored by: If the change was sponsored by an organization.
> Empty fields above will be automatically removed.
Ed Maste [Wed, 13 Jan 2021 18:08:31 +0000 (13:08 -0500)]
elfctl: prefix disable flags with "no"
Some ELF feature flags indicate a request to opt-out of some feature,
for example NT_FREEBSD_FCTL_ASLR_DISABLE indicates that ASLR should be
disabled for the tagged binary. Using "aslr" as the short name for the
flag is confusing as it seems to indicate a request for ASLR to be
enabled. Rename "noaslr", and make a similar change for other opt-out
flags.
Reviewed by: bapt, manu, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28139
Ed Maste [Wed, 13 Jan 2021 19:21:38 +0000 (14:21 -0500)]
elfctl: add backwards compatibility for "no" prefixes
I am going to prefix opt-out ELF feature flag names with "no" to make
their meaning more clear (review D28139), but there are some uses of the
existing names already (e.g., the PR referenced below).
For now accept the older, unprefixed name as well, and emit a warning.
We can revert this after FreeBSD 13 branches.
Michael Tuexen [Wed, 13 Jan 2021 21:48:17 +0000 (22:48 +0100)]
tcp: add sysctl to tolerate TCP segments missing timestamps
When timestamp support has been negotiated, TCP segements received
without a timestamp should be discarded. However, there are broken
TCP implementations (for example, stacks used by Omniswitch 63xx and
64xx models), which send TCP segments without timestamps although
they negotiated timestamp support.
This patch adds a sysctl variable which tolerates such TCP segments
and allows to interoperate with broken stacks.
Add the *.swp and *~ pattern for vim temporary files. Expand the
cscope ones to include all files possibly generated by cscope and also
add some known object formats.
Also remove the leading '?' from cscope.out, or else it doesn't match
the cscope.out file generated by default (as it expects an extra
character in front).
While the package names are a bit prettier this confuse pkg about upgrading :
$ pkg version -t 13.0.s2021011313063 13.0.a1
>
$ pkg version -t 13.0.s2021011313063 13.0_ALPHA1
<
Note that the current scheme isn't good when bumping from ALPHA to BETA or
even BETA to RC:
$ pkg version -t 13.0_ALPHA1 13.0_BETA1
=
$ pkg version -t 13.0_BETA1 13.0_RC1
=
But more thoughts have to be put into this renaming.
Adrian Chadd [Tue, 12 Jan 2021 21:13:20 +0000 (13:13 -0800)]
[mips] revert r366664 - flip mips back from -O2 to -O
Now that I have -head fitting in 8MB of flash again, I can test
out freebsd-head on my home AP test setup. Unfortunately,
the introduction of -O2 in r366664 causes the following infinite
loop shortly after boot:
------
MAP: No valid partition found at map/rootfs.uzip
Warning: no time-of-day clock registered, system time will not be set accurately
start_init: trying /sbin/init
BAD_PAGE_FAULT: pid 1 tid 100001 (init), uid 0: pc 0x4042c320 got a read fault (type 0x2) at 0x2e3a0
Trapframe Register Dump:
zero: 0 at: 0 v0: 0 v1: 0
a0: 0x1af34 a1: 0 a2: 0 a3: 0x7fffeff0
t0: 0 t1: 0 t2: 0 t3: 0
t4: 0 t5: 0 t6: 0 t7: 0
t8: 0 t9: 0x152e8 s0: 0x7fffee84 s1: 0
s2: 0 s3: 0 s4: 0 s5: 0
s6: 0 s7: 0 k0: 0 k1: 0
gp: 0x362c0 sp: 0x7fffedf0 s8: 0 ra: 0x40417df0
sr: 0xf413 mullo: 0 mulhi: 0 badvaddr: 0x2e3a0
cause: 0xffffffff80000008 pc: 0x4042c31c
Page table info for pc address 0x4042c320: pde = 0x80712000, pte = 0xa002065a
Dumping 4 words starting at pc address 0x4042c320: 8f9980e0808200001040006700809825
Page table info for bad address 0x2e3a0: pde = 0, pte = 0
------
I'm not yet sure why, but until I figure it out with the mips64/cheri
folk this should be reverted.
This should only use -O on GCC generated code for MIPS platforms.
Kyle Evans [Thu, 14 Jan 2021 06:34:29 +0000 (00:34 -0600)]
build: `make check`: use a PATH search instead for Kyua
which(1) accepts both relative/absolute paths as well as lone binary
names. Set KYUA to kyua and use which(1) to confirm that it can find one;
if it cannot, just advise the user to set KYUA directly to the kyua binary
rather than assuming a relative location from LOCALBASE.
This allows `make check` to be operated with the version of kyua in base
without losing the flexibility of specifying another one.
ngie@ notes that the original intention was to avoid redundant $PATH lookups
and improve the determinism of the target. A future change will likely push
us back to this state, perhaps in the form of reverting this entirely and
just switching to using kyua in base. Accepting any in $PATH should be
considered a transitional move, at least until it's declared otherwise,
since kyua was only semi-recently added to base.
Kyle Evans [Thu, 14 Jan 2021 06:33:07 +0000 (00:33 -0600)]
tools: git hooks: drop "submitted by" from commit template
With the switch to git, we should strive to properly attribute every
commit appropriately with the metadata that's provided to do so. In this
case, the submitter should be recorded via the author metadata. Committing
an arbitrary patch, one can set it as such:
git commit --author="John Smith <smith@example.com>"
Mitchell Horne [Wed, 13 Jan 2021 18:30:50 +0000 (14:30 -0400)]
arm64: fix early devmap assertion
The purpose of this KASSERT is to ensure that we do not run out of space
in the early devmap. However, the devmap grew beyond its initial size of
2MB in r336519, and this assertion did not grow with it.
A devmap mapping of a 1080p framebuffer requires 1920x1080 bytes, or
1.977 MB, so it is just barely able to fit without triggering the
assertion, provided no other devices are mapped before it. With the
addition of `options GDB` in GENERIC by bbfa199cbc16, the uart is now
mapped for the purposes of a debug port, before mapping the framebuffer.
The presence of both these conditions pushes the selected virtual
address just below the threshold, triggering the assertion.
To fix this, use the correct size of the devmap, defined by
PMAP_MAPDEV_EARLY_SIZE. Since this code is shared with RISC-V, define
it for that platform as well (although it is a different size).
PR: 25241
Reported by: gbe
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
John Baldwin [Wed, 13 Jan 2021 21:13:01 +0000 (13:13 -0800)]
Enable accelerated AES-XTS software crypto in GENERIC.
In particular, using GELI on a root filesystem will only use
accelerated software crypto drivers if they are available before the
root filesystem is mounted. While these modules can be loaded from
the loader, including them in GENERIC provides a better out-of-the-box
experience for users.
Both aesni(4) and armv8crypto(4) provide accelerated implementations
of the default cipher used by GELI (AES-XTS) in addition to other
ciphers.
Kristof Provost [Wed, 13 Jan 2021 18:30:01 +0000 (19:30 +0100)]
pf: Don't hold PF_RULES_WLOCK during copyin() on DIOCRCLRTSTATS
We cannot hold a non-sleepable lock during copyin(). This means we can't
safely count the table, so instead we fall back to the pf_ioctl_maxcount
used in other ioctls to protect against overly large requests.
Emmanuel Vadot [Wed, 13 Jan 2021 17:23:51 +0000 (18:23 +0100)]
Add driver for Synopsys Designware Watchdog timer.
This driver supports some arm and arm64 boards equipped with
"snps,dw-wdt"-compatible watchdog device.
Tested on RK3399-based board (RockPro64).
Once started watchdog device cannot be stopped.
Interrupt handler has mode to kick watchdog even when software does not do it
properly.
This can be controlled via sysctl: dev.dwwdt.prevent_restart.
Also - driver handles system shutdown and prevents from restart when system
is asked to reboot.
Toomas Soome [Wed, 13 Jan 2021 17:05:51 +0000 (19:05 +0200)]
loader.efi: initial terminal size should base on UEFI terminal size
We do select font based on desired terminal size, we do query
UEFI terminal size with conout->QueryMode(), but by mistake, the fallback
values are used.
Removed code iterates over if_addrhead and tries to remove
routes for each ifa.
This is exactly the thing that if_purgeaddrs() do, and
if_purgeaddr() is already called in the end.
Currently we create link-local route by creating an always-on IPv6 prefix
in the prefix list. This prefix is not tied to the link-local ifa.
This leads to the following problems:
First, when flushing interface addresses we skip on-link route, leaving
fe80::/64 prefix on the interface without any IPv6 addresses.
Second, when creating and removing link-local alias we lose fe80::/64 prefix
from the routing table.
Fix this by attaching link-local prefix to the ifa at the initial creation.
linux: bump the default version from 3.10.0 to 3.17.0
This is required for Qt5, as found in Ubuntu Focal. The library contains
the minimum kernel version encoded in an ELF note; this makes rtld ignore
it altogether, with a confusing error message. Without it, things fail
like this:
$ konsole: error while loading shared libraries: libQt5Core.so.5: cannot
open shared object file: No such file or directory
For reference, the Qt kernel version requirements can be found at:
https://github.com/qt/qtbase/blob/dev/src/corelib/global/minimum-linux_p.h
Sponsored by: The FreeBSD Foundation
Reviewed By: emaste
Differential Revision: https://reviews.freebsd.org/D28105
Ed Maste [Wed, 13 Jan 2021 03:24:52 +0000 (22:24 -0500)]
elftcl: add -i flag to ignore unknown flags
This may allow an identical elfctl invocation to be used on multiple
FreeBSD versions, with features not implemented on older releases being
silently ignored.
PR: 252629 (related)
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28130
vm_map_protect: allow to set prot and max_prot in one go.
This prevents a situation where other thread modifies map entries
permissions between setting max_prot, then relocking, then setting prot,
confusing the operation outcome. E.g. you can get an error that is not
possible if operation is performed atomic.
Also enable setting rwx for max_prot even if map does not allow to set
effective rwx protection.
Reviewed by: brooks, markj (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28117
Mitchell Horne [Tue, 12 Jan 2021 21:38:21 +0000 (17:38 -0400)]
if_wg: fix modules load on !x86
Only x86 provides optimized implementations via the blake2 module. The
software "reference" implementation is already included in the crypto(4)
module, we can drop the extra MODULE_DEPEND for other platforms.
Without this change, if_wg.ko could not be loaded due to the missing
dependency.
PR: 252156
Reported by: gbe
Sponsored by: The FreeBSD Foundation
Rick Macklem [Tue, 12 Jan 2021 21:59:52 +0000 (13:59 -0800)]
nfs-over-tls: handle res.gid.gid_val correctly for memory allocation
When the server side nfs-over-tls does an upcall to rpc.tlsservd(8)
for the handshake and the rpc.tlsservd "-u" command line option has
been specified, a list of gids may be returned.
The list will be returned in malloc'd memory pointed to by
res.gid.gid_val. To ensure the malloc occurs, res.gid.gid_val must
be NULL before the call. Then, the malloc'd memory needs to be free'd.
mem_free() just calls free(9), so a NULL pointer argument is fine
and a length argument == 0 is ok, since the "len" argument is not used.
This bug would have only affected nfs-over-tls and only when
rpc.tlsservd(8) is running with the "-u" command line option.
Mariusz Zaborski [Tue, 12 Jan 2021 18:38:10 +0000 (19:38 +0100)]
casper: convert macros to inline functions
In libcasper, the first argument to the function is a structure that
represents a connection to Casper. On systems without Casper, macros
are used to interpose the Casper functions to standard libc ones.
This may cause errors/warnings that the variable is not used.
With the inline function, there is no such problem.
When detaching the if_ure(4) driver, the TX active USB transfer array may
point to freed USB transfers. Given that the number of USB transfers is
very low, simply start all transfers every time there is a packet to
keep safe from use-after-free.
mhorne [Wed, 4 Nov 2020 17:51:10 +0000 (13:51 -0400)]
riscv pmap: add some pv list assertions
Ensure that we don't end up with a superpage in the vm_page_t's pv list.
This may help with debugging the panic reported in PR 250866, in which
l3 in pmap_remove_write() was found to be NULL. Adding a KASSERT to this
function will help narrow down the cause of this panic the next time it
occurs.
Andrew Turner [Tue, 12 Jan 2021 12:14:09 +0000 (12:14 +0000)]
Handle using a sub instruction in the arm64 fbt
Some stack frames are too large for a store pair instruction we already
detect in the arm64 fbt code. Add support for handling subtracting the
stack pointer directly.
Andrew Turner [Tue, 12 Jan 2021 11:37:06 +0000 (11:37 +0000)]
Only allow a store through sp in the arm64 fbt
When searching for an instruction to patch out in the arm64 function
boundary trace we search for a store pair with a write back. This
instruction is commonly used to store two registers to the stack
and update the stack pointer to hold space for more.
This works in many cases, however not all functions use this, e.g.
when the stack frame is too large. In these cases we may find another
instruction of the same type that doesn't store through the stack
pointer. Filter these instructions out and assume if we see one we
are past the function prologue.
Emmanuel Vadot [Tue, 12 Jan 2021 11:02:38 +0000 (12:02 +0100)]
linuxkpi: add kernel_fpu_begin/kernel_fpu_end
With newer AMD GPUs (>=Navi,Renoir) there is FPU context usage in the
amdgpu driver.
The `kernel_fpu_begin/end` implementations in drm did not even allow nested
begin-end blocks.
Emmanuel Vadot [Tue, 22 Dec 2020 18:15:01 +0000 (19:15 +0100)]
linuxkpi: Add shrinker support
A driver can register a shrinker that will be called when the kernel
wants to free some memory.
Add support for that in linuxkpi and call the registered shrinkers
when the lowmem event is triggered.
Emmanuel Vadot [Thu, 10 Dec 2020 16:47:11 +0000 (17:47 +0100)]
linuxkpi: Add more pci functions needed by DRM
-pci_get_class : This function search for a matching pci device based on
the class/subclass and returns a newly created pci_dev.
- pci_{save,restore}_state : This is analogous to ours with the same name
- pci_is_root_bus : Return true if this is the root bus
- pci_get_domain_bus_and_slot : This function search for a matching pci
device based on domain, bus and slot/function concat into a single
unsigned int (devfn) and returns a newly created pci_dev
- pci_bus_{read,write}_config* : Read/Write to the config space.
While here add some helper function to alloc and fill the pci_dev struct.
Emmanuel Vadot [Thu, 10 Dec 2020 17:38:41 +0000 (18:38 +0100)]
pci: Add pci_find_class_from
pci_find_class_from help finding one or multiple device matching
a class and subclass.
If the from argument is not null we will first loop in the device list
until we find the matching device and only then start to check if the
class/subclass matches.
Toomas Soome [Mon, 11 Jan 2021 21:54:23 +0000 (21:54 +0000)]
loader.efi: reworked framebuffer setup
Pass gfx_state to efi_find_framebuffer(), so we can pick between
GOP and UGA in efi_find_framebuffer(), also we can then
set up struct gen_fb in gfx_state from efifb and isolate efi fb data
processing into framebuffer.c.
This change does allow us to clean up efi_cons_init() and reduce
BS->LocateProtocol() calls.
A little downside is that we now need to translate gen_fb back to
efifb in bootinfo.c (for passing to kernel), and we need to add few
-I options to CFLAGS.
libthr malloc: support recursion on thr_malloc_umtx.
One possible way the recursion can happen is during fork: suppose
that fork is called from early code that did not triggered
jemalloc(3) initialization yet. Then we lock thr_malloc lock, and
call malloc_prefork() that might require initialization of jemalloc
pthread_mutexes, calling into libthr malloc. It is safe to allow
recursion for this occurence.
PR: 252579
Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
sigfastblock: do not skip cursig/postsig loop in ast()
Even if sigfastblock block is non-zero, non-blockable signals must be
checked on ast and delivered now. This also affects debugger ability
to attach, because issignal() also calls ptracestop() if there is
a pending stop for debugee.
Instead of checking for sigfastblock, and either setting PENDING flag
for usermode or doing signal delivery loop, always do the loop after
checking, and then handle PENDING bit. issignal() already does the right
thing for fast-blocked case, allowing only STOPs and SIGKILL delivery to
happen.
Reported by: Vasily Postnicov <shamaz.mazum@gmail.com>, markj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28089
Mateusz Guzik [Tue, 12 Jan 2021 08:47:32 +0000 (08:47 +0000)]
amd64: fix tlb shootdown when all cpus are passed in the bitmap
Right now the routine leaves the current CPU in the map, later tripping
on an assert when filling in the scoreboard: panic: IPI scoreboard is
zero, initiator 1 target 1
Instead pre-check if all CPUs are present in the map and remember that
outcome for later.
Alan Somers [Sun, 10 Jan 2021 03:23:05 +0000 (20:23 -0700)]
lio_listio: validate aio_lio_opcode
Previously, we would accept any kind of LIO_* opcode, including ones
that were intended for in-kernel use only like LIO_SYNC (which is not
defined in userland). The situation became more serious with 022ca2fc7fe08d51f33a1d23a9be49e6d132914e. After that revision, setting
aio_lio_opcode to LIO_WRITEV or LIO_READV would trigger an assertion.
Note that POSIX does not specify what should happen if aio_lio_opcode is
invalid.
Charlie Root [Tue, 12 Jan 2021 01:56:12 +0000 (18:56 -0700)]
ICMP checksum test: Fix for big endian
The in_cksum tests originally tried to simulate a BE environment by
swapping the byte order of the input. But that's overcomplicated, and
didn't actually work on real BE hardware. The correct testing strategy
is just to test on the native endianness, and run the tests in both BE
and LE environments.
Andrew Gallatin [Tue, 12 Jan 2021 01:03:37 +0000 (20:03 -0500)]
amd64: compare TLB shootdown target to all_cpus
On amd64, the pmap code passes all_cpus to
smp_targeted_tlb_shootdown() when unmapping from the
kernel pmap. This function has an optimized path to send IPIs
to all but itself, which it intends to do when the target
is all cpus. However, we need to compare the target cpu mask
with all_cpus, rather than using CPU_ISFULLSET(). Comparing with
CPU_ISFULLSET() will only work when we have MAXCPU cpus active in
the system, otherwise, we'll be sending repeated IPIs, rather than
a single IPI to all CPUs but ourself.
Fixing this should reduce the time spent in native_lapic_ipi_wait()
as we will be sending ipis in parallel, rather than one-by-one.
This is confirmed by dtrace.
Kirk McKusick [Tue, 12 Jan 2021 00:44:41 +0000 (16:44 -0800)]
Eliminate lock order reversal in UFS ffs_unmount().
UFS uses a new "mntfs" pseudo file system which provides private
device vnodes for a file system to safely access its disk device.
The original device vnode is saved in um_odevvp to hold the exclusive
lock on the device so that any attempts to open it for writing will
fail. But it is otherwise unused and has its BO_NOBUFS flag set to
enforce that file systems using mntfs vnodes do not accidentally
use the original devfs vnode. When the file system is unmounted,
um_odevvp is no longer needed and is released.
The lock order reversal happens because device vnodes must be locked
before UFS vnodes. During unmount, the root directory vnode lock
is held. When when calling vrele() on um_odevvp, vrele() attempts to
exclusive lock um_odevvp causing the lock order reversal. The problem
is eliminated by doing a non-blocking exclusive lock on um_odevvp
which will always succeed since there are no users of um_odevvp.
With um_odevvp locked, it can be released using vput which does not
attempt to do a blocking exclusive lock request and thus avoids the
lock order reversal.
For rate-based resources that support throttling (e.g.
readiops/writeips), this fixes a divide-by-zero panic when rctl(8)
passes 0 as the throttle value. For these resources, treat
zero-throttle requests as requests to suspend forward progress as long
as possible using the duration specified in
kern.racct.rctl.throttle_max.
Use rn_match instead of doing indirect calls in fib_algo.
Relevant inet/inet6 code has the control over deciding what
the RIB lookup function currently is. With that in mind,
explicitly set it to the current value (rn_match) in the
datapath lookups. This avoids cost on indirect call.
exec_new_vmspace: print useful error message on ctty if stack cannot be mapped.
After old vmspace is destroyed during execve(2), but before the new space
is fully constructed, an error during image activation cannot be returned
because there is no executing program to receive it.
In the relatively common case of failure to map stack, print some hints
on the control terminal. Note that user has enough knobs to cause stack
mapping error, and this is the most common reason for execve(2) aborting
the process.
Requested by: jhb
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE |
PROT_EXEC are never specified together, if vm_map has MAP_WX flag set.
FreeBSD control flag allows specific binary to request WX exempt, and
there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/
disable globally.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
Kristof Provost [Mon, 11 Jan 2021 13:09:08 +0000 (14:09 +0100)]
pfctl: Another set skip <group> fix
When retrieving the list of group members we cannot simply use
ifa_lookup(), because it expects the interface to have an IP (v4 or v6)
address. This means that interfaces with no address are not found.
This presents as interfacing being alternately marked as skip and not
whenever the rules are re-loaded.
Happily we only need to fix ifa_grouplookup(). Teach it to also accept
AF_LINK (i.e. interface) node_hosts.