0mp [Tue, 17 Nov 2020 10:48:01 +0000 (10:48 +0000)]
Clean up the synopsis section & fix mandoc warnings
The synopsis section had two very similar entries. The flags documented by
the first one were a strict subset of the second one. Let's just keep only
the second entry for simplicity.
andrew [Tue, 17 Nov 2020 10:27:42 +0000 (10:27 +0000)]
Stop calling gic_v3_detach when we haven't called gic_v3_attach
The former tries to dereference memory allocated by the latter. If counting
the redistributor fails it may try to dereference memory that was never
allocated.
kevans [Tue, 17 Nov 2020 03:36:58 +0000 (03:36 +0000)]
umtx_op: reduce redundancy required for compat32
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed: the sizes of timespec and
_umtx_time, which are used to determine whether copyout is needed in the
sem2_wait case.
kevans [Tue, 17 Nov 2020 03:34:01 +0000 (03:34 +0000)]
_umtx_op: fix a compat32 bug in UMTX_OP_NWAKE_PRIVATE
Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).
Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
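The scaling problem can be reproduced in isolation. The following is a minimal user-space sketch, not the kernel code: the helper names, TOY_BATCH (chosen to match the 128 mentioned in this log) and the backing array are assumptions made for illustration. It measures how many bytes "upp + pos" advances for each declaration of upp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BATCH 128   /* hypothetical stand-in for the kernel batch size */

/* Backing storage so the pointer arithmetic below stays inside one object. */
static uint64_t toy_storage[2 * TOY_BATCH];

/* upp declared as uint32_t **: each step is one host pointer wide. */
static size_t
advance_as_host_ptrs(size_t pos)
{
    uint32_t **upp = (uint32_t **)(void *)toy_storage;

    assert(pos <= TOY_BATCH);
    return ((size_t)((char *)(upp + pos) - (char *)toy_storage));
}

/* upp declared as uint32_t *: each step is 4 bytes, one compat32 pointer. */
static size_t
advance_as_compat32_ptrs(size_t pos)
{
    uint32_t *upp = (uint32_t *)(void *)toy_storage;

    assert(pos <= TOY_BATCH);
    return ((size_t)((char *)(upp + pos) - (char *)toy_storage));
}
```

On a host with 8-byte pointers the first helper returns twice the offset of the second, which is the overshoot described above.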
Add a test case that demonstrates the problem, placed with the libthr tests
since anyone messing with _umtx_op should be running these tests. Running
under compat32, the new test case will hang as threads after the first 128
get missed in the wake. It's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around 50 -- I did not spend much time digging into it.
The uintptr_t change makes no functional difference, but I've tossed it in
since it's more accurate (semantically).
br [Mon, 16 Nov 2020 21:55:52 +0000 (21:55 +0000)]
Introduce IOMMU support for arm64 platform.
This adds an arm64 iommu interface and a driver for Arm System Memory
Management Unit version 3.2 (ARM SMMU v3.2) specified in ARM IHI 0070C
document.
Hardware overview is provided in the header of smmu.c file.
The support is disabled by default. To enable it, add 'options IOMMU' to
your kernel configuration file.
The support was developed on Arm Neoverse N1 System Development Platform
(ARM N1SDP), kindly provided by ARM Ltd.
Currently, only PCI-based devices and ACPI platforms are supported.
The support was tested on IOMMU-enabled Marvell SATA controller,
Realtek Ethernet controller and a TI xHCI USB controller with a low to
medium load only.
Many thanks to Konstantin Belousov for help forming the generic IOMMU
framework that is vital for this project; to Andrew Turner for adding
IOMMU support to MSI interrupt code; to Mark Johnston for help with SMMU
page management; to John Baldwin for explaining various IOMMU bits.
mhorne [Mon, 16 Nov 2020 18:41:49 +0000 (18:41 +0000)]
bsdiff: fix off-by-one error
The program reads oldsize bytes from oldfile, and proceeds to initialize
a suffix array of oldsize elements using divsufsort(). As per the
function's API [1], array indices 0 through n-1 are initialized.
Later, search() is called, but with index bounds [0, n]. Depending on
the contents of the malloc'd buffer, accessing this uninitialized element
at the end of the array can result in a segmentation fault. Fix this by passing
oldsize-1 to search(), limiting the search bounds to [0, n-1].
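The bound can be illustrated with a small self-contained sketch. This is not the bsdiff source: the naive qsort-based suffix sort stands in for divsufsort(), and all names here are hypothetical. What it shares with the description above is that only sa[0..n-1] are initialized and that the search takes inclusive bounds, so the caller must pass n - 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const unsigned char *sort_text;  /* text being indexed, for qsort */
static size_t sort_len;

static int
cmp_suffix(const void *a, const void *b)
{
    size_t i = *(const size_t *)a, j = *(const size_t *)b;
    size_t li = sort_len - i, lj = sort_len - j;
    int c = memcmp(sort_text + i, sort_text + j, li < lj ? li : lj);

    if (c != 0)
        return (c);
    return ((li < lj) ? -1 : (li > lj));
}

static size_t
matchlen(const unsigned char *a, size_t alen, const unsigned char *b,
    size_t blen)
{
    size_t i = 0;

    while (i < alen && i < blen && a[i] == b[i])
        i++;
    return (i);
}

/* Longest match of pat among suffixes sa[st..en]; both bounds inclusive. */
static size_t
search(const size_t *sa, const unsigned char *text, size_t n,
    const unsigned char *pat, size_t plen, size_t st, size_t en, size_t *pos)
{
    size_t x, y;

    while (en - st >= 2) {
        size_t mid = st + (en - st) / 2;
        size_t rem = n - sa[mid];
        int c = memcmp(text + sa[mid], pat, rem < plen ? rem : plen);

        if (c < 0 || (c == 0 && rem < plen))
            st = mid;   /* suffix sorts before the pattern */
        else
            en = mid;
    }
    x = matchlen(text + sa[st], n - sa[st], pat, plen);
    y = matchlen(text + sa[en], n - sa[en], pat, plen);
    *pos = (x > y) ? sa[st] : sa[en];
    return ((x > y) ? x : y);
}

static size_t
find_longest(const unsigned char *text, size_t n, const unsigned char *pat,
    size_t plen, size_t *pos)
{
    size_t *sa, i, len;

    assert(n > 0);
    sa = malloc(n * sizeof(sa[0]));     /* exactly n elements */
    assert(sa != NULL);
    for (i = 0; i < n; i++)
        sa[i] = i;
    sort_text = text;
    sort_len = n;
    qsort(sa, n, sizeof(sa[0]), cmp_suffix);
    /* Upper bound is n - 1: sa[n] does not exist and must never be read. */
    len = search(sa, text, n, pat, plen, 0, n - 1, pos);
    free(sa);
    return (len);
}
```

Passing n instead of n - 1 as the upper bound would make the final matchlen() call read sa[n], one element past the allocation.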
This bug is a result of r303285, which introduced divsufsort() as an
alternate suffix sorting function to the existing qsufsort(). It seems
that qsufsort() did initialize the final empty element, meaning it could
be safely accessed. This difference in the implementations was missed at
the time.
[1] https://github.com/y-256/libdivsufsort
Discussed with: cperciva
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26911
hselasky [Mon, 16 Nov 2020 10:15:03 +0000 (10:15 +0000)]
Make mlx5_cmd_exec_cb() a safe API in mlx5core.
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.
The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.
hselasky [Mon, 16 Nov 2020 10:03:18 +0000 (10:03 +0000)]
Use mlx5core to create/destroy all Dynamically Connected Targets, DCTs.
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without first calling drain DCT (e.g. during the cleanup flow), its
creation and destruction must be managed via mlx5 core.
grehan [Sun, 15 Nov 2020 12:59:24 +0000 (12:59 +0000)]
Fix regression in AHCI controller settings.
When the AHCI code was reworked to use FreeBSD struct
definitions, the valid element was mis-transcribed resulting
in the UDMA capability being hidden. This prevented Illumos
from using AHCI disk/cdrom drives.
Fix by using definitions that match the code pre-rework.
scottl [Sun, 15 Nov 2020 07:50:29 +0000 (07:50 +0000)]
Fix the previous revision; it suffered from an incomplete change to the
getlocalbase API. Also don't erroneously subtract the length from the
buffer a second time.
scottl [Sun, 15 Nov 2020 07:48:52 +0000 (07:48 +0000)]
Because getlocalbase() returns -1 on error, it needs to use a signed type
internally. Do that, and make sure that conversions between signed and
unsigned don't overflow.
mjg [Sat, 14 Nov 2020 19:23:07 +0000 (19:23 +0000)]
zfs: disable periodic arc updates
They are only there to provide less inaccurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
bapt [Sat, 14 Nov 2020 19:16:39 +0000 (19:16 +0000)]
Change the default locale to C.UTF-8
The C.UTF-8 locale is the same as the actual C locale except that it
supports the Unicode character set. Collation etc. is still the same as in
the C locale.
Reviewed by: many
Approved by: many
Differential Revision: https://reviews.freebsd.org/D26973
scottl [Sat, 14 Nov 2020 17:57:50 +0000 (17:57 +0000)]
Add the library function getlocalbase and its manual page. This helps to
unify the retrieval of the various ways that the local software base directory,
typically "/usr/local", is expressed in the system.
Reviewed by: se
Differential Revision: https://reviews.freebsd.org/D27022
jtl [Sat, 14 Nov 2020 14:50:34 +0000 (14:50 +0000)]
Fix implicit automatic local port selection for IPv6 during connect calls.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.
In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.
The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.
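The difference between the two checks can be sketched with a toy connection table. This is an illustration only: the structure, the 32-bit stand-in addresses, and the function names are assumptions and do not reflect the kernel's PCB lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ADDR_ANY 0u     /* hypothetical wildcard address */

struct toy_pcb {
    uint32_t laddr;         /* toy addresses, not real IPv6 */
    uint16_t lport;
    uint32_t faddr;
    uint16_t fport;
};

/* A listener has no foreign endpoint yet. */
static bool
toy_is_listener(const struct toy_pcb *p)
{
    return (p->faddr == TOY_ADDR_ANY && p->fport == 0);
}

/*
 * Would the proposed 4-tuple conflict with an existing entry?
 * With check_wildcard false, only exact matches count.
 */
static bool
toy_port_in_use(const struct toy_pcb *tbl, size_t n, uint32_t laddr,
    uint16_t lport, uint32_t faddr, uint16_t fport, bool check_wildcard)
{
    size_t i;

    assert(lport != 0);
    for (i = 0; i < n; i++) {
        const struct toy_pcb *p = &tbl[i];

        if (p->laddr == laddr && p->lport == lport &&
            p->faddr == faddr && p->fport == fport)
            return (true);      /* exact match */
        if (check_wildcard && toy_is_listener(p) &&
            p->lport == lport &&
            (p->laddr == TOY_ADDR_ANY || p->laddr == laddr))
            return (true);      /* would hit a listening socket */
    }
    return (false);
}
```

With only a listener on port 8080 in the table, the exact-match check accepts local port 8080 for a connection to that same port on the local host, while the wildcard-aware check rejects it.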
0mp [Sat, 14 Nov 2020 13:07:41 +0000 (13:07 +0000)]
Document the PAGER environment variable
Sometimes users want to use freebsd-update(8) in a non-interactive way and
what they often miss is that they have to set PAGER to cat(1) in order to
avoid interactive prompts from less(1).
wulf [Sat, 14 Nov 2020 10:34:18 +0000 (10:34 +0000)]
LinuxKPI: Exclude linux/acpi.h content on non-ACPI archs.
LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be
compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our
side to avoid patching of vendor code.
kib [Sat, 14 Nov 2020 05:30:10 +0000 (05:30 +0000)]
Handle LoR in flush_pagedep_deps().
When operating in SU or SU+J mode, ffs_syncvnode() might need to
instantiate another vnode by inode number while owning the syncing vnode
lock. Typically this other vnode is the parent of our vnode, but due
to renames occurring right before fsync (or during fsync when we drop
the syncing vnode lock, see below) it might no longer be the parent.
Moreover, the called function flush_pagedep_deps() needs to lock another
vnode while owning the lock for the vnode which owns the buffer for which
the dependencies are flushed. This creates another instance of the
same LoR as was fixed in softdep_sync().
Put the generic code for safe relocking into new SU helper
get_parent_vp() and use it in flush_pagedep_deps(). The case for safe
relocking of two vnodes with undefined lock order was extracted into
vn helper vn_lock_pair().
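One common way to take two locks that have no defined order is to block on one, try the other, and back off on failure. The sketch below shows that general technique with toy spin locks built on C11 atomics; it is an assumption-laden illustration and not the vn_lock_pair() implementation, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
    atomic_flag held;
};

static bool
toy_trylock(struct toy_lock *l)
{
    /* test_and_set returns the previous value; false means we got it. */
    return (!atomic_flag_test_and_set(&l->held));
}

static void
toy_lock_acquire(struct toy_lock *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* spin; a real lock would sleep */
}

static void
toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Acquire both locks without ever blocking while holding one of them. */
static void
toy_lock_pair(struct toy_lock *a, struct toy_lock *b)
{
    struct toy_lock *t;

    assert(a != b);
    for (;;) {
        toy_lock_acquire(a);
        if (toy_trylock(b))
            return;             /* both held */
        toy_unlock(a);          /* back off to avoid deadlock */
        t = a;                  /* block on the contended lock next */
        a = b;
        b = t;
    }
}
```

Because the first lock is dropped whenever the second cannot be taken immediately, two threads calling this with the arguments swapped cannot deadlock against each other.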
Due to call sequence
ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(),
ffs_syncvnode() indicates with ERELOOKUP that the passed vnode was
unlocked in the process, and can return ENOENT if the passed vnode was
reclaimed. All callers of the function were inspected.
Because UFS namei lookups store auxiliary information about directory
entry in in-memory directory inode, and this information is then used
by UFS code that creates/removes directory entry in the actual
mutating VOPs, it is critical that directory vnode lock is not dropped
between lookup and VOP. For softdep_prelink(), which ensures that
later link/unlink operation can proceed without overflowing the
journal, calls were moved to the place where it is safe to drop
processing VOP because mutations are not yet applied. Then, ERELOOKUP
causes restart of the whole VFS operation (typically VFS syscall) at
top level, including the re-lookup of the involved paths. [Note that
we already do the same restart for failing calls to vn_start_write(),
so formally this patch does not introduce new behavior.]
Similarly, unsafe calls to fsync in snapshot creation code were
plugged. A possible view on these failures is that it does not make
sense to continue creating snapshot if the snapshot vnode was
reclaimed due to forced unmount.
It is possible that relock/ERELOOKUP situation occurs in
ffs_truncate() called from ufs_inactive(). In this case, dropping the
vnode lock is not safe. Detect the situation with VI_DOINGINACT and
reschedule inactivation by setting VI_OWEINACT. ufs_inactive()
rechecks VI_OWEINACT and avoids reclaiming vnode if truncation failed
this way.
In ffs_truncate(), allocation of the EOF block for partial truncation
is re-done after vnode is synced, since we cannot leave the buffer
locked through ffs_syncvnode().
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
kib [Sat, 14 Nov 2020 05:17:04 +0000 (05:17 +0000)]
Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored. Enabled under DIAGNOSTICS.
UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents. If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.
Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter. Updates to inode dirent members also
save the counter, while users compare the current and saved counter
values.
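The idea of stamping cached lookup data with a lock generation can be sketched in a few lines. This is a toy model under stated assumptions: the structure and function names are hypothetical, only one cached field is shown, and the real framework asserts instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dir {
    bool locked;                    /* exclusive lock held? */
    unsigned long lock_gen;         /* bumped on every exclusive acquire */
    long i_offset;                  /* cached lookup result */
    unsigned long i_offset_gen;     /* lock_gen when i_offset was stored */
};

static void
toy_xlock(struct toy_dir *d)
{
    assert(!d->locked);
    d->locked = true;
    d->lock_gen++;                  /* new generation for this hold */
}

static void
toy_xunlock(struct toy_dir *d)
{
    assert(d->locked);
    d->locked = false;
}

/* Lookup side: store the result together with the current generation. */
static void
toy_lookup_store(struct toy_dir *d, long off)
{
    assert(d->locked);
    d->i_offset = off;
    d->i_offset_gen = d->lock_gen;
}

/* User side: the cached value is usable only under the same generation. */
static bool
toy_offset_valid(const struct toy_dir *d)
{
    return (d->locked && d->i_offset_gen == d->lock_gen);
}
```

Dropping and re-taking the lock between toy_lookup_store() and the consumer changes lock_gen, so toy_offset_valid() reports the stale data instead of letting it be used.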
Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock. It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
kib [Sat, 14 Nov 2020 05:10:39 +0000 (05:10 +0000)]
Add a framework that tracks exclusive vnode lock generation count for UFS.
This count is memoized together with the lookup metadata in directory
inode, and we assert that accesses to lookup metadata are done under
the same lock generation as they were stored. Enabled under DIAGNOSTICS.
UFS saves additional data for parent dirent when doing lookup
(i_offset, i_count, i_endoff), and this data is used later by VOPs
operating on dirents. If parent vnode exclusive lock is dropped and
re-acquired between lookup and the VOP call, we corrupt directories.
Framework asserts that corruption cannot occur that way, by tracking
vnode lock generation counter. Updates to inode dirent members also
save the counter, while users compare the current and saved counter
values.
Also, fix a case in ufs_lookup_ino() where i_offset and i_count could
be updated under shared lock. It is not a bug on its own since dvp
i_offset results from such lookup cannot be used, but it causes false
positive in the checker.
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
rmacklem [Sat, 14 Nov 2020 01:49:49 +0000 (01:49 +0000)]
Fix startup of gssd when /usr is a separately mounted local file system.
meowthink@gmail.com reported that the gssd daemon was not
starting, because /etc/rc.d/gssd was executed before his local
/usr file system was mounted.
He fixed the problem by adding mountcritlocal to the REQUIRE
line.
This fix seems safe and works for a separately mounted /usr file
system on a local disk.
The case of a separately mounted remote /usr file system (such as
NFS) is still broken, but there is no obvious solution for that.
Adding mountcritremote would fix the problem, but it would
cause a POLA violation, because all kerberized NFS mounts
in /etc/fstab would need the "late" option specified to work.
emaste [Fri, 13 Nov 2020 19:08:42 +0000 (19:08 +0000)]
Fix `make makeman` after r367577
WITH_INIT_ALL_ZERO and WITH_INIT_ALL_PATTERN are mutually exclusive.
The .error when they were both set broke makeman so demote it to a
warning (and presumably the compiler will fail on an error later on).
We could improve this to make one take precedence but this is sufficient
for now.
MFC with: r367577
Sponsored by: The FreeBSD Foundation
bdragon [Fri, 13 Nov 2020 16:56:03 +0000 (16:56 +0000)]
[PowerPC64LE] Radix MMU fixes for LE.
There were many, many endianness fixes needed for Radix MMU. The Radix
pagetable is stored in BE (as it is read and written to by the MMU hw),
so we need to convert back and forth every time we interact with it when
running in LE.
With these changes, I can successfully boot with radix enabled on POWER9 hw.
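The conversion pattern described above can be sketched portably. This is an illustration with hypothetical names (toy_htobe64, pte_store, pte_load), not the pmap code; a kernel would normally use its own endian helpers rather than the hand-written swap shown here.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
toy_bswap64(uint64_t v)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);  /* move lowest byte to the top */
        v >>= 8;
    }
    return (r);
}

static int
toy_host_is_le(void)
{
    const uint16_t probe = 1;

    return (*(const unsigned char *)&probe == 1);
}

/* Host order to big-endian and back; the same operation both ways. */
static uint64_t
toy_htobe64(uint64_t v)
{
    return (toy_host_is_le() ? toy_bswap64(v) : v);
}

static uint64_t
toy_be64toh(uint64_t v)
{
    return (toy_host_is_le() ? toy_bswap64(v) : v);
}

/* Every access to the big-endian table entry goes through a conversion. */
static void
pte_store(uint64_t *ptep, uint64_t val)
{
    assert(ptep != 0);
    *ptep = toy_htobe64(val);
}

static uint64_t
pte_load(const uint64_t *ptep)
{
    assert(ptep != 0);
    return (toy_be64toh(*ptep));
}
```

The in-memory bytes are the same on either host byte order, which is what hardware reading the table in BE requires.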
bdragon [Fri, 13 Nov 2020 16:49:41 +0000 (16:49 +0000)]
[PowerPC] Allow traversal of oversize OF properties.
In standards such as LoPAPR, property names in excess of the usual 31
characters exist.
This breaks property traversal.
While in IEEE 1275-1994, nextprop is defined explicitly to work with a
32-byte region of memory, using a larger buffer should be fine. There is
actually no way to pass a buffer length to the nextprop call in the OF
client interface, so SLOF actually just blindly overflows the buffer.
So we have to defensively make the buffer larger, to avoid memory
corruption when reading out long properties on live OF systems.
Note also that on real-mode OF, things are pretty tight because we are
allocating against a static bounce buffer in low memory, so we can't just
use a huge buffer to work around this without it being wasteful of our
limited amount of 32-bit physical memory.
This allows a patched ofwdump to operate properly on SLOF (i.e. pseries)
systems, as well as any other PowerPC systems with overlength properties.
kib [Fri, 13 Nov 2020 09:42:32 +0000 (09:42 +0000)]
Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level.
Restart syscalls and some sync operations when the filesystem indicates
the ERELOOKUP condition, mostly for VOPs operating on metadata. In
particular, lookup results cached in the inode/v_data are no longer
valid and need recalculating. Right now this should be a no-op.
Assert that ERELOOKUP is caught everywhere and not returned to
userspace, by asserting that td_errno != ERELOOKUP on the syscall return
path.
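The restart-at-top-level pattern can be sketched as a simple loop. This is a toy under stated assumptions: TOY_ERELOOKUP is a stand-in value, and the callback type and function names are hypothetical rather than the kernel's syscall code.

```c
#include <assert.h>

#define TOY_ERELOOKUP (-5)  /* hypothetical stand-in for the internal errno */

typedef int (*toy_op_t)(void *arg);

/*
 * Run a whole operation (lookup plus mutation) and restart it from the
 * top whenever the lower layer reports that its lookup state went stale.
 */
static int
toy_run_restartable(toy_op_t op, void *arg, int *restarts)
{
    int error;

    for (;;) {
        error = op(arg);
        if (error != TOY_ERELOOKUP)
            break;
        (*restarts)++;      /* redo the lookup and try again */
    }
    /* The internal code must never leak to the caller. */
    assert(error != TOY_ERELOOKUP);
    return (error);
}

/* Example operation: asks for a restart *remaining times, then succeeds. */
static int
toy_flaky_op(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return (TOY_ERELOOKUP);
    }
    return (0);
}
```

The final assertion mirrors the check described above: the restart code is consumed at the top level and is never part of the visible result.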
In collaboration with: pho
Reviewed by: mckusick (previous version), markj
Tested by: markj (syzkaller), pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26136
gnn [Thu, 12 Nov 2020 21:58:47 +0000 (21:58 +0000)]
An earlier commit effectively turned off the fast forwarding path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.
TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE
rx parsing is enabled in the chip. t4_tom should pass on these SYNs to
the kernel and let it deal with them as if they arrived on the non-TOE
path.
Reported by: Sony at Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
* The Rust compiler produces SHF_ALLOC `.debug_gdb_scripts` (which
normally does not have the flag)
* `.debug_gdb_scripts` sections are removed from `inputSections` due
to --strip-debug/--strip-all
* When processing --gc-sections, pieces of a SHF_MERGE section can be
marked live separately
`=>` segfault when marking liveness of a `.debug_gdb_scripts` which
is not split into pieces (because it is not in `inputSections`)
This patch circumvents the problem by not treating SHF_ALLOC
".debug*" as debug sections (to prevent --strip-debug's stripping)
(which is still useful on its own).
0mp [Thu, 12 Nov 2020 17:28:29 +0000 (17:28 +0000)]
Remove macros from the width arguments passed to Bl macros
I've not removed the Er macro from one of the lists in example.9, however,
because it seems to be doing some special kind of magic. Let's leave it
there for now.
manu [Thu, 12 Nov 2020 14:04:08 +0000 (14:04 +0000)]
pkgbase: Move libprivatezstd from utilities to runtime
libarchive depends on it by default and tar uses libarchive.
So on an update:
1/ runtime contains tar
2/ runtime has libarchive in shlibs_required
3/ the libarchive package depends on utilities
4/ utilities depends on runtime
5/ kaboom
All users of libprivatezstd (libarchive related stuff and objcopy/ar)
are already in utilities.