amd64 pmap: microoptimize local shootdowns for PCID PTI configurations
When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).
In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB. In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing. This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is
automatically reset to all-1s on return to usermode.
Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6. This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.
But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID. To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483
Ruslan Bukin [Sat, 18 Jul 2020 13:10:31 +0000 (13:10 +0000)]
o Move iommu_test_boundary() to sys/iommu.h
o Rename DMAR -> IOMMU in comments
o Add IOMMU_PAGE_SIZE / IOMMU_PAGE_MASK macroses
o x86 only: dmar_quirks_pre_use() / dmar_instantiate_rmrr_ctxs()
While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12.
There's also no reason for it, as we can easily report the out of memory error
to the caller (i.e. userspace). All of these can already fail.
Fix bogomips calculation. Previously it was off by half. This was
verified under VMWare Fusion, comparing to what's reported under CentOS,
and by comparing numbers reported by linuxulator on T420 with a googled
up Linux cpuinfo (https://lkml.org/lkml/2011/11/29/116).
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20693
Fix vnode_pager handling of read ahead/behind pages when a disk read fails.
Rather than marking the read ahead/behind pages valid even though they were
not initialized, free them using the new function vm_page_free_invalid().
Add a new function vm_page_free_invalid() for freeing invalid pages
that might be wired. If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.
Gordon Bergling [Fri, 17 Jul 2020 22:15:02 +0000 (22:15 +0000)]
devstat(9): Update the man page to reflect the current implementation
- Rename devstat_add_entry to devstat_new_entry
- Update the description of devstat_trans_flags
- Add manpage aliases for devstat_start_transaction_bio and devstat_end_transaction_bio
ipfstat -t defaults to IPv4 output. Make consistent with ipfstat -i
and ipfstat -o where without an argument IPv4 and IPv6 states are
shown. Use -4 and -6 to limit the display to IPv4 or IPv6 respectively.
Historically ipfstat listings and stats only listed IPv4 or IPv6 output.
ipfstat would list IPv4 outputs by default while -6 would produce IPv6
outputs. This commit combines the ipfstat -i and -o outputs into one
listing of IPv4 and IPv6 rules. The -4 option lists only IPv4 rules
(as the default before) while -6 continues to list only rules that affect
IPv6.
Fix L2CAP ACL packet PB(Packet Boundary) flag for LE PDU.
ACL packet boundary flag should be 0 instead of 2 for LE PDU.
Some HCI will drop LE packet with PB flag is 2, and if sent,
some target may reject the packet.
Michael Tuexen [Fri, 17 Jul 2020 15:09:49 +0000 (15:09 +0000)]
Improve the locking of address lists by adding some asserts and
rearranging the addition of address such that the lock is not
given up during checking and adding.
Andrew Turner [Fri, 17 Jul 2020 14:39:07 +0000 (14:39 +0000)]
Don't overflow the trap frame when accessing lr or xzr.
When emulating a load pair or store pair in dtrace on arm64 we need to
copy the data between the stack and trap frame. When the registers are
either the link register or the zero register we will access memory
past the end of the trap frame as these are encoded as registers 30 and
31 respectively while the array they access only has 30 entries.
Fix this by creating 2 helper functions to perform the operation with
special cases for these registers.
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
John Baldwin [Thu, 16 Jul 2020 21:58:43 +0000 (21:58 +0000)]
Include ABI note tag in shared libraries.
Split the ELF feature note into a separate file that is linked into
*crt1.o the same as crtbrand.S was before. crtbrand.o is now linked
into crti.o on all platforms in addition to *crt1.o.
John Baldwin [Thu, 16 Jul 2020 21:30:46 +0000 (21:30 +0000)]
Add crypto_initreq() and crypto_destroyreq().
These routines are similar to crypto_getreq() and crypto_freereq() but
operate on caller-supplied storage instead of allocating crypto
requests from a UMA zone.
Michael Tuexen [Thu, 16 Jul 2020 16:46:24 +0000 (16:46 +0000)]
(Re)-allow 0.0.0.0 to be used as an address in connect() for TCP
In r361752 an error handling was introduced for using 0.0.0.0 or
255.255.255.255 as the address in connect() for TCP, since both
addresses can't be used. However, the stack maps 0.0.0.0 implicitly
to a local address and at least two regressions were reported.
Therefore, re-allow the usage of 0.0.0.0.
While there, change the error indicated when using 255.255.255.255
from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect().
Warner Losh [Thu, 16 Jul 2020 14:12:54 +0000 (14:12 +0000)]
Relax the rule against declaring variables in nested scopes and for
initializations.
Relax some overly perscriptive rules against declarations: they may be at the
start of any blocks, even if things aren't super complicated. Allow more
initializations (those that call simple functions, like accessor functions for
newbus are fine). Allow the common idiom of declaring the loop variable in a for
loop.
This tries to codify what common exceptions are today, as well as give
some guidance on when it's best to do these things.
vfs: fix vn_poll performance with either MAC or AUDIT
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.
Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.
This in particular fixes poll2 which passes 128 fds.
ether_ifattach: set mtu before calling if_attach()
if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time
an ethernet interface is setup *after*
SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to
nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu
*before* ifp->if_mtu has been properly set in some scenarios, e.g., USB
ethernet adapter plugged in later on.
For interfaces that are created in early boot, we don't have this issue as
domains aren't constructed enough for them to attach and thus it gets
deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND
*after* the mtu has been set earlier in ether_ifattach().
Adrian Chadd [Wed, 15 Jul 2020 19:34:19 +0000 (19:34 +0000)]
[ar71xx] fix watchdog to work on subsequent SoCs
The AR9341 AHB runs at 225MHz, much faster than the 33MHz of the
AR71xx AHB. So not only is the math going to do weird things, it
will also wrap rather than being clamped.
So:
* clamp! don't wrap!
* tidy up some debugging
* add an option to throw an NMI rather than reset!
Tested:
* AR9341 SoC (TP-Link TL-WDR4300), patting/not patting the watchdog!
Ed Maste [Wed, 15 Jul 2020 18:49:00 +0000 (18:49 +0000)]
openssh: refer to OpenSSL not SSLeay, part 2
This change was made upstream between 7.9p1 and 8.0p1. We've made local
changes in the same places for handling the version_addendum; apply the
SSLeay_version to OpenSSL_version change in advance of importing 8.0p1.
This should have been part of r363225.
Obtained from: OpenSSH-portable a65784c9f9c5
MFC with: r363225
Sponsored by: The FreeBSD Foundation
Merge sendmail 8.16.1 to HEAD: See contrib/sendmail/RELEASE_NOTES for details
Includes build infrastructure & config updates required for changes in 8.16.1
Brooks Davis [Wed, 15 Jul 2020 17:05:37 +0000 (17:05 +0000)]
Don't imply that all action values can be OR'd.
This is neither POSIX compliant nor what the implementation does.
This could be allowed by changing the value of TCSAFLUSH from 2 to 3,
but that doesn't seem worthwhile after 25+ years.
Ed Maste [Wed, 15 Jul 2020 15:35:26 +0000 (15:35 +0000)]
openssh: refer to OpenSSL not SSLeay
This change was made upstream between 7.9p1 and 8.0p1. We've made local
changes in the same places for handling the version_addendum; apply the
SSLeay_version to OpenSSL_version change in advance of importing 8.0p1.
Obtained from: OpenSSH-portable a65784c9f9c5
Sponsored by: The FreeBSD Foundation
Allan Jude [Wed, 15 Jul 2020 14:38:15 +0000 (14:38 +0000)]
zpool-features(7): Note that the boot loader has support for large_blocks
Since r304321 (-current: Aug 18, 2016) and r328866 (stable/11: Feb 5, 2018)
the FreeBSD loader has supported reading from datasets with the
large_blocks feature active.
PR: 247992
Reported by: Anton Saietskii <vsasjason@gmail.com>
MFC after: 2 weeks
Sponsored by: Klara Inc.
Use PVO_PADDR macro to get the physical address from a PVO, instead of
explicitly ANDing pvo_pte.pa with LPTE_RPGN where it is needed. Besides
improving readability, this is needed to support superpages (D25237), where
the steps to get the PA from a PVO are different.
Reviewed by: markj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25654
Eric van Gyzen [Wed, 15 Jul 2020 13:17:16 +0000 (13:17 +0000)]
Fix Coverity issues in OFED
read_ibdiag_config NULL deref
read_ibdiag_config mem leak
ib_mad_inv_field_str Missing comma in a string array initialization
print_node_header NULL deref
diff_node_ports copy-paste error
ibportstate.c main() missing break in switch
set_thresholds NULL ptr deref
dump_unicast_tables leaks mapnd
umad_cm_attr_str dead code
__ibv_close_device close(-1)
check return value of listen()
mlx5 bitmap.h - bad bit shift - UB
get_dst_addr check return value of inet_pton
osm_perfmgr_init check return value of cl_spinlock_init
osm_port_new memory leak on error path
sa_mad_ctrl_rcv_callback missing break in switch case
I did not include CID numbers because these were found by an internal
run at Isilon.
Alex Richardson [Wed, 15 Jul 2020 12:07:59 +0000 (12:07 +0000)]
Allow bootstrapping localdef on non-FreeBSD systems
The current localedef simply assumes that the locale headers on build system
are compatible with those on the target system which is not necessarily true.
It generally works on FreeBSD (as long as we don't change the locale headers),
but Linux and macOS provide completely different locale headers.
This change adds new bootstrap headers that namespace certain xlocale
structures defined or used by in the headers that localdef needs.
This is required since system headers *must* be able to include the "real"
locale headers for printf(), etc., but we also want to access the target
systems's internal locale structures.
Alex Richardson [Wed, 15 Jul 2020 12:07:53 +0000 (12:07 +0000)]
Add missing newline and return in localedef error message
I hit those error messages when using a localedef built against headers
that don't match the target system (cross-building from a Linux host).
This problem will be fixed in the next commit.
Alex Richardson [Wed, 15 Jul 2020 12:07:47 +0000 (12:07 +0000)]
Avoid rebuilding libpmc in every incremental rebuild
Generate libpmc_events.c in a temporary file first and only overwrite it
if the files are actually different.
This avoids compiling and relinking the different variants of libpmc on
every incremental build.
Rick Macklem [Wed, 15 Jul 2020 01:26:28 +0000 (01:26 +0000)]
Fix the pNFS flexible file layout client for servers with small write size.
The code in nfscl_dofflayout() loops when a flexible file layout server
provides a small write data limit (no extant server is known to do this).
If/when it looped, it erroneously reused the "drpc" argument for the
mirror worker thread, corrupting it.
This patch fixes the problem by only using the calling thread after the
first loop iteration.
Found during testing by simulating a server with a small write size.
Since no extant pNFS server is known to provide a small write size,
this fix it not needed in practice at this time.
Unset VIS_SAFE flag as it turned out to be actually unsafe
for continuos top display as it's passing through sequences
resulting cursor movement (backspace, tab, carriage-return),
and explicitly set VIS_TAB for the same reason.
Reported by: Mark Millard <marklmi@yahoo.com>, swills
Tested by: Mark Millard <marklmi@yahoo.com>, swills
cache: make negative shrinker round robin on all lists every time
Previously it would check 4, 3, 2, 1 lists. In practice by the time
it is getting called all lists have some elements and consequently
this does not result in new evictions.
I made an attempt to fix this in r362978, but all it really did was
confine the issue to the $MACHINE_CPUARCH == "riscv" case. The real
problem is that LINKER_FEATURES is not defined here, so bsd.linker.mk
needs to be included.
This error with cleandir only occurs when META_MODE is disabled, which
explains why it was missed by both CI and myself.
Note that this effectively reverts r362978.
Reported by: mjg
Reviewed by: imp, kevans (in IRC)
cache: create a dedicate struct for negative entries
.. and stuff if into the unused target vnode field
This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued..
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Michael Tuexen [Tue, 14 Jul 2020 20:32:50 +0000 (20:32 +0000)]
Improve the error handling in generating ASCONF chunks.
In case of errors, the cleanup was not consistent.
Thanks to Felix Weinrank for fuzzing the userland stack and making
me aware of the issue.
Update to D25266, bin/ps: Make the rtprio option actually show
realtime priorities
The current `ps -axO rtprio' show threads running at interrupt
priority such as the [intr] thread as '1:48' and threads running
at kernel priority such as [pagedaemon] as normal:4294967260.
This change shows [intr] as intr:48 and [pagedaemon] as kernel:4.
Andrew Turner [Tue, 14 Jul 2020 18:50:48 +0000 (18:50 +0000)]
Print the arm64 registers in more exception handling panics
It can be useful to get a dump of all registers when investigating why we
received an exception that we are unable to handle. In these cases we
already call panic, however we don't always print the registers.
Add calls to print_registers and print esr and far when applicable.