Mark Johnston [Sat, 7 Mar 2020 00:55:46 +0000 (00:55 +0000)]
Move SMR pointer type definition and access macros to smr_types.h.
The intent is to provide a header that can be included by other headers
without introducing too much pollution. smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.
Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
could be used for resizeable arrays, for example, so TYPE seems too
generic.
- It is useful to be able to define anonymous SMR-protected pointer
types and the _DECLARE suffix makes that look wrong.
Reviewed by: jeff, mjg, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23988
Warner Losh [Sat, 7 Mar 2020 00:29:12 +0000 (00:29 +0000)]
Reword a comment to describe what's actually going on. We can call invalidate
several times potentially. We just don't do anything on the second and
subsequent calls.
Andreas Tobler [Fri, 6 Mar 2020 21:51:28 +0000 (21:51 +0000)]
Drop 'All rights reserved'
Replace hardcoded sizes by nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE, I run this driver since a few
years with CTLFLAG_MPSAFE w/o issues.
Andreas Tobler [Fri, 6 Mar 2020 21:32:42 +0000 (21:32 +0000)]
Drop 'All rights reserved'
Replace hardcoded sizes by nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE, I run this driver since a few
years with CTLFLAG_MPSAFE w/o issues.
Add a HACK to handle a special case for a sensor location.
Mark Johnston [Fri, 6 Mar 2020 20:44:22 +0000 (20:44 +0000)]
Remove dead code from the powerpc uma_small_alloc().
32-bit Book-E doesn't set UMA_MD_SMALL_ALLOC, and 32-bit OEA platforms
have a 32-bit vm_paddr_t. Moreover, this code was wrong in that it
leaked the page if the check failed.
Reviewed by: jhibbits
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23991
Chuck Silvers [Fri, 6 Mar 2020 18:41:37 +0000 (18:41 +0000)]
Add a new "mntfs" pseudo file system which provides private device vnodes for
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.
Dimitry Andric [Fri, 6 Mar 2020 17:02:14 +0000 (17:02 +0000)]
Merge commit f75939599 from llvm git (by Erich Keane):
Reland r374450 with Richard Smith's comments and test fixed.
The behavior from the original patch has changed, since we're no
longer allowing LLVM to just ignore the alignment. Instead, we're
just assuming the maximum possible alignment.
This fixes 'Assertion failed: (Alignment != 0 && "Invalid Alignment"),
function CreateAlignmentAssumption', when building recent versions of
v8, which invoke __builtin_assume_aligned() with its alignment argument
set to 4GiB or more.
Clang will now report a warning, and show the maximum possible alignment
instead, e.g.:
huge-align.cpp:1:27: warning: requested alignment must be 536870912 bytes or smaller; maximum alignment assumed [-Wbuiltin-assume-aligned-alignment]
void *f(void *g) { return __builtin_assume_aligned(g, 4294967296); }
^ ~~~~~~~~~~
Upstream PR: https://bugs.llvm.org/show_bug.cgi?id=43839
Reported by: cem
MFC after: 3 days
Leandro Lupori [Fri, 6 Mar 2020 12:37:04 +0000 (12:37 +0000)]
ixl: Add missing conversions from/to LE16
This fixes some errors on PPC64, during attach and when trying to assign an IP
to an interface. With this change, basic operation of X710 NICs is now
possible.
This also fixes builds with IXL_DEBUG enabled
Reviewed by: erj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23975
Justin Hibbits [Fri, 6 Mar 2020 01:50:15 +0000 (01:50 +0000)]
Fix a mistaken conditional in mfi_tbolt_send_frame()
As written, the condition of (cdb[0] != 0x28 || cdb[0] != 0x2A) will always
be true, since if it's one, it's obviously not the other. Reading the code,
the intent appears to be that it should only perform the operation if it's
neither, otherwise the conditional can be elided.
Justin Hibbits [Fri, 6 Mar 2020 01:45:03 +0000 (01:45 +0000)]
powerpc/powerpc64: Enforce natural alignment in memcpy
Summary:
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in memcpy.
This fixes a SIGBUS in X with acceleration enabled on POWER9.
As part of this, revert r358672, it's no longer necessary with this fix.
Ed Maste [Thu, 5 Mar 2020 20:53:43 +0000 (20:53 +0000)]
libelf: rationalize error handling in ELF note conversion
Previously _libelf_cvt_NOTE_tom (to host) returned false if a note's
namesz + descsz exceeded the buffer size, while _libelf_cvt_NOTE_tof
(to file) silently truncated. Return false in the latter case too.
Leandro Lupori [Thu, 5 Mar 2020 20:04:41 +0000 (20:04 +0000)]
[aacraid] Port driver to big-endian
Port aacraid driver to big-endian (BE) hosts.
The immediate goal of this change is to make it possible to use the
aacraid driver on PowerPC64 machines that have Adaptec Series 8 SAS
controllers.
Adapters supported by this driver expect FIB contents in little-endian
(LE) byte order. All FIBs have a fixed header part as well as a data
part that depends on the command being issued to the controller.
In this way, on BE hosts, the FIB header and all FIB data structures
used in aacraid.c and aacraid_cam.c need to be converted to LE before
being sent to the adapter and converted to BE when coming from it.
The functions to convert each struct are on aacraid_endian.c.
For little-endian (LE) targets, they are macros that expand
to nothing.
In some cases, when only a few fields of a large structure are used,
the fields are converted inline, by the code using them.
PR: 237463
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23887
Ucred is passed to bread(9) so that non-local filesystems use proper
credentials. But, since clean buffer might be cached unless
buf_pager_relbuf is not enabled, it makes credentials to have extra
reference until buffer is reclaimed. Ucred reference would prevent
jail from destroying if creds are jailed.
Dereferencing the read credentials on the valid buffer avoid that, and
should be fine because the buffer is valid and does not need re-read.
PR: 238032
Reported by: bz
Reproduced and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23775
Tijl Coosemans [Thu, 5 Mar 2020 14:41:27 +0000 (14:41 +0000)]
Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.
[PowerPC64] restrict memcpy/bcopy optimization to POWER ISA >=V2.07
VSX instructions were added in POWER ISA V2.06 (POWER7), but it
requires data to be word-aligned. Such requirement was removed in
ISA V2.07B (POWER8).
Since current memcpy/bcopy optimization relies on VSX instructions
handling misalignment transparently, and kernel doesn't currently
implement an alignment error handler, this optimzation should be
restrict to ISA V2.07 onwards.
SIGBUS on stxvd2x instruction was reproduced in POWER7+ CPU.
Warner Losh [Thu, 5 Mar 2020 06:20:17 +0000 (06:20 +0000)]
xpt_async is submitting a CCB, not finishing it up, so use xpt_action() instead
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
Gleb Smirnoff [Wed, 4 Mar 2020 22:27:16 +0000 (22:27 +0000)]
When a machine boots the NFS mounting script is executed after
interfaces are configured, but for many interfaces (e.g. all Intel)
ifconfig causes link renegotiation, so the first attempt to mount
NFS always fails. After that mount_nfs sleeps for 30 seconds, while
only a couple seconds are actually required for interface to get up.
Instead of sleeping, do select(2) on routing socket and check if
some interface became UP and in this case retry immediately.
Brooks Davis [Wed, 4 Mar 2020 21:27:12 +0000 (21:27 +0000)]
Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to use a designated
initializer or compount literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
I reported this in https://bugs.llvm.org/show_bug.cgi?id=44715, and
initially reverted the upstream change in r357259 to work around it.
However, after some discussion with Fangrui Song in the upstream ticket,
I think we can classify this as an unfortunate interaction between using
-Ttext=0 in combination with --no-rosegment. (We added the latter
in r332090, because btxld does not correctly handle input with more
than 2 PT_LOAD segments.)
Fangrui suggested to use a linker script instead, and Warner was already
attempting this in r305353, but had to revert it due to "crypto-using
boot problems" (not sure what those were :).
This review updates the stand/i386/boot.ldscript to handle more
sections, inserts some symbols like _edata and such that we use in
libsa, and also discards any .interp section.
It uses ORG which is defined on the linker command line using
--defsym ORG=value to set the start of all the sections.
Ed Maste [Wed, 4 Mar 2020 20:29:49 +0000 (20:29 +0000)]
readelf: check note namesz and descsz
Previously corrupt note namesz or descsz (perhaps caused by readelf's
current lack of endian support for notes) resulted in a crash. Check
that namesz and descsz do not extend beyond the end of the buffer before
trying to access name and desc data.
Reported by: jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Emmanuel Vadot [Wed, 4 Mar 2020 20:01:03 +0000 (20:01 +0000)]
dwmmc: Rework the DMA engine
Each segment can be up to 4096 bytes in chain structure according to the
RK3399 TRM Part 2.
Set the buffers in full ring where the last one point to the first one.
Correctly reports the MMC_IVAR_MAX_DATA.
Use CACHE_LINE_SIZE for bus_dma alignment.
Implement a detaching flag for the sound(4) subsystem to take
appropriate actions when we are trying to detach an audio device,
but cannot because someone is using it.
This avoids applications having to wait for the DSP read data
timeout before they receive any error indication.
Tested with virtual_oss(8).
Bjoern A. Zeeb [Wed, 4 Mar 2020 17:17:02 +0000 (17:17 +0000)]
Implement optional table entry limits for if_llatbl.
Implement counting of table entries linked on a per-table base
with an optional (if set > 0) limit of the maximum number of table
entries.
For that the public lltable_link_entry() and lltable_unlink_entry()
functions as well as the internal function pointers change from void
to having an int return type.
Given no consumer currently sets the new llt_maxentries this can be
committed on its own. The moment we make use of the table limits,
the callers of the link function must check the return value as
it can change and entries might not be added.
Adjustments for IPv6 (and possibly IPv4) will follow.
Michael Tuexen [Wed, 4 Mar 2020 16:41:25 +0000 (16:41 +0000)]
When using automatically generated flow labels and using TCP SYN
cookies, use the same flow label for the segments sent during the
handshake and after the handshake.
This fixes a bug by making sure that sc_flowlabel is always stored in
network byte order.
Reviewed by: bz@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23957
Bjoern A. Zeeb [Wed, 4 Mar 2020 16:20:59 +0000 (16:20 +0000)]
Add new ICMPv6 counters for Anti-DoS limits.
Add four new counters for ND6 related Anti-DoS measures.
We split these out into a separate upfront commit so that we only
change the struct size one time. Implementations using them will
follow.
Michael Tuexen [Wed, 4 Mar 2020 12:22:53 +0000 (12:22 +0000)]
Don't send an uninitilised traffic class in the IPv6 header, when
sending a TCP segment from the TCP SYN cache (like a SYN-ACK).
This fix initialises it to zero. This is correct for the ECN bits,
but is does not honor the DSCP what an application might have set via
the IPPROTO_IPV6 level socket options IPV6_TCLASS. That will be
fixed separately.
Reviewed by: Richard Scheffenegger
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23900
Leandro Lupori [Wed, 4 Mar 2020 12:21:38 +0000 (12:21 +0000)]
[aacraid] Add missing unmap call for SYNC mode
This issue was observed on a PowerPC64 machine with an Adaptec RAID Controller
with PCI device ID 0x028d. After several read/write operations, the kernel was
panic'ing in bus_dmamap_sync(). This was due to a missing aac_unmap_command()
in the SYNC path.
Chuck Silvers [Wed, 4 Mar 2020 00:22:50 +0000 (00:22 +0000)]
if vm_pager_get_pages_async() returns an error, release the sfio->nios
refcount that we took earlier that represents the I/O that ended up
not being started.
Warner Losh [Tue, 3 Mar 2020 17:40:29 +0000 (17:40 +0000)]
Get rid of silly /* FALLTHROUGH */ lines
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.
Alexander Motin [Tue, 3 Mar 2020 15:05:13 +0000 (15:05 +0000)]
Increase number of write completion threads, matching ZoL.
Our iSCSI benchmarks on a large 80-core system show that previous limit
of 8 threads can be a bottleneck. At some points this change increases
write IOPS by as much as 50%. I am still not sure that so many threads
is really required, but we tested lower amounts and got no significant
benefits, while latencies were a bit worse, so decided to not diverge.
Bjoern A. Zeeb [Tue, 3 Mar 2020 14:15:30 +0000 (14:15 +0000)]
tcp_hpts: make RSS kernel compile again.
Add proper #includes, and #ifdefs and some style fixes to make RSS
kernels compile again. There are still possible issues with uin16_t
vs. uint_t cpuid which I am not going near.
Bjoern A. Zeeb [Tue, 3 Mar 2020 14:07:44 +0000 (14:07 +0000)]
upic_ktrls: make RSS compile again here
The results of ktls_get_cpu() are stored in u_int and NETISR_CPUID_NONE
requires u_int. Adjust uint16_t to uint_t in order to make RSS kernels
compile some more again.
HPTS still has to be fixed, which is a bit more complicated.
Bjoern A. Zeeb [Tue, 3 Mar 2020 13:48:12 +0000 (13:48 +0000)]
ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late retire this function again and add the fib to in6_selectroute()
which also only has a single consumer now and was an orphan function before.
- Sort options.
- Do not use macros (like .Ar) to specify width for Bl (macros within that
string are not expanded).
- Use Cm instead of Ar for mode names.
- Fix some typos reported by mandoc.
- Move the documentation of the PID file from the -P flag description to
the FILES section.
Bjoern A. Zeeb [Tue, 3 Mar 2020 11:32:47 +0000 (11:32 +0000)]
ip6_output: use new routing KPI when not passed a cached route
Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6 we get passed a cached route (and inp) by udp6_output()
depending on whether we acquired a write lock on the INP.
In case we neither bind nor connect a first UDP packet would come in
with a cached route (wlocked) and all further packets would not.
In case we bind and do not connect we never write-lock the inp.
When we do not pass in a cached route, rather than providing the
storage for a route locally and pass it over the old lookup code
and down the stack, use the new route lookup KPI and acquire all
details we need to send the packet.
Compared to the IPv4 code the IPv6 code has a couple of possible
complications: given an option with a routing hdr/caching route there,
and path mtu (ro_pmtu) case which now equally has to deal with the
possibility of having a route which is NULL passed in, and the
fwd_tag in case a firewall changes the next hop (something to
factor out in the future).
Bjoern A. Zeeb [Tue, 3 Mar 2020 09:45:16 +0000 (09:45 +0000)]
fib6_rte_to_nh_*: return a link-local gw address with scope embedded
In fib6_rte_to_nh_* when returning a link-local gateway address
currently we do clear the scope. That could be recovered using
the ifp returned as well, but the code in general seems to
expect a link-local address with scope embeedded as otherwise
the "dst" (gw) passed to the output routines will not include
scope and not send the packet out (the right interface).
Do not clear the scope when returning a link-local address and
allow packets to go out (the right interface).
Remove the (now) extra scope recovery in the IPv6 fast-fwd code.
Andrew Turner [Tue, 3 Mar 2020 08:28:16 +0000 (08:28 +0000)]
Store the boot exception level on arm64 so it can be queried later
A hypervisor, e.g. bhyve, will need to know what exception levelthe kernel
was in when it started booting. If it was EL2 we can then enable said
hypervisor.
Store the boot exception level and allow the kernel to later query it.
Obtained from: https://github.com/FreeBSD-UPB/freebsd (earlier version)
Sponsored by: Innovate UK
Justin Hibbits [Tue, 3 Mar 2020 03:22:00 +0000 (03:22 +0000)]
powerpc/powernv: powernv_node_numa_domain() fix non-NUMA case
If NUMA is not enabled in the kernel config, or is disabled at boot, this
function should just return domain 0 regardless of what's in the device
tree.
In r358471, we interrupted the case block that would eventually lead
to the path related tokens not being processed. Restore this behavior and
and move AUE_JAIL_SET in this block, as it may conditionally contain a
path token.
Conrad Meyer [Tue, 3 Mar 2020 00:20:08 +0000 (00:20 +0000)]
Add extremely useful calendar(1) application to FreeBSD
It does extremely useful things like execute sendmail and spew dubiously
accurate factoids.
From the feedback, it seems like it is an essential utility in a modern unix
and not at all a useless bikeshed. How do those Linux people live without it?
Reverts r358561.