hrs [Sat, 7 Mar 2020 08:41:10 +0000 (08:41 +0000)]
Fix an issue in the net.inet.igmp.stats handler.
The header of (struct igmpstat) could be cleared by sysctl(3).
This can be reproduced by "netstat -s -z -p igmp".
jhibbits [Sat, 7 Mar 2020 03:58:58 +0000 (03:58 +0000)]
compat: Allow explicit overriding of COMPAT_ARCH and COMPAT_CPUTYPE
Summary:
Allow src.conf to override the inferred COMPAT_ARCH and COMPAT_CPUTYPE
variables, such that a different CPU target can be specified explicitly
for the general target vs the compat target.
markj [Sat, 7 Mar 2020 00:55:46 +0000 (00:55 +0000)]
Move SMR pointer type definition and access macros to smr_types.h.
The intent is to provide a header that can be included by other headers
without introducing too much pollution. smr.h depends on various
headers and will likely grow over time, but is less likely to be
required by system headers.
Rename SMR_TYPE_DECLARE() to SMR_POINTER():
- One might use SMR to protect more than just pointers; it
could be used for resizeable arrays, for example, so TYPE seems too
generic.
- It is useful to be able to define anonymous SMR-protected pointer
types and the _DECLARE suffix makes that look wrong.
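The sketch below shows why an anonymous form is useful. It uses a hypothetical EXAMPLE_SMR_POINTER() macro in the spirit of SMR_POINTER(). The macro and accessor names are assumptions. The accessors use plain loads and stores, whereas real SMR accessors would be atomic and SMR-aware.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for SMR_POINTER(): wrapping the pointer in a
 * struct forces callers through accessor macros. Plain loads and stores
 * are used here; real SMR accessors would use atomic, SMR-aware ones.
 */
#define	EXAMPLE_SMR_POINTER(type)	struct { type smr_ptr_; }
#define	example_smr_load(p)		((p)->smr_ptr_)
#define	example_smr_store(p, v)		((p)->smr_ptr_ = (v))

struct node {
	int	value;
};

/* An anonymous SMR-protected pointer: no named type is declared first. */
static EXAMPLE_SMR_POINTER(struct node *) head;
```

The same macro can also declare a struct member, e.g. `EXAMPLE_SMR_POINTER(struct node *) first;`. In that position a `_DECLARE` suffix would read oddly.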
Reviewed by: jeff, mjg, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23988
imp [Sat, 7 Mar 2020 00:29:12 +0000 (00:29 +0000)]
Reword a comment to describe what's actually going on. We can call invalidate
several times potentially. We just don't do anything on the second and
subsequent calls.
andreast [Fri, 6 Mar 2020 21:51:28 +0000 (21:51 +0000)]
Drop 'All rights reserved'
Replace hardcoded sizes with nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE; I have been running this driver
with CTLFLAG_MPSAFE for a few years without issues.
andreast [Fri, 6 Mar 2020 21:32:42 +0000 (21:32 +0000)]
Drop 'All rights reserved'
Replace hardcoded sizes with nitems and sizeof
Replace CTLFLAG_NEEDGIANT with CTLFLAG_MPSAFE; I have been running this driver
with CTLFLAG_MPSAFE for a few years without issues.
Add a HACK to handle a special case for a sensor location.
markj [Fri, 6 Mar 2020 20:44:22 +0000 (20:44 +0000)]
Remove dead code from the powerpc uma_small_alloc().
32-bit Book-E doesn't set UMA_MD_SMALL_ALLOC, and 32-bit OEA platforms
have a 32-bit vm_paddr_t. Moreover, this code was wrong in that it
leaked the page if the check failed.
Reviewed by: jhibbits
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23991
chs [Fri, 6 Mar 2020 18:41:37 +0000 (18:41 +0000)]
Add a new "mntfs" pseudo file system which provides private device vnodes for
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.
dim [Fri, 6 Mar 2020 17:02:14 +0000 (17:02 +0000)]
Merge commit f75939599 from llvm git (by Erich Keane):
Reland r374450 with Richard Smith's comments and test fixed.
The behavior from the original patch has changed, since we're no
longer allowing LLVM to just ignore the alignment. Instead, we're
just assuming the maximum possible alignment.
This fixes 'Assertion failed: (Alignment != 0 && "Invalid Alignment"),
function CreateAlignmentAssumption', when building recent versions of
v8, which invoke __builtin_assume_aligned() with its alignment argument
set to 4GiB or more.
Clang will now report a warning, and show the maximum possible alignment
instead, e.g.:
huge-align.cpp:1:27: warning: requested alignment must be 536870912 bytes or smaller; maximum alignment assumed [-Wbuiltin-assume-aligned-alignment]
void *f(void *g) { return __builtin_assume_aligned(g, 4294967296); }
                          ^                           ~~~~~~~~~~
Upstream PR: https://bugs.llvm.org/show_bug.cgi?id=43839
Reported by: cem
MFC after: 3 days
luporl [Fri, 6 Mar 2020 12:37:04 +0000 (12:37 +0000)]
ixl: Add missing conversions from/to LE16
This fixes some errors on PPC64 during attach and when trying to assign an IP
address to an interface. With this change, basic operation of X710 NICs is now
possible.
This also fixes builds with IXL_DEBUG enabled.
Reviewed by: erj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23975
jhibbits [Fri, 6 Mar 2020 01:50:15 +0000 (01:50 +0000)]
Fix a mistaken conditional in mfi_tbolt_send_frame()
As written, the condition of (cdb[0] != 0x28 || cdb[0] != 0x2A) will always
be true, since if it's one, it's obviously not the other. Reading the code,
the intent appears to be that it should only perform the operation if it's
neither; otherwise the conditional could simply be elided.
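The logic error can be checked in isolation. This sketch uses hypothetical function names. The values 0x28 and 0x2A are the SCSI READ(10) and WRITE(10) opcodes. The corrected test is the De Morgan form of !(op == 0x28 || op == 0x2A).

```c
#include <stdbool.h>
#include <stdint.h>

/* Original test: true for every opcode, because no value equals both. */
static bool
cdb_check_buggy(uint8_t op)
{
	return (op != 0x28 || op != 0x2A);
}

/* Intended test: true only when the opcode is neither READ(10) nor WRITE(10). */
static bool
cdb_check_fixed(uint8_t op)
{
	return (op != 0x28 && op != 0x2A);
}
```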
jhibbits [Fri, 6 Mar 2020 01:45:03 +0000 (01:45 +0000)]
powerpc/powerpc64: Enforce natural alignment in memcpy
Summary:
POWER architecture CPUs (Book-S) require natural alignment for
cache-inhibited storage accesses. Since we can't know the caching model
for a page ahead of time, always enforce natural alignment in memcpy.
This fixes a SIGBUS in X with acceleration enabled on POWER9.
As part of this, revert r358672, as it is no longer necessary with this fix.
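Below is a minimal C sketch of natural-alignment copying. It assumes 8 bytes is the widest access. The function name is hypothetical, and this is not the actual libc routine. Wide accesses are used only when source and destination share the same offset modulo 8, so every 8-byte access lands on an 8-byte boundary. Everything else is copied one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte type that may alias any object (GCC/Clang extension). */
typedef uint64_t u64_alias_t __attribute__((__may_alias__));

static void *
memcpy_natural(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Wide copies only if both pointers can become 8-byte aligned together. */
	if ((((uintptr_t)d ^ (uintptr_t)s) & 7) == 0) {
		/* Byte copies until both pointers sit on an 8-byte boundary. */
		while (len > 0 && ((uintptr_t)d & 7) != 0) {
			*d++ = *s++;
			len--;
		}
		/* Naturally aligned 8-byte copies. */
		while (len >= 8) {
			*(u64_alias_t *)(void *)d =
			    *(const u64_alias_t *)(const void *)s;
			d += 8;
			s += 8;
			len -= 8;
		}
	}
	/* Tail, or mutually misaligned buffers: 1-byte accesses are always aligned. */
	while (len > 0) {
		*d++ = *s++;
		len--;
	}
	return (dst);
}
```

A production version would also use 4- and 2-byte steps for partially matching alignments. It would also have to make sure the compiler does not widen the byte loop into unaligned vector accesses.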
emaste [Thu, 5 Mar 2020 20:53:43 +0000 (20:53 +0000)]
libelf: rationalize error handling in ELF note conversion
Previously _libelf_cvt_NOTE_tom (to host) returned false if a note's
namesz + descsz exceeded the buffer size, while _libelf_cvt_NOTE_tof
(to file) silently truncated. Return false in the latter case too.
luporl [Thu, 5 Mar 2020 20:04:41 +0000 (20:04 +0000)]
[aacraid] Port driver to big-endian
Port aacraid driver to big-endian (BE) hosts.
The immediate goal of this change is to make it possible to use the
aacraid driver on PowerPC64 machines that have Adaptec Series 8 SAS
controllers.
Adapters supported by this driver expect FIB contents in little-endian
(LE) byte order. All FIBs have a fixed header part as well as a data
part that depends on the command being issued to the controller.
Therefore, on BE hosts, the FIB header and all FIB data structures
used in aacraid.c and aacraid_cam.c need to be converted to LE before
being sent to the adapter, and converted back to BE when coming from it.
The functions to convert each struct are in aacraid_endian.c.
For little-endian (LE) targets, they are macros that expand
to nothing.
In some cases, when only a few fields of a large structure are used,
the fields are converted inline by the code using them.
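The sketch below shows the conversion pattern. A hypothetical header struct stands in for the real FIB structures. On big-endian hosts the helper swaps each field to LE; on little-endian hosts it is a macro that does nothing. The host byte order is detected with the GCC/Clang __BYTE_ORDER__ predefined macros, which is an assumption about the build environment.

```c
#include <stdint.h>

/* Hypothetical FIB-like header; the real aacraid structures differ. */
struct example_fib_header {
	uint32_t	xfer_state;
	uint16_t	command;
	uint16_t	size;
};

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
/* Big-endian host: swap every field into the adapter's LE byte order. */
static void
example_fib_header_tole(struct example_fib_header *h)
{
	h->xfer_state = __builtin_bswap32(h->xfer_state);
	h->command = __builtin_bswap16(h->command);
	h->size = __builtin_bswap16(h->size);
}
#else
/* Little-endian host: already in LE order, so the call compiles to nothing. */
#define	example_fib_header_tole(h)	do { (void)(h); } while (0)
#endif
```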
PR: 237463
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23887
dim [Thu, 5 Mar 2020 18:09:19 +0000 (18:09 +0000)]
Revert r357259, after the merge from head which added linker scripts for
stand/i386 boot:
Revert upstream lld r371957 (git commit 06bb7dfbd) by Fangrui Song:
[ELF] Map the ELF header at imageBase
If there is no readonly section, we map:
* The ELF header at imageBase+maxPageSize
* Program headers at imageBase+maxPageSize+sizeof(Ehdr)
* The first section .text at imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)
Due to the interaction between Writer<ELFT>::fixSectionAlignments and
LinkerScript::allocateHeaders,
`alignDown(p_vaddr(R PT_LOAD)) = alignDown(p_vaddr(RX PT_LOAD))`.
The RX PT_LOAD will override the R PT_LOAD at runtime, which is not ideal:
```
// PHDR at 0x401034, should be 0x400034
PHDR 0x000034 0x00401034 0x00401034 0x000a0 0x000a0 R 0x4
// R PT_LOAD contains just Ehdr and program headers.
// At 0x401000, should be 0x400000
LOAD 0x000000 0x00401000 0x00401000 0x000d4 0x000d4 R 0x1000
LOAD 0x0000d4 0x004010d4 0x004010d4 0x00001 0x00001 R E 0x1000
```
* createPhdrs allocates the headers to the R PT_LOAD.
* fixSectionAlignments assigns `imageBase+maxPageSize+sizeof(Ehdr)+sizeof(program headers)` (formula: `alignTo(dot, maxPageSize) + dot % config->maxPageSize`) to addrExpr of .text
* allocateHeaders computes the minimum address among SHF_ALLOC sections, i.e. addr(.text)
* allocateHeaders sets address of ELF header to `addr(.text)-sizeof(Ehdr)-sizeof(program headers) = imageBase+maxPageSize`
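A short calculation reproduces the numbers in the listing above. This sketch assumes the initial dot is imageBase plus the header sizes. It also assumes a 52-byte (0x34) 32-bit Ehdr, 0xa0 bytes of program headers and a 0x1000 maxPageSize, as in the listing.

```c
#include <stdint.h>

/* Round x up / down to a multiple of the power-of-two 'align'. */
static uint64_t
align_to(uint64_t x, uint64_t align)
{
	return ((x + align - 1) & ~(align - 1));
}

static uint64_t
align_down(uint64_t x, uint64_t align)
{
	return (x & ~(align - 1));
}

struct layout {
	uint64_t	ehdr, phdr, text;
};

static struct layout
old_layout(uint64_t image_base, uint64_t max_page, uint64_t ehdr_size,
    uint64_t phdrs_size)
{
	struct layout l;
	/* Assumed initial dot: the headers directly follow imageBase. */
	uint64_t dot = image_base + ehdr_size + phdrs_size;

	/* fixSectionAlignments: alignTo(dot, maxPageSize) + dot % maxPageSize. */
	l.text = align_to(dot, max_page) + dot % max_page;
	/* allocateHeaders: the headers end exactly where .text starts. */
	l.ehdr = l.text - ehdr_size - phdrs_size;
	l.phdr = l.ehdr + ehdr_size;
	return (l);
}
```

With these inputs, .text lands at 0x4010d4 and the ELF header at 0x401000. Both PT_LOAD segments therefore round down to the same page, which is the overlap the bullet points describe.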
The main observation is that when the SECTIONS command is not used, we
don't have to call allocateHeaders. This requires an assumption that
the presence of PT_PHDR and addresses of headers can be decided
regardless of address information.
This may seem natural because dot is not manipulated by a linker script.
The other thing is that we have to drop the special rule for -T<section>
in `getInitialDot`. If -Ttext is smaller than the image base, the headers
will not be allocated with the old behavior (allocateHeaders is called)
but always allocated with the new behavior.
The behavior change is not a problem. Whether and where headers are
allocated can vary among linkers, or ld.bfd across different versions
(--enable-separate-code or not). It is thus advised to use a linker
script with the PHDRS command to have a consistent behavior across
linkers. If PT_PHDR is needed, an explicit --image-base can be a simpler
alternative.
kib [Thu, 5 Mar 2020 15:52:34 +0000 (15:52 +0000)]
buffer pager: deref ucred immediately after read.
Ucred is passed to bread(9) so that non-local filesystems use proper
credentials. But since a clean buffer may stay cached unless
buf_pager_relbuf is enabled, the credentials keep an extra
reference until the buffer is reclaimed. That ucred reference would prevent
a jail from being destroyed if the creds are jailed.
Dereferencing the read credentials on a valid buffer avoids that, and
should be fine because the buffer is valid and does not need a re-read.
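The sketch below shows the reference lifetime. It uses hypothetical stand-ins for struct ucred and struct buf. In the kernel this corresponds roughly to releasing the buffer's read credential once the buffer is valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ucred and struct buf. */
struct cred {
	int	refcount;
};

struct buffer {
	bool		valid;
	struct cred	*rcred;		/* credential held for the read, if any */
};

static struct cred *
cred_hold(struct cred *c)
{
	c->refcount++;
	return (c);
}

static void
cred_drop(struct cred *c)
{
	c->refcount--;
}

/*
 * Hold the credential for the I/O. Once the buffer holds valid data it
 * will not be re-read, so drop the reference right away instead of
 * letting a cached buffer pin the credential. 'read_ok' simulates the
 * outcome of the read.
 */
static void
buffer_read(struct buffer *bp, struct cred *c, bool read_ok)
{
	if (bp->rcred == NULL)
		bp->rcred = cred_hold(c);
	bp->valid = read_ok;
	if (bp->valid) {
		cred_drop(bp->rcred);
		bp->rcred = NULL;
	}
}
```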
PR: 238032
Reported by: bz
Reproduced and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23775
tijl [Thu, 5 Mar 2020 14:41:27 +0000 (14:41 +0000)]
Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.
alfredo [Thu, 5 Mar 2020 14:13:22 +0000 (14:13 +0000)]
[PowerPC64] restrict memcpy/bcopy optimization to POWER ISA >=V2.07
VSX instructions were added in POWER ISA V2.06 (POWER7), but they
require data to be word-aligned. That requirement was removed in
ISA V2.07B (POWER8).
Since the current memcpy/bcopy optimization relies on VSX instructions
handling misalignment transparently, and the kernel doesn't currently
implement an alignment error handler, this optimization should be
restricted to ISA V2.07 onwards.
A SIGBUS on the stxvd2x instruction was reproduced on a POWER7+ CPU.
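The sketch below shows one way to gate the optimization. The VSX-based copy is chosen only when the CPU implements ISA V2.07 or later. The integer encoding of ISA levels and the function names are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Generic byte copy: safe at any ISA level. */
static void *
copy_plain(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len-- > 0)
		*d++ = *s++;
	return (dst);
}

/* Placeholder for the VSX copy that relies on hardware misalignment handling. */
static void *
copy_vsx(void *dst, const void *src, size_t len)
{
	return (memcpy(dst, src, len));
}

/*
 * Use the VSX path only from ISA V2.07 on; V2.06 VSX needs word-aligned
 * data. Levels are encoded as 206, 207, 300, ... (assumption).
 */
static copy_fn
select_copy(int isa_level)
{
	return (isa_level >= 207 ? copy_vsx : copy_plain);
}
```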
imp [Thu, 5 Mar 2020 06:20:17 +0000 (06:20 +0000)]
xpt_async is submitting a CCB, not finishing it up, so use xpt_action() instead
of xpt_done(). Add the missing XPT_ASYNC case to xpt_action_default. xpt_async
wants to use the side-effect of the xpt_done() routine to queue this to the
camisr thread so it can be done in that context. However, this breaks the
symmetry that you create a ccb and call xpt_action() for it to be
dispatched. Restore that symmetry by having it go through that path. As far as I
can tell, this is the only CCB that we create and call xpt_done() on directly.
glebius [Wed, 4 Mar 2020 22:27:16 +0000 (22:27 +0000)]
When a machine boots, the NFS mounting script is executed after
interfaces are configured, but for many interfaces (e.g. all Intel)
ifconfig causes link renegotiation, so the first attempt to mount
NFS always fails. After that, mount_nfs sleeps for 30 seconds, while
only a couple of seconds are actually required for the interface to come up.
Instead of sleeping, do select(2) on a routing socket and check whether
some interface became UP, and in that case retry immediately.
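Below is a portable sketch of waiting on a descriptor instead of sleeping for a fixed interval. In mount_nfs the descriptor would be a routing socket. The message parsing that checks for an interface going UP is omitted. retry_mount() and its callback are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to 'secs' seconds for 'fd' to become readable. */
static bool
wait_readable(int fd, int secs)
{
	fd_set rfds;
	struct timeval tv;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = secs;
	tv.tv_usec = 0;
	return (select(fd + 1, &rfds, NULL, NULL, &tv) > 0);
}

/*
 * Retry a mount, waking early when a message arrives on 'route_fd'
 * instead of always sleeping the full 30 seconds. 'try_mount' is a
 * hypothetical callback returning 0 on success.
 */
static int
retry_mount(int (*try_mount)(void), int route_fd, int attempts)
{
	char msg[2048];

	for (int i = 0; i < attempts; i++) {
		if (try_mount() == 0)
			return (0);
		/* Drain the message; real code would retry only on an interface going UP. */
		if (wait_readable(route_fd, 30) &&
		    read(route_fd, msg, sizeof(msg)) < 0)
			return (-1);
	}
	return (-1);
}
```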
brooks [Wed, 4 Mar 2020 21:27:12 +0000 (21:27 +0000)]
Introduce kern_mmap_req().
This presents an extensible interface to the generic mmap(2)
implementation via a struct pointer intended to be used with a designated
initializer or compound literal. We take advantage of the mandatory
zeroing of fields not listed in the initializer.
Remove kern_mmap_fpcheck() and use kern_mmap_req().
The motivation for this change is a desire to keep the core
implementation from growing an ever-increasing number of arguments
that must be specified in the correct order for the lowest-level
implementations. In CheriBSD we have already added two more arguments.
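The sketch below shows the calling convention. A hypothetical struct mmap_req_example stands in for the real request structure. Fields the caller does not name in a designated initializer or compound literal are zero, so new fields can be added without touching existing call sites.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request structure; the real field names and types may differ. */
struct mmap_req_example {
	uintptr_t	addr;	/* 0: let the kernel choose */
	size_t		len;
	int		prot;
	int		flags;
	int		fd;
	int64_t		pos;
};

/* One pointer argument instead of a long, order-sensitive list. */
static int
do_mmap_example(const struct mmap_req_example *req)
{
	/* Fields the caller omitted (e.g. addr, pos) read as zero here. */
	if (req->len == 0)
		return (EINVAL);
	return (0);
}
```

A call site then reads like `do_mmap_example(&(struct mmap_req_example){ .len = 4096, .fd = -1 })`.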
I reported this in https://bugs.llvm.org/show_bug.cgi?id=44715, and
initially reverted the upstream change in r357259 to work around it.
However, after some discussion with Fangrui Song in the upstream ticket,
I think we can classify this as an unfortunate interaction between using
-Ttext=0 in combination with --no-rosegment. (We added the latter
in r332090, because btxld does not correctly handle input with more
than 2 PT_LOAD segments.)
Fangrui suggested to use a linker script instead, and Warner was already
attempting this in r305353, but had to revert it due to "crypto-using
boot problems" (not sure what those were :).
This review updates the stand/i386/boot.ldscript to handle more
sections, inserts some symbols like _edata and such that we use in
libsa, and also discards any .interp section.
It uses ORG which is defined on the linker command line using
--defsym ORG=value to set the start of all the sections.
emaste [Wed, 4 Mar 2020 20:29:49 +0000 (20:29 +0000)]
readelf: check note namesz and descsz
Previously corrupt note namesz or descsz (perhaps caused by readelf's
current lack of endian support for notes) resulted in a crash. Check
that namesz and descsz do not extend beyond the end of the buffer before
trying to access name and desc data.
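Below is a minimal sketch of the bounds check. It assumes host-order note headers (namesz, descsz, type) and the usual 4-byte padding of name and desc. count_notes() is hypothetical. A note is rejected before its name or desc is touched if either would run past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte alignment of note name/desc, without overflow. */
static uint64_t
note_pad(uint32_t n)
{
	return (((uint64_t)n + 3) & ~(uint64_t)3);
}

/* Count notes in buf; return -1 if any header, name or desc is truncated. */
static int
count_notes(const unsigned char *buf, size_t len)
{
	int count = 0;

	while (len > 0) {
		uint32_t hdr[3];	/* namesz, descsz, type */
		uint64_t need;

		if (len < sizeof(hdr))
			return (-1);
		memcpy(hdr, buf, sizeof(hdr));
		buf += sizeof(hdr);
		len -= sizeof(hdr);
		need = note_pad(hdr[0]) + note_pad(hdr[1]);
		if (need > len)
			return (-1);	/* corrupt namesz/descsz */
		/* Name and desc are now known to lie within the buffer. */
		buf += need;
		len -= (size_t)need;
		count++;
	}
	return (count);
}
```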
Reported by: jhb
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
manu [Wed, 4 Mar 2020 20:01:03 +0000 (20:01 +0000)]
dwmmc: Rework the DMA engine
Each segment can be up to 4096 bytes in the chain structure, according to the
RK3399 TRM Part 2.
Set up the buffers as a full ring where the last one points to the first one.
Correctly report MMC_IVAR_MAX_DATA.
Use CACHE_LINE_SIZE for bus_dma alignment.
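The sketch below shows the descriptor layout described above. The descriptor struct and ring size are hypothetical. Every descriptor links to the next, the last links back to the first, and a transfer is split into segments of at most 4096 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define	DESC_SEG_MAX	4096	/* per-descriptor limit from the RK3399 TRM */
#define	DESC_RING_SIZE	8	/* hypothetical number of descriptors */

/* Hypothetical descriptor; the real controller descriptor layout differs. */
struct desc {
	uint32_t	len;
	uint32_t	next;	/* index of the next descriptor */
};

/*
 * Link all descriptors into a full ring and split 'total' bytes into
 * segments of at most DESC_SEG_MAX bytes. Returns the number of
 * descriptors used, or -1 if the transfer exceeds the ring's capacity.
 */
static int
fill_ring(struct desc ring[DESC_RING_SIZE], size_t total)
{
	int i, used = 0;

	for (i = 0; i < DESC_RING_SIZE; i++) {
		ring[i].len = 0;
		ring[i].next = (uint32_t)((i + 1) % DESC_RING_SIZE);
	}
	if (total > (size_t)DESC_RING_SIZE * DESC_SEG_MAX)
		return (-1);
	while (total > 0) {
		uint32_t seg = total > DESC_SEG_MAX ? DESC_SEG_MAX : (uint32_t)total;

		ring[used++].len = seg;
		total -= seg;
	}
	return (used);
}
```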
hselasky [Wed, 4 Mar 2020 17:23:20 +0000 (17:23 +0000)]
Implement a detaching flag for the sound(4) subsystem to take
appropriate actions when we are trying to detach an audio device,
but cannot because someone is using it.
This avoids applications having to wait for the DSP read data
timeout before they receive any error indication.
Tested with virtual_oss(8).
Implement optional table entry limits for if_llatbl.
Implement counting of table entries linked on a per-table basis
with an optional (if set > 0) limit on the maximum number of table
entries.
For that, the public lltable_link_entry() and lltable_unlink_entry()
functions, as well as the internal function pointers, change from a void
to an int return type.
Given that no consumer currently sets the new llt_maxentries, this can be
committed on its own. The moment we make use of the table limits,
the callers of the link function must check the return value, as
linking can fail and entries might not be added.
Adjustments for IPv6 (and possibly IPv4) will follow.
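The sketch below shows the counting and limit check. The names are hypothetical and the error value is an assumption. The point is that linking can now fail, so callers must check the result.

```c
#include <errno.h>

/* Hypothetical stand-in for struct lltable with the new counters. */
struct table {
	unsigned int	entries;	/* entries currently linked */
	unsigned int	maxentries;	/* 0 means "no limit" */
};

/* Returns 0 on success, or an error once the optional limit is reached. */
static int
table_link_entry(struct table *t)
{
	if (t->maxentries > 0 && t->entries >= t->maxentries)
		return (ENOSPC);	/* assumption: the real error may differ */
	t->entries++;
	return (0);
}

static int
table_unlink_entry(struct table *t)
{
	if (t->entries > 0)
		t->entries--;
	return (0);
}
```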
tuexen [Wed, 4 Mar 2020 16:41:25 +0000 (16:41 +0000)]
When using automatically generated flow labels and using TCP SYN
cookies, use the same flow label for the segments sent during the
handshake and after the handshake.
This fixes a bug by making sure that sc_flowlabel is always stored in
network byte order.
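The sketch below assembles the first word of the IPv6 header (version, traffic class, flow label). It assumes the label is stored in network byte order, as sc_flowlabel is after the fix. The label can then be ORed into the header without conversion and yields identical bits for every segment. ip6_flow_word() is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Low 20 bits of the first IPv6 header word, in network byte order. */
#define	FLOWLABEL_MASK_NET	htonl(0x000fffffu)

/*
 * Build version (6), traffic class and flow label in network byte order.
 * 'flowlabel_net' must already be in network byte order.
 */
static uint32_t
ip6_flow_word(uint8_t tclass, uint32_t flowlabel_net)
{
	uint32_t vtc = htonl((6u << 28) | ((uint32_t)tclass << 20));

	return (vtc | (flowlabel_net & FLOWLABEL_MASK_NET));
}
```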
Reviewed by: bz@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23957
Add four new counters for ND6 related Anti-DoS measures.
We split these out into a separate upfront commit so that we only
change the struct size one time. Implementations using them will
follow.
tuexen [Wed, 4 Mar 2020 12:22:53 +0000 (12:22 +0000)]
Don't send an uninitialised traffic class in the IPv6 header when
sending a TCP segment from the TCP SYN cache (like a SYN-ACK).
This fix initialises it to zero. This is correct for the ECN bits,
but it does not honor the DSCP that an application might have set via
the IPPROTO_IPV6-level socket option IPV6_TCLASS. That will be
fixed separately.
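The sketch below chooses the traffic class for a segment sent from the SYN cache. The traffic class byte carries the DSCP in its upper 6 bits and the ECN bits in its lower 2. This commit only provides the zero default. The branch that keeps an application's DSCP illustrates the separate follow-up and is an assumption, as is the function name.

```c
#include <stdint.h>

/* Traffic class for a SYN-cache segment: DSCP in bits 7..2, ECN in bits 1..0. */
static uint8_t
syn_ack_tclass(int have_sockopt, uint8_t sockopt_tclass)
{
	uint8_t tclass = 0;	/* never send uninitialised bits */

	if (have_sockopt)
		tclass = sockopt_tclass & 0xfc;	/* keep DSCP, leave ECN to TCP */
	return (tclass);
}
```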
Reviewed by: Richard Scheffenegger
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23900
luporl [Wed, 4 Mar 2020 12:21:38 +0000 (12:21 +0000)]
[aacraid] Add missing unmap call for SYNC mode
This issue was observed on a PowerPC64 machine with an Adaptec RAID Controller
with PCI device ID 0x028d. After several read/write operations, the kernel was
panicking in bus_dmamap_sync(). This was due to a missing aac_unmap_command()
in the SYNC path.
chs [Wed, 4 Mar 2020 00:22:50 +0000 (00:22 +0000)]
If vm_pager_get_pages_async() returns an error, release the sfio->nios
refcount that we took earlier that represents the I/O that ended up
not being started.
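The sketch below shows the reference-counting rule. It uses a hypothetical stand-in for the sendfile I/O structure. Each I/O holds a reference that its completion drops. An I/O that never starts must therefore drop its own reference, or the completion never runs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sendfile I/O tracking structure. */
struct sfio_example {
	int	nios;	/* one reference per pending I/O, plus the setup path */
	bool	done;	/* set when the last reference is dropped */
};

static void
sfio_release(struct sfio_example *s)
{
	if (--s->nios == 0)
		s->done = true;
}

/*
 * Take a reference before starting an I/O. 'start_error' stands in for
 * the return value of the async pager call; on failure no completion
 * will drop the reference, so drop it here.
 */
static int
start_io(struct sfio_example *s, int start_error)
{
	s->nios++;
	if (start_error != 0)
		sfio_release(s);
	return (start_error);
}
```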
imp [Tue, 3 Mar 2020 17:40:29 +0000 (17:40 +0000)]
Get rid of silly /* FALLTHROUGH */ lines
Consistently omit /* FALLTHROUGH */ when we have a case statement that does
nothing. Since compilers don't warn about stacked case statements, and we were
inconsistent, resolve by removing extras.
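The hypothetical function below illustrates the convention. Stacked case labels with no statements between them need no annotation. A case that runs code and then falls through still gets /* FALLTHROUGH */.

```c
static int
classify(int c)
{
	switch (c) {
	case 'a':		/* stacked labels: no annotation needed */
	case 'e':
	case 'i':
	case 'o':
	case 'u':
		return (1);
	case ' ':
		/* Runs code before falling through, so it is annotated. */
		c = '_';
		/* FALLTHROUGH */
	case '_':
		return (2);
	default:
		return (0);
	}
}
```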
mav [Tue, 3 Mar 2020 15:05:13 +0000 (15:05 +0000)]
Increase number of write completion threads, matching ZoL.
Our iSCSI benchmarks on a large 80-core system show that the previous limit
of 8 threads can be a bottleneck. At some points this change increases
write IOPS by as much as 50%. I am still not sure that so many threads
are really required, but we tested lower counts and saw no significant
benefit, while latencies were a bit worse, so we decided not to diverge.
Add proper #includes, #ifdefs and some style fixes to make RSS
kernels compile again. There are still possible issues with uint16_t
vs. uint_t cpuid which I am not going near.
The results of ktls_get_cpu() are stored in u_int and NETISR_CPUID_NONE
requires u_int. Adjust uint16_t to uint_t in order to make RSS kernels
compile some more again.
HPTS still has to be fixed, which is a bit more complicated.
ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Retire this function (way too late) and add the fib to in6_selectroute(),
which also only has a single consumer now and was an orphan function before.
0mp [Tue, 3 Mar 2020 13:25:08 +0000 (13:25 +0000)]
powerd.8: Improve style & fix typos
- Sort options.
- Do not use macros (like .Ar) to specify width for Bl (macros within that
string are not expanded).
- Use Cm instead of Ar for mode names.
- Fix some typos reported by mandoc.
- Move the documentation of the PID file from the -P flag description to
the FILES section.
ip6_output: use new routing KPI when not passed a cached route
Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6 we get passed a cached route (and inp) by udp6_output()
depending on whether we acquired a write lock on the INP.
In case we neither bind nor connect, the first UDP packet would come in
with a cached route (wlocked) and all further packets would not.
In case we bind and do not connect we never write-lock the inp.
When we are not passed a cached route, rather than providing the
storage for a route locally and passing it through the old lookup code
and down the stack, use the new route lookup KPI and acquire all
details we need to send the packet.
Compared to the IPv4 code, the IPv6 code has a couple of possible
complications: a routing header option that caches its own route,
the path MTU (ro_pmtu) case, which now equally has to deal with the
possibility of a NULL route being passed in, and the
fwd_tag case where a firewall changes the next hop (something to
factor out in the future).