jchandra [Thu, 7 Feb 2019 04:50:16 +0000 (04:50 +0000)]
pci_host_generic_acpi: use IORT data for MSI/MSI-X
Use the information from IORT parsing to translate the PCI RID to
GIC ITS device ID. And similarly, use the information to find the
PIC XREF identifier to be used for PCI devices.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D18004
kib [Thu, 7 Feb 2019 02:56:10 +0000 (02:56 +0000)]
Use ifunc to select the barrier instruction for RDTSC.
This optimizes out runtime switch and removes yet another cpuid from
libc.
Note that this is the first use of ifunc in i386 libc, so
ifunc-capable toolchain is required for building runnable userspace on
i386, same as on amd64.
Discussed with: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
jchandra [Thu, 7 Feb 2019 02:30:33 +0000 (02:30 +0000)]
arm64 acpi: Add support for IORT table
Add new file arm64/acpica/acpi_iort.c to support the "IO Remapping
Table" (IORT). The table is specified in ARM document "ARM DEN 0049D"
titled "IO Remapping Table Platform Design Document". The IORT table
has information on the associations between PCI root complexes, SMMU
blocks and GIC ITS blocks in the system.
The changes are to parse and save the information in the IORT table.
The API to use this information is added to sys/dev/acpica/acpivar.h.
The acpi_iort.c also has code to check the GIC ITS nodes seen in the
IORT table with corresponding entries in MADT table (for validity)
and with entries in SRAT table (for proximity information).
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D18002
kib [Thu, 7 Feb 2019 02:17:34 +0000 (02:17 +0000)]
Port sysctl kern.elf32.read_exec from amd64 to i386.
Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments. This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.
Reported and tested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Thu, 7 Feb 2019 02:09:34 +0000 (02:09 +0000)]
Fix resume on i386 PAE.
It was broken before PAE/no-PAE merge, but since now PAE is the
default, resume is apparently becomes for all machines.
The corrected issues:
- the trampoline page is not mapped executable, so machine faults when
paging is on;
- MSR.EFER and %cr4 both should be loaded before paging is enabled,
otherwise paging structures are invalid (cr4.PAE and EFER.NX).
- MSR.EFER and %cr4 should be only loaded if present. I attempt to handle
this by not touching the registers if the value is zero.
There are some more bits still not quite correct, e.g. unconditional
access to %cr4 in resumectx.
Reported and debugging help by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
ngie [Wed, 6 Feb 2019 21:24:44 +0000 (21:24 +0000)]
Clean up all directories created by `make hier`
The logic I introduced in r322511 unfortunately left chflags schg'ed
directories behind created by `make hier` (in the stock /etc/mtree
files, this is limited to /var/empty).
The proposed change calls `chflags -R 0` and `rm -Rf ...` to clean all
of the directories that could not be removed by `${MAKE} clean`.
`${MAKE} clean` in bsd.obj.mk calls `cleandir`/`cleanobj`, which handles
the first directory tree walk/removal.
mmel [Wed, 6 Feb 2019 06:03:44 +0000 (06:03 +0000)]
Adapt FreeBSD specific DT stub for Jetson TK1 board to be consistent with
update of devicetree to 4.19 in r340337.
Our build system doesn't provide dependencies for included DTS files, so
nobody noticed this issue for long time.
jah [Wed, 6 Feb 2019 04:36:28 +0000 (04:36 +0000)]
r341692 changed cap_syslog(3) to preserve the stdio descriptors inherited
from its parent so that LOG_PERROR would work. However, this caused
dhclient(8)'s stdio streams to remain open across daemonization, breaking
the ability to capture its foreground output as done in netconfig_ipv4.
Fix this by reverting r341692 and instead passing the parent's stderr
descriptor as an argument to cap_openlog() only when LOG_PERROR is specified
in logopt.
jhibbits [Wed, 6 Feb 2019 03:52:14 +0000 (03:52 +0000)]
powerpc: Bind IRQs to only one interrupt on QorIQ SoCs
The QorIQ SoCs don't actually support multicast interrupts, and the
references state explicitly that multicast is undefined behavior. Avoid the
undefined behavior by binding to only a single CPU, using a quirk to
determine if this is necessary.
avos [Wed, 6 Feb 2019 01:34:14 +0000 (01:34 +0000)]
iwn(4): plug initialization path vs interrupt handler races
There are few places in interrupt handler where the driver
lock is dropped; ensure that device is still running before
processing remaining ring entries.
imp [Tue, 5 Feb 2019 22:53:36 +0000 (22:53 +0000)]
Add quirk for Sansisk X400 drives
Certain versions of Sandisk x400 firmware can hang under extremely
heavly load of large I/Os for prolonged periods of time. Newer /
current versions work fine, and should be used where possible. Where
not possible, this quirk ensures that I/O requests are limited to 128k
to avoids the bug, even under extreme load. Since MAXPHYS is 128k,
only users with custom kernels are at risk on the older firmware.
Once all known users of the older firmware have upgraded, this quirk
will be removed.
kib [Tue, 5 Feb 2019 20:09:31 +0000 (20:09 +0000)]
Make it possible to override PAE mode on boot.
Initialize the static kenv in pmap_cold() and fetch user opinion on
vm.pmap.pae_mode tunable if hardware is capable. Note that the static
environment is reinitilized in init386() later when paging is enabled.
Reviewed by: bde
Discussed with: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
luporl [Tue, 5 Feb 2019 18:16:14 +0000 (18:16 +0000)]
[ppc64] llan: fix fatal kernel trap when system is low on memory
When running several builders in parallel, on QEMU, with 8GB of
memory, a fatal kernel trap (0x300 (data storage interrupt))
caused by llan driver is sometimes observed, when the system
starts to run out of swap space.
This happens because, at llan_intr(), a phyp call to add a
logical LAN buffer is always made when llan_add_rxbuf() fails,
even if it fails to allocate a new buffer.
bde [Tue, 5 Feb 2019 17:17:12 +0000 (17:17 +0000)]
Fix missing translation of old ioctls for KDSETMODE, KDSBORDER and
CONS_SETWINORG. After translation, the last 2 are not supported, but
the first one has incomplete support that is enough to run old versions
of X.
bde [Tue, 5 Feb 2019 16:59:29 +0000 (16:59 +0000)]
My recent fix for programmable function keys in syscons only worked
when TEKEN_CONS25 is configured. Fix this by adding a function to
set the flag that enables the fix and always calling this function
for syscons.
Expand the man page for teken_set_cons25(). This function is not
very useful since it can only set but not clear 1 flag. In practice,
it is only used when TEKEN_CONS25 is configured and all that does is
choose the the default emulation for syscons at compile time.
bde [Tue, 5 Feb 2019 15:34:55 +0000 (15:34 +0000)]
Fix zapping of static hints and env in init_static_kenv(). Environments
are terminated by 2 NULs, but only 1 NUL was zapped. Zapping only 1
NUL just splits the first string into an empty string and a corrupted
string. All other strings in static hints and env remained live early
in the boot when they were supposed to be disabled.
Support calling init_static_kenv() very early in the boot, so as to
use the env very early in the boot. Then the pointer to the loader
env may change after the first call due to enabling paging or otherwise
remapping the pointer. Another call is needed to register the change.
Don't use the previous pointer in this (or any) later call.
vmaffione [Tue, 5 Feb 2019 12:10:48 +0000 (12:10 +0000)]
netmap: refactor logging macros and pipes
Changelist:
- Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr
and nm_prlim, to avoid possible naming conflicts.
- Add netmap_krings_mode_commit() helper function and use that
to reduce code duplication.
- Refactor pipes control code to export some functions that
can be reused by the veth driver (on Linux) and epair(4).
- Add check to reject API requests with version less than 11.
- Small code refactoring for the null adapter.
jchandra [Tue, 5 Feb 2019 06:25:35 +0000 (06:25 +0000)]
arm, acpi: increase size of memory region arrays
Bump up MAX_HWCNT and MAX_EXCNT to 32 when ACPI is enabled. These are
the sizes of the hwregions and exregions arrays respectively. ACPI
firmware typically has more memory regions and the current value of
16 is not sufficient for some platforms.
This commit fixes a failure seen with AMI firmware on Cavium's Sabre
ThunderX2 reference platform. This platform needs 21 physical memory
regions and 18 excluded regions to boot correctly with the current
firmware release.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D19073
jhibbits [Tue, 5 Feb 2019 04:47:41 +0000 (04:47 +0000)]
powerpc: Don't idle with the wait instruction on booke
It appears idling via 'wait' on e5500 causes strange behaviors, such as
top(1) simply hanging sporadically, until input. Until this can possibly be
sorted out (interrupt issue?), just don't idle on this hardware. The SoCs
are low power already, and the wait state doesn't save much anyway.
cem [Tue, 5 Feb 2019 03:32:58 +0000 (03:32 +0000)]
extattr_list_vp: Only take shared vnode lock
List is a 'read'-type operation that does not modify shared state; it's safe
for multiple thread to proceed concurrently. This is reflected in the vnode
operation LISTEXTATTR locking protocol specification, which only requires a
shared lock.
(Similar to previous r248933.)
Reported by: Case van Rij <case.vanrij AT isilon.com>
Reviewed by: kib, mjg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19082
imp [Mon, 4 Feb 2019 21:28:25 +0000 (21:28 +0000)]
Regularize the Netflix copyright
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.
Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
kib [Mon, 4 Feb 2019 21:16:15 +0000 (21:16 +0000)]
Fixes for very early use of the pthread_mutex_* and libthr malloc.
When libthr is statically linked into the binary, order of the
constructors execution is not deterministic. It is possible for the
application constructor to use pthread_mutex_* functions before the
libthr initialization was done.
Handle it by:
- making thr_malloc.c locking functions operational when curthread is not
yet set;
- making __thr_malloc_init() idempotent, allowing more than one call to it;
- unconditionally calling __thr_malloc_init() before initializing
a process-private mutex.
Reported and tested by: mmel
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
marius [Mon, 4 Feb 2019 20:46:57 +0000 (20:46 +0000)]
o As illustrated by e. g. figure 7-14 of the Intel 82599 10 GbE
controller datasheet revision 3.3, in the context of Ethernet
MACs the control data describing the packet buffers typically
are named "descriptors". Each of these descriptors references
one buffer, multiple of which a packet can be composed of.
By contrast, in comments, messages and the names of structure
members, iflib(4) refers to DMA resources employed for RX and
TX buffers (rather than control data) as "desc(riptors)".
This odd naming convention of iflib(4) made reviewing r343085
and identifying wrong and missing bus_dmamap_sync(9) calls in
particular way harder than it already is. This convention may
also explain why the netmap(4) part of iflib(4) pairs the DMA
tags for control data with DMA maps of buffers and vice versa
in calls to bus_dma(9) functions.
Therefore, change iflib(4) to refer to buf(fers) when buffers
and not the usual understanding of descriptors is meant. This
change does not include corrections to the DMA resources used
in the netmap(4) parts. However, it revises error messages to
state which kind of allocation/creation failed. Specifically,
the "Unable to allocate tx_buffer (map) memory" copy & pasted
inappropriately on several occasions was replaced with proper
messages.
o Enhance some other error messages to indicate which half - RX
or TX - they apply to instead of using identical text in both
cases and generally canonicalize them.
o Correct the descriptions of iflib_{r,t}xsd_alloc() to reflect
reality; current code doesn't use {r,t}x_buffer structures.
o In iflib_queues_alloc():
- Remove redundant BUS_DMA_NOWAIT of iflib_dma_alloc() calls,
- change the M_WAITOK from malloc(9) calls into M_NOWAIT. The
return values are already checked, deferred DMA allocations
not being an option at this point, BUS_DMA_NOWAIT has to be
used anyway and prior malloc(9) calls in this function also
specify M_NOWAIT.
ngie [Mon, 4 Feb 2019 19:12:45 +0000 (19:12 +0000)]
Avoid the DNS lookup for "localhost"
ci.FreeBSD.org does not have access to a DNS resolver/network (unlike my test
VM), so in order for the test to pass on the host, it needs to avoid the DNS
lookup by using the numeric host address representation.
dim [Mon, 4 Feb 2019 18:07:03 +0000 (18:07 +0000)]
Use NLDT to get number of LDTs on i386
Compiling a GENERIC kernel for i386 with clang 8.0 results in the
following warning:
/usr/src/sys/i386/i386/sys_machdep.c:542:40: error: 'sizeof ((ldt))' will return the size of the pointer, not the array itself [-Werror,-Wsizeof-pointer-div]
nldt = pldt != NULL ? pldt->ldt_len : nitems(ldt);
^~~~~~~~~~~
/usr/src/sys/sys/param.h:299:32: note: expanded from macro 'nitems'
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
~~~~~~~~~~~ ^
Indeed, 'ldt' is declared as 'union descriptor *', so nitems() is not
the right way to determine the number of LDTs. Instead, the NLDT define
from sys/x86/include/segments.h should be used.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19074
luporl [Mon, 4 Feb 2019 16:02:03 +0000 (16:02 +0000)]
powerpc64: Add a trap stack area
Currently, the trap code switches to the the temporary stack in the dbtrap
section. It works in most cases, but in the beginning of the execution, the
temp stack is being used, as starting in the powerpc_init() code.
In this current scenario, the stack is being overwritten, which causes the
return of breakpoint() to take abnormal execution.
This current patchset create a small stack to use by the dbtrap: codepath
avoiding the corruption of the temporary stack.
mav [Mon, 4 Feb 2019 01:24:10 +0000 (01:24 +0000)]
Check element type before setting LEDs.
With r319610, sesutil started twiddling the bits of every SES device.
Not everything is a disk slot, there are also fan controllers, temperature
sensors, even power supplies, among other things controlled by SES.
Add a type check to make sure we are only operating on device slot and array
device slot elements. Other type elements will be skipped, but it would be
simple to add additional cases for controlling the ident LEDs of other
element types (which are not necessarily the same bits).
Rather than doing raw bit manipulation of an unstructured byte array using
unnamed numeric constants, leverage existing code abstractions.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D19052
kib [Sun, 3 Feb 2019 21:28:58 +0000 (21:28 +0000)]
i386: Do not ever store to other-CPU counter64 slot.
On CPUs supporting cmpxchg8b, fetch is performed by cmpxchg8b on
corresponding CPU slot, which unconditionally write to the slot. If
for that slot, the owner CPU increments it, then both CPUs might run
the cmpxchg8b instruction concurrently and this might race and
override the incremental write. So the counter update would be lost.
Fix it by implementing fetch as IPI and accumulation of result. It is
acceptable for rare counter64 fetch operation to be more expensive.
Diagnosed and tested by: Andreas Longwitz <longwitz@incore.de>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
markj [Sun, 3 Feb 2019 18:43:20 +0000 (18:43 +0000)]
Allow vm_page_free_prep() to dequeue pages without the page lock.
This is a step towards being able to free pages without the page
lock held. The approach is simply to add an implementation of
vm_page_dequeue_deferred() which does not assert that the page
lock is held. Formally, the page lock is required to set
PGA_DEQUEUE, but in the case of vm_page_free_prep() we get the
same mutual exclusion for free by virtue of the fact that no
other references to the page may exist.
markj [Sun, 3 Feb 2019 18:38:58 +0000 (18:38 +0000)]
Fix a race in vm_page_dequeue_deferred().
To detect the case where the page is already marked for a deferred
dequeue, we must read the "queue" and "aflags" fields in a
precise order. Otherwise, a race with a concurrent
vm_page_dequeue_complete() could leave the page with PGA_DEQUEUE
set despite it already having been dequeued. Fix the problem by
using vm_page_queue() to check the queue state, which correctly
handles the race.
andrew [Sun, 3 Feb 2019 12:46:27 +0000 (12:46 +0000)]
Enable COVERAGE and KCOV by default on arm64 and amd64.
This allows userspace to trace the kernel using the coverage sanitizer
found in clang. It will also allow other coverage tools to be built as
modules and attach into the same framework.
new_kmem_alloc(9) is a Solaris/illumos malloc(9). FreeBSD and NetBSD
never get here, however a test for SOLARIS, as redundant as this test is,
serves to document that this is the illumos definition. This should help
those who come after me to follow the code more easily.
Kernel module shim sources have no business being in the userland
build directory, especially those for other operating systems.
The kernel module shims for other operating systems are hereby removed.
The kernel module shim for FreeBSD, mlfk_ipl.c, is already in
sys/contrib/ipfilter/netinet. The one here is never used and should
not be in the userland build directory either.
mlfk_rule.c isn't used either however we will keep it in case someone
wishes to use this shim to load rules via a kernel module, handy for
embedded. In that case it should be copied to
sys/contrib/ipfilter/netinet and a Makefile created to employ it.
(Probably a useful documentation project when time permits.)
Remove #ifdefs for ancient and irrelevant operating systems from
ipfilter.
When ipfilter was written the UNIX and UNIX-like systems in use
were diverse and plentiful. IRIX, Tru64 (OSF/1) don't exist any
more. OpenBSD removed ipfilter shortly after the first time the
ipfilter license terms changed in the early 2000's. ipfilter on AIX,
HP/UX, and Linux never really caught on. Removal of code for operating
systems that ipfilter will never run on again will simplify the code
making it easier to fix bugs, complete partially implemented features,
and extend ipfilter.
Unsupported previous version FreeBSD code and some older NetBSD code
has also been removed.
What remains is supported FreeBSD, NetBSD, and illumos. FreeBSD and
NetBSD have collaborated exchanging patches, while illumos has expressed
willingness to have their ipfilter updated to 5.1.2, provided their
zone-specific updates to their ipfilter are merged (which are of interest
to FreeBSD to allow control of ipfilters in jails from the global zone).
For 11n / 11ac we are still using non-11n rates for management and
multicast traffic by default; check 'MCS rate' bit to determine how
to print them correctly.
avos [Sun, 3 Feb 2019 02:32:13 +0000 (02:32 +0000)]
net80211(4): fix rate check when 'roaming' ifconfig(8) option is set to 'auto'
Do not try to clear 'basic rate' bit from roamRate; it cannot be here and,
actually, this operation clears 'MCS rate' bit instead, breaking comparison
for 11n / 11ac modes.
Tested with RTL8188CUS, HOSTAP mode + RTL8821AU, STA mode.
avos [Sun, 3 Feb 2019 01:32:02 +0000 (01:32 +0000)]
net80211(4): do not setup roaming parameters for unsupported modes.
ifconfig(8) prints per-mode parameters if they are non-zero; since
we have 13 possible modes with 3...5 typically supported this change
should greatly reduce amount of information for 'ifconfig <wlan> list roam'
command.
While here ensure that sta_roam_check() will not use roaming parameters
for unsupported modes (it should not).
vmaffione [Sat, 2 Feb 2019 22:39:29 +0000 (22:39 +0000)]
netmap: upgrade sync-kloop support
Add SYNC_KLOOP_MODE option, and add support for direct mode, where application
executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback.
pkelsey [Sat, 2 Feb 2019 21:14:53 +0000 (21:14 +0000)]
Fix interrupt index configuratoin when using MSI interrupts.
When in MSI mode, the device was only being configured with one
interrupt index, but it needs two - one for the actual interrupt and
one to park the tx queue at.
Also clarified comments relating to interrupt index assignment.
Reported by: Yuri Pankov <yuripv@yuripv.net>
MFC after: 1 day
jhibbits [Sat, 2 Feb 2019 04:15:16 +0000 (04:15 +0000)]
powerpc/powernv: Add a driver for the POWER9 XIVE interrupt controller
The XIVE (External Interrupt Virtualization Engine) is a new interrupt
controller present in IBM's POWER9 processor. It's a very powerful,
very complex device using queues and shared memory to improve interrupt
dispatch performance in a virtualized environment.
This yields a ~10% performance improvment over the XICS emulation mode,
measured in both buildworld, and 'dd' from nvme to /dev/null.
mav [Sat, 2 Feb 2019 04:11:59 +0000 (04:11 +0000)]
Fix integer math overflow in UMA hash_alloc().
512GB of ZFS ABD ARC means abd_chunk zone of 128M 4KB items. To manage
them UMA tries to allocate 2GB hash table, which size does not fit into
the int variable, causing later allocation failure, which makes ARC shrink
back below the 512GB, not letting it to use more RAM. With this change I
easily reached >700GB ARC size on 768GB RAM machine.