CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

[ig4] Handle controller startup errors

Fail the attach on controller startup errors. For some reason the
Dell XPS 13 reports an I2C controller, but the controller appears
to be permanently disabled and refuses to enable.

Obtained from: DragonflyBSD (509820b)

[ig4] Give common name to PCI and ACPI device drivers

They share common device driver code with different bus attachments.

This commit starts a bunch of changes which have the following properties:

Reviewed by: imp (previous version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22016

Take arm.arm (armv5) out of universe

It's on the chopping block in two months, the CI tinderbox doesn't bother with
it anymore either, and buildworld fails today due to an issue linking clang.
It's not worth investigating and it just eats up CPU cycles running universe
builds.

armv6: Switch to LLD by default

This could just be ${__TT} == "arm", except armv5 isn't slated for death until
EOY.

arm tinderbox builds. Let's see what else shakes out.

bhyve: add backend rx backpressure to virtio-net

If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
is a waste of CPU and memory resources and potentially opens the
door to DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().
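
The enable/disable behaviour described above can be modelled roughly as
follows. This is a minimal Python sketch, not the bhyve C code: the Backend
and VirtioNet classes, their fields, and every method name are hypothetical;
only the idea of pausing ingress via netbe_rx_disable() and resuming it via
netbe_rx_enable() comes from the commit message.

```python
from collections import deque


class Backend:
    """Hypothetical stand-in for a bhyve network backend."""

    def __init__(self, packets):
        self.pending = deque(packets)  # packets waiting on the host side
        self.rx_enabled = True
        self.reads = 0                 # packets actually read by the device

    def rx_disable(self):              # models netbe_rx_disable()
        self.rx_enabled = False

    def rx_enable(self):               # models netbe_rx_enable()
        self.rx_enabled = True


class VirtioNet:
    """Hypothetical model of the virtio-net receive path."""

    def __init__(self, backend):
        self.backend = backend
        self.free_bufs = 0             # RX buffers posted by the guest
        self.delivered = []

    def rx_ready(self):
        """Move packets to the guest while it still has RX buffers."""
        be = self.backend
        while be.rx_enabled and be.pending:
            if self.free_bufs == 0:
                # Guest is out of buffers: pause ingress instead of
                # reading packets only to drop them.
                be.rx_disable()
                return
            self.delivered.append(be.pending.popleft())
            be.reads += 1
            self.free_bufs -= 1

    def guest_adds_buffers(self, count):
        """Guest refilled the RX virtqueue: resume ingress."""
        self.free_bufs += count
        self.backend.rx_enable()
        self.rx_ready()
```

In this model no packet is read from the backend unless a guest buffer is
available for it, so nothing is read just to be dropped.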

Reviewed by: bryanv, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20987

bhyve: fix mistake introduced by r352841

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20973

Utilize ASIDs to reduce both the direct and indirect costs of context
switching.  The indirect costs are the unnecessary TLB misses that are
incurred when ASIDs are not used.  In fact, currently, when we perform a
context switch on one processor, we issue a broadcast TLB invalidation that
flushes the TLB contents on every processor.

Mark all user-space ("ttbr0") page table entries with the non-global flag so
that they are cached in the TLB under their ASID.

Correct an error in pmap_pinit0().  The pointer to the root of the page
table was being initialized to the root of the kernel-space page table
rather than a user-space page table.  However, the root of the page table
that was being cached in process 0's md_l0addr field correctly pointed to a
user-space page table.  As long as ASIDs weren't being used, this was
harmless, except that it led to some unnecessary page table switches in
pmap_switch().  Specifically, other kernel processes besides process 0 would
have their md_l0addr field set to the root of the kernel-space page table,
and so pmap_switch() would actually change page tables when switching
between process 0 and other kernel processes.

Implement a workaround for Cavium erratum 27456 affecting ThunderX machines.
(I would like to thank andrew@ for providing the code to detect the affected
machines.)

Address integer overflow in the definition of TCR_ASID_16.

Set up TCR according to the PARange and ASIDBits fields from
ID_AA64MMFR0_EL1.  Previously, TCR_ASID_16 was unconditionally set.

Modify build_l1_block_pagetable so that lower attributes, such as ATTR_nG,
can be specified as a parameter.

Eliminate some unused code.

Earlier versions were tested to varying degrees by: andrew, emaste, markj

MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D21922

Add support for setting hardware breakpoints from ptrace on arm64.

Implement get/fill_dbregs on arm64. This is used by ptrace with the
PT_GETDBREGS and PT_SETDBREGS requests. It allows userspace to set hardware
breakpoints.

The struct dbreg is based on Linux to ease adding hardware breakpoint
support to debuggers.

Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22195

loader: zfs.c is missing malloc checks, fix it

malloc() can return NULL, so we need to check the return value.

loader: we do not support booting from pool with log device

If the pool has a log device, stop there and report it.

loader: should check malloc in zfs_dev_open

malloc can return NULL.

amd64: Store %cr3 into pcpu saved_ucr3 on double fault.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

amd64 ddb: Add printing of kernel/user and saved user %cr3 values from pcpu.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

loader: calculate physical vdev psize from asize

Since physical device asize is calculated from psize and the asize is stored
in pool label, we can use asize to set the value of psize, which is used to
calculate the location of the pool labels.
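
As a rough illustration of the relationship above, here is a minimal Python
sketch. It assumes the usual ZFS on-disk layout (four 256 KiB labels, two at
each end of the device, plus a 3.5 MiB boot block after the front labels).
The function names are hypothetical and this is not the loader's C code.

```python
KIB = 1024
VDEV_LABEL_SIZE = 256 * KIB                 # size of one vdev label
VDEV_LABELS = 4                             # two at the front, two at the end
VDEV_BOOT_SIZE = 7 << 19                    # 3.5 MiB boot block
VDEV_LABEL_START_SIZE = 2 * VDEV_LABEL_SIZE + VDEV_BOOT_SIZE
VDEV_LABEL_END_SIZE = 2 * VDEV_LABEL_SIZE


def psize_from_asize(asize):
    """Recover the physical size from the allocatable size in the label."""
    return asize + VDEV_LABEL_START_SIZE + VDEV_LABEL_END_SIZE


def label_offset(psize, index):
    """Byte offset of label `index` (0..3) on a device of size psize."""
    offset = index * VDEV_LABEL_SIZE
    if index >= VDEV_LABELS // 2:
        # The last two labels sit at the very end of the device.
        offset += psize - VDEV_LABELS * VDEV_LABEL_SIZE
    return offset
```

The trailing labels can only be located once psize is known, which is why
deriving psize from the stored asize is useful.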

MFC after: 1 week

loader: userboot/test should accept more than one disk

Allow multiple -d options to be specified, e.g. test -d disk1 -d disk2 ..

Downgrade the firmware images imported in r354201.

Version 43 requires further modifications to iwm(4), and this was not
caught in some initial testing. Version 34 works and is the version
available on Intel's web site.

MFC with: r354201
Sponsored by: The FreeBSD Foundation

powerpc: Add display of raw instruction values to x/I in ddb.

The "alternate format" character 'I' previously had the same behavior as
the "display as an instruction" character 'i'. With this change, it will now
prefix each disassembled instruction with the raw hex value.

As PowerPC instructions are always 32 bits and always aligned, and there are
no alternate modes that would affect instruction decoding or display, this
seemed to me to be the obvious interpretation of "alternate format".

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22223

powerpc: Fix incorrect disassembly of the cntlzw instruction in ddb.

Noticed while comparing disassembly between ddb and objdump.

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D22121

MFV r354257:

Update sqlite3-3.29.0 (3290000) --> sqlite3-3.30.1 (3300100)

MFC after: 1 month

Remove lock from CTL camsim frontend.

CAM does not need a SIM lock for quite a while, and CTL never needed it.

MFC after: 2 weeks

r354264 did mix up the directory path

The correct path is sys/cddl/contrib/opensolaris/common/lz4, not
sys/cddl/contrib/opensolaris/lz4

Reported by: Michael Butler

Add support for building Book-E kernels with clang/lld.

This involved several changes:

* Since lld does not like text relocations, replace SMP boot page text relocs
in booke/locore.S with position-independent math, and track the virtual base
in the SMP boot page header.

* As some SPRs are interpreted differently on clang due to the way it handles
platform-specific SPRs, switch m*dear and m*esr mnemonics out for regular
m*spr. Add both forms of SPR_DEAR to spr.h so the correct encoding is selected.

* Change some hardcoded 32 bit things in the boot page to be pointer-sized, and
fix alignment.

* Fix 64-bit build of booke/pmap.c when enabling pmap debugging.

Additionally, I took the opportunity to document how the SMP boot page works.

Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21999

r354253 did miss the fact that libzpool is built as fake kernel

libzpool is built as if it were kernel code, so use the _FAKE_KERNEL check
to include the kernel API in libzpool.

r354253 did miss the updates to sys/conf/files and sys/conf/kern.pre.mk

Reported by: Brandon Bergren

RISC-V: Remove EARLY_AP_STARTUP from GENERIC

This option is causing boot to fail for the Hifive Unleashed and older
versions of QEMU (3.1.1). Remove it from the GENERIC config for now.

Reported by: br
MFC after: 1 week

Import sqlite3-3.30.1 (3300100)

Add __isnan()/__isnanf() aliases for compatibility with glibc and CUDA

Even though clang comes with a number of internal CUDA wrapper headers,
compiling sample CUDA programs will result in errors similar to:

In file included from <built-in>:1:
In file included from /usr/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:204:
/usr/home/arr/cuda/var/cuda-repo-10-0-local-10.0.130-410.48/usr/local/cuda-10.0//include/crt/math_functions.hpp:2910:7: error: no matching function for call to '__isnan'
  if (__isnan(a)) {
      ^~~~~~~
/usr/lib/clang/9.0.0/include/__clang_cuda_device_functions.h:460:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int __isnan(double __a) { return __nv_isnand(__a); }
               ^

CUDA expects __isnan() and __isnanf() declarations to be available,
which are glibc specific extensions, equivalent to the regular isnan()
and isnanf().

To provide these, define __isnan() and __isnanf() as aliases of the
already existing static inline functions __inline_isnan() and
__inline_isnanf() from math.h.

Reported by: arrowd
PR: 241550
MFC after: 1 week

r354253 did miss lz4.c from sys/cddl/boot/zfs.

Remove duplicate lz4 implementations

Port illumos change: https://www.illumos.org/issues/11667

Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be
usable from kernel/stand/userland builds, so we can use just one single
source. Add lz4.h to declare lz4_compress() and lz4_decompress().

MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22037

loader: libi386/comconsole.c cstyle cleanup

Only cstyle, no functional changes.

If /usr/obj is a symlink, cpio(1) needs persuasion to DTRT.

Mention that ports/net-mgmt/libsmi is required

loader: fall back to term_emu on efi console with serial backend

If the EFI console has a serial backend (video + serial or serial only),
we need to stick with the old emulator until we can draw the console.

Eventually we would need to get console terminal emulator to be removed
from serial console because the serial link already has the terminal.

However, we need to implement comconsole on all efi platforms first, then
we need the ability to draw console, so we do not have to use SimpleTextOutput
protocol (which will write both on video and serial in case of multiplexed
ComOut).

Differential Revision: https://reviews.freebsd.org/D22161

lualoader: rewrite try_include using lfs + dofile

Actual modules get require()'d in, rather than try_include(). All instances
of try_include should be provided with proper hooks/API in the rest of
loader to do the work they need to do, since we can't rely on them to exist.
Convert this now to lfs + dofile since we won't really be treating them as
modules.

lfs is required because dofile will properly throw an error if the file
doesn't exist, which is not in the spirit of 'optionally included'.
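
The "check for existence, then load" pattern described above can be modelled
as follows. This is a Python analogue rather than the Lua code in the loader:
os.path.isfile() stands in for the lfs check and runpy.run_path() stands in
for dofile. The function signature and the ".py" extension are assumptions
made for the sketch.

```python
import os
import runpy


def try_include(directory, name):
    """Load an optional script; a missing file is not an error."""
    path = os.path.join(directory, name + ".py")
    if not os.path.isfile(path):
        # Optional file is absent: quietly do nothing.
        return None
    # The file exists, so any error raised while running it propagates
    # to the caller instead of being swallowed.
    return runpy.run_path(path)
```

Because only the existence check is guarded, an error raised inside an
included file is no longer hidden by a pcall-style wrapper.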

Getting out of the pcall game allows us to provide a loader.exit() style
call that backs out to the common bits of loader (autoboot sequence unless
disabled with a loader.setenv("autoboot_delay", "NO")). The most ideal way
identified so far to implement loader.exit() is to throw a special
abort-style error that indicates to the caller in interp_lua that we've not
actually errored out, just continue execution. Otherwise, we have to hack in
logic to bubble up and return from loader.lua without continuing further,
which gets kind of ugly depending on the context in which we're aborting.

A compat shim is provided temporarily in case the executing loader doesn't
yet have loader.lua_path, which was just added in r354246.

liblua: add loader.lua_path

As described previously, loader.lua_path is the absolute path where scripts are
installed. A future commit will use this to build paths for dofile in
try_include, rather than the current pcall/require setup that makes it more
difficult to coordinate loader aborts from local.lua -- we do not need the
flexibility of require(), and local.lua is in fact not a 'module-like' file
as we will not be referencing anything from it.

stand: consolidate knowledge of lua path

Multiple places coordinate to 'know' where lua scripts are installed. Knock
this down to being formally defined (and overridable) in exactly one spot,
defs.mk, and spread the knowledge to loaders and liblua alike. A future
commit will expose this to lua as loader.lua_path, so it can build absolute
paths to lua scripts as needed.

MFC after: 1 week

Fix regression from r353026. Pointer was increased instead of value
pointed to.

PR: 241646
Submitted by: Aleksandr Fedorov <aleksandr.fedorov itglobal.com>

powerpc/mpc85xx: Set description for the MPC85xx RC bridge

Make validate_rx_req_id static inline because it uses other static
inline functions. gcc complains about this, most likely due to
the subtle differences between inline and static inline functions
defined in headers.

Some more taskqueue optimizations.

- Optimize enqueue for two task priority values by adding new tq_hint
field, pointing to the last task inserted into the middle of the list.
In case of more than two priority values it should halve the average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying a few shared cache lines per task, introduce a different
mechanism to drain active tasks, based on a task sequence number counter,
that uses only cache lines already present in cache.  Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines.  Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
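
The tq_hint idea from the first item can be sketched as follows. This is a
simplified Python model, not the kernel code: the Task and TaskQueue classes,
the singly linked list, and the steps counter are all hypothetical, and
locking is ignored entirely.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.next = None


class TaskQueue:
    """Higher priority first, FIFO order within one priority."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.hint = None   # last task inserted into the middle of the list
        self.steps = 0     # list nodes walked while searching

    def enqueue(self, task):
        # Fast path: the task belongs at the tail.
        if self.tail is None or self.tail.priority >= task.priority:
            if self.tail is None:
                self.head = task
            else:
                self.tail.next = task
            self.tail = task
            return
        # Start at the hint if the insert point cannot be before it.
        if self.hint is not None and self.hint.priority >= task.priority:
            prev, cur = self.hint, self.hint.next
        else:
            prev, cur = None, self.head
        while cur is not None and cur.priority >= task.priority:
            prev, cur = cur, cur.next
            self.steps += 1
        task.next = cur
        if prev is None:
            self.head = task
        else:
            prev.next = task
            self.hint = task

    def dequeue(self):
        task = self.head
        if task is None:
            return None
        self.head = task.next
        if self.head is None:
            self.tail = None
        if self.hint is task:
            self.hint = None   # never keep a hint to a removed task
        task.next = None
        return task

    def names(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.name)
            cur = cur.next
        return out
```

With two priority values, repeated high-priority inserts resume from the
hint instead of rescanning the high-priority tasks already at the front.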

This change fixes certain ZFS write workloads that caused huge congestion
on the taskqueue lock.  Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate deep queue of small tasks for taskqueue to sort.

MFC after: 1 week
Sponsored by: iXsystems, Inc.

We don't support configuring serial PCI cards in EFI. Make this clearer in the
source rather than obfuscating it behind NO_PCI (nothing else declares that,
so it's not making the ifdefs clearer).

[PPC64] Fix GDB sigtramp detection

The current implementation of ppcfbsd_pc_in_sigtramp() seems to take only 32-bit
PowerPC into account, as on 64-bit PowerPC most kernel instruction addresses will
be wrongly reported as in sigtramp.

This change adds proper sigtramp detection for PPC64.

Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22199

Temporarily skip lib.libexecinfo.backtrace_test.backtrace_fmt_basic on i386

PR: 241562
Sponsored by: The FreeBSD Foundation

loader: asprintf does crash arm64 due to missing NULL pointer check

PCHAR macro needs to check if d is NULL.

MFC after: 3 days

mdmfs(8): add -k skel option to populate fs from a skeleton

mdmfs(8) lacks the ability to populate throwaway memory filesystems from an
existing directory.

This feature permits an interesting setup where /var for instance lives on
a device where wear-leveling is something you want to avoid as much as
possible and nonetheless you don't want to lose your logs, ports metadata,
etc. Here are the steps:

1. Copy /var to /var.bak;
2. Mount an mfs into /var using -k /var.bak at startup;
3. Synchronize /var to /var.bak weekly and on shutdown.

Note that this more or less mimics OpenBSD's mount_mfs(8) -P flag.

PR: 146254
Submitted by: jlh (many moons ago)
MFC after: 1 week

powerpc/booke: Fix TLB1 entry accounting

It's possible, with per-CPU mappings, for TLB1 indices to get out of sync.
This presents a problem when trying to insert an entry into TLB1 of all
CPUs.  Currently that's done by assuming (hoping) that the TLBs are
perfectly synced, and inserting to the same index for all CPUs.  However,
with aforementioned private mappings, this can result in overwriting
mappings on the other CPUs.

An example:

    CPU0                    CPU1
    <setup all mappings>    <idle>
        3 private mappings
      kick off CPU 1
                            initialize shared mappings (3 indices low)
                            Load kernel module, triggers 20 new mappings
      Sync mappings at N-3
                            initialize 3 private mappings.

At this point, CPU 1 has all the correct mappings, while CPU 0 is missing 3
mappings that were shared across to CPU 1.  When CPU 0 tries to access
memory in one of the overwritten mappings, it hangs while tripping through
the TLB miss handler.  Device mappings are not stored in any page table.

Fix this by introducing a '-1' index for tlb1_write_entry_int(), so each
CPU searches for an available index private to itself.
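
The per-CPU index search can be illustrated with a small model. This is a
Python sketch under stated assumptions, not the Book-E pmap code: the Cpu
class, the 16-entry TLB1 size, and the function names are hypothetical; only
the '-1 means find a free slot on this CPU' convention comes from the text.

```python
TLB1_ENTRIES = 16  # hypothetical size, for illustration only


class Cpu:
    def __init__(self, name):
        self.name = name
        self.tlb1 = [None] * TLB1_ENTRIES

    def find_free(self):
        """Return the first unused TLB1 index on this CPU."""
        for index, entry in enumerate(self.tlb1):
            if entry is None:
                return index
        raise RuntimeError("TLB1 full on " + self.name)

    def write_entry(self, mapping, index=-1):
        """Write a mapping; index -1 means 'pick a free slot locally'."""
        if index == -1:
            index = self.find_free()
        self.tlb1[index] = mapping
        return index


def insert_on_all(cpus, mapping):
    """Insert a shared mapping on every CPU without assuming equal indices."""
    return {cpu.name: cpu.write_entry(mapping) for cpu in cpus}
```

Each CPU ends up with the shared mapping, possibly at different indices, and
no private mapping is overwritten.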

MFC after: 3 weeks

geli: raise WARNS to 6

MFC after: 2 weeks
Sponsored by: Axcient

truss: centralize pointer-constructing casts.

In nearly all cases, the caller has a uintptr_t compatible argument so
this eliminates a large number of casts.

Add a print_pointer function to centralize printing pointers.

Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22212

netmap: fix build issue in netmap_user.h

The issue was a comparison of integers of different signs
on 32 bit architectures.

Reported by: jenkins
MFC after: 1 week

add valectl to the system commands

The valectl(8) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl, and stored in
tools/tools/netmap

Reviewed by: hrs, bcr, lwhsu, kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22146

avoid kernel stack data leak in core dump thrmisc note

bzero the entire thrmisc struct, not just the padding. Other core dump
notes are already done this way.

Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation

Allow bsd.compat.mk to be reliably included outside Makefile.inc1.

Replace explicit TARGET_* variables with COMPAT_* versions defined based
on where the file is being included.

Also, require that bsd.compat.mk be included directly. It's not going to
be widely used so always loading it in bsd.prog.mk doesn't make sense.
Instead users can include it directly.

Reviewed by: imp, bdrewery (prior revision)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22059

Update ENA version to v2.1.0

In this release the netmap support was introduced.

Moreover, it is also now possible to use the LLQ mode of the driver on
the arm64 AWS instances (A1 type).

Differential Revision: https://reviews.freebsd.org/D21938
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add support for ENA NETMAP partial initialization

In NETMAP mode not all queues need to be allocated to NETMAP. Some of
them could be left to the kernel. Configuration is managed by the flags
nr_mode and nr_pending_mode provided per each NETMAP kring.

The ENA driver checks those flags and performs the proper ring initialization.

Differential Revision: https://reviews.freebsd.org/D21937
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add support for ENA NETMAP Tx

Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buff indexes taken from the slots.

When Tx resources are being set, the new mapping structures are created
and netmap Tx rings are being reset.

When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.

When a Tx interrupt occurs, ena_netmap_tx_irq is called.

The ena_netmap_txsync callback signals that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.

ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If an error occurs,
transmission is stopped, but if the error was caused by the HW ring being
full, information about that is not propagated to the userspace app.
When all packets are ready, doorbell is written to NIC and netmap ring
state is updated.

Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks if number of slots does not exceed NIC limit. Invalid
packets are being dropped and the error is propagated to the upper
layer. As each netmap buffer has equal size, which is typically greater
than 2KiB, there shouldn't be any packets which contain too many slots.
Then, the ena_com_tx_ctx structure is being filled. As netmap does not
support any hardware offloads, ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in the LLQ mode, the push header is being determined
by checking if the header fits within the first slot.
If so, the portion of data is being copied directly from the slot.
In other case, the data is copied to the intermediate buffer.
First slots are treated the same as the others, because DMA mapping
has no impact on LLQ mode. Index of each netmap buffer is taken from
slot and stored in netmap_buf_idx array. In case of mapping error,
memory is unmapped and packets are put back to the netmap ring.

ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and is validated. As validate_tx_req_id from
ena.c is specific to the kernel's mbufs, another implementation is provided.
Each req_id is cleaned up by the ena_netmap_tx_clean_one function. Buffers
are unmapped from DMA and put back to the netmap ring. In the end,
the state of the netmap and NIC rings is updated.

Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add support for ENA NETMAP Rx

Most of the code used for Rx ring initialization could be reused in NETMAP.
Reset of the NETMAP ring and a new alloc method were added. The driver decides
whether to use kernel mbufs or NETMAP slots based on the IFCAP_NETMAP flag.
This allows ena_refill_rx_bufs, which provides proper handling of
Rx out of order completion, to be reused.

ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from the NETMAP ring. The proper netmap_ring is found based on the queue id. As
NETMAP provides the "partial opening" feature, not all of the rings are
available; an unused one points to an invalid ring. If there is an available slot,
it is taken from the ring. Its buffer is mapped to DMA and its index is
stored in ena_rx_buffer field in ena_rx_buffer structure. Then ena_buf
is filled with addresses and ring state is updated.

Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to ring. As we could not return more bufs than we have taken and
we should not override occupied slots, buf_index should be 0. It is
being checked by assertion.

ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.

In case of Rx error that could be handled by NIC driver (for example by
performing reset) rx sync should return 0.

ena_netmap_rx_frames first checks if NETMAP ring is in consistent
state and then in the loop receives new frames. When all available
frames are taken nr_hwtail is updated.

Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurs, each descriptor is loaded by the ena_netmap_rx_load_desc function.
If a packet takes more than one segment, the NS_MOREFRAG flag must be set in
every slot except the last. In case of a wrong req_id, the packet is removed
from the NETMAP ring. If the packet is successfully received, counters are updated.
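
The NS_MOREFRAG rule for multi-segment packets can be shown with a tiny
sketch. This is an illustrative Python model, not driver code: slot_flags()
is a hypothetical helper, and the numeric flag value is the one from
netmap.h as far as I know, so treat it as an assumption.

```python
NS_MOREFRAG = 0x0020  # assumed value, mirroring netmap.h


def slot_flags(num_segments):
    """Per-slot flags for a packet spread over num_segments slots."""
    if num_segments < 1:
        raise ValueError("a packet occupies at least one slot")
    # Every slot but the last announces that more fragments follow.
    return [NS_MOREFRAG] * (num_segments - 1) + [0]
```

A single-segment packet therefore carries no NS_MOREFRAG flag at all, while
a three-segment packet has it set on the first two slots only.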

Refilling of the NIC ring is performed by the ena_netmap_rx_cleanup function.
It calculates the number of available slots and calls ena_refill_rx_bufs with
the proper number.

Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Introduce NETMAP support in ENA

Mock implementation of NETMAP routines is located in ena_netmap.c/.h
files. All code is protected under the DEV_NETMAP macro. Makefile was
updated with files and flag.

As the ENA driver provides its own implementations of (un)likely, they must be
undefined before including the NETMAP headers.

The ena_netmap_attach function is called at the end of NIC attach. It fills
a structure with the NIC configuration and callbacks, then provides it to
netmap_attach. Similarly, netmap_detach is called during ena_detach.

Three callbacks are used.
nm_register is implemented by ena_netmap_reg. It is called when a user
space application opens or closes the NIC in NETMAP mode. The current action is
recognized based on the onoff parameter: true means on and false off. As
the NIC's rings need to be reconfigured, ena_down and ena_up are reused.
When user space application wants to receive new packets from NIC
nm_rxsync is called, and when there are new packets ready for Tx
nm_txsync is called.

Differential Revision: https://reviews.freebsd.org/D21934
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Split Rx/Tx from initialization code in ENA driver

Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in upcoming netmap code in the driver. To make it
possible, they were moved to ena.h header.

Differential Revision: https://reviews.freebsd.org/D21933
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Fix ENA keep-alive timeout due to prolonged reset

When the ENA_FLAG_DEVICE_RUNNING flag is disabled, the AENQ handlers
aren't executed. To fix that, the watchdog timestamp should be updated
just before enabling the watchdog.

Timer service was always being enabled, even if the device wasn't up
before the reset. That shouldn't happen, as the timer service is being
executed only for working interface.

Differential Revision: https://reviews.freebsd.org/D21932
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add WC support for arm64 in the ENA driver

As pmap_change_attr() is public on arm64 since r351131, it can be
used on arm64 to map a memory range with the write-combining
attribute.

It requires the driver to use generic VM_MEMATTR_WRITE_COMBINING flag
instead of the x86 specific PAT_WRITE_COMBINING.

Differential Revision: https://reviews.freebsd.org/D21931
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Fix pmap_change_attr() on arm64 to allow KV addresses

Although the comment above pmap_change_attr() mentions
that the VA could be in the KV or DMAP memory space,
pmap_change_attr_locked() was accepting only values inside the DMAP
memory range.

To fix that, the condition check was changed so that a VA inside the
KV memory range is also accepted.
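
The change to the acceptance condition can be modelled as follows. This is a
minimal Python sketch, not the arm64 pmap code: the address bounds are
made-up placeholders and the function names are hypothetical. Only the
before/after logic (DMAP only versus DMAP or KV) is taken from the text.

```python
# Hypothetical address-space bounds, for illustration only.
DMAP_MIN, DMAP_MAX = 0xFFFF_A000_0000_0000, 0xFFFF_B000_0000_0000
KVA_MIN, KVA_MAX = 0xFFFF_0000_0000_0000, 0xFFFF_0080_0000_0000


def in_dmap(va):
    return DMAP_MIN <= va < DMAP_MAX


def in_kva(va):
    return KVA_MIN <= va < KVA_MAX


def accepted_before(va):
    """Old behaviour: only direct-map addresses pass the check."""
    return in_dmap(va)


def accepted_after(va):
    """New behaviour: direct-map or kernel virtual addresses pass."""
    return in_dmap(va) or in_kva(va)
```

An address mapped for a PCI BAR would fall in the KV range, which is the
case the old check rejected.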

The sample use case that wasn't supported is the PCI Device that has the
BAR which should be mapped with the Write Combine attribute - for
example BAR2 of the ENA network controller on the A1 instances on AWS.

Tested on A1 AWS instance and changed ENA BAR2 mapped resource to be
write-combined memory region.

Differential Revision: https://reviews.freebsd.org/D22055
MFC after: 2 weeks
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Fix a typo in r353895.

Reported by: andrew
MFC after: 3 days
Sponsored by: The FreeBSD Foundation

Fix GDB machdep code for PPC/PPC64

There were a couple of issues with the GDB machdep code for PPC/PPC64, the main ones being:
- wrong register sizes being returned
- pcb_context index was wrong (this affects all PPC variants)

Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22201

[PPC64] Fix trapstk overflow

In some scenarios, the 4K trapstk may overflow, corrupting tmpstk.

This was observed during remote debugging, with the following steps:

At remote host (R):
- enter kdb during boot
- switch to gdb backend

At local host (L):
- attach gdb to R
- try to read an invalid memory position

At R:
- a DSI trap occurs and kdb restarts (all this occurs on trapstk)
- while printing the stacktrace, trapstk overflows and corrupts tmpstk

Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22200

iicbb: allow longer SCL low timeout and other improvements

First, SCL low timeout is set to 25 milliseconds by default as opposed
to 1 millisecond before.  The new value is based on the SMBus
specification.  The timeout can be changed on a per bus basis using
dev.iicbb.N.scl_low_timeout sysctl.

The driver uses DELAY to wait for high SCL up to 1 millisecond, then it
switches to pause_sbt(SBT_1MS) for the rest of the timeout.
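
The two-phase wait can be sketched as follows. This is a simplified Python
model, not the iicbb driver code: the three callables are hypothetical
stand-ins for reading the SCL pin, DELAY() and pause_sbt(), and the 1
microsecond polling step in the busy-wait phase is an assumption.

```python
def wait_scl_high(scl_is_high, busy_wait_us, sleep_ms, timeout_us=25_000):
    """Return True once SCL reads high, False if the timeout expires."""
    elapsed_us = 0
    while elapsed_us < timeout_us:
        if scl_is_high():
            return True
        if elapsed_us < 1000:
            busy_wait_us(1)      # first millisecond: spin
            elapsed_us += 1
        else:
            sleep_ms(1)          # afterwards: sleep 1 ms at a time
            elapsed_us += 1000
    # One last look after the timeout has elapsed.
    return scl_is_high()
```

A False return corresponds to the stuck-low SCL case, which the caller is
expected to treat as an error rather than carrying on.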

While here I made a number of other changes.  'udelay' that's used for
timing clock and data signals is now calculated based on the requested
bus frequency (dev.iicbus.N.frequency) instead of being hardcoded to 10
microseconds.  The calculations are done in such a fashion that the
default bus frequency of 100000 is converted to udelay of 10 us.  This
is for backward compatibility.  The actual frequency will be less than a
quarter (I think) of the requested frequency.
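
One calculation that satisfies the stated property (100000 Hz maps to 10 us)
is a plain division of one second by the frequency. The Python sketch below
is an assumption about the shape of the formula, not a copy of the driver
code; the function name and the clamp to at least 1 us are hypothetical.

```python
def udelay_from_frequency(freq_hz):
    """Delay in microseconds derived from the requested bus frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Integer microseconds per cycle; never go below 1 us (assumed clamp).
    return max(1, 1_000_000 // freq_hz)
```

Under this formula a requested 400000 Hz would give a delay of 2 us, and any
request above 1 MHz would be clamped to 1 us.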

Also, I added detection of stuck low SCL in a few places.  Previously,
the code would just carry on after the SCL low timeout and that might
potentially lead to misinterpreted bits.

Finally, I fixed several style issues near the code that I changed.
Many more are still remaining.

Tested by accessing HTU21 temperature and humidity sensor in this setup:
  superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
  gpio1: <Nuvoton GPIO controller> at GPIO ldn 0x07 on superio0
  pcib0: allocated type 4 (0x220-0x226) for rid 0 of gpio1
  gpiobus1: <GPIO bus> on gpio1
  gpioiic0: <GPIO I2C bit-banging driver> at pins 14-15 on gpiobus1
  gpioiic0: SCL pin: 14, SDA pin: 15
  iicbb0: <I2C bit-banging driver> on gpioiic0
  iicbus0: <Philips I2C bus> on iicbb0 master-only
  iic0: <I2C generic I/O> on iicbus0

Discussed with: ian, imp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22109

cxgbe(4): Use correct size while converting lpacaps32 to native
endianness.

iflib: cleanup memory leaks on driver detach

From Jake:
The iflib stack failed to release all of the memory allocated under
M_IFLIB during device detach.

Specifically, the ifmp_ring, the ift_ifdi Tx DMA info, and the ifr_ifdi Rx
DMA info were not being released.

Release this memory so that iflib won't leak memory when a device
detaches.

Since we're freeing the ift_ifdi pointer during iflib_txq_destroy we
need to call this only after iflib_dma_free in iflib_tx_structures_free.

Additionally, also ensure that we destroy the callout mutex associated
with each Tx queue when we free it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22157

Use the new cam_sim_alloc_dev function to properly initialize SIM

Using cam_sim_alloc_dev() properly sets the sim_dev field so that
sdiob(4) can attach to the CAM device that represents the SDIO card.
The same change for SDHCI driver happened in r348800.

Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D22192

Remove redundant hw sysctl declaration. gcc CI complains, but clang doesn't.

Move all the sys/dev/[a-j]* that are common to files.x86

All these device entries are common between the two files. Move them to
files.x86. Also sort entries from this range into proper order in files.amd64.

Remove duplicate lines.

Add firmware images for Intel 9000-series wifi chips.

This is in preparation for adding the corresponding support to iwm(4).

Version 46 is the latest but contains unrecognized TLVs, so use version
43 for now.

Obtained from: linux-firmware
MFC after: 1 month
Sponsored by: The FreeBSD Foundation

vm_page_wire_mapped: explain why failure does not affect correctness.

Reviewed by: markj (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22196

Set the userspace execute never bit on kernel mappings.

Arm64 allows us to create execute only mappings. To make sure userspace is
unable to accidentally execute kernel code set the user execute never
bit in the kernel page tables.

MFC after: 1 week
Sponsored by: DARPA, AFRL

Make hyperv keyboard work again.

r351049 bogusly deleted these lines from files.amd64 but failed to add them to
files.x86. Since this works on i386, add them to files.x86 rather than just
adding them back to files.amd64.

PR: 240734
Reported by: Michael Pro

ow(4): clean up stray white space

MFC after: 2 weeks

ARM64: Treat alignment faults as bus errors

Summary:
ARM64 currently treats all data abort exceptions as page faults. This
can cause infinite loops on non-page fault faults, such as alignment faults.

Since kernel-side alignment faults should be avoided, this adds support directly
to the el0 fault handler, instead of the data_abort() handler.

Test Plan: Tested on rpi3, with a misaligned ldm test.

Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D22133

ow(4): protocol timings can now be changed as sysctl-s / tunables

I limited potentially infinite timings to 960 us based on a footnote on
page 38 of Maxim Integrated Application Note 937, Book of iButton
Standards: "In order not to mask interrupt signalling by other devices
on the 1–Wire bus, tRSTL + tR should always be less than 960 us."

MFC after: 3 weeks

ow(4): increase regular mode recovery time, t_rec, to 15 us

Previously we used the minimal value of 1 us and it was really tight.
Application Note 3829 has a table describing recommended t_rec values
for various bus voltages, temperature conditions and numbers of slave
devices.  The new value decreases the maximum possible data rate from
16.3 Kbit/s to 13.3 Kbit/s, but it allows for up to four slaves on a
3.3V bus (under room temperature).
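
The quoted data rates are consistent with a simple model in which every
bit occupies one time slot followed by the recovery time.  The sketch
below assumes a 60 us time slot, which is an assumption made here to
reproduce the figures; the function name is hypothetical.

```c
/*
 * Maximum data rate in Kbit/s when each bit takes t_slot_us plus
 * t_rec_us microseconds.  ASSUMPTION: one slot and one recovery
 * period per bit, nothing else.
 */
static double
ow_max_rate_kbit(unsigned int t_slot_us, unsigned int t_rec_us)
{
	unsigned int t_bit_us = t_slot_us + t_rec_us;

	if (t_bit_us == 0)
		return (0.0);
	return (1000.0 / (double)t_bit_us);
}
```

Under that assumption, t_rec of 1 us gives 1000 / 61, or about
16.4 Kbit/s, and t_rec of 15 us gives 1000 / 75, or about 13.3 Kbit/s.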

References:
- Maxim Integrated Application Note 3829
  Determining the Recovery Time for Multiple-Slave 1-Wire(R) Networks

- Maxim Integrated Application Note 937
  Book of iButton Standards

Discussed with: imp (D22108)
MFC after: 3 weeks

Allow exceptions to be masked when in userspace

We may want to mask exceptions when in userspace. This was previously
impossible as threads are created with all exceptions unmasked and
signals expected userspace to mask any. Fix these by copying the
mask state on thread creation and allow exceptions to be masked on
signal return, as long as they don't change.

Sponsored by: DARPA, AFRL

Allow the userspace ID register fields to be read from the kernel

To allow consistent values to be used in both the kernel and userspace
create a function for these to be read from the kernel. They use a newly
created macro with the name of the ID register to read. For now there is
redundant information in the user_regs array as it still holds the CRm and
Op2 values, however this will be fixed in a later change.

This will be used by ptrace to allow hardware breakpoints in userspace.

Sponsored by: DARPA, AFRL

Use a lowercase name for arm64 special registers so they don't conflict
with macros of the same name.

Sponsored by: DARPA, AFRL

Move the MRS instruction decode macros to armreg.h

These instructions are used to access the registers described in armreg.h,
and will be used in a future change to create a per-register identification
macro.

Sponsored by: DARPA, AFRL

Update the debug monitor handling to work after userspace has started

The debug monitor register state is now stored in a struct and updated
when required. Currently there is only a kernel state, however a
per-process state will be added in a future change.

Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22128

Use an array of handlers in the data and instruction aborts

Previously we would call data_abort on all data and instruction aborts
however this is incorrect for most abort types. Move to use an array
of function pointers to allow for more handlers to be easily added.
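
The general pattern can be sketched as below.  This is not the kernel
code: the abort type names, the handler signature and the return values
are all hypothetical, and only the table-driven dispatch is the point.

```c
#include <stddef.h>

/* Hypothetical abort classes; the real code indexes by fault status. */
enum abort_type {
	ABORT_TRANSLATION,
	ABORT_ALIGNMENT,
	ABORT_EXTERNAL,
	ABORT_TYPE_COUNT
};

typedef int (*abort_handler_t)(unsigned long far);

static int
handle_page_fault(unsigned long far)
{
	(void)far;
	return (1);	/* placeholder: would resolve the page fault */
}

static int
handle_alignment(unsigned long far)
{
	(void)far;
	return (2);	/* placeholder: would deliver a bus error */
}

/* Entries left out are NULL and fall through to the default path. */
static const abort_handler_t abort_handlers[ABORT_TYPE_COUNT] = {
	[ABORT_TRANSLATION] = handle_page_fault,
	[ABORT_ALIGNMENT] = handle_alignment,
};

static int
dispatch_abort(unsigned int type, unsigned long far)
{
	if (type >= ABORT_TYPE_COUNT || abort_handlers[type] == NULL)
		return (-1);	/* no handler registered for this type */
	return (abort_handlers[type](far));
}
```

Adding a handler for another abort type then only requires one more
table entry.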

Reviewed by: jhibbits
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22170

Fix the armv8 crypto driver after r354170.

Sponsored by: DARPA, AFRL

There's nothing architecture specific in "options STATS"; move it from
sys/amd64/conf/NOTES to sys/conf/NOTES.

Suggested by: jhb@
Sponsored by: Klara Inc, Netflix

Add two files missed in r354170

Sponsored by: DARPA, AFRL

Rename the macros to extract a single arm64 ID field.

Because of the previous naming scheme the old ID_AA64PFR0_EL1 macro
collided with a potential macro for the register of the same name. To fix
this collision rename these macros.

Sponsored by: DARPA, AFRL

amd64: Fix typo: RDPRU bit is 0x10, not 0x04

Bit 4 != 4, of course.
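
The distinction between a bit number and a bit mask can be shown with a
small sketch; the macro and function names here are hypothetical.

```c
#include <stdint.h>

/* Mask for bit number n. */
#define	BIT_MASK(n)	(UINT32_C(1) << (n))

/* Non-zero if bit number 'bit' is set in 'reg'. */
static int
has_bit(uint32_t reg, unsigned int bit)
{
	return ((reg & BIT_MASK(bit)) != 0);
}
```

Bit number 4 corresponds to the mask 0x10, while the mask 0x04 selects
bit number 2.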

X-MFC-With: r354162

amd64: Define and decode new AMD64 feature bits

These are documented in revision 3.32 of the public AMD64 Vol. 2 and
revision 3.28 of Vol. 3, published October and September 2019, respectively.

FreeBSD'fy ZFS zlib zalloc/zfree callbacks.

The previous code came from OpenSolaris, which in my understanding
requires the allocation size to be known to free memory.  To store that
size the previous code allocated an additional 8 byte header.  But I
have noticed that zlib
with present settings allocates 64KB context buffers for each call, that
could be efficiently cached by UMA, but addition of those 8 bytes makes
them fall back to physical RAM allocations, that cause huge overhead and
lock congestion on small blocks.  Since FreeBSD's free() does not have
the size argument, switching to it solves the problem, increasing write
speed to ZVOLs with 4KB block size and GZIP compression on my 40-threads
test system from ~60MB/s to ~600MB/s.
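
A userland sketch of the old scheme, assuming the (items, size) callback
shape that zlib uses; it is not the ZFS code, and all names are
hypothetical.  It shows how a 64KB request grows to 64KB + 8 bytes once
the size header is prepended.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	SIZE_HDR	8	/* bytes reserved in front of each buffer */

/* Size actually requested from the allocator under the old scheme. */
static size_t
hdr_request_size(size_t items, size_t size)
{
	return (items * size + SIZE_HDR);
}

/* Allocate with a size header so that the free routine can recover it. */
static void *
hdr_alloc(size_t items, size_t size)
{
	uint64_t total = (uint64_t)hdr_request_size(items, size);
	unsigned char *p = malloc((size_t)total);

	if (p == NULL)
		return (NULL);
	memcpy(p, &total, sizeof(total));
	return (p + SIZE_HDR);
}

/* Read back the size stored in front of the buffer. */
static uint64_t
hdr_stored_size(const void *ptr)
{
	uint64_t total;

	memcpy(&total, (const unsigned char *)ptr - SIZE_HDR, sizeof(total));
	return (total);
}

static void
hdr_free(void *ptr)
{
	if (ptr != NULL)
		free((unsigned char *)ptr - SIZE_HDR);
}
```

With a free routine that does not need the size, the header and the
extra 8 bytes disappear and the request stays at exactly 64KB.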

MFC after: 1 week
Sponsored by: iXsystems, Inc.

Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116

Use atomics and a shared object lock to protect the object reference count.

Certain consumers still need to guarantee a stable reference so we can not
switch entirely to atomics yet. Exclusive lock holders can still modify
and examine the refcount without using the ref api.
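
As a generic illustration of an atomically maintained reference count,
here is a sketch using C11 atomics.  It does not model the shared object
lock or the kernel's ref api; the structure and function names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object carrying only a reference count. */
struct obj {
	atomic_uint ref_count;
};

/* Take an additional reference. */
static void
obj_ref(struct obj *o)
{
	atomic_fetch_add_explicit(&o->ref_count, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this was the last one. */
static bool
obj_rel(struct obj *o)
{
	return (atomic_fetch_sub_explicit(&o->ref_count, 1,
	    memory_order_acq_rel) == 1);
}
```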

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21598

Drop the object lock earlier in fault and don't relock it after pmap_enter().

Recent changes in object and page locking have enabled more lock pushdown.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22036

Drop the object lock in vfs_bio and cluster where it is now safe to do so.

Recent changes to busy/valid/dirty have enabled page based synchronization
and the object lock is no longer required in many cases.

Reviewed by: kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21597

Fix column title alignment.

MFC after: 2 weeks

arm64: rockchip: typec_phy: Rename timeout to retry

Declare retry in the function scope.
Rename it to retry as there is a timeout function which was
causing the code to fail to compile.

Reported by: jhibbits
MFC after: 1 month
X-MFC-WITH: r354089

libexecinfo test: Don't strip installed test

It turns out that a test of backtrace symbol resolution and formatting
requires symbols. Another option might be building with -rdynamic instead,
but this works for now.

Re-enabled skipped CI test, as it should now pass.

PR: 241562
Submitted by: lwhsu
Reported by: lwhsu
X-MFC-With: r354126, r354135, r354144

There is a long standing problem with multicast programming for NICs
and IPv6.  With IPv6 we may call if_addmulti() in context of processing
of an incoming packet.  Usually this is interrupt context.  While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't.  An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden.  Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion.  However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped.  Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.

Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context.  Note that nobody cares about returned value.

Reviewed by: hselasky, kib
Differential Revision:   https://reviews.freebsd.org/D22154