Fail the attach on controller startup errors. For some reason the
Dell XPS 13 reports an I2C controller, but the controller appears
to be permanently disabled and refuses to be enabled.
Conrad Meyer [Sun, 3 Nov 2019 19:36:34 +0000 (19:36 +0000)]
Take arm.arm (armv5) out of universe
It's on the chopping block in two months, the CI tinderbox doesn't bother with
it anymore either, and buildworld fails today due to an issue linking clang.
It's not worth investigating and it just eats up CPU cycles running universe
builds.
If a VM is flooded with more ingress packets than the guest OS
can handle, the current virtio-net code will keep reading those
packets and drop most of them as no space is available in the
receive queue. This is an undesirable receive livelock, which
wastes CPU and memory resources and potentially opens the door
to DoS attacks.
With this change, virtio-net uses the new netbe_rx_disable()
function to disable ingress operation in the backend while the
guest is short on RX buffers. Once the guest makes more buffers
available to the RX virtqueue, ingress operation is enabled again
by calling netbe_rx_enable().
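A minimal model of this flow control, assuming a single-threaded backend. struct vnet_sketch, vnet_rx_one() and vnet_guest_posted() are hypothetical names, and the backend_rx_enabled flag only stands in for calls to netbe_rx_disable() and netbe_rx_enable(). It is not bhyve's actual code; it just shows ingress being paused instead of packets being read and dropped.

```c
#include <stdbool.h>

/* Hypothetical model of one virtio-net RX path (not bhyve's code). */
struct vnet_sketch {
	unsigned rx_avail;		/* RX buffers the guest has posted */
	bool backend_rx_enabled;	/* stands in for netbe_rx_enable()/_disable() */
	unsigned delivered;		/* packets copied into guest buffers */
};

/*
 * Called when the backend has a packet ready.  Instead of reading and
 * dropping the packet when the guest has no buffers, stop ingress so
 * the packet stays queued in the backend.
 */
static bool
vnet_rx_one(struct vnet_sketch *sc)
{
	if (!sc->backend_rx_enabled)
		return (false);
	if (sc->rx_avail == 0) {
		sc->backend_rx_enabled = false;	/* like netbe_rx_disable() */
		return (false);
	}
	sc->rx_avail--;
	sc->delivered++;
	return (true);
}

/* Called when the guest adds buffers to the RX virtqueue. */
static void
vnet_guest_posted(struct vnet_sketch *sc, unsigned nbufs)
{
	sc->rx_avail += nbufs;
	if (nbufs > 0 && !sc->backend_rx_enabled)
		sc->backend_rx_enabled = true;	/* like netbe_rx_enable() */
}
```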
Alan Cox [Sun, 3 Nov 2019 17:45:30 +0000 (17:45 +0000)]
Utilize ASIDs to reduce both the direct and indirect costs of context
switching. The indirect costs are the unnecessary TLB misses that are
incurred when ASIDs are not used. In fact, currently, when we perform a
context switch on one processor, we issue a broadcast TLB invalidation that
flushes the TLB contents on every processor.
Mark all user-space ("ttbr0") page table entries with the non-global flag so
that they are cached in the TLB under their ASID.
Correct an error in pmap_pinit0(). The pointer to the root of the page
table was being initialized to the root of the kernel-space page table
rather than a user-space page table. However, the root of the page table
that was being cached in process 0's md_l0addr field correctly pointed to a
user-space page table. As long as ASIDs weren't being used, this was
harmless, except that it led to some unnecessary page table switches in
pmap_switch(). Specifically, other kernel processes besides process 0 would
have their md_l0addr field set to the root of the kernel-space page table,
and so pmap_switch() would actually change page tables when switching
between process 0 and other kernel processes.
Implement a workaround for Cavium erratum 27456 affecting ThunderX machines.
(I would like to thank andrew@ for providing the code to detect the affected
machines.)
Address integer overflow in the definition of TCR_ASID_16.
Set up TCR according to the PARange and ASIDBits fields from
ID_AA64MMFR0_EL1. Previously, TCR_ASID_16 was unconditionally set.
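A sketch of deriving TCR fields from ID_AA64MMFR0_EL1. The field positions (PARange in bits [3:0], ASIDBits in bits [7:4] with 0b0010 meaning 16-bit ASIDs, TCR_EL1.IPS in bits [34:32], TCR_EL1.AS in bit 36) come from the Arm architecture; the SK_* macros and function names are hypothetical, and the exact original TCR_ASID_16 definition is not shown in this log. Bit 36 also illustrates the overflow class: 1 << 36 on a 32-bit int is undefined, so the constant has to be 64-bit.

```c
#include <stdint.h>

/* ID_AA64MMFR0_EL1 fields (macro names are hypothetical). */
#define SK_MMFR0_PARANGE(r)	((unsigned)((r) & 0xf))		/* bits [3:0] */
#define SK_MMFR0_ASIDBITS(r)	((unsigned)(((r) >> 4) & 0xf))	/* bits [7:4] */
#define SK_ASIDBITS_16		2u

/* TCR_EL1 fields.  A plain `1 << 36` would overflow a 32-bit int. */
#define SK_TCR_IPS_SHIFT	32
#define SK_TCR_AS		(UINT64_C(1) << 36)	/* 16-bit ASIDs */

static unsigned
asid_bits(uint64_t mmfr0)
{
	return (SK_MMFR0_ASIDBITS(mmfr0) == SK_ASIDBITS_16 ? 16 : 8);
}

/* Assumes PARange holds a defined encoding (0-6), which IPS shares. */
static uint64_t
tcr_from_mmfr0(uint64_t mmfr0)
{
	uint64_t tcr;

	tcr = (uint64_t)(SK_MMFR0_PARANGE(mmfr0) & 0x7) << SK_TCR_IPS_SHIFT;
	if (asid_bits(mmfr0) == 16)
		tcr |= SK_TCR_AS;
	return (tcr);
}
```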
Modify build_l1_block_pagetable so that lower attributes, such as ATTR_nG,
can be specified as a parameter.
Eliminate some unused code.
Earlier versions were tested to varying degrees by: andrew, emaste, markj
Andrew Turner [Sun, 3 Nov 2019 15:42:08 +0000 (15:42 +0000)]
Add support for setting hardware breakpoints from ptrace on arm64.
Implement get/fill_dbregs on arm64. This is used by ptrace with the
PT_GETDBREGS and PT_SETDBREGS requests. It allows userspace to set hardware
breakpoints.
The struct dbreg is based on Linux to ease adding hardware breakpoint
support to debuggers.
Toomas Soome [Sun, 3 Nov 2019 11:09:06 +0000 (11:09 +0000)]
loader: calculate physical vdev psize from asize
Since a physical device's asize is calculated from its psize, and the asize
is stored in the pool label, we can use asize to set the value of psize,
which is used to calculate the locations of the pool labels.
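A sketch of the arithmetic, assuming the standard ZFS on-disk layout: four 256 KiB labels, two at the start followed by a 3.5 MiB boot region, and two at the end of the device. The SK_* names and functions are hypothetical rather than the loader's code, and the round trip is exact only if asize was computed by subtracting exactly these areas.

```c
#include <stdint.h>

/* Assumed ZFS layout; names are hypothetical stand-ins. */
#define SK_LABEL_SIZE	(256ULL * 1024)			/* one vdev label */
#define SK_NLABELS	4
#define SK_BOOT_SIZE	(7ULL << 19)			/* 3.5 MiB boot region */
#define SK_START_SIZE	(2 * SK_LABEL_SIZE + SK_BOOT_SIZE)	/* 4 MiB */
#define SK_END_SIZE	(2 * SK_LABEL_SIZE)		/* 512 KiB */

/* asize excludes the label and boot areas, so add them back. */
static uint64_t
sk_psize_from_asize(uint64_t asize)
{
	return (asize + SK_START_SIZE + SK_END_SIZE);
}

/* Byte offset of label l (0..3) on a device of size psize. */
static uint64_t
sk_label_offset(uint64_t psize, int l)
{
	return ((uint64_t)l * SK_LABEL_SIZE +
	    (l < SK_NLABELS / 2 ? 0 : psize - SK_NLABELS * SK_LABEL_SIZE));
}
```

Labels 2 and 3 are placed relative to psize, which is why a wrong psize puts the search for the trailing labels at the wrong offsets.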
Mark Johnston [Sun, 3 Nov 2019 03:23:27 +0000 (03:23 +0000)]
Downgrade the firmware images imported in r354201.
Version 43 requires further modifications to iwm(4), and this was not
caught in some initial testing. Version 34 works and is the version
available on Intel's web site.
MFC with: r354201
Sponsored by: The FreeBSD Foundation
Brandon Bergren [Sun, 3 Nov 2019 02:18:45 +0000 (02:18 +0000)]
powerpc: Add display of raw instruction values to x/I in ddb.
The "alternate format" character 'I' previously had the same behavior as
the "display as an instruction" character 'i'. With this change, it will now
prefix each disassembled instruction with the raw hex value.
As PowerPC instructions are always 32 bits and always aligned, and there are
no alternate modes that would affect instruction decoding or display, this
seemed to me to be the obvious interpretation of "alternate format".
Brandon Bergren [Sat, 2 Nov 2019 21:15:56 +0000 (21:15 +0000)]
Add support for building Book-E kernels with clang/lld.
This involved several changes:
* Since lld does not like text relocations, replace SMP boot page text relocs
in booke/locore.S with position-independent math, and track the virtual base
in the SMP boot page header.
* As some SPRs are interpreted differently on clang due to the way it handles
platform-specific SPRs, switch m*dear and m*esr mnemonics out for regular
m*spr. Add both forms of SPR_DEAR to spr.h so the correct encoding is selected.
* Change some hardcoded 32 bit things in the boot page to be pointer-sized, and
fix alignment.
* Fix 64-bit build of booke/pmap.c when enabling pmap debugging.
Additionally, I took the opportunity to document how the SMP boot page works.
Dimitry Andric [Sat, 2 Nov 2019 16:59:53 +0000 (16:59 +0000)]
Add __isnan()/__isnanf() aliases for compatibility with glibc and CUDA
Even though clang comes with a number of internal CUDA wrapper headers,
compiling sample CUDA programs will result in errors similar to:
In file included from <built-in>:1:
In file included from /usr/lib/clang/9.0.0/include/__clang_cuda_runtime_wrapper.h:204:
/usr/home/arr/cuda/var/cuda-repo-10-0-local-10.0.130-410.48/usr/local/cuda-10.0//include/crt/math_functions.hpp:2910:7: error: no matching function for call to '__isnan'
if (__isnan(a)) {
^~~~~~~
/usr/lib/clang/9.0.0/include/__clang_cuda_device_functions.h:460:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int __isnan(double __a) { return __nv_isnand(__a); }
^
CUDA expects __isnan() and __isnanf() declarations to be available,
which are glibc-specific extensions, equivalent to the regular isnan()
and isnanf().
To provide these, define __isnan() and __isnanf() as aliases of the
already existing static inline functions __inline_isnan() and
__inline_isnanf() from math.h.
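A sketch of the aliasing technique with hypothetical names, since identifiers starting with a double underscore are reserved for the implementation. sketch_inline_isnan() plays the role of math.h's existing static inline helper; aliasing with a macro is one way to do it and may not match the committed change. The x != x test relies on IEEE 754 comparison semantics and breaks under options such as -ffast-math.

```c
/*
 * Hypothetical names: sketch_inline_isnan() plays the role of math.h's
 * existing static inline helper, sketch_isnan() the glibc-style alias.
 */
static inline int
sketch_inline_isnan(double x)
{
	return (x != x);	/* NaN is the only value unequal to itself */
}

static inline int
sketch_inline_isnanf(float x)
{
	return (x != x);
}

/* Aliases: callers spelling the glibc-style names reach the helpers. */
#define	sketch_isnan(x)		sketch_inline_isnan(x)
#define	sketch_isnanf(x)	sketch_inline_isnanf(x)
```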
Toomas Soome [Sat, 2 Nov 2019 12:28:04 +0000 (12:28 +0000)]
Remove duplicate lz4 implementations
Port illumos change: https://www.illumos.org/issues/11667
Move lz4.c out of zfs tree to opensolaris/common/lz4, adjust it to be
usable from kernel/stand/userland builds, so we can use just one single
source. Add lz4.h to declare lz4_compress() and lz4_decompress().
Toomas Soome [Sat, 2 Nov 2019 09:50:36 +0000 (09:50 +0000)]
loader: fall back to term_emu on efi console with serial backend
When the efi console has a serial backend (video + serial, or serial only),
we need to stick with the old emulator until we can draw the console.
Eventually the console terminal emulator needs to be removed from the serial
console, because the serial link already has a terminal on the other end.
However, we first need to implement comconsole on all efi platforms, and then
we need the ability to draw the console, so we do not have to use the
SimpleTextOutput protocol (which writes to both video and serial in the case
of multiplexed ComOut).
Kyle Evans [Sat, 2 Nov 2019 04:01:39 +0000 (04:01 +0000)]
lualoader: rewrite try_include using lfs + dofile
Actual modules get require()'d in, rather than try_include(). All instances
of try_include should be provided with proper hooks/API in the rest of
loader to do the work they need to do, since we can't rely on them to exist.
Convert this now to lfs + dofile since we won't really be treating them as
modules.
lfs is required because dofile will properly throw an error if the file
doesn't exist, which is not in the spirit of 'optionally included'.
Getting out of the pcall game allows us to provide a loader.exit() style
call that backs out to the common bits of loader (autoboot sequence unless
disabled with a loader.setenv("autoboot_delay", "NO")). The best way
identified so far to implement loader.exit() is to throw a special
abort-style error that tells the caller in interp_lua that we have not
actually errored out and should simply continue. Otherwise, we have to hack in
logic to bubble up and return from loader.lua without continuing further,
which gets kind of ugly depending on the context in which we're aborting.
A compat shim is provided temporarily in case the executing loader doesn't
yet have loader.lua_path, which was just added in r354246.
Kyle Evans [Sat, 2 Nov 2019 03:41:30 +0000 (03:41 +0000)]
liblua: add loader.lua_path
As described previously, loader.lua_path is the absolute path where scripts are
installed. A future commit will use this to build paths for dofile in
try_include, rather than the current pcall/require setup that makes it more
difficult to coordinate loader aborts from local.lua -- we do not need the
flexibility of require(), and local.lua is in fact not a 'module-like' file
as we will not be referencing anything from it.
Kyle Evans [Sat, 2 Nov 2019 03:37:58 +0000 (03:37 +0000)]
stand: consolidate knowledge of lua path
Multiple places coordinate to 'know' where lua scripts are installed. Knock
this down to being formally defined (and overridable) in exactly one spot,
defs.mk, and spread the knowledge to loaders and liblua alike. A future
commit will expose this to lua as loader.lua_path, so it can build absolute
paths to lua scripts as needed.
Warner Losh [Sat, 2 Nov 2019 02:05:09 +0000 (02:05 +0000)]
Make validate_rx_req_id static inline because it uses other static
inline functions. gcc complains about this, most likely due to
the subtle differences between inline and static inline functions
defined in headers.
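Why static inline matters, sketched under assumptions: the names below are hypothetical, not the ENA driver's. One C rule that produces this kind of gcc diagnostic is C99 6.7.4p3: an inline definition of a function with external linkage must not reference an identifier with internal linkage (gcc reports that the helper "is static but used in inline function ... which is not static"). Making the caller static inline as well removes the conflict.

```c
#include <stdbool.h>

struct rx_ring_sketch {		/* hypothetical, simplified ring */
	unsigned size;
	bool *in_use;		/* in_use[i]: descriptor i was handed to HW */
};

/* Helper with internal linkage, as in a header full of static inlines. */
static inline bool
req_id_in_range(const struct rx_ring_sketch *r, unsigned req_id)
{
	return (req_id < r->size);
}

/*
 * Must itself be static inline: a plain (external) inline definition
 * may not refer to the internal-linkage helper above (C99 6.7.4p3).
 */
static inline bool
validate_rx_req_id_sketch(const struct rx_ring_sketch *r, unsigned req_id)
{
	return (req_id_in_range(r, req_id) && r->in_use[req_id]);
}
```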
Alexander Motin [Fri, 1 Nov 2019 22:49:44 +0000 (22:49 +0000)]
Some more taskqueue optimizations.
- Optimize enqueue for two task priority values by adding a new tq_hint
field, pointing to the last task inserted into the middle of the list.
With more than two priority values it should halve the average search.
- Move tq_active insert/remove out of the taskqueue_run_locked loop.
Instead of dirtying a few shared cache lines per task, introduce a different
mechanism to drain active tasks, based on a task sequence number counter,
that uses only cache lines already present in cache. Since the new
mechanism does not need ordering, switch tq_active from TAILQ to LIST.
- Move static and dynamic struct taskqueue fields into different cache
lines. Move lock into its own cache line, so that heavy lock spinning
by multiple waiting threads would not affect the running thread.
- While there, correct some TQ_SLEEP() wait messages.
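The tq_hint idea in the first item can be sketched with a hand-rolled singly linked list. This is not FreeBSD's STAILQ-based code: the struct and function names are hypothetical, and it assumes higher numeric priorities run first with FIFO order among equal priorities. Appends cover the single-priority case, and the hint lets the second priority resume its search where the last mid-list insertion ended.

```c
#include <stddef.h>

/* Hypothetical, simplified task and queue; higher prio runs first. */
struct tq_task_sketch {
	int prio;
	int id;
	struct tq_task_sketch *next;
};

struct tq_sketch {
	struct tq_task_sketch *head, *tail;
	struct tq_task_sketch *hint;	/* last task inserted mid-list */
};

static void
tq_enqueue_sketch(struct tq_sketch *q, struct tq_task_sketch *t)
{
	struct tq_task_sketch *prev, *ins;

	t->next = NULL;
	/* Not higher priority than the tail: plain append. */
	if (q->tail == NULL || q->tail->prio >= t->prio) {
		if (q->tail != NULL)
			q->tail->next = t;
		else
			q->head = t;
		q->tail = t;
		return;
	}
	/* Resume the search at the hint when it sorts before t. */
	prev = q->hint;
	if (prev != NULL && prev->prio >= t->prio) {
		ins = prev->next;
	} else {
		prev = NULL;
		ins = q->head;
	}
	/* Skip tasks that must stay ahead of t (FIFO within a priority). */
	while (ins != NULL && ins->prio >= t->prio) {
		prev = ins;
		ins = ins->next;
	}
	/* ins is non-NULL here: the tail has a lower priority than t. */
	t->next = ins;
	if (prev != NULL) {
		prev->next = t;
		q->hint = t;
	} else {
		q->head = t;
	}
}

static struct tq_task_sketch *
tq_dequeue_sketch(struct tq_sketch *q)
{
	struct tq_task_sketch *t = q->head;

	if (t == NULL)
		return (NULL);
	q->head = t->next;
	if (q->head == NULL)
		q->tail = NULL;
	if (q->hint == t)
		q->hint = NULL;	/* never leave the hint on a removed task */
	return (t);
}
```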
This change fixes certain ZFS write workloads that caused huge congestion
on the taskqueue lock. Those workloads combine some large block writes to
saturate the pool and trigger allocation throttling, which uses higher
priority tasks to requeue the delayed I/Os, with many small blocks to
generate a deep queue of small tasks for the taskqueue to sort.
Warner Losh [Fri, 1 Nov 2019 21:26:43 +0000 (21:26 +0000)]
We don't support configuring serial PCI cards in EFI. Make this clearer in the
source rather than obfuscating it behind NO_PCI (nothing else declares that,
so it's not making the ifdefs clearer).
Leandro Lupori [Fri, 1 Nov 2019 11:28:43 +0000 (11:28 +0000)]
[PPC64] Fix GDB sigtramp detection
The current implementation of ppcfbsd_pc_in_sigtramp() seems to take only
32-bit PowerPC into account: on 64-bit PowerPC, most kernel instruction
addresses are wrongly reported as being in sigtramp.
This change adds proper sigtramp detection for PPC64.
Kyle Evans [Fri, 1 Nov 2019 03:10:53 +0000 (03:10 +0000)]
mdmfs(8): add -k skel option to populate fs from a skeleton
mdmfs(8) lacks the ability to populate throwaway memory filesystems from an
existing directory.
This feature permits an interesting setup where /var, for instance, lives on
a device where wear-leveling is something you want to avoid as much as
possible and nonetheless you don't want to lose your logs, ports metadata,
etc. Here are the steps:
1. Copy /var to /var.bak;
2. Mount an mfs into /var using -k /var.bak at startup;
3. Synchronize /var to /var.bak weekly and on shutdown.
Note that this more or less mimics OpenBSD's mount_mfs(8) -P flag.
Justin Hibbits [Fri, 1 Nov 2019 02:55:58 +0000 (02:55 +0000)]
powerpc/booke: Fix TLB1 entry accounting
It's possible, with per-CPU mappings, for TLB1 indices to get out of sync.
This presents a problem when trying to insert an entry into TLB1 of all
CPUs. Currently that's done by assuming (hoping) that the TLBs are
perfectly synced, and inserting to the same index for all CPUs. However,
with aforementioned private mappings, this can result in overwriting
mappings on the other CPUs.
An example:
CPU0 CPU1
<setup all mappings> <idle>
3 private mappings
kick off CPU 1
initialize shared mappings (3 indices low)
Load kernel module, triggers 20 new mappings
Sync mappings at N-3
initialize 3 private mappings.
At this point, CPU 1 has all the correct mappings, while CPU 0 is missing 3
mappings that were shared across to CPU 1. When CPU 0 tries to access
memory in one of the overwritten mappings, it hangs while tripping through
the TLB miss handler. Device mappings are not stored in any page table.
This fixes the problem by introducing a '-1' index for tlb1_write_entry_int(),
so that each CPU searches for an available index private to itself.
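A toy model of per-CPU index selection: each CPU's TLB1 is a small array, and passing -1 makes the writer pick the first free entry on that CPU instead of using one global index. tlb1_write_entry_sketch(), tlb1_write_all_sketch() and the 16-entry size are hypothetical; the real code has to perform the write on each CPU, which this sketch flattens into a loop.

```c
#include <stdint.h>

#define SK_TLB1_ENTRIES	16	/* hypothetical TLB1 size */

struct tlb1_sketch {		/* one CPU's TLB1 */
	int valid[SK_TLB1_ENTRIES];
	uint32_t va[SK_TLB1_ENTRIES];
};

/*
 * idx >= 0 writes that exact entry; idx == -1 picks this CPU's first
 * free entry.  Returns the entry used, or -1 if none is available.
 */
static int
tlb1_write_entry_sketch(struct tlb1_sketch *tlb, int idx, uint32_t va)
{
	if (idx == -1) {
		for (idx = 0; idx < SK_TLB1_ENTRIES; idx++)
			if (!tlb->valid[idx])
				break;
	}
	if (idx < 0 || idx >= SK_TLB1_ENTRIES)
		return (-1);
	tlb->valid[idx] = 1;
	tlb->va[idx] = va;
	return (idx);
}

/* Install a shared mapping on every CPU, each choosing its own slot. */
static void
tlb1_write_all_sketch(struct tlb1_sketch *cpus, int ncpu, uint32_t va)
{
	int i;

	for (i = 0; i < ncpu; i++)
		(void)tlb1_write_entry_sketch(&cpus[i], -1, va);
}
```

With CPU0 holding three private entries and CPU1 none, a shared mapping lands at index 3 on CPU0 and index 0 on CPU1, and no private entry is overwritten.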
The valectl(4) program is used to manage vale(4) switches.
Add it to the system commands so that it can be used right away.
This program was previously called vale-ctl and was stored in
tools/tools/netmap.
Brooks Davis [Thu, 31 Oct 2019 20:37:19 +0000 (20:37 +0000)]
Allow bsd.compat.mk to be reliably included outside Makefile.inc1.
Replace explicit TARGET_* variables with COMPAT_* versions defined based
on where the file is being included.
Also, require that bsd.compat.mk be included directly. It's not going to
be widely used, so always loading it in bsd.prog.mk doesn't make sense.
Instead, users can include it directly.
Marcin Wojtas [Thu, 31 Oct 2019 16:02:42 +0000 (16:02 +0000)]
Add support for ENA NETMAP partial initialization
In NETMAP mode, not all queues need to be allocated to NETMAP. Some of
them can be left to the kernel. The configuration is managed by the flags
nr_mode and nr_pending_mode provided for each NETMAP kring.
The ENA driver checks those flags and performs the proper ring initialization.
Differential Revision: https://reviews.freebsd.org/D21937
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 31 Oct 2019 15:59:29 +0000 (15:59 +0000)]
Add support for ENA NETMAP Tx
Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buffer indexes taken from the slots.
When Tx resources are being set up, the new mapping structures are created
and the netmap Tx rings are reset.
When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.
When a Tx interrupt occurs, ena_netmap_tx_irq is called.
The ena_netmap_txsync callback signals that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.
ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If an error occurs,
transmission is stopped, but if the error was caused by the HW ring being
full, it is not propagated to the userspace app.
When all packets are ready, the doorbell is written to the NIC and the netmap
ring state is updated.
Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks that the number of slots does not exceed the NIC limit.
Invalid packets are dropped and the error is propagated to the upper
layer. As each netmap buffer has the same size, which is typically greater
than 2KiB, there shouldn't be any packets that contain too many slots.
Then, the ena_com_tx_ctx structure is filled. As netmap does not
support any hardware offloads, the ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in LLQ mode, the push header is determined by
checking whether the header fits within the first slot.
If so, that portion of data is copied directly from the slot.
Otherwise, the data is copied to an intermediate buffer.
The first slots are treated the same as the others, because DMA mapping
has no impact on LLQ mode. The index of each netmap buffer is taken from
its slot and stored in the netmap_buf_idx array. In case of a mapping error,
the memory is unmapped and the packets are put back to the netmap ring.
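A sketch of the push-header decision with hypothetical names (struct tx_slot_sketch, push_header_sketch()). When the whole header fits in the first slot, the sketch simply returns a pointer into that slot; otherwise it gathers the header from consecutive slots into a caller-provided bounce buffer. The real driver's buffer sizes and LLQ limits are not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a netmap slot: buffer pointer and length. */
struct tx_slot_sketch {
	const unsigned char *buf;
	size_t len;
};

/*
 * Return header_len contiguous header bytes for the LLQ push header.
 * If the first slot holds the whole header it is used in place;
 * otherwise the header is gathered into 'bounce' (at least header_len
 * bytes).  Returns NULL if the packet is shorter than the header.
 */
static const unsigned char *
push_header_sketch(const struct tx_slot_sketch *slots, int nslots,
    size_t header_len, unsigned char *bounce)
{
	size_t copied = 0;
	int i;

	if (nslots > 0 && slots[0].len >= header_len)
		return (slots[0].buf);
	for (i = 0; i < nslots && copied < header_len; i++) {
		size_t n = slots[i].len;

		if (n > header_len - copied)
			n = header_len - copied;
		memcpy(bounce + copied, slots[i].buf, n);
		copied += n;
	}
	return (copied == header_len ? bounce : NULL);
}
```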
ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and validated. As validate_tx_req_id from
ena.c is specific to kernel mbufs, another implementation is provided.
Each req_id is cleaned up by the ena_netmap_tx_clean_one function. Buffers
are unmapped from DMA and put back to the netmap ring. Finally, the
state of the netmap and NIC rings is updated.
Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 31 Oct 2019 15:57:44 +0000 (15:57 +0000)]
Add support for ENA NETMAP Rx
Most of the code used for Rx ring initialization can be reused in NETMAP.
A reset of the NETMAP ring and a new alloc method were added. The driver
decides whether to use kernel mbufs or NETMAP slots based on the IFCAP_NETMAP
flag. This allows ena_refill_rx_bufs to be reused, which provides proper
handling of Rx out-of-order completion.
ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from the NETMAP ring. The proper netmap_ring is found based on the queue id.
As NETMAP provides the "partial opening" feature, not all of the rings are
available; a ring that is not in use points to an invalid ring. If there is
an available slot, it is taken from the ring. Its buffer is mapped to DMA
and its index is stored in the ena_rx_buffer structure. Then ena_buf
is filled with addresses and the ring state is updated.
Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to the ring. As we cannot return more bufs than we have taken, and
we must not overwrite occupied slots, buf_index should be 0. This is
checked by an assertion.
ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.
In case of an Rx error that can be handled by the NIC driver (for example, by
performing a reset), rx sync should return 0.
ena_netmap_rx_frames first checks whether the NETMAP ring is in a consistent
state and then receives new frames in a loop. When all available
frames have been taken, nr_hwtail is updated.
Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurs, each descriptor is loaded by the ena_netmap_rx_load_desc function.
If a packet spans more than one segment, the NS_MOREFRAG flag must be set in
every slot except the last one. In case of a wrong req_id, the packet is
removed from the NETMAP ring. If the packet is received successfully, the
counters are updated.
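A sketch of the NS_MOREFRAG rule for one multi-segment packet. struct rx_slot_sketch stands in for netmap's slot, SK_NS_MOREFRAG uses an arbitrary value rather than netmap's real NS_MOREFRAG definition, and fill_packet_slots() is hypothetical; it skips DMA and req_id handling.

```c
#include <stdint.h>

/* Hypothetical flag value; netmap defines its own NS_MOREFRAG. */
#define SK_NS_MOREFRAG	0x1u

struct rx_slot_sketch {		/* stand-in for a netmap slot */
	uint16_t len;
	uint16_t flags;
};

/*
 * Store one received packet of nsegs segments into consecutive ring
 * slots starting at 'head' (wrapping at nslots).  Every slot except
 * the last gets the "more fragments" flag.  Returns the new head.
 */
static unsigned
fill_packet_slots(struct rx_slot_sketch *ring, unsigned nslots,
    unsigned head, const uint16_t *seg_len, unsigned nsegs)
{
	unsigned i;

	for (i = 0; i < nsegs; i++) {
		struct rx_slot_sketch *s = &ring[head];

		s->len = seg_len[i];
		if (i + 1 < nsegs)
			s->flags |= SK_NS_MOREFRAG;
		else
			s->flags &= (uint16_t)~SK_NS_MOREFRAG;
		head = (head + 1 == nslots) ? 0 : head + 1;
	}
	return (head);
}
```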
Refilling of the NIC ring is performed by the ena_netmap_rx_cleanup function.
It calculates the number of available slots and calls ena_refill_rx_bufs
with that number.
Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 31 Oct 2019 15:51:18 +0000 (15:51 +0000)]
Introduce NETMAP support in ENA
A mock implementation of the NETMAP routines is located in the
ena_netmap.c/.h files. All code is protected by the DEV_NETMAP macro. The
Makefile was updated with the new files and flag.
As the ENA driver provides its own implementations of (un)likely, they must
be undefined before including the NETMAP headers.
The ena_netmap_attach function is called at the end of NIC attach. It fills a
structure with the NIC configuration and callbacks, then passes it to
netmap_attach. Similarly, netmap_detach is called during ena_detach.
Three callbacks are used.
nm_register is implemented by ena_netmap_reg. It is called when a user
space application opens or closes the NIC in NETMAP mode. The requested
action is determined by the onoff parameter: true means on and false off.
As the NIC's rings need to be reconfigured, ena_down and ena_up are reused.
When a user space application wants to receive new packets from the NIC,
nm_rxsync is called, and when there are new packets ready for Tx,
nm_txsync is called.
Differential Revision: https://reviews.freebsd.org/D21934
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 31 Oct 2019 15:44:26 +0000 (15:44 +0000)]
Split Rx/Tx from initialization code in ENA driver
Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in the upcoming netmap code in the driver. To make that
possible, they were made available through the ena.h header.
Marcin Wojtas [Thu, 31 Oct 2019 15:39:54 +0000 (15:39 +0000)]
Fix ENA keep-alive timeout due to prolonged reset
When the ENA_FLAG_DEVICE_RUNNING flag is disabled, the AENQ handlers
aren't executed. To fix that, the watchdog timestamp should be updated
just before enabling the watchdog.
The timer service was always being enabled, even if the device wasn't up
before the reset. That shouldn't happen, as the timer service should be
executed only for a working interface.
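A sketch of the ordering fix, with hypothetical names and millisecond timestamps. Because the keep-alive handler could not run while the device was down, the stored timestamp is stale after a long reset, and enabling the check without refreshing it would trip the watchdog immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical keep-alive watchdog state (times in ms). */
struct ka_watchdog_sketch {
	uint64_t last_keepalive;	/* set by the keep-alive AENQ handler */
	uint64_t timeout;
	bool enabled;
};

/*
 * While the device was down, AENQ handlers did not run, so the stored
 * timestamp is stale.  Refresh it right before re-enabling the check.
 */
static void
ka_enable_sketch(struct ka_watchdog_sketch *wd, uint64_t now)
{
	wd->last_keepalive = now;
	wd->enabled = true;
}

static bool
ka_expired_sketch(const struct ka_watchdog_sketch *wd, uint64_t now)
{
	return (wd->enabled && now - wd->last_keepalive > wd->timeout);
}
```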
Differential Revision: https://reviews.freebsd.org/D21932
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Marcin Wojtas [Thu, 31 Oct 2019 15:16:10 +0000 (15:16 +0000)]
Fix pmap_change_attr() on arm64 to allow KV addresses
Although the comment above pmap_change_attr() mentions that the VA can be
in either the KV or the DMAP memory space, pmap_change_attr_locked() was
accepting only values inside the DMAP memory range.
To fix that, the condition check was changed so that a VA inside the
KV memory range is also accepted.
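A sketch of the widened check. The address bounds below are hypothetical placeholders, not arm64's real DMAP or KVA limits, and the function names are illustrative; the overflow-safe range test is the part worth noting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds for illustration only. */
#define SK_DMAP_MIN	UINT64_C(0xfffffd0000000000)
#define SK_DMAP_MAX	UINT64_C(0xffffff0000000000)
#define SK_KVA_MIN	UINT64_C(0xffff000000000000)
#define SK_KVA_MAX	UINT64_C(0xffff008000000000)

/* [va, va + size) lies inside [min, max), without overflowing. */
static bool
range_within(uint64_t va, uint64_t size, uint64_t min, uint64_t max)
{
	return (va >= min && va < max && size <= max - va);
}

/* Previously only the DMAP test existed; now KV is accepted too. */
static bool
change_attr_va_ok_sketch(uint64_t va, uint64_t size)
{
	return (range_within(va, size, SK_DMAP_MIN, SK_DMAP_MAX) ||
	    range_within(va, size, SK_KVA_MIN, SK_KVA_MAX));
}
```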
An example of a use case that wasn't supported is a PCI device with a
BAR that should be mapped with the Write Combine attribute, for
example BAR2 of the ENA network controller on the A1 instances on AWS.
Tested on an A1 AWS instance with the ENA BAR2 mapped resource changed to a
write-combined memory region.
Differential Revision: https://reviews.freebsd.org/D22055
MFC after: 2 weeks
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Leandro Lupori [Thu, 31 Oct 2019 12:03:47 +0000 (12:03 +0000)]
Fix GDB machdep code for PPC/PPC64
There were a couple of issues with the GDB machdep code for PPC/PPC64, the main ones being:
- wrong register sizes being returned
- pcb_context index was wrong (this affects all PPC variants)
Andriy Gapon [Thu, 31 Oct 2019 11:31:13 +0000 (11:31 +0000)]
iicbb: allow longer SCL low timeout and other improvements
First, the SCL low timeout is set to 25 milliseconds by default, as opposed
to 1 millisecond before. The new value is based on the SMBus
specification. The timeout can be changed on a per-bus basis using the
dev.iicbb.N.scl_low_timeout sysctl.
The driver uses DELAY to wait for high SCL up to 1 millisecond, then it
switches to pause_sbt(SBT_1MS) for the rest of the timeout.
While here I made a number of other changes. 'udelay' that's used for
timing clock and data signals is now calculated based on the requested
bus frequency (dev.iicbus.N.frequency) instead of being hardcoded to 10
microseconds. The calculations are done in such a fashion that the
default bus frequency of 100000 is converted to udelay of 10 us. This
is for backward compatibility. The actual frequency will be less than a
quarter (I think) of the requested frequency.
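A sketch of the two-phase SCL wait, with the bus and the kernel primitives replaced by hypothetical callbacks so it can run in userland. The udelay formula, 1000000 / frequency, is an assumption: it satisfies the stated requirement that the default 100000 Hz maps to 10 us, but the driver's actual calculation may differ.

```c
#include <stdbool.h>

/* Hypothetical hooks for the bus and for DELAY()/pause_sbt(). */
struct scl_wait_ops {
	bool (*scl_is_high)(void *ctx);
	void (*busy_delay_us)(void *ctx, unsigned us);	/* like DELAY() */
	void (*sleep_ms)(void *ctx, unsigned ms);	/* like pause_sbt(SBT_1MS) */
	void *ctx;
};

/* Assumed conversion: 100000 Hz -> 10 us, as the text requires. */
static unsigned
freq_to_udelay(unsigned freq_hz)
{
	unsigned d = freq_hz != 0 ? 1000000 / freq_hz : 10;

	return (d != 0 ? d : 1);
}

/* 0 once SCL is high, -1 if it is still low after timeout_us. */
static int
wait_scl_high(const struct scl_wait_ops *ops, unsigned udelay,
    unsigned timeout_us)
{
	unsigned waited = 0;

	if (udelay == 0)
		udelay = 1;
	/* Busy-wait in udelay steps for at most 1 ms. */
	while (waited < 1000 && waited < timeout_us) {
		if (ops->scl_is_high(ops->ctx))
			return (0);
		ops->busy_delay_us(ops->ctx, udelay);
		waited += udelay;
	}
	/* Then sleep in 1 ms steps for the rest of the timeout. */
	while (waited < timeout_us) {
		if (ops->scl_is_high(ops->ctx))
			return (0);
		ops->sleep_ms(ops->ctx, 1);
		waited += 1000;
	}
	return (ops->scl_is_high(ops->ctx) ? 0 : -1);
}
```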
Also, I added detection of stuck low SCL in a few places. Previously,
the code would just carry on after the SCL low timeout and that might
potentially lead to misinterpreted bits.
Finally, I fixed several style issues near the code that I changed.
Many more are still remaining.
Tested by accessing HTU21 temperature and humidity sensor in this setup:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
gpio1: <Nuvoton GPIO controller> at GPIO ldn 0x07 on superio0
pcib0: allocated type 4 (0x220-0x226) for rid 0 of gpio1
gpiobus1: <GPIO bus> on gpio1
gpioiic0: <GPIO I2C bit-banging driver> at pins 14-15 on gpiobus1
gpioiic0: SCL pin: 14, SDA pin: 15
iicbb0: <I2C bit-banging driver> on gpioiic0
iicbus0: <Philips I2C bus> on iicbb0 master-only
iic0: <I2C generic I/O> on iicbus0
Ilya Bakulin [Wed, 30 Oct 2019 20:43:27 +0000 (20:43 +0000)]
Use the new cam_sim_alloc_dev function to properly initialize SIM
Using cam_sim_alloc_dev() makes it possible to set the sim_dev field properly,
so that sdiob(4) can attach to the CAM device that represents the SDIO card.
The same change for SDHCI driver happened in r348800.
Andrew Turner [Wed, 30 Oct 2019 17:32:35 +0000 (17:32 +0000)]
Set the userspace execute never bit on kernel mappings.
Arm64 allows us to create execute-only mappings. To make sure userspace is
unable to accidentally execute kernel code, set the user execute-never
bit in the kernel page tables.
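A sketch of the descriptor bits involved. In the Armv8-A stage 1 EL1&0 translation regime, bit 54 of a block or page descriptor is UXN (execute-never at EL0) and bit 53 is PXN (execute-never at EL1); the SK_ATTR_* names and kernel_pte_sketch() are hypothetical, not FreeBSD's pmap definitions.

```c
#include <stdint.h>

/* Stage 1 descriptor bits (Armv8-A); the macro names are hypothetical. */
#define SK_ATTR_PXN	(UINT64_C(1) << 53)	/* privileged execute-never */
#define SK_ATTR_UXN	(UINT64_C(1) << 54)	/* unprivileged (EL0) execute-never */

/*
 * Build a kernel page table entry: never executable from EL0, and
 * also not executable at EL1 unless it maps kernel text.
 */
static uint64_t
kernel_pte_sketch(uint64_t pa_and_attrs, int is_kernel_text)
{
	uint64_t pte = pa_and_attrs | SK_ATTR_UXN;

	if (!is_kernel_text)
		pte |= SK_ATTR_PXN;
	return (pte);
}
```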
Warner Losh [Wed, 30 Oct 2019 17:18:11 +0000 (17:18 +0000)]
Make hyperv keyboard work again.
r351049 bogusly deleted these lines from files.amd64 but failed to add them to
files.x86. Since this works on i386, add them to files.x86 rather than just
adding them back to files.amd64.
Justin Hibbits [Wed, 30 Oct 2019 15:30:40 +0000 (15:30 +0000)]
ARM64: Treat alignment faults as bus errors
Summary:
ARM64 currently treats all data abort exceptions as page faults. This
can cause infinite loops on faults that are not page faults, such as alignment
faults. Since kernel-side alignment faults should be avoided, this adds support
directly to the el0 fault handler, instead of the data_abort() handler.
Test Plan: Tested on rpi3, with a misaligned ldm test.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D22133
Andriy Gapon [Wed, 30 Oct 2019 15:26:41 +0000 (15:26 +0000)]
ow(4): protocol timings can now be changed as sysctl-s / tunables
I limited potentially infinite timings to 960 us based on a footnote on
page 38 of Maxim Integrated Application Note 937, Book of iButton
Standards: "In order not to mask interrupt signalling by other devices
on the 1–Wire bus, tRSTL + tR should always be less than 960 us."
Andriy Gapon [Wed, 30 Oct 2019 15:15:53 +0000 (15:15 +0000)]
ow(4): increase regular mode recovery time, t_rec, to 15 us
Previously we used the minimal value of 1 us and it was really tight.
Application Note 3829 has a table describing recommended t_rec values
for various bus voltages, temperature conditions and numbers of slave
devices. The new value decreases the maximum possible data rate from
16.3 Kbit/s to 13.3 Kbit/s, but it allows for up to four slaves on a
3.3V bus (at room temperature).
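The quoted rates match a simple model in which each standard-speed bit takes a minimum 60 us time slot plus t_rec: 1000/61 ≈ 16.39 Kbit/s and 1000/75 ≈ 13.33 Kbit/s. This model is a reading of the numbers above, not a formula taken from the application notes, and ow_max_rate_kbps() is a hypothetical helper.

```c
/*
 * Assumed model: a standard-speed 1-Wire bit needs at least a 60 us
 * time slot plus the recovery time t_rec before the next slot.
 */
static double
ow_max_rate_kbps(double t_slot_min_us, double t_rec_us)
{
	/* 1000 / (us per bit) = bits per millisecond = kbit/s */
	return (1000.0 / (t_slot_min_us + t_rec_us));
}
```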
References:
- Maxim Integrated Application Note 3829
Determining the Recovery Time for Multiple-Slave 1-Wire(R) Networks
- Maxim Integrated Application Note 937
Book of iButton Standards
Andrew Turner [Wed, 30 Oct 2019 14:05:50 +0000 (14:05 +0000)]
Allow exceptions to be masked when in userspace
We may want to mask exceptions when in userspace. This was previously
impossible, as threads are created with all exceptions unmasked and
signals expected userspace to mask any. Fix this by copying the
mask state on thread creation and allowing exceptions to be masked on
signal return, as long as they don't change.
Andrew Turner [Wed, 30 Oct 2019 13:45:40 +0000 (13:45 +0000)]
Allow the userspace ID register fields to be read from the kernel
To allow consistent values to be used in both the kernel and userspace,
create a function for these to be read from the kernel. They use a newly
created macro with the name of the ID register to read. For now there is
redundant information in the user_regs array as it still holds the CRm and
Op2 values, however this will be fixed in a later change.
This will be used by ptrace to allow hardware breakpoints in userspace.
Andrew Turner [Wed, 30 Oct 2019 12:33:36 +0000 (12:33 +0000)]
Move the MRS instruction decode macros to armreg.h
These instructions are used to access the registers described in armreg.h,
and will be used in a future change to create a per-register identification
macro.
Andrew Turner [Wed, 30 Oct 2019 10:51:24 +0000 (10:51 +0000)]
Update the debug monitor handling to work after userspace has started
The debug monitor register state is now stored in a struct and updated
when required. Currently there is only a kernel state, however a
per-process state will be added in a future change.
Andrew Turner [Wed, 30 Oct 2019 10:42:52 +0000 (10:42 +0000)]
Use an array of handlers in the data and instruction aborts
Previously, we would call data_abort on all data and instruction aborts;
however, this is incorrect for most abort types. Move to an array
of function pointers to allow more handlers to be added easily.
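The array-of-handlers approach, combined with the alignment fault change in Justin Hibbits' commit above, can be sketched as a table indexed by the fault status code in ESR_ELx ISS[5:0]. The codes (translation 0x04-0x07, access flag 0x09-0x0b, permission 0x0d-0x0f, alignment 0x21) come from the Arm architecture; the handler names and ABORT_* results are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fault status code field (ESR_ELx ISS[5:0]), per the Arm ARM. */
#define FSC_MASK	0x3fu
#define FSC_ALIGNMENT	0x21u

enum abort_action { ABORT_PAGE_FAULT, ABORT_SIGBUS, ABORT_UNHANDLED };

typedef enum abort_action (*abort_handler_t)(uint64_t far);

static enum abort_action
handle_vm_fault(uint64_t far)
{
	(void)far;		/* a real handler would resolve the fault */
	return (ABORT_PAGE_FAULT);
}

static enum abort_action
handle_alignment(uint64_t far)
{
	(void)far;		/* retrying would fault forever: signal instead */
	return (ABORT_SIGBUS);
}

/* One slot per fault status code; NULL means "no handler". */
static const abort_handler_t abort_handlers[FSC_MASK + 1] = {
	[0x04] = handle_vm_fault, [0x05] = handle_vm_fault,	/* translation */
	[0x06] = handle_vm_fault, [0x07] = handle_vm_fault,
	[0x09] = handle_vm_fault, [0x0a] = handle_vm_fault,	/* access flag */
	[0x0b] = handle_vm_fault,
	[0x0d] = handle_vm_fault, [0x0e] = handle_vm_fault,	/* permission */
	[0x0f] = handle_vm_fault,
	[FSC_ALIGNMENT] = handle_alignment,
};

static enum abort_action
dispatch_abort(uint32_t esr, uint64_t far)
{
	abort_handler_t h = abort_handlers[esr & FSC_MASK];

	return (h != NULL ? h(far) : ABORT_UNHANDLED);
}
```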
Andrew Turner [Wed, 30 Oct 2019 10:06:57 +0000 (10:06 +0000)]
Rename the macros to extract a single arm64 ID field.
Because of the previous naming scheme the old ID_AA64PFR0_EL1 macro
collided with a potential macro for the register of the same name. To fix
this collision rename these macros.
Alexander Motin [Tue, 29 Oct 2019 21:25:19 +0000 (21:25 +0000)]
FreeBSD'fy ZFS zlib zalloc/zfree callbacks.
The previous code came from OpenSolaris, which, as I understand it, requires
the allocation size to be known to free memory. To store that size, the
previous code allocated an additional 8-byte header. But I have noticed that
zlib with the present settings allocates 64KB context buffers for each call,
which could be efficiently cached by UMA, but the addition of those 8 bytes
makes them fall back to physical RAM allocations, which cause huge overhead
and lock congestion on small blocks. Since FreeBSD's free() does not take
a size argument, switching to it solves the problem, increasing write
speed to ZVOLs with 4KB block size and GZIP compression on my 40-thread
test system from ~60MB/s to ~600MB/s.
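A userland sketch of the two callback styles, with malloc()/free() standing in for the kernel allocator. The signatures follow the shape of zlib's alloc_func/free_func (opaque pointer, item count, item size), but the function names are hypothetical and this is not the committed ZFS code. The point is only the request size: a 64 KiB context buffer stays 64 KiB in the plain variant and becomes 64 KiB + 8 with the size header.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical names; same shape as zlib's alloc_func/free_func, with
 * userland malloc()/free() standing in for the kernel allocator.
 */

/* Size-header variant: a size-taking free needs the size back. */
static void *
zalloc_hdr_sketch(void *opaque, unsigned items, unsigned size)
{
	size_t len = (size_t)items * size;
	uint64_t *p = malloc(len + sizeof(uint64_t));	/* 64 KiB -> 64 KiB + 8 */

	(void)opaque;
	if (p == NULL)
		return (NULL);
	p[0] = len;		/* remembered for the free side */
	return (p + 1);
}

static void
zfree_hdr_sketch(void *opaque, void *addr)
{
	uint64_t *p = (uint64_t *)addr - 1;

	(void)opaque;
	/* A Solaris-style kmem_free() would pass p[0] + 8 as the size. */
	free(p);
}

/* Plain variant: free() needs no size, so the request stays 64 KiB. */
static void *
zalloc_plain_sketch(void *opaque, unsigned items, unsigned size)
{
	(void)opaque;
	return (malloc((size_t)items * size));
}

static void
zfree_plain_sketch(void *opaque, void *addr)
{
	(void)opaque;
	free(addr);
}
```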
Jeff Roberson [Tue, 29 Oct 2019 20:58:46 +0000 (20:58 +0000)]
Use atomics and a shared object lock to protect the object reference count.
Certain consumers still need to guarantee a stable reference, so we cannot
switch entirely to atomics yet. Exclusive lock holders can still modify
and examine the refcount without using the ref api.
Conrad Meyer [Tue, 29 Oct 2019 18:24:36 +0000 (18:24 +0000)]
libexecinfo test: Don't strip installed test
It turns out that a test of backtrace symbol resolution and formatting
requires symbols. Another option might be building with -rdynamic instead,
but this works for now.
Re-enabled the skipped CI test, as it should now pass.
Gleb Smirnoff [Tue, 29 Oct 2019 17:36:06 +0000 (17:36 +0000)]
There is a long standing problem with multicast programming for NICs
and IPv6. With IPv6 we may call if_addmulti() in the context of processing
an incoming packet. Usually this is interrupt context. While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't. An example is the e1000 family of drivers.
With the iflib conversion the problem was somewhat hidden. Iflib processes
packets in a private taskqueue, so going to sleep doesn't trigger an
assertion. However, the sleep would block operation of the driver, and
subsequent incoming packets would fill the ring and eventually start
being dropped. Enabling epoch for the full duration of packet
processing again started to trigger assertions for e1000.
Fix this problem once and for all by using a general taskqueue to call
the if_ioctl() method in all cases when if_addmulti() is called in a
non-sleeping context. Note that nobody checks the returned value.
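A single-threaded sketch of the deferral pattern. struct ifnet_sketch and the *_sketch functions are hypothetical stand-ins for the ifnet, if_ioctl() and the taskqueue; the real change enqueues a task rather than setting a flag. As with taskqueue(9), enqueueing an already pending task coalesces into a single run.

```c
#include <stdbool.h>

/* Hypothetical interface model; 'ioctl_calls' records driver work. */
struct ifnet_sketch {
	int ioctl_calls;		/* times the (possibly sleeping) ioctl ran */
	bool mcast_task_pending;	/* a reprogram request is queued */
};

/* Stand-in for if_ioctl(SIOCADDMULTI); may sleep in a real driver. */
static void
if_ioctl_sketch(struct ifnet_sketch *ifp)
{
	ifp->ioctl_calls++;
}

/*
 * Like if_addmulti(): call the driver directly when sleeping is
 * allowed, otherwise defer the call to a task.
 */
static void
addmulti_sketch(struct ifnet_sketch *ifp, bool can_sleep)
{
	if (can_sleep)
		if_ioctl_sketch(ifp);
	else
		ifp->mcast_task_pending = true;	/* repeats coalesce */
}

/* What the taskqueue thread would run later, in a sleepable context. */
static void
mcast_task_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->mcast_task_pending) {
		ifp->mcast_task_pending = false;
		if_ioctl_sketch(ifp);
	}
}
```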