mhorne [Fri, 15 Nov 2019 03:34:27 +0000 (03:34 +0000)]
RISC-V: add support for SBI spec v0.2
The Supervisor Binary Interface (SBI) specification v0.2 is a backwards
incompatible update to the SBI call interface for kernels running in
supervisor mode. The goal of this update was to make it easier for new
and optional functionality to be added to the SBI.
SBI functions are now called by passing an "extension ID" and a
"function ID" which are passed in a7 and a6 respectively. SBI calls
will also return an error and value in the following struct:
struct sbi_ret {
long error;
long value;
}
This version introduces several new functions under the "base"
extension. It is expected that all SBI implementations >= 0.2 will
support this base set of functions, as they implement some essential
services such as obtaining the SBI version, CPU implementation info, and
extension probing.
Existing SBI functions have been designated as "legacy". For the time
being they will remain implemented, but it is expected that in the
future their functionality will be duplicated or replaced by new SBI
extensions. Each legacy function has been assigned its own extension ID,
and for now we simply probe and assert for their existence.
Compatibility with legacy SBI implementations (such as BBL) is
maintained by checking the output of sbi_get_spec_version(). This
function is guaranteed to succeed by the new spec, but will return an
error in legacy implementations. We use this as an indicator of whether
or not we can rely on the new SBI base extensions.
For further info on the Supervisor Binary Interface, see:
https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
mhorne [Fri, 15 Nov 2019 03:22:08 +0000 (03:22 +0000)]
RISC-V: pass arg6 in sbi_call
Allow for an additional argument to sbi_call which will be passed in a6.
This is required for SBI spec 0.2 support, as a6 will indicate the SBI
function ID.
While here, introduce some macros to clean up the calls.
mhorne [Fri, 15 Nov 2019 03:18:11 +0000 (03:18 +0000)]
plic: support irq distribution
Our PLIC implementation only enables interrupts on the boot cpu.
Implement plic_bind_intr() so that they can be redistributed near the
end of boot during intr_irq_shuffle().
This also slightly modifies how enable bits are handled in an attempt to
better fit the PIC interface. plic_enable_intr()/plic_disable_intr() are
converted to manage an interrupt source's threshold value, since this
value can be used as to globally enable/disable an irq. All handing of the
per-context enable bits is moved to the new methods plic_setup_intr()
and plic_bind_intr().
mhorne [Fri, 15 Nov 2019 03:15:14 +0000 (03:15 +0000)]
plic: fix context calculation
The RISC-V PLIC (platform level interrupt controller) registers are divided up
by "context", which is purposefully left ambiguous in the PLIC spec. Currently
we assume each CPU number corresponds 1-to-1 with a context number, but that is
not correct. Most existing PLIC implementations (such as SiFive's) have
multiple contexts per-cpu. For example, a single CPU might have a context for
machine mode interrupts and a context for supervisor mode interrupts. To
complicate things further, FreeBSD renumbers the CPUs during boot, but the PLIC
driver still assumes that CPU ID equals the RISC-V hart number, meaning
interrupt enables/claims might be performed for the wrong context registers.
To fix this, we must calculate each CPU's context number during
attachment. This is done by reading the interrupt properties from the
device tree, from which a mapping from context to RISC-V hart to CPU
number can be created.
jpaetzel [Thu, 14 Nov 2019 23:31:20 +0000 (23:31 +0000)]
Add the pvscsi driver to the tree.
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
jhibbits [Thu, 14 Nov 2019 21:58:40 +0000 (21:58 +0000)]
Boot arm64 kernel using booti command from U-boot.
Summary:
Boot arm64 kernel using booti command from U-boot. booti can relocate initrd
image into higher ram addresses, therefore align the initrd load address to 1GiB
and create VA = PA map for it. Create L2 pagetable entries to copy the initrd
image into KVA.
(parts of the code in https://reviews.freebsd.org/D13861 was referred and used
as appropriate)
Submitted by: Siddharth Tuli <siddharthtuli_gmail.com>
Reviewed by: manu
Sponsored by: Juniper Networks, Inc
Differential Revision: https://reviews.freebsd.org/D22255
kevans [Thu, 14 Nov 2019 18:38:56 +0000 (18:38 +0000)]
arm64: busdma_bounce: fix BUS_DMA_ALLOCNOW for non-paged aligned sizes
For any size that isn't page-aligned, we end up not pre-allocating enough
for a single mapping because we truncate the size instead of rounding up to
make sure the last bit is accounted for, leaving us one page shy of what we
need to fulfill a request.
ian [Thu, 14 Nov 2019 16:46:27 +0000 (16:46 +0000)]
Rewrite arm/stack_machdep.c for EABI; add stack(9) support to arm kernels.
The old stack_machdep.c code was written for the APCS ABI (aka "oldabi").
When we switched to ARM EABI (back in freebsd 10) this file never got
updated, and apparently nobody noticed that until now.
The new implementation uses the same stack unwinder code used by the
arm implemenation of the db_trace stuff.
bdragon [Thu, 14 Nov 2019 04:34:17 +0000 (04:34 +0000)]
powerpc: Kernel fixes for ppc32 and powerpcspe w/ lld
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.
Also, remove unnecessary text relocs from the ppc32 AIM trap code.
kib [Wed, 13 Nov 2019 22:39:46 +0000 (22:39 +0000)]
amd64: only set PCB_FULL_IRET pcb flag when #gp or similar exception comes
from usermode.
If CPU supports RDFSBASE, the flag also means that userspace fsbase
and gsbase are already written into pcb, which might be not true when
we handle #gp from kernel.
The offender is rdmsr_safe(), and the visible result is corrupted
userspace TLS base.
Reported by: pstef
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
jhb [Wed, 13 Nov 2019 21:49:46 +0000 (21:49 +0000)]
Refine r354661 to unbreak the GCC_BOOTSTRAP case.
MK_CLANG_IS_CC controls installing links for GCC, not just clang. Set
MK_CLANG_IS_CC to the value of MK_CLANG_BOOTSTRAP. This will leave it
as "no" if no bootstrap compiler is being built or GCC 4.2.1 is being
used as the bootstrap compiler, and "yes" if clang is being used as
the bootstrap compiler.
Submitted by: bdrewery (kind of, he suggested this on IRC while I was
testing the original patch)
Reviewed by: kevans, imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22350
trasz [Wed, 13 Nov 2019 20:27:38 +0000 (20:27 +0000)]
Add 'linux_mounts_enable' rc.conf(5) variable, to make it possible
to disable mounting Linux-specific filesystems under /compat/linux
when 'linux_enable' is set to YES.
Reviewed by: netchild, ian (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22320
kevans [Wed, 13 Nov 2019 18:21:06 +0000 (18:21 +0000)]
ssp: further refine the conditional used for constructor priority
__has_attribute(__constructor__) is a better test for clang than
defined(__clang__). Switch to it instead.
While we're already here and touching it, pfg@ nailed down when GCC actually
introduced the priority argument -- 4.3. Use that instead of our
hammer-guess of GCC >= 5 for the sake of correctness.
dougm [Wed, 13 Nov 2019 15:56:07 +0000 (15:56 +0000)]
Define wrapper functions vm_map_entry_{succ,pred} to act as wrappers
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.
Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
bz [Wed, 13 Nov 2019 12:05:48 +0000 (12:05 +0000)]
nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c
Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static. Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.
Given all nd_defrouter logic is now in one place we can tidy up the
code, locking and, and other open items.
bz [Wed, 13 Nov 2019 11:21:02 +0000 (11:21 +0000)]
lltabl: remove dead code
Remove the long (8? years ago) #if 0 marked function lltable_drain() and
while here also remove the unused function llentry_alloc() which has call
paths tools keep finding and are never used.
kevans [Wed, 13 Nov 2019 03:00:32 +0000 (03:00 +0000)]
ssp: rework the logic to use priority=200 on clang builds
The preproc logic was added at the last minute to appease GCC 4.2, and
kevans@ did clearly not go back and double-check that the logic worked out
for clang builds to use the new variant.
It turns out that clang defines __GNUC__ == 4. Flip it around and check
__clang__ as well, leaving a note to remove it later.
jhibbits [Wed, 13 Nov 2019 02:22:00 +0000 (02:22 +0000)]
powerpc64: Don't guard ISA 3.0 partition table setup with hw_direct_map
PowerISA 3.0 eliminated the 64-bit bridge mode which allowed 32-bit kernels
to run on 64-bit AIM/Book-S hardware. Since therefore only a 64-bit kernel
can run on this hardware, and 64-bit native always has the direct map, there
is no need to guard it.
kevans [Wed, 13 Nov 2019 02:14:17 +0000 (02:14 +0000)]
ssp: add a priority to the __stack_chk_guard constructor
First, this commit is a NOP on GCC <= 4.x; this decidedly doesn't work
cleanly on GCC 4.2, and it will be gone soon anyways so I chose not to dump
time into figuring out if there's a way to make it work. xtoolchain-gcc,
clocking in as GCC6, can cope with it just fine and later versions are also
generally ok with the syntax. I suspect very few users are running GCC4.2
built worlds and also experiencing potential fallout from the status quo.
For dynamically linked applications, this change also means very little.
rtld will run libc ctors before most others, so the situation is
approximately a NOP for these as well.
The real cause for this change is statically linked applications doing
almost questionable things in their constructors. qemu-user-static, for
instance, creates a thread in a global constructor for their async rcu
callbacks. In general, this works in other places-
- On OpenBSD, __stack_chk_guard is stored in an .openbsd.randomdata section
that's initialized by the kernel in the static case, or ld.so in the
dynamic case
- On Linux, __stack_chk_guard is apparently stored in TLS and such a problem
is circumvented there because the value is presumed stable in the new
thread.
On FreeBSD, the rcu thread creation ctor and __guard_setup are both unmarked
priority. qemu-user-static spins up the rcu thread prior to __guard_setup
which starts making function calls- some of these are sprinkled with the
canary. In the middle of one of these functions, __guard_setup is invoked in
the main thread and __stack_chk_guard changes- qemu-user-static is promptly
terminated for an SSP violation that didn't actually happen.
This is not an all-too-common problem. We circumvent it here by giving the
__stack_chk_guard constructor a solid priority. 200 was chosen because that
gives static applications ample range (down to 101) for working around it
if they really need to. I suspect most applications will "just work" as
expected- the default/non-prioritized flavor of __constructor__ functions
run last, and the canary is generally not expected to change as of this
point at the very least.
This took approximately three weeks of spare time debugging to pin down.
imp [Wed, 13 Nov 2019 01:58:43 +0000 (01:58 +0000)]
Fix a race between daopen and damediapoll
When we do a daopen, we call dareprobe and wait for the results. The repoll runs
the da state machine up through the DA_STATE_RC* and then exits.
For removable media, we poll the device every 3 seconds with a TUR to see if it
has disappeared. This introduces a race. If the removable device has lots of
partitions, and if it's a little slow (like say a USB2 connected USB stick),
then we can have a fair amount of time that this reporbe is going on for. If,
during that time, damediapoll fires, it calls daschedule which changes the
scheduling priority from NONE to NORMAL. When that happens, the careful single
stepping in the da state machine is disrupted and we wind up sceduling multiple
read capacity calls. The first one succeeds and releases the reference. The
second one succeeds and releases the reference (and panics if the right code is
compiled into the da driver).
To avoid the race, only do the TUR calls while in state normal, otherwise just
reschedule damediapoll. This prevents the race from happening.
jhb [Wed, 13 Nov 2019 00:53:45 +0000 (00:53 +0000)]
Create a file to hold shared routines for dealing with T6 key contexts.
ccr(4) and TLS support in cxgbe(4) construct key contexts used by the
crypto engine in the T6. This consolidates some duplicated code for
helper functions used to build key contexts.
jhb [Tue, 12 Nov 2019 21:29:52 +0000 (21:29 +0000)]
Force MK_CLANG_IS_CC on in XMAKE.
This ensures that a bootstrap clang compiler is always installed as cc
in WORLDTMP. If it is only installed as 'clang' then /usr/bin/cc is
used during the build instead of the bootstrap compiler.
vmaffione [Tue, 12 Nov 2019 21:07:51 +0000 (21:07 +0000)]
bhyve: rework mevent processing to fix a race condition
At the end of both mevent_add() and mevent_update(), mevent_notify()
is called to wakeup the I/O thread, that will call kevent(changelist)
to update the kernel.
A race condition is possible where the client calls mevent_add() and
mevent_update(EV_ENABLE) before the I/O thread has the chance to wake
up and call mevent_build()+kevent(changelist) in response to mevent_add().
The mevent_add() is therefore ignored by the I/O thread, and
kevent(fd, EV_ENABLE) is called before kevent(fd, EV_ADD), resuliting
in a failure of the kevent(fd, EV_ENABLE) call.
kib [Tue, 12 Nov 2019 18:01:33 +0000 (18:01 +0000)]
Workaround for Intel SKL002/SKL012S errata.
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs. For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs. The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.
Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.
Reviewed by: alc, emaste, markj, scottl
Security: CVE-2018-12207
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21884
scottph [Tue, 12 Nov 2019 16:24:37 +0000 (16:24 +0000)]
nvdimm(4): Fix various problems when the using the second label index block
struct nvdimm_label_index is dynamically sized, with the `free`
bitfield expanding to hold `slot_cnt` entries. Fix a few places
where we were treating the struct as though it had a fixed sized.
scottph [Tue, 12 Nov 2019 15:50:30 +0000 (15:50 +0000)]
nvdimm(4): Only expose namespaces for accessible data SPAs
Apply the same user accessible filter to namespaces as is applied
to full-SPA devices. Also, explicitly filter out control region
SPAs which don't expose the nvdimm data area.
bz [Tue, 12 Nov 2019 15:46:28 +0000 (15:46 +0000)]
netinet*: update *mp to pass the proper value back
In ip6_[direct_]input() we are looping over the extension headers
to deal with the next header. We pass a pointer to an mbuf pointer
to the handling functions. In certain cases the mbuf can be updated
there and we need to pass the new one back. That missing in
dest6_input() and route6_input(). In tcp6_input() we should also
update it before we call tcp_input().
In addition to that mark the mbuf NULL all the times when we return
that we are done with handling the packet and no next header should
be checked (IPPROTO_DONE). This will eventually allow us to assert
proper behaviour and catch the above kind of errors more easily,
expecting *mp to always be set.
This change is extracted from a larger patch and not an exhaustive
change across the entire stack yet.
bz [Tue, 12 Nov 2019 13:57:17 +0000 (13:57 +0000)]
netstat: igmp stats, error on unexpected information, not only warn
The igmp stats tend to print two lines of warning for an unexpected
version and length. Despite an invalid version and struct size it
continues to try to do something with the data. Do not try to parse
the remainder of the struct and error on warning.
Note the underlying issue of the data not being available properly
is still there and needs to be fixed seperately.
royger [Tue, 12 Nov 2019 10:31:28 +0000 (10:31 +0000)]
xen: fix dispatching of NMIs
Currently NMIs are sent over event channels, but that defeats the
purpose of NMIs since event channels can be masked. Fix this by
issuing NMIs using a hypercall, which injects a NMI (vector #2) to the
desired vCPU.
Note that NMIs could also be triggered using the emulated local APIC,
but using a hypercall is better from a performance point of view
since it doesn't involve instruction decoding when not using x2APIC
mode.
Reported and Tested by: avg
Sponsored by: Citrix Systems R&D
tsoome [Tue, 12 Nov 2019 10:02:39 +0000 (10:02 +0000)]
reverting r354594
In our case the structure is more complex and simple static initializer
will upset compiler diagnostics - using memset is still better than building
more complext initializer.
karels [Tue, 12 Nov 2019 01:03:08 +0000 (01:03 +0000)]
Fix netstat -gs with ip_mroute module and/or vnet
The code for "netstat -gs -f inet" failed if the kernel namelist did not
include the _mrtstat symbol. However, that symbol is not in a standard
kernel even with the ip_mroute module loaded, where the functionality is
available. It is also not in a kernel with MROUTING but also VIMAGE, as
there can be multiple sets of stats. However, when running the command
on a live system, the symbol is not used; a sysctl is used. Go ahead
and try the sysctl in any case, and complain that IPv4 MROUTING is not
present only if the sysctl fails with ENOENT. Also fail if _mrtstat is
not defined when running on a core file; netstat doesn't know about vnets,
so can only work if MROUTING was included, and VIMAGE was not.
kib [Mon, 11 Nov 2019 21:59:20 +0000 (21:59 +0000)]
amd64: Issue MFENCE on context switch on AMD CPUs when reusing address space.
On some AMD CPUs, in particular, machines that do not implement
CLFLUSHOPT but do provide CLFLUSH, the CLFLUSH instruction is only
synchronized with MFENCE.
Code using CLFLUSH typicall needs to brace it with MFENCE both before
and after flush, see for instance pmap_invalidate_cache_range(). If
context switch occurs while inside the protected region, we need to
ensure visibility of flushes done on the old CPU, to new CPU.
For all other machines, locked operation done to lock switched thread,
should be enough. For case of different address spaces, reload of
%cr3 is serializing.
Reviewed by: cem, jhb, scottph
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22007
markj [Mon, 11 Nov 2019 20:44:30 +0000 (20:44 +0000)]
Fix handling of PIPE_EOF in the direct write path.
Suppose a writing thread has pinned its pages and gone to sleep with
pipe_map.cnt > 0. Suppose that the thread is woken up by a signal (so
error != 0) and the other end of the pipe has simultaneously been
closed. In this case, to satisfy the assertion about pipe_map.cnt in
pipe_destroy_write_buffer(), we must mark the buffer as empty.
avg [Mon, 11 Nov 2019 19:06:04 +0000 (19:06 +0000)]
db_nextframe/i386: reduce the number of special frame types
This change removes TRAP_INTERRUPT and TRAP_TIMERINT frame types.
Their names are a bit confusing: trap + interrupt, what is that?
The TRAP_TIMERINT name is too specific -- can it only be used for timer
"trap-interrupts"? What is so special about them?
My understanding of the code is that INTERRUPT, TRAP_INTERRUPT and
TRAP_TIMERINT differ only in how an offset from callee's frame pointer to a
trap frame on the stack is calculated. And that depends on a number of
arguments that a special handler passes to a callee (a function with a
normal C calling convention).
So, this change makes that logic explicit and collapses all interrupt frame
types into the INTERRUPT type.
vangyzen [Mon, 11 Nov 2019 17:41:52 +0000 (17:41 +0000)]
tip/cu: check for EOF on input on the local side
If cu reads an EOF on the input side, it goes into a tight loop
sending a garbage byte to the remote. With this change, it exits
gracefully, along with its child.
imp [Mon, 11 Nov 2019 17:36:57 +0000 (17:36 +0000)]
Add asserts for some state transitions
For the PROBEWP and PROBERC* states, add assertiosn that both the da device
state is in the right state, as well as the ccb state is the right one when we
enter dadone_probe{wp,rc}. This will ensure that we don't sneak through when
we're re-probing the size and write protection status of the device and thereby
leak a reference which can later lead to an invalidated peripheral going away
before all references are released (and resulting panic).
Reviewed by: scottl, ken
Differential Revision: https://reviews.freebsd.org/D22295
imp [Mon, 11 Nov 2019 17:36:52 +0000 (17:36 +0000)]
Update the softc state of the da driver before releasing the CCB.
There are contexts where releasing the ccb triggers dastart() to be run
inline. When da was written, there was always a deferral, so it didn't matter
much. Now, with direct dispatch, we can call dastart from the dadone*
routines. If the probe state isn't updated, then dastart will redo things with
stale information. This normally isn't a problem, because we run the probe state
machine once at boot... Except that we also run it for each open of the device,
which means we can have multiple threads racing each other to try to kick off
the probe. However, if we update the state before we release the CCB, we can
avoid the race. While it's needed only for the probewp and proberc* states, do
it everywhere because it won't hurt the other places.
The race here happens because we reprobe dozens of times on boot when drives
have lots of partitions. We should consider caching this info for 1-2 seconds
to avoid this thundering hurd.
Reviewed by: scottl, ken
Differential Revision: https://reviews.freebsd.org/D22295
avg [Mon, 11 Nov 2019 17:11:49 +0000 (17:11 +0000)]
db_nextframe/amd64: remove TRAP_INTERRUPT frame type
Besides the confusing name, this type is effectively unused.
In all cases where it could be set, the INTERRUPT type is set by the
earlier code. The conditions for TRAP_INTERRUPT are a subset of the
conditions for INTERRUPT.
dougm [Mon, 11 Nov 2019 16:59:49 +0000 (16:59 +0000)]
swap_pager_meta_free() frees allocated blocks in a way that
exploits the sparsity of allocated blocks in a range, without
issuing an "are you there?" query for every block in the range.
swap_pager_copy() is not so smart. Modify the implementation
of swap_pager_meta_free() slightly so that swap_pager_copy()
can use that smarter implementation too.
Based on an observation of: Yoshihiro Ota (ota_j.email.ne.jp)
Reviewed by: kib,alc
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22280
glebius [Mon, 11 Nov 2019 06:28:25 +0000 (06:28 +0000)]
It is unclear why in6_pcblookup_local() would require write access
to the PCB hash. The function doesn't modify the hash. It always
asserted write lock historically, but with epoch conversion this
fails in some special cases.
cognet [Mon, 11 Nov 2019 00:21:05 +0000 (00:21 +0000)]
linprocfs: Make sure to report -1 as tty when we have no controlling tty.
When reporting a process' stats, we can't just provide the tty as an
unsigned long, as if we have no controlling tty, the tty would be NODEV, or
-1. Instaed, just special-case NODEV.
Submitted by: Juraj Lutter <otis@sk.FreeBSD.org>
MFC after: 1 week
jhibbits [Sun, 10 Nov 2019 22:08:07 +0000 (22:08 +0000)]
Consolidate powerpcspe CFLAGS
Don't depend on CPUTYPE to define powerpcspe CFLAGS, they should be set
unconditionally. This reduces duplication. Also, set some CFLAGS as
gcc-only, because clang's SPE support always uses the SPE ABI, it's not an
optional feature.
kib [Sun, 10 Nov 2019 09:28:18 +0000 (09:28 +0000)]
amd64: move common_tss into pcpu.
This saves some memory, around 256K I think. It removes some code,
e.g. KPTI does not need to specially map common_tss anymore. Also,
common_tss become domain-local.
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22231
alc [Sun, 10 Nov 2019 05:22:01 +0000 (05:22 +0000)]
Eliminate a redundant pmap_load() from pmap_remove_pages().
There is no reason why the pmap_invalidate_all() in pmap_remove_pages()
must be performed before the final PV list lock release. Move it past
the lock release.
Eliminate a stale comment from pmap_page_test_mappings(). We implemented
a modified bit in r350004.
mav [Sun, 10 Nov 2019 03:37:45 +0000 (03:37 +0000)]
Add compact scraptchpad protocol for ntb_transport(4).
Previously ntb_transport(4) required at least 6 scratchpad registers,
plus 2 more for each additional memory window. That is too much for some
configurations, where several drivers have to share resources of the same
NTB hardware. This patch introduces new compact version of the protocol,
requiring only 3 scratchpad registers, plus one more for each additional
memory window. The optimization is based on fact that neither of version,
number of windows or number of queue pairs really need more then one byte
each, and window sizes of 4GB are not very useful now. The new protocol
is activated automatically when the configuration is low on scratchpad
registers, or it can be activated explicitly with loader tunable.
mav [Sun, 10 Nov 2019 03:24:53 +0000 (03:24 +0000)]
Allow splitting PLX NTB BAR2 into several memory windows.
Address Lookup Table (A-LUT) being enabled allows to specify separate
translation for each 1/128th or 1/256th of the BAR2. Previously it was
used only to limit effective window size by blocking access through some
of A-LUT elements. This change allows A-LUT elements to also point
different memory locations, providing to upper layers several (up to 128)
independent memory windows. A-LUT hardware allows even more flexible
configurations than this, but NTB KPI have no way to manage that now.
kevans [Sun, 10 Nov 2019 03:06:03 +0000 (03:06 +0000)]
bcm2835_sdhci: don't panic in DMA interrupt if curcmd went away
This is an exceptional case; generally found during controller errors.
A panic when we attempt to acess slot->curcmd->data is less ideal than
warning, and other verbiage will be emitted to indicate the exact error.