mw [Thu, 30 May 2019 13:13:15 +0000 (13:13 +0000)]
Trigger reset in ENA if there are too many Rx descriptors
Whenever the driver will receive too many descriptors from the device,
it should trigger the device reset, as it is indicating that the device
is in invalid state.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
mw [Thu, 30 May 2019 13:12:14 +0000 (13:12 +0000)]
Remove RSS support in ENA
Receive Side Scaling is optional feature that could be enabled in kernel
configuration by defining flag RSS.
Kernel uses hash to store and find protocol control block which is
stored in hash tables.
Kernel and NIC hash functions must be consistent. Otherwise case lookup
fails.
To achieve this kernel provides API to set proper hash key to NIC.
As it is not possible to change key for virtual ENA NIC, this driver
cannot support RSS function.
ENA is designed to work in virtual environments so supporting hardware
version of this card is unnecessary.
mw [Thu, 30 May 2019 13:09:53 +0000 (13:09 +0000)]
Add notification AENQ handler for ENA
Notification AENQ handler is responsible for handling requests from ENA
device. Missing Tx threshold, Tx timeout and keep alive timeout can be
set using hints from the aenq descriptor which can be delivered in the
ENA admin notification.
The queue suspending and resuming tasks are not supported by the
driver.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
mw [Thu, 30 May 2019 13:08:00 +0000 (13:08 +0000)]
Print information when ENA admin error occurs
ENA_ADMIN_FATAL_ERROR and ENA_ADMIN_WARNING aenq groups were indicated
as supported, so the unimplemented_aenq_handler() will print out error
message, whenever an error will occur within the ENA admin context.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
jchandra [Thu, 30 May 2019 01:32:00 +0000 (01:32 +0000)]
gicv3_its: do LPI init only once per CPU
The initialization required for LPIs (setting up pending tables etc.)
has to be done just once per CPU, even in the case where there are
multiple ITS blocks associated with the CPU.
Add a flag lpi_enabled in the per-cpu distributor info for this and
use it to ensure that we call its_init_cpu_lpi() just once.
This enables us to support platforms where multiple GIC ITS blocks
can generate LPIs to a CPU.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D19844
jchandra [Thu, 30 May 2019 01:24:47 +0000 (01:24 +0000)]
gicv3_its: refactor LPI init into a new function
Move the per-cpu LPI intialization to a separate function. This is
in preparation for a commit that does LPI init only once for a CPU,
even when there are multiple ITS blocks associated with the CPU.
No functional changes in this commit.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D19843
jchandra [Thu, 30 May 2019 01:21:08 +0000 (01:21 +0000)]
gic_v3: consolidate per-cpu redistributor information
Update 'struct gic_redists' to consolidate all per-cpu redistributor
information into a new 'struct redist_pcpu'. Provide a new interface
(GICV3_IVAR_REDIST) for the GIC driver, which can be used to retrieve
the per-cpu data.
This per-cpu redistributor struct will be later used to improve the
GIC ITS setup.
While there, remove some unused fields in gic_v3_var.h interface.
No functional changes.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D19842
rpokala [Wed, 29 May 2019 23:50:31 +0000 (23:50 +0000)]
Add bits related to SANITIZE, SED, and form-factor to (struct ata_params)
Based on ATA-ACS-4, recognize several bit-fields related to the ATA SANITIZE
feature-set, Self-Encrypting Drives, and form-factor identification.
As part of this change, the name of word 48 of (struct ata_params) is being
changed. The previous name, "usedmovsd" does not appear to be related to the
previous definition of the word ("double-word IO supported"). The word was
defined that way in ATA-1 (1994), but it was marked "Reserved" (meaning
"unused, but might be used in the future") in ATA-2 (1996). It stayed that
way until ATA-8 (2008), which re-defined it as implemented in this change.
The field is not used in-tree.
erj [Wed, 29 May 2019 22:24:10 +0000 (22:24 +0000)]
iflib: provide probe wrapper for vendor drivers
From Jake:
Vendor drivers that exist out-of-tree generally should return
BUS_PROBE_VENDOR from their device probe functions. This helps ensure
that a vendor replacement driver will supersede the in-kernel driver for
a given device.
Currently, if a vendor wants to implement a driver based on iflib, it
will always report BUS_PROBE_DEFAULT.
Add a wrapper function, iflib_device_probe_vendor() which can be used in
place of iflib_device_probe(). This function will just return
BUS_PROBE_VENDOR whenever iflib_device_probe() would return
BUS_PROBE_DEFAULT.
While vendor drivers can already implement such a wrapper themselves,
providing it in the iflib.h header makes it easier for the vendor driver
to do the right thing.
allanjude [Wed, 29 May 2019 20:34:35 +0000 (20:34 +0000)]
Fix assertion in ZFS TRIM code
Due to an attempt to check two conditions at once in a macro not designed
as such, the assertion would always evaluate to true.
#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \
const TYPE __left = (TYPE)(LEFT); \
const TYPE __right = (TYPE)(RIGHT); \
if (!(__left OP __right)) \
assfail3(#LEFT " " #OP " " #RIGHT, \
(uintmax_t)__left, #OP, (uintmax_t)__right, \
__FILE__, __LINE__); \
_NOTE(CONSTCOND) } while (0)
#define ASSERT3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t)
Mean that we compared:
left = (type == ZIO_TYPE_FREE || psize)
OP = "<="
right = (SPA_MAXBLOCKSIZE)
If the type was not FREE, 0 is less than SPA_MAXBLOCKSIZE (16MB)
If the type is ZIO_TYPE_FREE, 1 is less than SPA_MAXBLOCKSIZE
The constraint on psize (physical size of the FREE operation) is never
checked against SPA_MAXBLOCKSIZE
Reported by: Ka Ho Ng <khng300@gmail.com>
Reviewed by: kevans
MFC after: 2 weeks
Sponsored by: Klara Systems
kib [Wed, 29 May 2019 14:05:27 +0000 (14:05 +0000)]
Do not go into sleep in sleepq_catch_signals() when SIGSTOP from
PT_ATTACH was consumed.
In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is
non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero
return code to EINTR.
Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was
delivered right after the thread was added to the sleepqueue but not
yet really sleep, and cursig() caused debugger attach, the thread
sleeps instead of returning to the userspace boundary with EINTR.
avg [Wed, 29 May 2019 09:08:20 +0000 (09:08 +0000)]
revert r273728 and parts of r306589, iicbus no-stop by default feature
Since drm2 removal, there has not been any consumer of the feature in the
tree. I am also unaware of any out-of-tree consumer.
More importantly, the feature has been broken from the very start, both
before and after r306589, because the ivar was set on a device that does
not support it and it was read from another device that also does not
support it.
A bus-wide no-stop flag cannot be implemented as an ivar as iicbus
attaches as a child of various drivers. Implementing the ivar in each
and every I2C driver is just impractical.
If we ever want to implement this feature properly, then probably the
easiest way to do it would be via a flag in the softc of iicbus.
In fact, we might have to do that in the stable branches if we want to
fix the code for them.
Reported by: ian (long time ago)
MFC after: 1 month (maybe)
X-MFC-note: cannot just merge the change, must keep drm2 happy
jhibbits [Wed, 29 May 2019 02:02:56 +0000 (02:02 +0000)]
Add missing powerpc64 relocation support to libdwarf
Summary:
Due to missing relocation support in libdwarf for powerpc64, handling of dwarf
info on unlinked objects was bogus.
Examining raw dwarf data on objects compiled on ppc64 with a modern compiler
(in-tree gcc tends to hide the issue, since it only rarely generates relocations
in .debug_info and uses DW_FORM_str instead of DW_FORM_strp for everything), you
will find that the dwarf data appears corrupt, with repeated references to the
compiler version where things like types and function names should appear.
This happens because the 0 offset of .debug_str contains the compiler version,
and without applying the relocations, *all* indirect strings in .dwarf_info will
end up pointing to it.
This corruption then propogates to the CTF data, as ctfconvert relies on
libdwarf to read the dwarf info, for every compiled object (when building a
kernel.)
However, if you examine the dwarf data on a compiled executable, it will appear
correct, because during final link the relocations get applied and baked in by
the linker.
kevans [Wed, 29 May 2019 01:08:30 +0000 (01:08 +0000)]
if_bridge(4): Complete bpf auditing of local traffic over the bridge
There were two remaining "gaps" in auditing local bridge traffic with
bpf(4):
Locally originated outbound traffic from a member interface is invisible to
the bridge's bpf(4) interface. Inbound traffic locally destined to a member
interface is invisible to the member's bpf(4) interface -- this traffic has
no chance after bridge_input to otherwise pass it over, and it wasn't
originally received on this interface.
I call these "gaps" because they don't affect conventional bridge setups.
Alas, being able to establish an audit trail of all locally destined traffic
for setups that can function like this is useful in some scenarios.
johalun [Tue, 28 May 2019 20:54:59 +0000 (20:54 +0000)]
pseudofs: Ignore unsupported commands in vop_setattr.
Users of pseudofs (e.g. lindebugfs), should be able to receive
input from command line via commands like "echo 1 > /path/to/file".
Currently this fails because sh tries to truncate the file first and
vop_setattr returns not supported error for this. This patch simply
ignores the error and returns 0 instead.
mav [Tue, 28 May 2019 18:32:04 +0000 (18:32 +0000)]
Fix array out of bound panic introduced in r306219.
As I see, different NICs in different configurations may have different
numbers of TX and RX queues. The code was assuming 1:1 mapping between
event queues (interrupts) and TX/RX queues. Since number of interrupts
is set to maximum of TX and RX queues, when those two are different, the
system is doomed.
I have no documentation or deep knowledge about this hardware, so this
change is based on general observations and code reading. If some of my
guesses are wrong, please do better. I just confirmed HP NC550SFP NICs
are working now.
kevans [Tue, 28 May 2019 16:12:16 +0000 (16:12 +0000)]
bectl(8): Address Coverity complaints
CID 1400451: case 0 is missing a break/return and falling through to the
default case. waitpid(0, ...) makes little sense in the child, we likely
wanted to terminate immediately.
CID 1400453: size argument uses sizeof(char **) instead of sizeof(char *)
and is assigned to a char **; sizeof's match but "this isn't a portable
assumption".
dougm [Tue, 28 May 2019 15:47:00 +0000 (15:47 +0000)]
Implement the ffs and fls functions, and their longer counterparts, in
cpufunc, in terms of __builtin_ffs and the like, for arm32 v6 and v7
architectures, and use those, rather than the simple libkern
implementations, in building arm32 kernels.
Reviewed by: manu
Approved by: kib, markj (mentors)
Tested by: iz-rpi03_hs-karlsruhe.de, mikael.urankar_gmail.com, ian
Differential Revision: https://reviews.freebsd.org/D20412
ae [Tue, 28 May 2019 11:45:00 +0000 (11:45 +0000)]
Rework r348303 to reduce the time of holding global BPF lock.
It appeared that using NET_EPOCH_WAIT() while holding global BPF lock
can lead to another panic:
spin lock 0xfffff800183c9840 (turnstile lock) held by 0xfffff80018e2c5a0 (tid 100325) too long
panic: spin lock held too long
...
#0 sched_switch (td=0xfffff80018e2c5a0, newtd=0xfffff8000389e000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2133
#1 0xffffffff80bf9912 in mi_switch (flags=256, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:439
#2 0xffffffff80c21db7 in sched_bind (td=<optimized out>, cpu=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2704
#3 0xffffffff80c34c33 in epoch_block_handler_preempt (global=<optimized out>, cr=0xfffffe00005a1a00, arg=<optimized out>)
at /usr/src/sys/kern/subr_epoch.c:394
#4 0xffffffff803c741b in epoch_block (global=<optimized out>, cr=<optimized out>, cb=<optimized out>, ct=<optimized out>)
at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#5 ck_epoch_synchronize_wait (global=0xfffff8000380cd80, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#6 0xffffffff80c3475e in epoch_wait_preempt (epoch=0xfffff8000380cd80) at /usr/src/sys/kern/subr_epoch.c:513
#7 0xffffffff80ce970b in bpf_detachd_locked (d=0xfffff801d309cc00, detached_ifp=<optimized out>) at /usr/src/sys/net/bpf.c:856
#8 0xffffffff80ced166 in bpf_detachd (d=<optimized out>) at /usr/src/sys/net/bpf.c:836
#9 bpf_dtor (data=0xfffff801d309cc00) at /usr/src/sys/net/bpf.c:914
To fix this add the check to the catchpacket() that BPF descriptor was
not detached just before we acquired BPFD_LOCK().
andrew [Tue, 28 May 2019 10:55:59 +0000 (10:55 +0000)]
The alignment is passed into contigmalloc_domainset in the 7th argument.
KUBSAN was complaining the pointer contigmalloc_domainset returned was
misaligned. Fix this by using the correct argument to find the alignment
in the function signature.
cem [Mon, 27 May 2019 17:33:20 +0000 (17:33 +0000)]
kldxref(8): Sort MDT_MODULE info first in linker.hints output
MDT_MODULE info is required to be ordered before any other MDT metadata for
a given kld because it serves as an implicit record boundary between
distinct klds for linker.hints consumers. kldxref(8) has previously relied
on the assumption that MDT_MODULE was ordered relative to other module
metadata in kld objects by source code ordering.
However, C does not require implementations to emit file scope objects in
any particular order, and it seems that GCC 6.4.0 and/or binutils 2.32 ld
may reorder emitted objects with respect to source code ordering.
So: just take two passes over a given .ko's module metadata, scanning for
the MDT_MODULE on the first pass and the other metadata on subsequent
passes. It's not super expensive and not exactly a performance-critical
piece of code. This ensures MDT_MODULE is always ordered before
MDT_PNP_INFO and other MDTs, regardless of compiler/linker movement. As a
fringe benefit, it removes the requirement that care be taken to always
order MODULE_PNP_INFO after DRIVER_MODULE in source code.
kib [Mon, 27 May 2019 15:21:26 +0000 (15:21 +0000)]
Correct some inconsistencies in the earliest created kernel page
tables which affect demotion.
The last last-level page table under 2M mappings below KERNend was
only partially initialized. When that page was used as the hardware
page table for demotion of the 2M mapping, the result was not
consistent. Since pmap_demote_pde() is switched to use PG_PROMOTED as
the test for the validity of the saved last level page table page, we
can keep page table pages zero-initialized instead. Demotion would
fill them as needed.
Only map the created page tables beyond KERNend, there is no need to
pre-promote PTmap after KERNend, because the extra mapping is not used.
Only round up *firstaddr to 2M boundary when it is below rounded
KERNend. Sometimes the allocpages() calls advance *firstaddr past the
end of the last 2MB page mapping. In that case, this conditional
avoids wasting an average of 1MB of physical memory.
Update comments to explain action in more clean and direct language.
Reported and tested by: pho
In collaboration with: alc
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20380
ae [Mon, 27 May 2019 12:41:41 +0000 (12:41 +0000)]
Fix possible NULL pointer dereference.
bpf_mtap() can invoke catchpacket() for already detached descriptor.
And this can lead to NULL pointer dereference, since bd_bif pointer
was reset to NULL in bpf_detachd_locked(). To avoid this, use
NET_EPOCH_WAIT() when descriptor is removed from interface's descriptors
list. After the wait it is safe to modify descriptor's content.
mckusick [Mon, 27 May 2019 06:22:43 +0000 (06:22 +0000)]
Add function name and line number debugging information to softupdates
worklist structures to help track their movement between work lists.
No functional change to the operation of soft updates intended.
jhibbits [Mon, 27 May 2019 03:18:56 +0000 (03:18 +0000)]
powerpc/dtrace: Fix fbt function probing for ELFv2
'.' function names exist only in ELFv1. ELFv2 does away with function
descriptors, and look more like they do on powerpc(32) and most other
platforms, as direct function pointers. Stop blacklisting regular function
names in ELFv2.
jchandra [Sun, 26 May 2019 23:04:21 +0000 (23:04 +0000)]
arm64 nexus: remove incorrect warning
acpi_config_intr() will be called when an arm64 system booted with ACPI.
We do the interrupt mapping for ACPI interrupts in nexus_acpi_map_intr()
on arm64, so acpi_config_intr() has to just return success without
printing this error message.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D19432
tuexen [Sun, 26 May 2019 17:18:14 +0000 (17:18 +0000)]
When an ACK segment as the third message of the three way handshake is
received and support for time stamps was negotiated in the SYN/SYNACK
exchange, perform the PAWS check and only expand the syn cache entry if
the check is passed.
Without this check, endpoints may get stuck on the incomplete queue.
Reviewed by: jtl@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D20374
dim [Sun, 26 May 2019 15:44:58 +0000 (15:44 +0000)]
Pull in r361696 from upstream llvm trunk (by Sanjay Patel):
[SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not be
so aggressive with an assert here. There's no telling what
out-of-tree targets look like.
This fixes an assertion when building the graphics/mesa-dri port for
PowerPC64.
Reported by: Mark Millard <marklmi26-fbsd@yahoo.com>
PR: 238082
MFC after: 3 days
sef [Sat, 25 May 2019 07:26:30 +0000 (07:26 +0000)]
Add an AESNI-optimized version of the CCM/CBC cryptographic and authentication
code. The primary client of this is probably going to be ZFS encryption.
cem [Sat, 25 May 2019 01:59:24 +0000 (01:59 +0000)]
virtio_pci(4): Fix typo in read_ivar method
Prior to this revision, vtpci's BUS_READ_IVAR method on VIRTIO_IVAR_SUBVENDOR
accidentally returned the PCI subdevice.
The typo seems to have been introduced with the original commit adding
VIRTIO_IVAR_{{SUB,}DEVICE,{SUB,}VENDOR} to virtio_pci. The commit log and code
strongly suggest that the ivar was intended to return the subvendor rather than
the subdevice; it was likely just a copy/paste mistake.
mckusick [Sat, 25 May 2019 00:07:49 +0000 (00:07 +0000)]
When using the destroy option to shut down a nop GEOM module, I/O
operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the module needs
to wait for the draining to happen. The waiting is done by adding
a g_nop_providergone() function to wait for the I/O operations to
finish up. This change is similar to change -r345758 made to the
memory-disk driver.
kib [Fri, 24 May 2019 23:28:11 +0000 (23:28 +0000)]
Fix too loose assert in pmap_large_unmap().
The upper bound for the valid address from the large map used
LARGEMAP_MAX_ADDRESS instead of LARGEMAP_MIN_ADDRESS. Provide a
function-like macro for proper upper value.
Noted by: markj
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20386
cem [Fri, 24 May 2019 22:33:14 +0000 (22:33 +0000)]
Disable intr_storm_threshold mechanism by default
The ixl.4 manual page has documented that the threshold falsely detects
interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen
similar false positives with the ioat(4) DMA device (which can push GB/s).
For example, synthetic load can be generated with tools/tools/ioat
'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an
interrupt for each one, and do this for 1000 milliseconds). With
storm-detection disabled, the Broadwell-EP version of this device is capable
of generating ~350k real interrupts per second.
The following historical context comes from jhb@: Originally, the threshold
worked around incorrect routing of PCI INTx interrupts on single-CPU systems
which would end up in a hard hang during boot. Since the threshold was
added, our PCI interrupt routing was improved, most PCI interrupts use
edge-triggered MSI instead of level-triggered INTx, and typical systems have
multiple CPUs available to service interrupts.
On the off chance that the threshold is useful in the future, it remains
available as a tunable and sysctl.
jhb [Fri, 24 May 2019 22:30:40 +0000 (22:30 +0000)]
Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.
To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.
- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.
- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.
- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
asomers [Fri, 24 May 2019 20:27:50 +0000 (20:27 +0000)]
Remove "struct ucred*" argument from vtruncbuf
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it. Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20377
luporl [Fri, 24 May 2019 18:41:31 +0000 (18:41 +0000)]
Make options MD_ROOT_MEM default on PPC64
Having this option enabled by default on PowerPC64 kernels makes
booting ISO images much easier when on PowerNV.
With it, the ISO may simply be given to the -i flag of kexec.
Better yet, the ISO may be loop mounted on PetitBoot and its
kernel may be used to load itself.
Without this option, booting ISOs on remote PPC64 machines usually
involve preparing a separate kernel, with this option enabled.
ken [Fri, 24 May 2019 17:58:29 +0000 (17:58 +0000)]
Fix FC-Tape bugs caused in part by r345008.
The point of r345008 was to reset the Command Reference Number (CRN)
in some situations where a device stayed in the topology, but had
changed somehow.
This can include moving from a switch connection to a direct
connection or vice versa, or a device that temporarily goes away
and comes back. (e.g. moving to a different switch port)
There were a couple of bugs in that change:
- We were reporting that a device had not changed whenever the
Establish Image Pair bit was not set. That is not quite correct.
Instead, if the Establish Image Pair bit stays the same (set or
not), the device hasn't changed in that way.
- We weren't setting PRLI Word0 in the port database when a new
device arrived, so comparisons with the old value for the
Establish Image Pair bit weren't really possible. So, make sure
PRLI Word0 is set in the port database for new devices.
- We were resetting the CRN whenever the Establish Image Pair bit
was set for a device, even when the device had stayed the same
and the value of the bit hadn't changed. Now, only reset the
CRN for devices that have changed, not devices that sayed the
same.
The result of all of this was that if we had a single FC device on
an FC port and it went away and came back, we would wind up
correctly resetting the CRN.
But, if we had multiple devices connected via a switch, and there
was any change in one or more of those devices, all of the devices
that stayed the same would also have their CRN values reset.
The result, from a user standpoint, is that the tape drives, etc.
would all start to time out commands and the initiator would send
aborts.
sys/dev/isp/isp.c:
In isp_pdb_add_update(), look at whether the Establish
Image Pair bit has changed as part of the check to
determine whether a device is still the same. This was
causing erroneous change notifications. Also, when
creating a new port database entry, initialize the
PRLI Word 0 values.
sys/dev/isp/isp_freebsd.c:
In isp_async(), in the changed/stayed case, instead of
looking at the Establish Image Pair bit to determine
whether to reset the CRN, look at the command value.
(Changed vs. Stayed.) Only reset the CRN for devices
that have changed.
kib [Fri, 24 May 2019 17:19:06 +0000 (17:19 +0000)]
Fix a corner case in demotion of kernel mappings.
It is possible for the kernel mapping to be created with superpage by
directly installing pde using pmap_enter_2mpage() without filling the
corresponding page table page. This can happen e.g. if the range is
already backed by reservation and vm_fault_soft_fast() conditions are
satisfied, which was observed on the pipe_map.
In this case, demotion must fill the page obtained from the pmap
radix, same as if the page is newly allocated. Use PG_PROMOTED bit as
an indicator that the page is valid, instead of the wire count of the
page table page.
Since the PG_PROMOTED bit is set on pde when we leave TLB entries for
4k pages around, which in particular means that the ptes were filled,
it provides more correct indicator. Note that pmap_protect_pde()
clears PG_PROMOTED, which handles the case when protection was changed
on the superpage without adjusting ptes.
Reported by: pho
In collaboration with: alc
Tested by: alc, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20380
markj [Fri, 24 May 2019 15:45:43 +0000 (15:45 +0000)]
Modernize the MAKE_JUST_KERNELS hint in the top-level makefile.
It doesn't make sense to limit to -j12 anymore, build scalability
is better than it used to be. Fold the hint into the description
of the universe target.
ae [Fri, 24 May 2019 11:06:24 +0000 (11:06 +0000)]
Add `missing` and `or-flush` options to "ipfw table <NAME> create"
command to simplify firewall reloading.
The `missing` option suppresses EEXIST error code, but does check that
existing table has the same parameters as new one. The `or-flush` option
implies `missing` option and additionally does flush for table if it
is already exist.
Submitted by: lev
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18339
delphij [Fri, 24 May 2019 02:44:15 +0000 (02:44 +0000)]
cryptodeflate: Drop z_stream zbuf.state->dummy from SDT probe.
For older versions of zlib, dummy was a workaround for compilers that do not
handle opaque type definition well; on FreeBSD, it's representing a value
that is not really useful for monitoring purposes, and the field would be gone
in newer zlib versions.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20222
kevans [Fri, 24 May 2019 01:53:45 +0000 (01:53 +0000)]
bectl(8): Add a test for jail/unjail of numeric BE names
Fixed by r348215, bectl ujail first attempts the trivial fetch of a jid by
passing the first argument to 'ujail' to jail_getid(3) in case a jid/name
have been passed in instead of a BE name. For numerically named BEs, this
was doing the wrong thing: instead of failing to locate the jid specified
and falling back to mountpath search, jail_getid(3) would return the input
as-is.
While here, I've fixed bectl_jail_cleanup which still used a hard-coded pool
name that was overlooked w.r.t. other work that was in-flight around the
same time.