Alan Somers [Thu, 26 Nov 2020 23:35:13 +0000 (23:35 +0000)]
MFC r365956:
fsx: fix build with WARNS=6
* signed/unsigned comparisons
* use standard warn(3)
* Suppress warnings about local vars and funcs not declared static
* const-correctness
* declaration shadows a variable in the global scope
Alan Somers [Thu, 26 Nov 2020 23:34:02 +0000 (23:34 +0000)]
MFC r365910:
fix integer underflow in getgrnam_r and getpwnam_r
Sometimes nscd(8) will return a 1-byte buffer for a nonexistent entry. This
triggered an integer underflow in grp_unmarshal_func, causing getgrnam_r to
return ERANGE instead of 0.
Fix the user's buffer size check, and add a correct check for a too-small
nscd buffer.
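To illustrate this class of bug, here is a minimal sketch; it is not the actual libc code, and HDR_SIZE, check_unsafe() and check_safe() are hypothetical names. Subtracting a fixed overhead from an unsigned buffer length wraps around to a huge value when the buffer is smaller than the overhead, so the size check passes when it should fail:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed overhead that must fit in the buffer. */
#define HDR_SIZE 16

/*
 * Buggy pattern: buflen - HDR_SIZE wraps around to a very large value
 * when buflen < HDR_SIZE, so the check succeeds for a 1-byte buffer.
 */
bool
check_unsafe(size_t buflen, size_t needed)
{
	return (needed <= buflen - HDR_SIZE);
}

/*
 * Safe pattern: verify that the buffer can hold the overhead before
 * subtracting from the unsigned length.
 */
bool
check_safe(size_t buflen, size_t needed)
{
	return (buflen >= HDR_SIZE && needed <= buflen - HDR_SIZE);
}
```

With a 1-byte buffer, check_unsafe() accepts any request while check_safe() rejects it.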
Marcin Wojtas [Wed, 25 Nov 2020 03:24:43 +0000 (03:24 +0000)]
MFC: Merge ENA v2.3.0 driver
r367805 Update ENA driver version to v2.3.0
r367803 Rename descriptions of the supported ENA devices
r367802 Add ENI metrics for the ENA driver
r367801 Add SPDX license tag to the ENA driver files
r367800 Add Rx offsets support for the ENA driver
r367799 Adjust ENA driver files to latest ena-com changes
r367795 Fix completion descriptors alignment for the ENA
Alexander Motin [Tue, 24 Nov 2020 13:17:12 +0000 (13:17 +0000)]
MFC r367044: Introduce support of SCSI Command Priority.
The SAM-3 specification introduced the concept of Task Priority, which was
renamed to Command Priority in SAM-4 and is supported by all modern SCSI
transports.
It provides 15 levels of relative priorities: 1 - highest, 15 - lowest and
0 - default. SAT specification for SATA devices translates priorities 1-3
into NCQ high priority.
This change adds new "priority" field into empty spots of struct ccb_scsiio
and struct ccb_accept_tio of CAM and struct ctl_scsiio of CTL. Respective
support is added into iscsi(4), isp(4), mpr(4), mps(4) and ocs_fc(4) drivers
for both initiator and where applicable target roles. Minimal support was
added to CTL to receive the priority value from different frontends, pass it
between HA controllers and report it in a few places.
This patch does not add consumers of this functionality, so nothing should
really change yet, since the field is still set to 0 (default) on initiator
and not actively used on target. Those are to be implemented separately.
I've confirmed priority working on WD Red SATA disks connected via mpr(4)
and properly transferred to CTL target via iscsi(4), isp(4) and ocs_fc(4).
While there, added missing tag_action support to ocs_fc(4) initiator role.
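The priority levels described above can be sketched as follows. This is an illustration only: prio_is_valid() and prio_is_ncq_high() are hypothetical helpers, not part of CAM or CTL, and they encode only what the message states (0 is default, 1 is highest, 15 is lowest, and SAT maps 1-3 to NCQ high priority).

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_DEFAULT	0	/* no explicit priority requested */
#define PRIO_HIGHEST	1
#define PRIO_LOWEST	15

/* Hypothetical helper: is this a legal Command Priority value? */
bool
prio_is_valid(uint8_t prio)
{
	return (prio <= PRIO_LOWEST);
}

/*
 * Hypothetical helper: would a SAT layer translate this priority
 * into NCQ high priority?  Per the text, only levels 1-3 qualify.
 */
bool
prio_is_ncq_high(uint8_t prio)
{
	return (prio >= PRIO_HIGHEST && prio <= 3);
}
```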
Navdeep Parhar [Tue, 24 Nov 2020 00:17:05 +0000 (00:17 +0000)]
MFC r366929 and r367608.
r366929:
cxgbe(4): fix the size of the iq/eq maps.
The firmware can allocate ingress and egress context ids anywhere from
its configured range. Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.
Reported by: Baptiste Wicht @ Verisign
r367608:
cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs.
Navdeep Parhar [Mon, 23 Nov 2020 23:53:21 +0000 (23:53 +0000)]
MFC r366532 and r366862.
r366532:
cxgbe(4): knobs to drop various kinds of undesirable frames on ingress.
These kinds of drops come for free in the sense that they do not use the
filter TCAM or any other resource that wouldn't normally be used during
rx. Frames dropped by the hardware get counted in the MAC's rx stats
but are not delivered to the driver.
hw.cxgbe.attack_filter
Set to 1 to enable the "attack filter". Default is 0. The attack
filter will drop an incoming frame if any of these conditions is true:
src ip/ip6 == dst ip/ip6; tcp and src/dst ip is not unicast; src/dst ip
is loopback (127.x.y.z); src ip6 is not unicast; src/dst ip6 is loopback
(::1/128) or unspecified (::/128); tcp and src/dst ip6 is mcast
(ff00::/8).
hw.cxgbe.drop_ip_fragments
Set to 1 to drop all incoming IP fragments. Default is 0. Note that
this drops valid frames.
hw.cxgbe.drop_pkts_with_l2_errors
Set to 1 to drop incoming frames with Layer 2 length or checksum errors.
Default is 1.
hw.cxgbe.drop_pkts_with_l3_errors
Set to 1 to drop incoming frames with IP version, length, or checksum
errors. Default is 0.
hw.cxgbe.drop_pkts_with_l4_errors
Set to 1 to drop incoming frames with Layer 4 length, checksum, or other
errors. Default is 0.
r366862:
cxgbe(4): Updates to the drop features from r366532.
Navdeep Parhar [Mon, 23 Nov 2020 23:46:07 +0000 (23:46 +0000)]
MFC r365732 and r366589.
r365732:
cxgbe(4): Get the count of FCS errors from the MAC and not MPS for T6 ports.
The MPS register on the T6 counts something other than FCS errors despite its
name.
r366589:
cxgbe(4): More fixes for the T6 FCS error counter.
r365732 was the first attempt to get an accurate count but it was
writing to some read-only registers to clear them and that obviously
didn't work. Instead, note the counter's value when it is supposed to
be cleared and subtract it from future readings.
dev.<port>.stats.rx_fcs_error should not be serviced from the MPS
register for T6.
The stats.* sysctls should all use T5_PORT_REG for T5 and above. This
must have been missed in the initial T5 support years ago. Fix it while
here.
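The "note the value and subtract it later" approach from r366589 can be sketched like this. The struct and function names are hypothetical and no real cxgbe(4) registers are touched; the raw reading is passed in by the caller.

```c
#include <stdint.h>

/* Hypothetical software view of a hardware counter that cannot be reset. */
struct fcs_counter {
	uint64_t base;	/* raw value noted at the last "clear" */
};

/* "Clear" the counter by remembering the current raw reading. */
void
fcs_counter_clear(struct fcs_counter *c, uint64_t raw)
{
	c->base = raw;
}

/* Report the number of errors seen since the last clear. */
uint64_t
fcs_counter_read(const struct fcs_counter *c, uint64_t raw)
{
	return (raw - c->base);
}
```

The hardware register is only ever read, so it does not matter that it is read-only.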
MFC r367594:
Fix possible NULL pointer dereference.
lagg(4) replaces if_output method of its child interfaces and expects
that this method can be called only by child interfaces. But it is
possible that lagg_port_output() could be called by children of child
interfaces. In this case ifnet's if_lagg field is NULL. Add check that
lp is not NULL.
Perhaps it made sense in 1998 (r32836), but now it feels a bit out of
place. We tend to avoid documenting non-essential ports variables in
the manual page (we try to document them in the Porter's Handbook instead).
Alexander Motin [Wed, 18 Nov 2020 02:05:59 +0000 (02:05 +0000)]
MFC r367600: Make CTL nicer to increased MAXPHYS.
Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O,
that is even more overkill for MAXPHYS of 1MB. This change limits maximum
allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds
new 128KB UMA zone for smaller I/Os. The patch factors out alloc/free,
so later we could make it use more zones or malloc() if we'd like.
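One possible reading of the sizing policy above, as a sketch: the function names are hypothetical, UMA zones are not modelled, and the interpretation (cap at 512KB, use a 128KB class for small I/O only when the cap exceeds 128KB) is an assumption drawn from the message, not from the CTL source.

```c
#include <stddef.h>

#define KB(x)	((size_t)(x) * 1024)

/* Largest buffer ever allocated: MAXPHYS, capped at 512KB. */
size_t
sketch_max_buf(size_t maxphys)
{
	return (maxphys > KB(512) ? KB(512) : maxphys);
}

/*
 * Pick the buffer size class for an I/O of io_len bytes.  When the
 * maximum is above 128KB, small I/Os get a 128KB buffer instead.
 */
size_t
sketch_pick_buf(size_t io_len, size_t maxphys)
{
	size_t max = sketch_max_buf(maxphys);

	if (max > KB(128) && io_len <= KB(128))
		return (KB(128));
	return (max);
}
```

Under these assumptions, a 4KB I/O with a 1MB MAXPHYS gets a 128KB buffer rather than a 1MB one.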
Kirk McKusick [Tue, 17 Nov 2020 05:48:00 +0000 (05:48 +0000)]
MFC of 340927 and 367034.
Move clear of UFS feature flags from ufs_mountfs() to ffs_sbget() to
ensure that the appropriate feature flags get cleared by filesystem
utilities as well as the kernel when they modify the filesystem.
Note 340927 is relevant for this even though it was done for a
different reason at the time.
An earlier commit effectively turned off the fast forwarding path
due to its lack of support for ICMP redirects. The following commit
adds redirects to the fastforward path, again allowing for decent
forwarding performance in the kernel.
Reviewed by: ae, melifaro (also helped with the MFC)
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
It is rather common for the ports users to replace su(1) with sudo(8)
within the SU_CMD variable. Let's document it in the manual page (so far
it's been hidden in a comment within bsd.commands.mk).
Kristof Provost [Sun, 15 Nov 2020 11:56:16 +0000 (11:56 +0000)]
MFC r366500:
bridge: call member interface ioctl() without NET_EPOCH
We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl()
handlers for member interfaces we cannot be in NET_EPOCH. We still need some
protection of our CK_LISTs, so hold BRIDGE_LOCK instead.
That requires changing BRIDGE_LOCK into a sleepable lock, and separating the
BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data
path (while in NET_EPOCH), so it cannot be a sleepable lock.
* The Rust compiler produces SHF_ALLOC `.debug_gdb_scripts` (which
normally does not have the flag)
* `.debug_gdb_scripts` sections are removed from `inputSections` due
to --strip-debug/--strip-all
* When processing --gc-sections, pieces of a SHF_MERGE section can be
marked live separately
`=>` segfault when marking liveness of a `.debug_gdb_scripts` which
is not split into pieces (because it is not in `inputSections`)
This patch circumvents the problem by not treating SHF_ALLOC
".debug*" as debug sections (to prevent --strip-debug's stripping)
(which is still useful on its own).
Kyle Evans [Sat, 14 Nov 2020 15:33:39 +0000 (15:33 +0000)]
MFC r367604: umtx: drop incorrect timespec32 definition
This works for amd64, but none others -- drop it, because we already have a
proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses
time32_t.
While toying around with lua bindings for libbe(3), I discovered that I
apparently never documented this, despite having documented
be_is_auto_snapshot_name that references it.
r366821:
libbe(3): install MLINKS for all of the functions provided
Kyle Evans [Sat, 14 Nov 2020 15:19:36 +0000 (15:19 +0000)]
MFC r366769: MFC r366760: lua: update to 5.3.6
This release contains some minor bugfixes; notably:
- 2x minor Makefile fixes (not used in base)
- Long brackets with a huge number of '=' overflow some internal buffer
arithmetic.
- Joining an upvalue with itself can cause a use-after-free crash.
See here for examples: http://www.lua.org/bugs.html#5.3.5
r367239:
Add plug and play information macros for ACPI and I2C buses.
Matching table format is compatible with ACPI_ID_PROBE bus method.
Note that while ACPI_ID_PROBE matches against _HID and all _CIDs, current
acpi_pnpinfo_str() exports only _HID and first _CID. That means second
and further _CIDs should be added to both acpi_pnpinfo_str() and
ACPICOMPAT_PNP_INFO if device matching against them is required.
Kyle Evans [Sat, 14 Nov 2020 02:11:04 +0000 (02:11 +0000)]
MFC r366435: lualoader: improve the design of the brand-/logo- mechanism
In the previous world order, any brand/logo was forced to pull in the
drawer and call drawer.add{Brand,Logo} with the name their brand/logo is
taking and a table describing it.
In the new world order, these files just need to return a table that maps
out graphics types to a table of the exact same format as what was
previously being passed back into the drawer. The appeal here is not needing
to grab a reference back to the drawer module and having a cleaner
data-driven looking format for these. The format has been renamed to 'gfx-*'
prefixes and each one can provide a logo and a brand.
drawer.addBrand/drawer.addLogo will remain in place until FreeBSD 13, as
there's no overhead to them and it's not yet worth the break in
compatibility with any pre-existing brands and logos.
Kyle Evans [Sat, 14 Nov 2020 02:00:50 +0000 (02:00 +0000)]
MFC r366430: ngctl: add -c (compact output) for the dot command
The output of "ngctl dot" is suitable for small netgraph networks. Even
moderately complex netgraph setups (about a dozen nodes) are hard to
understand from the .dot output, because each node and each hook are shown
as a full blown structure.
This patch allows generating much more compact output and graphs by
omitting the extra structures for the individual hooks. Instead the names of
the hooks are labels to the edges.
Kyle Evans [Sat, 14 Nov 2020 01:58:33 +0000 (01:58 +0000)]
MFC r367448: vt: resolve conflict between VT_ALT_TO_ESC_HACK and DBG
When using the ALT+CTRL+ESC sequence to break into kdb, the keyboard is
completely borked when you return. watch(8) shows that it's working, but
it's inserting escape sequences.
Further investigation revealed that VT_ALT_TO_ESC_HACK is the default and
directly conflicts with this sequence, so upon return from the debugger
ALKED is set.
If they triggered the break to debugger, it's safe to assume they didn't
mean to use VT_ALT_TO_ESC_HACK, so just unset it to reduce the surprise when
the keyboard seems non-functional upon return.
Kyle Evans [Sat, 14 Nov 2020 01:55:54 +0000 (01:55 +0000)]
MFC r367440: epoch: support non-preemptible epochs checking in_epoch()
Previously, non-preemptible epochs could not check; in_epoch() would always
fail, usually because non-preemptible epochs don't imply THREAD_NO_SLEEPING.
For default epochs, it's easy enough to verify that we're in the given
epoch: if we're in a critical section and our record for the given epoch
is active, then we're in it.
This patch also adds some additional INVARIANTS bookkeeping. Notably, we set
and check the recorded thread in epoch_enter/epoch_exit to try and catch
some edge-cases for the caller. It also checks upon freeing that none of the
records had a thread in the epoch, which may make it a little easier to
diagnose some improper use if epoch_free() took place while some other
thread was inside.
This version differs slightly from what was just previously reviewed by the
below-listed, in that in_epoch() will assert that no CPU has this thread
recorded even if it *is* currently in a critical section. This is intended
to catch cases where the caller might have somehow messed up critical
section nesting, we can catch both if they exited the critical section or if
they exited, migrated, then re-entered (on the wrong CPU).
- Removed a bunch of redundant headers
- Don't explicitly initialize to 0
- The !error check prior to setting imgp->interpreter_name is redundant, all
error paths should and do return or go to 'done'. We have larger problems
otherwise.
r367439:
imgact_binmisc: minor re-organization of imgact_binmisc_exec exits
Notably, streamline error paths through the existing 'done' label, making it
easier to quickly verify correct cleanup.
Future work might add a kernel-only flag to indicate that an interpreter uses
#a. Currently, all executions via imgact_binmisc pay the penalty of
constructing sname/fname, even if they will not use it. qemu-user-static
doesn't need it, the stock rc script for qemu-user-static certainly doesn't
use it, and I suspect these are the vast majority of (if not the only)
current users.
r367441:
binmiscctl(8): miscellaneous cleanup
- Bad whitespace in Makefile.
- Reordered headers, sys/ first.
- Annotated fatal/usage __dead2 to help `make analyze` out a little bit.
- Spell a couple of sizeof constructs as "nitems" and "howmany" instead.
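For readers unfamiliar with the two macros named in the last item, a small sketch follows. On FreeBSD both come from <sys/param.h>; they are defined here under guards so the example builds elsewhere. The names[] array and the two functions are hypothetical and are not taken from binmiscctl(8).

```c
#include <stddef.h>

#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif
#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif

/* Hypothetical table, used only to demonstrate nitems(). */
static const char *const names[] = { "add", "remove", "list" };

/* Number of elements in names[], instead of an open-coded sizeof division. */
size_t
name_count(void)
{
	return (nitems(names));
}

/* Number of pagesz-sized chunks needed to hold bytes, rounding up. */
size_t
chunks_needed(size_t bytes, size_t pagesz)
{
	return (howmany(bytes, pagesz));
}
```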
r367442:
imgact_binmisc: validate flags coming from userland
We may want to reserve bits in the future for kernel-only use, so start
rejecting any that aren't the two that we're currently expecting from
userland.
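The usual shape of such a check, as a sketch: the flag names and values below are hypothetical placeholders, not the real imgact_binmisc definitions. Any bit outside the mask of flags userland is allowed to set causes the request to be rejected.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits; the real names and values are in the kernel. */
#define XFLAG_ENABLED	0x01u
#define XFLAG_USE_MASK	0x02u
#define XFLAG_USER_OK	(XFLAG_ENABLED | XFLAG_USE_MASK)

/* Return 0 if only userland-settable bits are present, else EINVAL. */
int
validate_user_flags(uint32_t flags)
{
	if ((flags & ~XFLAG_USER_OK) != 0)
		return (EINVAL);
	return (0);
}
```

Rejecting unknown bits now keeps them free to be given a kernel-only meaning later.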
r367444:
imgact_binmisc: abstract away the list lock (NFC)
This module handles relatively few execs (initial qemu-user-static, then
qemu-user-static handles exec'ing itself for binaries it's already running),
but all execs pay the price of at least taking the relatively expensive
sx/slock to check for a match when this module is loaded. Future work will
almost certainly swap this out for another lock, perhaps an rmslock.
The RLOCK/WLOCK phrasing was chosen based on what the callers are really
wanting, rather than using the verbiage typically appropriate for an sx.
r367452:
imgact_binmisc: reorder members of struct imgact_binmisc_entry (NFC)
This doesn't change anything at the moment since the out-of-order elements
were a pair of uint32_t, but future additions may have caused unnecessary
padding by following the existing precedent.
r367456:
imgact_binmisc: move some calculations out of the exec path
The offset we need to account for in the interpreter string comes in two
variants:
1. Fixed - macros other than #a that will not vary from invocation to
invocation
2. Variable - #a, which is substituted with the argv0 that we're replacing
Note that we don't have a mechanism to modify an existing entry. By
recording both of these offset requirements when the interpreter is added,
we can avoid some unnecessary calculations in the exec path.
Most importantly, we can know up-front whether we need to
calculate/grab the filename for this interpreter. We also get to avoid
walking the string a first time looking for macros. For most invocations,
it's a swift exit as they won't have any, but there's no point entering a
loop and searching for the macro indicator if we already know there will not
be one.
While we're here, go ahead and only calculate the argv0 name length once per
invocation. While it's unlikely that we'll have more than one #a, there's no
reason to recalculate it every time we encounter an #a when it will not
change.
I have not bothered trying to benchmark this at all, because it's arguably a
minor and straightforward/obvious improvement.
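The idea of recording the requirements once, when the entry is added, can be sketched as below. This is a simplification under stated assumptions: only #a is recognised, every other byte is counted as fixed, and struct interp_info and both functions are hypothetical names rather than the kernel's own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry record, filled in when the entry is added. */
struct interp_info {
	size_t fixed_len;	/* bytes that never vary between execs */
	size_t argv0_count;	/* number of #a occurrences */
};

/* Walk the interpreter string once, at add time. */
void
interp_scan(const char *s, struct interp_info *ii)
{
	ii->fixed_len = 0;
	ii->argv0_count = 0;
	while (*s != '\0') {
		if (s[0] == '#' && s[1] == 'a') {
			ii->argv0_count++;
			s += 2;
		} else {
			ii->fixed_len++;
			s++;
		}
	}
}

/*
 * Per exec: no string walk is needed, and strlen(argv0) is computed
 * at most once and skipped entirely when no #a is present.
 */
size_t
interp_expanded_len(const struct interp_info *ii, const char *argv0)
{
	size_t alen;

	if (ii->argv0_count == 0)
		return (ii->fixed_len);
	alen = strlen(argv0);
	return (ii->fixed_len + ii->argv0_count * alen);
}
```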
r367477:
imgact_binmisc: limit the extent of match on incoming entries
imgact_binmisc matches magic/mask from imgp->image_header, which is only a
single page in size mapped from the first page of an image. One can specify
an interpreter that matches on, e.g., --offset 4096 --size 256 to read up to
256 bytes past the mapped first page.
The limitation is that we cannot specify a magic string that exceeds a
single page, and we can't allow offset + size to exceed a single page
either. A static assert has been added in case someone finds it useful to
try and expand the size, but it does seem a little unlikely.
While this looks kind of exploitable at a sideways squinty-glance, there are
a couple of mitigating factors:
1.) imgact_binmisc is not enabled by default,
2.) entries may only be added by the superuser,
3.) trying to exploit this information to read what's mapped past the end
would be worse than a root canal or some other relatably painful
experience, and
4.) there's no way one could pull this off without it being completely
obvious.
The first page is mapped out of an sf_buf, the implementation of which (or
lack thereof) depends on your platform.
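A sketch of the kind of bounds check r367477 describes, assuming a 4096-byte page: SKETCH_PAGE_SIZE and match_extent_ok() are hypothetical names, and the real code uses the kernel's own constants. The comparison is arranged so that offset + size is never computed and therefore cannot overflow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for illustration only. */
#define SKETCH_PAGE_SIZE	4096

/*
 * Accept a magic/mask extent only if it lies entirely within the
 * first page: size <= PAGE and offset + size <= PAGE.
 */
bool
match_extent_ok(size_t offset, size_t size)
{
	if (size > SKETCH_PAGE_SIZE)
		return (false);
	return (offset <= SKETCH_PAGE_SIZE - size);
}
```

With this check, the --offset 4096 --size 256 example from the message is rejected.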
Bjoern A. Zeeb [Thu, 12 Nov 2020 17:26:19 +0000 (17:26 +0000)]
MFC r367538:
arm64: bs_sr_<N> take II
In r367327 generic_bs_sr_<n> were derived from mips. Given we are calling
generic_bs_w_<n> and not writing directly, we do not have to do the address
calculations ourselves as generic_bs_w_<n> will do a str val [bsh, offset].
All we actually have to do is increment offset.
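In plain C, the resulting set-region loop looks roughly like the sketch below. The real routines are arm64 assembly operating on bus_space(9) handles; here sketch_bs_w_1() and sketch_bs_sr_1() are hypothetical stand-ins that treat the handle as a byte array.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for generic_bs_w_1: store val at handle + offset. */
void
sketch_bs_w_1(uint8_t *bsh, size_t offset, uint8_t val)
{
	bsh[offset] = val;
}

/*
 * Set region: write the same value to count consecutive locations.
 * The write routine does the address calculation, so the loop only
 * advances the offset (by 1 here; by N for an N-byte access width).
 */
void
sketch_bs_sr_1(uint8_t *bsh, size_t offset, uint8_t val, size_t count)
{
	while (count-- > 0)
		sketch_bs_w_1(bsh, offset++, val);
}
```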
Dimitry Andric [Wed, 11 Nov 2020 22:18:24 +0000 (22:18 +0000)]
MFC r367485:
Merge commit 354d3106c from llvm git (by Kai Luo):
[PowerPC] Skip combining (uint_to_fp x) if x is not simple type
Current powerpc64le backend hits
```
Combining: t7: f64 = uint_to_fp t6
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291:
llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() &&
"Expected a SimpleValueType!"' failed.
```
This patch fixes it by skipping combination if `t6` is not simple
type.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.
Fix premature decision in the presence of type-dependent expression
operands on whether AltiVec vector initializations from single
expressions are "splat" operations.
Verify that the instantiation is able to determine the correct cast
semantics for both the scalar type and the vector type case.
Note that, because the change only affects the single-expression case
(and the target type is an AltiVec-style vector type), the
replacement of a parenthesized list with a parenthesized expression
does not change the semantics of the program in a program-observable
manner.
This should fix 'Assertion failed: (isScalarType()), function
getScalarTypeKind, file /usr/src/contrib/llvm-project/clang/lib/AST
/Type.cpp, line 2146', when building the graphics/opencv-core port for
powerpc64le.
The way netmap TX is handled in iflib when TX interrupts are not
used (IFC_NETMAP_TX_IRQ not set) has some issues:
- The netmap_tx_irq() function gets called by iflib_timer(), which
gets scheduled with tick granularity (hz). This is not frequent
enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end
result is that the transmitting netmap application is not woken
up fast enough to saturate the link with small packets.
- The iflib_timer() function also calls isc_txd_credits_update()
to ask for more TX completion updates. However, this violates
the netmap requirement that only txsync can access the TX queue
for datapath operations. Only netmap_tx_irq() may be called out
of the txsync context.
This change introduces per-tx-queue netmap timers, using microsecond
granularity to ensure that netmap_tx_irq() can be called often enough
to allow for maximum packet rate. The timer routine simply calls
netmap_tx_irq() to wake up the netmap application. The latter will
wake up and call txsync to collect TX completion updates.
This change brings back line rate speed with small packets for ixgbe.
For the time being, timer expiration is hardcoded to 90 microseconds,
in order to avoid introducing a new sysctl.
We may eventually implement an adaptive expiration period or use another
deferred work mechanism in place of timers.
Also, fix the timers usage to make sure that each queue is serviced
by a different CPU.
Brooks Davis [Tue, 10 Nov 2020 18:07:13 +0000 (18:07 +0000)]
MFC r367302:
sysvshm: pass relevant uap members as arguments
Alter shmget_allocate_segment and shmget_existing to take the values
they want from struct shmget_args rather than passing the struct
around. In general, uap structures should only be the interface to
sys_<foo> functions.
This makes one small functional change and records the allocated space
rather than the requested space. If this turns out to be a problem
(e.g. if software tries to find undersized segments by exact size
rather than using keys), we can correct that easily.