rscheff [Fri, 9 Oct 2020 14:33:09 +0000 (14:33 +0000)]
Add DSCP support for network QoS to iscsi initiator.
Allow the DSCP codepoint also to be configurable
for the traffic in the direction from the initiator
to the target, such that writes and any requests
are also treated in the appropriate QoS class.
gbe [Fri, 9 Oct 2020 14:03:45 +0000 (14:03 +0000)]
Fix a few mandoc issues
- no blank before trailing delimiter
- whitespace at end of input line
- sections out of conventional order
- normalizing date format
- AUTHORS section without An macro
rscheff [Fri, 9 Oct 2020 12:44:56 +0000 (12:44 +0000)]
Stop sending tiny new data segments during SACK recovery
Consider the currently in-use TCP options when
calculating the amount of new data to be injected during
SACK loss recovery. That addresses the effect that very small
(new) segments could be injected on partial ACKs while
still performing a SACK loss recovery.
rscheff [Fri, 9 Oct 2020 12:06:43 +0000 (12:06 +0000)]
Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt)
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.
Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header with
vlan ID = 0 to carry the priority setting
Fix EINVAL message when CPU binding information is requested for IRQ.
`cpuset -g -x N` along with requested information always prints
message `cpuset: getdomain: Invalid argument'. The EINVAL is returned
from kern_cpuset_getdomain(), since it doesn't expect CPU_LEVEL_WHICH
and CPU_WHICH_IRQ parameters.
To fix the error, do not call cpuset_getdomain() when `-x' is specified.
imp [Fri, 9 Oct 2020 01:48:14 +0000 (01:48 +0000)]
Create in-tree LINT files
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.
Note: This may cause conflicts with updating in some cases.
find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.
rmacklem [Fri, 9 Oct 2020 01:04:28 +0000 (01:04 +0000)]
Make vn_generic_copy_file_range() interruptible via a signal.
Without this patch, when vn_generic_copy_file_range() is
doing a large copy, it will remain in the function for a
considerable amount of time, delaying handling of any
outstanding signals until the copy completes.
This patch adds checks for signals that need to be
processed after each successful data copy cycle.
When sig_intr() returns non-zero, vn_generic_copy_file_range()
will return.
The check "if (len < savlen)" ensures that some data
has been copied, so that progress will be made.
Note that, since copy_file_range(2) is allowed to
return fewer bytes copied than requested, it
will never return EINTR/ERESTART when sig_intr()
returns non-zero.
imp [Fri, 9 Oct 2020 00:27:45 +0000 (00:27 +0000)]
Stop ignoring makeLINT generated files
We're going to check these files in shortly since we don't need to
generate them anymore. Generated files cause issues for different work
flows anyway.
imp [Fri, 9 Oct 2020 00:16:26 +0000 (00:16 +0000)]
Initial support for implementing the bootXXX.efi workaround
Too many version of UEFI firmware (so far only confirmed on amd64)
don't really support efibootmgr selection of boot. That's the most
reliable, when it works, since there's no guesswork. However, many do
not save, unmolested, the variables that efibootmgr sets, so as a
fallback we also install loader.efi as bootXXX.efi (where XXX is
either aa64 or x64) if it doesn't already exist in /efi/boot on the
ESP. The standard only defines this for removable devices, but it's
almost ubiquitously used as a fallback. Many BIOSes implement a drive
selection feature that takes over the efibootmgr protocol, rendinering
it useless (either generally, or for those vendors not on the short
list). bootxxx.efi works around this. However, we don't install it
unconditionally there, as that breaks some popular multi-boot setups.
kib [Thu, 8 Oct 2020 22:46:15 +0000 (22:46 +0000)]
vm_page_dump_index_to_pa(): Add braces to the expression involving + and &.
The precedence of the '&' operator is less than of '+'. Added braces
do change the order of evaluation into the natural one, in my opinion.
On the other hand, the value of the expression should not change since
all elements should have page-aligned values.
This fixes a gcc warning reported.
Reported by: adrian
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Thu, 8 Oct 2020 22:41:02 +0000 (22:41 +0000)]
Do not leak B_BARRIER.
Normally when a buffer with B_BARRIER is written, the flag is cleared
by g_vfs_strategy() when creating bio. But in some cases FFS buffer
might not reach g_vfs_strategy(), for instance when copy-on-write
reports an error like ENOSPC. In this case buffer is returned to
dirty queue and might be written later by other means. Among then
bdwrite() reasonably asserts that B_BARRIER is not set.
In fact, the only current use of B_BARRIER is for lazy inode block
initialization, where write of the new inode block is fenced against
cylinder group write to mark inode as used. The situation could be
seen that we break dependency by updating cg without written out
inode. Practically since CoW was not able to find space for a copy of
inode block, for the same reason cg group block write should fail.
Reported by: pho
Discussed with: chs, imp, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26511
kib [Thu, 8 Oct 2020 22:34:34 +0000 (22:34 +0000)]
sig_intr(9): return early if AST is not scheduled.
Check td_flags for relevant AST requests lock-less. This opens the
race slightly wider where sig_intr() returns false negative, but might
be it is worth it.
Requested by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Thu, 8 Oct 2020 22:31:11 +0000 (22:31 +0000)]
Do not allow to use O_BENEATH as an oracle.
Specifically, if lookup() returned any error and the topping directory
was not latched, which means that (non-existent) path did not returned
to the topping location, give ENOTCAPABLE a priority over the lookup()
error.
PR: 249960
Reviewed by: emaste, ngie
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26695
imp [Thu, 8 Oct 2020 20:56:06 +0000 (20:56 +0000)]
Remove APM BIOS support
APM BIOS was relevant only to early laptops (approximately P166 or
P200 and slower). These have not been relevant for a long time, and
this code has been untested for a long time (as far as I can
tell). The APM compat code in ACPI and the apm(8) command is not being
retired. Both of these items are still in use (apm(8) is more
scriptable than the replacement acpiconf, for the most part). This has
been commented out of i386 GENERIC since 2002. This code is not
relevant to any other port.
mhorne [Thu, 8 Oct 2020 18:29:17 +0000 (18:29 +0000)]
Fix a loop condition
The correct way to identify the end of the metadata is two adjacent
entries set to zero/MODINFO_END. I made a typo and this was checking the
first entry twice.
Reported by: rpokala
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
mhorne [Thu, 8 Oct 2020 18:02:05 +0000 (18:02 +0000)]
Add a routine to dump boot metadata
The boot metadata (also referred to as modinfo, or preload metadata)
provides information about the size and location of the kernel,
pre-loaded modules, and other metadata (e.g. the EFI framebuffer) to be
consumed during by the kernel during early boot. It is encoded as a
series of type-length-value entries and is usually constructed by
loader(8) and passed to the kernel. It is also faked on some
architectures when booted by other means.
Although much of the module information is available via kldstat(8),
there is no easy way to debug the metadata in its entirety. Add some
routines to parse this data and allow it to be printed to the console
during early boot or output via a sysctl.
Since the output can be lengthly, printing to the console is gated
behind the debug.dump_modinfo_at_boot kenv variable as well as the
BOOTVERBOSE flag. The sysctl to print the metadata is named
debug.dump_modinfo.
Reviewed by: tsoome
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26687
kaktus [Thu, 8 Oct 2020 11:45:10 +0000 (11:45 +0000)]
[pf] /etc/rc.d/pf should REQUIRE routing
When a system with pf_enable="YES" in /etc/rc.conf uses hostnames in
/etc/pf.conf, these hostnames cannot be resolved via external nameservers
because the default route is not yet set. This results in an empty
(all open) ruleset.
Since r195026 already put netif back to REQUIRE, this change does not affect
the issue that the firewall should rather have been setup before any
network traffic can occur.
PR: 211928
Submitted by: Robert Schulze
Reported by: Robert Schulze
Tested by: Mateusz Kwiatkowski
No objections from: kp
MFC after: 3 days
cxgbe(4): knobs to drop various kinds of undesirable frames on ingress.
These kind of drops come for free in the sense that they do not use the
filter TCAM or any other resource that wouldn't normally be used during
rx. Frames dropped by the hardware get counted in the MAC's rx stats
but are not delivered to the driver.
hw.cxgbe.attack_filter
Set to 1 to enable the "attack filter". Default is 0. The attack
filter will drop an incoming frame if any of these conditions is true:
src ip/ip6 == dst ip/ip6; tcp and src/dst ip is not unicast; src/dst ip
is loopback (127.x.y.z); src ip6 is not unicast; src/dst ip6 is loopback
(::1/128) or unspecified (::/128); tcp and src/dst ip6 is mcast
(ff00::/8).
hw.cxgbe.drop_ip_fragments
Set to 1 to drop all incoming IP fragments. Default is 0. Note that
this drops valid frames.
hw.cxgbe.drop_pkts_with_l2_errors
Set to 1 to drop incoming frames with Layer 2 length or checksum errors.
Default is 1.
hw.cxgbe.drop_pkts_with_l3_errors
Set to 1 to drop incoming frames with IP version, length, or checksum
errors. Default is 0.
hw.cxgbe.drop_pkts_with_l4_errors
Set to 1 to drop incoming frames with Layer 4 length, checksum, or other
errors. Default is 0.
mhorne [Wed, 7 Oct 2020 23:14:49 +0000 (23:14 +0000)]
Handle kmod local relocation failures gracefully
It is possible for elf_reloc_local() to fail in the unlikely case of
an unsupported relocation type. If this occurs, do not continue to
process the file.
80211: ifconfig replace MS() with _IEEE80211_MASKSHIFT()
As we did in the kernel in r366112 replace the MS() macro with the version(s)
added to the kernel: _IEEE80211_MASKSHIFT(). Also provide its counter part.
This will later allow use to use other macros defined in net80211 headers
here in ifconfig.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This code was iteratively implemented during the work on various WiFi
drivers -- from individual functions to a macro-created implementations
for the various bit sized needed (and then extended to more for
comepleteness). Some of the bit combinations do not seem to make sense
so are left out.
The __bf_shf(x) was obtained from D26681 [1].
Requested by: manu [1]
Reviewed by: hselasky, manu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26708
The method was optional prior to r365938, which made it mandatory but did add
any test that an implementation provides the method nor implement it for
bhyveload. The code path might not be hit unless the user's loader was
configured to write to a file on disk, such as with nextboot(8).
mhorne [Wed, 7 Oct 2020 18:48:10 +0000 (18:48 +0000)]
Print symbol index for unsupported relocation types
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.
While here, fix some small style issues.
Reviewed by: markj, kib (amd64 part, in D26701)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
hselasky [Wed, 7 Oct 2020 17:46:49 +0000 (17:46 +0000)]
Properly cleanup driver during remove_one() in mlx5core.
Cleanup all host resources, SYSCTLs, MSIX vectors and memory used
by the host and only leave the device allocated memory behind, if any,
because it may still be in use, when the PCI remove function is called.
Else future probe calls may fail due to SYSCTLs already existing.
imp [Wed, 7 Oct 2020 06:16:37 +0000 (06:16 +0000)]
Move kernel env global variables, etc to sys/kenv.h
The kernel globals for kenv are confined to 2 files that need them and
a few that likely shouldn't (but as written the code does). Move them
from sys/systm.h to sys/kenv.h. This removed a XXX from systm.h and
cleans it up a little bit...
imp [Wed, 7 Oct 2020 05:44:35 +0000 (05:44 +0000)]
cam: Add quirk for Samsung MZ7* behind a SATA-to-SAS interposer
Sometimes, this drive will be present in the system such that the the
firmware identification string doesn't start with ATA, such as when
it's behind a SATA-to-SAS interposer. Add another quirk for that.
bridge: call member interface ioctl() without NET_EPOCH
We're not allowed to hold NET_EPOCH while sleeping, so when we call ioctl()
handlers for member interfaces we cannot be in NET_EPOCH. We still need some
protection of our CK_LISTs, so hold BRIDGE_LOCK instead.
That requires changing BRIDGE_LOCK into a sleepable lock, and separating the
BRIDGE_RT_LOCK, to protect bridge_rtnode lists. That lock is taken in the data
path (while in NET_EPOCH), so it cannot be a sleepable lock.
jhb [Tue, 6 Oct 2020 17:58:56 +0000 (17:58 +0000)]
Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway. This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).
In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.
jrtc27 [Tue, 6 Oct 2020 13:02:20 +0000 (13:02 +0000)]
riscv: Handle supervisor instruction page faults
We should never take instruction page faults when in the kernel, but by
using the standard page fault code we should get a more-informative
message about faulting on a NOFAULT page rather than branching to the
default case here and printing an "Unknown kernel exception ..."
message.
lwhsu [Tue, 6 Oct 2020 04:18:42 +0000 (04:18 +0000)]
Clear the dmesg buffer to prevent rotating causes issues
This is a workaround for the current continuously failing test case
sys.kern.sonewconn_overflow.sonewconn_overflow_01
The side effect is the dmesg buffer got cleared and may effect other tests
depends on dmesg output running in parallel. The better solution would be
tailing the log file like /var/log/debug.log
kevans [Mon, 5 Oct 2020 20:57:44 +0000 (20:57 +0000)]
crunchgen: fix MK_AUTO_OBJ logic after r364166
r364166 converted echo -n `/bin/pwd` to a raw pwd invocation, leaving a
trailing newline at the end of path. This caused a later stat() of it to
erroneously fail and the fallback to MK_AUTO_OBJ=no logic proceeded as
unexpected.
Harry Schmalzbauer bissected the resulting build failure he experienced
(stable/12 host, -HEAD build) down to r365887. This change is mostly
unrelated, except it switches the build to bootstrapped crunchgen - clue!
I then bissected recent crunchgen changes going back a bit since we wouldn't
observe the failure immediately with -CURRENT in most configurations, which
landed me on r364166. After many intense head-scratching minutes and printf
debugging, I realized that the newline was the difference. This is where our
tale ends.
Reported by: Harry Schmalzbauer, O. Hartmann, Mike Tancsa, kevans
MFC after: 3 days
freqlabs [Mon, 5 Oct 2020 20:13:22 +0000 (20:13 +0000)]
Enable iterating all sysctls, even ones with CTLFLAG_SKIP
Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.
This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.
It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.
The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.
mjg [Mon, 5 Oct 2020 19:38:51 +0000 (19:38 +0000)]
cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.
fernape [Mon, 5 Oct 2020 14:07:32 +0000 (14:07 +0000)]
procstat(1): Add EXAMPLES section
* Add some examples showing binary, arguments and file info from living
processes.
* Show information from core dumps including an attempt using an old core file.
* While here, fix warning 'no blank before trailing delimiter' reported by igor.
fernape [Mon, 5 Oct 2020 13:39:37 +0000 (13:39 +0000)]
df(1): Add EXAMPLES section to man page
* Add EXAMPLES section with four simple examples.
* Simplify -H flag description. This makes easy to see the difference between
this flag and -h
* While here, fix .Tn deprecated macro.