testing: add ability to specify multi-vnet topologies in the pytest framework.
Notable amount of tests related to the packet IO require two VNET jails
for proper testing and avoiding side effects for the host system.
Additionally, it is often required to run actions in the jails seme-sequentially
- waiting for the listener initialisation can be an example of such
dependency.
This change extends pytest vnet framework to allow defining multi-vnet
multi-epair topologies in declarative style, without any need to bother
about jail or repair names. All jail creation/teardown, interface
creation/teardown and address assignments are handled automatically.
The definitions above will create 2 vnets ("jail_test_output6_base",
"jail_test_output6_base_2"), 3 epairs, attached to both first and
second jails, set up the IP addresses for each epair, spawn another
process for vnet2_handler and pass control to vnet2_handler and
test_output6_base. Both processes can pass objects between each
other using pre-created pipes.
VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if it's
not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.
Change VNET_FOREACH() to be a loop in all cases (although one that is
fixed to one iteration if VIMAGE is not set).
This allows for easy copy-and-paste of a unix(4) peer name for lookup.
With current implementation it is guaranteed that a peer listed could be
found in the output.
sockstat(1): print out full connection graph for unix(4) sockets
Kernel provides us with enough information to display all possible
connections between UNIX sockets.
o Store unp_conn, xu_firstref and xu_nextref in the faddr of a UNIX sock.
o Build tree of file descriptors, indexed by the socket pointer.
o In displaysock() print out all possible information:
1) if socket is bound, print name of this socket
2) if socket has connected to a peer with a name, print peers name
3) if socket has connected to a peer without a name, print [pid fd]
4) if a bound socket has received connections, print list of them
as [pid fd]
Previously, only 1) either 2) were printed.
o Use tree to lookup by socket kvaddr. The size of hash is too big for a
small virtual machine and at the same time too little for a large
production server. A tree would better fit here.
o For those pcbs, that don't have a socket associated, use a list.
o Provide a second tree to lookup by pcb kvaddr. These removes full hash
traversal when printing every unix(4) socket.
tcp: remove a condition in tcp_usr_detach() that never happens
The comment from Robert Watson doubts that this condition ever happens.
Our analysis confirm that. Also, we found that if you manage to create
such a connection with help of some other bug, then after the "second
case" code is executed, the kernel will panic in other part of the stack.
Bug fix to UFS/FFS superblock integrity checks when reading a superblock.
Older versions of growfs(8) failed to correctly update fs_dsize.
Filesystems that have been grown fail the test for fs_dsize's correct
value. For now we exclude the fs_dsize test from the requirements.
Reported by: Edward Tomasz Napiera
Tested by: Edward Tomasz Napiera
Tested by: Peter Holm
MFC after: 1 month (with 076002f24d35)
Differential Revision: https://reviews.freebsd.org/D35219
Bug fix to UFS/FFS superblock integrity checks when reading a superblock.
The original check verified that if an alternate superblock has not
been selected that the superblock is located in its standard location.
For UFS1 the with a 65536 block size, the first backup superblock
is at the same location as the UFS2 superblock. Since SBLOCK_UFS2
is the first location checked, the first backup is the superblock
that will be used for a UFS1 filesystems with a 65536 block size.
This patch allows the use of the first backup superblock in that
situation.
Reported by: Peter Holm
Tested by: Peter Holm
MFC after: 1 month (with 076002f24d35)
Differential Revision: https://reviews.freebsd.org/D35219
Bug fix to UFS/FFS superblock integrity checks when reading a superblock.
The tests for number of cylinder groups (fs_ncg), inodes per cylinder
group (fs_ipg), and the size and layout of the cylinder group summary
information (fs_csaddr and fs_cssize) were overly restrictive and
would exclude some valid filesystems. These updates avoid precluding
valid fiesystems while still detecting rogue values that can crash or
hang the kernel.
testing: provide meaningful error when pytest is not available
atf format does not provide any way of signalling any error message
back to the atf runner when listing tests. Work this around by
reporting "__test_cases_list_pytest_binary_not_found__" test instead.
These are all "standard microarchitectural events", which in theory are
supported by every ARMv8 processor. In practice, it depends on the
pmu-event definitions being complete and accurate, which they are not
for every processor. Still, these aliases should be functional on the
majority of systems.
PR: 254532
Reported by: emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35550
Thanks to the recently updated import of the jevents utility by mav@, we
can now compile the latest version of these event definitions. This
should support a wider set of common ARMv8 processors, for example, the
Cortex-A72 in the Raspberry Pi 4.
This brings this folder in sync with Linux commit 62e6eb8d5454.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35549
Michal Krawczyk [Tue, 5 Jul 2022 10:59:25 +0000 (12:59 +0200)]
ena: Fix LLQ descriptor reconfiguration
After the device reset, the LLQ configuration descriptor wasn't passed
to the hardware. On a 6-generation AWS instances (like C6gn), it is
required to pass the LLQ descriptor after the device reset, otherwise
the hardware will be missing the LLQ configuration resulting in
performance degradation.
This patch reconfigures the LLQ each time the ena_device_init() is
called. This means that the LLQ descriptor will be passed during the
initial configuration and after a reset.
The ena_map_llq_mem_bar() function call was moved before the
ena_device_init() call, to make sure that the mem bar is available.
Michal Krawczyk [Mon, 4 Jul 2022 07:03:54 +0000 (09:03 +0200)]
ena: Align req_id and qid print order
In most places, the req_id is printed first, and the qid is printed as a
second. To align the driver, one printout was reworked and the print
order of those variables was changed.
Brooks Davis [Wed, 6 Jul 2022 13:03:48 +0000 (14:03 +0100)]
cddl/*: add a WITH(OUT)_DTRACE option
Add an option to enable/disable DTrace without disabling ZFS. New
architectures such as CHERI may support ZFS before they support DTrace
and the old model of WITHOUT_CDDL disabling both wasn't helpful.
For compatiblity, the CDDL option remains and WITHOUT_CDDL implies
WITHOUT_DTRACE. WITHOUT_DTRACE also implies WITHOUT_CTF.
As part of this change, largely convert cddl/*/Makefile to using the
more compact SUBDIR.${MK_<FOO>}+= form rather than using intermediate
variables.
Mike Karels [Sat, 2 Jul 2022 16:03:36 +0000 (11:03 -0500)]
netstat -i: do not truncate interface names
The field for interface names for netstat -i was 5 characters by
default, which is no longer sufficient with names like "vlan1234"
and "vtnet0". netstat -iW computed the necessary field width, but
also enlarged the address field by a lot (especially with IPv6 enabled).
Make netstat -i compute the field width for interface names with or
without -W. Note that the existing default output does not fit in
80 columns in any case. Update the man page accordingly, documenting
the remaining effect of -W with -i. Also add -W to the list of
General Options, as there are numerous pointers to this.
Reported by: Chris Ross
Reviewed by: melifaro, rgrimes, cy
Differential Revision: https://reviews.freebsd.org/D35703
MFC after: 1 week
Split the current FDT-only implementation up into an FDT and an
ACPI part reusing and sharing as much code as possible (thanks mw!).
This makes the Synopsis XHCI root hubs attach correctly on SolidRun's
HoenyComb instead of just the generic XHCI root and this means we
are also doing proper chip setup and applying the quirk needed there [1].
There is one problem with ACPI attachment in that it uses the generic
XHCI PNP ID. So we need to do extra checks in order to not claim
all xhci, which means we check for a known quirk to be present
in acpi_probe. Long term this isn't scaling and this was discussed
in SolidRun's Discord Channel in 2021 with the intend that "jnettlet"
will take this to a steering committee. Since then ACPI has kind-of
become a technology non grata (due to not getting changes into Linux
timely) so it is unclear if this will ever happen. If there will be
further hardware with dwc3/ACPI we should go and make sure this problem
gets solved.
If we receive a UDP packet (directed towards an active OpenVPN socket)
which is too short to contain an OpenVPN header ('struct
ovpn_wire_header') we wound up making m_copydata() read outside the
mbuf, and panicking the machine.
Explicitly check that the packet is long enough to copy the data we're
interested in. If it's not we will pass the packet to userspace, just
like we'd do for an unknown peer.
If dummynet_task() is run on a vnet where dummynet is still initialising
(i.e. still running ip_dn_vnet_init()) we can attempt to use an
uninitialised mutex.
We can use the existing init_done field to check if the per-vnet
V_dn_cfg is fully set up, if we ensure that it's only set to 1 when
we've done all of the init work.
DB_COMMAND(9): update to mention additional macros
Document the existing alias definitions, and augment the example with
one of these. Also, describe the purpose of the newly added _FLAGS
variations of these command definitions.
Make some small style improvements to appease mandoc -Tlint.
Reviewed by: markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35664
Mitchell Horne [Thu, 30 Jun 2022 15:58:22 +0000 (12:58 -0300)]
ddb: add _FLAGS command variants
Provide _FLAGS variants of the various command definition macros, in
anticipation of adding a new flag. This can also be used for some
existing commands which require special flag values.
Reviewed by: markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35581
Provide separate helper macros for regular commands and next-level table
commands as they are mutually exclusive. This ensures proper
initialization of each element and allows us to exclude some redundant
fields, such as specifying .more = NULL for every regular command.
Reviewed by: markj, jhb
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35580
netinet6: perform out-of-bounds check for loX multicast statistics
Currently, some per-mbuf multicast statistics is stored in
the per-interface ip6stat.ip6s_m2m[] array of size 32 (IP6S_M2MMAX).
Check that loopback ifindex falls within 0.. IP6S_M2MMAX-1 range to
avoid silent data corruption. The latter cat happen with large
number of VNETs.
arp(8): use getifaddrs(3) instead of ioctl(SIOCGIFCONF)
The original code had used a fixed-size buffer for ioctl(SIOCGIFCONF),
that might cause the target ifreq spilled from the buffer. Use the handy
getifaddrs(3) to fix the problem.
During the review of 09cdf4878c621be4cd229fa88cdccdcdc8c101f7 we
switched from cached registers to reading them as needed.
One read of the two reads was moved after the softreset got triggered
and as a result returned 0 rather than the proper register value.
Moving the read before the softreset gets initiated seems to make
things work again and xhci.c no longer complains about
"Controller does not support 4K page size.".
sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already
existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private. The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().
Note on listening queues. Until now the sockets on a queue had zero
reference count. And the reference were given only upon accept(2). The
assumption was that there is no way to see the queued socket from anywhere
except its head. This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists. With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.
sockets: use positive flag for file descriptor socket reference
Rename SS_NOFDREF to SS_FDREF and flip all bitwise operations.
Mark sockets created by socreate() with SS_FDREF.
This change is mostly illustrative. With it we see that SS_FDREF
is a debugging flag, since:
* socreate() takes a reference with soref().
* on accept path solisten_dequeue() takes a reference
with soref() and then soaccept() sets SS_FDREF.
* soclose() checks SS_FDREF, removes it and does sorele().
pca954x: harmonize pca9547 and pca954x and add pca9540 support
The two implementations for the pca9548 switch and the pca9547 mux
seemed close enough so we can put them together and with a bit more
abstraction add pca9540 support.
While here apply a bit of consistency in variable and driver naming and
use device_has_property instead of the FDT-only OF_ variant.
This disconnects pca9547 from the build but does not yet delete it.
Doug Moore [Mon, 4 Jul 2022 17:28:35 +0000 (12:28 -0500)]
rb_tree: fine-tune rebalancing code
Change parts of RB_INSERT_COLOR and RB_REMOVE_COLOR to reduce the
number of operations, by, in some cases, flipping two color bits at a
time instead of flipping each individually, in separate operations,
and by using a switch statement to replace a sequence of if-elses.
Rewrite RB_SET_PARENT to generate fewer instructions. These changes
reduce the code size by over 100 bytes on some architectures.
Also, allow RB users to define a preprocessor symbol to generate
RB_REMOVE_COLOR code that matches the implementation described in the
original weak-AVL paper in one particular case, instead of the still
correct, but slightly more efficient implementation of that case
currently implemented.
Andrew Gallatin [Mon, 4 Jul 2022 16:40:35 +0000 (12:40 -0400)]
pmcstat: fix log analysis
pmcstat has been broken for analyzing logs since D35342 / b6e28991bf3aadb.
This is because the pmc for the first CPU is not added when reading logs
because unlike its clones, its event id is not invalid. That causes us
to fail the assertion at lib/libpmcstat/libpmcstat_logging.c:293
when encountering samples from cpu0.
Fix this by removing the check that the PMC is invalid
When accessing a register directly from etherswitchcfg one must specify
a register group(e.g. registers of portN) and the register offset within
the group. The latter is passed as the 5 least significant bits.
Extract the former by dividing the register address by 32, not by 5.
lockstat: Fix construction of comparision predicates
Passing "0x%p" to sprintf results in double "0x" being printed.
This causes a dtrace script compilation failure when "-d" flag
is specified.
Fix that by removing the extraneous "0x".
Mike Karels [Sun, 3 Jul 2022 23:04:41 +0000 (18:04 -0500)]
mountd startup: enable NFSv4 if needed on restart
The mountd script in rc.d sets vfs.nfsd.server_max_nfsvers correctly
when it is run at system startup, relying on the kernel default.
However, if NFSv4 was enabled in /etc/rc.conf later, and the script
was re-run to restart mountd, the sysctl was still set to 3.
Set the sysctl to the right value in all cases.
Some systems put sensors on non-0 lun, so we should not omit it. This
was the only difference with the Linux driver, where DIMM sensors could
be queried, but not on FreeBSD.
See this report[1] on the FreeBSD forums:
https://forums.freebsd.org/threads/freebsd-cannot-get-dimm-temperature-sensor-value.85166/
Handle IPMB requests using SEND_MSG (sent as driver request as we do not
need to return anything back to userland for this) and GET_MSG (sent as
usual request so we can return the data for RECEIVE_MSG ioctl) pair.
This fixes fetching complete sensor data from boards (e.g. HP ProLiant
DL380 Gen10).
Reviewed by: philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35605
Rather than hiding behind #if 0, hide the debugging behind DWC3_DEBUG
so it can be turned on with a single define. Require bootverbose
to print anything so we can still avoid spamming the console if DWC3_DEBUG
is on.
Harmonize the format string in snsp_dwc3_dump_regs() to always print the
full register and also print the XHCI quirks.
Call snsp_dwc3_dump_regs() twice, before and after generic XHCI attachment
and initialisation as this may have an effect on the confirgumation state.
Obtained from: an old debug patch
MFC after: 2 weeks
Reviewed by: mw
Differential Revision: https://reviews.freebsd.org/D35700
Rather than just printing the Global SNPS ID Register store it as well
so we can do a version check later.
In addition, for debugging purposes, read the Global Hardware Parameters
Registers and print them.
Based on the snpsid disable an XHCI feature using a quirk prepared
in 447c418da03454a2a00bc115a69c62055a6d5272.
Add the "snps,dis_u3_susphy_quirk" quirk and handle Suspend USB3.0 SS PHY
after power-on-reset/during core initialization (suggested to be cleared)
based on the DWC3_GHWPARAMS0 register.
MFC after: 2 weeks
Obtained from: an old debugging patch
Reviewed by: mw (earlier version), mmel
Differential Revision: https://reviews.freebsd.org/D35699
Enable dwc3's auto retry feature. For IN transfers with crc errors
or internal overruns this will make the host reply with a
non-terminating retry ACK. I believe the hope was to improve
reliability after seeing occasional hiccups.
Obtained from: an old debugging patch
MFC after: 2 weeks
Reviewed by: mw
Differential Revision: https://reviews.freebsd.org/D35698
wpa_supplicant: Resolve secondary VAP association issue
Association will fail on a secondary open unprotected VAP when the
primary VAP is configured for WPA. Examples of secondary VAPs are,
hotels, universities, and commodity routers' guest networks.
A broadly similar bug was discussed on Red Hat's bugzilla affecting
association to a D-Link DIR-842.
This suggests that as IEs were added to the 802.11 protocol the old code
was increasingly inadaquate to handle the additional IEs, not only a
secondary VAP.
As of hostap 2.10, WEP is disabled by default. This of course is not a
bad thing but requires some planning and an announcment to remove WEP
support by default. A possible src.conf knob or letting users know they
should use the port instead might different options.
Rick Macklem [Sun, 3 Jul 2022 20:37:23 +0000 (13:37 -0700)]
mount_nfs.8: Update BUGS section for NFSv4.1/4.2
If the "intr" and/or "soft" mount options are used for
NFSv4 mounts, the protocol can be broken when the
operation returns without waiting for the RPC reply.
The likelyhood of failure increases for NFSv4.1/4.2
mounts, since the session slot will be broken when
an RPC reply is not processed.
This is mentioned in the BUGS section of "man mount_nfs",
but there was no specific mention of the session slot
problem. This patch adds a sentence for this case.
Apply clang fix for assertion building llvm with libc++ 15
Merge commit f1b0a4fc540f from llvm git (by Richard Smith):
An expression should only contain an unexpanded parameter pack if it
lexically contains a mention of the pack.
Systematically distinguish between syntactic and semantic references to
packs, especially when propagating dependence from a type into an
expression. We should consult the type-as-written when computing
syntactic dependence and should consult the semantic type when computing
semantic dependence.
Warner Losh [Sat, 2 Jul 2022 19:39:24 +0000 (13:39 -0600)]
pselect(2): Document what a null pointer for the signalmask means
When pselect is passed a null pointer for the signal mask, the standard
says it shall behave like select (except for the different timeout
arg). Make a note of that here.
Bjoern A. Zeeb [Tue, 28 Jun 2022 00:02:17 +0000 (00:02 +0000)]
arm64: NXP add LS1088a clockgen support
Add a driver for NXP LS1088a clockgen support which passes
configuration information to QorIQ clockgen class.
The implementaiton started off as copy of ls1028 support and was
adjusted accordingly.
Warner Losh [Sat, 2 Jul 2022 14:01:09 +0000 (08:01 -0600)]
amd64/efi: Remove setting hints for rsdp
Given that hints set this way don't work when a static kenv is compiled
into the kernel. acpi.rsdp has been set for this for the past 6 years,
and all kernels in that time have used it in preference to the hints. As
such, we no longer hints.*, so remove them.
Warner Losh [Sat, 2 Jul 2022 14:01:02 +0000 (08:01 -0600)]
amd64/efi: Stop falling back to hints for RSDP
All boot loaders for the last 6 years set acpi.rsdp in addition to the
hints. This was planned for removal ~5 years ago. Belatedly remove it
from here.
Warner Losh [Sat, 2 Jul 2022 14:00:40 +0000 (08:00 -0600)]
loader: Set preferred kenv for acpi.rsdp on arm64
Several years ago, x86 moved from using hints to communicate this
information to using the simpler acpi.rsdp variables. If one compiles
static hints into the kernel, then these hints are ignored. We can
remove this when we branch FreeBSD 15. Thought about BURN_BRIDGES
here, but it's too messy.
Warner Losh [Wed, 29 Jun 2022 14:26:58 +0000 (08:26 -0600)]
tty: Default to printing kernel stack traceback only on INVARIANT kernels
Change the default from printing a breif kernel thread stack informaton
back to omitting it for non-invariant kernels in response to
SIGINFO/^T. Full and brief stack support can be selected with the
kern.tty_info_kstacks sysctl.
Rick Macklem [Fri, 1 Jul 2022 21:43:17 +0000 (14:43 -0700)]
mount_nfs: Warn that intr, soft are not safe for NFSv4
If the "intr" and/or "soft" mount options are used for
NFSv4 mounts, the protocol can be broken when the
operation returns without waiting for the RPC reply.
The likelyhood of failure increases for NFSv4.1/4.2
mounts, since the session slot will be broken when
an RPC reply is not processed.
This is mentioned in the BUGS section of "man mount_nfs",
but more needs to be done. This patch adds code that
generates a warning message when the mount is done.
Warner Losh [Fri, 1 Jul 2022 17:22:38 +0000 (11:22 -0600)]
MIMIMAL: add uart
While uart could be detected completely through plug and play means, add
it here for two reasons. First, we don't do that from the loader, so
it's not available as a console. Second, even if we did do it from the
loader, there's a limitation in the system today that console drivers
must be compiled into the kernel because the console is selected before
external modules are linked into the kernel. Adding it only increases
the kernel size by ~14k as well.
Sponsored by: Netflix
Idea liked by: des, rpokala, brooks, jhb