Gleb Smirnoff [Sat, 4 Dec 2021 17:49:36 +0000 (09:49 -0800)]
ifnet: make V_if_index static to if.c
This requires moving net.link.generic sysctl declaration from if_mib.c
to if.c. Ideally if_mib.c needs just to be merged to if.c, but they
have different license texts.
Gleb Smirnoff [Sat, 4 Dec 2021 17:49:36 +0000 (09:49 -0800)]
ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch. Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit. Mark that with a comment.
Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.
Gleb Smirnoff [Sat, 4 Dec 2021 17:49:35 +0000 (09:49 -0800)]
ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic
Now it is possible to just merge all this complexity into single
linear function. Note that IFNET_WLOCK() is a sleepable lock, so
we can M_WAITOK and epoch_wait_preempt().
Warner Losh [Mon, 6 Dec 2021 17:23:23 +0000 (10:23 -0700)]
nvme_sim: Only report PCI related stats when we can
For AHCI attached devices, we report the location and identification
information of the AHCI controller that we're attached to. We also
don't reprot link speed in that case, since we can't get to the PCIe
config space registers to find that out.
Warner Losh [Mon, 6 Dec 2021 17:23:14 +0000 (10:23 -0700)]
nvd: For AHCI attached devices, report ahci bridge
When an NVME device is attached via a AHCI controller, we have no access
to its config space. So instead of information about the nvme drive
itself, return info about the AHCI controller as the next best
thing. Since the Intel Hardware RAID support looks at these values, this
likely is best.
Warner Losh [Mon, 6 Dec 2021 17:23:06 +0000 (10:23 -0700)]
nvme_ahci: Mark AHCI devices as such in the controller
Add a quirk to flag AHCI attachment to the controller. This is for any
of the strategies for attaching nvme devices as children of the AHCI
device for Intel's RAID devices. This also has a side effect of cleaning
up resource allocation from failed nvme_attach calls now.
Warner Losh [Mon, 6 Dec 2021 17:22:50 +0000 (10:22 -0700)]
nvme: Move to a quirk for the Intel alignment data
Prior to NVMe 1.3, Intel produced a series of drives that had
performance alignment data in the vendor specific space since no
standard had been defined. Move testing the versions to a quick so the
NVMe NS code doesn't know about PCI device info.
Kristof Provost [Mon, 6 Dec 2021 17:15:24 +0000 (18:15 +0100)]
dummynet tests: disable for now
Disable the dummynet tests when running the ci tests. This avoids
running into the panic described in https://reviews.freebsd.org/D33064
(where an interface is removed but a dummynet queued packet still has a
pointer to it).
These tests can be re-enabled when the work in
https://reviews.freebsd.org/D33267 lands.
Mark Johnston [Mon, 6 Dec 2021 15:42:19 +0000 (10:42 -0500)]
x86: Perform late TSC calibration before LAPIC timer calibration
This ensures that LAPIC calibration is done using the correct tsc_freq
value, i.e., the one associated with the TSC timecounter. It does mean
though that TSC calibration cannot use sbinuptime() to read the
reference timecounter, as timehands are not yet set up.
Reviewed by: kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33209
Mark Johnston [Mon, 6 Dec 2021 15:42:10 +0000 (10:42 -0500)]
x86: Defer LAPIC calibration until after timecounters are available
This ensures that we have a good reference timecounter for performing
calibration.
Change lapic_setup to avoid configuring the timer when booting, and move
calibration and initial configuration to a new lapic routine,
lapic_calibrate_timer. This calibration will be initiated from
cpu_initclocks(), before an eventtimer is selected.
Reviewed by: kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33206
Mark Johnston [Mon, 6 Dec 2021 15:37:49 +0000 (10:37 -0500)]
libdwarf: Support consumption of compressed ELF sections
Automatically decompress zlib-compressed debug sections when loading
them. This lets ctfcovert work on userland code after commit c910570e7573 ("Use compressed debug in standalone userland debug files
by default").
Reported by: avg
Reviewed by: avg, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33139
Randall Stewart [Mon, 6 Dec 2021 14:56:09 +0000 (09:56 -0500)]
tcp: rack fails to send out a TLP after a MTU change
When rack sends out a TLP it sets up various state to make sure
it avoids the cwnd (its been more than 1 RTT since our last send) and
it may at times send new data. If an MTU change as occurred
and our cwnd has collapsed we can have a situation where must_retran
flag is set and we obey the cwnd thus never sending the TLP and then
sitting stuck.
This one line fix addresses that problem
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33231
Mitchell Horne [Mon, 6 Dec 2021 14:09:54 +0000 (10:09 -0400)]
ucom: s/sio/ucom/
Seems like a copy-paste error, or at least this made more sense when the
sio(4) driver still existed. This modifies the debug port name displayed
at boot, but otherwise has no functional change.
Reviewed by: hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33278
Kristof Provost [Thu, 2 Dec 2021 17:47:40 +0000 (18:47 +0100)]
pf tests: more thorough pfsync defer test
Add a somewhat more extensive pfsync defer mode test. Ensure that pfsync
actually delays the state creating packet until after it has sent the
pfsync update and given the peer time to create the state.
Ideally the test should validate the pfsync state update and generate an
ack message, but to keep the test simple we rely on the timeout of the
deferred packet instead.
Kristof Provost [Thu, 2 Dec 2021 17:39:23 +0000 (18:39 +0100)]
pfsync: fix incorrect enabling of defer mode
When we exposed the PFSYNCF_OK flag to userspace in 5f5bf88949d we
unintentionally caused defer mode to always be enabled.
The ioctl check only looked for nonzero, not for the PFSYNCF_DEFER flag.
Kristof Provost [Thu, 2 Dec 2021 16:42:56 +0000 (17:42 +0100)]
pfsync: locking fixes
* Ensure we unlock the pfsync lock in pfsync_defer()
* We must hold the bucket lock when calling pfsync_push()
* The pfsync_defer_tmo() callout locks the bucket lock, not the pfsync
lock
Kristof Provost [Thu, 2 Dec 2021 16:30:36 +0000 (17:30 +0100)]
pfsync: fix defer timeout
Don't use a fixed number of ticks, but take hz into account so we have a
consistent timeout, regardless of what hz is set up.
Use a 20ms timeout, becaues that's what OpenBSD uses.
Emmanuel Vadot [Wed, 1 Dec 2021 15:13:09 +0000 (16:13 +0100)]
fb: Add new FBTYPE_EFIFB
Currently the type isn't set in the fbtype struct so any userland
program that call the FBIOGTYPE ioctl will think it's a FBTYPE_SUN1BW
which is far from the truth.
No app that I found find checks the type but at least now it's correct.
Emmanuel Vadot [Wed, 1 Dec 2021 10:57:42 +0000 (11:57 +0100)]
fb: Remove some unused ioctls
6d1699583d7e added the FBIOGXINFO,FBIOMONINFO and FBIOPUTCMAPI/FBIOGETCMAPI
ioctls and said that implementation in driver will come later.
Since it was in 2001 I think we can remove this.
Emmanuel Vadot [Wed, 1 Dec 2021 10:53:03 +0000 (11:53 +0100)]
fb: Remove unused FBIOVERTICAL ioctl
Commit 6d1699583d7e added the FBIOVERTICAL ioctl and said that implementation
in driver will come later.
Since it was in 2001 I think we can remove this.
Andriy Gapon [Mon, 6 Dec 2021 07:59:28 +0000 (09:59 +0200)]
vmxnet3: skip zero-length descriptor in the middle of a packet
Passing up such descriptors to iflib is obviously wasteful.
But the main conern is that we may overrun iri_frags array because of
them. That's been observed in practice.
Also, assert that the number of fragments / descriptors / segments is
less than IFLIB_MAX_RX_SEGS.
Andreas Wetzel [Mon, 6 Dec 2021 07:21:38 +0000 (09:21 +0200)]
rtwn/usb: add product ID for Asus USB N10 Nano Rev. B1
According to information found on the internet the following products
use exactly the same hardware but probably different USB IDs:
- Edimax EW-7811Un V2 (v2)
- Edimax EW-7811GLN 2.0A (v2)
I am not adding them as I cannot verify.
Warner Losh [Mon, 6 Dec 2021 05:57:50 +0000 (22:57 -0700)]
cam/iosched: fix off by one error
Set the bucket size to be SBT_1US/50000 + 1 to be the first number >
20us. I had this uncommitted in my three when I pushed 2283206935b8
since kern.cam.iosched.bucket_base_us was reporting 19us.
Warner Losh [Mon, 6 Dec 2021 04:54:42 +0000 (21:54 -0700)]
cam-iosched: Publish parameters of the latency buckets
Add sysctls to publish the latency bucket size, number, and stride. Move
to putting all the iosched stuff under kern.cam.iosched as well and move
kern.cam.do_dynamic_iosched to kern.cam.iosched.dynamic. In addition, move
kern.cam.io_sched_alpha_bits to kern.cam.iosched.alpha_bits. Publish
kern.cam.iosched.bucket_base (the smallest bucket time), .bucket_ratio
(the geometric progression factor * 100), and .buckets (the total number
of buckets).
Move to publishing 20 buckets, starting at 20us. This allows us to get
better resolution on the short end and detect preformance degredation of
the NVMe drives I've tested on, even with the uncertainty of bucketing.
Alan Cox [Sun, 5 Dec 2021 23:40:53 +0000 (17:40 -0600)]
amd64/pmap: fix user page table page accounting
When a superpage mapping is destroyed and the original page table page
containing 4KB mappings that was being held in reserve is deallocated,
the recently introduced user page table page count was not being
decremented. Consequentially, the count was wrong and would grow over
time. For example, after multiple iterations of "buildworld", I was
seeing implausible counts, like the following:
Stefan Eßer [Sun, 5 Dec 2021 21:27:33 +0000 (22:27 +0100)]
sys/bitset.h: reduce visibility of BIT_* macros
Add two underscore characters "__" to names of BIT_* and BITSET_*
macros to move them to the implementation name space and to prevent
a name space pollution due to BIT_* macros in 3rd party programs with
conflicting parameter signatures.
These prefixed macro names are used in kernel header files to define
macros in e.g. sched.h, sys/cpuset.h and sys/domainset.h.
If C programs are built with either -D_KERNEL (automatically passed
when building a kernel or kernel modules) or -D_WANT_FREENBSD_BITSET
(or this macros is defined in the source code before including the
bitset macros), then all macros are made visible with their previous
names, too. E.g., both __BIT_SET() and BIT_SET() are visible with
either of _KERNEL or _WANT_FREEBSD_BITSET defined.
The main reason for this change is that some 3rd party sources
including sched.h have been found to contain conflicting BIT_*
macros.
As a work-around, parts of shed.h have been made conditional and
depend on _WITH_CPU_SET_T being set when sched.h is included.
Ports that expect the full functionality provided by sched.h need
to be built with -D_WITH_CPU_SET_T. But this leads to conflicts if
BIT_* macros are defined in that program, too.
This patch set makes all of sched.h visible again without this
parameter being passed and without any name space pollution due
to BIT_* macros becoming visible when sched.h is included.
This patch set will be backported to the STABLE branches, but ports
will need to use -D_WITH_CPU_SET_T as long as there are supported
releases that do not contain these patches.
Gleb Smirnoff [Sun, 5 Dec 2021 16:47:24 +0000 (08:47 -0800)]
in_pcb: delay crfree() down into UMA dtor
inpcb lookups, which check inp_cred, work with pcbs that potentially went
through in_pcbfree(). So inp_cred should stay valid until SMR guarantees
its invisibility to lookups.
While here, put the whole inpcb destruction sequence of in_pcbfree(),
inpcb_dtor() and inpcb_fini() sequentially.
Dimitry Andric [Sun, 5 Dec 2021 17:54:13 +0000 (18:54 +0100)]
Apply fix for clang crashing on invalid -Wa,-march= values
Merge commit df08b2fe8b35 from llvm git (by Dimitry Andric):
[AArch64] Avoid crashing on invalid -Wa,-march= values
As reported in https://bugs.freebsd.org/260078, the gnutls Makefiles
pass -Wa,-march=all to compile a number of assembly files. Clang does
not support this -march value, but because of a mistake in handling
the arguments, an unitialized Arg pointer is dereferenced, which can
cause a segfault.
Work around this by adding a check if the local WaMArch variable is
initialized, and if so, using its value in the diagnostic message.
Mark Johnston [Sun, 5 Dec 2021 15:45:12 +0000 (10:45 -0500)]
ng_ubt: Avoid attaching to several newer Intel controllers
Like other Intel controllers, these require firmware to be loaded, and
generic ng_ubt attach causes them to lock up until a power cycle.
However, their firmware interface for querying version info and loading
operational firmware is different from that implemented by ng_ubt_intel
and iwmbtfw, so they are not usable yet. Just disable attach for now to
avoid stalls during USB device enumeration.
PR: 260161
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Mitchell Horne [Sun, 5 Dec 2021 15:11:55 +0000 (11:11 -0400)]
x86: remove unused T_USER flag
It stopped being used in 3c256f5395aa, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33253
Rick Macklem [Sat, 4 Dec 2021 22:38:55 +0000 (14:38 -0800)]
nfsd: Fix Verify for attributes like FilesAvail
When the Verify operation calls nfsv4_loadattr(), it provides
the "struct statfs" information that can be used for doing a
compare for FilesAvail, FilesFree, FilesTotal, SpaceAvail,
SpaceFree and SpaceTotal. However, the code erroneously
used the "struct nfsstatfs *" argument that is NULL.
This patch fixes these cases to use the correct argument
structure. For the case of FilesAvail, the code in
nfsv4_fillattr() was factored out into a separate function
called nfsv4_filesavail(), so that it can be called from
nfsv4_loadattr() as well as nfsv4_fillattr().
In fact, most of the code in nfsv4_filesavail() is old
OpenBSD code that does not build/run on FreeBSD, but I
left it in place, in case it is of some use someday.
I am not aware of any extant NFSv4 client that does Verify
on these attributes.
swapoff(2): replace special device name argument with a structure
For compatibility, add a placeholder pointer to the start of the
added struct swapoff_new_args, and use it to distinguish old vs. new
style of syscall invocation.
Reviewed by: markj
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33165
Florian Walpen [Sat, 4 Dec 2021 16:17:29 +0000 (18:17 +0200)]
MAC/priority module for realtime privilege group
This is a MAC policy module that grants scheduling privileges based on
group membership. Users or processes in the group realtime (gid 47) are
allowed to run threads and processes with realtime scheduling priority.
For timing-sensitive, low-latency software like audio/jack, running with
realtime priority helps to avoid stutter and gaps.
Gleb Smirnoff [Sat, 4 Dec 2021 18:05:46 +0000 (10:05 -0800)]
nhop: hash ifnet pointer instead of if_index
Yet another problem created by VIMAGE/if_vmove/epair design that
relocates ifnet between vnets and changes if_index. Since if_index
changes, nhop hash values also changes, unlink_nhop() isn't able to
find entry in hash and leaks the nhop. Since nhop references ifnet,
the latter is also leaked. As result running network tests leaks
memory on every single test that creates vnet jail.
While here, rewrite whole hash_priv() to use static initializer,
per Alexander's suggestion.
Michael Tuexen [Sat, 4 Dec 2021 14:00:05 +0000 (15:00 +0100)]
tcpdrop: allow TCP connections to be filtered by cc-algo
In addition to filtering by stack and state, allow filtering
by the congestion control module used. Choose the command line
options to be consistent with the ones of sockstat.
Cy Schubert [Wed, 20 Oct 2021 03:11:40 +0000 (20:11 -0700)]
ipfilter: Avoid more null if-then-else blocks
As in 73db3b64f167, when WITHOUT_INET6 is selected, null if-then-else
blocks are generated because #if statements are incorrectly placed.
Moving the #if statements reduces unnecessary runtime comparisons or
compiler optimizations.
Cy Schubert [Tue, 5 Oct 2021 04:26:58 +0000 (21:26 -0700)]
ipfilter: Add DTrace SDT probe
Add an SDT probe, using the newly created DT5 macro, in similar vein
to DEBUG_PARSE printf for when FTP junk is anticipated and ok. This
will assist in debugging port (active) FTP proxy issues.
Kristof Provost [Thu, 2 Dec 2021 07:22:34 +0000 (08:22 +0100)]
if_pflog: fix packet length
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).
The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).
Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.
Robert Wing [Fri, 3 Dec 2021 23:22:23 +0000 (14:22 -0900)]
ipsec: fix a panic with INVARIANTS
When adding an SPD entry that already exists, a refcount wraparound
panic is encountered. This was caused from dropping a reference on the
wrong security policy.
Fixes: 4920e38fecc3 ("ipsec: fix race condition in key.c")
Reviewed by: wma
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D33100