Nathan Whitehorn [Fri, 28 May 2021 13:53:42 +0000 (09:53 -0400)]
Fix scripted installation from media without local distfiles.
The bsdinstall script target did not have the infrastructure to fetch
distfiles from a remote server the way the interactive installer does
on e.g. bootonly media. Solve this by factoring out the parts of the
installer that deal with fetching missing distributions into a new
install stage called 'fetchmissingdists', which is called by both the
interactive and scripted installer frontends.
In the course of these changes, cleaned up a few other issues with
the fetching of missing distribution files and added a warning if
fetching the MANIFEST file, which is used to verify the integrity of
the distribution files. We should at some point add cryptographic
signatures to MANIFEST so that it can be fetched safely if not present
on the install media (which it is for bootonly media).
Rick Macklem [Fri, 28 May 2021 02:08:36 +0000 (19:08 -0700)]
nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based of file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens.
This commit should not affect the high level semantics of open
handling.
Mark Johnston [Thu, 27 May 2021 19:49:59 +0000 (15:49 -0400)]
ktrace: Fix a race with fork()
ktrace(2) may toggle trace points in any of
1. a single process
2. all members of a process group
3. all descendents of the processes in 1 or 2
In the first two cases, we do not permit the operation if the process is
being forked or not visible. However, in case 3 we did not enforce this
restriction for descendents. As a result, the assertions about the child
in ktrprocfork() may be violated.
Move these checks into ktrops() so that they are applied consistently.
Allow KTROP_CLEAR for nascent processes. Otherwise, there is a window
where we cannot clear trace points for a nascent child if they are
inherited from the parent.
Mark Johnston [Thu, 27 May 2021 19:49:32 +0000 (15:49 -0400)]
kevent: Prohibit negative change and event list lengths
Previously, a negative change list length would be treated the same as
an empty change list. A negative event list length would result in
bogus copyouts. Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.
Reviewed by: imp, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30480
Mark Johnston [Thu, 27 May 2021 19:49:12 +0000 (15:49 -0400)]
ktrace: Handle negative array sizes in ktrstructarray
ktrstructarray() may be used to create copies of kevent(2) change and
event arrays. It is called before parameter validation is done and so
should check for bogus array lengths before allocating a copy.
Reported by: syzkaller
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30479
Eric van Gyzen [Thu, 27 May 2021 16:33:22 +0000 (11:33 -0500)]
libprocstat kstack: fix race with thread creation
When collecting kernel stacks for a target process, if the process
adds a thread between the two calls to sysctl, ignore the additional
threads. Previously, procstat would print only a useless error
message. Now, it prints a consistent snapshot of the stacks.
We know that snapshot is already stale, but it could still be stale
even with a more complex fix to reallocate and retry, so such a fix
is hardly worth the effort.
Randall Stewart [Thu, 27 May 2021 14:50:32 +0000 (10:50 -0400)]
tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reasm() code and that way the FIN will be stripped and all will be well.
Bjoern A. Zeeb [Wed, 26 May 2021 17:55:21 +0000 (17:55 +0000)]
LinuxKPI: netdevice.h remove more ifnet operating macros
Now that mlx4 and ofed either are operating on ifnet functions
directly or have a private copy of these macros, we can remove them
from linux/netdevice.h.
With this only the #define for net_device to ifnet is left.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30478
Bjoern A. Zeeb [Wed, 26 May 2021 17:51:24 +0000 (17:51 +0000)]
OFED: migrate LinuxKPI net_device/ifnet macros into ofed
The LinuxKPI net_device actually is an ifnet; in order to further
clean that up so we can extend "net_device" migrate the few macros
left into ofed and make sure the header is included in all files
which need access to the macros.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30477
Navdeep Parhar [Thu, 27 May 2021 02:18:42 +0000 (19:18 -0700)]
cxgbe(4): Fix an incorrect assert.
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
hwpmc: Move 4 bits of mode to extend class size to 8
Since r289025 we have had at least 5 bits class size.
Before that it was even 16 bits, but macro handling conversion between
pmcid and set of CPU, MODE, CLASS, ROWINDEX still use 4 bits class size
and 8 bits mode size.
This breaks some libpmc API methods, like pmc_capabilities.
Since we only have 4 modes and MODE field is a number (not a bitfield)
this patch moves 4 bits of mode to extend the CLASS field.
tcp: Use local CC data only in the correct context
Most CC algos do use local data, and when calling
newreno_cong_signal from there, the latter misinterprets
the data as its own struct, leading to incorrect behavior.
Reported by: chengc_netapp.com
Reviewed By: chengc_netapp.com, tuexen, #transport
MFC after: 3 days
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30470
Lutz Donnerhacke [Sun, 23 May 2021 17:48:13 +0000 (19:48 +0200)]
tests/libalias: Improve testing
gettimeofday(3) is almost as expensive as the calls to libalias.
So the call frequency for this call is reduced by a factor of 1000 in
order to neglect it's influence.
Using NAT entries became more realistic: A communication of a random
length of up to 150 packets (10% outgoing, 90% incoming) is applied
for each entry.
Precision of the execution time is raised to see the trends better.
Andrew Gallatin [Wed, 26 May 2021 13:54:26 +0000 (09:54 -0400)]
cxgbe: fix enabling lro & rxtimestamps
A recent change caused iq flags, like LRO, to be set before
init_iq(). However, init_iq() clears those flags, so they
became effectively impossible to set. This change moves
the initializion of these flags to after the call to init_iq().
This fixes LRO.
Kristof Provost [Tue, 18 May 2021 07:24:50 +0000 (09:24 +0200)]
pf: Move nvlist conversion functions to pf_nv
Separate the conversion functions (between kernel structs and nvlists)
to pf_nv. This reduces the size of pf_ioctl.c, which is already quite
large and complex, a good bit. It also keeps all the fairly
straightforward conversion code together.
Eugene Grosbein [Wed, 26 May 2021 11:30:24 +0000 (18:30 +0700)]
rc.d/random: add support for zero harvest_mask
Replace the check for zero harvest_mask with new check for empty string.
This allows one to specify harvest_mask="0" that disables harversting
entropy from all but "pure" sources. Exact bit values for "pure" sources
differ for stable/12 and later branches, so it is handy to use zero.
The check for zero pre-dates introduction of "pure" non-maskable sources
Use empty string to disable altering sysctl kern.random.harvest.mask.
Note that notion of "pure" random sources is not documented in user level
manual pages yet. Still, it helps to extend battery life for hardware
with embedded "Intel Secure Key RNG" by disabling all other sources.
Note that no defaults changed and default behaviour is not affected.
Randall Stewart [Wed, 26 May 2021 10:43:30 +0000 (06:43 -0400)]
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
a long time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that
lowering this value to something more in line with todays delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.
Dmitry Chagin [Wed, 26 May 2021 05:34:32 +0000 (08:34 +0300)]
linux_common: retire extra module version.
The second 'linuxcommon' line was added by c66f5b079d2a259c3a65b1efe0f2143cd030dc52
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
John Baldwin [Tue, 25 May 2021 23:59:19 +0000 (16:59 -0700)]
crypto: Add crypto_cursor_segment() to fetch both base and length.
This function combines crypto_cursor_segbase() and
crypto_cursor_seglen() into a single function. This is mostly
beneficial in the unmapped mbuf case where back to back calls of these
two functions have to iterate over the sub-components of unmapped
mbufs twice.
Bump __FreeBSD_version for crypto drivers in ports.
John Baldwin [Tue, 25 May 2021 23:59:19 +0000 (16:59 -0700)]
Assume OCF is the only KTLS software backend.
This removes support for loadable software backends. The KTLS OCF
support is now always included in kernels with KERN_TLS and the
ktls_ocf.ko module has been removed. The software encryption routines
now take an mbuf directly and use the TLS mbuf as the crypto buffer
when possible.
Bump __FreeBSD_version for software backends in ports.
John Baldwin [Tue, 25 May 2021 23:59:18 +0000 (16:59 -0700)]
crypto: Add a new type of crypto buffer for a single mbuf.
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).
Rick Macklem [Tue, 25 May 2021 21:19:29 +0000 (14:19 -0700)]
nfscl: Use hash lists to improve expected search performance for opens
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently. When searching for an open,
usually for a match by file handle, a linear search of all opens
is done.
Commit 3f7e14ad9345 added a hash table of lists hashed on file handle
for the opens. This patch uses the hash lists for searching for
a matching open based of file handle instead of an exhaustive
linear search of all opens.
This change appears to be performance neutral for a small number
of opens, but should improve expected performance for a large
number of opens. This patch also moves any found match to the front
of the hash list, to try and maintain the hash lists in recently
used ordering (least recently used at the end of the list).
This commit should not affect the high level semantics of open
handling.
Bjoern A. Zeeb [Tue, 25 May 2021 17:37:15 +0000 (17:37 +0000)]
Bump __FreeBSD_version to 1400015 for LinuxKPI changes.
Commits 17accc08ae15 and de102f870501 add new files to LinuxKPI
which break drm-kmod. In addition various other additions where
comitted. Bump __FreeBSD_version to 1400015 to be able to detect this.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Bjoern A. Zeeb [Mon, 24 May 2021 17:54:16 +0000 (17:54 +0000)]
LinuxKPI: byteorder.h
Add a few more le<n>_{tp,add}_cpu*() #defines/functions found in
wireless drivers. While here fill most of the combinatorics gaps
and also add the remaining combinations [1].
Suggested by: emaste [1] (for one part)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30418
Bjoern A. Zeeb [Mon, 24 May 2021 18:09:37 +0000 (18:09 +0000)]
LinuxKPI: add ether_addr_equal_unaligned()
Replace the implementation for ether_addr_equal() with
ether_addr_equal_unaligned() and add a define for ether_addr_equal()
pointing to the now ether_addr_equal_unaligned() implementation.
This way ether_addr_equal_unaligned() cannot be broken by accident [1].
Suggested by: emaste [1]
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30425
Bjoern A. Zeeb [Mon, 24 May 2021 18:14:37 +0000 (18:14 +0000)]
LinuxKPI: add irq_set_affinity_hint()
Add an implementation for irq_set_affinity_hint() to linux/interrupt.h
and include linux/hardirq.h for synchronize_irq() as needed by
wireless drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30427
Bjoern A. Zeeb [Mon, 24 May 2021 18:17:30 +0000 (18:17 +0000)]
LinuxKPI: add linux/{ip,tcp,udp}.h
Add header files for struct and accessors for IPv4, UDP, and TCP.
Only parts of the fields of the structs have been seen while working
on wireless drivers. The remaining field names are filled up with
the FreeBSD field names for now. If you have insights into their
correct naming in Linux, feel free to adjust.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30428
Bjoern A. Zeeb [Mon, 24 May 2021 18:26:41 +0000 (18:26 +0000)]
LinuxKPI: change BUILD_BUG_ON()
BUILD_BUG_ON() can be used inside functions where the definition to
CTASSERT() (_Static_assert()) seems to not work.
Go back to an old-style CTASSERT() implementation but also add a
variable dclaration to avoid "unsued typedef" errors and dummy-use
the variable to avoid "unusued variable" errors. Given it is all
self-contained in a block and not used outside this should be
optimised away.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30431
Bjoern A. Zeeb [Mon, 24 May 2021 18:40:42 +0000 (18:40 +0000)]
LinuxKPI: add rcu_dereference_check()
Add a define for rcu_dereference_check() to rcu_dereference_protected()
which ignores the check argument. Our lockdep compat implementation
for use cases found in iwlwifi would return 1 anyway.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30436
Bjoern A. Zeeb [Mon, 24 May 2021 18:53:28 +0000 (18:53 +0000)]
LinuxKPI: extract stringify() in their own header file
Add linux/stringify.h as directly included by drivers. Remove the
definitions from compiler.h and include the new header in places
where the stringify macros are already used without linuxkpi.
Randall Stewart [Tue, 25 May 2021 17:23:31 +0000 (13:23 -0400)]
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map, this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.