Dmitry Chagin [Wed, 26 May 2021 05:34:32 +0000 (08:34 +0300)]
linux_common: retire extra module version.
The second 'linuxcommon' line was added by c66f5b079d2a259c3a65b1efe0f2143cd030dc52
but Linuxulator's modules dependend on 'linux_common'.
To avoid such mistakes in the future rename moduledata name and module
name to 'linux_common' and retire 'linuxcommon' line.
netgraph/ng_base: Renaming a node to the same name is a noop
Detailed analysis in https://github.com/genneko/freebsd-vimage-jails/issues/2
brought the problem down to a double call of ng_node_name() before and
after a vnet move. Because the name of the node is already known
(occupied by itself), the second call fails.
PR: 241954
Reported by: Paul Armstrong
Differential Revision: https://reviews.freebsd.org/D30110
John Baldwin [Tue, 20 Apr 2021 20:22:11 +0000 (13:22 -0700)]
etcupdate: Gracefully handle SIGINT when building trees.
Run the 'build_tree' function inside of a subshell and trap SIGINT to
return an error to the caller. This allows callers to gracefully
cleanup a partially created tree.
While here, redirect stdout/stderr of the subshell to the log file
instead of applying redirections individually to each command executed
while building the tree.
John Baldwin [Tue, 20 Apr 2021 20:21:42 +0000 (13:21 -0700)]
etcupdate: Always extract to a temporary tree.
etcupdate has had a somewhat nasty race condition since its creation
in that its state machine can get very confused if it is interrupted
while building the tree to compare against. This is exacerbated by
the fact that etcupdate doesn't emit any output while building the
tree which can take several seconds (especially in recent years with
the addition of the tree-wide buildconfig/installconfig passes).
To mitigate this, always install a new tree into a temporary directory
created via mktemp as was previously done only for dry-runs via -n.
The existing trees are only rotated and the new tree installed as
/var/db/etcupdate/current after the update command has completed.
David Bright [Mon, 24 May 2021 19:02:43 +0000 (14:02 -0500)]
pciconf: Fix up pciconf -lc output
The pciconf command fails to emit newlines when particular ecap field
values are seen. Fix them up. This has been seen on several systems at
$JOB. The documentation for PCI capabilities says that capability
type 0 should not be used once the spec for PCI capabilities was
published, but that seems more wishful-thinking than reality. pciconf
also chooses not to print fields related to field values that are
zero, but it seems several of these fields are zero on actual
hardware.
Sponsored by: Dell EMC Isilon
Submitted by: Robert Herndon (Robert.Herndon@dell.com)
David Bright [Mon, 24 May 2021 17:12:15 +0000 (12:12 -0500)]
libsa: Fix infinite loop in bzipfs & gzipfs
A bug in the loader's bzipfs & gzipfs filesystems caused compressed
kernel and modules not to work on EFI systems with a veriexec-enabled
loader. Since the size of files in these filesystems are not known
_a priori_ `stat` would initialize the size to -1 and the loader would
then hang in an infinite loop while trying to seek (read) to the end
of file since the loop termination condition compares the current
offset to that negative target position.
rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some
unintended effects for in-kernel TCP consumers, the preexisting lock
on the socket send buffer was not considered properly.
Found by: markj
MFC after: 2 weeks
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D30390
r367492 would unlock the socket buffer before eventually calling the upcall.
This leads to problematic interaction with NFS kernel server/client components
(MP threads) accessing the socket buffer with potentially not correctly updated
state.
Randall Stewart [Fri, 4 Jun 2021 09:26:43 +0000 (05:26 -0400)]
tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing
some of the "improved" SYN tests, since we are not in the correct states.
With more digging I have figured out the root of the problem is that when
we receive a SYN|FIN the reassembly code made it so we create a segq entry
to hold the FIN. In the established state where we were not in order this
would be correct i.e. a 0 len with a FIN would need to be accepted. But
if you are in a front state we need to strip the FIN so we correctly handle
the ACK but ignore the FIN. This gets us into the proper states
and avoids the previous ack war.
I back out some of the previous changes but then add a new change
here in tcp_reass() that fixes the root cause of the issue. We still
leave the rack panic fixes in place however.
Randall Stewart [Thu, 27 May 2021 14:50:32 +0000 (10:50 -0400)]
tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr).
However there was a missing case, i.e. where we get an out-of-order FIN by itself.
In such a case we don't want to leave the FIN bit set, otherwise we will do the
wrong thing and ack the FIN incorrectly. Instead we need to go through the
tcp_reasm() code and that way the FIN will be stripped and all will be well.
Randall Stewart [Wed, 26 May 2021 10:43:30 +0000 (06:43 -0400)]
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back
a long time when delayed ack timers were longer and links were slower. A
200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that
lowering this value to something more in line with todays delayed ack values (40ms)
might improve TCP. This bit of code makes it so rack can, via a socket option,
adjust the timer slop.
Randall Stewart [Tue, 25 May 2021 17:23:31 +0000 (13:23 -0400)]
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed
in the last commit. The problem is the left edge gets transmitted before the adjustments are done
to the send_map, this means that right edge bits must be considered to be added only if
the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns
out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself.
After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically
what happens is we go into the reassembly code and lose the FIN bit. The trick here is we
should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything.
That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no
timers running. This is because the usrclosed function gets called and the FIN's and such have
already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get
stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly
recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE
before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2
timer can speed this up in testing.
Randall Stewart [Mon, 24 May 2021 18:42:15 +0000 (14:42 -0400)]
tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to
the right edge. The FIN bit was incorrectly on the left edge. We
fix these two issues as well as plumb in the mtu_change for
alternate stacks.
Michael Tuexen [Sat, 22 May 2021 12:35:09 +0000 (14:35 +0200)]
tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user
switches a stack such that processing the stack specific socket
option does not make sense anymore. Return an error in this case.
Michael Tuexen [Fri, 21 May 2021 07:45:00 +0000 (09:45 +0200)]
tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in
https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605,
the length of IP level options was considered when locating the
transport header. This was incorrect and is fixed by this patch.
Randall Stewart [Thu, 13 May 2021 11:36:04 +0000 (07:36 -0400)]
tcp: Incorrect KASSERT causes a panic in rack
Skyzall found an interesting panic in rack. When a SYN and FIN are
both sent together a KASSERT gets tripped where it is validating that
a mbuf pointer is in the sendmap. But a SYN and FIN often will not
have a mbuf pointer. So the fix is two fold a) make sure that the
SYN and FIN split the right way when cloning an RSM SYN on left
edge and FIN on right. And also make sure the KASSERT properly
accounts for the case that we have a SYN or FIN so we don't
panic.
Reviewed by: mtuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D30241
Randall Stewart [Tue, 11 May 2021 12:15:05 +0000 (08:15 -0400)]
tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However
there is a snafu here in that if tp->srtt is 0 is the only time that
the HC will actually restore the srtt. We need to then only convert
the srtt restored when it is actually restored. We do this by making
sure it was zero before the call to cc_conn_init and it is non-zero
afterwards.
Randall Stewart [Mon, 10 May 2021 15:25:51 +0000 (11:25 -0400)]
tcp:Host cache and rack ending up with incorrect values.
The hostcache up to now as been updated in the discard callback
but without checking if we are all done (the race where there are
more than one calls and the counter has not yet reached zero). This
means that when the race occurs, we end up calling the hc_upate
more than once. Also alternate stacks can keep there srtt/rttvar
in different formats (example rack keeps its values in microseconds).
Since we call the hc_update *before* the stack fini() then the
values will be in the wrong format.
Rack on the other hand, needs to convert items pulled from the
hostcache into its internal format else it may end up with
very much incorrect values from the hostcache. In the process
lets commonize the update mechanism for srtt/rttvar since we
now have more than one place that needs to call it.
Randall Stewart [Fri, 7 May 2021 21:32:32 +0000 (17:32 -0400)]
This takes Warners suggested approach to making it so that
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).
Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30163
Randall Stewart [Fri, 7 May 2021 18:06:43 +0000 (14:06 -0400)]
Fix a UDP tunneling issue with rack. Basically there are two
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
in place.
and
B) Not enough memory is allocated in racks fsb. We need to
overbook the fsb to include a udphdr just in case.
Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157
Randall Stewart [Thu, 6 May 2021 15:22:26 +0000 (11:22 -0400)]
This brings into sync FreeBSD with the netflix versions of rack and bbr.
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..
Martin Matuska [Tue, 8 Jun 2021 15:01:18 +0000 (17:01 +0200)]
zfs: merge openzfs/zfs@7d9f3ef0e (zfs-2.1-release) into stable/13
Notable upstream pull request merges:
#11710 Allow zfs to send replication streams with missing snapshots
#11786 Ratelimit deadman zevents as with delay zevents
#11813 Allow pool names that look like Solaris disk names
#11822 Atomically check and set dropped zevent count
#11822 Don't scale zfs_zevent_len_max by CPU count
#11837 zfs get -p only outputs 3 columns if "clones" property is empty
#11849 Use dsl_scan_setup_check() to setup a scrub
#11861 Improvements to the 'compatibility' property
#11862 cmd/zfs receive: allow dry-run (-n) to check property args
#11864 receive: don't fail inheriting (-x) properties on wrong dataset type
#11877 Combine zio caches if possible
#11881 FreeBSD: use vnlru_free_vfsops if available
#11883 FreeBSD: add support for lockless symlink lookup
#11884 FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr
#11896 Fix crash in zio_done error reporting
#11905 zfs-send(8): Restore sorting of flags
#11926 FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
#11938 Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
#11966 Scale worker threads and taskqs with number of CPUs
#11997 FreeBSD: Don't force xattr mount option
#11997 FreeBSD: Use SET_ERROR to trace xattr name errors
#11998 Simplify/fix dnode_move() for dn_zfetch
#12003 FreeBSD: Initialize/destroy zp->z_lock
#12010 Fix dRAID self-healing short columns
#12033 Revert "Fix raw sends on encrypted datasets when copying back snapshots"
#12040 Reinstate the old zpool read label logic as a fallback
#12049 FreeBSD: avoid memory allocation in arc_prune_async
#12061 Fix dRAID sequential resilver silent damage handling
#12077 FreeBSD: Retry OCF ENOMEM errors.
#12088 Propagate vdev state due to invalid label corruption
#12097 FreeBSD: Update dataset_kstats for zvols in dev mode
Mark Johnston [Tue, 1 Jun 2021 23:38:22 +0000 (19:38 -0400)]
amd64: Clear the local TSS when creating a new thread
Otherwise it is copied from the creating thread. Then, if either thread
exits, the other is left with a dangling pointer, typically resulting in
a page fault upon the next context switch.
Reported by: syzkaller
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Markus Stoff [Tue, 18 May 2021 20:35:33 +0000 (22:35 +0200)]
ng_parse: IP address parsing in netgraph eating too many characters
Once the final component of the IP address has been parsed, the offset
on the input must not be advanced, as this would remove an unparsed
character from the input.
Submitted by: Markus Stoff
Reviewed by: donner
Differential Revision: https://reviews.freebsd.org/D26489
Randall Stewart [Tue, 26 Jan 2021 16:54:42 +0000 (11:54 -0500)]
This pulls over all the changes that are in the netflix
tree that fix the ratelimit code. There were several bugs
in tcp_ratelimit itself and we needed further work to support
the multiple tag format coming for the joint TLS and Ratelimit dances.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28357
Kristof Provost [Thu, 3 Jun 2021 13:22:19 +0000 (15:22 +0200)]
pf tests: Make killstate:match more robust
The killstate:match test starts nc as a background process. There was no
guarantee that the nc process would have connected by the time we check
for states, so this test occasionally failed without good reason.
Teach the test to wait for at least some states to turn up before
executing the critical checks.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
Lutz Donnerhacke [Sat, 15 May 2021 13:24:12 +0000 (15:24 +0200)]
libalias: Remove unused function LibAliasCheckNewLink
The functionality to detect a newly created link after processing a
single packet is decoupled from the packet processing. Every new
packet is processed asynchronously and will reset the indicator, hence
the function is unusable. I made a Google search for third party code,
which uses the function, and failed to find one.
That's why the function should be removed: It unusable and unused.
A much simplified API/ABI will remain in anything below 14.
Mark Johnston [Tue, 1 Jun 2021 23:38:09 +0000 (19:38 -0400)]
amd64: Relax the assertion added in commit 4a59cbc12
We only need to ensure that interrupts are disabled when handling a
fault from iret. Otherwise it's possible to trigger the assertion
legitimately, e.g., by copying in from an invalid address.
Fixes: 4a59cbc12
Reported by: pho
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Mark Johnston [Mon, 31 May 2021 22:49:33 +0000 (18:49 -0400)]
amd64: Avoid enabling interrupts when handling kernel mode prot faults
When PTI is enabled, we may have been on the trampoline stack when iret
faults. So, we have to switch back to the regular stack before
re-entering trap().
trap() has the somewhat strange behaviour of re-enabling interrupts when
handling certain kernel-mode execeptions. In particular, it was doing
this for exceptions raised during execution of iret. When switching
away from the trampoline stack, however, the thread must not be migrated
to a different CPU. Fix the problem by simply leaving interrupts
disabled during the window.
Reported by: syzbot+6cfa544fd86ad4647ffc@syzkaller.appspotmail.com
Reported by: syzbot+cfdfc9e5a8f28f11a7f5@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Mark Johnston [Mon, 31 May 2021 22:51:14 +0000 (18:51 -0400)]
x86: Fix lapic_ipi_alloc() on i386
The loop which checks to see if "dynamic" IDT entries are allocated
needs to compare with the trampoline address of the reserved ISR.
Otherwise it will never succeed.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Rich Ercolani [Wed, 2 Jun 2021 13:00:29 +0000 (13:00 +0000)]
vfs: fix MNT_SYNCHRONOUS check in vn_write
ca1ce50b2b5ef11d ("vfs: add more safety against concurrent forced
unmount to vn_write") has a side effect of only checking MNT_SYNCHRONOUS
if O_FSYNC is set.
Michael Tuexen [Sun, 2 May 2021 20:38:27 +0000 (22:38 +0200)]
sctp: improve error handling in INIT/INIT-ACK processing
When processing INIT and INIT-ACK information, also during
COOKIE processing, delete the current association, when it
would end up in an inconsistent state.
Michael Tuexen [Mon, 26 Apr 2021 08:38:05 +0000 (10:38 +0200)]
sctp: improve handling of illegal packets containing INIT chunks
Stop further processing of a packet when detecting that it
contains an INIT chunk, which is too small or is not the only
chunk in the packet. Still allow to finish the processing
of chunks before the INIT chunk.
Thanks to Antoly Korniltsev and Taylor Brandstetter for reporting
an issue with the userland stack, which made me aware of this
issue.
Ed Maste [Sun, 2 May 2021 19:28:36 +0000 (15:28 -0400)]
Restore Cirrus-CI boot smoke test
This reverts commit a7d593dd1da27833b5384349700bc3c7bcae6aad.
We now use compute_engine_instance which allows us to specify a custom
disk size. Also go back to using the default qemu version (rather than
qemu42 or qemu-devel) as any issues were fixed some time ago.
Reviewed by: lwhsu, markj
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30082
Merge commit f511dc75e4c1 from llvm git (by Mark Johnston):
[asan] Add an offset for the kernel address sanitizer on FreeBSD
This is based on a port of the sanitizer runtime to the FreeBSD kernel
that has been commited as https://cgit.freebsd.org/src/commit/?id=38da497a4dfcf1979c8c2b0e9f3fa0564035c147
and the following commits.
Reviewed By: emaste, dim
Differential Revision: https://reviews.llvm.org/D98285