Andrew Fengler [Wed, 12 May 2021 01:59:10 +0000 (01:59 +0000)]
Add support for adding default routes for other FIBs
Make rc.d/routing read defaultrouter_fibN and ipv6_defaultrouter_fibN, and
set it as the default gateway for FIB N, where N is from 1 to (net.fibs - 1)
This allows adding gateways for multiple FIBs in the same format as the main
gateway. (FIB 0)
Robert Wing [Thu, 11 Mar 2021 19:27:43 +0000 (10:27 -0900)]
libvmm: explicitly save and restore errno in vm_open()
In commit 6bb140e3ca895a14, vm_destroy() was replaced with free() to
preserve errno. However, it's possible that free() may change the errno
as well. Keep the free() call, but explicitly save and restore errno.
Kirk McKusick [Tue, 11 May 2021 21:51:06 +0000 (14:51 -0700)]
Ensure that files with no allocated blocks are trimmed to zero length.
UFS does not allow files to end with a hole; it requires that the
last block of a file be allocated. As fsck_ffs(8) initially scans
each allocated inode, it tracks the last allocated block in the
inode. It then checks that the inode's size falls in the last
allocated block. If the last allocated block falls before the size,
a `file size beyond end of allocated file' warning is issued and
the file is shortened to reference the last allocated block (to avoid
having it reference a hole at its end). If the last allocated block
falls after the size, a `partially truncated file' warning is issued
and all blocks following the block referenced by the size are freed.
Because of an incorrect unsigned comparison, this test was failing
to handle files with no allocated blocks but non-zero size (which
should have had their size reset to zero). Once that was fixed the
test started incorrectly complaining about short symbolic links
that place the link path in the inode rather than in a disk block.
Because these symbolic links have a non-zero size, but no allocated
blocks, fsck_ffs wanted to zero out their size. This patch has to
detect and avoid changing the size of such symbolic links.
Mark Johnston [Tue, 11 May 2021 21:36:12 +0000 (17:36 -0400)]
cryptodev: Fix some input validation bugs
- When we do not have a separate IV, make sure that the IV length
specified by the session is not larger than the payload size.
- Disallow AEAD requests without a separate IV. crp_sanity() asserts
that CRYPTO_F_IV_SEPARATE is set for AEAD requests, and some (but not
all) drivers require it.
- Return EINVAL for AEAD requests if an IV is specified but the
transform does not expect one.
Roger Pau Monné [Tue, 11 May 2021 10:19:29 +0000 (12:19 +0200)]
xen/blkback: fix reconnection of backend
The hotplug script will be executed only once for each backend,
regardless of the frontend triggering reconnections. Fix blkback to
deal with the hotplug script being executed only once, so that
reconnections don't stall waiting for a hotplug script execution
that will never happen.
As a result of the fix move the initialization of dev_mode, dev_type
and dev_name to the watch callback, as they should be set only once
the first time the backend connects.
This fix is specially relevant for guests wanting to use UEFI OVMF
firmware, because OVMF will use Xen PV block devices and disconnect
afterwards, thus allowing them to be used by the guest OS. Without
this change the guest OS will stall waiting for the block backed to
attach.
Fixes: de0bad00010c ('blkback: add support for hotplug scripts')
MFC after: 1 week
Sponsored by: Citrix Systems R&D
Randall Stewart [Tue, 11 May 2021 12:15:05 +0000 (08:15 -0400)]
tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However
there is a snafu here in that if tp->srtt is 0 is the only time that
the HC will actually restore the srtt. We need to then only convert
the srtt restored when it is actually restored. We do this by making
sure it was zero before the call to cc_conn_init and it is non-zero
afterwards.
Wojciech Macek [Fri, 23 Apr 2021 03:57:03 +0000 (05:57 +0200)]
mroute: fix race condition during mrouter shutting down
There is a race condition between V_ip_mrouter de-init
and ip_mforward handling. It might happen that mrouted
is cleaned up after V_ip_mrouter check and before
processing packet in ip_mforward.
Use epoch call aproach, similar to IPSec which also handles
such case.
There are two module declarations in the nfscl.ko module for "nfscl"
and "nfs". Both of these declarations had MODULE_DEPEND() calls.
This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid
confusion with respect to what modules this module is dependent upon.
The patch also adds comments explaining why there are two module
declarations within the module.
Mark Johnston [Tue, 11 May 2021 00:18:00 +0000 (20:18 -0400)]
vfs: Fix error handling in vn_fullpath_hardlink()
vn_fullpath_any_smr() will return a positive error number if the
caller-supplied buffer isn't big enough. In this case the error must be
propagated up, otherwise we may copy out uninitialized bytes.
Reported by: syzkaller+KMSAN
Reviewed by: mjg, kib
MFC aftr: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30198
rtld: preserve the 'seen' state of the dlerror message in errmsg_save()
rtld preserves its current error message around calls to user init/fini
lists, to not override original error with potential secondary errors
caused by user code recursing into rtld. After 4d9128da54f8f8e2a29190,
the preservation of the string itself is not enough, the 'seen'
indicator must be preserved as well. Otherwise, since new code does not
clear string (it cannot), call to _rtld_error() from errmsg_restore()
revived whatever message was consumed last.
Change errmsg_save() to return structure recording both 'seen' indicator
and the message, if any.
PR: 255698
Reported by: Eugene M. Kim <astralblue@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
It reopens the passed file descriptor, checking the file backing vnode'
current access rights against open mode. In particular, this flag allows
to convert file descriptor opened with O_PATH, into operable file
descriptor, assuming permissions allow that.
Reviewed by: markj
Tested by: Andrew Walker <awalker@ixsystems.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30148
Randall Stewart [Mon, 10 May 2021 15:25:51 +0000 (11:25 -0400)]
tcp:Host cache and rack ending up with incorrect values.
The hostcache up to now as been updated in the discard callback
but without checking if we are all done (the race where there are
more than one calls and the counter has not yet reached zero). This
means that when the race occurs, we end up calling the hc_upate
more than once. Also alternate stacks can keep there srtt/rttvar
in different formats (example rack keeps its values in microseconds).
Since we call the hc_update *before* the stack fini() then the
values will be in the wrong format.
Rack on the other hand, needs to convert items pulled from the
hostcache into its internal format else it may end up with
very much incorrect values from the hostcache. In the process
lets commonize the update mechanism for srtt/rttvar since we
now have more than one place that needs to call it.
Kristof Provost [Tue, 4 May 2021 17:23:15 +0000 (19:23 +0200)]
in6_mcast: Return EADDRINUSE when we've already joined the group
Distinguish between truly invalid requests and those that fail because
we've already joined the group. Both cases fail, but differentiating
them allows userspace to make more informed decisions about what the
error means.
For example. radvd tries to join the all-routers group on every SIGHUP.
This fails, because it's already joined it, but this failure should be
ignored (rather than treated as a sign that the interface's multicast is
broken).
This puts us in line with OpenBSD, NetBSD and Linux.
sbin/ipfw: Fix parsing error in table based forward
The argument parser does not recognise the optional port for an
"tablearg" argument. Fix simplifies the code by make the internal
representation expicit for the parser.
Rick Macklem [Sat, 8 May 2021 00:30:56 +0000 (17:30 -0700)]
nfscl: Add support for va_birthtime to NFSv4
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.
Randall Stewart [Fri, 7 May 2021 21:32:32 +0000 (17:32 -0400)]
This takes Warners suggested approach to making it so that
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).
Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30163
Fix panic when trying to delete non-existent gateway in multipath route.
IF non-existend gateway was specified, the code responsible for calculating
an updated nexthop group, returned the same already-used nexthop group.
After the route table update, the operation result contained the same
old & new nexthop groups. Thus, the code responsible for decomposing
the notification to the list of simple nexthop-level notifications,
was not able to find any differences. As a result, it hasn't updated any
of the "simple" notification fields, resulting in empty rtentry pointer.
This empty pointer was the direct reason of a panic.
Fix the problem by returning ESRCH when the new nexthop group is the same
as the old one after applying gateway filter.
Reported by: Michael <michael.adm at gmail.com>
PR: 255665
MFC after: 3 days
This allows us to kill states created from a rule with route-to/reply-to
set. This is particularly useful in multi-wan setups, where one of the
WAN links goes down.
Submitted by: Steven Brown
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30058
Mark Johnston [Fri, 7 May 2021 18:27:58 +0000 (14:27 -0400)]
divert: Fix mbuf ownership confusion in div_output()
div_output_outbound() and div_output_inbound() relied on the caller to
free the mbuf if an error occurred. However, this is contrary to the
semantics of their callees, ip_output(), ip6_output() and
netisr_queue_src(), which always consume the mbuf. So, if one of these
functions returned an error, that would get propagated up to
div_output(), resulting in a double free.
Fix the problem by making div_output_outbound() and div_output_inbound()
responsible for freeing the mbuf in all cases.
Reported by: Michael Schmiedgen <schmiedgen@gmx.net>
Tested by: Michael Schmiedgen
Reviewed by: donner
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30129
Mark Johnston [Fri, 7 May 2021 18:24:37 +0000 (14:24 -0400)]
linker_set: Disable ASAN only in userspace
KASAN does not insert redzones around global variables and so is not
susceptible to the problem that led to us disabling ASAN for linker set
elements in the first place (see commit fe3d8086fb6f).
Reviewed by: andrew, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30126
Randall Stewart [Fri, 7 May 2021 18:06:43 +0000 (14:06 -0400)]
Fix a UDP tunneling issue with rack. Basically there are two
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
in place.
and
B) Not enough memory is allocated in racks fsb. We need to
overbook the fsb to include a udphdr just in case.
Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157
This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways. Replace (almost) all such places with
proper methods.
Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.
This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
Fedor Uporov [Thu, 18 Feb 2021 07:48:10 +0000 (10:48 +0300)]
Improve extents verification logic.
It is possible to walk thru inode extents if EXT2FS_PRINT_EXTENTS
macro is defined. The extents headers magics and physical blocks
ranges are checked during extents walk.
Andriy Gapon [Fri, 7 May 2021 07:17:57 +0000 (10:17 +0300)]
storvsc: fix auto-sense reporting
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Fedor Uporov [Thu, 18 Feb 2021 08:26:50 +0000 (11:26 +0300)]
Add chr/blk devices support.
The dev field is placed into the inode structure.
The major/minor numbers conversion to/from linux compatile
format happen during on-disk inodes writing/reading.
Marcin Wojtas [Fri, 7 May 2021 01:45:22 +0000 (03:45 +0200)]
sdhci_fsl_fdt: specify base clk divisor per SoC
Only LS1046A and LS1028A require the base clk to be divided by 2.
Implement that by moving the divider to a SoC specific data.
This commit fixes base clk setup for the entire SoC family,
including the already suported LS2160A.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30120
Warner Losh [Thu, 6 May 2021 22:20:19 +0000 (16:20 -0600)]
headers: Implement _ISOC11_SOURCES macro when __POSIX_C_SOURCE defined
When _ISOC11_SOURCES is defined for glibc at the same time
__POSIX_C_SOURCE is defined, it extends the __POSIX_C_SOURCE definition
by exaclty what C11 adds to the spec for each system header. We follow
both OpenBSD's and glibc's convention by also C11 or higher compliation
mode is selected.
The Open Group is working on issuing a new version of the POSIX standard
that will realign the standard from C99 to a newer version of C. This
commit is a stop-gap measure for greater compatibility until that
environment has been standardized.
Warner Losh [Thu, 6 May 2021 19:05:09 +0000 (13:05 -0600)]
boot: fix OBJS to not include BTX's crt0.o
According to comments in the Makefile, to make pxeboot work we need to
have crt0.o first. This is needed because the simplified loader in
pxeboot assumes that the startup code is at offset 0 in this binary. In
normal booting, the start address can be obtained from headers of the
binary, but since pxeboot encodes this as a pure binary, it has no way
of knowing where that is and assumes 0. Added comments to that effect
in the Makefile.
We've done this by adding it to OBJS before all the other .o's are
added. However, there's a problem. This also adds it to the CLEANFILES
variable, which causes it to be removed from multiple places. The
dependencies may also cause it to be re-built at a time that's after
boot2 is built. This causes installs to fail because at install time
boot2 is considered to be out of date and the programs to rebuild it are
no longer in the path.
Cope with this problem by just adding it to LDFLAGS instead.
Glanced at by: kevans ("I thought that went in ages ago")
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D28876
Andriy Gapon [Thu, 6 May 2021 18:49:37 +0000 (21:49 +0300)]
PCI hot-plug: use dedicated taskqueue for device attach / detach
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
Alan Somers [Thu, 22 Apr 2021 21:09:03 +0000 (15:09 -0600)]
gmultipath: make physpath distinct from the underlying providers'
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot. Currently gmultipath passes through its underlying providers'
physical path attribute. That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk. That would
be bad.
This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.
Randall Stewart [Thu, 6 May 2021 15:22:26 +0000 (11:22 -0400)]
This brings into sync FreeBSD with the netflix versions of rack and bbr.
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..