Marcin Wojtas [Fri, 12 Feb 2021 15:41:49 +0000 (16:41 +0100)]
Disable PIE for MIPS ubldr
When performing buildworld for MIPS with PIE enabled, the build fails
with "position-independent code requires '-mabicalls'" message.
-mno-abicalls and -fno-pic flags are explicitly set in MIPS ubldr
makefile, so to work around this problem, set MK_PIE=no for MIPS
ubldr.
Nathan Whitehorn [Wed, 24 Feb 2021 15:31:44 +0000 (10:31 -0500)]
Add GPT PREP-boot type to mkimg(1) from geom_gpt.
This partition type can be used to boot some PowerKVM VMs. We don't
support it well because of some limitations in SLOF, but it's worth at
least have feature parity in geom and mkimg.
Mark Johnston [Wed, 24 Feb 2021 15:08:53 +0000 (10:08 -0500)]
iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
PR: 253583
Reviewed by: gallatin, erj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
softdep_prealloc() must be called to ensure enough journal space is
available, before ffs_extwrite(). Also it must be done before taking
ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping
the vnode lock.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ffs_lock_ea is after the vnode lock, so vnode must not be relocked under
lock_ea. Move ffs_truncate() call in ffs_close_ea() after the lock_ea is
dropped, and only truncate to length zero, since this is the only mode
supported by ffs_truncate() for EAs. Previously code did truncation and
then write.
Zero the part of the ext area that is unused, if truncation is due but not
done because ea area is not zero-length.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ffs: do not call softdep_prealloc() from UFS_BALLOC()
Do it in ffs_write(), where we can gracefuly handle relock and its
consequences. In particular, recheck the v_data to see if the vnode
reclamation ended, and return EBADF when we cannot proceed with the
write.
Reviewed by: mckusick
Reported by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Cy Schubert [Wed, 24 Feb 2021 05:12:49 +0000 (21:12 -0800)]
rc: fix parse of $local_startup
77e1ccbee3ed6c837929e4e232fd07f95bfc8294 introduced parallel execution
of rc. It separated groups with line feeds (\n) and elements within
groups using spaces. This is a natural separation due to rcorder
using spaces and lines to separate elements within groups with groups
of services separated by line feeds.
77e1ccbee3ed6c837929e4e232fd07f95bfc8294 parses the output from rcorder
by setting $IFS. However it failed to reset $IFS to default ' \t\n'
prior to calling find_local_scripts_new(), causing find_local_scripts_new()
to fail parsing $local_startup for site-specific local rc scripts, i.e.
${LOCALBASE}/etc/rc.d. This caused daemons from ports and packages such
as postfix, dovecot, nut, and others in ${LOCALBASE} not to be started.
Nathan Whitehorn [Wed, 24 Feb 2021 03:17:20 +0000 (22:17 -0500)]
Delete memstick images for PowerPC.
These images only ever worked on Apple Powermacs, which are now a very
old platform, and did so only for a very loose definition of "worked"
(they booted on a small subset of supported machines). Moreover, all
the machines they *did* boot on also would boot from a memstick made
by dd'ing an CD image to a flash drive. Since a flash drive prepared
in this way would also boot all the newer systems we support, the
memstick images were strictly less functional than the CD images, even
for booting from memory sticks.
Reviewed by: jhibbits
MFC after: 1 week
Mark Johnston [Wed, 24 Feb 2021 02:15:50 +0000 (21:15 -0500)]
rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
Nathan Whitehorn [Tue, 23 Feb 2021 21:16:52 +0000 (16:16 -0500)]
Mount the EFI system partition (ESP) on newly-installed systems.
Per hier(7), the ESP will be mounted at /boot/efi. On UFS systems,
any existing ESP will be reused and mounted there; otherwise, a new one
will be made. On ZFS systems, space for an ESP is allocated on all disks
in the root pool, but only the partition actually used to boot is set up
and mounted.
This makes future upgrades of the EFI loader easier (upgrade scripts can
just change /boot/efi) and also greatly simplifies the parts of the
installer involved in initialization of the ESP. It also makes the
installer's behavior correspond to the documentation in hier(7).
Dimitry Andric [Tue, 23 Feb 2021 20:03:32 +0000 (21:03 +0100)]
Build lib/msun tests with compiler builtins disabled
This forces the compiler to emit calls to libm functions, instead of
possibly substituting pre-calculated results at compile time, which
should help to actually test those functions.
Allan Jude [Tue, 23 Feb 2021 20:06:16 +0000 (20:06 +0000)]
iicsmb: Request the bus recursively in bread()
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.
Warner Losh [Tue, 23 Feb 2021 19:33:26 +0000 (12:33 -0700)]
camcontrol: change hueristic for I/O-less devtype
Some SATA drives have 'config' set to 0 in the identify block. Rather than rely
on it, use the strings windows uses to display the drive since they are supposed
to be space padded and will always be non-zero.
release(7): Remove stray references to DOC* variables
We now live in the world of git, and release(7) should reflect that.
As of the commit referenced below, release images also no longer
include (stale) documentation, as the documentation has moved to
AsciiDoctor. This means that a few environment variables no longer
make sense, so remove them from their sections and mention them in
the compatibility section instead.
While here, also pet mandoc.
PR: 253615
MFC after: 3 days
MFC with: f61e92ca5a23 release: permanently remove the 'reldoc'
target and associates
Roger Pau Monné [Tue, 23 Feb 2021 14:56:27 +0000 (15:56 +0100)]
stand/multiboot2: fix header length check
Check whether we have reached the end of the buffer using search_size
instead of MULTIBOOT_SEARCH, which is the maximum defined by the
specification, but the file can be shorter than that.
This prevents printing a harmless error message when loading a file
that is smaller than MULTIBOOT_SEARCH.
Sponsored by: Citrix Systems R&D
MFC after: 3 days Fixes: adda2797eb2a ('stand/multiboot2: add support for booting a Xen dom0 in UEFI mode')
Kristof Provost [Mon, 22 Feb 2021 07:19:43 +0000 (08:19 +0100)]
arp/nd: Cope with late calls to iflladdr_event
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.
In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.
Kristof Provost [Sun, 21 Feb 2021 20:20:32 +0000 (21:20 +0100)]
bridge: Remove members when assigned to a new vnet
When the bridge is moved to a different vnet we must remove all of its
member interfaces (and span interfaces), because we don't know if those
will be moved along with it. We don't want to hold references to
interfaces not in our vnet.
Kristof Provost [Sat, 20 Feb 2021 09:11:30 +0000 (10:11 +0100)]
bridge: Support STP on VLAN devices
VLAN devices have type IFT_L2VLAN, so the STP code mistakenly believed
they couldn't be used for STP. That's not the case, so add the
ITF_L2VLAN to the check.
Jamie Gritton [Tue, 23 Feb 2021 01:04:06 +0000 (17:04 -0800)]
jail: Don't allow jails under dying parents
If a jail is created with jail_set(...JAIL_DYING), and it has a parent
currently in a dying state, that will bring the parent jail back to
life. Restrict that to require that the parent itself be explicitly
brought back first, and not implicitly created along with the new
child jail.
Simplify ifa/ifp refcounting in the routing stack.
The routing stack control depends on quite a tree of functions to
determine the proper attributes of a route such as a source address (ifa)
or transmit ifp of a route.
When actually inserting a route, the stack needs to ensure that ifa and ifp
points to the entities that are still valid.
Validity means slightly more than just pointer validity - stack need guarantee
that the provided objects are not scheduled for deletion.
Currently, callers either ignore it (most ifp parts, historically) or try to
use refcounting (ifa parts). Even in case of ifa refcounting it's not always
implemented in fully-safe manner. For example, some codepaths inside
rt_getifa_fib() are referencing ifa while not holding any locks, resulting in
possibility of referencing scheduled-for-deletion ifa.
Instead of trying to fix all of the callers by enforcing proper refcounting,
switch to a different model.
As the rib_action() already requires epoch, do not require any stability guarantees
other than the epoch-provided one.
Use newly-added conditional versions of the refcounting functions
(ifa_try_ref(), if_try_ref()) and fail if any of these fails.
Add if_try_ref() to simplify refcount handling inside epoch.
When we have an ifp pointer and the code is running inside epoch,
epoch guarantees the pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug
This can happen with the current implementation of ifnet_byindex_ref(),
as we're not holding any locks preventing ifnet deletion by a parallel thread.
To address it, add if_try_ref(), allowing to return failure when
referencing ifp with refcount=0.
Additionally, enforce existing if_ref() is with KASSERT to provide a
cleaner error in such scenarios.
Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
if the latter fails.
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
Toomas Soome [Sun, 21 Feb 2021 10:32:18 +0000 (12:32 +0200)]
loader: autoload_font will hung loader when there is no local console
If we start with console set to comconsole, the local
console (vidconsole, efi) is never initialized and attempt to
use the data can render the loader hung.
Mark Johnston [Mon, 22 Feb 2021 20:50:09 +0000 (15:50 -0500)]
vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import(). This
results in a boot-time hang on such platforms.
Jamie Gritton [Mon, 22 Feb 2021 20:27:44 +0000 (12:27 -0800)]
jail: Add PD_KILL to remove a prison in prison_deref().
Add the PD_KILL flag that instructs prison_deref() to take steps
to actively kill a prison and its descendents, namely marking it
PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
attached processes.
This replaces a similar loop in sys_jail_remove(), bringing the
operation under the same single hold on allprison_lock that it already
has. It is also used to clean up failed jail (re-)creations in
kern_jail_set(), which didn't generally take all the proper steps.
Dimitry Andric [Mon, 22 Feb 2021 20:01:09 +0000 (21:01 +0100)]
Fix possibly unitialized variables in __cxa_demangle_gnu3()
After 0ee0dbfb0d26cf4bc37f24f12e76c7f532b0f368 where I imported a more
recent libcxxrt snapshot, the variables 'rtn' and 'has_ret' could in
some cases be used while still uninitialized. Most obviously this would
lead to a jemalloc complaint about a bad free(), aborting the program.
Fix this by initializing a bunch variables in their declarations. This
change has also been sent upstream, with some additional changes to be
used in their testing framework.
Cy Schubert [Tue, 16 Feb 2021 15:44:07 +0000 (07:44 -0800)]
ipfilter: Make LARGE_NAT a tunable.
LARGE_NAT is a C macro that increases
NAT_SIZE from 127 to 2047,
RDR_SIZE from 127 to 2047,
HOSTMAP_SIZE from 2047 to 8191,
NAT_TABLE_MAX from 30000 to 180000, and
NAT_TABLE_SZ from 2047 to 16383.
These values can be altered at runtime using the ipf -T command however
some adminstrators of large firewalls rebuild the kernel to enable
LARGE_NAT at boot. This revision adds the tunable net.inet.ipf.large_nat
which allows an administrator to set this option at boot instead of build
time. Setting the LARGE_NAT macro to 1 is unaffected allowing build-time
users to continue using the old way.
Cy Schubert [Fri, 12 Feb 2021 15:17:32 +0000 (07:17 -0800)]
Remove the redundant ipfilter IPv6 rc rules load.
As of ipfilter 5.1.2 the IPv4 and IPv6 rules tables have been merged.
The ipf(8) -6 option has been a NOP since then. Currently the additional
ipf -6 load statement in rc.d/ipfilter simply added the second ipfilter
rules file to the table already populated by the previous ipf command.
Plenty of time has passed since ipfilter 5.1.2 was imported. It is time to
remove the option from rc.conf and the rc script.
The makefs msdosfs code includes fs/msdosfs/denode.h which directly uses
struct buf from <sys/buf.h> rather than the makefs struct m_buf.
To work around this problem provide a local denode.h that includes
ffs/buf.h and defines buf as an alias for m_buf.
Alex Richardson [Mon, 22 Feb 2021 17:19:06 +0000 (17:19 +0000)]
Update libm tests from NetBSD
I did this without a full vendor update since that would cause too many
conflicts. Since these files now almost match the NetBSD sources the
next git subtree merge should work just fine.
Fix for natd(8) sending wrong sequence number after TCP retransmission,
terminating a TCP connection.
If a TCP packet must be retransmitted and the data length has changed in the
retransmitted packet, due to the internal workings of TCP, typically when ACK
packets are lost, then there is a 30% chance that the logic in GetDeltaSeqOut()
will find the correct length, which is the last length received.
This can be explained as follows:
If a "227 Entering Passive Mode" packet must be retransmittet and the length
changes from 51 to 50 bytes, for example, then we have three cases for the
list scan in GetDeltaSeqOut(), depending on how many prior packets were
received modulus N_LINK_TCP_DATA=3:
case 1: index 0: original packet 51
index 1: retransmitted packet 50
index 2: not relevant
case 2: index 0: not relevant
index 1: original packet 51
index 2: retransmitted packet 50
case 3: index 0: retransmitted packet 50
index 1: not relevant
index 2: original packet 51
This patch simply changes the searching order for TCP packets, always starting
at the last received packet instead of any received packet, in
GetDeltaAckIn() and GetDeltaSeqOut().
Roger Pau Monné [Wed, 20 Jan 2021 18:40:51 +0000 (19:40 +0100)]
xen-blkback: fix leak of grant maps on ring setup failure
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
Mark Johnston [Mon, 22 Feb 2021 15:03:37 +0000 (10:03 -0500)]
m_uiotombuf_nomap(): Stop clearing PG_ZERO in newly allocated pages
The caller should not be passing M_ZERO in the first place, so PG_ZERO
will not be preserved by the page allocator and clearing it accomplishes
nothing.
Reviewed by: gallatin, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28808
Notable upstream changes: 778869fa1 Fix reporting of mount progress e7adccf7f Disable use of hardware crypto offload drivers on FreeBSD 03e02e5b5 Fix checksum errors not being counted on repeated repair 64e0fe14f Restore FreeBSD resource usage accounting 11f2e9a49 Fix panic if scrubbing after removing a slog device
Alexander Motin [Sun, 21 Feb 2021 21:45:14 +0000 (16:45 -0500)]
Refactor CTL datamove KPI.
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
Jamie Gritton [Sun, 21 Feb 2021 21:24:47 +0000 (13:24 -0800)]
jail: Add pr_state to struct prison
Rather that using references (pr_ref and pr_uref) to deduce the state
of a prison, keep track of its state explicitly. A prison is either
"invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
(pr_uref == 0).
State transitions are generally tied to the reference counts, but with
some flexibility: a new prison is "invalid" even though it now starts
with a reference, and jail_remove(2) sets the state to "dying" before
the user reference count drops to zero (which was prviously
accomplished via the PR_REMOVE flag).
pr_state is protected by both the prison mutex and allprison_lock, so
it has the same availablity guarantees as the reference counts do.
Notable changes:
- fix reporting of mount progress (778869fa1)
- disable use of hardware crypto offload drivers on FreeBSD (e7adccf7f)
- fix checksum errors not being counted on repeated repair (03e02e5b5)
- restore FreeBSD resource usage accounting (64e0fe14f)
- fix panic if scrubbing after removing a slog device (11f2e9a49)
Jamie Gritton [Sun, 21 Feb 2021 18:55:44 +0000 (10:55 -0800)]
jail: Change the locking around pr_ref and pr_uref
Require both the prison mutex and allprison_lock when pr_ref or
pr_uref go to/from zero. Adding a non-first or removing a non-last
reference remain lock-free. This means that a shared hold on
allprison_lock is sufficient for prison_isalive() to be useful, which
removes a number of cases of lock/check/unlock on the prison mutex.
Expand the locking in kern_jail_set() to keep allprison_lock held
exclusive until the new prison is valid, thus making invalid prisons
invisible to any thread holding allprison_lock (except of course the
one creating or destroying the prison). This renders prison_isvalid()
nearly redundant, now used only in asserts.
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.
Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition. Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D28679
Mateusz Guzik [Sun, 21 Feb 2021 00:42:26 +0000 (00:42 +0000)]
amd64: implement strlen in assembly, take 2
Tested with glibc test suite.
The C variant in libkern performs excessive branching to find the zero
byte instead of using the bsfq instruction. The same code patched to use
it is still slower than the routine implemented here as the compiler
keeps neglecting to perform certain optimizations (like using leaq).
On top of that the routine can be used as a starting point for copyinstr
which operates on words intead of bytes.
The previous attempt had an instance of swapped operands to andq when
dealing with fully aligned case, which had a side effect of breaking the
code for certain corner cases. Noted by jrtc27.
Jamie Gritton [Sat, 20 Feb 2021 22:38:58 +0000 (14:38 -0800)]
jail: Improve locking when removing prisons
Change the flow of prison_deref() so it doesn't let go of allprison_lock
until it's completely done using it (except for a possible drop as part
of an upgrade on its first try).
Differential Revision: https://reviews.freebsd.org/D28458
MFC after: 3 days