Roger Pau Monné [Wed, 20 Jan 2021 18:40:51 +0000 (19:40 +0100)]
xen-blkback: fix leak of grant maps on ring setup failure
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
This is part of XSA-361.
Approved by: so
Security: CVE-2021-26932
Security: FreeBSD-SA-21:06.xen
Security: XSA-361
Sponsored by: Citrix Systems R&D
The existing logic is nice in theory, but in practice freebsd-update will
not preserve the timestamps on these files. When doing a major upgrade, e.g.
from 12.1-RELEASE -> 12.2-RELEASE, pwd.mkdb et al. appear in the INDEX and
we clobber the timestamp several times in the process of packaging up the
existing system into /var/db/freebsd-update/files and extracting for
comparisons. This leads to these files not getting regenerated when they're
most likely to be needed.
Measures could be taken to preserve timestamps, but it's unclear whether
the complexity and overhead of doing so is really outweighed by the marginal
benefit.
I observed this issue when pkg subsequently failed to install a package that
wanted to add a user, claiming that the user was removed in the process.
bapt@ pointed to this pre-existing bug with freebsd-update as the cause.
PR: 234014, 232921
Approved by: so
Security: FreeBSD-EN-21:08.freebsd-update
Jamie Gritton [Fri, 19 Feb 2021 22:13:35 +0000 (14:13 -0800)]
MFC jail: Change both root and working directories in jail_attach(2)
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.
Add a matching internal chdir operation to the jail's root. Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.
Jamie Gritton [Tue, 16 Feb 2021 19:19:13 +0000 (11:19 -0800)]
MFC jail: Handle a possible race between jail_remove(2) and fork(2)
jail_remove(2) includes a loop that sends SIGKILL to all processes
in a jail, but skips processes in PRS_NEW state. Thus it is possible
the a process in mid-fork(2) during jail removal can survive the jail
being removed.
Add a prison flag PR_REMOVE, which is checked before the new process
returns. If the jail is being removed, the process will then exit.
Also check this flag in jail_attach(2) which has a similar issue.
Roger Pau Monné [Wed, 25 Nov 2020 11:34:38 +0000 (12:34 +0100)]
xen: allow limiting the amount of duplicated pending xenstore watches
Xenstore watches received are queued in a list and processed in a
deferred thread. Such queuing was done without any checking, so a
guest could potentially trigger a resource starvation against the
FreeBSD kernel if such kernel is watching any user-controlled xenstore
path.
Allowing limiting the amount of pending events a watch can accumulate
to prevent a remote guest from triggering this resource starvation
issue.
For the PV device backends and frontends this limitation is only
applied to the other end /state node, which is limited to 1 pending
event, the rest of the watched paths can still have unlimited pending
watches because they are either local or controlled by a privileged
domain.
The xenstore user-space device gets special treatment as it's not
possible for the kernel to know whether the paths being watched by
user-space processes are controlled by a guest domain. For this reason
watches set by the xenstore user-space device are limited to 1000
pending events. Note this can be modified using the
max_pending_watch_events sysctl of the device.
This is XSA-349.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
Mark Johnston [Sun, 3 Jan 2021 16:32:30 +0000 (11:32 -0500)]
Ensure that dirent's d_off field is initialized
We have the d_off field in struct dirent for providing the seek offset
of the next directory entry. Several filesystems were not initializing
the field, which ends up being copied out to userland.
Alan Somers [Tue, 1 Dec 2020 15:15:18 +0000 (15:15 +0000)]
Fix error merging r354116 from OpenZFS
When we merged 4c0883fb4af0d5565459099b98fcf90ecbfa1ca1 from OpenZFS (svn
r354116), there were some merge conflicts. One of those was resolved
incorrectly, causing "zfs receive" to fail to delete snapshots that a "zfs
send -R" stream has deleted.
This change corrects that merge conflict, and also reduces some harmless
diffs vis-a-vis OpenZFS that were also introduced by the same revision.
Direct commit to stable/12 because head has moved on to OpenZFS.
MFC r368237: if: Fix panic when destroying vnet and epair simultaneously
When destroying a vnet and an epair (with one end in the vnet) we often
panicked. This was the result of the destruction of the epair, which destroys
both ends simultaneously, happening while vnet_if_return() was moving the
struct ifnet to its home vnet. This can result in a freed ifnet being re-added
to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
used.
Prevent this race by ensuring that vnet_if_return() cannot run at the same time
as if_detach() or epair_clone_destroy().
Glen Barber [Fri, 23 Oct 2020 00:00:52 +0000 (00:00 +0000)]
- Switch releng/12.2 from RC3 to RELEASE.
- Add the anticipated 12.2-RELEASE date to UPDATING. Fix
a missing colon in the previous UPDATING entry while here.
- Set a static __FreeBSD_version.
Approved by: re (implicit)
Sponsored by: Rubicon Communications, LLC (netgate.com)
Allan Jude [Thu, 15 Oct 2020 15:07:25 +0000 (15:07 +0000)]
ZFS: whitelist zstd and encryption in the loader
MFC r364787:
MFS r366593:
Please note that neither zstd nor encryption is
supported by the loader at this instant. This
change makes it safe to use those features in
one's root pool, but not in one's root dataset.
Warner Losh [Wed, 14 Oct 2020 01:47:00 +0000 (01:47 +0000)]
MFS12: r366422 r366588
r366588: fixes video display heuristic that prevented byhve and vmware
from detecting dual consoles.
r366422: Report the kernel console on the boot screen
Report what console the boot loader is telling the kernel to use and allow
toggling between them.
Michael Tuexen [Thu, 1 Oct 2020 18:17:56 +0000 (18:17 +0000)]
MFS r366324:
Improve the handling of receiving unordered and unreliable user
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associaitons not
making any user visible progress.
Thanks to Yutaka Takeda for reporting this issue for the the
userland stack in https://github.com/pion/sctp/issues/138.
MFS r366329:
Improve the input validation and processing of cookies.
This avoids setting the association in an inconsistent
state, which could result in a use-after-free situation.
This can be triggered by a malicious peer, if the peer
can modify the cookie without the local endpoint recognizing
it.
Thanks to Ned Williamson for reporting the issue.
John Baldwin [Thu, 1 Oct 2020 17:30:38 +0000 (17:30 +0000)]
MFS 366297: Revert most of r360179.
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
Rick Macklem [Tue, 29 Sep 2020 15:09:38 +0000 (15:09 +0000)]
MFS: r366238
Bjorn reported a problem where the Linux NFSv4.1 client is
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.
For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).
For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.
Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.
Stefan Eßer [Mon, 28 Sep 2020 14:47:36 +0000 (14:47 +0000)]
MF12 r366218:
Add documentation of the build options WITH_GH_BC and WITHOUT_GH_BC to
optionally replace the traditional implementation of bc(1) and dc(1) with
the new implementation that has become the default version in -CURRENT.
The man-page differs from the one in -CURRENT due to different default
values of that build option.
Alan Somers [Mon, 28 Sep 2020 00:23:59 +0000 (00:23 +0000)]
MF stable/12 r366190:
fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".
However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.
Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.
PR: 247276
Approved by: re (gjb)
Reported by: trapexit@spawn.link
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D26485
MFS r365987: certctl rehash upon install/distribute
r365829:
installworld: run `certctl rehash` after installation completes
This was originally introduced back in r360833, and subsequently reverted
because it was broken for -DNO_ROOT builds and it may not have been the
correct place for it.
While debatably this may still not be 'the correct place,' it's much cleaner
than scattering rehashes all throughout the tree. brooks has fixed the issue
with -DNO_ROOT by properly writing to the METALOG in r361397.
Do note that this is different than what was originally committed; brooks
had revisions in D24932 that made it actually use the revised unprivileged
mode and write to METALOG, along with being a little more friendly to
foreign crossbuilds and just using the certctl in-tree.
With this change, I believe we should now have a populated /etc/ssl/certs in
the VM images.
r365837:
Promote the installworld `certctl rehash` to distributeworld
Contrary to my belief, installworld is not sufficient for getting certs
installed into VM images. Promote the rehash to both installworld and
distributeworld (notably: not stageworld) and rehash the base distdir so we
end up with /etc/ssl/certs populated in the base dist archive. A future
commit will remove the rehash from bsdinstall, which doesn't really need to
happen if they're installed into base.txz.
While here, fix a minor typo: s/CERTCLTFLAGS/CERTCTLFLAGS/
r365852:
Revert r361257: bsdinstall: do a `certctl rehash` upon installation [...]
As of r365829, any given base distribution set will now include the /etc/ssl
symlinks that this rehash would've otherwise installed. This extra step is
no longer required.
Rick Macklem [Thu, 24 Sep 2020 16:21:30 +0000 (16:21 +0000)]
MFS: r366050, r366117
Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures.
The code in nfsrv_checksequence() and nfsrv_bindconnsess() would call
SVC_RELEASE() with mutex(es) held. Normally this is ok, since
all that happens is SVC_RELEASE() decrements the reference count.
However, if the socket has just been shut
down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep
lock during destruction of the server side krpc structure.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() and nfsrv_bindconnsess() down a few lines to
below where the mutex(es) are released.
Rick Macklem [Thu, 24 Sep 2020 14:59:10 +0000 (14:59 +0000)]
MFS: r365703
Fix a case where the NFSv4.0 server might crash if delegations are enabled.
asomers@ reported a crash on an NFSv4.0 server with a backtrace of:
kdb_backtrace
vpanic
panic
nfsrv_docallback
nfsrv_checkgetattr
nfsrvd_getattr
nfsrvd_dorpc
nfssvc_program
svc_run_internal
svc_thread_start
fork_exit
fork_trampoline
where the panic message was "docallb", which indicates that a callback
was attempted when the ClientID is unconfirmed.
This would not normally occur, but it is possible to have an unconfirmed
ClientID structure with delegation structure(s) chained off it if the
client were to issue a SetClientID with the same "id" but different
"verifier" after acquiring delegations on the previously confirmed ClientID.
The bug appears to be that nfsrv_checkgetattr() failed to check for
this uncommon case of an unconfirmed ClientID with a delegation structure
that no longer refers to a delegation the client knows about.
This patch adds a check for this case, handling it as if no delegation
exists, which is the case when the above occurs.
Although difficult to reproduce, this change should avoid the panic().
MFS r365667,r365920: extend kern.geom.part.check_integrity to work on GPT
There are multiple USB/SATA bridges on the market that unconditionally
cut some LBAs off connected media. This could be a problem
for pre-partitioned drives so GEOM complains and does not create
devices in /dev for slices/partitions preventing access to existing data.
We have a knob kern.geom.part.check_integrity that allows us to correct
partitioning if changed from default 1 to 0 but it works for MBR only.
If backup copy of GPT is unavailable due to decreased number of LBAs,
the kernel does not give access to partitions still and prints to dmesg:
GEOM: md0: corrupt or invalid GPT detected.
GEOM: md0: GPT rejected -- may not be recoverable.
This change makes it work for GPT too, so it created partitions in /dev
and prints to dmesg this instead:
GEOM: md0: the secondary GPT table is corrupt or invalid.
GEOM: md0: using the primary only -- recovery suggested.
Then "gpart recover" re-creates backup copy of GPT
and allows further manipulations with partitions.
This change is no-op for default configuration having
kern.geom.part.check_integrity=1
Allan Jude [Sat, 19 Sep 2020 20:46:56 +0000 (20:46 +0000)]
MFS r365689,r365808,r365860
MFOpenZFS: Introduce read/write kstats per dataset
The following patch introduces a few statistics on reads and writes
grouped by dataset. These statistics are implemented as kstats
(backed by aggregate sums for performance) and can be retrieved by
using the dataset objset ID number. The motivation for this change is
to provide some preliminary analytics on dataset usage/performance.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
openzfs/zfs@a448a2557ec4938ed6944c7766fe0b8e6e5f6456
Also contains parts of:
MFOpenZFS: Connect dataset_kstats for FreeBSD
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com>
openzfs/zfs@4547fc4e071ceb1818b3a46c3035b923e06e5390
Approved by: re (gjb)
Relnotes: yes
Sponsored by: Klara Inc.
The first issue was lack of quoting around INSTALLFLAGS, which set it
incorrectly and produced an error on -M.
The second issue was that we weren't actually doing the install in
unprivileged mode, making it effectively useless. This was designed to pass
through the proper metalog/unpriv flags to install(1), so just let it
happen.
MFS12 r365838:
MFC r365646, r365720:
r365646:
Enclose BRANCH_OVERRIDE in quotes in order to fix an issue with
freebsd-update(8) builds, where BRANCH is suffixed with -p0 for
builds.
r365720 (gordon):
Partially revert r346018 and use the if/then construct instead of
shell.
Approved by: re (kib)
Sponsored by: Rubicon Communications, LLC (netgate.com)
Turn MALLOC_PRODUCTION into a regular src.conf(5) option
For historical reasons, defining MALLOC_PRODUCTION in /etc/make.conf has
been used to turn off potentially expensive debug checks and statistics
gathering in the implementation of malloc(3).
It seems more consistent to turn this into a regular src.conf(5) option,
e.g. WITH_MALLOC_PRODUCTION / WITHOUT_MALLOC_PRODUCTION. This can then
be toggled similar to any other source build option, and turned on or
off by default for e.g. stable branches.
Follow-up r365371 by removing sentences which indicate the state of the
MK_MALLOC_PRODUCTION option on -CURRENT.
Also, for the sake of backwards compatibility, support the old way of
enabling 'production malloc', e.g. by adding a define in make.conf(5).
MF12 r365671:
Follow-up r365662 (MFC of r365371 and r365373) by correctly setting
WITH_MALLOC_PRODUCTION for stable branches. Also add a note to UPDATING,
to inform users about the new setting.
Direct commit to stable/{11,12} as this does not apply to head.
Noticed by: imp, Ronald Klop <ronald-lists@klop.ws>
MF12 r365672:
Follow-up r365662 (MFC of r365371 and r365373) by also removing the
header hack from jemalloc_FreeBSD.h, which rendered any make.conf
MALLOC_PRODUCTION or src.conf WITH/WITHOUT_MALLOC_PRODUCTION irrelevant.
Direct commit to stable/{11,12} as this does not apply to head.
There have been several mentions on our mailing lists about missing
atomic functions in our system libraries (e.g. __atomic_load_8 and
friends), and recently I saw __bswapdi2 and __bswapsi2 mentioned too.
To address this, add implementations for the functions from compiler-rt
to the system compiler support libraries, e.g. libcompiler_rt.a and and
libgcc_s.so.
This also needs a small fixup in compiler-rt's atomic.c, to ensure that
32-bit mips can build correctly.
Bump __FreeBSD_version to make it easier for port maintainers to detect
when these functions were added.
After r364753, there should be no need to suppress -Watomic-alignment
warnings anymore for compiler-rt's atomic.c. This occurred because the
IS_LOCK_FREE_8 macro was not correctly defined to 0 for mips, and this
caused the compiler to emit a runtime call to __atomic_is_lock_free(),
and that triggers the warning.
MFC r365509:
Follow-up r364753 by enabling compiler-rt's atomic implementation only
for clang, as it uses clang specific builtins, and does not compile
correctly with gcc. Note that gcc packages usually come with their own
libatomic, providing these primitives.
MFC r365588:
Follow-up r364753 by only using arm's stdatomic.c implementation, as it
already covers the functions in compiler-rt's atomic.c, leading to
conflicts when linking.
Allan Jude [Sun, 13 Sep 2020 23:51:07 +0000 (23:51 +0000)]
MFC r360229, r363255
MFS r365614
r360229:
Add VIRTIO_BLK_T_DISCARD (TRIM) support to the bhyve virtio-blk backend
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.
Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.
r363255:
Add VIRTIO_BLK_T_DISCARD support to the virtio-blk driver
If the hypervisor advertises support for the DISCARD command then the
guest can perform TRIM commands, freeing space on the backing store.
If VIRTIO_BLK_F_DISCARD is enabled, advertise DISKFLAG_CANDELETE
Tested with FreeBSD guests on bhyve and KVM
Approved by: re (gjb)
Relnotes: yes
Sponsored by: Klara Inc.
getlogin_r is specified by POSIX to to take a size_t len, not int. Fix our
version to do the same, bump the symbol version due to ABI change and
provide compat.
This was reported to break compilation of Ruby 2.8.
Some discussion about the necessity of the ABI compat did take place in the
review. While many 64-bit platforms would likely be passing it in a 64-bit
register and zero-extended and thus, not notice ABI breakage, some do
sign-extend (e.g. mips).
MFS r365681: certctl: fix hashed link generation with duplicate subjects
Currently, certctl rehash will just keep clobbering .0 rather than
incrementing the suffix upon encountering a duplicate. Do this, and do it
for blacklisted certs as well.
This also improves the situation with the blacklist to be a little less
flakey, comparing cert fingerprints for all certs with a matching subject
hash in the blacklist to determine if the cert we're looking at can be
installed.
Future work needs to completely revamp the blacklist to align more with how
it's described in PR 246614. In particular, /etc/ssl/blacklisted should go
away to avoid potential confusion -- OpenSSL will not read it, it's
basically certctl internal.
Merge WiFi net80211, drivers, and management in order to support better 11n
and upcoming 11ac.
This includes an ath(4) update, some run(4) 11n support, 11n for otus(4),
A-MPDU, A-MSDU, A-MPDU+A-MSDU and Fast frames options, scanning fixes,
enhanced PRIV checks for jails, restored parent device name printing,
improvements for upcoming VHT support, lots of under-the-hood infrastructure
improvements, new device ID, debug tools updates, some whitespace changes
(to make future MFCs easier).
This does not include (most) epoch(9) related changes as too much other
infrastructure was not merged for that.
Tested on: some ath(4) AP, run(4) STA, and rtwn(4) STA
Discussed with: adrian (extremely briefly)
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate") [partially]
There's a race where dying vnets move their interfaces back to their original
vnet, and if_epair cleanup (where deleting one interface also deletes the other
end of the epair). This is commonly triggered by the pf tests, but also by
cleanup of vnet jails.
As we've not yet been able to fix the root cause of the issue work around the
panic by not dereferencing a NULL softc in epair_qflush() and by not
re-attaching DYING interfaces.
This isn't a full fix, but makes a very common panic far less likely.
Alexander Motin [Fri, 11 Sep 2020 15:30:47 +0000 (15:30 +0000)]
MFC r342852 (by cem): powerpc: Fix regression introduced in r342771
In r342771, I introduced a regression in Power by abusing the platform
smp_topo() method as a shortcut for providing the MI information needed for
the stated sysctls. The smp_topo() method was already called later by
sched_ule (under the name cpu_topo()), and initializes a static array of
scheduler topology information. I had skimmed the smp_topo_foo() functions
and assumed they were idempotent; empirically, they are not (or at least,
detect re-initialization and panic).
Do the cleaner thing I should have done in the first place and add a
platform method specifically for core- and thread-count probing.
- Copy stable/12@r365545 to releng/12.2 as part of the 12.2-RELEASE
cycle.
- Update from PRERELEASE to BETA1.
- Set the default pkg(7) repository to 'quarterly'.
- Bump __FreeBSD_version.
- Prune svn:mergeinfo from the new branch.
Brooks Davis [Wed, 9 Sep 2020 23:11:55 +0000 (23:11 +0000)]
MFC r365284:
Always report ENOSYS in init
While rare, encountering an unimplemented system call early in init is
catastrophic and difficult to debug. Even after a SIGSYS handler is
registered, such configurations are problematic. As such, always report
such events for pid 1 (following kern.lognosys if non-zero).
These devices have non-pccard attachments. Warn for those as well. Both an and
wi don't do the modern cyrpto needed to use these cards on secure wifi networks.
an needs firmware from Cisco, which I don't think was ever produced. wi could
in theory do it with raw frames and on-host encryption, but nobody has written
that in the 15 years since WEP was cracked.
MFC After: 3 days
Noticed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D26138
Add deprecation notice for apm bios, aka the apm(4) device. The apm(8)
command will remain, at least for a while, since ACPI emulates the apm
ioctl interface.
Discussed on: arch@
Relnotes: yes
MFC After: 3 days