mjg [Thu, 22 Nov 2018 21:08:37 +0000 (21:08 +0000)]
fork: fix use-after-free with vfork
The pointer to the child is stored without any reference held. Then it is
blindly used to wait until P_PPWAIT is cleared. However, if the child is
autoreaped it could have exited and get freed before the parent started
waiting.
Use the existing hold mechanism to mitigate the problem. Most common case
of doing exec remains unchanged. The corner case of doing exit performs
wake up before waiting for holds to clear.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18295
markj [Thu, 22 Nov 2018 20:49:41 +0000 (20:49 +0000)]
Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland. Fix a number of such handlers.
Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: tuexen
MFC after: 3 days
Security: kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18301
tuexen [Thu, 22 Nov 2018 20:05:57 +0000 (20:05 +0000)]
A TCP stack is required to check SEG.ACK first, when processing a
segment in the SYN-SENT state as stated in Section 3.9 of RFC 793,
page 66. Ensure this is also done by the TCP RACK stack.
tuexen [Thu, 22 Nov 2018 19:56:52 +0000 (19:56 +0000)]
Ensure that the default RTT stack can make an RTT measurement if
the TCP connection was initiated using the RACK stack, but the
peer does not support the TCP RACK extension.
This ensures that the TCP behaviour on the wire is the same if
the TCP connection is initated using the RACK stack or the default
stack.
emaste [Thu, 22 Nov 2018 16:55:09 +0000 (16:55 +0000)]
proto: change device permissions to 0600
C Turt reports that the driver is not thread safe and may have
exploitable races.
Note that the proto device is intended for prototyping and development,
and is not for use on production systems. From the man page:
SECURITY CONSIDERATIONS
Because programs have direct access to the hardware, the proto
driver is inherently insecure. It is not advisable to use this
driver on a production machine.
The proto device is not included in any of FreeBSD's kernel config files
(although the module is built).
The issues in the proto device still need to be fixed, and the device is
inherently (and intentionally) insecure, but it might as well be limited
to root only.
admbugs: 782
Reported by: C Turt <ecturt@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
cy [Thu, 22 Nov 2018 04:48:27 +0000 (04:48 +0000)]
Allow forced start of ipmon in special cases where testing is desired
(or other special cases) and when ipfilter is disabled in rc.conf but
started by other means.
mjg [Wed, 21 Nov 2018 22:25:05 +0000 (22:25 +0000)]
uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.
Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.
sobomax [Wed, 21 Nov 2018 21:46:06 +0000 (21:46 +0000)]
Fix CU: output of the --debug-dump=decodedline, the problem there
is that both file name and current directory is recorded, however
file name sometimes already contains absolute path. In which case
prefixing it with directory name results in an invalid pathname.
Only append directory name if the file name does not start with '/'.
This seems to DTRT.
tuexen [Wed, 21 Nov 2018 18:19:15 +0000 (18:19 +0000)]
Improve two KASSERTs in the TCP RACK stack.
There are two locations where an always true comparison was made in
a KASSERT. Replace this by an appropriate check and use a consistent
panic message. Also use this code when checking a similar condition.
It was reported, and I easily reproduced it, that this change triggers panic
when receiving replication stream with enabled embedded blocks, when short
file compressing into one embedded block changes its block size. I am not
sure that the problem is in this particuler patch, not just triggered by it,
but since investigation and fix will take some time, I've decided to revert
this for now.
markj [Wed, 21 Nov 2018 17:32:09 +0000 (17:32 +0000)]
Avoid unsynchronized updates to kn_status.
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held. For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.
Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter. KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.
Reported and tested by: Sylvain GALLIANO <sg@efficientip.com>
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18060
markj [Wed, 21 Nov 2018 17:18:27 +0000 (17:18 +0000)]
Add a taskqueue_quiesce(9) KPI.
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish. This will be used in the opensolaris
compat layer.
PR: 227784
Reviewed by: cem
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17975
jhibbits [Wed, 21 Nov 2018 16:47:11 +0000 (16:47 +0000)]
DTrace/powerpc: Fix FBT return probes
The FBT fuction boundary prober was setting one return probe marker value,
but the dtrace handler was expecting another. This causes a hang when
tracing return probes.
0mp [Wed, 21 Nov 2018 12:46:28 +0000 (12:46 +0000)]
Cross-reference libbe(3) and bectl(8).
Those two manual pages are already referencing each other in the HISTORY
sections, which people might skip. Mention those manual pages explicitly in
the SEE ALSO sections. Also, remove a reference to be(1) from libbe(3).
bwidawsk [Wed, 21 Nov 2018 04:34:18 +0000 (04:34 +0000)]
linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages. vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary. It's a
somewhat legacy component of the system and isn't required. You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.
Based on this, we want pageproc to emulate kswapd, not vmproc.
bwidawsk [Tue, 20 Nov 2018 22:49:19 +0000 (22:49 +0000)]
linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.
The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.
This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h
emaste [Tue, 20 Nov 2018 20:59:49 +0000 (20:59 +0000)]
Add NT_FREEBSD_FEATURE_CTL ELF note to csu
This note will be used to allow binaries to opt out of, or in to,
upcoming vulnerability mitigation and other features. It is not yet
connected but being added now to facilitate testing and ensure
compatibility with existing kernels and tools.
Reviewed by: brooks, jhb, kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17438
emaste [Tue, 20 Nov 2018 16:54:42 +0000 (16:54 +0000)]
stand: remove CLANG_NO_IAS from btx and gptboot
Many components under stand/ had CLANG_NO_IAS added when Clang's
Integrated Assembler (IAS) did not handle .codeNN directives. Clang
gained support quite some time ago, and we can now build stand/ with
IAS.
Note that in some cases there are small differences in the generated
output, so CLANG_NO_IAS should be removed only after testing (or after
finding no differences in the output).
PR: 205250, 233094
Sponsored by: The FreeBSD Foundation
mjg [Tue, 20 Nov 2018 14:58:41 +0000 (14:58 +0000)]
Implement unr64
Important users of unr like tmpfs or pipes can get away with just
ever-increasing counters, making the overhead of managing the state
for 32 bit counters a pessimization.
Change it to an atomic variable. This can be further sped up by making
the counts variable "allocate" ranges and store them per-cpu.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18054
kib [Tue, 20 Nov 2018 14:52:43 +0000 (14:52 +0000)]
rtld: when immediate bind mode is requested, process irelocs in PLT
immediately after other PLT relocs.
Otherwise, if the object has relro page, we write to readonly page,
and we would need to use mprotect(2) two more times to fix it. Note
that resolve_object_ifunc() does nothing when called second time, so
there is no need to avoid existing call.
Reported and tested by: emaste
PR: 233333
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
tijl [Tue, 20 Nov 2018 14:18:57 +0000 (14:18 +0000)]
Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument. Stop using the macro as well as LINUX_CMSG_FIRSTHDR. Use
the size field of the kernel copy of the control message header to obtain
the next control message.
imp [Tue, 20 Nov 2018 07:11:23 +0000 (07:11 +0000)]
Ensure that all values of ns, us and ms work for {n,u,m}stosbt
Integer overflows and wrong constants limited the accuracy of these
functions and created situatiosn where sbttoXs(Xstosbt(Y)) != Y. This
was especailly true in the ns case where we had millions of values
that were wrong.
Instead, used fixed constants because there's no way to say ceil(X)
for integer math. Document what these crazy constants are.
Also, use a shift one fewer left to avoid integer overflow causing
incorrect results, and adjust the equasion accordingly. Document this.
Allow times >= 1s to be well defined for these conversion functions
(at least the Xstosbt). There's too many users in the tree that they
work for >= 1s.
This fixes a failure on boot to program firmware on the mlx4
NIC. There was a msleep(1000) in the code. Prior to my recent rounding
changes, msleep(1000) worked, but msleep(1001) did not because the old
code rounded to just below 2^64 and the new code rounds to just above
it (overflowing, causing the msleep(1000) to really sleep 1ms).
A test program to test all cases will be committed shortly. The test
exaustively tries every value (thanks to bde for the test).
rmacklem [Tue, 20 Nov 2018 01:59:57 +0000 (01:59 +0000)]
Improve sanity checking for the dircount hint argument to
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.
rmacklem [Tue, 20 Nov 2018 01:56:34 +0000 (01:56 +0000)]
nfsm_advance() would panic() when the offs argument was negative.
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.
rmacklem [Tue, 20 Nov 2018 01:52:45 +0000 (01:52 +0000)]
r304026 added code that started statistics gathering for an operation
before the operation number (the variable called "op") was sanity checked.
This patch moves the code down to below the range sanity check for "op".
marius [Tue, 20 Nov 2018 00:08:33 +0000 (00:08 +0000)]
Given that the idea of D15374 was to "make memmove a first class citizen",
provide a _MEMMOVE extension of _MEMCPY that deals with overlap based on
the previous bcopy(9) implementation and use the former for bcopy(9) and
memmove(9). This addresses my D15374 review comment, avoiding extra MOVs
in case of memmove(9) and trashing the stack pointer.
tmunro [Tue, 20 Nov 2018 00:06:53 +0000 (00:06 +0000)]
pom: Fix fencepost bugs.
Under some conditions pom would report "waning" and then "full", show
higher percentages than it should, and get confused by DST. Fix.
Before:
2018.01.30: The Moon is Waxing Gibbous (97% of Full)
2018.01.31: The Moon is Waning Gibbous (100% of Full)
2018.02.01: The Moon is Full
2018.02.02: The Moon is Waning Gibbous (98% of Full)
After:
2018.01.30: The Moon is Waxing Gibbous (96% of Full)
2018.01.31: The Moon is Waxing Gibbous (99% of Full)
2018.02.01: The Moon is Full
2018.02.02: The Moon is Waning Gibbous (97% of Full)
marius [Mon, 19 Nov 2018 23:56:33 +0000 (23:56 +0000)]
For consistency within the front-end, prefer SDHCI_{READ,WRITE}_{2,4}()
to sdhci_acpi_{read,write}_{2,4}() in the sdhci_acpi_set_uhs_timing()
added in r340543.
jhibbits [Mon, 19 Nov 2018 23:54:49 +0000 (23:54 +0000)]
powerpc: Sync icache on SIGILL, in case of cache issues
The update of jemalloc to 5.1.0 exposed a cache syncing issue on a Freescale
e500 base system. There was already code in the FPU emulator to address
this, but it was limited to a single static variable, and did not attempt to
sync the cache. This pulls that out to the higher level program exception
handler, and syncs the cache.
If a SIGILL is hit a second time at the same address, it will be treated as
a real illegal instruction, and handled accordingly.
emaste [Mon, 19 Nov 2018 22:18:18 +0000 (22:18 +0000)]
rescue: set NO_SHARED in Makefile
The rescue binary is built statically via the Makefile generated by
crunchgen, but that does not trigger other shared/static logic in
bsd.prog.mk - in particular disabling retpolineplt with static linking.
PR: 233336
Reported by: Charlie Li
Sponsored by: The FreeBSD Foundation
bcr [Mon, 19 Nov 2018 20:45:49 +0000 (20:45 +0000)]
Add a fortune describing how to upload a machine's dmesg information to the
NYCBUG database.
We want to encourage our users to upload their dmesgs so that the project can
get a better insight into what kind of hardware is run on. This helps in making
data-driven decisions about i.e., platform and driver support.
Note that dmesgs may contain sensitive information like hardware serial numbers,
hence uploading them without review is discouraged.
bwidawsk [Mon, 19 Nov 2018 18:29:03 +0000 (18:29 +0000)]
acpi: fix acpi_ec_probe to only check EC devices
This patch utilizes the fixed_devclass attribute in order to make sure
other acpi devices with params don't get confused for an EC device.
The existing code assumes that acpi_ec_probe is only ever called with a
dereferencable acpi param. Aside from being incorrect because other
devices of ACPI_TYPE_DEVICE may be probed here which aren't ec devices,
(and they may have set acpi private data), it is even more nefarious if
another ACPI driver uses private data which is not dereferancable. This
will result in a pointer deref during boot and therefore boot failure.
On X86, as it stands today, no other devices actually do this (acpi_cpu
checks for PROCESSOR type devices) and so there is no issue. I ran into
this because I am adding such a device which gets probed before
acpi_ec_probe and sets private data. If ARM ever has an EC, I think
they'd run into this issue as well.
There have been several iterations of this patch. Earlier
iterations had ECDT enumerated ECs not call into the probe/attach
functions of this driver. This change was Suggested by: jhb@.
kevans [Mon, 19 Nov 2018 17:09:57 +0000 (17:09 +0000)]
bectl(8) tests: attempt to load the ZFS module
Observed in a CI test image, bectl_create test will run and be marked as
skipped because the module is not loaded. The first zpool invocation will
automagically load the module, but bectl_create is still skipped. Subsequent
tests all pass as expected because the module is now loaded and everything
is OK.
kevans [Mon, 19 Nov 2018 16:47:21 +0000 (16:47 +0000)]
libbe(3): Handle non-ZFS rootfs better
If rootfs isn't ZFS, current version will emit an error claiming so and fail
to initialize libbe. As a consumer, bectl -r (undocumented) can be specified
to operate on a BE independently of whether on a UFS or ZFS root.
Unbreak this for the UFS case by only erroring out the init if we can't
determine a ZFS dataset for rootfs and no BE root was specified. Consumers
of libbe should take care to ensure that rootfs is non-empty if they're
trying to use it, because this could certainly be the case.
Some check is needed before zfs_path_to_zhandle because it will
unconditionally emit to stderr if the path isn't a ZFS filesystem, which is
unhelpful for our purposes.
This should also unbreak the bectl(8) tests on a UFS root, as is the case in
Jenkins' -test runs.
tijl [Mon, 19 Nov 2018 15:31:54 +0000 (15:31 +0000)]
Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin. For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly. One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).
jchandra [Mon, 19 Nov 2018 03:52:56 +0000 (03:52 +0000)]
gitv3_its: fixes for multiple GIC ITS blocks
First pass of support for multiple GIC ITS blocks with ACPI.
Changes are to:
* register the correct subset of interrupts with pic_register
in case of ACPI.
* initialize just the cpu interface for the first ITS, when
domain information is not avialable. This has to be done
until we split the per-CPU init to do LPI setup just once.
* remove duplicate check for the GIC ITS domain, the sc_cpus
are setup from domain, so the check again in per-CPU init
seems unnecessary.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17841
jchandra [Mon, 19 Nov 2018 03:43:10 +0000 (03:43 +0000)]
pci_host_generic : move activate/release to generic code
Now that the ACPI and FDT implementations for activating and
deactivating resources are the same, we can move it to
pci_host_generic.c. No functional changes.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17793
This is a major update for pci_host_generic_acpi.c, the current
implementation has some gaps that are better fixed up in one go.
The changes are to:
* Follow x86 method of not adding PCI resources to PCI host bridge in
ACPI code. This has been moved to pci_host_generic_acpi.c, where we
walk thru its resources of the host bridge and add them.
* Fixup code in pci_host_generic_acpi.c to read all decoded ranges
and update the 'ranges' property. This allows us to share most of
the code with generic implementation (and the FDT one).
* Parse and setup IO ranges and bus ranges when walking the resources
above. Drop most of the changes related to this from acpica code.
* Add the ECAM memory area as mem resource 0. Implement the logic to
get the ECAM area from MCFG (using bus range which we now decode),
or from _CBA (using _BBN/bus range). Drop aarch64 ifdefs from acpica
code which did part of this.
* Switch resource activation to similar code as FDT implementation,
this can be moved into generic implementation in a later pass.
* Drop the mechanism of using the 7th bit of bus number as the domain,
this is not correct and will work only in very specific cases. Use
_SEG as PCI domain and use the bus ranges of the host bridge to
provide start bus number.
This commit should not make any functional change to dev/acpica/acpi.c
for other architectures, almost all the changes there are to revert
earlier additions in this file done for aarch64.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17791
jchandra [Mon, 19 Nov 2018 03:02:47 +0000 (03:02 +0000)]
acpica: rework INTRNG interrupts
On arm64 (where INTRNG is enabled), the interrupts have to be mapped
with ACPI_BUS_MAP_INTR() before adding them as resources to devices.
The earlier code did the mapping before calling acpi_set_resource(),
which bypassed code that checked for PCI link interrupts.
To fix this, move the call to map interrupts into acpi_set_resource()
and that requires additional work to lookup interrupt properties.
The changes here are to:
* extend acpi_lookup_irq_handler() to lookup an irq in the ACPI
resources
* create a helper function acpi_map_intr() which uses the updated
acpi_lookup_irq_handler() to look up an irq, and then map it
with ACPI_BUS_MAP_INTR()
* use acpi_map_intr() in acpi_pcib_route_interrupt() to map
pci link interrupts.
With these changes, we can drop the ifdefs in acpi_resource.c, and
we can also drop the call for mapping interrupts in generic_timer.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17790
jchandra [Mon, 19 Nov 2018 02:55:18 +0000 (02:55 +0000)]
pci_host_generic*: basic implementation of bus range
Both ACPI and FDT support bus ranges for pci host bridges. Update
pci_host_generic*.[ch] with a default implementation to support this.
This will be used in the next set of changes for ACPI based host
bridge. No functional changes in this commit.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17657
jchandra [Mon, 19 Nov 2018 02:43:34 +0000 (02:43 +0000)]
pci_host_generic: allocate resources against devices
Fix up pci_host_generic.c and pci_host_generic_fdt.c to allocate
resources against devices that requested them. Currently the
allocation happens against the pcib, which is incorrect.
This is needed for the upcoming changes for fixing up
pci_host_generic_acpi.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17656
jchandra [Mon, 19 Nov 2018 02:38:02 +0000 (02:38 +0000)]
pci_host_generic: remove unneeded ThunderX2 quirk
The current quirk implementation writes a fixed address to the PCI BAR
to fix a firmware bug. The PCI BARs are allocated by firmware and will
change depending on PCI devices present. So using a fixed address here
is not correct.
This quirk worked around a firmware bug that programmed the MSI-X bar
of the SATA controller incorrectly. The newer firmware does not have
this issue, so it is better to drop this quirk altogether.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17655
kevans [Mon, 19 Nov 2018 02:30:12 +0000 (02:30 +0000)]
bectl(8): Add some regression tests
These tests operate on a file-backed zpool that gets created in the kyua
temp dir. root and ZFS support are both required for these tests. Current
tests cover create, destroy, export/import, jail, list (kind of), mount,
rename, and jail.
List tests should later be extended to cover formatting and the different
list flags, but for now only covers basic "are create/destroy actually
reflected properly"
kevans [Mon, 19 Nov 2018 02:16:20 +0000 (02:16 +0000)]
libbe(3): Properly account for altroot when creating new BEs
Previously we would blindly copy the 'mountpoint' property, which includes
the altroot. The altroot needs to be snipped off prior to setting it on the
new BE, though, or you'll end up with a new BE and a mountpoint of /mnt with
altroot=/mnt
kevans [Mon, 19 Nov 2018 02:12:08 +0000 (02:12 +0000)]
bectl(3)/libbe(3): Allow BE root to be specified
Add an undocumented -r option preceding the bectl subcommand to specify a BE
root to operate out of. This will remain undocumented for now, as some
caveats apply:
- BEs cannot be activated in the pool that doesn't contain the rootfs
- bectl create cannot work out of the box without the -e option right now,
since it defaults to the rootfs and cross-pool cloning doesn't work like
that (IIRC)
Plumb the BE root through to libbe(3) so that some things -can- be done to
it, e.g.
this aides in some upgrade setups where rootfs is not necessarily ZFS, and
also makes it easier/possible to regression-test bectl when combined with a
file-backed zpool.
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D18029