Mark Johnston [Tue, 18 Sep 2018 17:51:45 +0000 (17:51 +0000)]
Only update the domain cursor once in keg_fetch_slab().
We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor. This can cause us to exit the
round-robin loop before having attempted allocations from all domains,
resulting in a hang during a subsequent blocking allocation attempt from
a depleted domain.
Reported and tested by: Jan Bramkamp <crest@bultmann.eu>
Reviewed by: alc, cem
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17209
Update the pkg-stage.sh script used to populate packages on the
dvd1.iso installation medium from including KDE4 to KDE5, as the
KDE4-based ports have been marked as deprecated in the Ports
Collection.
MFC after: 3 days
Approved by: re (rgrimes)
Sponsored by: The FreeBSD Foundation
Ed Maste [Tue, 18 Sep 2018 15:01:21 +0000 (15:01 +0000)]
Require ifunc-capable linker for i386
The amd64 kernel started using ifunc for a variety of functions with
arch-specific implementations, and we would like to make use of the
same functionality on i386 and as much as possible avoid divergence
between i386 and amd64. In particular, future changes for security
improvements and mitigations may rely on ifunc support.
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
vm: stop taking proc lock in mmap to satisfy racct if it is disabled
Limits can be safely obtained with lim_cur from the thread. racct is compiled
in but disabled by default. Note that racct enablement is a boot-only tunable.
This eliminates second most common place of taking the lock while pkg building.
While here don't take the lock in mlockall either.
Reviewed by: kib
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17210
Do not upgrade the vnode lock to call getinoquota().
Doing so can deadlock when the thread already owns another vnode lock,
e.g. during a rename, as was demonstrated by the reporter. In fact,
there seems to be no need to force the call to getinoquota() always,
because vn_open() locks vnode exclusively, and this is the most
important case. To add to the point, directories where the dirent is
added or removed, are locked exclusively as well.
Reported by: bwidawsk
Tested by: bwidawsk, pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
MFC after: 1 week
John Baldwin [Mon, 17 Sep 2018 17:18:54 +0000 (17:18 +0000)]
Fix a regression in r338360 when booting an x86 machine without APIC.
The atpic_register_sources callback tries to avoid registering interrupt
sources that would collide with an I/O APIC. However, the previous
implementation was failing to register IRQs 8-15 since the slave PIC
saw valid IRQs from the master and assumed an I/O APIC was present. To
fix, go back to registering all 8259A interrupt sources in one loop when
the master's register_sources method is invoked.
PR: 231291
Approved by: re (kib)
MFC after: 1 month
Mark Johnston [Mon, 17 Sep 2018 16:16:57 +0000 (16:16 +0000)]
Fix an nvpair leak in vdev_geom_read_config().
Also change the behaviour slightly: instead of freeing "config" if the
last nvlist doesn't pass the tests, return the last config that did pass
those tests. This matches the comment at the beginning of the function.
PR: 230704
Diagnosed by: avg
Reviewed by: asomers, avg
Tested by: Mark Martinec <Mark.Martinec@ijs.si>
Approved by: re (gjb)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17202
Use ifunc to resolve context switching mode on amd64.
Patch removes all checks for pti/pcid/invpcid from the context switch
path. I verified this by looking at the generated code, compiling with
the in-tree clang. The invpcid_works1 trick required inline attribute
for pmap_activate_sw_pcid_pti() to work.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D17181
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.
Restore outbound packets capturing for if_gre(4). It was missed in r335048.
Also clear M_MCAST and M_BCAST flags for encapsulated datagram, since it
will have new IP header.
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.
Significantly improve pf purge cpu usage by only taking locks
when there is work to do. This reduces CPU consumption to one
third on systems. This will help keep the thread CPU usage under
control now that the default hash size has increased.
Reviewed by: kp
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17097
Pull in r325478 from upstream clang trunk (by Ivan A. Kosarev):
[CodeGen] Initialize large arrays by copying from a global
Currently, clang compiles explicit initializers for array elements
into series of store instructions. For large arrays of built-in types
this results in bloated output code and significant amount of time
spent on the instruction selection phase. This patch fixes the issue
by initializing such arrays with global constants that store the
binary image of the initializer.
Add the "-t" option to geom(8) utility, to display geoms hierarchy.
Sample output:
% geom -t
Geom Class Provider
da0 DISK da0
da0 PART da0s1
da0s1 PART da0s1a
ffs.da0s1a VFS
da0s1a DEV
da0s1 DEV
da0 DEV
da1 DISK da1
swap SWAP
da1 DEV
cd0 DISK cd0
cd0 DEV
Matt Macy [Fri, 14 Sep 2018 01:30:05 +0000 (01:30 +0000)]
hwpmc: set default rate if event description lacks one / filter rate against misuse
Not all event descriptions have a sample rate (such as inst_retired.any)
this will restore the legacy behavior of using 65536 in that case. It also
prevents accidental API misuse that could lead to panic.
Eric van Gyzen [Thu, 13 Sep 2018 17:56:48 +0000 (17:56 +0000)]
Set zfs_arc_meta_strategy to metadata only
The previous default of "balanced" appears to have caused pathological
behavior, including very poor performance and 100% CPU load in the
arc_reclaim_thread.
The symptoms appeared when the daily periodic run started.
With this change, the system--and the ARC in particular--behaved
normally during a manual daily periodic run.
From Mark Johnston: The port of the balanced strategy is incomplete,
since arc_prune_async() is a no-op on FreeBSD. (This also seems
to imply that r337653 is a no-op.) After 12 is branched we can
port the remaining bits and consider changing the default back.
Ian Lepore [Thu, 13 Sep 2018 15:16:05 +0000 (15:16 +0000)]
If a user skips the pre-world mergemaster, an installworld check
notices the missing ntpd user and refers to UPDATING. This change makes
it more clear which aspect of UPDATING is important for the ntpd change.
Output padding is specified via outlen, which is set using the return value
of fprintf. Because it's printing that padding plus a trailing byte, it
grows by one each iteration rather than reflecting actual length.
Additionally, iec was sized improperly for scaling up similarly to si.
Fixing this revealed that the humanize_number(3) call to populate persec
was using the wrong width.
Submitted by: Thomas Hurst <tom@hur.st>
Reviewed by: imp
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D16960
Ed Maste [Thu, 13 Sep 2018 14:52:59 +0000 (14:52 +0000)]
Enable reproducible builds in advance of 12.0-REL
r338642 toggled the REPRODUCIBLE_BUILD knob but missed the
corresponding kern.opts.mk change.
We want to build the 12.0 release artifacts with reproducible builds
mode enabled. Switch it on in HEAD now to enable testing with upcoming
ALPHA builds. We can revisit the default setting for HEAD after the
branch is created.
This change eliminates the build metadata (user, hostname, timestamp,
etc.) from the kernel and loader. If the src tree is a git, svn or p4
checkout with changes then the metadata is retained.
The WITHOUT_REPRODUCIBLE_BUILD src.conf(5) knob can be used to revert
to the previous behaviour.
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Ed Maste [Thu, 13 Sep 2018 14:26:53 +0000 (14:26 +0000)]
Enable reproducible builds in advance of 12.0-REL
We want to build the 12.0 release artifacts with reproducible builds
mode enabled. Switch it on in HEAD now to enable testing with upcoming
ALPHA builds. We can revisit the default setting for HEAD after the
branch is created.
This change eliminates the build metadata (user, hostname, timestamp,
etc.) from the kernel and loader. If the src tree is a git, svn or p4
checkout with changes then the metadata is retained.
The WITHOUT_REPRODUCIBLE_BUILD src.conf(5) knob can be used to revert
to the previous behaviour.
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Both drivers use this interface so add a dependancy on it.
Since awg uses aw_sid for generating the MAC address, make it
depend on both aw_sid and nmvem so when only removing nvmem from
kernel config it will not include this driver.
xen: temporary disable SMAP when forwarding hypercalls from user-space
The Xen page-table walker used to resolve the virtual addresses in the
hypercalls will refuse to access user-space pages when SMAP is enabled
unless the AC flag in EFLAGS is set (just like normal hardware with
SMAP support would do).
Since privcmd allows forwarding hypercalls (and buffers) from
user-space into Xen make sure SMAP is temporary disabled for the
duration of the hypercall from user-space.
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Register interrupts using the PIC pic_register_sources method instead
of doing it in apic_setup_io. This is now required, since the internal
interrupt structures are not yet setup when calling apic_setup_io.
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
lapic: skip setting intrcnt if lapic is not present
Instead of panicking. Legacy PVH mode doesn't provide a lapic, and
since native_lapic_intrcnt is called unconditionally this would cause
the assert to trigger. Change the assert into a continue in order to
take into account the possibility of systems without a lapic.
Reviewed by: jhb
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D17015
The recommended way to obtain the vcpu id is using the cpuid
instruction with a specific leaf value. This leaf value must be
obtained at runtime, and it's done when populating the hypercall page.
Legacy PVH however will get the hypercall page populated by the
hypervisor itself before booting, so the cpuid leaf was not actually
set, thus preventing setting the vcpu id value from cpuid.
Fix this by making sure the cpuid leaf has been probed before
attempting to set the vcpu id.
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
When adding support for the new PVH mode the kenv handling was
switched to use a boot time allocated scratch space, however the
legacy PVH early boot code was not modified to allocate such space.
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
The vcpu_id for legacy PVH mode can be set from the output of cpuid,
so there's no need to have a special function to set it.
Also note that xenpv_set_ids should have been executed only for PV
guests, but was executed for all guests types and vcpu_id was later
fixed up for HVM guests.
Reported by: cperciva
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
So that it's done when the vcpu_id has been set. For the BSP the
vcpu_id is set at SUB_INTR, while for the APs it's done in
init_secondary_tail that's called at SUB_SMP order FIRST.
Reported and tested by: cperciva
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D17013
msi: remove the check that interrupt sources have been added
When running as a specific type of Xen guest the hypervisor won't
provide any emulated IO-APICs or legacy PICs at all, thus hitting the
following assert in the MSI code:
panic: Assertion num_io_irqs > 0 failed at /usr/src/sys/x86/x86/msi.c:334
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ffa70
vpanic() at vpanic+0x1a3/frame 0xffffffff826ffad0
panic() at panic+0x43/frame 0xffffffff826ffb30
msi_init() at msi_init+0xed/frame 0xffffffff826ffb40
apic_setup_io() at apic_setup_io+0x72/frame 0xffffffff826ffb50
mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70
start_kernel() at start_kernel+0x10
Fix this by removing the assert in the MSI code, since it's possible
to get to the MSI initialization without having registered any other
interrupt sources.
Reviewed by: jhb
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D17001
APIC: CPU 6 has ACPI ID 6
APIC: CPU 7 has ACPI ID 7
panic: vm_wait in early boot
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ff8d0
vpanic() at vpanic+0x1a3/frame 0xffffffff826ff930
panic() at panic+0x43/frame 0xffffffff826ff990
vm_wait_domain() at vm_wait_domain+0xf9/frame 0xffffffff826ff9c0
kmem_alloc_contig_domain() at kmem_alloc_contig_domain+0x252/frame 0xffffffff826ffa50
kmem_alloc_contig() at kmem_alloc_contig+0x6c/frame 0xffffffff826ffad0
contigmalloc() at contigmalloc+0x2e/frame 0xffffffff826ffb00
x86bios_modevent() at x86bios_modevent+0x225/frame 0xffffffff826ffb20
module_register_init() at module_register_init+0xc0/frame 0xffffffff826ffb50
mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70
start_kernel() at start_kernel+0x10
While there also make x86bios_unmap_mem idempotent.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D17000
Fix issues about cancelling USB transfers in LibUSB when the USB device has
been detached. When a USB device has been detached the kernel file handle
stops responding to commands. USB applications which continue to run after
the USB device has been detached, depend on LibUSB generated events to tear
down its pending USB transfers. Add code to handle the needed cleanup when
processing the USB transfer(s) fails and prevent new USB transfer(s) from
being submitted.
Found by: Ludovic Rousseau <ludovic.rousseau+freebsd@gmail.com>
PR: 231076
MFC after: 1 week
Approved by: re (gjb)
Sponsored by: Mellanox Technologies
Michael Tuexen [Wed, 12 Sep 2018 10:27:58 +0000 (10:27 +0000)]
Fix TCP Fast Open for the TCP RACK stack.
* Fix a bug where the SYN handling during established state was
applied to a front state.
* Move a check for retransmission after the timer handling.
This was suppressing timer based retransmissions.
* Fix an off-by one byte in the sequence number of retransmissions.
* Apply fixes corresponding to
https://svnweb.freebsd.org/changeset/base/336934
Reviewed by: rrs@
Approved by: re (kib@)
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16912
Not all libpcap backends use the BPF compatible set
of IOCTLs. For example the mlx5 backend uses libibverbs
which is currently not capsicum compatible.
Disable sandboxing for such backends.
MFC after: 3 days
Discussed with: emaste@
Approved by: re (kib)
Sponsored by: Mellanox Technologies
Ruslan Bukin [Wed, 12 Sep 2018 08:05:33 +0000 (08:05 +0000)]
Don't mark module data as static on RISC-V.
Similar to arm64, riscv compiler uses PC-relative loads/stores,
and with static data compiler does not emit relocations.
In result, kernel module linker has nothing to fix and data accessed
from the wrong location.
Gordon Tetlow [Wed, 12 Sep 2018 04:57:34 +0000 (04:57 +0000)]
Correct ELF header parsing code to prevent invalid ELF sections from
disclosing memory.
Submitted by: markj
Reported by: Thomas Barabosch, Fraunhofer FKIE
Approved by: re (implicit)
Approved by: so
Security: FreeBSD-SA-18:12.elf
Security: CVE-2018-6924
Sponsored by: The FreeBSD Foundation
Martin Matuska [Tue, 11 Sep 2018 20:51:34 +0000 (20:51 +0000)]
MFV r338519:
Update libarchive to 3.3.3
As all important changes have already been merged from libarchive git
this is just version number bump, documentation update and some
polishing for cpio tests. Other source code changes are not relevant to
FreeBSD.
Ed Maste [Tue, 11 Sep 2018 20:32:57 +0000 (20:32 +0000)]
remove doubled name in objcopy manpage
We generate the installed objcopy man page from ELF Tool Chain's
elfcopy, but the sed expresion used for this ended up producing
"objcopy, objcopy - copy and translate object files".
Instead of replacing the first "elfcopy" with objcopy, just remove it.
Ed Maste [Tue, 11 Sep 2018 19:19:07 +0000 (19:19 +0000)]
Switch reproducible builds to unmodified src tree mode
newvers.sh supports two modes for reproducible builds:
-r Reproducible build. Do not embed directory names, user
names, time stamps or other dynamic information into
the output file. This is intended to allow two builds
done at different times and even by different people on
different hosts to produce identical output.
-R Reproducible build if the tree represents an unmodified
checkout from a version control system. Metadata is
included if the tree is modified.
Switch to the second mode when reproducible builds are enabled.
The value of a reproducible build is much less when building from an
uncontrolled, modified src tree, and -R likely provides the best
compromise in allowing the REPRODUCIBLE_BUILD knob to be enabled by
default for the release.
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
Eric Joyner [Tue, 11 Sep 2018 18:33:43 +0000 (18:33 +0000)]
ix(4), ixv(4): VLAN tag stripping fixes for Amazon EC2 Enhanced Networking
From Piotr:
ix(4), ixv(4): Add VLAN tag strip check when receiving packets
ixv(4): Fix support for VLAN_HWTAGGING and VLAN_HWFILTER flags
This change will prevent driver from passing VLAN tags when
interface configuration is not expecting them. VF driver will
check for VLAN_HWTAGGING and VLAN_HWFILTER flags and act adequately.
This patch resolves problem occuring on EC2 platforms.