Brooks Davis [Wed, 5 Sep 2018 23:23:16 +0000 (23:23 +0000)]
Rework rtld's TLS Variant I implementation to match r326794
The above commit fixed handling overaligned TLS segments in libc's
TLS Variant I implementation, but rtld provides its own implementation
for dynamically-linked executables which lacks these fixes. Thus,
port these changes to rtld.
This was previously commited as r337978 and reverted in r338149 due to
exposing a bug the ARM rtld. This bug was fixed in r338317 by mmel.
John Baldwin [Wed, 5 Sep 2018 20:51:53 +0000 (20:51 +0000)]
Fix objcopy for little-endian MIPS64 objects.
MIPS64 does not store the 'r_info' field of a relocation table entry as
a 64-bit value consisting of a 32-bit symbol index in the high 32 bits
and a 32-bit type in the low 32 bits as on other architectures. Instead,
the 64-bit 'r_info' field is really a 32-bit symbol index followed by four
individual byte type fields. For big-endian MIPS64, treating this as a
64-bit integer happens to be compatible with the layout expected by other
architectures (symbol index in upper 32-bits of resulting "native" 64-bit
integer). However, for little-endian MIPS64 the parsed 64-bit integer
contains the symbol index in the low 32 bits and the 4 individual byte
type fields in the upper 32-bits (but as if the upper 32-bits were
byte-swapped).
To cope, add two helper routines in gelf_getrel.c to translate between the
correct native 'r_info' value and the value obtained after the normal
byte-swap translation. Use these routines in gelf_getrel(), gelf_getrela(),
gelf_update_rel(), and gelf_update_rela(). This fixes 'readelf -r' on
little-endian MIPS64 objects which was previously decoding incorrect
relocations as well as 'objcopy: invalid symbox index' warnings from
objcopy when extracting debug symbols from kernel modules.
Even with this fixed, objcopy was still crashing when trying to extract
debug symbols from little-endian MIPS64 modules. The workaround in
gelf_*rel*() depends on the current ELF object having a valid ELF header
so that the 'e_machine' field can be compared against EM_MIPS. objcopy
was parsing the relocation entries to possibly rewrite the 'r_info' fields
in the update_relocs() function before writing the initial ELF header to
the destination object file. Move the initial write of the ELF header
earlier before copy_contents() so that update_relocs() uses the correct
symbol index values.
Note that this change should really go upstream. The binutils readelf
source has a similar hack for MIPS64EL though I implemented this version
from scratch using the MIPS64 ABI PDF as a reference.
Warner Losh [Wed, 5 Sep 2018 20:02:23 +0000 (20:02 +0000)]
Be a little conservative about when to force size optimizations.
Reports have come in that there's issue with powerpc and sparc64 since
we've switched to using -Oz / -Os. We don't strictly need them for
!x86, so be conservative about when we enable them.
Mark Johnston [Wed, 5 Sep 2018 19:05:30 +0000 (19:05 +0000)]
Correct the condition under which we allocate a terminator node.
We will have last_block < blocks if the block count is divisible
by BLIST_BMAP_RADIX, but a terminator node is still needed if the
tree isn't balanced. In this case we were overruning the blist
array by 16 bytes during initialization.
While here, add a check for the invalid blocks == 0 case.
PR: 231116
Reviewed by: alc, kib (previous version), Doug Moore <dougm@rice.edu>
Approved by: re (gjb)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17020
Mark Johnston [Wed, 5 Sep 2018 15:04:11 +0000 (15:04 +0000)]
Fix style bugs in in_pcblookup_lbgroup().
No functional change intended.
Reviewed by: bz, Johannes Lundberg <johalun0@gmail.com>
Approved by: re (rgrimes)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17030
Fix "ipfw fwd" to work for incoming IPv4 packets when ip_tryforward() chooses
fast forwarding path, as it already works for IPv6 and for both of them
on old slow path.
amd64: For non-PTI mode, do not initialize PCPU kcr3 to KPML4phys.
Non-PTI mode does not switch kcr3, which means that kcr3 is almost
always stale. This is important for the NMI handler, which reloads
%cr3 with PCPU(kcr3) if the value is different from PMAP_NO_CR3.
The end result is that curpmap in NMI handler does not match the page
table loaded into hardware. The manifestation was copyin(9) looping
forever when a usermode access page fault cannot be resolved by
vm_fault() updating a different page table.
Reported by: mmacy
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Approved by: re (gjb)
r337289 has a side effect of reducing usb frame 0 buffer size down to
touch report size. That broke some devices e.g. "Raydium Touch System"
which are capable of generating non-touch frames of bigger length.
Fix it with enlarging frame 0 buffer up to internal wmt(4) buffer size.
As discussed in D6262 post-commit review, change inp_route to
inp_route6 for IPv6 code after r301217.
This was most likely a c&p error from the legacy IP code, which
did not matter as it is a union and both structures have the same
layout at the beginning.
No functional changes.
Ruslan Bukin [Mon, 3 Sep 2018 14:43:16 +0000 (14:43 +0000)]
Enable 'C'-compressed ISA extension.
This was disabled recently due to lack of support in KDB disassembler
and DTrace FBT provider. Support for 'C'-extension to both of these was
added, so we can now enable 'C'-extension.
This reduces size of the kernel important for low-end embedded devices,
and saves cache footprint for high perfomance machines.
Robert Watson [Mon, 3 Sep 2018 14:26:43 +0000 (14:26 +0000)]
The kernel DTrace audit provider (dtaudit) relies on auditd(8) to load
/etc/security/audit_event to provide a list of audit event-number <->
name mappings. However, this occurs too late for anonymous tracing.
With this change, adding 'audit_event_load="YES"' to /boot/loader.conf
will cause the boot loader to preload the file, and then the kernel
audit code will parse it to register an initial set of audit event-number
<-> name mappings. Those mappings can later be updated by auditd(8) if
the configuration file changes.
This appeared to be required to have EFI RT support and EFI RTC
enabled by default, because there are too many reports of faulting
calls on many different machines. The knob is added to leave the
exceptions unhandled to allow to debug the actual bugs.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (rgrimes)
Differential revision: https://reviews.freebsd.org/D16972
Improve error messages from clock_if.m method failures.
Print error message in verbose mode when CLOCK_SETTIME() clock_if.m
method failed. For EFIRT RTC clock, add error code for the failure of
CLOCK_GETTIME() report.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (rgrimes)
Differential revision: https://reviews.freebsd.org/D16972
Swap order of dererencing PCPU curpmap and checking for usermode in
trap_pfault() KPTI violation check.
EFI RT may set curpmap to NULL for the duration of the call for some
machines (PCID but no INVPCID). Since apparently EFI RT code must be
ready for exceptions from the calls, avoid dereferencing curpmap until
we know that this call does not come from usermode.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (rgrimes)
Differential revision: https://reviews.freebsd.org/D16972
Alan Cox [Sun, 2 Sep 2018 18:29:38 +0000 (18:29 +0000)]
Recent changes have created, for the first time, physical memory segments
that can be coalesced. To be clear, fragmentation of phys_avail[] is not
the cause. This fragmentation of vm_phys_segs[] arises from the "special"
calls to vm_phys_add_seg(), in other words, not those that derive directly
from phys_avail[], but those that we create for the initial kernel page
table pages and now for the kernel and modules loaded at boot time. Since
we sometimes iterate over the physical memory segments, coalescing these
segments at initialization time is a worthwhile change.
Try harder in cfiscsi_offline(). This is believed to be the workaround
for the "ctld hanging on reload" problem observed in same cases under
high load. I'm not 100% sure it's _the_ fix, as the issue is rather hard
to reproduce, but it was tested as part of a larger path and the problem
disappeared. It certainly shouldn't break anything.
Now, technically, it shouldn't be needed. Quoting mav@, "After
ct->ct_online == 0 there should be no new sessions attached to the target.
And if you see some problems abbout it, it may either mean that there are
some races where single cfiscsi_session_terminate(cs) call may be lost,
or as a guess while this thread was sleeping target was reenabbled and
redisabled again". Should such race be discovered and properly fixed
in the future, than this and the followup two commits can be backed out.
PR: 220175
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Discussed with: mav
Approved by: re (gjb)
MFC after: 2 weeks
Sponsored by: playkey.net
Adding support for CS46xx MIDI output. With this patch, users can
play the MIDI files through /dev/sequencer device with tools like
playmidi. The audio output will go through the external MIDI device
such like wavetable synthesis card.
Reviewed by: matk (a long time ago), kib
Approved by: re (kib)
Tested with: Terratec SiXPack 5.1+ + Yamaha DB50XG
MFC after: 4 weeks
userboot: handle guest interpreter mismatches more intelligently
The switch to lualoader creates a problem with userboot: the host is
inclined to build userboot with Lua, but the host userboot's interpreter
must match what's available on the guest. For almost all FreeBSD guests in
the wild, Lua is not yet available and a Lua-based userboot will fail.
This revision updates userboot protocol to version 5, which adds a
swap_interpreter callback to request a different interpreter, and tries to
determine the proper interpreter to be used based on how the guest
/boot/loader is compiled. This is still a bit of a guess, but it's likely
the best possible guess we can make in order to get it right. The
interpreter is now embedded in the resulting executable, so we can open
/boot/loader on the guest and hunt that down to derive the interpreter it
was built with.
Using -l with bhyveload will not allow an intepreter swap, even if the
loader specified happens to be a userboot with the wrong interpreter. We'll
simply complain about the mismatch and bail out.
For legacy guests without the interpreter marker, we assume they're 4th.
For new guests with the interpreter marker, we'll read it and swap over
to the proper interpreter if it doesn't match what the userboot we're using
was compiled with.
Both flavors of userboot are installed by default, userboot_4th.so and
userboot_lua.so. This fixes the build WITHOUT_FORTH as a coincidence, which
was broken by userboot being forced to 4th.
libbe(3): Fix error handling with respect to be_exists
Some paths through be_exists will set the error state, others will not
There are multiple reasons that a call can fail, so clean it up a bit: all
paths now return an appropriate error code so the caller can attempt to
distinguish between a BE legitimately not existing and just having the wrong
mountpoint. The caller is expected to bubble the error through to the
internal error handler as needed.
This fixes some unfriendliness with bectl(8)'s activate subcommand, where
it might fail due to a bad mountpoint but the only message output is a
generic "failed to activate" message.
Glen Barber [Fri, 31 Aug 2018 16:29:36 +0000 (16:29 +0000)]
Update head from ALPHA3 to ALPHA4 as part of the 12.0-RELEASE
cycle. The i386 build failure appears to be transient, and
now becoming more difficult to reliably reproduce to identify
the cause. I will continue to investigate this, however.
Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation
John Baldwin [Fri, 31 Aug 2018 16:10:01 +0000 (16:10 +0000)]
Don't directly dereference a user pointer in the VPD ioctl.
The PCIOCLISTVPD ioctl on /dev/pci is used to fetch a list of VPD
key-value pairs for a specific PCI function. It is used by
'pciconf -l -V'. The list is stored in a userland-supplied buffer as
an array of variable-length structures where the key and data length
are stored in a fixed-size header followed by the variable-length
value as a byte array. To facilitate walking this array in userland,
<sys/pciio.h> provides a PVE_NEXT() helper macro to return a pointer
to the next array element by reading the the length out of the current
header and using it to compute the address of the next header.
To simplify the implementation, the ioctl handler was also using
PVE_NEXT() when on the user address of the user buffer to compute the
user address of the next array element. However, the PVE_NEXT() macro
when used with a user address was reading the value's length by
indirecting the user pointer. The value was ready after the current
record had been copied out to the user buffer, so it appeared to work
on architectures where user addresses are directly dereferencable from
the kernel (all but powerpc and i386 after the 4:4 split). The recent
enablement of SMAP on amd64 caught this violation however. To fix,
add a variant of PVE_NEXT() for use in the ioctl handler that takes an
explicit value length.
Kristof Provost [Fri, 31 Aug 2018 08:37:15 +0000 (08:37 +0000)]
frag6: Fix fragment reassembly
r337776 started hashing the fragments into buckets for faster lookup.
The hashkey is larger than intended. This results in random stack data being
included in the hashed data, which in turn means that fragments of the same
packet might end up in different buckets, causing the reassembly to fail.
Set the correct size for hashkey.
PR: 231045
Approved by: re (kib)
MFC after: 3 days
Warner Losh [Fri, 31 Aug 2018 01:01:16 +0000 (01:01 +0000)]
Don't load ccp automatically with devmatch
Remove the PNP info for the moment from the driver. It's an
experimental driver (as noted in r328150). It's performance is about
1/10th that of aesni. It will often panic when used with GELI (PR 2279820). It's not in our best interest to have such a driver be
autoloaded by default.
Kyle Evans [Thu, 30 Aug 2018 18:00:28 +0000 (18:00 +0000)]
release.sh: disable colors and the beastie menu for ARM/ARM64 targets
lualoader has moved to a model where the user is expected to disable color
as desired, rather than disabling it automatically for serial boots, due to
more wide-spread support for color sequences.
In a similar vain, though also to reduce special cases, lualoader no
longer disables the beastie menu automatically for !x86. This was done in
Forth land with a different loader.rc that simply didn't invoke the menu
routines, thus wasn't necessary.
This set of changes puts release images back to how they would've been
experienced prior to the switch to Lua.
Emmanuel Vadot [Thu, 30 Aug 2018 14:32:47 +0000 (14:32 +0000)]
omap4_prcm: Delay the frequencies read check
Same as r333305, with Linux 4.17 dts the compatible for the prcm added
'simplebus', it mean that the simplebus driver will attach to it
at the BUS_PASS_BUS pass.
Change the pass for the prcm driver to be at BUS_PASS_BUS so we will win
the attach.
This introduce a problem as this driver needs the omap_scm one to be already
attached. omap_scm also attach at BUS_PASS_BUS but after the prcm one as it is
after in the dtb and the simplebus driver simpy walk the tree to attach it's
children.
Use the bus_new_pass method to defer the frequencies read at BUS_PASS_TIMER.
This fixes booting on pandaboard
Kyle Evans [Thu, 30 Aug 2018 13:29:32 +0000 (13:29 +0000)]
lualoader: fix color usage
Resetting to the default color scheme was done prior to reading the config.
This is bogus; colors may only be declined by the user with the
loader.conf(5) variable "loader_color", so such a request for no color will
not be completely honored as we reset to the default color scheme
unconditionally.
Stephen Hurd [Wed, 29 Aug 2018 15:55:25 +0000 (15:55 +0000)]
Fix potential data corruption in iflib
The MP ring may have txq pointers enqueued. Previously, these were
passed to m_free() when IFC_QFLUSH was set. This patch checks for
the value and doesn't call m_free().
Emmanuel Vadot [Wed, 29 Aug 2018 14:01:27 +0000 (14:01 +0000)]
arm64: GENERIC-MMCCAM: Fix build and module depend
Fix the build of the GENERIC-MMCCAM kernel config after the sdhci_xenon
driver was commited.
While here correct sdhci_fdt and tegra_sdhci, even with MMCCAM they do
need to depend on sdhci(4)
Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.
Based on the submission by: royger
Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Approved by: re (marius)
Differential revision: https://reviews.freebsd.org/D16881
Cy Schubert [Wed, 29 Aug 2018 06:04:54 +0000 (06:04 +0000)]
Avoid printing extraneous function names when searching man page
database (apropos, man -k). This commit Replaces .SS with .SH,
similar to the man page provided by original heimdal (as in port).
PR: 230573
Submitted by: yuripv@yuripv.net
Approved by: re (rgrimes@)
MFC after: 3 days
John Baldwin [Tue, 28 Aug 2018 21:09:19 +0000 (21:09 +0000)]
Dynamically allocate IRQ ranges on x86.
Previously, x86 used static ranges of IRQ values for different types
of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used
IRQ values from 0 to 254. MSI interrupts used a compile-time-defined
range starting at 256, and Xen event channels used a
compile-time-defined range after MSI. Some recent systems have more
than 255 I/O APIC interrupt pins which resulted in those IRQ values
overflowing into the MSI range triggering an assertion failure.
Replace statically assigned ranges with dynamic ranges. Do a single
pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to
determine the total number of IRQs required. Allocate the interrupt
source and interrupt count arrays dynamically once this pass has
completed. To minimize runtime complexity these arrays are only sized
once during bootup. The PIC range is determined by the PICs present
in the system. The MSI and Xen ranges continue to use a fixed size,
though this does make it possible to turn the MSI range size into a
tunable in the future.
As a result, various places are updated to use dynamic limits instead
of constants. In addition, the vmstat(8) utility has been taught to
understand that some kernels may treat 'intrcnt' and 'intrnames' as
pointers rather than arrays when extracting interrupt stats from a
crashdump. This is determined by the presence (vs absence) of a
global 'nintrcnt' symbol.
This change reverts r189404 which worked around a buggy BIOS which
enumerated an I/O APIC twice (using the same memory mapped address for
both entries but using an IRQ base of 256 for one entry and a valid
IRQ base for the second entry). Making the "base" of MSI IRQ values
dynamic avoids the panic that r189404 worked around, and there may now
be valid I/O APICs with an IRQ base above 256 which this workaround
would incorrectly skip.
If in the future the issue reported in PR 130483 reoccurs, we will
have to add a pass over the I/O APIC entries in the MADT to detect
duplicates using the memory mapped address and use some strategy to
choose the "correct" one.
While here, reserve room in intrcnts for the Hyper-V counters.
Mark Johnston [Tue, 28 Aug 2018 20:21:36 +0000 (20:21 +0000)]
Allow multiple FBT probes to share a tracepoint.
With GNU ifuncs, multiple FBT probes may correspond to the same
instruction. fbt_invop() assumed that this could not happen and
would return after the first probe found in the global FBT hash
table, which might not be the one that's enabled. Fix the problem
on x86 by linking probes that share a tracepoint and having each
linked probe fire when the tracepoint is hit.
PR: 230846
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16921
Several bug fixes and robustness improvements for the AP boot page
table allocation.
At the time that mp_bootaddress() is called, phys_avail[] array does
not reflect some memory reservations already done, like kernel
placement. Recent changes to DMAP protection which make kernel text
read-only in DMAP revealed this, where on some machines AP boot page
tables selection appears to intersect with the kernel itself.
Fix this by checking the addresses selected using the same algorithm
as bootaddr_rwx(). Also, try to chomp pages for the page table not
only at the start of the contiguous range, but also at the end. This
should improve robustness when the only suitable range is already
consumed by the kernel.
Reported and tested by: Michael Gmelin <freebsd@grem.de>
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D16907
Marcin Wojtas [Tue, 28 Aug 2018 17:09:41 +0000 (17:09 +0000)]
Use ip/ipv6 structures in al_eth only if they are supported
The ip/ipv6 header files are included only if the appropriate definition
exists, but the driver was missing similar checks when using the
ip and ip6_hdr structures.
If the kernel was not built with the INET or INET6 option, the driver
was preventing kernel from being built.
To fix that, the missing ifdef checks were added to the driver.
PR: Bug 230886
Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by: O. Hartmann
Approved by: re (gjb)
Obtained from: Semihalf
MFC after: 1 week
Sponsored by: Amazon, Inc.
Warner Losh [Tue, 28 Aug 2018 14:46:55 +0000 (14:46 +0000)]
Add big, nasty abandonware tags to this code.
This code works for some people, but hasn't been updated in a long
time. Still allow people to use this code for the moment, but put a
big, nasty obsolete message to inform and encourage people to move to
the port.
Kyle Evans [Mon, 27 Aug 2018 19:34:50 +0000 (19:34 +0000)]
Fix bsdbox build WITH_OFED
hostapd requires libpcap, which links against libmlx5 and libibverbs when
building WITH_OFED. These were not pulled in to bsdbox and most bsdbox
builds were WITHOUT_OFED up until recently, so it was not noticed.
Andrew Gallatin [Mon, 27 Aug 2018 18:13:20 +0000 (18:13 +0000)]
Reject IPv4 SO_REUSEPORT_LB groups when looking up an IPv6 listening socket
Similar to how the IPv4 code will reject an IPv6 LB group,
we must ignore IPv4 LB groups when looking up an IPv6
listening socket. If this is not done, a port only match
may return an IPv4 socket, which causes problems (like
sending IPv6 packets with a hopcount of 0, making them unrouteable).
Thanks to rrs for all the work to diagnose this.
Approved by: re (rgrimes)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16899
Kirk McKusick [Mon, 27 Aug 2018 15:20:42 +0000 (15:20 +0000)]
When doing a -S "safe copy", the install command should do an
fsync(2) system call after copying the installed file to ensure
that it is on stable storage.
PR: 230851
Reviewed by: kib
Approved by: re (marius)
Andrew Turner [Mon, 27 Aug 2018 11:14:49 +0000 (11:14 +0000)]
Ensure we have a large enough stack for the lua loader
Lua has a few places where it allocates a large buffer on the stack. This
is normally fine, except there are a few places where there can be multiple
frames with this buffer. This can cause a stack overflow on some arm64 SoCs.
Fix this by allocating our own stack in loader.efi large enough for these
objects. The required size has been found by tracing how the stack pointer
changes in a virtual machine and found to be no larger than 50kB. A
larger stack is allocated to reduce the likelihood of overflow from future
changes.
Reviewed by: kevans
Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D16886
Andrew Turner [Mon, 27 Aug 2018 10:08:27 +0000 (10:08 +0000)]
Use the correct register when storing the arm VFP state.
Previously we have been lucky where the state was already in r0, however
this is not guaranteed. Use the passed in register as the location to
store the upper half of the arm VFP registers rather than relying on it
being r0.
Sean Bruno [Sun, 26 Aug 2018 17:05:43 +0000 (17:05 +0000)]
r338270 had the side effect of no longer installing libmd.so into /lib.
For users who have a seperate zfs mount of /usr or /usr/lib, this will
cause dynamic loading failures when attempting to execute zfs mount on
bootup. E.g. the system won't boot.
Including <src.opts.mk> sets SHLIBDIR, so SHLIBDIR?= has no
effect. The other lib/ Makefiles solve this problem by moving the
SHLIBDIR assignment to before .include <src.opts.mk>.
Colin Percival [Sun, 26 Aug 2018 03:56:54 +0000 (03:56 +0000)]
Disable atkbd0 and atkdbc0 in EC2 AMIs. This has the effect of skipping
the probing and attaching of the PS/2 mouse (not present on EC2) and
keyboard (emulated, but not accessible via EC2).
Note that we disable atkbd0 separately even though during device probing
it shows up as a child of atkbdc0; this is necessary because the device
is also initialized during the early console setup from hammer_time.
This change cuts the kernel boot time on an EC2 c5.4xlarge instance from
7259ms down to 4727 ms.