luporl [Fri, 28 Jun 2019 15:52:40 +0000 (15:52 +0000)]
[PowerPC64] Add ABI flags to 'file' magic
The distinction between ELF header version and OpenPOWER ELF ABI version is
confusing for most of people, so this adds text to "file" output to make it
clear about which OpenPOWER ELF ABI version binary was built for.
The strings used in this change are based on "64-Bit ELF V2 ABI
Specification/3.1. ELF Header" document available at
http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_97607.html
Example:
$ file t1-Flag2 -m -m contrib/file/magic/Magdir/elf t1-Flag2: ELF 64-bit MSB
executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1
(FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD
13.0 (1300033), FreeBSD-style, not stripped
hselasky [Fri, 28 Jun 2019 10:49:04 +0000 (10:49 +0000)]
Need to wait for epoch callbacks to complete before detaching a
network interface.
This particularly manifests itself when an INP has multicast options
attached during a network interface detach. Then the IPv4 and IPv6
leave group call which results from freeing the multicast address, may
access a freed ifnet structure. These are the steps to reproduce:
service mdnsd onestart # installed from ports
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
hselasky [Fri, 28 Jun 2019 10:38:56 +0000 (10:38 +0000)]
Implement API for draining EPOCH(9) callbacks.
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.
rmacklem [Thu, 27 Jun 2019 23:10:40 +0000 (23:10 +0000)]
Add non-blocking trylock variants for the rangelock functions.
A future patch that will add a Linux compatible copy_file_range(2) syscall
needs to be able to lock the byte ranges of two files concurrently.
To do this without a risk of deadlock, a non-blocking variant of
vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed.
This patch adds this, along with vn_rangelock_trywlock(), in order to
do this.
The patch also adds a couple of comments, that I hope clarify how the
algorithm used in kern_rangelock.c works.
jhb [Thu, 27 Jun 2019 22:50:11 +0000 (22:50 +0000)]
Fix comment in sofree() to reference sbdestroy().
r160875 added sbdestroy() as a wrapper around sbrelease_internal to be
called from sofree(), yet the comment added in the same revision to
sofree() still mentions sbrelease_internal().
bcran [Thu, 27 Jun 2019 22:06:41 +0000 (22:06 +0000)]
Increase EFI_STAGING_SIZE to 100MB on x64
To avoid failures when the large 18MB nvidia.ko module is being loaded,
increase EFI_STAGING_SIZE from 64MB to 100MB on x64 systems.
Leave the other platforms at 64MB.
jhb [Thu, 27 Jun 2019 19:36:30 +0000 (19:36 +0000)]
Hold an explicit reference on the socket for the aiotx task.
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket. However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty. In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.
Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.
avg [Thu, 27 Jun 2019 15:46:06 +0000 (15:46 +0000)]
gpiobus: provide a new hint, pin_list
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
kevans [Thu, 27 Jun 2019 14:03:32 +0000 (14:03 +0000)]
bectl(8): create non-recursive boot environments
bectl advertises that it has the ability to create recursive and
non-recursive boot environments. This patch implements that functionality
using the be_create_depth API provided by libbe. With this patch, bectl now
works as bectl(8) describes in regards to creating recursive/non-recursive
boot environments.
Submitted by: Rob Fairbanks <rob.fx907 gmail com> (with minor changes)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20240
cognet [Wed, 26 Jun 2019 22:06:40 +0000 (22:06 +0000)]
In get_fpcontext32() and set_fpcontext32(), we can't just use memcpy() to
copy the VFP registers.
arvm7 VFP uses 32 64bits fp registers (but those could be used in pairs to
make 16 128bits registers), while aarch64 uses 32 128bits fp registers, so
we have to copy the value of each register.
alc [Wed, 26 Jun 2019 21:43:41 +0000 (21:43 +0000)]
Revert one of the changes from r349323. Specifically, undo the change
that replaced a pmap_invalidate_page() with a dsb(ishst) in
pmap_enter_quick_locked(). Even though this change is in principle
correct, I am seeing occasional, spurious bus errors that are only
reproducible without this pmap_invalidate_page(). (None of adding an
isb, "upgrading" the dsb to wait on loads as well as stores, or
disabling superpage mappings eliminates the bus errors.) Add an XXX
comment explaining why the pmap_invalidate_page() is being performed.
rgrimes [Wed, 26 Jun 2019 21:19:43 +0000 (21:19 +0000)]
Emulate the "TEST r/m{16,32,64}, imm{16,32,32}" instructions (opcode F7H).
This adds emulation for:
test r/m16, imm16
test r/m32, imm32
test r/m64, imm32 sign-extended to 64
OpenBSD guests compiled with clang 8.0.0 use TEST directly against a
Local APIC register instead of separate read via MOV followed by a
TEST against the register.
PR: 238794
Submitted by: jhb
Reported by: Jason Tubnor jason@tubnor.net
Tested by: Jason Tubnor jason@tubnor.net
Reviewed by: markj, Patrick Mooney patrick.mooney@joyent.com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20755
markj [Wed, 26 Jun 2019 20:11:52 +0000 (20:11 +0000)]
Avoid a divide-by-zero when bad checksum counters overflow.
A mixture of IP or UDP packets with valid and invalid checksum could
cause {ip,udp}_packets_bad_checksum to wrap around to 0, resulting
in a division by zero.
This is packet.c rev. 1.27 from OpenBSD.
admbugs: 552
Obtained from: OpenBSD
MFC after: 3 days
markj [Wed, 26 Jun 2019 17:37:51 +0000 (17:37 +0000)]
Add a return value to vm_page_remove().
Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.
This is a step towards an implementation of an atomic reference counter
for each physical page structure.
cognet [Wed, 26 Jun 2019 16:56:56 +0000 (16:56 +0000)]
Fix debugging of 32bits arm binaries on arm64.
In set_regs32()/fill_regs32(), we have to get/set SP and LR from/to
tf_x[13] and tf_x[14].
set_regs() and fill_regs() may be called for a 32bits process, if the process
is ptrace'd from a 64bits debugger. So, in set_regs() and fill_regs(), get
or set PC and SPSR from where the debugger expects it, from tf_x[15] and
tf_x[16].
markj [Wed, 26 Jun 2019 16:38:30 +0000 (16:38 +0000)]
libdwarf: Use the cached strtab pointer when reading string attributes.
Previously we would perform a linear search of the DWARF section
list for ".debug_str". However, libdwarf always caches a pointer to
the strtab image in its debug descriptor. Using it gives a modest
performance improvement when iterating over the attributes of each
DIE.
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20759
marius [Wed, 26 Jun 2019 15:28:21 +0000 (15:28 +0000)]
o In iflib_txq_drain():
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
hselasky [Wed, 26 Jun 2019 12:04:54 +0000 (12:04 +0000)]
Only call libusb_hotplug_enumerate() once from libusb_hotplug_register_callback().
Else when registering multiple filters the same USB device may appear twice in
the list.
MFC after: 3 days
Sponsored by: Mellanox Technologies
hselasky [Wed, 26 Jun 2019 11:28:08 +0000 (11:28 +0000)]
Fix support for LIBUSB_HOTPLUG_ENUMERATE in libusb. Currently all
devices are enumerated regardless of of the LIBUSB_HOTPLUG_ENUMERATE
flag. Make sure when the flag is not specified no arrival events are
generated for currently enumerated devices.
MFC after: 3 days
Sponsored by: Mellanox Technologies
avg [Wed, 26 Jun 2019 07:38:31 +0000 (07:38 +0000)]
gpio.4: document device hints common to all devices on gpiobus
"at" keyword is documented in device.hints(5) for all buses, but it does
hurt to add another reference to it.
"pins" keyword is specific to gpiobus.
At least these two hints should be configured for any gpiobus device on
a hints based system.
jhibbits [Wed, 26 Jun 2019 01:14:39 +0000 (01:14 +0000)]
powerpc/booke: Handle misaligned floating point loads/stores as on AIM
Misaligned floating point loads and stores are already handled for AIM, but
use the DSISR to obtain the necessary data. Book-E does not have the DSISR,
so these fixups are not performed, leading to a SIGBUS on misaligned FP
loads or stores. Obtain the necessary data on the Book-E side, similar to
how is done for SPE.
cy [Wed, 26 Jun 2019 00:53:49 +0000 (00:53 +0000)]
While working on PR/238796 I discovered an unused variable in frdest,
the next hop structure. It is likely this contributes to PR/238796
though other factors remain to be investigated.
dougm [Tue, 25 Jun 2019 20:25:16 +0000 (20:25 +0000)]
Eliminate some uses of the prev and next fields of vm_map_entry_t.
Since the only caller to vm_map_splay is vm_map_lookup_entry, move the
implementation of vm_map_splay into vm_map_lookup_helper, called by
vm_map_lookup_entry.
vm_map_lookup_entry returns the greatest entry less than or equal to a
given address, but in many cases the caller wants the least entry
greater than or equal to the address and uses the next pointer to get
to it. Provide an alternative interface to lookup,
vm_map_lookup_entry_ge, to provide the latter behavior, and let
callers use one or the other rather than having them use the next
pointer after a lookup miss to get what they really want.
In vm_map_growstack, the caller wants an entry that includes a given
address, and either the preceding or next entry depending on the value
of eflags in the first entry. Incorporate that behavior into
vm_map_lookup_helper, the function that implements all of these
lookups.
Eliminate some temporary variables used with vm_map_lookup_entry, but
inessential.
emaste [Tue, 25 Jun 2019 19:06:43 +0000 (19:06 +0000)]
bhyve: avoid theoretical stack buffer overflow from integer overflow
Use the proper size_t type to match strlen's return type. This is not
exploitable in practice as this parses command line arguments, which
are limited to well below 2^31 bytes.
This is a minimal change to address the reported issue; hda_parse_config
and the rest of this file will benefit from further review.
Reported by: Fakhri Zulkifli
Reviewed by: jhb, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
kevans [Tue, 25 Jun 2019 18:47:40 +0000 (18:47 +0000)]
libbe(3): restructure be_mount, skip canmount check for BE dataset
Further cleanup after r349380; loader and kernel will both ignore canmount
on the root dataset as well, so we should not be so strict about it when
mounting it. be_mount is restructured to make it more clear that depth==0 is
special, and to not try fetching these properties that we won't care about.
mav [Tue, 25 Jun 2019 18:35:23 +0000 (18:35 +0000)]
Avoid extra taskq_dispatch() calls by DMU.
DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes
and os_synced_dnodes. Since the number of sublists by default is equal
to number of CPUs, it will dispatch equal, potentially large, number of
tasks, waking up many CPUs to handle them, even if only one or few of
sublists actually have any work to do.
This change adds check for empty sublists to avoid this.
kevans [Tue, 25 Jun 2019 18:13:39 +0000 (18:13 +0000)]
libbe(3): mount: the BE dataset is mounted at /
Other parts of libbe(3) were fairly strict on the mountpoint property of the
BE dataset, and be_mount was not much better. It was improved in r347027 to
allow mountpoint=none for depth==0, but this bit was still sensitive to
mountpoint != / and mountpoint != none. Given that other parts of libbe(3)
no longer restrict the mountpoint property here, and the rest of the base
system is generally OK and will assume that a BE is mounted at /, let's do
the same.
luporl [Tue, 25 Jun 2019 17:15:44 +0000 (17:15 +0000)]
[PowerPC64] Don't mark module data as static
Fixes panic when loading ipfw.ko and if_epair.ko built with modern compiler.
Similar to arm64 and riscv, when using a modern compiler (!gcc4.2), code
generated tries to access data in the wrong location, causing kernel panic
(data storage interrupt trap) when loading if_epair and ipfw.
Issue was reproduced with kernel/module compiled using gcc8 and clang8. It
affects both ELFv1 and ELFv2 ABI environments.
mav [Tue, 25 Jun 2019 17:00:53 +0000 (17:00 +0000)]
Fix strsep_quote() on strings without quotes.
For strings without quotes and escapes dstptr and srcptr are equal, so
zeroing *dstptr before checking *srcptr is not a good idea. In practice
it means that in -maproot=65534:65533 everything after the colon is lost.
The problem was there since r293305, but before r346976 it was covered by
improper strsep_quote() usage.
PR: 238725
MFC after: 3 days
Sponsored by: iXsystems, Inc.
hselasky [Tue, 25 Jun 2019 11:54:41 +0000 (11:54 +0000)]
Convert all IPv4 and IPv6 multicast memberships into using a STAILQ
instead of a linear array.
The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.
To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi
All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.
dougm [Tue, 25 Jun 2019 07:44:37 +0000 (07:44 +0000)]
vm_map_protect may return an INVALID_ARGUMENT or PROTECTION_FAILURE
error response after clipping the first map entry in the region to be
reserved. This creates a pair of matching entries that should have
been "simplified" back into one, or never created. This change defers
the clipping of that entry until those two vm_map_protect failure
cases have been ruled out.
imp [Tue, 25 Jun 2019 06:14:31 +0000 (06:14 +0000)]
Replay r349342 by imp accidentally reverted by r349352
Use the cam_ed copy of ata_params rather than malloc and freeing
memory for it. This reaches into internal bits of xpt a little, and
I'll clean that up later.