mhorne [Sun, 30 Jun 2019 19:43:13 +0000 (19:43 +0000)]
elftoolchain: fix an incorrect e_flags description
r349482 introduced the definitions and descriptions of the RISC-V
specific e_flags values to elftoolchain. However, the description for
the EF_RISCV_RVE flag was incorrectly duplicated from EF_RISCV_RVC. Fix
this by providing the proper description for this flag.
arichardson [Sun, 30 Jun 2019 11:49:58 +0000 (11:49 +0000)]
Reduce size of rtld by 22% by pulling in less code from libc
Currently RTLD is linked against libc_nossp_pic which means that any libc
symbol used in rtld can pull in a lot of depedencies. This was causing
symbol such as __libc_interposing and all the pthread stubs to be included
in RTLD even though they are not required. It turns out most of these
dependencies can easily be avoided by providing overrides inside of rtld.
This change is motivated by CHERI, where we have an experimental ABI that
requires additional relocation processing to allow the use of function
pointers inside of rtld. Instead of adding this self-relocation code to
RTLD I attempted to remove most function pointers from RTLD and discovered
that most of them came from the libc dependencies instead of being actually
used inside rtld.
A nice side-effect of this change is that rtld is now 22% smaller on amd64.
text data bss dec hex filename
0x21eb6 0xce0 0xe60 145910 239f6 /home/alr48/ld-elf-x86.before.so.1
0x1a6ed 0x728 0xdd8 113645 1bbed /home/alr48/ld-elf-x86.after.so.1
The number of R_X86_64_RELATIVE relocations that need to be processed on
startup has also gone down from 368 to 187 (almost 50% less).
tijl [Sat, 29 Jun 2019 17:01:56 +0000 (17:01 +0000)]
Build lib32 libl. The library is built from usr.bin/lex/lib. It would be
better to move this directory to lib/libl, but this requires more extensive
changes to Makefile.inc1. This simple fix can be MFCed quickly.
markj [Sat, 29 Jun 2019 16:11:09 +0000 (16:11 +0000)]
Use a consistent snapshot of the fd's rights in fget_mmap().
fget_mmap() translates rights on the descriptor to a VM protection
mask. It was doing so without holding any locks on the descriptor
table, so a writer could simultaneously be modifying those rights.
Such a situation would be detected using a sequence counter, but
not before an inconsistency could trigger assertion failures in
the capability code.
Fix the problem by copying the fd's rights to a structure on the stack,
and perform the translation only once we know that that snapshot is
consistent.
Reported by: syzbot+ae359438769fda1840f8@syzkaller.appspotmail.com
Reviewed by: brooks, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20800
markj [Sat, 29 Jun 2019 16:05:52 +0000 (16:05 +0000)]
Fix mutual exclusion in pipe_direct_write().
We use PIPE_DIRECTW as a semaphore for direct writes to a pipe, where
the reader copies data directly from pages mapped into the writer.
However, when a reader finishes such a copy, it previously cleared
PIPE_DIRECTW, allowing multiple writers to race and corrupt the state
used to track wired pages belonging to the writer.
Fix this by having the writer clear PIPE_DIRECTW and instead use the
count of unread bytes to determine whether a write is finished.
lwhsu [Sat, 29 Jun 2019 14:55:53 +0000 (14:55 +0000)]
Fix VOP_PUTPAGES(9) in regards to the use of VM_PAGER_CLUSTER_OK
Submitted by: Ka Ho Ng <khng300 at gmail.com>
Reviewed by: mckusick
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20695
jhb [Sat, 29 Jun 2019 00:52:21 +0000 (00:52 +0000)]
Add support for IFCAP_NOMAP to cxgbe(4).
Since cxgbe(4) uses sglist instead of bus_dma, this required updates
to the code that generates scatter/gather lists for packets. Also,
unmapped mbufs are always sent via DMA and never as immediate data in
the payload of a work request.
jhb [Sat, 29 Jun 2019 00:50:25 +0000 (00:50 +0000)]
Compress pending socket buffer data once it is marked ready.
Apply similar logic from sbcompress to pending data in the socket
buffer once it is marked ready via sbready. Normally sbcompress
merges small mbufs to reduce the length of mbuf chains in the socket
buffer. However, sbcompress cannot do this for mbufs marked
M_NOTREADY. sbcompress_ready is now called from sbready when mbufs
are marked ready to merge small mbuf chains once the data is available
to copy.
jhb [Sat, 29 Jun 2019 00:48:33 +0000 (00:48 +0000)]
Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
alc [Fri, 28 Jun 2019 22:40:34 +0000 (22:40 +0000)]
When we protect PTEs (as opposed to PDEs), we only call vm_page_dirty()
when, in fact, we are write protecting the page and the PTE has PG_M set.
However, pmap_protect_pde() was always calling vm_page_dirty() when the PDE
has PG_M set. So, adding PG_NX to a writeable PDE could result in
unnecessary (but harmless) calls to vm_page_dirty().
Simplify the loop calling vm_page_dirty() in pmap_protect_pde().
hselasky [Fri, 28 Jun 2019 22:28:51 +0000 (22:28 +0000)]
Need to apply the PCIM_BAR_MEM_BASE mask to the physical memory
address before returning it to the user. Some of the least significant
bits have special meaning and should be masked away.
luporl [Fri, 28 Jun 2019 15:52:40 +0000 (15:52 +0000)]
[PowerPC64] Add ABI flags to 'file' magic
The distinction between ELF header version and OpenPOWER ELF ABI version is
confusing for most of people, so this adds text to "file" output to make it
clear about which OpenPOWER ELF ABI version binary was built for.
The strings used in this change are based on "64-Bit ELF V2 ABI
Specification/3.1. ELF Header" document available at
http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/dbdoclet.50655241_97607.html
Example:
$ file t1-Flag2 -m -m contrib/file/magic/Magdir/elf t1-Flag2: ELF 64-bit MSB
executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1
(FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD
13.0 (1300033), FreeBSD-style, not stripped
hselasky [Fri, 28 Jun 2019 10:49:04 +0000 (10:49 +0000)]
Need to wait for epoch callbacks to complete before detaching a
network interface.
This particularly manifests itself when an INP has multicast options
attached during a network interface detach. Then the IPv4 and IPv6
leave group call which results from freeing the multicast address, may
access a freed ifnet structure. These are the steps to reproduce:
service mdnsd onestart # installed from ports
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
hselasky [Fri, 28 Jun 2019 10:38:56 +0000 (10:38 +0000)]
Implement API for draining EPOCH(9) callbacks.
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.
rmacklem [Thu, 27 Jun 2019 23:10:40 +0000 (23:10 +0000)]
Add non-blocking trylock variants for the rangelock functions.
A future patch that will add a Linux compatible copy_file_range(2) syscall
needs to be able to lock the byte ranges of two files concurrently.
To do this without a risk of deadlock, a non-blocking variant of
vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed.
This patch adds this, along with vn_rangelock_trywlock(), in order to
do this.
The patch also adds a couple of comments, that I hope clarify how the
algorithm used in kern_rangelock.c works.
jhb [Thu, 27 Jun 2019 22:50:11 +0000 (22:50 +0000)]
Fix comment in sofree() to reference sbdestroy().
r160875 added sbdestroy() as a wrapper around sbrelease_internal to be
called from sofree(), yet the comment added in the same revision to
sofree() still mentions sbrelease_internal().
bcran [Thu, 27 Jun 2019 22:06:41 +0000 (22:06 +0000)]
Increase EFI_STAGING_SIZE to 100MB on x64
To avoid failures when the large 18MB nvidia.ko module is being loaded,
increase EFI_STAGING_SIZE from 64MB to 100MB on x64 systems.
Leave the other platforms at 64MB.
jhb [Thu, 27 Jun 2019 19:36:30 +0000 (19:36 +0000)]
Hold an explicit reference on the socket for the aiotx task.
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket. However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty. In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.
Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.
avg [Thu, 27 Jun 2019 15:46:06 +0000 (15:46 +0000)]
gpiobus: provide a new hint, pin_list
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
kevans [Thu, 27 Jun 2019 14:03:32 +0000 (14:03 +0000)]
bectl(8): create non-recursive boot environments
bectl advertises that it has the ability to create recursive and
non-recursive boot environments. This patch implements that functionality
using the be_create_depth API provided by libbe. With this patch, bectl now
works as bectl(8) describes in regards to creating recursive/non-recursive
boot environments.
Submitted by: Rob Fairbanks <rob.fx907 gmail com> (with minor changes)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20240
cognet [Wed, 26 Jun 2019 22:06:40 +0000 (22:06 +0000)]
In get_fpcontext32() and set_fpcontext32(), we can't just use memcpy() to
copy the VFP registers.
arvm7 VFP uses 32 64bits fp registers (but those could be used in pairs to
make 16 128bits registers), while aarch64 uses 32 128bits fp registers, so
we have to copy the value of each register.
alc [Wed, 26 Jun 2019 21:43:41 +0000 (21:43 +0000)]
Revert one of the changes from r349323. Specifically, undo the change
that replaced a pmap_invalidate_page() with a dsb(ishst) in
pmap_enter_quick_locked(). Even though this change is in principle
correct, I am seeing occasional, spurious bus errors that are only
reproducible without this pmap_invalidate_page(). (None of adding an
isb, "upgrading" the dsb to wait on loads as well as stores, or
disabling superpage mappings eliminates the bus errors.) Add an XXX
comment explaining why the pmap_invalidate_page() is being performed.
rgrimes [Wed, 26 Jun 2019 21:19:43 +0000 (21:19 +0000)]
Emulate the "TEST r/m{16,32,64}, imm{16,32,32}" instructions (opcode F7H).
This adds emulation for:
test r/m16, imm16
test r/m32, imm32
test r/m64, imm32 sign-extended to 64
OpenBSD guests compiled with clang 8.0.0 use TEST directly against a
Local APIC register instead of separate read via MOV followed by a
TEST against the register.
PR: 238794
Submitted by: jhb
Reported by: Jason Tubnor jason@tubnor.net
Tested by: Jason Tubnor jason@tubnor.net
Reviewed by: markj, Patrick Mooney patrick.mooney@joyent.com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20755
markj [Wed, 26 Jun 2019 20:11:52 +0000 (20:11 +0000)]
Avoid a divide-by-zero when bad checksum counters overflow.
A mixture of IP or UDP packets with valid and invalid checksum could
cause {ip,udp}_packets_bad_checksum to wrap around to 0, resulting
in a division by zero.
This is packet.c rev. 1.27 from OpenBSD.
admbugs: 552
Obtained from: OpenBSD
MFC after: 3 days
markj [Wed, 26 Jun 2019 17:37:51 +0000 (17:37 +0000)]
Add a return value to vm_page_remove().
Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.
This is a step towards an implementation of an atomic reference counter
for each physical page structure.
cognet [Wed, 26 Jun 2019 16:56:56 +0000 (16:56 +0000)]
Fix debugging of 32bits arm binaries on arm64.
In set_regs32()/fill_regs32(), we have to get/set SP and LR from/to
tf_x[13] and tf_x[14].
set_regs() and fill_regs() may be called for a 32bits process, if the process
is ptrace'd from a 64bits debugger. So, in set_regs() and fill_regs(), get
or set PC and SPSR from where the debugger expects it, from tf_x[15] and
tf_x[16].
markj [Wed, 26 Jun 2019 16:38:30 +0000 (16:38 +0000)]
libdwarf: Use the cached strtab pointer when reading string attributes.
Previously we would perform a linear search of the DWARF section
list for ".debug_str". However, libdwarf always caches a pointer to
the strtab image in its debug descriptor. Using it gives a modest
performance improvement when iterating over the attributes of each
DIE.
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20759
marius [Wed, 26 Jun 2019 15:28:21 +0000 (15:28 +0000)]
o In iflib_txq_drain():
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
hselasky [Wed, 26 Jun 2019 12:04:54 +0000 (12:04 +0000)]
Only call libusb_hotplug_enumerate() once from libusb_hotplug_register_callback().
Else when registering multiple filters the same USB device may appear twice in
the list.
MFC after: 3 days
Sponsored by: Mellanox Technologies
hselasky [Wed, 26 Jun 2019 11:28:08 +0000 (11:28 +0000)]
Fix support for LIBUSB_HOTPLUG_ENUMERATE in libusb. Currently all
devices are enumerated regardless of of the LIBUSB_HOTPLUG_ENUMERATE
flag. Make sure when the flag is not specified no arrival events are
generated for currently enumerated devices.
MFC after: 3 days
Sponsored by: Mellanox Technologies
avg [Wed, 26 Jun 2019 07:38:31 +0000 (07:38 +0000)]
gpio.4: document device hints common to all devices on gpiobus
"at" keyword is documented in device.hints(5) for all buses, but it does
hurt to add another reference to it.
"pins" keyword is specific to gpiobus.
At least these two hints should be configured for any gpiobus device on
a hints based system.