vmaffione [Fri, 9 Nov 2018 08:43:40 +0000 (08:43 +0000)]
netmap: add load balancer program
Add the lb program, which is able to load-balance input traffic
received from a netmap port over M groups, with N netmap pipes in
each group. Each received packet is forwarded to one of the pipes
chosen from each group (using an L3/L4 connection-consistent hash function).
This also adds a man page for lb and some cross-references in related
man pages.
yuripv [Fri, 9 Nov 2018 03:32:53 +0000 (03:32 +0000)]
Reset persistent mbstates when rune locale encoding changes.
This was shown to be a problem by side effect of now-enabled test case,
which was going through C, en_US.UTF-8, ja_JP.SJIS, and ja_JP.eucJP,
and failing eventually as data in mbrtowc's mbstate, that was
perfectly correct for en_US.UTF-8 was treated as incorrect for
ja_JP.SJIS, failing the entire test case.
This makes the persistent mbstates to be per ctype-component,
and not per-locale so we could easily reset the mbstates when
only LC_CTYPE is changed.
jhibbits [Thu, 8 Nov 2018 20:48:44 +0000 (20:48 +0000)]
powerpc64: Fix "show spr" command on ELFv2 kernels
Summary: When compiling for ELFv2, it is necessary to adjust the offset to
get_spr and factor in the function prologue to ensure the correct instruction is
being edited.
Test Plan:
Before:
```
db> show spr 110
KDB: reentering
KDB: stack backtrace:
0xc008000020fb96e0: at 0xc000000002bb2e34 = kdb_backtrace+0x68
0xc008000020fb97f0: at 0xc000000002bb3798 = kdb_reenter+0x54
0xc008000020fb9860: at 0xc000000002f87090 = trap+0x4e4
0xc008000020fb9990: at 0xc000000002f78a60 = powerpc_interrupt+0x110
0xc008000020fb9a20: kernel trap 0xe40 by 0xc000000002401978 = get_spr+0x8: srr1=0x9000000000001032
r1=0xc008000020fb9cd0 cr=0x80009438 xer=0x20040000 ctr=0xc000000002f7b40c r2=0xc0000000037fd000
saved LR(0xfffffffffffffffb) is invalid.
```
jhibbits [Thu, 8 Nov 2018 20:31:12 +0000 (20:31 +0000)]
powerpc/powernv: Restrict the busdma tag to only POWER8
It seems this tag is causing problems on POWER9 systems. Since no POWER9 user
has encountered the problem fixed by r339589 just restrict it to POWER8 for now.
A better fix will likely be to update powerpc/busdma_machdep.c to handle the
window correctly.
emaste [Thu, 8 Nov 2018 20:17:36 +0000 (20:17 +0000)]
Avoid buffer underwrite in icmp_error
icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.
Include the ip header in the size passed to m_align. On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster. This will result in slightly pessimal alignment for
the ICMP data copy.
Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.
Reported by: A reddit user
Reviewed by: bz, jtl
MFC after: 3 days
Security: CVE-2018-17156
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17909
vangyzen [Thu, 8 Nov 2018 19:50:23 +0000 (19:50 +0000)]
in6_ifattach_linklocal: handle immediate removal of the new LLA
If another thread immediately removes the link-local address
added by in6_update_ifa(), in6ifa_ifpforlinklocal() can return NULL,
so the following assertion (or dereference) is wrong.
Remove the assertion, and handle NULL somewhat better than panicking.
This matches all of the other callers of in6_update_ifa().
erj [Thu, 8 Nov 2018 19:10:43 +0000 (19:10 +0000)]
ixl/iavf(4): Fix TSO offloads when TXCSUM is disabled
From Jake:
The iflib stack does not disable TSO automatically when TXCSUM is
disabled, instead assuming that the driver will correctly handle TSOs
even when CSUM_IP is not set.
This results in iflib calling ixl_isc_txd_encap with packets which have
CSUM_IP_TSO, but do not have CSUM_IP or CSUM_IP_TCP set. Because of
this, ixl_tx_setup_offload will not setup the IPv4 checksum offloading.
This results in bad TSO packets being sent if a user disables TXCSUM
without disabling TSO.
Fix this by updating the ixl_tx_setup_offload function to check both
CSUM_IP and CSUM_IP_TSO when deciding whether to enable IPv4 checksums.
Once this is corrected, another issue for TSO packets is revealed. The
driver sets IFLIB_NEED_ZERO_CSUM in order to enable a work around that
causes the ip->sum field to be zero'd. This is necessary for ixl
hardware to correctly perform TSOs.
However, if TXCSUM is disabled, then the work around is not enabled, as
CSUM_IP will not be set when the iflib stack checks to see if it should
clear the sum field.
Fix this by adding IFLIB_TSO_INIT_IP to the iflib flags for the iavf and
ixl interface files.
It is uncertain if the hardware needs IFLIB_NEED_ZERO_CSUM for any other
case besides TSO, so leave that flag assigned. It may be worth
investigating to see if this work around flag could be disabled in
a future change.
Once both of these changes are made, the ixl driver should correctly
offload TSO packets when TSO4 offload is enabled, regardless of whether
TXCSUM is enabled or disabled.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, shurd@
MFC after: 0 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D17900
brooks [Thu, 8 Nov 2018 00:35:00 +0000 (00:35 +0000)]
Add a top-level make target to rebuild all sysent files.
The sysent target is useful when changing makesyscalls.sh, when
making paired changes to syscalls.master files, or in a future where
freebsd32 sysent entries are built from the default syscalls.master.
markj [Wed, 7 Nov 2018 23:28:11 +0000 (23:28 +0000)]
Fix a use-after-free in swp_pager_meta_free().
This was introduced in r326329 and explains the crashes mentioned in
the commit log message for r339934. In particular, on INVARIANTS
kernels, UMA trashing causes the loop to exit early, leaving swap
blocks behind when they should have been freed. After r336984 this
became more problematic since new anonymous mappings were more
likely to reuse swapped-out subranges of existing VM objects, so faults
would trigger pageins of freed memory rather than returning zeroed
pages.
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17897
tsoome [Wed, 7 Nov 2018 21:36:52 +0000 (21:36 +0000)]
loader: ptable_open() check for ptable_cd9660read result is wrong
The ptable_*read() functions return NULL on read errors (and partition table
closed as an side effect). The ptable_open must check the return value and
act properly.
oshogbo [Wed, 7 Nov 2018 21:01:14 +0000 (21:01 +0000)]
bspatch: simplify capsicumization
Assume that user wants to run with capsicum support if he builds the software
with HAVE_CAPSICUM. Treat running application without capsicum in the kernel as
an error.
emaste [Wed, 7 Nov 2018 20:36:57 +0000 (20:36 +0000)]
newvers.sh: avoid regenerating vers.c if content unchanged
When reproducible build mode is enabled vers.c may be unchanged between
successive builds. In this case avoid changing the file's metadata so
that it does not cause dependent targets to be rebuilt.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D17892
shurd [Wed, 7 Nov 2018 19:31:48 +0000 (19:31 +0000)]
Fix rxcsum issue introduced in r338838
r338838 attempted to fix issues with rxcsum and rxcsum6.
However, the rxcsum bits were set as though if_setcapenablebit() was
being called, not if_togglecapenable() which is in use. As a result,
it was not possible to disable rxcsum when rxcsum6 was supported.
PR: 233004
Reported by: lev
Reviewed by: lev
MFC after: 3 days
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17881
jhb [Wed, 7 Nov 2018 18:27:43 +0000 (18:27 +0000)]
Enable use of a global shared page for RISC-V.
machine/vmparam.h already defines the SHAREDPAGE constant. This
change just enables it for ELF executables. The only use of the
shared page currently is to hold the signal trampoline.
brooks [Wed, 7 Nov 2018 16:55:04 +0000 (16:55 +0000)]
makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word. This
allows it to be a pointer without using a typedef.
Update the return types of break, mmap, and shmat to be void * as
declared. This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.
sobomax [Wed, 7 Nov 2018 16:28:09 +0000 (16:28 +0000)]
Revert r340187, it breaks EOD (end-of-device) detection logic. Turns out,
i/o into last_sector+N is handled differently for N==1 and N>1 cases to
accomodate that, so some other approach would be needed to fix DIOCGDELETE
ioctl(2).
arichardson [Wed, 7 Nov 2018 15:04:41 +0000 (15:04 +0000)]
Handle the DT_MIPS_RLD_MAP_REL dynamic tag in RTLD
This dynamic tag contains the location of the .rld_map section relative to
the location of the dynamic tag. For PIE MIPS binaries DT_MIPS_RLD_MAP can
not be used since it contains an absolute address. Without this change
GDB can not find the function program counters in other libraries and once
I apply this change I can successfully run info sharedlibraries again.
hselasky [Wed, 7 Nov 2018 08:25:44 +0000 (08:25 +0000)]
Sometimes the complete split packet may be queued too early and the
transaction translator will return a NAK. Ignore this message and
retry the complete split instead.
MFC after: 3 days
Sponsored by: Mellanox Technologies
jhibbits [Wed, 7 Nov 2018 01:42:00 +0000 (01:42 +0000)]
powerpc/atomic: Loosen the memory barrier on atomic_load_acq_*()
'sync' is pretty heavy-handed, and is unnecessary for this use case. It's a
full barrier, which is applicable for all storage types. However,
atomic_load_acq_*() is only expected to operate on physical memory, not
device memory, so lwsync is sufficient (lwsync provides access ordering on
memory that is marked as Coherency Required and is not Write Through nor
Cache Inhibited). On 32-bit systems, this is a nop, since powerpc_lwsync()
is defined to use sync, as a workaround for a silicon bug in the Freescale
e500 core.
markj [Tue, 6 Nov 2018 23:41:44 +0000 (23:41 +0000)]
Avoid fixing the tty_info() buffer size in tty.h.
Different compilation units may otherwise get a different view of the
layout of struct tty depending on whether they include opt_printf.h.
This caused a blowup in the number of types defined in the kernel's
CTF file after r339468; thanks to dim@ for bisecting down to that
revision.
PR: 232675
Reported by: dim
Reviewed by: cem (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17877
rmacklem [Tue, 6 Nov 2018 22:50:50 +0000 (22:50 +0000)]
Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end.
Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error);
in many places. This patch replaces these code sequenences with a "goto out;"
and does the NFSVOPUNLOCK(); return (error); at the end of the function
in order to make the vnode locking simpler.
This patch does not change the semantics of nfs_advlock().
markj [Tue, 6 Nov 2018 21:57:03 +0000 (21:57 +0000)]
Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.
Reported by: jhb
Reviewed by: alc, jhb, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17827
yuripv [Tue, 6 Nov 2018 21:49:50 +0000 (21:49 +0000)]
Cleanup locale tools:
- Simplify the source dir specification, and update README
appropriately
- Drop the LC (doonly) processing, it's broken, and even if fixed, not
really useful
- Don't remove the target directories while installing new data as it
removes Makefile.depend which we don't manage; only rm the files we
are going to add/replace/delete instead
- Restrict adding bsd.endian.mk to colldef and ctypedef Makefiles, it's
not needed in other (text-only) categories
- GC unused scripts; they don't seem to be particularly helpful standalone
as well
arichardson [Tue, 6 Nov 2018 18:06:52 +0000 (18:06 +0000)]
Turn off BUILD_WITH_STRICT_TMPPATH by default
Building with a strict $PATH (without inheriting from the parent
environment) still causes build failures in some workflows/environemnts
that I have not yet tested.
I will try to bring this back once these issues have all been resolved
since it is actually extremely useful in tracking broken dependencies
and wrong assumptions about the build environemt.
andrew [Tue, 6 Nov 2018 17:47:58 +0000 (17:47 +0000)]
Add the KUBSAN options to the arm64 and amd64 GENERIC kernel config files.
As the kernel file size may be too large to run with a stock loader comment
them out for now.
andrew [Tue, 6 Nov 2018 17:32:07 +0000 (17:32 +0000)]
Port the NetBSD ubsan runtime to the FreeBSD kernel.
This allows us to build the ubsan code added in r340189 into the kernel
with the KUBSAN option. This will report when undefined behaviour is
detected in the currently running kernel.
As it can be large, the kernel is 65MB on arm64, loader may not be able to
load the kernel on all architectures so is disabled by default for now.
andrew [Tue, 6 Nov 2018 16:56:49 +0000 (16:56 +0000)]
Import the NetBSD micro ubsan code for the kernel.
This imports revision 1.3 of common/lib/libc/misc/ubsan.c from NetBSD, the
micro-ubsan code. It is an implementation of the Undefined Behavior
Sanitizer runtime for use with recent clang and gcc.
The uubsan code will be used in a later commit to implement kubsan to help
find undefined behavior in the kernel.
sobomax [Tue, 6 Nov 2018 15:55:41 +0000 (15:55 +0000)]
Don't allow BIO_READ, BIO_WRITE or BIO_DELETE requests that are
fully beyond the end of providers media. The only exception is made
for the zero length transfers which are allowed to be just on the
boundary. Previously, any requests starting on the boundary (i.e. next
byte after the last one) have been allowed to go through.
No response from: freebsd-geom@, phk
MFC after: 1 month
emaste [Tue, 6 Nov 2018 15:52:49 +0000 (15:52 +0000)]
Add a WITH_BIND_NOW build knob
The linker's -z now flag sets the DF_BIND_NOW flag, which signals to the
runtime loader that all relocation processing should be performed at
process startup rather than on demand. In combination with lld's
default of enabling relro this causes the GOT to be made read-only when
the process starts, preventing straightforward GOT overwrite attacks.
Shawn Webb discovered a failure on HardenedBSD with BIND_NOW and ifunc
use, which resulted in my rtld fix in r340137. Add a BIND_NOW knob as
it is trivial to do so and is a useful ELF hardening feature. This
change is equivalent to HardenedBSD's but not identical as there are
other diffs/conflicts nearby.
Note that our ELF Tool Chain readelf does not currently decode the
DF_BIND_NOW flag - see PR232983.
Reviewed by: brooks
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17846
tijl [Tue, 6 Nov 2018 13:51:08 +0000 (13:51 +0000)]
On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers. The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver. Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.
Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration. These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.
Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again. That is a much bigger effort
however and this commit is meant to be MFCable.
This enables linux64 support in x11/nvidia-driver*.
tuexen [Tue, 6 Nov 2018 12:55:03 +0000 (12:55 +0000)]
Don't use a function when neither INET nor INET6 are defined.
This is a valid case for the userland stack, where this fixes
two set-but-not-used warnings in this case.
Thanks to Christian Wright for reporting the issue.
arichardson [Tue, 6 Nov 2018 09:36:59 +0000 (09:36 +0000)]
Remove btxld from symlinked host tools
It is only present on amd64/i386 systems which breaks buildworld on
other hosts. In fact there is no need to add it to the bootstrap tools
list since it is already included in the cross-tools phase.
However, for cross-tools it was only built if the host and target
architecture didn't match. After this change it is also built when we
are builtin with a strict $PATH.
Do not print "ip6" keyword in print_icmp6types() for O_ICMP6TYPE opcode.
It produces incompatibility when rules listing is used again to
restore saved ruleset, because "ip6" keyword produces separate opcode.
The kernel already has the check and only IPv6 packets will be checked
for matching.
jhb [Tue, 6 Nov 2018 00:11:36 +0000 (00:11 +0000)]
Add a facility for transmitting "raw" work requests on regular NIC queues.
- Use PH_loc.eight[1] as a general 'cflags' (Chelsio flags) field to
describe properties of a queued packet. The MC_RAW_WR flag
indicates an mbuf holding a raw work request. mbuf_cflags() returns
the current flags.
- Raw work request mbufs are allocated via alloc_wr_mbuf() which will
allocate a single contiguous range to hold the mbuf data. The
consumer can use mtod() to obtain the start of the work request and
write the required work request in the buffer. The mbuf can then be
enqueued directly to the txq via mp_ring_enqueue().
- Since raw work requests might potentially send arbitrary work
requests, only set the EQUIQ and EQUEQ bits on work requests that
support them such as the normal tunneled Ethernet packet work
requests.
jhb [Mon, 5 Nov 2018 22:54:03 +0000 (22:54 +0000)]
Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC. For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond. If the TSC does not exist, read from I/O port 0x84 to
delay instead.
PR: 228768
Reported by: Roger Hammerstein <cheeky.m@live.com>
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17851
arichardson [Mon, 5 Nov 2018 22:51:44 +0000 (22:51 +0000)]
Keep inheriting $PATH when using system linker/compiler
I missed this case when testing r340157. For now just keep
$PATH when we aren't bootstrapping a compiler so that the build
can find cc/c++/ld without an absolute path.
emaste [Mon, 5 Nov 2018 22:36:45 +0000 (22:36 +0000)]
revert r340156, restoring sys/sys/capability.h
More time is still needed for ports to accommodate the migration to
capsicum.h.
The header was renamed in 2014 due to concerns about conflicts with with
a draft POSIX.1e capability.h header on other systems and there is (now)
no need for complex autoconf tests for both capability.h and capsicum.h.
Any supported Capsicum-capable system has capsicum.h.
Reported by: antoine
Sponsored by: The FreeBSD Foundation
jhb [Mon, 5 Nov 2018 21:34:17 +0000 (21:34 +0000)]
Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI. Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms. However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.
jhb [Mon, 5 Nov 2018 20:00:36 +0000 (20:00 +0000)]
Rework setting PTE_D for kernel mappings.
Rather than unconditionally setting PTE_D for all writeable kernel
mappings, set PTE_D for writable mappings of unmanaged pages (whether
user or kernel). This matches what amd64 does and also matches what
the RISC-V spec suggests (preset the A and D bits on mappings where
the OS doesn't care about the state).
arichardson [Mon, 5 Nov 2018 19:51:16 +0000 (19:51 +0000)]
Build the elftoolchain libraries as part of bootstrap-tools
It is not necessary to build libelf and libdwarf this early. Furthermore,
when building on Linux/MacOS, m4 will only be built during the bootstrap
tools phase and not be available in $PATH before.
arichardson [Mon, 5 Nov 2018 19:51:10 +0000 (19:51 +0000)]
Allow building world without inheriting $PATH
Inheriting $PATH during the build phase can cause the build to fail when
compiling on a different system due to missing build tools or incompatible
versions somewhere in $PATH. This has cause build failures for us before
due to the jenkins slaves still running FreeBSD 10.
Listing the tools we depend on explicitly instead of just using whatever
happens to be in $PATH allows us to check that we don't accidentally add a
new build dependency.
All tools that do no need to be bootstrapped will now be symlinked to
${WORLDTMP}/legacy/bin and during the build phase $PATH will only contain
${WORLDTMP}. There is also a new variable "BOOTSTRAP_ALL_TOOLS" which can
be set to force compiling almost all bootstrap tools instead of symlinking
them. This will not bootstrap tools such as cp,mv, etc. since they may be
used during the build and for those we should really only be using POSIX
compatible options.
Furthermore, this change is required in order to be able to build on
non-FreeBSD hosts. While the same binaries may exist on Linux/MacOS they
often accept different flags or produce incompatible output.
imp [Mon, 5 Nov 2018 18:47:29 +0000 (18:47 +0000)]
Only assert locked for many async events.
Many async events that we see are called for this specific path. When
calling an async callback for a targetted device, XTP will lock that
specific device's path lock (same as what cam_periph_lock does). For
those AC_ events, assert we have the lock rather than trying to
recusrively take it (which causes panics since it's not recursive).
Add annotations about this and about the fact that AC_SCSI_AEN events
are generated now only in the ata stack (which cannot have a scsi_da
attachment). Leave it in place in case I've overlooked something as
the code is harmless.
This is fallout from my attempts to "fix" locking for softc->flags in
r330796 that's not been triggered often enough to get my attention
until now.
Sponsored by: Netflix
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D17837
mmacy [Mon, 5 Nov 2018 08:11:16 +0000 (08:11 +0000)]
hwpmc: limit wait for user callchain collection to 1 tick
The hwpmc pcpu sample buffer is prone to head of line blocking
when waiting for user process to return to user space and
collect a pending callchain. If more than one tick has elapsed
between the time the sample entry was marked for collection and
the time that the hardclock pmc handler runs to copy the records
to a larger temporary buffer, mark the sample entry as not in
use.
This changes reduces the number of samples marked as not valid
when collecting under load from ~99.5% to 5-20%.
jhibbits [Mon, 5 Nov 2018 01:53:20 +0000 (01:53 +0000)]
powerpc/SMP: Don't spam the console with AP bringup messages
Especially on new POWER9 systems, the console can be filled with
SMP: AP CPU #XX launched
messages. This can also slow down the console printing. Instead, do what
x86 now does, as of r333335, and print it all on one line, unless
bootverbose is set.