avos [Sat, 31 Mar 2018 17:28:30 +0000 (17:28 +0000)]
MFC r324673:
mbuf(9): unbreak m_fragment()
- Fix it by replacing m_cat() with m_prev->m_next = m_new
(m_cat() will try to append data - as a result, there will be no
fragmentation).
- Move some checks out of the loop.
Some variables were renamed (m_final -> m_first, m_new -> m_last)
dim [Sat, 31 Mar 2018 11:38:43 +0000 (11:38 +0000)]
Merge clang, llvm, lld, lldb, compiler-rt and libc++ 6.0.0 release, and
several follow-up fixes.
MFC r327952:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r321788). Upstream has branched for the
6.0.0 release, which should be in about 6 weeks. Please report bugs and
regressions, so we can get them into the release.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
MFC r328010:
Pull in r322473 from upstream llvm trunk (by Andrei Elovikov):
[LV] Don't call recordVectorLoopValueForInductionCast for
newly-created IV from a trunc.
Summary:
This method is supposed to be called for IVs that have casts in their
use-def chains that are completely ignored after vectorization under
PSE. However, for truncates of such IVs the same InductionDescriptor
is used during creation/widening of both original IV based on PHINode
and new IV based on TruncInst.
This leads to unintended second call to
recordVectorLoopValueForInductionCast with a VectorLoopVal set to the
newly created IV for a trunc and causes an assert due to attempt to
store new information for already existing entry in the map. This is
wrong and should not be done.
This should fix "Vector value already set for part" assertions when
building the net/iodine and sysutils/daa2iso ports.
Reported by: jbeich
PR: 224867, 224868
MFC r328090:
Pull in r322623 from upstream llvm trunk (by Andrew V. Tischenko):
Allow usage of X86-prefixes as separate instrs.
Differential Revision: https://reviews.llvm.org/D42102
This should fix parse errors when x86 prefixes (such as 'lock' and
'rep') are followed by various non-mnemonic tokens, e.g. comments, .byte
directives and labels.
PR: 224669, 225054
MFC r328091:
Revert r327340, as the workaround for rep prefixes followed by .byte
directives is no longer needed after r328090.
MFC r328141 (by emaste):
lld: Fix for ld.lld does not accept "AT" syntax for declaring LMA region
AT> lma_region expression allows to specify the memory region
for section load address.
Should fix [upstream LLVM] PR35684.
LLVM review: https://reviews.llvm.org/D41397
Obtained from: LLVM r322359 by George Rimar
MFC r328143 (by emaste):
lld: Handle parsing AT(ADDR(.foo-bar)).
The problem we had with it is that anything inside an AT is an
expression, so we failed to parse the section name because of the - in
it.
Requested by: royger
Obtained from: LLVM r322801 by Rafael Espindola
MFC r328144 (by emaste):
lld: Fix incorrect physical address on self-referencing AT command.
When a section placement (AT) command references the section itself,
the physical address of the section in the ELF header was calculated
incorrectly due to alignment happening right after the location
pointer's value was captured.
The problem was diagnosed and the first version of the patch written
by Erick Reyes.
Obtained from: LLVM r322421 by Rafael Espindola
MFC r328145:
Pull in r322016 from upstream llvm trunk (by Sanjay Patel):
[ValueTracking] remove overzealous assert
The test is derived from a failing fuzz test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5008
Credit to @rksimon for pointing out the problem.
This should fix "Bad flavor while matching min/max" errors when building
the graphics/libsixel and science/kst2 ports.
Reported by: jbeich
PR: 225268, 225269
MFC r328146:
Pull in r322106 from upstream llvm trunk (by Alexey Bataev):
[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86.
Summary:
If the vector type is transformed to non-vector single type, the
compile may crash trying to get vector information about non-vector
type.
This should fix "Not a vector MVT!" errors when building the
games/dhewm3 port.
Reported by: jbeich
PR: 225271
MFC r328286 (by emaste):
lld: Don't mark a shared library as needed because of a lazy symbol.
Obtained from: LLVM r323221 by Rafael Esp?ndola
MFC r328381:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r323338).
PR: 224669
MFC r328513:
Pull in r322245 from upstream clang trunk (by Craig Topper):
[X86] Make -mavx512f imply -mfma and -mf16c in the frontend like it
does in the backend.
Similarly, make -mno-fma and -mno-f16c imply -mno-avx512f.
Withou this "-mno-sse -mavx512f" ends up with avx512f being enabled
in the frontend but disabled in the backend.
Reported by: pawel
PR: 225488
MFC r328542 (by emaste):
lld: Use lookup instead of find. NFC, just simpler.
Obtained from: LLVM r323395 by Rafael Espindola
MFC r328543 (by emaste):
lld: Only lookup LMARegion once. NFC.
This is similar to how we handle MemRegion.
Obtained from: LLVM r323396 by Rafael Espindola
MFC r328544 (by emaste):
lld: Remove MemRegionOffset. NFC.
We can just use a member variable in MemoryRegion.
Obtained from: LLVM r323399 by Rafael Espindola
MFC r328545 (by emaste):
lld: Simplify. NFC.
Obtained from: LLVM r323440 by Rafael Espindola
MFC r328546 (by emaste):
lld: Improve LMARegion handling.
This fixes the crash reported at [LLVM] PR36083.
The issue is that we were trying to put all the sections in the same
PT_LOAD and crashing trying to write past the end of the file.
This also adds accounting for used space in LMARegion, without it all
3 PT_LOADs would have the same physical address.
Obtained from: LLVM r323449 by Rafael Espindola
MFC r328547 (by emaste):
lld: Move LMAOffset from the OutputSection to the PhdrEntry. NFC.
If two sections are in the same PT_LOAD, their relatives offsets,
virtual address and physical addresses are all the same.
[Rafael] initially wanted to have a single global LMAOffset, on the
assumption that every ELF file was in practiced loaded contiguously in
both physical and virtual memory.
Unfortunately that is not the case. The linux kernel has:
The delta for all but the third PT_LOAD is the same:
0xffffffff80000000. [Rafael] thinks the 3rd one is a hack for implementing
per cpu data, but we can't break that.
Obtained from: LLVM r323456 by Rafael Espindola
MFC r328548 (by emaste):
lld: Put the header in the first PT_LOAD even if that PT_LOAD has a LMAExpr
The root problem is that we were creating a PT_LOAD just for the header.
That was technically valid, but inconvenient: we should not be making
the ELF discontinuous.
The solution is to allow a section with LMAExpr to be added to a PT_LOAD
if that PT_LOAD doesn't already have a LMAExpr.
LLVM PR: 36017
Obtained from: LLVM r323625 by Rafael Espindola
MFC r328594 (by emaste):
Pull in r322108 from upstream llvm trunk (by Rafael Esp?ndola):
Make one of the emitFill methods non virtual. NFC.
This is just preparatory work to fix [LLVM] PR35858.
MFC r328595 (by emaste):
Pull in r322123 from upstream llvm trunk (by Rafael Esp?ndola):
Don't create MCFillFragment directly.
Instead use higher level APIs that take care of most bookkeeping.
MFC r328596 (by emaste):
Pull in r322131 from upstream llvm trunk (by Rafael Esp?ndola):
Use a MCExpr for the size of MCFillFragment.
This allows the size to be found during ralaxation. This fixes
[LLVM] pr35858.
Requested by: royger
MFC r328753:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r323948).
PR: 224669
MFC r328817:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r324090).
This introduces retpoline support, with the -mretpoline flag. The
upstream initial commit message (r323155 by Chandler Carruth) contains
quite a bit of explanation. Quoting:
Introduce the "retpoline" x86 mitigation technique for variant #2 of
the speculative execution vulnerabilities disclosed today,
specifically identified by CVE-2017-5715, "Branch Target Injection",
and is one of the two halves to Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that
this is a very incomplete description, please see the Project Zero
blog post for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative
execution of the processor to some "gadget" of executable code by
poisoning the prediction of indirect branches with the address of
that gadget. The gadget in turn contains an operation that provides a
side channel for reading data. Most commonly, this will look like a
load of secret data followed by a branch on the loaded value and then
a load of some predictable cache line. The attacker then uses timing
of the processors cache to determine which direction the branch took
*in the speculative execution*, and in turn what one bit of the
loaded value was. Due to the nature of these timing side channels and
the branch predictor on Intel processors, this allows an attacker to
leak data only accessible to a privileged domain (like the kernel)
back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In
many cases, the compiler can simply use directed conditional branches
and a small search tree. LLVM already has support for lowering
switches in this way and the first step of this patch is to disable
jump-table lowering of switches and introduce a pass to rewrite
explicit indirectbr sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as a
trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures
the processor predicts the return to go to a controlled, known
location. The retpoline then "smashes" the return address pushed onto
the stack by the call with the desired target of the original
indirect call. The result is a predicted return to the next
instruction after a call (which can be used to trap speculative
execution within an infinite loop) and an actual indirect branch to
an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this
device. For 32-bit ABIs there isn't a guaranteed scratch register
and so several different retpoline variants are introduced to use a
scratch register if one is available in the calling convention and to
otherwise use direct stack push/pop sequences to pass the target
address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the
retpoline thunk by the compiler to allow for custom thunks if users
want them. These are particularly useful in environments like
kernels that routinely do hot-patching on boot and want to hot-patch
their thunk to different code sequences. They can write this custom
thunk and use `-mretpoline-external-thunk` *in addition* to
`-mretpoline`. In this case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are
from precompiled runtimes (such as crt0.o and similar). The ones we
have found are not really attackable, and so we have not focused on
them here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z
retpolineplt` (or use similar functionality from some other linker).
We strongly recommend also using `-z now` as non-lazy binding allows
the retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typic al workloads, and relatively minor hits (approximately
2%) even for extremely syscall-heavy applications. This is largely
due to the small number of indirect branches that occur in
performance sensitive paths of the kernel.
When using these patches on statically linked applications,
especially C++ applications, you should expect to see a much more
dramatic performance hit. For microbenchmarks that are switch,
indirect-, or virtual-call heavy we have seen overheads ranging from
10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically
reduce the impact of hot indirect calls (by speculatively promoting
them to direct calls) and allow optimized search trees to be used to
lower switches. If you need to deploy these techniques in C++
applications, we *strongly* recommend that you ensure all hot call
targets are statically linked (avoiding PLT indirection) and use both
PGO and ThinLTO. Well tuned servers using all of these techniques saw
5% - 10% overhead from the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality
available as soon as possible. Happy for more code review, but we'd
really like to get these patches landed and backported ASAP for
obvious reasons. We're planning to backport this to both 6.0 and 5.0
release streams and get a 5.0 release with just this cherry picked
ASAP for distros and vendors.
This patch is the work of a number of people over the past month:
Eric, Reid, Rui, and myself. I'm mailing it out as a single commit
due to the time sensitive nature of landing this and the need to
backport it. Huge thanks to everyone who helped out here, and
everyone at Intel who helped out in discussions about how to craft
this. Also, credit goes to Paul Turner (at Google, but not an LLVM
contributor) for much of the underlying retpoline design.
Pull in r324594 from upstream clang trunk (by Alexander Ivchenko):
Fix for #31362 - ms_abi is implemented incorrectly for values >=16
bytes.
Summary:
This patch is a fix for following issue:
https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by
front end lowering C calling conventions without taking into account
calling conventions enforced by attribute. In this case win64cc was
no correctly lowered on targets other than Windows.
This fixes clang 6.0.0 assertions when building the emulators/wine and
emulators/wine-devel ports, and should also make it use the correct
Windows calling conventions. Bump __FreeBSD_version to make the fix
easy to detect.
PR: 224863
MFC r329223:
Pull in r323998 from upstream clang trunk (by Richard Smith):
PR36157: When injecting an implicit function declaration in C89, find
the right DeclContext rather than injecting it wherever we happen to
be.
This avoids creating functions whose DeclContext is a struct or
similar.
This fixes assertion failures when parsing certain not-completely-valid
struct declarations.
Reported by: ae
PR: 225862
MFC r329410:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r325330).
PR: 224669
MFC r329983:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r325932). This corresponds to 6.0.0 rc3.
PR: 224669
MFC r330384:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 release (upstream r326565).
Release notes for llvm, clang and lld will be available here soon:
<http://releases.llvm.org/6.0.0/docs/ReleaseNotes.html>
<http://releases.llvm.org/6.0.0/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/6.0.0/tools/lld/docs/ReleaseNotes.html>
Relnotes: yes
PR: 224669
MFC r330686:
Pull in r326882 from upstream llvm trunk (by Sjoerd Meijer):
[ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that
also uses a constant because we may get stuck in an infinite combine
loop.
This fixes an assertion when building the emulators/snes9x port.
Reported by: jbeich
PR: 225471
MFC r331066:
Pull in r321999 from upstream clang trunk (by Ivan A. Kosarev):
[CodeGen] Fix TBAA info for accesses to members of base classes
Resolves:
Bug 35724 - regression (r315984): fatal error: error in backend:
Broken function found (Did not see access type in access path!)
https://bugs.llvm.org/show_bug.cgi?id=35724
This fixes "Did not see access type in access path" fatal errors when
building the devel/gdb port (version 8.1).
Reported by: jbeich
PR: 226658
MFC r331366:
Pull in r327101 from upstream llvm trunk (by Rafael Espindola):
Don't treat .symver as a regular alias definition.
This patch starts simplifying the handling of .symver.
For now it just moves the responsibility for creating an alias down to
the streamer. With that the asm streamer can pass a .symver unchanged,
which is nice since gas cannot parse "foo@bar = zed".
In a followup I hope to move the handling down to the writer so that
we don't need special hacks for avoiding breaking names with @@@ on
windows.
Pull in r327160 from upstream llvm trunk (by Rafael Espindola):
Delay creating an alias for @@@.
With this we only create an alias for @@@ once we know if it should
use @ or @@. This avoids last minutes renames and hacks to handle MS
names.
This only handles the ELF writer. LTO still has issues with @@@
aliases.
Pull in r327928 from upstream llvm trunk (by Vitaly Buka):
Object: Move attribute calculation into RecordStreamer. NFC
Pull in r327930 from upstream llvm trunk (by Vitaly Buka):
Object: Fix handling of @@@ in .symver directive
Summary:
name@@@nodename is going to be replaced with name@@nodename if symbols is
defined in the assembled file, or name@nodename if undefined.
https://sourceware.org/binutils/docs/as/Symver.html
gjb [Sat, 31 Mar 2018 01:37:14 +0000 (01:37 +0000)]
MFC r331696, r331697:
r331696:
Update the Release Engineering article URL to the modern version.
r331697:
Add an example for building SD card images for the RPI-B and RPI3.
Note, this commit manually fixes a merge conflict caused by r325096,
which does a seemingly recursive http -> https update, which this
commit was never marked for MFC.
gonzo [Sat, 31 Mar 2018 00:30:47 +0000 (00:30 +0000)]
MFC r330558:
[ig4] Add support for i2c controllers on Skylake and Kaby Lake
This was tested by Ben on HP Chromebook 13 G1 with a
Skylake CPU and Sunrise Point-LP I2C controller and by me on
Minnowboard Turbot with Atom E3826 (formerly Bay Trail)
Submitted by: Ben Pye <ben@curlybracket.co.uk>
Reviewed by: gonzo
Obtained from: DragonflyBSD (a4549657 by Imre Vadász)
Differential Revision: https://reviews.freebsd.org/D13654
gonzo [Fri, 30 Mar 2018 23:31:08 +0000 (23:31 +0000)]
MFC r329832, r329926
r329832:
[chvgpio] add GPIO driver for Intel Z8xxx SoC family
Add chvgpio(4) driver for Intel Z8xxx SoC family. This product
was formerly known as Cherry Trail but Linux and OpenBSD drivers
refer to it as Cherry View. This driver is derived from OpenBSD
one so the name is kept for alignment with another BSD system.
Submitted by: Tom Jones <tj@enoti.me>
Reviewed by: gonzo, wblock(man page)
Differential Revision: https://reviews.freebsd.org/D13086
r329926:
Add SPDX tags for chvgpio driver sources
Also move $FreeBSD$ keyword in header to BSD license
hselasky [Fri, 30 Mar 2018 19:25:40 +0000 (19:25 +0000)]
MFC r331455:
Fix incorrect page count when mlx5core is in internal error.
Change page cleanup flow when in internal error to properly decrement
the page counts when reclaiming pages. That prevents timing out
waiting for extra pages that were actually cleaned up previously.
hselasky [Fri, 30 Mar 2018 19:24:49 +0000 (19:24 +0000)]
MFC r331453:
Don't save PCI state when PCI error is detected in mlx5core.
When a PCI error is detected the PCI state could be corrupt, don't
save it in that flow. Save the state after initialization. After
restoring the PCI state during slot reset save it again, restoring
the state destroys the previously saved state info.
hselasky [Fri, 30 Mar 2018 19:23:46 +0000 (19:23 +0000)]
MFC r331452:
Add mutual exclusion mechanism for software reset of firmware in mlx5core.
Since the FW can be shared between PCI functions it is common that
more than one health poll will detected a failure, this can lead to
multiple resets.
The solution is to use a FW locking mechanism using semaphore space to
provide a way to synchronize between functions. The FW semaphore is
acquired via config cycle access. First the VSEC gateway must be
acquired, then the semaphore can be locked by writing a value to it
and confirmed it's locked by reading the same value back. The process
in the same to free the semaphore, except the value written should be
zero.
hselasky [Fri, 30 Mar 2018 19:21:19 +0000 (19:21 +0000)]
MFC r331449:
Handle software reset of firmware in error flow in mlx5core.
Some mlx5 adapter firmware allows the driver to reset the firmware in
the event of an error. When a software reset is issued on any physical
function all PFs enter reset state. This is a recoverable condition.
The existing recovery flow was designed to allow the recovery of a
VF after a PF driver reload. This patch expands the scope of that
flow to recover PFs or VFs after a SW reset has been issued.
When a software reset is issued the following occurs:
1. The NIC interface mode is set to SW_RESET (7) while the reset is in
progress.
2. Once the reset completes the NIC interface mode is set to NIC
disabled (1).
After the reset has been issued (added in a subsequent patch) the
health poll for other functions will detect that the NIC interface
state has been set to disabled. This will cause it to enter the
existing recovery flow. If the PCI is still working (meaning it
doesn't return 0xff on all reads) it means recovery can proceed
immediately instead of waiting 60 seconds.
The error detetion has also been refactored to avoid incorrect or
misleading log messages.
hselasky [Fri, 30 Mar 2018 19:20:27 +0000 (19:20 +0000)]
MFC r331447:
Hide verbose proclamation of error when forced in mlx5core.
When mlx5_enter_error_state() operation is forced by shutdown, the
messages surrounding setting the error state are not informational
and confuse users.
hselasky [Fri, 30 Mar 2018 19:15:04 +0000 (19:15 +0000)]
MFC r330648:
Add support for explicit congestion notification, ECN, to mlx5ib(4).
ECN configuration and statistics is available through a set of sysctl(8)
nodes under sys.class.infiniband.mlx5_X.cong . The ECN configuration
nodes can also be used as loader tunables.
hselasky [Fri, 30 Mar 2018 18:55:13 +0000 (18:55 +0000)]
MFC r331437:
Create designated workqueue for each mlx5en(4) device instance.
The mlx5e_destroy_ifp() function may be called from the system workqueue and
in this case trying to flush all works will cause a dead lock.
Instead of using the system workqueue, create a designated workqueue
for each mlx5en(4) device instance.
hselasky [Fri, 30 Mar 2018 18:53:12 +0000 (18:53 +0000)]
MFC r331355:
Clear old MSIX IRQ numbers in the LinuxKPI.
When disabling the MSIX IRQ vectors for a PCI device through the
LinuxKPI, make sure any old MSIX IRQ numbers are no longer visible to
the linux_pci_find_irq_dev() function else IRQs can be requested from
the wrong PCI device.
hselasky [Fri, 30 Mar 2018 18:50:42 +0000 (18:50 +0000)]
MFC r330662:
Set correct SL in completion for RoCE in mlx5ib(4).
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.
While that, this patch also fills the vlan_id field in the completion if
present.
hselasky [Fri, 30 Mar 2018 18:38:43 +0000 (18:38 +0000)]
MFC r330580:
Make sure the IPv6 scope ID gets properly masked in ibcore.
When exchanging CM messages the IPv6 scope ID should be ignored
for link local addresses when doing comparisons. Make sure the
scope ID is always set to zero for link local addresses.
hselasky [Fri, 30 Mar 2018 18:36:44 +0000 (18:36 +0000)]
MFC r330508:
Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
hselasky [Fri, 30 Mar 2018 18:33:30 +0000 (18:33 +0000)]
MFC r330507:
Get correct network device when accepting incoming RDMA connections in ibcore.
This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.
Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.
Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.
The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.
Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.
hselasky [Fri, 30 Mar 2018 18:31:14 +0000 (18:31 +0000)]
MFC r330504:
Add support for loopback in ibcore.
Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.
hselasky [Fri, 30 Mar 2018 18:30:10 +0000 (18:30 +0000)]
MFC r330501:
Make sure to register the VLAN GIDs using the VLAN network interface
and not the parent one in ibcore. Else looking up the VLAN GIDs will
fail for VLAN IPs.
hselasky [Fri, 30 Mar 2018 18:26:26 +0000 (18:26 +0000)]
MFC r330493:
Make deletion of RoCE GID entries synchronous in ibcore.
When a network device is departing, the RoCE GID entries should be
cleared before the default L2 link layer address is freed. Else a NULL
pointer access may happen.
hselasky [Fri, 30 Mar 2018 18:25:30 +0000 (18:25 +0000)]
MFC r330492:
Add support for IPv6 link local GIDs equal to the default GID for
VLANs in ibcore.
IPv6 link local addresses are usually derived from the netdev MAC
address. This is applicable to VLAN devices and its lower netdevice as
well. In such cases the IPv6 link local address is a duplicate of the
default GID.
Now that link local IPv6 addresses based GIDs are supported, allow
adding such GID entries in the GID table.
hselasky [Fri, 30 Mar 2018 18:24:35 +0000 (18:24 +0000)]
MFC r330491:
Do not add RoCEv2 default GID in ibcore when IPv6 is disabled to honor the
networking stack's IPv6 disabled setting. Else the offload HCA can start using
IPv6 packets for QPs.
hselasky [Fri, 30 Mar 2018 18:13:44 +0000 (18:13 +0000)]
MFC r331419:
Allow the libusb20_dev_get_port_path() function to be called when the
USB device is closed. This fixes a compatibility issue with upstream
libusb.
RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace.
This commit merges projects/bsd_rdma_4_9 to 11-stable.
Compatibility wrappers have been made for existing 11-stable ibcore
APIs, including ib_reg_phys_mr().
Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.
The iw_cxgb driver has not been updated and has been disconnected from
the build.
Sponsored by: Mellanox Technologies
MFC r326169 and r326563:
RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.
List of kernel sources used:
============================
3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git
Tag 1.3.13 with some additional patches from Mellanox.
4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git
Tag 1.6.7 with some additional patches from Mellanox.
NOTES:
======
1) The mthca driver has been removed from userspace.
2) All GPLv2 only sources have been removed and where applicable
rewritten from scratch under a BSD license.
3) List of fully supported drivers in userspace and kernel:
a) iw_cxgbe (Chelsio)
b) mlx4ib (Mellanox)
c) mlx5ib (Mellanox)
4) WITH_OFED=YES is still required by make in order to build
OFED userspace and kernel code.
5) Full support has been added for routable RoCE, RoCE v2.
MFC r326649:
Disconnect OFED after r326169 broke all DIRDEPS support for it.
MFC r326716:
Correctly define the unordered_map namespace in ofed/libibnetdisc .
This should fix ofed/libibnetdisc compilation with C-compilers
different from clang and GCC v4.2.1.
MFC r326764:
ofed: Remove duplicated symbols from the version file.
ld.bfd accepts multiple listing of the same symbol in the version script.
lld is stricter and errors out. Since arm64 and sometimes amd64 use lld,
we should correct this cosmetic issue.
MFC r326765:
ofed: Define barriers for mips and arm.
I used the strongest barriers available on the architectures, so if
the future analysis show that it is excessive, the barriers could be
relaxed. Still, it is unlikely that it is meaningful to run IB on 32bit
ARM or current MIPS machines, so the change is to make WITH_OFED to pass
tinderbox.
MFC r303505:
sdp: Use an mbufq for received control packets.
This is simpler than the hand-rolled queue, and fixes a use-after-free.
Sponsored by: EMC / Isilon Storage Division
MFC r303506:
sdp: Destroy the PCB lock before freeing to the zone.
Sponsored by: EMC / Isilon Storage Division
MFC r303512:
sdp: Use malloc(9) instead of the Linux compat layer.
SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.
Sponsored by: EMC / Isilon Storage Division
MFC r303513:
sdp: Destroy the RDMA ID after destroying the connection's queue pair.
This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().
Sponsored by: EMC / Isilon Storage Division
MFC r303646:
ipoib: Bound the number of egress mbufs buffered during pathrec lookups.
In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.
This addresses the same issue as commit 1e85b806f9 in Linux.
MFC r329222:
Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.
MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock
in 11 and this is a no-op in this branch.
Sponsored by: Chelsio Communications
MFC r323082:
cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.
emaste [Fri, 30 Mar 2018 01:53:14 +0000 (01:53 +0000)]
MFC r331426: Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-Clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.
emaste [Fri, 30 Mar 2018 01:19:53 +0000 (01:19 +0000)]
MFC r329373: Correct module symbol export handling
EXPORT_SYMS can be set to YES, NO, a list of symbols to export from a
module, or to a filename containing such a list. For the case that it
is set to a symbol list, replace spaces in the list with newlines, so
the created file is in the format expected by kmod_syms.awk.
emaste [Fri, 30 Mar 2018 00:10:39 +0000 (00:10 +0000)]
MFC r321417: enable filter lib linker feature flag for lld 5.0+
Also switch the logic to enable this for any non-lld linker, since
filter library support is fairly simple and is very likely supported
by any other linker capable of linking the FreeBSD base system.
brooks [Thu, 29 Mar 2018 19:53:56 +0000 (19:53 +0000)]
MFC r328522:
Create deprecation management functions.
gone_in(majar, msg); If we're running in FreeBSD major, tell
the user this code may be deleted soon.
If we're running in FreeBSD major - 1,
the the user is deprecated and will
be gone in major.
Otherwise say nothing.
gone_in_dev(dev, major, msg) Just like gone_in, except use device_printf.
New tunable / sysctl debug.oboslete_panic: 0 - don't panic,
1 - panic in major or newer , 2 - panic in major - 1 or newer
default: 0
if NO_OBSOLETE_CODE is defined, then both of these turn into compile
time errors when building for major. Add options NO_OBSOLETE_CODE to
kernel build system.
This lets us tag code that's going away so users know it will be gone,
as well as automatically manage things.
gjb [Thu, 29 Mar 2018 19:29:12 +0000 (19:29 +0000)]
MFC r331562 (manu):
release: arm: Copy boot.scr from ports
Latest u-boot update need u-boot script to load and start ubldr.
(See D14230 for more details)
Copy this file for our arm release on the fat partition.
Provide proper values for X_LINKER_TYPE/VERSION when XLD == LD.
Pass along LINKER_* vars during installworld and show in test-system-compiler.
MFC r320258, r320272, r320275, r320502 (emaste):
change GNU ld LINKER_TYPE from binutils to bfd
GNU binutils includes two linkers: ld.bfd and ld.gold. For clarity use
LINKER_TYPE=bfd to refer to ld.bfd, the original binutils linker that
identifies itself as "GNU ld".
bsd.linker.mk: add band-aid for linker invocation failure
In some cases bsd.linker.mk reports an error like:
make[4]: ".../share/mk/bsd.linker.mk" line 56:
Unknown linker from LD=ld -m elf32ppc_fbsd:"
For now change this to a .warning, and then assume GNU ld 2.17.50.
At present the linker type detection is used only for enabling build-id,
and we can carry on without it when type detection fails.
Also, show errors from ${LD} --version to aid in failure diagnosis.
Successful invocations of ${LD} --version produce no output on stderr
so this will not create any spam in non-failing builds.
enable --build-id for the kernel link
A Build-ID is an identifier generated at link time to uniquely identify
ELF binaries. It allows efficient confirmation that an executable or
shared library and a corresponding standalone debuginfo file match.
(Otherwise, a checksum of the debuginfo file must be calculated when
opening it in a debugger.)
The FreeBSD base system includes GNU bfd ld 2.17.50 as the linker for
architectures other than arm64. Build-ID support was added to bfd ld
shortly after that version, so was not previously available to us.
We can now start making use of Build-ID as we migrate to using lld or
bfd ld from ports, conditionally enabled based on the LINKER_TYPE and
LINKER_VERSION make variables added in r320244 and subsequent commits.
Introduce LINKER_FEATURES to avoid duplicating version logic
MFC r327857 (bdrewery, submitted by Dan McGregor):
Cache LINKER_FEATURES to fix the wrong ones being used.
Sponsored by: Dell EMC Isilon
Sponsored by: The FreeBSD Foundation
araujo [Thu, 29 Mar 2018 04:51:07 +0000 (04:51 +0000)]
MFC r329817:
The firewall_type is ignored if not set in rc.conf or rc.conf.local,
after r190575 there is an option to call rc.firewall with the firewall_type
passed in as an argument.
Submitted by: David P. Discher <dpd@dpdtech.com>
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D14286
eadler [Thu, 29 Mar 2018 02:50:57 +0000 (02:50 +0000)]
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.
Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.
At present it is possible to boot from a root pool that is on RAIDZ but not
one that is on RAIDZ2 or RAIDZ3. This is because, at the time the pool
version is checked to ensure support for dual/triple parity, the uberblock
has not yet been loaded into the SPA and therefore the code determines that
the pool version is too old and returns ENOTSUP.
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
FreeBSD already had this fixed, so this is just a diff reduction.
jhb [Wed, 28 Mar 2018 17:49:31 +0000 (17:49 +0000)]
MFC 331248: Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec. In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.
This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.
emaste [Wed, 28 Mar 2018 17:19:04 +0000 (17:19 +0000)]
MFC r326992: embed_mfs: support embedding mfs into loader
The script originally supported embedding an mfs into ELF files or any
other type of file, because it searched for magic strings to mark the
beginning and end of the embeddable section. It was later modified to
read the section offset and length via readelf, which made it work for
ELF only. Restore the ability to update arbitrary file types by using
the readelf technique for ELF, and the magic string technique for all
others (including PE/COFF files like loader.efi).
emaste [Wed, 28 Mar 2018 17:16:16 +0000 (17:16 +0000)]
MFC r324707: embed_mfs: add error handling, usage
Ensure that we are called with two arguments, and that the output file
is writable. Also, if we cannot find the mfs section report the output
file name rather than "kernel", as this script may be used with other
than kernels.
emaste [Wed, 28 Mar 2018 16:58:24 +0000 (16:58 +0000)]
MFC r315522: use INT3 instead of NOP for x86 binary padding
We should never end up executing the inter-function padding, so we
are better off faulting than silently carrying on to whatever function
happens to be next.
emaste [Wed, 28 Mar 2018 16:54:15 +0000 (16:54 +0000)]
MFC r325422: posix_fallocate.2: add an EINVAL errno case
As of r325320 in HEAD posix_fallocate returns EINVAL on ZFS to indicate
that the underlying filesystem does not support this operation, per
POSIX.1-2008. Document this case in the man page.
Note that r325320 has not yet been merged to stable/11, and may or may
not be. However, we should document that EINVAL may be returned if the
filesystem does not support posix_fallocate (even if we don't actually
do so in stable/11), as software should be prepared to handle that case.
Discussed with: avg
Sponsored by: The FreeBSD Foundation
emaste [Wed, 28 Mar 2018 14:39:56 +0000 (14:39 +0000)]
MFC r324560: allow posix_fallocate in capability mode
posix_fallocate is logically equivalent to writing zero blocks to the
desired file size and there is no reason to prevent calling it in
capability mode. posix_fallocate already checked for the CAP_WRITE
right, so we merely need to list it in capabilities.conf.
Also MFC r324564: allow posix_fallocate in 32-bit compat capability mode
emaste [Wed, 28 Mar 2018 14:35:24 +0000 (14:35 +0000)]
MFC Capsicum open(2) and openat(2) documentation
r306537 by cem: open.2: Document Capsicum behavior
Document open(2) and openat(2) behavior in Capsicum capability mode.
Sponsored by: Dell EMC Isilon
r323622 by emaste: open(2): update ENOTCAPABLE description for .. lookups
After r308732 (MFC of r308212) Capsicum permits .. lookups in capability
mode, as long as path component traversal does not escape the directory
corresponding to the provided file descriptor.
emaste [Wed, 28 Mar 2018 13:41:43 +0000 (13:41 +0000)]
MFC r331329: Fix kernel memory disclosure in ibcs2_getdents
ibcs2_getdents() copies a dirent structure to userland. The ibcs2
dirent structure contains a 2 byte pad element. This element is never
initialized, but copied to userland none-the-less.
Note that ibcs2 has not built on HEAD since r302095.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Security: Kernel memory disclosure (803)
Sponsored by: The FreeBSD Foundation
ken [Tue, 27 Mar 2018 20:34:49 +0000 (20:34 +0000)]
MFC r331422:
------------------------------------------------------------------------
r331422 | ken | 2018-03-23 07:52:26 -0600 (Fri, 23 Mar 2018) | 42 lines
Disable T10 Protection Information / EEDP handling for type 2 protection.
The mps(4) and mpr(4) drivers and hardware handle T10 Protection
Information, which is a system of checksums and guard blocks to protect
data while it is being transferred and while it is on disk. It is also
known as T10 DIF. For more details, see section 4.22 of the SBC-4 spec.
Supporting Type 2 protection requires using 32 byte CDBs, and filling in
the fields in those CDBs. We don't yet support that in the da(4) driver.
Type 1 and Type 3 protection don't require that, and can be handled by
the mps(4)/mpr(4) driver's code and firmware without any additional
input from the da(4) driver.
If a drive has Type 2 protection enabled (you frequently see this with
SAS drives shipped from Dell), don't set the various EEDP fields in the
mps(4)/mpr(4) driver command fields. Otherwise, you wind up with errors
like this that would otherwise make no sense:
In other words, what kind of strange SAS hard drive doesn't support a
standard 10 byte SCSI READ command? In this case, one that has Type 2
protection enabled.
We can revisit this when we put Type 2 protection support in the da(4)
driver, but for now this will help people who put Type 2 formatted drives
in a system and wonder what in the world is going on.
jhb [Tue, 27 Mar 2018 20:14:22 +0000 (20:14 +0000)]
MFC 329785: Move DDP PCB state into a helper structure.
This consolidates all of the DDP state in one place. Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections. This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.
dim [Tue, 27 Mar 2018 18:52:27 +0000 (18:52 +0000)]
MFC r314568 (by emaste):
kern_sig.c: ANSIfy and remove archaic register keyword
Sponsored by: The FreeBSD Foundation
MFC r318389 (by emaste):
Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store
the corresponding variable in a CPU register, but it is not relevant
for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10193