Andrew Rybchenko [Fri, 21 Dec 2018 14:59:24 +0000 (14:59 +0000)]
MFC r340806
sfxge(4): fix default RSS context check on Siena
Default RSS context check is carried out during filter
insertion on Siena and it needs to be fixed
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18076
Andrew Rybchenko [Fri, 21 Dec 2018 14:58:16 +0000 (14:58 +0000)]
MFC r340805
sfxge(4): define a handle to denote default RSS context
Make the existing filter-specific define more general.
This is the same as MC_CMD_RSS_CONTEXT_ALLOC_OUT_RSS_CONTEXT_ID_INVALID.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18075
Andrew Rybchenko [Fri, 21 Dec 2018 14:53:50 +0000 (14:53 +0000)]
MFC r340797
sfxge(4): fix potential buffer overflow in Tx queue init
Improve error checking to avoid a caller overflowing the MCDI
request buffer if the requested TXQ size was excessively large.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18067
Andrew Rybchenko [Fri, 21 Dec 2018 14:52:46 +0000 (14:52 +0000)]
MFC r340826
sfxge(4): fix ignoring function return value
fix PreFAST issue, add missing annotation that function return value
should not be ignored. Fix alignment.
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D18096
Andrew Rybchenko [Fri, 21 Dec 2018 14:50:18 +0000 (14:50 +0000)]
MFC r340767
sfxge(4): limit max TXQ size on Medford to 2048
Queues with 4096 descriptors are not supported as the top bit is used
for vfifo stuffing.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D8948
Dimitry Andric [Thu, 20 Dec 2018 18:28:53 +0000 (18:28 +0000)]
Pull in r329671 from upstream clang trunk (by Akira Hatanaka):
[ExprConstant] Use an AST node and a version number as a key to
create an APValue and retrieve it from map Temporaries.
The version number is needed when a single AST node is visited
multiple times and is used to create APValues that are required to be
distinct from each other (for example, MaterializeTemporaryExprs in
default arguments and VarDecls in loops).
This should fix 'Assertion failed: (Result.isUninit() && "temporary
created multiple times"), function createTemporary' errors (if
assertions are enabled, otherwise the compiler internal state might go
bad), when building the graphics/rawtherapee port.
Direct commit to stable/11 and stable/12, since head already has clang
7.0.1, which includes this change.
David Bright [Thu, 20 Dec 2018 00:57:23 +0000 (00:57 +0000)]
MFC r341988
asmc: Add Support for Macbook Pro 8,1
PR: 217505
Submitted by: John O. Brickley <obryan.brickley@gmail.com>, updated by Maciej Pasternacki <maciej@pasternacki.net>
Reported by: John O. Brickley <obryan.brickley@gmail.com>
Implement unr64
pipe: use unr64
tmpfs: use unr64 for inode numbers
uipc_shm: use unr64 for inode numbers
uipc_usrreq: fix inode number assignment
unr64: use locked variant if not __LP64__
Mateusz Guzik [Wed, 19 Dec 2018 21:25:43 +0000 (21:25 +0000)]
MFC r341272,r341273,r341351
amd64: tidy up copying backwards in memmove
amd64: remove stale attribution for memmove work
amd64: handle small memmove buffers with overlapping stores
Ed Maste [Wed, 19 Dec 2018 18:17:59 +0000 (18:17 +0000)]
MFC r342227: bootpd: validate hardware type
Due to insufficient validation of network-provided data it may have been
possible for a malicious actor to craft a bootp packet which could cause
a stack buffer overflow.
admbugs: 850
Reported by: Reno Robert
Reviewed by: markj
Approved by: so
Security: FreeBSD-SA-18:15.bootpd
Sponsored by: The FreeBSD Foundation
Alexander Motin [Tue, 18 Dec 2018 22:56:24 +0000 (22:56 +0000)]
MFC r339909: Allow changing lagg(4) MTU.
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.
If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Relnotes: Yes
Sponsored by: iXsystems Inc.
Stephen Hurd [Tue, 18 Dec 2018 17:31:31 +0000 (17:31 +0000)]
MFC r341824:
Fix !tx_abdicate error from r336560
r336560 was supposed to restore pre-r323954 behaviour when tx_abdicate is
not set (the default case). However, it appears that rather than the drainage
check being made conditional on tx_abdicate being set, it was duplicated
so it occured twice if tx_abdicate was set and once if it was not.
Now when !tx_abdicate, drainage is only checked if the doorbell isn't
pending.
Brooks Davis [Tue, 18 Dec 2018 09:13:50 +0000 (09:13 +0000)]
MFC r342125:
Fix bugs in plugable CC algorithm and siftr sysctls.
Use the sysctl_handle_int() handler to write out the old value and read
the new value into a temporary variable. Use the temporary variable
for any checks of values rather than using the CAST_PTR_INT() macro on
req->newptr. The prior usage read directly from userspace memory if the
sysctl() was called correctly. This is unsafe and doesn't work at all on
some architectures (at least i386.)
In some cases, the code could also be tricked into reading from kernel
memory and leaking limited information about the contents or crashing
the system. This was true for CDG, newreno, and siftr on all platforms
and true for i386 in all cases. The impact of this bug is largest in
VIMAGE jails which have been configured to allow writing to these
sysctls.
Per discussion with the security officer, we will not be issuing an
advisory for this issue as root access and a non-default config are
required to be impacted.
Reviewed by: markj, bz
Discussed with: gordon (security officer)
Security: kernel information leak, local DoS (both require root)
Differential Revision: https://reviews.freebsd.org/D18443
MFC r341798:
Use correct size for IPv4 address in gethostbyaddr().
When u_long is 8 bytes, it returns EINVAL and 'ipfw -N show' doesn't work.
Reported by: Claudio Eichenberger <cei at yourshop.com>
MFC r341799:
Rework how protocol number is tracked in rule. Save it when O_PROTO
opcode will be printed. This should solve the problem, when protocol
name is not printed in `ipfw -N show`.
Reported by: Claudio Eichenberger <cei at yourshop.com>
Michal Meloun [Sat, 15 Dec 2018 06:22:48 +0000 (06:22 +0000)]
MFC r341738:
Implement R_AARCH64_TLS_DTPMOD64 and A_AARCH64_TLS_DTPREL64 relocations.
Although these are slightly obsolete in favor of R_AARCH64_TLSDESC, gcc
-mtls-dialect=trad still use them.
Alexander Motin [Fri, 14 Dec 2018 14:49:04 +0000 (14:49 +0000)]
MFC r341829: Allow CTL device specification in bhyve virtio-scsi.
There was a large refactoring done in CTL to allow multiple ioctl frontend
ports (and respective devices) to be created, particularly for bhyve.
Unfortunately, respective part of bhyve functionality got lost somehow from
the original virtio-scsi commit. This change allows wanted device path to
be specified in either of two ways:
-s 6,virtio-scsi,/dev/cam/ctl1.1
-s 6,virtio-scsi,dev=/dev/cam/ctl2.3
If neither is specified, the default /dev/cam/ctl device is used.
While there, remove per-queue CTL device opening, which makes no sense at
this point.
Alexander Motin [Fri, 14 Dec 2018 14:44:38 +0000 (14:44 +0000)]
MFC r341705: Fix several iov handling bugs in bhyve virtio-scsi backend.
- buf_to_iov() does not use buflen parameter, allowing out of bound read.
- buf_to_iov() leaks memory if seek argument > 0.
- iov_to_buf() doesn't need to reallocate buffer for every segment.
- there is no point to use size_t for iov counts, int is more then enough.
- some iov function arguments can be constified.
- pci_vtscsi_request_handle() used truncate_iov() incorrectly, allowing
getting out of buffer and possibly corrupting data.
- pci_vtscsi_controlq_notify() written returned status at wrong offset.
- pci_vtscsi_controlq_notify() leaked one buffer per event.
Michal Meloun [Fri, 14 Dec 2018 10:20:26 +0000 (10:20 +0000)]
MFC r341511,r341512,r341513:
r341511:
Fix style(9). Not a functional change.
r341512:
Implement arm64 version of __tls_get_addr().
r341513:
Tidy up arm64 reloc_jmpslots() implementation.
- don't relocate jump slots multiple times (if LD_BIND_NOW is defined).
- process only R_AARCH64_JUMP_SLOT here, other relocation types are
handled by reloc_plt().
The iflib subsystem implements netmap support in a driver-independent
way (sys/net/iflib.c). We can therefore remove the headers that
used to implement netmap support for all the drivers now supported
by iflib (em, igb, ixl, ixgbe, lem).
Kristof Provost [Thu, 13 Dec 2018 20:00:11 +0000 (20:00 +0000)]
pfsync: Performance improvement
pfsync code is called for every new state, state update and state
deletion in pf. While pf itself can operate on multiple states at the
same time (on different cores, assuming the states hash to a different
hashrow), pfsync only had a single lock.
This greatly reduced throughput on multicore systems.
Address this by splitting the pfsync queues into buckets, based on the
state id. This ensures that updates for a given connection always end up
in the same bucket, which allows pfsync to still collapse multiple
updates into one, while allowing multiple cores to proceed at the same
time.
The number of buckets is tunable, but defaults to 2 x number of cpus.
Benchmarking has shown improvement, depending on hardware and setup, from ~30%
to ~100%.
Eugene Grosbein [Thu, 13 Dec 2018 10:52:40 +0000 (10:52 +0000)]
MFC r340394: ipfw.8: Fix part of the SYNOPSIS documenting
LIST OF RULES AND PREPROCESSING that is still referred
as last section of the SYNOPSIS later but was erroneously situated
in the section IN-KERNEL NAT.
tools: netmap: pkt-gen: check packet length against interface MTU
Validate the value of the -l argument (packet length) against the MTU of the netmap port.
In case the netmap port does not refer to a physical interface (e.g. VALE port or pipe), then
the netmap buffer size is used as MTU.
This change also sets a better default value for the -M option, so that pkt-gen uses
the largest possible fragments in case of multi-slot packets.
The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. When the upper layers typically the socket layer has obtained such
a tag, it can then query the destination driver queue for the current
amount of space available in the send queue.
A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx4en(4) driver is detached. The channels structure is persistant
during the lifetime of the mlx5en(4) driver it belongs to and can so be accessed
without any need of synchronization.
The mlx5e_snd_tag structure was extended to contain a type field, because there are now
two different tag types which end up in the driver which need to be distinguished.
MFC r341585:
mlx5en: Improve configuration of HW LRO.
In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space must be set, and the ifconfig(8) LRO capability must be set. Any other
settings will disable HW LRO.
MFC r341584:
mlx5en: Count all transmitted and received bytes.
Add counter for all transmitted and received bytes. Currently only all
transmitted and received packets were counted. Fix description of RX LRO
counters while at it.
MFC r341583:
mlx5en: Statically allocate and free the channel structure(s).
By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path. And also reduce the chance for use-after-free
issues in the transmit fast path.
This change is also a requirement for implementing
backpressure support.
MFC r341582:
mlx5en: Fix race in mlx5e_ethtool_debug_stats().
Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.
Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.
MFC r341580:
mlx5en: Don't set rate on SQs when the SQ is already stopped.
This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.
MFC r341579:
mlx5en: Fix for inlining issues in transmit path
1) Don't exceed the drivers own hardcoded TX inline limit.
The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the drivers own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.
2) Make sure the mlx5_query_min_inline() function returns an error code.
3) Header inlining is required when using TSO.
4) Catch failure to compute inline header size for TSO.
5) Add support for UDP when computing inline header size.
6) Fix for inlining issues with regards to DSCP.
Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to workaround a hardware bug extracting
the DSCP field from the IPv4/v6 header.
MFC r341578 and r341655:
mlx5en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. The is also a race
setting the queue_state which can lead to hung TX rings.
MFC r341577:
mlx5en: Implement support for bandwidth limiting in by ratio, ETS.
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer number between 1 and 100 inclusivly.
Implement the needed firmware commands and SYSCTLs through mlx5en(4).
MFC r341570:
mlx5ib: Make sure the congestion work timer does not escape the drain procedure.
If the mlx5_ib_read_cong_stats() function was running when mlx5ib was unloaded,
because this function unconditionally restarts the timer, the timer can still
be pending after the delayed work has been cancelled. To fix this simply loop
on the delayed work cancel procedure as long as it returns non-zero.
MFC r341568:
mlx5ib: Fix sign extension in mlx5_ib_query_device
"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.
MFC r341566:
mlx5: Fixes to allow command polling mode to exist alongside event mode.
A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handle
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.
This counter will represent transmitted packets which has more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.