Zbigniew Bodek [Mon, 22 May 2017 14:46:13 +0000 (14:46 +0000)]
Add support for Amazon Elastic Network Adapter (ENA) NIC
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.
The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and extendable command set
through an Admin Queue.
The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.
Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized
data placement.
The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.
The ENA driver and its corresponding devices implement health
monitoring mechanisms such as watchdog, enabling the device and driver
to recover in a manner transparent to the application, as well as
debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds. This feature will
be implemented for driver in future releases.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Jakub Palider <jpa@semihalf.com>
Jan Medala <jan@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10427
Ed Maste [Mon, 22 May 2017 11:43:19 +0000 (11:43 +0000)]
disallow open(2) in capability mode
Previously open(2) was allowed in capability mode, with a comment that
suggested this was likely the case to facilitate debugging. The system
call would still fail later on, but it's better to disallow the syscall
altogether.
We now have the kern.trap_enotcap sysctl or PROC_TRAPCAP_CTL proccontrol
to aid in debugging.
In any case libc has translated open() to the openat syscall since
r277032.
Reviewed by: kib, rwatson
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10850
Roger Pau Monné [Mon, 22 May 2017 11:38:39 +0000 (11:38 +0000)]
bsdinstall: mount is not needed for the ZFS install case
Because the datasets are already mounted by zfsboot, and the mount script
doesn't know anything about ZFS. Also do not execute the "umount" script for
ZFS for the same reasons.
catman(1) checks if mandoc(1) do support the manpage before trying to generate
the catpage and falls back on nroff, using the same mechanism as man(1).
Dimitry Andric [Sun, 21 May 2017 17:07:12 +0000 (17:07 +0000)]
Add libc++experimental.a for std::experimental support
This adds a separate library for supporting std::experimental features.
It is purposefully static, and must be explicitly linked into programs
using -lc++experimental.
PLEASE NOTE: there is NO WARRANTY as to any stability or continuing
existence of the features in the std::experimental parts of the C++
library!
Reviewed by: ed
Differential Revision: https://reviews.freebsd.org/D10840
Eric van Gyzen [Sat, 20 May 2017 17:42:58 +0000 (17:42 +0000)]
dma.8: fix problems reported by igor and 'mandoc -Tlint'
dma.8:77:contraction:Queue the mail, but [don't] attempt to deliver it.
dma.8:85:repeated:s [are are] ignored.
dma.8:87:contraction:[Don't] run in the background.
dma.8:201:contraction:Use the catch-all alias only if you [don't] want any local mail to be
mandoc: dma.8:308:5: WARNING: macro neither callable nor escaped: Sm
Eric van Gyzen [Sat, 20 May 2017 17:33:47 +0000 (17:33 +0000)]
libthr: Use CLI flags instead of pragmas to disable warnings
People tweaking the build system or compilers tend to look into
the Makefile and not into the source. Having some warning controls
in the Makefile and some in the source code is surprising.
Pragmas have the advantage that they leave the warnings enabled
for more code, but that advantage isn't very relevant in these cases.
2. Ensure that nsswitch.conf hosts line contains something like:
hosts: files cache dns
Note that cache must be specified before dns.
3. Start nscd.
4. Run the following command:
while true; do nc -z -w 3 www.google.com 80; sleep 5; done
5. While running the command, remove or comment out all nameserver
statements in /etc/resolv.conf. After a short while you will notice
non-recoverable name rsolution failures.
6. Uncomment or replace all nameserver statements back into
/etc/resolv.conf. Take note that name resolution never recovers.
To recover nscd must be restarted. This patch fixes this.
Ed Maste [Sat, 20 May 2017 11:20:03 +0000 (11:20 +0000)]
bsdgrep: Correct per-line line metadata printing
Metadata printing with -b, -H, or -n flags suffered from a few flaws:
1) -b/offset printing was broken when used in conjunction with -o
2) With -o, bsdgrep did not print metadata for every match/line, just
the first match of a line
3) There were no tests for this
Address these issues by outputting this data per-match if the -o flag is
specified, and prior to outputting any matches if -o but not --color,
since --color alone will not generate a new line of output for every
iteration over the matches.
To correct -b output, fudge the line offset as we're printing matches.
While here, make sure we're using grep_printline in -A context. Context
printing should *never* look at the parsing context, just the line.
The tests included do not pass with gnugrep in base due to it exhibiting
similar quirky behavior that bsdgrep previously exhibited.
Ed Maste [Sat, 20 May 2017 03:51:31 +0000 (03:51 +0000)]
bsdgrep: emit more than MAX_LINE_MATCHES per line
We should not set an arbitrary cap on the number of matches on a line,
and in any case MAX_LINE_MATCHES of 32 is much too low. Instead, if we
match more than MAX_LINE_MATCHES, keep processing and matching from the
last match until all are found.
For the regression test, we produce 4096 matches (larger than we expect
we'll ever set MAX_LINE_MATCHES) and make sure we actually get 4096
lines of output with the -o flag.
We'll also make sure that every distinct line is getting its own line
number to detect line metadata not being printed as appropriate along
the way.
Adrian Chadd [Sat, 20 May 2017 00:43:52 +0000 (00:43 +0000)]
[net80211] prepare for A-MSDU/A-MPDU offload crypto / sequence number checking.
When doing AMSDU offload, the driver (for now!) presents 802.11 frames with
the same sequence number and crypto sequence number / IV values up to the stack.
But, this will trip afoul over the sequence number detection.
So drivers now have a way to signify that a frame is part of an offloaded
AMSDU group, so we can just ensure that we pass those frames up to the
stack.
The logic will be a bit messy - the TL;DR will be that if it's part of
the previously seen sequence number then it belongs in the same burst.
But if we get a repeat of the same sequence number (eg we sent an ACK
but the receiver didn't hear it) then we shouldn't be passing those frames
up. So, we can't just say "all subframes go up", we need to track
whether we've seen the end of a burst of frames for the given sequence
number or not, so we know whether to actually pass them up or not.
The first part of doing all of this is to ensure the ieee80211_rx_stats
struct is available in the RX sequence number check path and the
RX ampdu reorder path. So, start by passing the pointer into these
functions to avoid doing another lookup.
The actual support will come in a subsequent commit once I know the
functionality actually works!
Ed Maste [Sat, 20 May 2017 00:42:47 +0000 (00:42 +0000)]
bsdgrep: fix segfault with --mmap
r313948 partially fixed --mmap behavior but was incomplete. This commit
generally reverts it and does it the more correct way- by just consuming
the rest of the buffer and moving on.
Enji Cooper [Fri, 19 May 2017 17:14:29 +0000 (17:14 +0000)]
sys/fs/tmpfs/vnd_test: make md(4) allocation dynamic
The previous logic was flawed in the sense that it assumed that /dev/md3
was always available. This was a caveat I noted in r306038, that I hadn't
gotten around to solving before now.
Cache the device for the mountpoint after executing mdmfs, then use the
cached value in basic_cleanup(..) when unmounting/disconnecting the md(4)
device.
Apply sed expressions to use reuse logic in the NetBSD code that could
also be applied to FreeBSD, just with different tools.
Enji Cooper [Fri, 19 May 2017 17:04:01 +0000 (17:04 +0000)]
Install {cron.d,newsyslog.conf.d,syslog.d} via `make distribution`, not `make install`
I incorrectly started this pattern in r277541 with the opensm newsyslog.conf.d file,
and continued using it in r318441 and r318443.
This will fix the files being handled improperly via installworld, preventing tools like
etcupdate, mergemaster, etc from functioning properly when comparing the installed
contents on a system vs the contents in a source tree when doing merges.
mlx4: Use the CQ quota for SRIOV when creating completion EQs
When creating EQs to handle CQ completion events for the PF or for
VFs, we create enough EQE entries to handle completions for the max
number of CQs that can use that EQ.
When SRIOV is activated, the max number of CQs a VF (or the PF) can
obtain is its CQ quota (determined by the Hypervisor resource
tracker). Therefore, when creating an EQ, the number of EQE entries
that the VF should request for that EQ is the CQ quota value (and not
the total number of CQs available in the firmware).
Under SRIOV, the PF, also must use its CQ quota, because the resource
tracker also controls how many CQs the PF can obtain.
Using the firmware total CQs instead of the CQ quota when creating EQs
resulted wasting MTT entries, due to allocating more EQEs than were
needed.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Don Lewis [Fri, 19 May 2017 08:38:03 +0000 (08:38 +0000)]
Fix the queue delay estimation in PIE/FQ-PIE when the timestamp
(TS) method is used. When packet timestamp is used, the "current_qdelay"
keeps storing the last queue delay value calculated in the dequeue
function. Therefore, when a burst of packets arrives followed by
a pause, the "current_qdelay" will store a high value caused by the
burst and stick to that value during the pause because the queue
delay measurement is done inside the dequeue function. This causes
the drop probability calculation function to calculate high drop
probability value instead of zero and prevents the burst allowance
mechanism from working properly. Fix this problem by resetting
"current_qdelay" inside the drop probability calculation function
when the queue length is zero and TS option is used.
Wojciech Macek [Fri, 19 May 2017 08:26:41 +0000 (08:26 +0000)]
Fix boot up on ARMADA38X uniprocessor variant
Marvell Armada 380 is a uni-processor variant of the 38x SoC
family. A function platform_mp_setmaxid() was setting a hardcoded
value, which caused boot fail on A380. Fix this by relying on
the CPU count obtained from device tree nodes.
Wojciech Macek [Fri, 19 May 2017 08:24:23 +0000 (08:24 +0000)]
Poll PHY status using internal e6000sw registers
e6000sw family automatically reflects PHY status in each port's registers.
Therefore it is not necessary to do a full PHY polling squence, which
results in much quicker operation and much less significant usage of
the SMI bus.
Care must be taken that the resulting ifmedia_active is identical to
what the PHY will compute, or gratuitous link status changes will
occur whenever the PHYs update function is called.
This patch implements above improvement. On the occasion set a pointer to
the proc structure to be part of software context instead of being
a global variable.
Roger Pau Monné [Fri, 19 May 2017 08:19:51 +0000 (08:19 +0000)]
xen/netfront: don't drop the ring RX lock with inconsistent ring state
Make sure the RX ring lock is only released when the state of the ring is
consistent, or else concurrent calls to xn_rxeof might get an inconsistent ring
state and thus some packets might be processed twice.
Note that this is not very common, and could only happen when an interrupt is
delivered while in xn_ifinit.
Wojciech Macek [Fri, 19 May 2017 08:19:39 +0000 (08:19 +0000)]
Enable proper configuration of CESA MBUS windows
For all Marvell devices, MBUS windows configuration is done
in a common place. Only CESA was an exception, so move its
related code from driver to mv_common.c. This way it uses
same proper DRAM information, same as all other interfaces
instead of parsing DT /memory node directly.
Wojciech Macek [Fri, 19 May 2017 08:16:47 +0000 (08:16 +0000)]
Improve busy-wait loop during switch phy access in e6000sw
Hitherto implementation of PHY polling resulted in a risk of an
endless loop and very high occupation of the SMI bus. Improve the
operation by limiting the polling tries and adding sleepable
pause.
Roger Pau Monné [Fri, 19 May 2017 08:11:15 +0000 (08:11 +0000)]
xen/blkfront: correctly detach a disk with active users
Call disk_gone when the backend switches to the "Closing" state and blkfront
still has pending users. This allows the disk to be detached, and will call
into xbd_closing by itself when the geom layout cleanup has finished.
Xin LI [Fri, 19 May 2017 04:59:12 +0000 (04:59 +0000)]
The current qsort(3) implementation ignores the sizes of partitions, and
always perform recursion on the left partition, then use a tail call to
handle the right partition. In the worst case this could require O(N)
levels of recursions.
Reduce the possible recursion level to log2(N) by always recursing on the
smaller partition instead.
Don Lewis [Fri, 19 May 2017 01:23:06 +0000 (01:23 +0000)]
The result of right shifting a negative signed value is implementation
defined. On machines without arithmetic shift instructions, zero bits
may be shifted in from the left, giving a large positive result instead
of the desired divide-by power-of-2. Fix this by operating on the
absolute value and compensating for the possible negation later.
Reverse the order of the underflow/overflow tests and the exponential
decay calculation to avoid the possibility of an erroneous overflow
detection if p is a sufficiently small non-negative value. Also
check for negative values of prob before doing the exponential decay
to avoid another instance of of right shifting a negative value.
Jilles Tjoelker [Thu, 18 May 2017 22:10:04 +0000 (22:10 +0000)]
sh: Keep output buffer across builtins.
Allocating and deallocating repeatedly the 1024-byte buffer for stdout from
builtins costs CPU time for little or no benefit.
A simple loop containing builtins that write to a file descriptor, such as
i=0; while [ "$i" -lt 1000000 ]; do printf .; i=$((i+1)); done >/dev/null
is over 10% faster in a simple benchmark on an amd64 virtual machine.
Mark Johnston [Thu, 18 May 2017 18:35:14 +0000 (18:35 +0000)]
Fix a few uses of kern_yield() in the TTM and the LinuxKPI.
kern_yield(0) effectively causes the calling thread to be rescheduled
immediately since it resets the thread's priority to the highest possible
value. This can cause livelocks when the pattern
"while (!trylock()) kern_yield(0);" is used since the thread holding the
lock may linger on the runqueue for the CPU on which the looping thread is
running.
indent(1): Support binary integer literals.
This was done by Romain Tartière for PR123553. I initially thought that it would break code like this:
#define b00101010 -1
if (0 b00101010)
...
by joining 0 and b00101010 together. However, the real problem with that patch was that once it saw a 0, it assumed that the number was base 2, 8 or 16, ignoring base 10 floating point numbers. I fixed that.
I didn't copy the diagnostic part of the original patch as it seems out of scope of implementing binary integer literals formatting.
The first test triggers the out of bounds read of the 'left' array. It
only fails when realpath.c is compiled with '-fsanitize=address'.
The other test checks for ENOENT when running into an empty
symlink. This matches NetBSD's realpath(3) semantics. Previously,
empty symlinks were treated like ".".
Enji Cooper [Thu, 18 May 2017 06:33:55 +0000 (06:33 +0000)]
Conditionally handle the crontab entry for atrun(8)
The default crontab prior to this commit assumes atrun(8) is always
present, which isn't true if MK_AT == no. Move atrun(8) execution
from /etc/crontab to /etc/cron.d/at, and base /etc/cron.d/at's installation
on MK_AT. cron(8) will detect /etc/cron.d/at's presence when the configuration
is loaded and run atrun every 5 minutes like it would prior to this commit.
SHELL and PATH are duplicated between /etc/crontab and /etc/cron.d/at
because atrun(8) executes programs, which may rely on environment
set in the current default /etc/crontab.
Noted by: bdrewery (in an internal review)
MFC after: 2 months
Relnotes: yes (may need to add environmental modifications to
/etc/cron.d/at)
Sponsored by: Dell EMC Isilon
Enji Cooper [Thu, 18 May 2017 06:25:39 +0000 (06:25 +0000)]
Handle the cron.d entry for MK_AT in cron conditionally
Install /etc/cron.d/at if MK_AT != no, always using it, which tries
to run a non-existent program via cron(8) every 5 minutes with the
default /etc/crontab, prior to this commit.
SHELL and PATH are duplicated between /etc/crontab and /etc/cron.d/at
because atrun(8) executes programs, which may rely on environment
currently set via /etc/crontab.
Noted by: bdrewery (in an internal review)
MFC after: 2 months
Relnotes: yes (may need to add environmental modifications to
/etc/cron.d/at)
Sponsored by: Dell EMC Isilon
Enji Cooper [Thu, 18 May 2017 01:43:30 +0000 (01:43 +0000)]
usr.bin/getconf: add some initial tests
Items tested via this commit are:
- Some basic POSIX constants.
- Some valid programming environments with -v.
- Some invalid programming environments via -v.
NOTE: this test makes assumptions about ILP32/LP32 vs LP64 that are
currently not true on all architectures to avoid hardcoding some
architectures in the tests. I'm working on improving getconf(1) to be
more sane about handling ILP32/LP32 vs LP64. Future commits are coming
soon to address this.
When I originally documented the LD_LIBRARY_PATH_FDS environment variable,
I used `.Ev` rather than `.It Ev` to introduce it; this led to the
documentation being embedded in the previous paragraph (LD_LIBRARY_PATH).
When executing rtld directly, allow a file descriptor to be explicitly
specified rather than opened from the given path. This, together with the
LD_LIBRARY_PATH_FDS environment variable, allows dynamically-linked
applications to be executed from within capability mode.
Also add some rudimentary argument parsing (without pulling in getopt or
the like) to accept this file descriptor, a help (-h) option and a basic
usage string.
John Baldwin [Wed, 17 May 2017 22:13:07 +0000 (22:13 +0000)]
Add a driver for the Chelsio T6 crypto accelerator engine.
The ccr(4) driver supports use of the crypto accelerator engine on
Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.
Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS
cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC,
and SHA2-512-HMAC authentication algorithms. The driver also supports
chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication
algorithm for encrypt-then-authenticate operations.
Note that this driver is still under active development and testing and
may not yet be ready for production use. It does pass the tests in
tests/sys/opencrypto with the exception that the AES-GCM implementation
in the driver does not yet support requests with a zero byte payload.
To use this driver currently, the "uwire" configuration must be used
along with explicitly enabling support for lookaside crypto capabilities
in the cxgbe(4) driver. These can be done by setting the following
tunables before loading the cxgbe(4) driver:
Zbigniew Bodek [Wed, 17 May 2017 15:58:39 +0000 (15:58 +0000)]
Fix broken malloc in e6000sw
Malloc should always return something when M_WAITOK flag is used,
but keep this code and change flag to M_NOWAIT as it is under a lock
(allows for possible future change). Free ifnet structure to avoid
memory leak on failure.
Zbigniew Bodek [Wed, 17 May 2017 15:56:09 +0000 (15:56 +0000)]
Correct MPIC order of attachment
If MPIC happens to be a slave interrupt controller (as on Armada38x),
it should be attached after primary interrupt controller.
Thus BUS_PASS_ORDER_LATE was added to default BUS_PASS_INTERRUPT.
This change doesn't affect the cases when MPIC is standalone IC.