Eric Joyner [Thu, 24 Aug 2017 22:56:22 +0000 (22:56 +0000)]
ixv(4): Add more robust mailbox API negotiation
The previous update to the driver to 3.2.12-k changed the VF's API version
to 1.2, but did not let the VF fall back to 1.1 or 1.0 versions. So, this
patch tries 1.2 first, then the older versions in succession if that fails.
This should allow the VF driver to negotiate 1.1 and work with older PF
drivers, such as the one used in Amazon's EC2 service.
Warner Losh [Thu, 24 Aug 2017 22:11:10 +0000 (22:11 +0000)]
Expand the latency tracking array from 1.024s to 8.192s to help track
extreme outliers from dodgy drives. Adjust comments to reflect this,
and make sure that the number of latency buckets match in the two
places where it matters.
Warner Losh [Thu, 24 Aug 2017 22:10:58 +0000 (22:10 +0000)]
Fix 32-bit overflow on latency measurements
o Allow I/O scheduler to gather times on 32-bit systems. We do this by shifting
the sbintime_t over by 8 bits and truncating to 32-bits. This gives us 8.24
time. This is sufficient both in range (256 seconds is about 128x what current
users need) and precision (60ns easily meets the 1ms smallest bucket size
measurements). 64-bit systems are unchanged. Centralize all the time math so
it's easy to tweak tha range / precision tradeoffs in the future.
o While I'm here, the I/O scheduler should be using periph_data rather than
sim_data since it is operating on behalf of the periph.
Gleb Smirnoff [Thu, 24 Aug 2017 20:52:02 +0000 (20:52 +0000)]
Add a test case for a connection on accept queue that is reset before
it is accepted. In that case accept(2) shall return ECONNABORTED.
Accept filters provide help with easily replicating that case.
Gleb Smirnoff [Thu, 24 Aug 2017 20:49:19 +0000 (20:49 +0000)]
Third take on the r319685 and r320480. Actually allow for call soisconnected()
via soisdisconnected(), and in the earlier unlock earlier to avoid lock
recursion.
This fixes a situation when a socket on accept queue is reset before being
accepted.
Reported by: Jason Eggleston <jeggleston llnw.com>
Alan Somers [Thu, 24 Aug 2017 19:48:41 +0000 (19:48 +0000)]
zfsd(8): Close a race condition when onlining a disk paritition
When inserting a partitioned disk, devfs and geom will announce the whole
disk before they announce the partition. If the partition containing ZFS
extends to one of the disk's extents, then zfsd will see a ZFS label on the
whole disk and attempt to online it. ZFS is smart enough to activate the
partition instead of the whole disk, but only if GEOM has already created
the partition's provider.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Add a zpool_read_all_labels method. It's similar to
zpool_read_label, but it will return the number of labels found.
cddl/usr.sbin/zfsd/zfsd_event.cc
When processing a DevFS CREATE event, only online a VDEV if we can
read all four ZFS labels.
Stop masking FSGSBASE and SMEP features under monitors.
Not enabling FSGSBASE in %cr4 does not prevent reporting of the
feature by the CPUID instruction (blame Int*l). As result, kernels
which were run under monitors pretended that usermode cannot modify
TLS base without the syscall, while libc noted right combination of
capable CPU and the new kernel version, trying to use the WRFSBASE
instruction.
Really old hypervisors that cannot handle enablement of these features
in %cr4 would require the manual configuration, by setting the loader
tunable hw.cpu_stdext_disable=0x81
Reported by: lwhsu, mjoras
Sponsored by: The FreeBSD Foundation
MFC after: 18 days
Kyle Evans [Thu, 24 Aug 2017 01:23:33 +0000 (01:23 +0000)]
bsdgrep: add a primitive literal matcher
fgrep/grep -F will error out at runtime if compiled with a regex(3)
that does not define REG_NOSPEC or REG_LITERAL. glibc is one such regex(3)
implementation, and as it turns out they don't support literal matching at
all.
Provide a primitive literal matcher for use with glibc and other
implementations that don't support literal matching so that we don't
completely lose fgrep/grep -F if compiled against libgnuregex on stable/10,
stable/11, or other systems that we don't necessarily support.
This is a wholly unoptimized implementation with no plans to optimize it as
of now. This is due to both its use-case being primarily on unsupported
systems in the near-distant future and that it's reinventing the wheel that
we already have available as a feature of regex(3).
Kyle Evans [Thu, 24 Aug 2017 01:20:52 +0000 (01:20 +0000)]
bsdgrep: add some additional tests for fgrep
Previously added tests only check that fgrep is somewhat sane and works. Add
some more tests that check that the implementation is basically functional
and not producing incorrect results with various flags.
John Baldwin [Wed, 23 Aug 2017 23:30:25 +0000 (23:30 +0000)]
Improve the coverage of debug symbols for MK_DEBUG_FILES.
- Include debug symbols in static libraries. This permits binaries
to include debug symbols for functions obtained from static libraries.
- Permit the C/C++ compiler flags added for MK_DEBUG_FILES to be
overridden by setting DEBUG_FILES_CFLAGS. Use this to limit the debug
information for llvm libraries and binaries.
Ed Maste [Wed, 23 Aug 2017 17:56:55 +0000 (17:56 +0000)]
top: use __mips__ and __NetBSD__ for consistency
r322767 fixed the mips64 build failure with Clang with a minimal change
to use __FreeBSD__ instead of FreeBSD in a #if test. For consistency
and to facilitate possible upstreaming change the other macros in the
test to their canonical form.
Ed Maste [Wed, 23 Aug 2017 12:47:10 +0000 (12:47 +0000)]
Set MK_LLD_IS_LD to MK_LLD_BOOTSTRAP for cross-tools
LLD_BOOTSTRAP is intended to control the linker used to link world and
kernel, while LLD_IS_LD is intended to control the linker installed in
that world.
Force LLD_IS_LD equal to LLD_BOOTSTRAP for the cross-tools build and
install phase, so that lld will be installed as the ld to run on the
host, when LLD_BOOTSTRAP is set.
PR: 221543
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12072
Add new mlx5ib(4) driver to the kernel source tree which supports
Remote DMA over Converged Ethernet, RoCE, for the ConnectX-4 series of
PCI express network cards.
There is currently no user-space support and this driver only supports
kernel side non-routable RoCE V1. The krping kernel module can be used
to test this driver. Full user-space support including RoCE V2 will be
added as part of the ongoing upgrade to ibcore from Linux 4.9. Otherwise
this driver is feature equivalent to mlx4ib(4). The mlx5ib(4) kernel
module will only be built when WITH_OFED=YES is specified.
Ed Maste [Tue, 22 Aug 2017 17:57:34 +0000 (17:57 +0000)]
newvers.sh: accommodate `git worktree`
newvers.sh looks for a .vcs subdirectory (e.g. .git, .svn) to determine
which vcs info tool to run (e.g., git rev-parse, svn info).
(As of r308789 if a .vcs subdirectory is not found at ${TOPDIR} then
newvers.sh walks up successive parent directories, testing for the .vcs
subdirectory at each step. This is done in case the FreeBSD source is
built in a subdirectory as part of some larger project, but either way
newvers.sh still tests for the .vcs subdirectory.)
However, when using git worktree there is no .git subdirectory but
rather a plain text .git file which contains a reference to the main
working tree.
Change findvcs() to test that the .vcs entry exists, regardless of type.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Conrad Meyer [Tue, 22 Aug 2017 00:10:15 +0000 (00:10 +0000)]
subr_smp: Clean up topology analysis, add additional layers
Rather than repeatedly nesting loops, separate concerns with a single loop
per call stack level. Use a table to drive the recursive routine. Handle
missing topology layers more gracefully (infer a single unit).
Analyze some additional optional layers which may be present on e.g. AMD Zen
systems (groups, aka dies, per package; and cachegroups, aka CCXes, per
group).
Display that additional information in the boot-time topology information,
when it is relevent (non-one).
Mark Johnston [Mon, 21 Aug 2017 21:56:02 +0000 (21:56 +0000)]
Fix an off-by-two in the llquantize() action parameter validation.
The aggregation created by llquantize() partitions values into buckets; the
lower bound of the bucket containing the largest values is b^{m+1}, where
b and m are the second and fourth parameters to the action, respectively.
Bucket bounds are stored in a 64-bit integer, and so the llquantize()
validation checks need to verify that b^{m+1} fits in 64 bits. However, it
was only verifying that b^{m-1} fits in 64 bits, so certain parameter
combinations could trigger assertion failures in libdtrace.
Upgrade FW to 5.4.66
sysctls to display stats, stats polled every 2 seconds
Modify QLA_LOCK()/QLA_UNLOCK() to not sleep after acquiring mtx_lock
Add support to turn OFF/ON error recovery following heartbeat failure for
debug purposes.
Set default max values to 32 Tx/Rx/SDS rings
Glen Barber [Mon, 21 Aug 2017 20:23:05 +0000 (20:23 +0000)]
Apply changes from bin/chmod/tests/chmod_test.sh (r321949, r321950,
and r322101), adding atf_expect_fail() before chflags(8) is invoked
if the filesystem is ZFS, which does not support UF_IMMUTABLE.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Andrew Turner [Mon, 21 Aug 2017 18:12:32 +0000 (18:12 +0000)]
Improve the performance of the arm64 thread switching code.
The full system memory barrier around a TLB invalidation is stricter than
required. It needs to wait on accesses to main memory, with just the weaker
store variant before the invalidate. As such use the dsb istst, tlbi, dlb
ish sequence already used in pmap.
The tlbi instruction in this sequence is also unnecessarily using a
broadcast invalidate when it just needs to invalidate the local CPUs TLB.
Switch to a non-broadcast variant of this instruction.
Make WRFSBASE and WRGSBASE instructions functional.
Right now, we enable the CR4.FSGSBASE bit on CPUs which support the
facility (Ivy and later), to allow usermode to read fs and gs bases
without syscalls. This bit also controls the write access to bases
from userspace, but WRFSBASE and WRGSBASE instructions currently
cannot be used, because return path from both exceptions or interrupts
overrides bases with the values from pcb.
Supporting the instructions is useful because this means that usermode
can implement green-threads completely in userspace without issuing
syscalls to change all of the machine context.
Support is implemented by saving the fs base and user gs base when
PCB_FULL_IRET flag is set. The flag is set on the context switch,
which potentially causes clobber of the bases due to activation of
another context, and when explicit modification of the user context by
a syscall or exception handler is performed. In particular, the patch
moves setting of the flag before syscalls change context.
The changes to doreti_exit and PUSH_FRAME to clear PCB_FULL_IRET on
entry from userspace can be considered a bug fixes on its own.
Avoid dereferencing potentially freed workitem in
softdep_count_dependencies().
Buffer's b_dep list is protected by the SU mount lock. Owning the
buffer lock is not enough to guarantee the stability of the list.
Calculation of the UFS mount owning the workitems from the buffer must
be much more careful to not dereference the work item which might be
freed meantime. To get to ump, use the pointers chain which does not
involve workitems at all.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
When a security policy should match TCP connection with specific ports,
the SYN+ACK segment send by syncache_respond() is considered as forwarded
packet, because at this moment TCP connection does not have PCB structure,
and ip_output() is called without inpcb pointer. In this case SPIDX filled
for SP lookup will not contain TCP ports and security policy will not
be found. This can lead to unencrypted SYN+ACK on the wire.
This patch restores the old behavior, when ports will not be filled only
for forwarded packets.
Fix for deadlock situation in the LinuxKPI's RCU synchronize API.
Deadlock condition:
The return value of TDQ_LOCKPTR(td) is the same for two threads.
1) The first thread signals a wakeup while keeping the rcu_read_lock().
This invokes sched_add() which in turn will try to lock TDQ_LOCK().
2) The second thread is calling synchronize_rcu() calling mi_switch() over
and over again trying to yield(). This prevents the first thread from running
and releasing the RCU reader lock.
Solution:
Release the thread lock while yielding to allow other threads to acquire the
lock pointed to by TDQ_LOCKPTR(td).
Marius Strobl [Sun, 20 Aug 2017 20:38:15 +0000 (20:38 +0000)]
Bring back the much more readable unified format for differences in
/etc/{group,master.passwd}. This was originally turned on for all of
/etc/{aliases,group,master.passwd} in r55196, but then backed out
only for the latter two in r56697, as the adaption of the sed(1)ing
done in r56308 was incorrect. This left us with inconsistent diff(1)
formats in the daily output of periodic(8) ever since, despite in
r56697 having been promised to be revisited. So properly adapt the
password hash filtering to the unified format and turn the later on
again for /etc/{group,master.passwd}, too.
Do not drop NFS vnode lock when performing consistency checks.
Currently several paths in the NFS client upgrade the shared vnode
lock to exclusive, which might cause temporal dropping of the lock.
This action appears to be fatal for nullfs mounts over NFS. If the
operation is performed over nullfs vnode, then bypassed down to NFS
VOP, and the lock is dropped, other thread might reclaim the upper
nullfs vnode. Since on reclaim the nullfs vnode lock and NFS vnode
lock are split, the original lock state of the nullfs vnode is not
restored. As result, VFS operations receive not locked vnode after a
VOP call.
Stop upgrading the vnode lock when we check the consistency or flush
buffers as result of detected inconsistency. Instead, allocate a new
lockmgr lock for each NFS node, which is locked exclusive instead of
the vnode lock upgrade. In other words, the other parallel
modification of the vnode are excluded by either vnode lock conflict
or exclusivity of the new lock when the vnode lock is shared.
Also revert r316529 because now the vnode cannot be reclaimed during
ncl_vinvalbuf().
In collaboration with: pho
Reviewed by: rmacklem
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12083
Allow vinvalbuf() to operate with the shared vnode lock.
This mode allows other clean buffers to arrive while we flush the buf
lists for the vnode, which is fine for the targeted use. We only need
that all buffers existed at the time of the function start were
flushed. In fact, only one assert has to be relaxed.
In collaboration with: pho
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12083
- Use more relevant name 'signo' instead of 'i' for the local variable
which contains a signal number to send for the current exception.
- Eliminate two labels 'userout' and 'out' which point to the very end
of the trap() function. Instead use return directly.
- Re-indent the prot_fault_translation block by reducing if() nesting.
- Some more monor style changes.
Requested and reviewed by: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Conrad Meyer [Sun, 20 Aug 2017 00:41:49 +0000 (00:41 +0000)]
hwpstate: Add support for family 17h pstate info from MSRs
This information is normally available via acpi_perf, but in case it is not,
add support for fetching the information via MSRs on AMD family 17h (Zen)
processors. Zen uses a slightly different formula than previous generation
AMD CPUs.
This was inspired by, but does not fix, PR 221621.
Reported by: Sean P. R. <seanpr AT swbell.net>
Reviewed by: mjoras@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12082
Bruce Evans [Sat, 19 Aug 2017 23:13:33 +0000 (23:13 +0000)]
Fix setting of defaults for the text cursor.
There was already a per-vty defaults field, but it was useless since it was
only initialized when propagating the global settings and thus no different
from the current global settings and not per-vty. The global defaults field
was also invariant after boot time, but not quite so useless.
Fix this by adding a second selection bit the the control flags of the
relevant ioctl(). vidcontrol doesn't support this yet. Setting either
default propagates the change to the current setting for the same level
and then to all lower levels.
Improve the 3-way escape sequence used by termcap to control the cursor.
The "normal" (ve) case has always used reset, so the user could set
it to anything, but since the reset is to a global value this is not
very useful, especially since the "very visible" (vs) case doesn't
reset but inconsistently forces to a blinking block. Change vs to
first reset and then XOR the blinking bit so that it is predictably
different from ve.
Bruce Evans [Sat, 19 Aug 2017 21:40:42 +0000 (21:40 +0000)]
Rename curr_curs_attr to base_curr_attr. The actual current cursor
attribute field is curs_attr. The base field holds user data translated
in a reversible way and is needed because current field holds this in
an irreversible way for efficiency.
Factor out some common code for the reversible translation. This is
slightly simpler now, and much easier to expand.
Translate the magic flags value -1 to a single control flag internally
up front so other flags can be trusted later. This can be used for the
relevant ioctl() too.
Remove CONS_CURSOR_FLAGS which contained all the control flags. It was
unused and not useful. After adding more flags, there will be tests on
a couple at a time but never on them all. This API should have used this
to disallow unknown flags.
Use the known valid segment when accessing memory in #UD handler.
Make sure that %eflags.D flag is cleared for hook.
Improve comments.
When #UD dtrace code checks for a registered hook before checking that
the exception was raised from kernel mode, we might run with the user
%ds, trapping on access. Exception entry from userspace automatically
load valid %ss, which we can use there instead.
Noted and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Bruce Evans [Sat, 19 Aug 2017 19:33:16 +0000 (19:33 +0000)]
Use better hard-coded defaults for the cursor shape, and remove nearby
redundant initializations.
Hard-code base = 0, height = (approx. 1/8 of the boot-time font height)
in all cases, and remove the BIOS/MD support for setting these values.
This asks for an underline cursor sized for the boot-time font instead
of various less hard-coded but worse values. I used that think that
the x86 BIOS always gave the same values as the above hard-coding, but
on 1 of my systems it gives the wrong value of base = 1.
The remaining BIOS fields are shift_state and bell_pitch. These are now
consistently not explicitly reinitialized to 0. All sc_get_bios_value()
functions except x86's are now empty, and the only useful thing that x86
returns is shift_state. This really belongs in atkbdc, but heavier
use of the BIOS to read the more useful typematic rate has been removed
there. fb still makes much heavier use of the BIOS.
Emmanuel Vadot [Sat, 19 Aug 2017 14:27:11 +0000 (14:27 +0000)]
RPI DTS: Add value previously set by VideoCore and DTB links
Using latest U-Boot for RPI 1 or 2 the DTB loaded by the firmware is discarded.
The DTB was previously patched by the firmware to contain the DMA channel mask.
DTB provided by the rpi firmware or DTS in the Linux tree contain the raw value
directly. Do the same for our DTS as we cannot switch to the upstream ones yet.
Not having the DMA channel mask setup properly cause mmc not to be detected
(and probably other problems on driver using DMA).
Also, add links for rpi dtb to the name used by u-boot. This way the dtb can be
loaded by ubldr using the U-Boot env variable fdtfile.
Tested On: RPI B Rev2, RPI Zero, RPI 2 v1.1 RPI 2 v1.2
Thanks to Sylvain Garrigues <sylvain@sylvaingarrigues.com> for the help.
Bruce Evans [Sat, 19 Aug 2017 12:14:46 +0000 (12:14 +0000)]
Reduce complexity and backwards compatibilty a little by removing new aliases
and repurposing "blink". Improve accuracy of documentation of historical
mistakes and other bugs.
"blink" now means "set the blink attribute for the target(s)" instead of
"set the blink attribute and clear other attributes [and control flags]".
It was even more confusing to use "blinking" for the single attribute to
keep the old meaning for "blink".
"destructive" is not as historically broken or gone as the previous version
said.
The bugs involving resetting from defaults are now understood and partly
documented (the defaults are mis-initialized).
Ed Maste [Sat, 19 Aug 2017 00:19:23 +0000 (00:19 +0000)]
pw usermod: Properly deal with empty secondary group lists (-G '')
"pw usermod someuser -G ''" is supposed make sure that someuser
doesn't have any secondary group memberships.
Previouly it was a nop because split_groups() only intitialised
"groups" if at least one group was specified. As a result the
existing secondary group memberships were kept.