Kyle Evans [Sat, 24 Mar 2018 04:00:01 +0000 (04:00 +0000)]
lualoader: Make config env-related bits private API
This pertains exclusively to the set/restore functionality that we offer,
where any changes made by loader.conf previously will be effectively removed
upon reload of the configuration. We don't currently have a need to export
these, so don't bother.
Kyle Evans [Sat, 24 Mar 2018 01:53:43 +0000 (01:53 +0000)]
efi loader: Choose a console mode instead if hw.vga.textmode is set
Not all systems use efifb; pull hw.vga.textmode and choose a good console
mode instead if it's set to something non-zero. This is basically a revival
of the code that used to live in boot1, but instead rebased onto this
different way of doing mode selection in loader.efi.
Interestingly enough, the regression that was previously introduced where
GOP would not reflect the console setting does not seem to exist when
console mode selection is done here. I've not done any investigation as to
why this is the case. Nevertheless, boot1.efi is still not the best place to
do mode selection.
John Baldwin [Fri, 23 Mar 2018 22:36:24 +0000 (22:36 +0000)]
Add a workaround to the hypervisor detection for older versions of KVM.
Originally KVM set %eax to 0 in the cpuid leaf 0x4000000 rather than
to the highest supported leaf in the hypervisor "branch". Detect this
case and fixup the %eax value so that the hypervisor is still
detected.
Benno Rice [Fri, 23 Mar 2018 20:56:18 +0000 (20:56 +0000)]
Allow makefs to properly tag UEFI El Torito boot images. Use them in amd64 ISOs.
UEFI booting requires an EFI System Partition (ESP). On most storage devices
this will be in a specific partition type. To allow booting from CD/ISO
filesystems, UEFI will look for an ESP in the form of a FAT filesystem image
embedded in the image. Historically FreeBSD has added one of these to its
amd64 ISO images but marked it as simply another i386 boot image. Luckily for
us most UEFI implementations are rather forgiving and work this out for us.
This change adds the ability to mark a boot image as being a UEFI image. It
also modifies our ISO generation to use this marking for the UEFI image we
embed.
Reported by: Thomas Schmitt <scdbackup@gmx.net>
Reviewed by: emaste, imp
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14809
Fix incorrect page count when mlx5core is in internal error.
Change page cleanup flow when in internal error to properly decrement
the page counts when reclaiming pages. That prevents timing out
waiting for extra pages that were actually cleaned up previously.
Don't save PCI state when PCI error is detected in mlx5core.
When a PCI error is detected the PCI state could be corrupt, don't
save it in that flow. Save the state after initialization. After
restoring the PCI state during slot reset save it again, restoring
the state destroys the previously saved state info.
Add mutual exclusion mechanism for software reset of firmware in mlx5core.
Since the FW can be shared between PCI functions it is common that
more than one health poll will detected a failure, this can lead to
multiple resets.
The solution is to use a FW locking mechanism using semaphore space to
provide a way to synchronize between functions. The FW semaphore is
acquired via config cycle access. First the VSEC gateway must be
acquired, then the semaphore can be locked by writing a value to it
and confirmed it's locked by reading the same value back. The process
in the same to free the semaphore, except the value written should be
zero.
Handle software reset of firmware in error flow in mlx5core.
Some mlx5 adapter firmware allows the driver to reset the firmware in
the event of an error. When a software reset is issued on any physical
function all PFs enter reset state. This is a recoverable condition.
The existing recovery flow was designed to allow the recovery of a
VF after a PF driver reload. This patch expands the scope of that
flow to recover PFs or VFs after a SW reset has been issued.
When a software reset is issued the following occurs:
1. The NIC interface mode is set to SW_RESET (7) while the reset is in
progress.
2. Once the reset completes the NIC interface mode is set to NIC
disabled (1).
After the reset has been issued (added in a subsequent patch) the
health poll for other functions will detect that the NIC interface
state has been set to disabled. This will cause it to enter the
existing recovery flow. If the PCI is still working (meaning it
doesn't return 0xff on all reads) it means recovery can proceed
immediately instead of waiting 60 seconds.
The error detetion has also been refactored to avoid incorrect or
misleading log messages.
Hide verbose proclamation of error when forced in mlx5core.
When mlx5_enter_error_state() operation is forced by shutdown, the
messages surrounding setting the error state are not informational
and confuse users.
Create designated workqueue for each mlx5en(4) device instance.
The mlx5e_destroy_ifp() function may be called from the system workqueue and
in this case trying to flush all works will cause a dead lock.
Instead of using the system workqueue, create a designated workqueue
for each mlx5en(4) device instance.
Kristof Provost [Fri, 23 Mar 2018 16:56:44 +0000 (16:56 +0000)]
netpfil: Introduce PFIL_FWD flag
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.
Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.
Warner Losh [Fri, 23 Mar 2018 16:23:15 +0000 (16:23 +0000)]
Flag when we have a pending TUR. Don't schedule another one when we
have one pending. Otherwise, we can race and send two, which is
wasteful in close proximity. It can also cause the acaquire/release
count for TUR to be > 1, which is undexpected.
There is no need to disable interrupts around npxsave call.
i386 was changed to only require critical section around the thread
FPU state manipulations, and vm86_bioscall callers already enter
critical section for other reasons.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Warner Losh [Fri, 23 Mar 2018 15:35:19 +0000 (15:35 +0000)]
Convert the PCI ID selection from a simple if into a table.
Mark the table with PNP info.
Fix compilation by returning FILTER_STRAY in two places, as suggested by comments.
Create a simple module from this. Left unconnected because I can't test it as a module.
Ed Maste [Fri, 23 Mar 2018 14:39:34 +0000 (14:39 +0000)]
Rationalize license text on Linuxolator files
Many licenses on Linuxolator files contained small variations from the
standard FreeBSD license text. To avoid license proliferation switch to
the standard 2-Clause FreeBSD license for those files where I have
permission from each of the listed copyright holders.
Approved by: rdivacky, marcel
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Kenneth D. Merry [Fri, 23 Mar 2018 13:52:26 +0000 (13:52 +0000)]
Disable T10 Protection Information / EEDP handling for type 2 protection.
The mps(4) and mpr(4) drivers and hardware handle T10 Protection
Information, which is a system of checksums and guard blocks to protect
data while it is being transferred and while it is on disk. It is also
known as T10 DIF. For more details, see section 4.22 of the SBC-4 spec.
Supporting Type 2 protection requires using 32 byte CDBs, and filling in
the fields in those CDBs. We don't yet support that in the da(4) driver.
Type 1 and Type 3 protection don't require that, and can be handled by
the mps(4)/mpr(4) driver's code and firmware without any additional
input from the da(4) driver.
If a drive has Type 2 protection enabled (you frequently see this with
SAS drives shipped from Dell), don't set the various EEDP fields in the
mps(4)/mpr(4) driver command fields. Otherwise, you wind up with errors
like this that would otherwise make no sense:
In other words, what kind of strange SAS hard drive doesn't support a
standard 10 byte SCSI READ command? In this case, one that has Type 2
protection enabled.
We can revisit this when we put Type 2 protection support in the da(4)
driver, but for now this will help people who put Type 2 formatted drives
in a system and wonder what in the world is going on.
Andrew Turner [Fri, 23 Mar 2018 11:08:59 +0000 (11:08 +0000)]
If sc->sc_ep_max is already set use it to find the number of RX and TX
endpoints. The Allwinner driver will need to set this as the EPINFO
register isn't useful there.
Andriy Gapon [Fri, 23 Mar 2018 09:42:47 +0000 (09:42 +0000)]
zfs: fix mismatch between format specifier and type
vdev_dbgmsg_print_tree printed vdev_id of uint64_t type with %u format
specifier. That caused subsequent parameters to be incorrectly read
from the stack and lead to a crash when a wrong value was interpreted as
a string pointer.
Conrad Meyer [Fri, 23 Mar 2018 04:31:19 +0000 (04:31 +0000)]
Bring in JHB's cryptocheck tool
It can be used to validate basic algorithm correctness on a variety of inputs,
by comarison to openssl.
While here, add some sanity to the crypto/Makefile.
The tool may not be perfect, but getting it in tree where collaboration can
happen is a nice first step. The pace of development outside of svn seems
to have slowed down mid-2017.
Cy Schubert [Fri, 23 Mar 2018 03:22:30 +0000 (03:22 +0000)]
Fix build on i386 without INVARIANTS following r331369.
--- vm_reserv.o ---
In file included from /opt/src/svn-current/sys/vm/vm_reserv.c:48:
In file included from /opt/src/svn-current/sys/sys/counter.h:37:
./machine/counter.h:174:3: error: implicit declaration of function
'critical_enter' is invalid in C99 [-Werror,-Wimplicit-function-declarat
ion]
critical_enter();
Kyle Evans [Fri, 23 Mar 2018 02:45:09 +0000 (02:45 +0000)]
efidev: Drop a quick note in about efi_cfgtbl/efi_runtime
There's no real annotation for it, so it's not immediately obvious to the
unfamiliar that these pointers are to locations in the EFI runtime map
unlike the system table pointer immediately above them.
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Toomas Soome <tsoome@me.com>
Alexander Motin [Fri, 23 Mar 2018 02:15:05 +0000 (02:15 +0000)]
MFV r331400: 8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
Alexander Motin [Fri, 23 Mar 2018 00:20:42 +0000 (00:20 +0000)]
8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
Sean Bruno [Thu, 22 Mar 2018 23:34:48 +0000 (23:34 +0000)]
Refactor ip6_getpcbopt() for better locking and memory management
Created GET_PKTOPT_EXT_HDR() and GET_PKTOPT_SOCKADDR() macros to
handle safely fetching options from in6p_outputopts, including
properly dealing with in6p locking and preparing memory for
sooptcopyout().
Changed the function signature of ip6_getpcbopt() to allow the
function to acquire and release locks on in6p as needed.
Landon J. Fuller [Thu, 22 Mar 2018 22:13:46 +0000 (22:13 +0000)]
Add missing NULL checks when calling malloc(M_NOWAIT) in
bhnd_nv_strdup/bhnd_nv_strndup.
If malloc(9) failed during initial bhnd(4) attach, while allocating the root
NVRAM path string ("/"), the returned NULL pointer would be passed as the
destination to memcpy().
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Do not send signals to init directly from shutdown_nice(9), do it from
the task context.
shutdown_nice() is used from the fast interrupt handlers, mostly for
console drivers, where we cannot lock blockable locks. Schedule the
task in the fast queue to send the signal from the proper context.
Reviewed by: imp
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Fixes for ptrace(PT_GETXSTATE_INFO) related to the padding in struct
ptrace_xstate_info).
struct ptrace_xstate_info has 64bit member but ends up with 32bit
one. As result, on amd64 there is a 32bit padding at the end, but not
on i386.
We must clear the padding before doing the copyout. For compat32 case,
we must copyout the structure which does not have the padding at the
end. The later fixes 32bit gdb display of the YMM registers when
running on amd64 kernel.
Jeff Roberson [Thu, 22 Mar 2018 19:21:11 +0000 (19:21 +0000)]
Lock reservations with a dedicated lock in each reservation. Protect the
vmd_free_count with atomics.
This allows us to allocate and free from reservations without the free lock
except where a superpage is allocated from the physical layer, which is
roughly 1/512 of the operations on amd64.
Use the counter api to eliminate cache conention on counters.
Dimitry Andric [Thu, 22 Mar 2018 18:58:34 +0000 (18:58 +0000)]
Pull in r327101 from upstream llvm trunk (by Rafael Espindola):
Don't treat .symver as a regular alias definition.
This patch starts simplifying the handling of .symver.
For now it just moves the responsibility for creating an alias down to
the streamer. With that the asm streamer can pass a .symver unchanged,
which is nice since gas cannot parse "foo@bar = zed".
In a followup I hope to move the handling down to the writer so that
we don't need special hacks for avoiding breaking names with @@@ on
windows.
Pull in r327160 from upstream llvm trunk (by Rafael Espindola):
Delay creating an alias for @@@.
With this we only create an alias for @@@ once we know if it should
use @ or @@. This avoids last minutes renames and hacks to handle MS
names.
This only handles the ELF writer. LTO still has issues with @@@
aliases.
Pull in r327928 from upstream llvm trunk (by Vitaly Buka):
Object: Move attribute calculation into RecordStreamer. NFC
Pull in r327930 from upstream llvm trunk (by Vitaly Buka):
Object: Fix handling of @@@ in .symver directive
Summary:
name@@@nodename is going to be replaced with name@@nodename if symbols is
defined in the assembled file, or name@nodename if undefined.
https://sourceware.org/binutils/docs/as/Symver.html
Kyle Evans [Thu, 22 Mar 2018 18:24:00 +0000 (18:24 +0000)]
Re-work efidev ordering to fix efirt preloaded by loader on amd64
On amd64, efi_enter calls fpu_kern_enter(). This may not be called until
fpuinitstate has been invoked, resulting in a kernel panic with
efirt_load="YES" in loader.conf(5).
Move fpuinitstate a little earlier in SI_SUB_DRIVERS so that we can squeeze
efirt between it and efirtc at SI_SUB_DRIVERS, SI_ORDER_ANY. efidev must be
after efirt and doesn't really need to be at SI_SUB_DEVFS, so drop it at
SI_SUB_DRIVER, SI_ORDER_ANY.
The not immediately obvious dependency of fpuinitstate by efirt has been
noted in both places.
Discussed with: kib, andrew
Reported by: Jakob Alvermark <jakob@alvermark.net>
X-MFC-With: r330868
Andrew Turner [Thu, 22 Mar 2018 15:32:57 +0000 (15:32 +0000)]
Enter into the EFI environment before dereferencing the runtime services
pointer. This may be within the EFI address space and not the FreeBSD
kernel address space.
Warner Losh [Thu, 22 Mar 2018 15:11:53 +0000 (15:11 +0000)]
Revert r331298
Normally, shutdown_nice() just signals init. However, sometimes it
calls kern_reboot directly. For that case, r331298 dropped the Giant
lock before calling it. This turns out to be incorrect for the more
common case where init exists and we just signal it. Restore the old
behavior. The direct call to kern_reboot() doesn't sync buffers to the
disk, so should work with Giant held, so we don't need to drop locks
here for that.
When disabling the MSIX IRQ vectors for a PCI device through the
LinuxKPI, make sure any old MSIX IRQ numbers are no longer visible to
the linux_pci_find_irq_dev() function else IRQs can be requested from
the wrong PCI device.
Kyle Evans [Thu, 22 Mar 2018 11:57:59 +0000 (11:57 +0000)]
Partially revert r328780
efi.4th was added to ObsoleteFiles and disconnected from the build, but not
removed from hte repo. We've since found a mild use for it that makes some
amount of sense, so partially revert r328780 and bring it back to life.
Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.
The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.
It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.
You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.
This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.
There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.
Gleb Smirnoff [Thu, 22 Mar 2018 05:07:57 +0000 (05:07 +0000)]
Fix LINT-NOINET build initializing local to false. This is
a dead code, since for NOINET build isipv6 is always true,
but this dead code makes it compilable.
Kyle Evans [Thu, 22 Mar 2018 04:16:14 +0000 (04:16 +0000)]
forthloader: Don't break BIOS boots...
I thought I tested this scenario, but clearly I failed to. =(
BIOS boots won't have efi-autoresizecons, so trying to use it as a forth
word fails during include. Use evaluate on "efi-autoresizecons" as a string
instead to move any potential errors to runtime- safely after we've already
checked that we're booting UEFI.
Ed Maste [Thu, 22 Mar 2018 01:00:55 +0000 (01:00 +0000)]
Correct signedness bug in drm_modeset_ctl
drm_modeset_ctl() takes a signed in from userland, does a boundscheck,
and then uses it to index into a structure and write to it. The
boundscheck only checks upper bound, and never checks for nagative
values. If the int coming from userland is negative [after conversion]
it will bypass the boundscheck, perform a negative index into an array
and write to it, causing memory corruption.
Note that this is in the "old" drm driver; this issue does not exist
in drm2.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: cem
MFC after: 1 day
Sponsored by: The FreeBSD Foundation
Conrad Meyer [Wed, 21 Mar 2018 23:52:37 +0000 (23:52 +0000)]
getentropy(3): Fallback to kern.arandom sysctl on older kernels
On older kernels, when userspace program disables SIGSYS, catch ENOSYS and
emulate getrandom(2) syscall with the kern.arandom sysctl (via existing
arc4_sysctl wrapper).
Special care is taken to faithfully emulate EFAULT on NULL pointers, because
sysctl(3) as used by kern.arandom ignores NULL oldp. (This was caught by
getentropy(3) ATF tests.)
Ed Maste [Wed, 21 Mar 2018 23:51:14 +0000 (23:51 +0000)]
Fix kernel memory disclosure in drm_infobufs
drm_infobufs() has a structure on the stack, fills it out and copies it
to userland. There are 2 elements in the struct that are not filled out
and left uninitialized. This will leak uninitialized kernel stack data
to userland.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 1 day
Security: Kernel memory disclosure (798)
Ed Maste [Wed, 21 Mar 2018 23:26:42 +0000 (23:26 +0000)]
Fix kernel memory disclosure in ibcs2_getdents
ibcs2_getdents() copies a dirent structure to userland. The ibcs2
dirent structure contains a 2 byte pad element. This element is never
initialized, but copied to userland none-the-less.
Note that ibcs2 has not built on HEAD since r302095.
Submitted by: Domagoj Stolfa <ds815@cam.ac.uk>
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
MFC after: 3 days
Security: Kernel memory disclosure (803)