mckusick [Sat, 15 Dec 2018 18:49:30 +0000 (18:49 +0000)]
Ensure that the inode check-hash is not left zeroed out in the case where
the check-hash fails. Prior to the fix in -r342133 the inode with the
zeroed out check-hash was written back to disk causing further confusion.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 18:35:46 +0000 (18:35 +0000)]
Reorder ffs_verify_dinode_ckhash() so that it checks the inode check-hash
before copying in the inode so that the mode and link-count are not set
if the check-hash fails. This change ensures that the vnode will be properly
unwound and recycled rather than being held in the cache.
Initialize the file mode to zero so that if the loading of the inode
fails (for example because of a check-hash failure), the vnode will be
properly unwound and recycled.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 17:58:42 +0000 (17:58 +0000)]
Must set ip->i_effnlink = ip->i_nlink to avoid a soft updates
"panic: softdep_update_inodeblock: bad link count" when releasing
a partially initialized vnode after an inode check-hash failure.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Reported by: Peter Holm (pho)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 17:32:47 +0000 (17:32 +0000)]
Fsck would find, report, and offer to fix inode check-hash failures.
If requested to fix the inode check-hash it would confirm having done
it, but then fail to make the fix. The same code is used in fsdb which,
unlike fsck, would actually fix the inode check-hash.
The discrepancy occurred because fsck has two ways to fetch inodes.
The inode by number function ginode() and the streaming inode
function getnextinode() used during pass1. Fsdb uses the ginode()
function which correctly does the fix, while fsck first encounters
the bad inode check-hash in pass1 where it is using the getnextinode()
function that failed to make the correction. This patch corrects
the getnextinode() function so that fsck now correctly fixes inodes
with incorrect inode check-hashes.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Sponsored by: Netflix
brooks [Sat, 15 Dec 2018 15:06:22 +0000 (15:06 +0000)]
Fix bugs in pluggable CC algorithm and siftr sysctls.
Use the sysctl_handle_int() handler to write out the old value and read
the new value into a temporary variable. Use the temporary variable
for any checks of values rather than using the CAST_PTR_INT() macro on
req->newptr. The prior usage read directly from userspace memory if the
sysctl() was called correctly. This is unsafe and doesn't work at all on
some architectures (at least i386.)
In some cases, the code could also be tricked into reading from kernel
memory and leaking limited information about the contents or crashing
the system. This was true for CDG, newreno, and siftr on all platforms
and true for i386 in all cases. The impact of this bug is largest in
VIMAGE jails which have been configured to allow writing to these
sysctls.
Per discussion with the security officer, we will not be issuing an
advisory for this issue as root access and a non-default config are
required to be impacted.
Reviewed by: markj, bz
Discussed with: gordon (security officer)
MFC after: 3 days
Security: kernel information leak, local DoS (both require root)
Differential Revision: https://reviews.freebsd.org/D18443
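The pattern described in the sysctl fix above (fetch the new value into a temporary, validate the temporary, then commit) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct mock_req`, `mock_handle_int()` and `set_percent()` are hypothetical names, and the mock merely stands in for the kernel's sysctl_handle_int(); it is not the committed code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysctl request; NOT the kernel's struct sysctl_req. */
struct mock_req {
    int oldval;          /* where the handler reports the old value */
    const int *newval;   /* NULL when the caller only reads */
};

/* Hypothetical analogue of sysctl_handle_int(): report old, fetch new into *valp. */
static int
mock_handle_int(int *valp, struct mock_req *req)
{
    req->oldval = *valp;
    if (req->newval != NULL)
        *valp = *req->newval;   /* the kernel would copy in from userspace here */
    return (0);
}

/* Validate the temporary copy, and only then commit it to the live setting. */
int
set_percent(int *setting, struct mock_req *req)
{
    int error, tmp;

    assert(setting != NULL && req != NULL);
    tmp = *setting;
    error = mock_handle_int(&tmp, req);
    if (error != 0 || req->newval == NULL)
        return (error);
    if (tmp < 1 || tmp > 100)   /* illustrative range check on the temporary */
        return (EINVAL);
    *setting = tmp;
    return (0);
}
```

The point of the temporary is that every check operates on a value already copied into the handler, never on the request's raw pointer.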
dim [Sat, 15 Dec 2018 14:08:41 +0000 (14:08 +0000)]
Update clang, llvm, lld, lldb, compiler-rt and libc++ version number to
7.0.1 release r349250. There were no functional changes since the 7.0.1
rc3 import.
mmel [Sat, 15 Dec 2018 10:38:07 +0000 (10:38 +0000)]
Improve R_AARCH64_TLSDESC relocation.
The original code did not support dynamically loaded libraries and used
suboptimal access to TLS variables.
The new implementation removes lazy resolving of TLS relocations: due to a flaw
in the TLSDESC design, it is impossible to switch the resolver function at runtime
without expensive locking.
Due to this, 3 specialized resolvers are implemented:
- load time resolver for TLS relocation from libraries loaded with main
executable (thus with known TLS offset).
- resolver for undefined thread weak symbols.
- slower lazy resolver for dynamically loaded libraries with fast path for
already resolved symbols.
cem [Sat, 15 Dec 2018 05:46:04 +0000 (05:46 +0000)]
efirt: When present, attempt to use EFI runtime services to shut down
PR: maybe related to 233998 (inconclusive at this time)
Submitted by: byuu <byuu AT tutanota.com> (previous version)
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D18506
arichardson [Fri, 14 Dec 2018 21:16:04 +0000 (21:16 +0000)]
Allow bootstrapping libnv on macOS and Linux
MacOS/Linux do not define struct cmsgcred but we need to bootstrap libnv
when building on non-FreeBSD systems. Since they are not used during
bootstrap we can just omit these two functions there.
markj [Fri, 14 Dec 2018 21:07:12 +0000 (21:07 +0000)]
Add some more checking to the RISC-V page fault handler.
- Panic immediately if witness says we're holding non-sleepable locks.
This helps ensure that we don't recurse on the pmap lock in
pmap_fault_fixup().
- Panic if the kernel faults on a user address without setting an
onfault handler.
- Panic if the fault occurred in a critical section or interrupt
handler, like we do on other platforms.
- Fix some style issues in trap_pfault().
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18561
markj [Fri, 14 Dec 2018 21:04:30 +0000 (21:04 +0000)]
Avoid needless TLB invalidations in pmap_remove_pages().
pmap_remove_pages() is called during process termination, when it is
guaranteed that no other CPU may access the mappings being torn down.
In particular, it is unnecessary to invalidate each mapping individually
since we do a pmap_invalidate_all() at the end of the function.
Also don't call pmap_invalidate_all() while holding a PV list lock; the
global pvh lock is sufficient.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18562
markj [Fri, 14 Dec 2018 18:50:32 +0000 (18:50 +0000)]
Clean up the riscv pmap_bootstrap() implementation.
- Build up phys_avail[] in a single loop, excluding memory used by
the loaded kernel.
- Fix an array indexing bug in the aforementioned phys_avail[]
initialization.[1]
- Remove some unneeded code copied from the arm64 implementation.
PR: 231515 [1]
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18464
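As a rough illustration of building phys_avail[] in a single loop while excluding the loaded kernel (see the pmap_bootstrap() cleanup above), the sketch below splits each memory region around a kernel range. The `struct mem_region` type, `add_range()` and `build_phys_avail()` are hypothetical names, and the start/end pair layout terminated by two zeros is an assumption about the array format; the riscv code itself may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one physical memory region. */
struct mem_region {
    uint64_t start;
    uint64_t size;
};

/* Append [start, end) as one pair if it is non-empty and there is room. */
static size_t
add_range(uint64_t *pa, size_t n, size_t cap, uint64_t start, uint64_t end)
{
    if (start < end && n + 2 <= cap) {
        pa[n++] = start;
        pa[n++] = end;
    }
    return (n);
}

/*
 * Fill pa[] with start/end pairs covering every region except the part
 * occupied by [kern_start, kern_end).  Returns the number of entries
 * written before the terminating pair of zeros.  Ranges that do not fit
 * in pa[] are silently dropped in this sketch.
 */
size_t
build_phys_avail(const struct mem_region *regions, size_t nregions,
    uint64_t kern_start, uint64_t kern_end, uint64_t *pa, size_t max_entries)
{
    size_t cap, n = 0;

    assert(max_entries >= 2 && kern_start <= kern_end);
    cap = max_entries - 2;      /* reserve room for the terminator */
    for (size_t i = 0; i < nregions; i++) {
        uint64_t start = regions[i].start;
        uint64_t end = start + regions[i].size;

        if (kern_end <= start || kern_start >= end) {
            /* No overlap with the kernel: keep the whole region. */
            n = add_range(pa, n, cap, start, end);
            continue;
        }
        /* Overlap: keep the parts below and above the kernel, if any. */
        n = add_range(pa, n, cap, start, kern_start);
        n = add_range(pa, n, cap, kern_end, end);
    }
    pa[n] = 0;
    pa[n + 1] = 0;
    return (n);
}
```

Keeping a single running index `n` for the output array is what makes the indexing easy to audit.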
mw [Fri, 14 Dec 2018 16:14:36 +0000 (16:14 +0000)]
Introduce driver for TPM 2.0 in CRB and FIFO (TIS) modes
It was written based on:
TCG PC Client Platform TPM Profile (PTP) Specification Version 22, Revision 1.03.
It only supports Locality 0. Interrupts are only supported in FIFO mode.
The driver in FIFO mode was tested on x86 with Infineon SLB9665 discrete TPM chip.
Driver in both modes was also tested on qemu with swtpm running on host.
manu [Fri, 14 Dec 2018 10:26:17 +0000 (10:26 +0000)]
arm64: allwinner: axp81x: Fix double inversion for FLDO1
This fixes booting on A64 boards when disabling the unused regulators at boot.
We disabled all the regulators handled by register 0x13, which of course contains
mandatory regulators for the board to be up.
Reported by: Mark Millard <marklmi@yahoo.com>
X-MFC-With: r340848
kadesai [Fri, 14 Dec 2018 08:04:16 +0000 (08:04 +0000)]
Add support for driver-side creation of NVMe PRPs for fastpath-capable IOs.
The NVMe specification defines a specific type of scatter-gather list called
a PRP (Physical Region Page) for IO data buffers. Since the NVMe drive is
connected behind a SAS3.5 tri-mode adapter, the MegaRAID driver/firmware has to
convert OS SGLs into the native NVMe PRP format. For IOs sent to firmware, the
MegaRAID firmware performs this OS SGL to PRP translation and sends the PRPs to
the backend NVMe device. For fastpath IOs, the driver performs the translation.
kadesai [Fri, 14 Dec 2018 08:02:44 +0000 (08:02 +0000)]
To improve RAID 1/10 Write performance, OS drivers need to issue the
required Write IOs as Fast Path IOs (after the appropriate checks
allowing Fast Path to be used) to the appropriate physical drives
(translated from the OS logical IO) and wait for all Write IOs to complete.
Design: A write IO on RAID volume will be examined if it can be sent in
Fast Path based on IO size and starting LBA and ending LBA falling on to
a Physical Drive boundary. If the underlying RAID volume is a RAID 1/10,
driver issues two fast path write IOs one for each corresponding physical
drive after computing the corresponding start LBA for each physical drive.
Both write IOs will have the same payload and are posted to HW such that
replies land in the same reply queue.
If there are no resources available for sending two IOs, driver will send
the original IO from upper layer to RAID volume through the Firmware.
When both IOs are completed by HW, the resources will be released
and SCSI IO completion handler will be called.
kadesai [Fri, 14 Dec 2018 08:01:49 +0000 (08:01 +0000)]
Detect sequential Write IOs and pass a hint that they are part of a sequential
stream to help the HBA firmware do Full Stripe Writes. For read IOs on
certain RAID volumes, such as Read Ahead volumes, this helps the driver
send them to firmware even if the IOs could potentially be sent to
hardware directly (called fast path), bypassing firmware.
Design: 8 streams are maintained per RAID volume, as per the combined
firmware/driver design. When no stream is detected, the LRU stream
is used for the next potential stream and the LRU/MRU map is updated to make
it the MRU stream. Every time a stream is detected, the MRU map
is updated to make the current stream the MRU stream.
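A minimal sketch of the stream-detection design described above, assuming a stream is "detected" when an IO starts exactly where a tracked stream is expected to continue. The structure layout, the `order[]` array used in place of the driver's LRU/MRU map, and all function names are hypothetical; the driver's actual encoding may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSTREAMS 8      /* streams tracked per RAID volume, per the design */

struct stream_tracker {
    uint64_t next_lba[NSTREAMS];   /* LBA at which each stream would continue */
    bool     valid[NSTREAMS];      /* false until the slot has seen an IO */
    uint8_t  order[NSTREAMS];      /* order[0] is MRU, order[NSTREAMS-1] is LRU */
};

void
tracker_init(struct stream_tracker *t)
{
    assert(t != NULL);
    for (int i = 0; i < NSTREAMS; i++) {
        t->next_lba[i] = 0;
        t->valid[i] = false;
        t->order[i] = (uint8_t)i;
    }
}

/* Move the stream at position pos in order[] to the MRU position. */
static void
make_mru(struct stream_tracker *t, size_t pos)
{
    uint8_t id = t->order[pos];

    memmove(&t->order[1], &t->order[0], pos);
    t->order[0] = id;
}

/* Returns true if the IO continues a known stream (i.e. it is sequential). */
bool
tracker_update(struct stream_tracker *t, uint64_t lba, uint32_t nblocks)
{
    uint8_t id;

    for (size_t pos = 0; pos < NSTREAMS; pos++) {
        id = t->order[pos];
        if (t->valid[id] && t->next_lba[id] == lba) {
            t->next_lba[id] = lba + nblocks;
            make_mru(t, pos);
            return (true);
        }
    }
    /* No stream detected: recycle the LRU slot for a potential new stream. */
    id = t->order[NSTREAMS - 1];
    t->valid[id] = true;
    t->next_lba[id] = lba + nblocks;
    make_mru(t, NSTREAMS - 1);
    return (false);
}
```

A caller would pass the returned flag along as the sequential-stream hint; interleaved streams are tracked independently because each keeps its own expected LBA.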
jkim [Fri, 14 Dec 2018 01:06:34 +0000 (01:06 +0000)]
Do not complain when /dev/crypto does not exist.
Since the new devcrypto engine was enabled in r342009, many users have started
seeing "Could not open /dev/crypto: No such file or directory". Disable
the annoying error message as it is not very useful anyway.
bcran [Thu, 13 Dec 2018 23:20:58 +0000 (23:20 +0000)]
Print an error message in efi_main.c if we can't allocate memory for the heap
With the default Qemu parameters, only 128MB RAM gets given to a VM. This causes
the loader to be unable to allocate the 64MB it needs for the heap. This change
makes the cause of the error more obvious.
chuck [Thu, 13 Dec 2018 13:25:37 +0000 (13:25 +0000)]
nda(4) fix check for Dataset Management support
In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMware Workstation does not work reliably.
The fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.
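To illustrate the kind of check the nda(4) fix above restores, the sketch below decodes a little-endian 16-bit ONCS value byte by byte, so it behaves the same on big-endian hosts, and tests the Dataset Management bit. The function and macro names are hypothetical, and the bit position (bit 2) is taken from the NVMe specification as an assumption rather than from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the NVMe spec: ONCS bit 2 advertises Dataset Management. */
#define ONCS_DSM_SHIFT 2

/* Decode a little-endian 16-bit field independent of host byte order. */
static uint16_t
le16_decode(const uint8_t b[2])
{
    return ((uint16_t)(b[0] | ((uint16_t)b[1] << 8)));
}

/* True if the raw (on-the-wire) ONCS bytes advertise Dataset Management. */
bool
oncs_supports_dsm(const uint8_t oncs_le[2])
{
    assert(oncs_le != NULL);
    return (((le16_decode(oncs_le) >> ONCS_DSM_SHIFT) & 1) != 0);
}
```

Only when this returns true would a driver advertise delete (BIO_DELETE) support for the disk.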
jhibbits [Thu, 13 Dec 2018 05:07:39 +0000 (05:07 +0000)]
powerpc/booke: Change KERNBASE to be physical load address
Previous commits have made VM_MIN_KERNEL_ADDRESS its own separate entity,
and rebased the kernel around that address instead of KERNBASE. This commit
pulls the trigger to rebase KERNBASE to a physical load address. The
eventual goal is to align the address with the AIM KERNBASE, but at this
time that's not an option.
Currently a Book-E kernel must be loaded on a 64MB boundary, due to size
issues. The common load address is at the 64MB mark (0x04000000), so simply
make that the default KERNBASE.
As of this commit, Book-E kernels can be loaded and booted with ubldr.
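The 64MB load-boundary requirement mentioned above amounts to a power-of-two alignment test. A small sketch, with a hypothetical macro and function name (the value 0x04000000 is the 64MB mark quoted in the commit message):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64MB, the boundary (and common load address) named in the commit message. */
#define BOOKE_LOAD_ALIGN UINT32_C(0x04000000)

static_assert((BOOKE_LOAD_ALIGN & (BOOKE_LOAD_ALIGN - 1)) == 0,
    "alignment must be a power of two");

/* True if a physical load address sits on a 64MB boundary. */
bool
booke_load_addr_ok(uint32_t pa)
{
    return ((pa & (BOOKE_LOAD_ALIGN - 1)) == 0);
}
```

For example, 0x04000000 and 0x08000000 pass the check, while 0x04100000 does not.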
jhibbits [Thu, 13 Dec 2018 04:48:28 +0000 (04:48 +0000)]
powerpcspe: Fix GPR handling in SPE exception handler
Optimize the exception handler to only save and load the upper word of the
GPRs used in the emulating instruction. This reduces the save/load
overhead, and as a side effect does not overwrite the upper word of any
temporary register.
With this commit I am now able to run editors/abiword and math/gnumeric on a
e500-based system.
mmacy [Thu, 13 Dec 2018 04:40:53 +0000 (04:40 +0000)]
Generalize AES iov optimization
Right now, aesni_cipher_alloc does a bit of special-casing
for CRYPTO_F_IOV, to not do any allocation if the first uio
is large enough for the requested size. While working on ZFS
crypto port, I ran into horrible performance because the code
uses scatter-gather, and many of the times the data to encrypt
was in the second entry. This code looks through the list, and
tries to see if there is a single uio that can contain the
requested data, and, if so, uses that.
This has a slight impact on the current consumers, in that the
check is a little more complicated for the ones that use
CRYPTO_F_IOV -- but none of them meet the criteria for testing
more than one.
Submitted by: sef at ixsystems.com
Reviewed by: cem@
MFC after: 3 days
Sponsored by: iX Systems
Differential Revision: https://reviews.freebsd.org/D18522
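The list walk described in the AES iov change above can be sketched in userspace with POSIX `struct iovec`. This is an illustration, not the aesni code: `find_contiguous()` is a hypothetical name, and `skip`/`len` stand in for the request's offset and size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Walk the iovec list and return a pointer to the requested bytes if the
 * range [skip, skip + len) lies entirely inside one entry.  Return NULL
 * if the range straddles entries or runs past the end, in which case the
 * caller would fall back to allocating and copying.
 */
void *
find_contiguous(const struct iovec *iov, int iovcnt, size_t skip, size_t len)
{
    assert(iov != NULL || iovcnt == 0);
    for (int i = 0; i < iovcnt; i++) {
        if (skip < iov[i].iov_len) {
            /* The range starts in this entry; does it also end here? */
            if (len <= iov[i].iov_len - skip)
                return ((char *)iov[i].iov_base + skip);
            return (NULL);
        }
        skip -= iov[i].iov_len;
    }
    return (NULL);
}
```

With a 4-byte first entry and a 16-byte second entry, a request for 8 bytes at offset 4 is served directly from the second entry without any allocation.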
imp [Thu, 13 Dec 2018 00:42:26 +0000 (00:42 +0000)]
Correctly implement atomic_swap_long for mips64.
MIPS64 has 64-bit longs, so use uint64_t for it, otherwise uint32_t.
sizeof(long) == sizeof(ptr) for all platforms, so define
atomic_swap_ptr in terms of atomic_swap_long.
manu [Wed, 12 Dec 2018 22:08:43 +0000 (22:08 +0000)]
arm64: Add mv_cp110_icu and mv_cp110_gicp
icu is an interrupt concentrator in the CP110 block and gicp
is a GIC extension that allows interrupts in the CP block to be turned
into GIC SPI interrupts.
manu [Wed, 12 Dec 2018 22:01:06 +0000 (22:01 +0000)]
arm64: marvell: Add driver for Marvell Ap806 System Controller
The first two clocks are for the clusters and their frequencies can be
found by reading a register. Then a fixed 1200 MHz clock is present, along with
two fixed clocks: 'mss', which is 1200 / 6, and 'sdio', which is 1200 / 3.
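For reference, the fixed-clock arithmetic above works out to 200 MHz for 'mss' and 400 MHz for 'sdio'. A tiny sketch, using a hypothetical helper name and plain MHz units:

```c
#include <assert.h>
#include <stdint.h>

#define AP806_FIXED_MHZ 1200u   /* the fixed parent clock from the message */

/* Frequency of a fixed-divider child clock, in MHz. */
uint32_t
fixed_div_mhz(uint32_t parent_mhz, uint32_t divider)
{
    assert(divider != 0);
    return (parent_mhz / divider);
}
```

Here `fixed_div_mhz(AP806_FIXED_MHZ, 6)` gives the 'mss' rate and `fixed_div_mhz(AP806_FIXED_MHZ, 3)` gives the 'sdio' rate.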
jkim [Wed, 12 Dec 2018 21:56:47 +0000 (21:56 +0000)]
Enable devcryptoeng for OpenSSL.
Since OpenSSL 1.1.1, the good old BSD-specific cryptodev engine has been
deprecated in favor of this new engine. However, this engine is not
thoroughly tested on FreeBSD because it was originally written for Linux.
http://cryptodev-linux.org/
Also, the author actually meant to enable it by default on BSD platforms but
he failed to do so because there was a bug in the Configure script.
https://github.com/openssl/openssl/pull/7882
Now they found that it was a more generic issue.
https://github.com/openssl/openssl/pull/7885
Therefore, we need to enable this engine on head to give it more exposure.
kp [Wed, 12 Dec 2018 20:19:18 +0000 (20:19 +0000)]
pf tests: NAT exhaustion test
It's been reported that pf doesn't handle running out of available ports
for NAT correctly. It freezes until a state expires and it can find a
free port.
Test for this, by setting up a situation where only two ports are
available for NAT and then attempting to create three connections.
If successful the third connection will fail immediately. In an
incorrect case the connection attempt will freeze, also freezing all
interaction with pf through pfctl and trigger timeout.
kp [Wed, 12 Dec 2018 20:15:06 +0000 (20:15 +0000)]
pf: Fix endless loop on NAT exhaustion with sticky-address
When we try to find a source port in pf_get_sport() it's possible that
all available source ports will be in use. In that case we call
pf_map_addr() to try to find a new source IP to try from. If there are
no more available source IPs pf_map_addr() will return 1 and we stop
trying.
However, if sticky-address is set we'll always return the same IP
address, even if we've already tried that one.
We need to check the supplied address, because if that's the one we'd
set it means pf_get_sport() has already tried it, and we should error
out rather than keep trying.
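The termination argument in the sticky-address fix above can be modelled with a small userspace simulation. Everything here is hypothetical (the `struct nat_pool` type, the two-port pool, the use of address 0 to mean "nothing tried yet"); it only mirrors the control flow the message describes and is not pf's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPORTS 2    /* two ports, as in the regression test scenario */

struct nat_pool {
    uint32_t sticky_addr;          /* the one address sticky-address yields */
    bool     port_in_use[NPORTS];
};

/*
 * Model of the address lookup: returns 0 and sets *naddr, or 1 when the
 * only candidate is the address the caller has already tried.
 */
static int
map_addr(const struct nat_pool *p, uint32_t tried, uint32_t *naddr)
{
    if (p->sticky_addr == tried)
        return (1);     /* the added check: do not hand it out again */
    *naddr = p->sticky_addr;
    return (0);
}

/* Returns 0 with *addr/*port set, or 1 when the pool is exhausted. */
int
get_sport(struct nat_pool *p, uint32_t *addr, int *port)
{
    uint32_t cur = 0;   /* 0 means "nothing tried yet" in this sketch */

    assert(p != NULL && p->sticky_addr != 0);
    if (map_addr(p, cur, &cur) != 0)
        return (1);
    for (;;) {
        for (int i = 0; i < NPORTS; i++) {
            if (!p->port_in_use[i]) {
                p->port_in_use[i] = true;
                *addr = cur;
                *port = i;
                return (0);
            }
        }
        /* All ports busy on this address: ask for a different one. */
        if (map_addr(p, cur, &cur) != 0)
            return (1);
    }
}
```

Without the comparison in `map_addr()`, the outer loop would receive the same address forever; with it, the third connection attempt fails immediately.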
cem [Wed, 12 Dec 2018 18:13:56 +0000 (18:13 +0000)]
gmirror: Remove a last-minute INVARIANTS breakage in r341840
I mistakenly added a lock assertion to this routine at the last minute
without confirming it was held during g_mirror_create. It isn't (it isn't
even initialized yet). Mea culpa. Access is exclusive in both callers,
just not always by that particular lock.
markj [Wed, 12 Dec 2018 15:49:14 +0000 (15:49 +0000)]
Fix a possible mbuf double free in bwn_dma_tx_start().
If bus_dmamap_load_mbuf() fails following a defrag, the caller of
bwn_dma_tx_start() would free the original mbuf after m_defrag() had
already done so. Fix this by returning the defragged mbuf to the
caller instead. Update bwn_pio_tx_start() similarly for consistency.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: landonf
Tested by: landonf
MFC after: 3 days
admbug: 820
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18342
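The ownership rule behind the double-free fix above can be shown with a userspace sketch: the transmit routine takes a pointer to the caller's buffer pointer and updates it after a successful defrag, so the caller always frees the buffer that is actually live. The `struct buf` type and all function names are hypothetical stand-ins for mbufs, m_defrag() and the driver routines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int live_bufs;   /* number of buffers currently allocated */

struct buf {
    size_t len;
};

struct buf *
buf_alloc(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (b != NULL) {
        b->len = len;
        live_bufs++;
    }
    return (b);
}

void
buf_free(struct buf *b)
{
    if (b != NULL) {
        live_bufs--;
        free(b);
    }
}

/* Stand-in for a defrag: on success the old buffer is freed and replaced. */
static struct buf *
buf_defrag(struct buf *old)
{
    struct buf *n = buf_alloc(old->len);

    if (n == NULL)
        return (NULL);  /* old buffer is untouched on failure */
    buf_free(old);
    return (n);
}

/*
 * On any error the caller frees *bp.  Because *bp is updated as soon as
 * the defrag succeeds, the caller never frees the already-freed original.
 */
int
tx_start(struct buf **bp, bool load_fails)
{
    struct buf *n;

    assert(bp != NULL && *bp != NULL);
    n = buf_defrag(*bp);
    if (n == NULL)
        return (ENOMEM);
    *bp = n;
    if (load_fails)     /* models the DMA load failing after the defrag */
        return (EIO);
    return (0);
}
```

After a failed `tx_start()`, a single `buf_free(*bp)` by the caller leaves `live_bufs` at zero, which is the invariant the original code violated.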
dab [Wed, 12 Dec 2018 13:43:55 +0000 (13:43 +0000)]
asmc: Add Support for Macbook Pro 8,1
PR: 217505
Submitted by: John O. Brickley <obryan.brickley@gmail.com>, updated by Maciej Pasternacki <maciej@pasternacki.net>
Reported by: John O. Brickley <obryan.brickley@gmail.com>
MFC after: 1 week
cem [Wed, 12 Dec 2018 05:48:27 +0000 (05:48 +0000)]
gmirror: Fix a bug introduced in r341674
r341674 inadvertently introduced a bug where newer mirror components being
tasted would clear the high sc_flags that are not controlled by component
metadata, such as G_MIRROR_DEVICE_FLAG_TASTING. This could plausibly expose
a small window of time during STARTING where device destruction might race
with mirror component addition, probably resulting in a crash.
yuripv [Wed, 12 Dec 2018 04:23:00 +0000 (04:23 +0000)]
regcomp: reduce size of bitmap for multibyte locales
This fixes the obscure endless loop seen with case-insensitive
patterns containing characters in the 128-255 range, originally
found while running the GNU grep test suite.
Our regex implementation, being kludgy, translates the characters
in a case-insensitive pattern to a bracket expression containing both
cases for the character and doesn't correctly handle the case when the
original character is in the bitmap and the other case is not, falling
into an endless loop going through p_bracket(), ordinary(),
and bothcases().
Reducing the bitmap to 0-127 range for multibyte locales solves this
as none of these characters have other case mapping outside of bitmap.
We are also safe in the case when the original character outside of
bitmap has other case mapping in the bitmap (there are several of those
in our current ctype maps having unidirectional mapping into bitmap).