tsoome [Mon, 17 Dec 2018 07:43:29 +0000 (07:43 +0000)]
loader: zfs reader should not probe partitionless disks (UEFI case)
With r342151 I fixed the BIOS version of zfs_probe_dev() so that it no longer
accesses the whole disk, but the fix was incomplete: we did not actually check
whether the device name really referred to the whole disk. Since the UEFI
version only calls zfs_probe_dev() with partitions, never with the whole disk,
the UEFI loader was unable to find the ZFS pools.
This update corrects the issue by calling archsw.arch_getdev() to translate
the device name back to a dev_desc; the device is the whole disk when both
the partition and slice values are -1.
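A minimal C sketch of the whole-disk test described above. The structure is a simplified, hypothetical stand-in for the loader's device descriptor (only the slice and partition indices are kept); it is not the actual loader code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified device descriptor: only the two indices that
 * decide "whole disk or not" are kept. -1 means the index is not set.
 */
struct disk_desc_sketch {
	int d_slice;       /* MBR slice / GPT partition index, or -1 */
	int d_partition;   /* BSD label partition index, or -1 */
};

/* The name refers to the whole disk only when neither index is set. */
static bool
is_whole_disk(const struct disk_desc_sketch *dev)
{
	return (dev->d_slice == -1 && dev->d_partition == -1);
}
```

A probe routine following this logic would skip any device for which is_whole_disk() returns true.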
tsoome [Sun, 16 Dec 2018 08:58:14 +0000 (08:58 +0000)]
loader: zfs reader should not probe partitionless disks
First, normal setups cannot boot from such pools, as the tools do not
support installing boot programs on partitionless disks.
Second, to detect the pool configuration properly, we need to check all four
label copies on the disk, two at the front and two at the end, but the ZFS
label does not record the size of the disk. So we depend either on the
firmware to report the correct disk size or on information from the
partition table.
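For reference, a sketch of where the four label copies live, assuming the standard ZFS on-disk layout of four 256 KiB labels with the device size rounded down to a multiple of the label size. It shows why the two end copies can only be located if the device size is known. The function name label_offset() is hypothetical, and the rounding is folded into it for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define VDEV_LABELS      4
#define VDEV_LABEL_SIZE  (256ULL * 1024)   /* each label copy is 256 KiB */

/*
 * Byte offset of label copy l (0..3) on a device of psize bytes.
 * Labels 0 and 1 sit at the front; labels 2 and 3 sit at the end,
 * so their position depends entirely on psize being correct.
 */
static uint64_t
label_offset(uint64_t psize, int l)
{
	/* Only whole label-sized units of the device are used. */
	psize -= psize % VDEV_LABEL_SIZE;
	return ((uint64_t)l * VDEV_LABEL_SIZE +
	    (l < VDEV_LABELS / 2 ? 0 : psize - VDEV_LABELS * VDEV_LABEL_SIZE));
}
```

On a 1 GiB device, labels 2 and 3 start 512 KiB and 256 KiB before the end; if the size is wrong, those reads land somewhere else entirely.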
Without a partition table, we can only rely on the firmware to report the disk
size and to support disk IO properly.
There is a specific case: 8TB disks are reported by the BIOS to have
4294967295 sectors (0x00000000ffffffff), while the OS reports 15628053168
sectors (0x00000003a3812ab0). The BIOS-reported size is thus less than the
actual size and sits at the 32-bit maximum. Unfortunately, the real limit must
be even lower, because probing this disk on this system ends up hanging the
system.
UEFI boot on this system does not seem to be affected.
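Working the numbers from the 8TB case, assuming 512-byte sectors: the BIOS figure corresponds to roughly 2 TiB, while the OS figure is roughly 8.0 TB, so end-of-disk label offsets computed from the BIOS size would point far short of the real end of the disk. The looks_clamped() helper is an illustrative heuristic only; the loader is not stated to use it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL   /* assumed logical sector size */

/* Sector counts quoted in the commit message for the 8TB disk. */
static const uint64_t bios_sectors = 0x00000000ffffffffULL; /* 4294967295 */
static const uint64_t os_sectors   = 0x00000003a3812ab0ULL; /* 15628053168 */

/* Disk size in bytes for a given sector count. */
static uint64_t
sectors_to_bytes(uint64_t sectors)
{
	return (sectors * SECTOR_SIZE);
}

/*
 * Illustrative heuristic: a count of exactly UINT32_MAX suggests the
 * firmware clamped the value to 32 bits instead of reporting it.
 */
static bool
looks_clamped(uint64_t sectors)
{
	return (sectors == UINT32_MAX);
}
```

With these figures, labels 2 and 3 would be searched for just below the 2199023255040-byte mark, about 5.8 TB short of the real end of the disk at 8001563222016 bytes.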
mckusick [Sat, 15 Dec 2018 19:04:50 +0000 (19:04 +0000)]
Under UFS/FFS, the VFS_ROOT() function will return an error if the inode
check-hash fails. Panicking is not an appropriate response. So, check
for an error return from VFS_ROOT() and, when an error is reported,
unwind and return the error.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 18:49:30 +0000 (18:49 +0000)]
Ensure that the inode check-hash is not left zeroed out when the
check-hash fails. Prior to the fix in -r342133, the inode with the
zeroed-out check-hash was written back to disk, causing further confusion.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 18:35:46 +0000 (18:35 +0000)]
Reorder ffs_verify_dinode_ckhash() so that it checks the inode check-hash
before copying in the inode, so that the mode and link count are not set
if the check-hash fails. This change ensures that the vnode will be properly
unwound and recycled rather than being held in the cache.
Initialize the file mode to zero so that if loading the inode
fails (for example, because of a check-hash failure), the vnode will be
properly unwound and recycled.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
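A simplified sketch of the verify-then-copy ordering and the zeroed mode. The structures, the EINVAL return value and the toy check-hash (an FNV-style mix standing in for the CRC32C that FFS uses) are assumptions made for illustration; this is not ffs_verify_dinode_ckhash() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified on-disk inode. */
struct dinode_sketch {
	uint16_t di_mode;
	int16_t  di_nlink;
	uint32_t di_ckhash;   /* check-hash over the other fields */
};

/* Hypothetical, heavily simplified in-core inode. */
struct inode_sketch {
	uint16_t i_mode;
	int16_t  i_nlink;
};

/* Toy FNV-style mix over the fields; a stand-in for the real CRC32C. */
static uint32_t
toy_ckhash(const struct dinode_sketch *dp)
{
	uint32_t h = 2166136261u;
	uint32_t words[2] = { dp->di_mode, (uint16_t)dp->di_nlink };

	for (size_t i = 0; i < 2; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Verify first, copy second. On failure nothing is copied and the
 * in-core mode stays zero, so the caller can unwind and recycle.
 */
static int
load_inode(struct inode_sketch *ip, const struct dinode_sketch *dp)
{
	ip->i_mode = 0;
	ip->i_nlink = 0;
	if (toy_ckhash(dp) != dp->di_ckhash)
		return (EINVAL);
	ip->i_mode = dp->di_mode;
	ip->i_nlink = dp->di_nlink;
	return (0);
}
```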
mckusick [Sat, 15 Dec 2018 17:58:42 +0000 (17:58 +0000)]
Must set ip->i_effnlink = ip->i_nlink to avoid a soft updates
"panic: softdep_update_inodeblock: bad link count" when releasing
a partially initialized vnode after an inode check-hash failure.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Reported by: Peter Holm (pho)
Sponsored by: Netflix
mckusick [Sat, 15 Dec 2018 17:32:47 +0000 (17:32 +0000)]
Fsck would find, report, and offer to fix inode check-hash failures.
If asked to fix the inode check-hash, it would confirm having done so
but then fail to make the fix. The same code is used in fsdb, which,
unlike fsck, would actually fix the inode check-hash.
The discrepancy occurred because fsck has two ways to fetch inodes:
the inode-by-number function ginode() and the streaming inode
function getnextinode() used during pass1. Fsdb uses the ginode()
function, which correctly makes the fix, while fsck first encounters
the bad inode check-hash in pass1, where it is using the getnextinode()
function that failed to make the correction. This patch corrects
the getnextinode() function so that fsck now correctly fixes inodes
with incorrect inode check-hashes.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Sponsored by: Netflix
brooks [Sat, 15 Dec 2018 15:06:22 +0000 (15:06 +0000)]
Fix bugs in pluggable CC algorithm and siftr sysctls.
Use the sysctl_handle_int() handler to write out the old value and read
the new value into a temporary variable. Use the temporary variable
for any checks of values rather than using the CAST_PTR_INT() macro on
req->newptr. The prior usage read directly from userspace memory if the
sysctl() was called correctly. This is unsafe and does not work at all on
some architectures (at least i386).
In some cases, the code could also be tricked into reading from kernel
memory and leaking limited information about the contents or crashing
the system. This was true for CDG, newreno, and siftr on all platforms
and true for i386 in all cases. The impact of this bug is largest in
VIMAGE jails which have been configured to allow writing to these
sysctls.
Per discussion with the security officer, we will not be issuing an
advisory for this issue as root access and a non-default config are
required to be impacted.
Reviewed by: markj, bz
Discussed with: gordon (security officer)
MFC after: 3 days
Security: kernel information leak, local DoS (both require root)
Differential Revision: https://reviews.freebsd.org/D18443
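The handler pattern described in this commit, sketched as a userspace model so it can be compiled and exercised outside the kernel. struct req_model, handle_int_model() and the tunable variable are hypothetical stand-ins for struct sysctl_req, sysctl_handle_int() and a real knob; in the kernel the copy to and from userspace is done by the handler, and the value that gets validated is always the kernel-local temporary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a sysctl request. */
struct req_model {
	int        oldval;   /* value reported back to the caller */
	const int *newptr;   /* new value supplied by the caller, or NULL */
};

/* Stand-in for sysctl_handle_int(): copy old value out, new value in. */
static int
handle_int_model(int *arg, struct req_model *req)
{
	req->oldval = *arg;
	if (req->newptr != NULL)
		*arg = *req->newptr;   /* the kernel would copyin() here */
	return (0);
}

static int tunable = 10;   /* hypothetical knob; must stay >= 1 */

/*
 * Read the new value into a temporary via the handler, validate the
 * temporary, and only then commit it. req->newptr is never used to
 * inspect the value directly.
 */
static int
sysctl_tunable_model(struct req_model *req)
{
	int error, newval = tunable;

	error = handle_int_model(&newval, req);
	if (error != 0 || req->newptr == NULL)
		return (error);
	if (newval < 1)
		return (EINVAL);
	tunable = newval;
	return (0);
}
```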
dim [Sat, 15 Dec 2018 14:08:41 +0000 (14:08 +0000)]
Update clang, llvm, lld, lldb, compiler-rt and libc++ version number to
7.0.1 release r349250. There were no functional changes since the 7.0.1
rc3 import.
mmel [Sat, 15 Dec 2018 10:38:07 +0000 (10:38 +0000)]
Improve R_AARCH64_TLSDESC relocation.
The original code did not support dynamically loaded libraries and used
suboptimal access to TLS variables.
The new implementation removes lazy resolution of TLS relocations: due to a
flaw in the TLSDESC design, it is impossible to switch the resolver function
at runtime without expensive locking.
Because of this, three specialized resolvers are implemented:
- a load-time resolver for TLS relocations from libraries loaded with the main
executable (and thus with a known TLS offset).
- a resolver for undefined weak thread-local symbols.
- a slower lazy resolver for dynamically loaded libraries, with a fast path for
already resolved symbols.
cem [Sat, 15 Dec 2018 05:46:04 +0000 (05:46 +0000)]
efirt: When present, attempt to use EFI runtime services to shut down
PR: maybe related to 233998 (inconclusive at this time)
Submitted by: byuu <byuu AT tutanota.com> (previous version)
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D18506
arichardson [Fri, 14 Dec 2018 21:16:04 +0000 (21:16 +0000)]
Allow bootstrapping libnv on macOS and Linux
macOS and Linux do not define struct cmsgcred, but we need to bootstrap libnv
when building on non-FreeBSD systems. Since these two functions are not used
during bootstrap, we can simply omit them there.
markj [Fri, 14 Dec 2018 21:07:12 +0000 (21:07 +0000)]
Add some more checking to the RISC-V page fault handler.
- Panic immediately if witness says we're holding non-sleepable locks.
This helps ensure that we don't recurse on the pmap lock in
pmap_fault_fixup().
- Panic if the kernel faults on a user address without setting an
onfault handler.
- Panic if the fault occurred in a critical section or interrupt
handler, like we do on other platforms.
- Fix some style issues in trap_pfault().
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18561
markj [Fri, 14 Dec 2018 21:04:30 +0000 (21:04 +0000)]
Avoid needless TLB invalidations in pmap_remove_pages().
pmap_remove_pages() is called during process termination, when it is
guaranteed that no other CPU may access the mappings being torn down.
In particular, it is unnecessary to invalidate each mapping individually
since we do a pmap_invalidate_all() at the end of the function.
Also, don't call pmap_invalidate_all() while holding a PV list lock; the
global pvh lock is sufficient.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18562
markj [Fri, 14 Dec 2018 18:50:32 +0000 (18:50 +0000)]
Clean up the riscv pmap_bootstrap() implementation.
- Build up phys_avail[] in a single loop, excluding memory used by
the loaded kernel.
- Fix an array indexing bug in the aforementioned phys_avail[]
initialization.[1]
- Remove some unneeded code copied from the arm64 implementation.
PR: 231515 [1]
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18464
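A sketch of building a phys_avail-style array in a single loop while excluding the loaded kernel's physical range. The region structure, the capacity handling and the 0,0 terminator convention are assumptions for illustration; this is not the committed riscv pmap_bootstrap() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware memory region, [start, end). */
struct mem_region_sketch {
	uint64_t start;
	uint64_t end;
};

/* Append [s, e) if non-empty, keeping two slots for the 0,0 terminator. */
static int
add_range(uint64_t *avail, size_t cap, size_t *n, uint64_t s, uint64_t e)
{
	if (s >= e)
		return (0);
	if (*n + 4 > cap)
		return (-1);
	avail[(*n)++] = s;
	avail[(*n)++] = e;
	return (0);
}

/* Returns the number of ranges written, or -1 if avail[] is too small. */
static int
build_phys_avail(const struct mem_region_sketch *regs, size_t nregs,
    uint64_t kern_start, uint64_t kern_end, uint64_t *avail, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nregs; i++) {
		uint64_t s = regs[i].start, e = regs[i].end;

		if (e <= kern_start || s >= kern_end) {
			/* No overlap with the kernel: keep the whole region. */
			if (add_range(avail, cap, &n, s, e) != 0)
				return (-1);
		} else {
			/* Keep only the parts below and above the kernel. */
			if (add_range(avail, cap, &n, s, kern_start) != 0 ||
			    add_range(avail, cap, &n, kern_end, e) != 0)
				return (-1);
		}
	}
	if (n + 2 > cap)
		return (-1);
	avail[n] = 0;
	avail[n + 1] = 0;
	return ((int)(n / 2));
}
```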
mw [Fri, 14 Dec 2018 16:14:36 +0000 (16:14 +0000)]
Introduce driver for TPM 2.0 in CRB and FIFO (TIS) modes
It was written based on the
TCG PC Client Platform TPM Profile (PTP) Specification Version 22, Revision 1.03.
It only supports Locality 0. Interrupts are only supported in FIFO mode.
In FIFO mode, the driver was tested on x86 with an Infineon SLB9665 discrete TPM chip.
Both modes were also tested on QEMU with swtpm running on the host.
manu [Fri, 14 Dec 2018 10:26:17 +0000 (10:26 +0000)]
arm64: allwinner: axp81x: Fix double inversion for FLDO1
This fixes booting on A64 boards when the unused regulators are disabled at boot.
We were disabling all the regulators handled by register 0x13, which of course
include regulators that are mandatory for the board to come up.
Reported by: Mark Millard <marklmi@yahoo.com>
X-MFC-With: r340848
kadesai [Fri, 14 Dec 2018 08:04:16 +0000 (08:04 +0000)]
This patch adds support for the driver to create NVMe PRPs for fast path
capable IOs. The NVMe specification supports a specific type of scatter-gather
list, called PRP (Physical Region Page), for IO data buffers. Since the NVMe
drive is connected behind a SAS3.5 tri-mode adapter, the MegaRAID driver/firmware
has to convert OS SGLs into the native NVMe PRP format. For IOs sent to the
firmware, the MegaRAID firmware does this OS SGL to PRP translation and sends
the PRPs to the backend NVMe device. For fast path IOs, the driver does the
OS SGL to PRP translation.
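A simplified sketch of SGL-to-PRP translation, assuming a 4 KiB controller memory page size. Under the NVMe PRP rules only the first entry may carry a page offset, so an SGL is convertible only if every element except the first starts on a page boundary and every element except the last ends on one. The sketch emits a flat list of entries and does not split it into PRP1/PRP2 and a PRP list page; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRP_PAGE_SIZE 4096ULL   /* assumed controller memory page size */

/* Hypothetical scatter-gather element. */
struct sge_sketch {
	uint64_t addr;   /* physical address */
	uint32_t len;    /* length in bytes */
};

/*
 * Translate an SGL into a flat list of PRP entries. Returns the number
 * of entries, or -1 if the SGL cannot be expressed as PRPs or does not
 * fit in prp[].
 */
static int
sgl_to_prps(const struct sge_sketch *sgl, size_t nsge, uint64_t *prp,
    size_t cap)
{
	const uint64_t mask = PRP_PAGE_SIZE - 1;
	size_t n = 0;

	for (size_t i = 0; i < nsge; i++) {
		uint64_t addr = sgl[i].addr;
		uint64_t end = addr + sgl[i].len;

		/* Only the first element may start mid-page... */
		if (i > 0 && (addr & mask) != 0)
			return (-1);
		/* ...and only the last element may end mid-page. */
		if (i + 1 < nsge && (end & mask) != 0)
			return (-1);

		/* One entry per page touched; the first keeps its offset. */
		while (addr < end) {
			if (n == cap)
				return (-1);
			prp[n++] = addr;
			addr = (addr & ~mask) + PRP_PAGE_SIZE;
		}
	}
	return ((int)n);
}
```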
kadesai [Fri, 14 Dec 2018 08:02:44 +0000 (08:02 +0000)]
To improve RAID 1/10 Write performance, OS drivers need to issue the
required Write IOs as Fast Path IOs (after the appropriate checks
allowing Fast Path to be used) to the appropriate physical drives
(translated from the OS logical IO) and wait for all Write IOs to complete.
Design: A write IO to a RAID volume is examined to decide whether it can be
sent via Fast Path, based on the IO size and on whether the starting and ending
LBAs fall onto a physical drive boundary. If the underlying RAID volume is
RAID 1/10, the driver issues two Fast Path write IOs, one to each corresponding
physical drive, after computing the start LBA for each physical drive.
Both write IOs have the same payload and are posted to HW such that their
replies land in the same reply queue.
If there are not enough resources available to send two IOs, the driver sends
the original IO from the upper layer to the RAID volume through the firmware.
When both IOs have been completed by HW, the resources are released
and the SCSI IO completion handler is called.
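A sketch of the RAID 1 completion accounting described above: two mirrored writes share one tracking structure, and the upper-layer completion fires only after both replies arrive. Because both IOs are posted so that their replies land in the same reply queue, this sketch assumes the replies are processed serially and uses a plain counter; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracking state for one mirrored fast path write. */
struct raid1_write {
	uint64_t lba[2];      /* start LBA on each physical drive */
	int      pending;     /* physical IOs not yet completed */
	int      status;      /* first non-zero error seen, else 0 */
	bool     completed;   /* upper-layer completion has run */
};

/*
 * Set up both mirrored IOs. Returns false when fewer than two command
 * slots are free; the caller then sends the original IO via firmware.
 */
static bool
raid1_issue(struct raid1_write *w, uint64_t lba_drive0, uint64_t lba_drive1,
    int free_slots)
{
	if (free_slots < 2)
		return (false);
	w->lba[0] = lba_drive0;
	w->lba[1] = lba_drive1;
	w->pending = 2;
	w->status = 0;
	w->completed = false;
	return (true);
}

/* Called from the reply queue for each physical IO that finishes. */
static void
raid1_io_done(struct raid1_write *w, int status)
{
	if (status != 0 && w->status == 0)
		w->status = status;
	if (--w->pending == 0)
		w->completed = true;   /* release resources, complete SCSI IO */
}
```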
kadesai [Fri, 14 Dec 2018 08:01:49 +0000 (08:01 +0000)]
Detect sequential Write IOs and pass a hint that they are part of a sequential
stream, to help the HBA firmware do Full Stripe Writes. For read IOs on
certain RAID volumes, such as Read Ahead volumes, this helps the driver
send them to the firmware even if the IOs could potentially be sent
directly to hardware (called fast path), bypassing the firmware.
Design: 8 streams are maintained per RAID volume, as per the combined
firmware/driver design. When no stream is detected, the LRU stream
is used for the next potential stream and the LRU/MRU map is updated to make
it the MRU stream. Every time a stream is detected, the MRU map
is updated to make the current stream the MRU stream.
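A sketch of the 8-stream LRU/MRU detection, under the assumption that a stream is detected when an IO starts exactly at the LBA where that stream's previous IO ended. The data layout (an ordering array with the MRU stream first) is a hypothetical choice for illustration, not the driver's actual map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSTREAMS 8   /* streams tracked per RAID volume */

struct stream_tracker {
	uint64_t next_lba[NSTREAMS];   /* LBA expected next on each stream */
	int      order[NSTREAMS];      /* order[0] is MRU, order[7] is LRU */
};

static void
tracker_init(struct stream_tracker *t)
{
	for (int i = 0; i < NSTREAMS; i++) {
		t->next_lba[i] = UINT64_MAX;   /* matches no real IO */
		t->order[i] = i;
	}
}

/* Move the stream at position pos of order[] to the MRU slot. */
static void
make_mru(struct stream_tracker *t, int pos)
{
	int s = t->order[pos];

	for (int i = pos; i > 0; i--)
		t->order[i] = t->order[i - 1];
	t->order[0] = s;
}

/*
 * Returns true if the IO continues a known stream. Either way, the
 * stream used becomes MRU and its expected next LBA is advanced.
 */
static bool
stream_detect(struct stream_tracker *t, uint64_t lba, uint32_t nblocks)
{
	for (int pos = 0; pos < NSTREAMS; pos++) {
		int s = t->order[pos];

		if (t->next_lba[s] == lba) {
			t->next_lba[s] = lba + nblocks;
			make_mru(t, pos);
			return (true);
		}
	}
	/* No match: recycle the LRU stream for this potential stream. */
	int lru = t->order[NSTREAMS - 1];
	t->next_lba[lru] = lba + nblocks;
	make_mru(t, NSTREAMS - 1);
	return (false);
}
```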
jkim [Fri, 14 Dec 2018 01:06:34 +0000 (01:06 +0000)]
Do not complain when /dev/crypto does not exist.
Now that the new devcrypto engine has been enabled since r342009, many users
have started seeing "Could not open /dev/crypto: No such file or directory".
Disable the annoying error message, as it is not very useful anyway.
bcran [Thu, 13 Dec 2018 23:20:58 +0000 (23:20 +0000)]
Print an error message in efi_main.c if we can't allocate memory for the heap
With the default QEMU parameters, a VM is given only 128MB of RAM. This leaves
the loader unable to allocate the 64MB it needs for the heap. This change
makes the cause of the error more obvious.
chuck [Thu, 13 Dec 2018 13:25:37 +0000 (13:25 +0000)]
nda(4) fix check for Dataset Management support
In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMware Workstation does not work reliably.
The fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.
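A sketch of the Dataset Management check. ONCS bit 2 is the Dataset Management bit in the NVMe Identify Controller data, and the field is stored little-endian, so decoding it byte-wise keeps the test correct on big-endian hosts (kernel code would instead apply le16toh() to the field). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ONCS_DSM (1u << 2)   /* ONCS bit 2: Dataset Management */

/* Decode the little-endian 16-bit ONCS field independent of host order. */
static uint16_t
oncs_from_bytes(const uint8_t raw[2])
{
	return ((uint16_t)(raw[0] | (raw[1] << 8)));
}

/* True if the controller advertises Dataset Management support. */
static bool
supports_dsm(const uint8_t raw_oncs[2])
{
	return ((oncs_from_bytes(raw_oncs) & ONCS_DSM) != 0);
}
```

A driver would then set its can-delete flag (DISKFLAG_CANDELETE in nda(4)) only when supports_dsm() returns true.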
jhibbits [Thu, 13 Dec 2018 05:07:39 +0000 (05:07 +0000)]
powerpc/booke: Change KERNBASE to be physical load address
Previous commits have made VM_MIN_KERNEL_ADDRESS its own separate entity,
and rebased the kernel around that address instead of KERNBASE. This commit
pulls the trigger to rebase KERNBASE to a physical load address. The
eventual goal is to align the address with the AIM KERNBASE, but at this
time that's not an option.
Currently a Book-E kernel must be loaded on a 64MB boundary, due to size
issues. The common load address is at the 64MB mark (0x04000000), so simply
make that the default KERNBASE.
As of this commit, Book-E kernels can be loaded and booted with ubldr.
jhibbits [Thu, 13 Dec 2018 04:48:28 +0000 (04:48 +0000)]
powerpcspe: Fix GPR handling in SPE exception handler
Optimize the exception handler to only save and load the upper word of the
GPRs used in the emulated instruction. This reduces the save/load
overhead and, as a side effect, does not overwrite the upper word of any
temporary register.
With this commit I am now able to run editors/abiword and math/gnumeric on an
e500-based system.
mmacy [Thu, 13 Dec 2018 04:40:53 +0000 (04:40 +0000)]
Generalize AES iov optimization
Right now, aesni_cipher_alloc does a bit of special-casing
for CRYPTO_F_IOV to avoid any allocation if the first uio
is large enough for the requested size. While working on the ZFS
crypto port, I ran into horrible performance because the code
uses scatter-gather, and often the data to encrypt
was in the second entry. This code looks through the list,
checks whether there is a single uio that can contain the
requested data and, if so, uses it.
This has a slight impact on the current consumers, in that the
check is a little more complicated for the ones that use
CRYPTO_F_IOV, but none of them meet the criteria for testing
more than one.
Submitted by: sef at ixsystems.com
Reviewed by: cem@
MFC after: 3 days
Sponsored by: iX Systems
Differential Revision: https://reviews.freebsd.org/D18522
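A sketch of the search described in this commit, assuming the request is given as a byte offset (skip) and length into the logical buffer described by an iovec array. It returns a pointer into the single iovec that holds the whole range, or NULL when the range spans entries and the caller must fall back to allocating and copying. find_contiguous() is a hypothetical name, and the committed code may track offsets differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Find one iovec that fully contains [skip, skip + len) of the logical
 * buffer iov[0..iovcnt). Returns a pointer to the first byte of that
 * range, or NULL if no single entry holds all of it.
 */
static void *
find_contiguous(const struct iovec *iov, int iovcnt, size_t skip, size_t len)
{
	size_t base = 0;   /* logical offset at which iov[i] starts */

	for (int i = 0; i < iovcnt; i++) {
		size_t ilen = iov[i].iov_len;

		if (skip >= base && skip - base + len <= ilen)
			return ((char *)iov[i].iov_base + (skip - base));
		base += ilen;
		if (base > skip)
			break;   /* range starts in this entry but spills over */
	}
	return (NULL);
}
```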
imp [Thu, 13 Dec 2018 00:42:26 +0000 (00:42 +0000)]
Correctly implement atomic_swap_long for mips64.
MIPS64 has 64-bit longs, so use uint64_t there and uint32_t otherwise.
sizeof(long) == sizeof(ptr) on all platforms, so define
atomic_swap_ptr in terms of atomic_swap_long.
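A userspace sketch of the width selection, using the GCC/Clang __atomic_exchange_n() builtin in place of the MIPS ll/sc assembly. __LP64__ is used here as the assumed signal that long is 64 bits; swap_32(), swap_64(), swap_long() and swap_ptr() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The pointer variant relies on long and pointers having equal size. */
_Static_assert(sizeof(long) == sizeof(void *), "long must hold a pointer");

static inline uint32_t
swap_32(volatile uint32_t *p, uint32_t v)
{
	return (__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST));
}

static inline uint64_t
swap_64(volatile uint64_t *p, uint64_t v)
{
	return (__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST));
}

/* Pick the operand width from the size of long. */
#if defined(__LP64__)
#define swap_long(p, v) swap_64((volatile uint64_t *)(p), (uint64_t)(v))
#else
#define swap_long(p, v) swap_32((volatile uint32_t *)(p), (uint32_t)(v))
#endif

/* Pointers are long-sized, so the pointer swap reuses swap_long. */
#define swap_ptr(p, v) swap_long((p), (v))
```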
manu [Wed, 12 Dec 2018 22:08:43 +0000 (22:08 +0000)]
arm64: Add mv_cp110_icu and mv_cp110_gicp
icu is an interrupt concentrator in the CP110 block and gicp
is a GIC extension that allows interrupts in the CP block to be turned
into GIC SPI interrupts.
manu [Wed, 12 Dec 2018 22:01:06 +0000 (22:01 +0000)]
arm64: marvell: Add driver for Marvell Ap806 System Controller
The first two clocks are for the clusters, and their frequencies can be
found by reading a register. Then there is a fixed 1200MHz clock and two
fixed clocks: 'mss', which is 1200 / 6 (200MHz), and 'sdio', which is 1200 / 3 (400MHz).
jkim [Wed, 12 Dec 2018 21:56:47 +0000 (21:56 +0000)]
Enable devcryptoeng for OpenSSL.
Since OpenSSL 1.1.1, the good old BSD-specific cryptodev engine has been
deprecated in favor of this new engine. However, this engine is not
thoroughly tested on FreeBSD because it was originally written for Linux.
http://cryptodev-linux.org/
Also, the author actually meant to enable it by default on BSD platforms but
failed to do so because of a bug in the Configure script.
https://github.com/openssl/openssl/pull/7882
It has since been found to be a more generic issue.
https://github.com/openssl/openssl/pull/7885
Therefore, we need to enable this engine on head to give it more exposure.