Relax allocation throttling for ditto blocks. Due to random imbalances
in allocation it tends to push block copies to one vdev, that looks
slightly better at the moment. Slightly less strict policy allows both
improve data security and surprisingly write performance, since we don't
need to touch extra metaslabs on each vdev to respect the min distance.
Alexander Motin [Fri, 17 Aug 2018 15:00:41 +0000 (15:00 +0000)]
9738 Fix third block copy allocations, broken at 9112.
Use METASLAB_WEIGHT_CLAIM weight to allocate tertiary blocks.
Previous use of METASLAB_WEIGHT_SECONDARY for that caused errors
later on metaslab_activate_allocator() call, leading to massive
load of unneeded metaslabs and write freezes.
Kristof Provost [Fri, 17 Aug 2018 15:00:10 +0000 (15:00 +0000)]
pf: Limit the maximum number of fragments per packet
Similar to the network stack issue fixed in r337782 pf did not limit the number
of fragments per packet, which could be exploited to generate high CPU loads
with a crafted series of packets.
Limit each packet to no more than 64 fragments. This should be sufficient on
typical networks to allow maximum-sized IP frames.
This addresses the issue for both IPv4 and IPv6.
MFC after: 3 days
Security: CVE-2018-5391
Sponsored by: Klara Systems
Notable fixes:
- Overlays may now be generated properly without -@
- /__local_fixups__ were not including unit address in their structure
- The error reporting a magic token was misleading, reporting
"Bad magic token in header. Got d00dfeed expected 0xd00dfeed"
if the token was missing. This has been split out into a separate message.
Rick Macklem [Fri, 17 Aug 2018 12:32:38 +0000 (12:32 +0000)]
Don't set a file's size for the MDS file of a pNFS service.
When a pNFS service is running, the size of the files created on the MDS
are normally 0, since the data is written to the data files on the DS(s).
However, without this patch, if a Setattr with a non-zero size was done by
a client, the MDS file was set to that size. This was thought to be benign,
but it turns out that files with a non-zero size plus extended attributes
can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this
panic() has not been isolated, this patch avoids the panic() and leaves
the MDS files in a consistent state of always having a size == 0.
Note that these MDS files never store data. The patch also includes an
unnecessary initialization of savsize in case some compiler or static
analyser complains it might not be initialized.
This patch only affects the NFS server when pNFS is enabled via the "-p"
command line option on nfsd.
Roger Pau Monné [Fri, 17 Aug 2018 07:27:15 +0000 (07:27 +0000)]
build: skip the database check when generating install media
There are several scripts and targets solely used to generate install
media, make sure DB_FROM_SRC is used in that case in order to prevent
checking the host database, which is irrelevant when generating
install binaries.
Conrad Meyer [Fri, 17 Aug 2018 04:40:01 +0000 (04:40 +0000)]
cryptosoft: Reduce generality of supported algorithm composition
Fix a regression introduced in r336439.
Rather than allowing any linked list of algorithms, allow at most two
(typically, some combination of encrypt and/or MAC). Removes a WAITOK
malloc in an unsleepable context (classic LOR) by placing both software
algorithm contexts within the OCF-managed session object.
Tested with 'cryptocheck -a all -d cryptosoft0', which includes some
encrypt-and-MAC modes.
Kyle Evans [Fri, 17 Aug 2018 04:15:51 +0000 (04:15 +0000)]
ls(1): Add --color=when
--color may be set to one of: 'auto', 'always', and 'never'.
'auto' is the default behavior- output colors only if -G or COLORTERM are
set, and only if stdout is a tty.
'always' is a new behavior- output colors always. termcap(5) will be
consulted unless TERM is unset or not a recognized terminal, in which case
ls(1) will fall back to explicitly outputting ANSI escape sequences.
'never' to turn off any environment variable and -G usage.
Summary:
PowerISA 3.0 adds a 'darn' instruction to "deliver a random number". This
driver was modeled after (rather, copied and gutted of) the Ivy Bridge
rdrand driver.
This uses the "Conditional Random Number" behavior to remove input bias.
From the ISA reference the 'darn' instruction, and the random number
generator backing it, conforms to the NIST SP800-90B and SP800-90C
standards, compliant to the extent possible at the time the hardware was
designed, and guarantees a minimum 0.5 bits of entropy per bit returned.
Kyle Evans [Fri, 17 Aug 2018 03:42:57 +0000 (03:42 +0000)]
subr_prf: Don't write kern.boot_tag if it's empty
This change allows one to set kern.boot_tag="" and not get a blank line
preceding other boot messages. While this isn't super critical- blank lines
are easy to filter out both mentally and in processing dmesg later- it
allows for a mode of operation that matches previous behavior.
I intend to MFC this whole series to stable/11 by the end of the month with
boot_tag empty by default to make this effectively a nop in the stable
branch.
Conrad Meyer [Fri, 17 Aug 2018 00:30:04 +0000 (00:30 +0000)]
Add xform-conforming auth_hash wrapper for Poly-1305
The wrapper is a thin shim around libsodium's Poly-1305 implementation. For
now, we just use the C algorithm and do not attempt to build the
SSE-optimized variant for x86 processors.
The algorithm support has not yet been plumbed through cryptodev, or added
to cryptosoft.
libsodium is derived from Daniel J. Bernstein et al.'s 2011 NaCl
("Networking and Cryptography Library," pronounced "salt") software library.
At the risk of oversimplifying, libsodium primarily exists to make it easier
to use NaCl. NaCl and libsodium provide high quality implementations of a
number of useful cryptographic concepts (as well as the underlying
primitics) seeing some adoption in newer network protocols.
I considered but dismissed cleaning up the directory hierarchy and
discarding artifacts of other build systems in favor of remaining close to
upstream (and easing future updates).
Nothing is integrated into the build system yet, so in that sense, no
functional change.
Alan Somers [Thu, 16 Aug 2018 22:04:00 +0000 (22:04 +0000)]
Revert r337929
FreeBSD's mkstemp sets the temporary file's permissions to 600, and has ever
since mkstemp was added in 1987. Coverity's warning is still relevant for
portable programs since OpenGroup does not require that behavior, and POSIX
didn't until 2008. But none of these programs are portable.
Kyle Evans [Thu, 16 Aug 2018 18:58:34 +0000 (18:58 +0000)]
libbe(3): Impose dataset length restrictions on boot env name validation
Previously, we only validated names for character restrictions. This is
helpful, but we should've also checked length restrictions- dataset names
must be restricted to MAXNAMELEN.
While here, move validation before doing a bunch of concatenations and fix
error handling in be_rename. It was previously setting the error state based
on return value from a libzfs function, which is wrong: libzfs errors don't
necessarily match cleanly to libbe errors. This would cause the assertion in
be_error to hit when the error was printed.
Jamie Gritton [Thu, 16 Aug 2018 18:40:16 +0000 (18:40 +0000)]
Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.
Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES). These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do. The first use
is obsolete along with jail(2), but keep them for the second-read-only use.
Doug Ambrisko [Thu, 16 Aug 2018 15:59:02 +0000 (15:59 +0000)]
Fix a module Makefile error on amd64 so the IPMI HW interfaces are built.
When the module is being unloaded and no HW interfaces were created don't
clean up. This was exposed by the amd64 module build issue.
- Use "Dq Li" for inline commands as we do in other manuals.
- Pet "igor" and "mandoc -Tlint".
- Reword some parts for clarity.
- Add missing Xr macros.
- Reformat SEE ALSO to make the section more readable.
Reviewed by: eadler, krion, mat
Approved by: krion (mentor), mat (mentor)
Differential Revision: https://reviews.freebsd.org/D15350
Andrew Turner [Thu, 16 Aug 2018 10:00:51 +0000 (10:00 +0000)]
Remove the L1 and L2 xscale page defines and rename the generic macros to
the common name. While here move the macros to check these into pmap-v4.c
as they're only used there.
Ed Maste [Thu, 16 Aug 2018 09:11:34 +0000 (09:11 +0000)]
Enable LLD_IS_LD by default on armv7
lld should now be a usable linker for armv7, and is already used as the
bootstrap linker (for linking the kernel and userland). Also enable as
the system linker now (/usr/bin/ld) for further testing and evaluation.
(This change will be reverted in case of unexpected fallout.)
Approved by: manu
Sponsored by: The FreeBSD Foundation
Brad Davis [Wed, 15 Aug 2018 23:18:34 +0000 (23:18 +0000)]
Revert parts of r337849 and r337857
This fixes the build and I will redo these changes as part of a future review
that organizes them differently. The way I tried to do it here could be done
better. Sorry for the noise.
Approved by: will (mentor)
Differential Revision: https://reviews.freebsd.org/D16737
Toomas Soome [Wed, 15 Aug 2018 21:21:16 +0000 (21:21 +0000)]
libi386: remove BD_SUPPORT_FRAGS
BD_SUPPORT_FRAGS is preprocessor knob to allow partial reads in bioscd/biosdisk
level. However, we already have support for partial reads in bcache, and there
is no need to have duplication via preprocessor controls.
Note that bioscd/biosdisk interface is assumed to perform IO in 512B blocks,
so the only translation we have to do is 512 <-> native block size.
We were doing count_block() twice inside this function, once
unconditionally at the beginning (intended to catch the embedded block
case) and once near the end after processing the block.
The double-accounting caused the "zpool scrub" progress statistics in
"zpool status" to climb from 0% to 200% instead of 0% to 100%, and
showed double the I/O rate it was actually seeing.
This was apparently a regression introduced in commit 00c405b4b5e8,
which was an incorrect port of this OpenZFS commit:
Warner Losh [Wed, 15 Aug 2018 20:31:11 +0000 (20:31 +0000)]
stand: Use -Oz/-Os for all loader/stand builds.
While we're not super size constrained, the x86 BIOS /boot/loader has
to be less than about 520k-530k to be reliable. The LUA loader is at
this size today. -Oz saves 15-20% on the size, keeping us safely small
enough (comparable to where we were with the 4th loader). This will
also help with sjg's work on bringing in bearssl, though we may again
be looking for space in the LUA loader.
Matt Macy [Wed, 15 Aug 2018 20:23:08 +0000 (20:23 +0000)]
Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Kyle Evans [Wed, 15 Aug 2018 19:46:13 +0000 (19:46 +0000)]
dd: Incorporate some changes from imp for status=progress
Notable changes from what landed in r337505:
- sigalarm handler isn't setup unless we're actually using it
- Humanized versions of the amount of data transferred in the progress
update
Andrew Turner [Wed, 15 Aug 2018 13:40:16 +0000 (13:40 +0000)]
Start to remove XScale support from the ARMv4/v5 pmap. Support for XScale
has been removed from the kernel so we can remove it from here to help
simplify the code.