Make jemalloc(3) default to retain:true on 64-bit platforms,
like it already does on Linux and OSX. This results in significantly
fewer calls to mmap(2). This should result in a small reduction
in system CPU time and improved superpage usage.
Rebecca Cran [Tue, 31 Mar 2020 02:36:39 +0000 (02:36 +0000)]
Bhyve: fix SMBIOS Type 17 table generation
According to the SMBIOS specification (revision 2.7 or newer), the
extended module size field should only be used for sizes that can't
fit in the older size field.
Mark Johnston [Tue, 31 Mar 2020 02:25:53 +0000 (02:25 +0000)]
Use a dedicated taskqueue thread for in6m_release_task().
Interfaces may be detached from a taskqueue_thread task, for example by
prison_complete(), so after r359438, when draining the queue we may end
up deadlocking.
Andrew Gallatin [Mon, 30 Mar 2020 23:29:53 +0000 (23:29 +0000)]
KTLS: Coalesce adjacent TLS trailers & headers to improve PCIe bus efficiency
KTLS uses the embedded header and trailer fields of unmapped
mbufs. This can lead to "silly" buffer lengths, where we have an
mbuf chain that will create a scatter/gather lists with a
regular pattern of 13 bytes followed by 16 bytes between each
adjacent TLS record.
For software ktls we typically wind up with a pattern where we
have several TLS records encrypted, and made ready at once. When
these records are made ready, we can coalesce these silly buffers
in sbready_compress by copying 13b TLS header of the next record
into the 16b TLS trailer of the current record. After doing so,
we now have a small 29 byte chunk between each TLS record.
This marginally increases PCIe bus efficiency. We've seen an
almost 1Gb/s increase in peak throughput on Broadwell based Xeons
running a 100% software TLS workload with Mellanox ConnectX-4
NICs.
Note that this change is ifdef'ed for KTLS, as KTLS is currently
the only user of the hdr/trailer feature of unmapped mbufs, and
peeking into them is expensive, since the ext_pgs struct lives in
separately allocated memory, and may be cold in cache.
This optimization is not applicable to HW ("NIC") TLS, as that
depends on having the entire TLS record described by a single
unmapped mbuf, so we cannot shift parts of the record between
mbufs for HW TLS.
kern_sendfile.c: fix bugs with handling of busy page states.
- Do not call into a vnode pager while leaving some pages from the
same block as the current run, xbusy. This immediately deadlocks if
pager needs to instantiate the buffer.
- Only relookup bogus pages after io finished, otherwise we might
obliterate the valid pages by out of date disk content. While there,
expand the comment explaining this pecularity.
- Do not double-unbusy on error. Split unbusy for error case, which
is left in the sendfile_swapin(), from the more properly coded
normal case in sendfile_iodone().
- Add an XXXKIB comment explaining the serious bug in the validation
algorithm, not fixed by this patch series.
John Baldwin [Mon, 30 Mar 2020 21:44:00 +0000 (21:44 +0000)]
Document EINTEGRITY errors for many system calls.
EINTEGRITY was previously documented as a UFS-specific error for
mount(2). This documents EINTEGRITY as a filesystem-independent error
that may be reported by the backing store of a filesystem.
While here, document EIO as a filesystem-independent error for both
mount(2) and posix_fadvise(2). EIO was previously only documented for
UFS for mount(2).
Ed Maste [Mon, 30 Mar 2020 20:20:15 +0000 (20:20 +0000)]
add shell script for stale dependency hack
It's rather awkward to debug issues with the dependency cleanup hacks
when implemented via make. Add a cleanup shell script and move the
libomp hack there as an initial example.
Reviewed by: brooks
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24228
Toomas Soome [Mon, 30 Mar 2020 20:07:25 +0000 (20:07 +0000)]
gallant: pound sign (0xa3) is a bit broken. Add extra glyphs.
I did add some more glyphs to provide box drawing and set of additional
glyphs to provide better support for some code pages. Still does not quite
match terminus.
Ed Maste [Mon, 30 Mar 2020 20:05:09 +0000 (20:05 +0000)]
drop GDB_LIBEXEC option (now always true)
In-tree gdb is essentially obsolete. We kept it for sparc64 (because
gdb in ports lacked sparc64 support) and as a fallback for crashinfo.
gdb was installed to /libexec on all archs other than sparc64, where the
WITHOUT_GDB_LIBEXEC option was default, with gdb installed to /usr/bin.
With sparc64's retirement WITH_GDB_LIBEXEC became the default for all
architectures, but it was still possible to set it off and install gdb
into /usr/bin.
As the next step in gdb's retirement, remove the option and install gdb
only into /libexec as the crashinfo fallback. We expect users to install
the gdb port or package for debugging. The in-tree gdb lacks support for
a number of supported architectures and does not support contemporary
DWARF debug info.
Reviewed by: jhb (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24227
Add support for multiple playback and recording devices per physical USB audio
device. This requires some structural refactoring inside the driver, mostly
about converting existing audio channel structures into arrays.
The main audio mixer is provided by the first PCM instance.
The non-first audio instances may only have a software mixer for PCM playback.
Evaluate modifier keys before the regular keys, so that if a modifier
key is pressed at the same time as a regular key, that means key with
modifier is output. Some automated USB keyboards like Yubikeys need this.
This fixes a regression issue after r357861.
Reported by: Adam McDougall <mcdouga9@egr.msu.edu>
PR: 224592
PR: 233884
MFC after: 3 days
Sponsored by: Mellanox Technologies
Mark Johnston [Mon, 30 Mar 2020 14:22:52 +0000 (14:22 +0000)]
Simplify taskqgroup inititialization.
taskqgroup initialization was broken into two steps:
1. allocate the taskqgroup structure, at SI_SUB_TASKQ;
2. initialize taskqueues, start taskqueue threads, enqueue "binder"
tasks to bind threads to specific CPUs, at SI_SUB_SMP.
Step 2 tries to handle the case where tasks have already been attached
to a queue, by migrating them to their intended queue. In particular,
tasks can't be enqueued before step 2 has completed. This breaks NFS
mountroot on systems using an iflib-based driver when EARLY_AP_STARTUP
is not defined, since mountroot happens before SI_SUB_SMP in this case.
Simplify initialization: do all initialization except for CPU binding at
SI_SUB_TASKQ. This means that until CPU binding is completed, group
tasks may be executed on a CPU other than that to which they were bound,
but this should not be a problem for existing users of the taskqgroup
KPIs.
Reported by: sbruno
Tested by: bdragon, sbruno
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24188
Kyle Evans [Mon, 30 Mar 2020 03:26:52 +0000 (03:26 +0000)]
cron: respect PATH from login.conf
As a followup to the use of login.conf environment vars (other than PATH) in
cron, this patch adds PATH (and HOME) to the list of login.conf settings
respected.
The new logic is as follows:
1. SHELL is always _PATH_BSHELL unless explicitly overridden in the crontab
file itself; no other settings are respected. This is unchanged.
2. PATH is taken from the first of: crontab file, login.conf, _PATH_DEFPATH
3. HOME is taken from the first of: crontab file, login.conf, passwd entry,
unset
4. The current directory for invoking the command is taken from the crontab
file's value of HOME (existing behavior), or the passwd entry, but not
anywhere else (so it might not equal HOME if that was set in login.conf).
Kyle Evans [Sun, 29 Mar 2020 23:59:14 +0000 (23:59 +0000)]
gdb: compile with -fcommon explicitly
As described in the comment, gdb relies on some of the linker magic that
happens with -fcommon. I suspect the life expectancy of gdb-in-base is low
enough that this isn't worth spending much time addressing, especially given
the vintage. Hit it with the -fcommon hammer so that it continues to just
work.
evdev: Add COMPAT_FREEBSD32 support for amd64 arch
Incompatibility between i386 and amd64 evdev ABIs was caused by presence of
'struct timeval' in evdev protocol. Replace it with 'struct timeval32' for
32 bit binaries.
Big-endian platforms may require additional work due to bitstr_t (array of
unsigned longs) usage in ioctl interface.
This is currently staged in vendor/ as part of the 8.0p1 import, which isn't
quite ready to land. Given that this is a simple one-line fix, apply it now
as the fallout will be pretty minimal.
-fno-common will become the default in GCC10/LLVM11.
Michael Tuexen [Sun, 29 Mar 2020 15:43:00 +0000 (15:43 +0000)]
Be a bit more precisly in the description of the sysctl variable
net.inet.tcp.pmtud_blackhole_detection. Also remove three entries,
which are not sysctl variables but statistic counters for TCP.
Thanks to 0mp@ for suggesting an improvement.
Reviewed by: bcr@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24216
Switch rtsock tests to per-test jails and epair interfaces.
Many rtsock tests verify the ordering of the kernel messages for the
particular event. In order to avoid flaky tests due to the other tests
running, switch all tests to use personal vnet-enabled jails.
This removes all clashes on the IP addresses and brings back the ability
to run these tests simultaneously.
Toomas Soome [Sat, 28 Mar 2020 22:37:50 +0000 (22:37 +0000)]
loader.efi: restore the init and fix the color setup
The efi console init is avoided since conin setup was moved to probe.
In case the console is re-initialized, we need to pick up colors
from environment.
Toomas Soome [Sat, 28 Mar 2020 21:47:44 +0000 (21:47 +0000)]
loader: add knob to build with user malloc
This option is intended to aid development, to allow building with user malloc.
The use case would be to build userboot & test with libc (or other) malloc and
use extra malloc debug features.
Michael Tuexen [Sat, 28 Mar 2020 20:25:45 +0000 (20:25 +0000)]
Handle integer overflows correctly when converting msecs and secs to
ticks and vice versa.
These issues were caught by recently added panic() calls on INVARIANTS
systems.
Kyle Evans [Sat, 28 Mar 2020 19:43:45 +0000 (19:43 +0000)]
Re-apply r359399: telnet -fno-common fix
line and auth_level's redefinitions are just extraneous
telnetd will #define extern and then include ext.h to allocate storage for
all of these extern'd vars; however, two of them are actually defined in
libtelnet instead. Instead of doing an #ifdef extern dance around those
function pointers, just add an EXTERN macro to make it easier to
differentiate by sight which ones will get allocated in globals.c and which
ones are defined elsewhere.
hdaa: remove verbosity from the normal driver operations.
If hdaa is used in polling mode, it logs each change to the poll
interval under bootverbose, which makes it unusable (slow). These
messages are arguably useless or are a debugging leftovers at best.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Kyle Evans [Sat, 28 Mar 2020 03:58:57 +0000 (03:58 +0000)]
gas: mark dwarf2_loc_mark_labels as extern
Compiling with -fno-common complains as this header's included in multiple
compilation units. In fact, the proper definition of dwarf2_loc_mark_labels
already exists in dwarf2dbg.c, so simply mark this declaration with extern.
Enji Cooper [Sat, 28 Mar 2020 01:08:20 +0000 (01:08 +0000)]
Check in the generated copies of the manpages
These manpages were meant to be templated once per `configure` run.
Given that we're not bound by as many constants, e.g., `--prefix` isn't
generally changing for kyua in the base system, having to generate the
manpages each build seems slightly less than optimal.
In the event that one's build environment doesn't define `$SH`, the build
will also fail until this change is introduced.
Instead of jumping through hoops dealing with shells or permissions, let's
just cut to the chase and check the generated copies into the sourcebase
under usr.bin/kyua .
Brooks Davis [Fri, 27 Mar 2020 23:27:54 +0000 (23:27 +0000)]
Use the real value of MK_TESTS_SUPPORT in _libraries.
We need to build kyua libraries for kyua. Because we set MK_TESTS=no,
we can't not set MK_TESTS_SUPPORT=${MK_TESTS_SUPPORT} because the latter
defaults MK_TESTS_SUPPORT to no.
This fixes WITHOUT_TESTS + WITH_TESTS_SUPPORT builds.
John Baldwin [Fri, 27 Mar 2020 18:25:23 +0000 (18:25 +0000)]
Refactor driver and consumer interfaces for OCF (in-kernel crypto).
- The linked list of cryptoini structures used in session
initialization is replaced with a new flat structure: struct
crypto_session_params. This session includes a new mode to define
how the other fields should be interpreted. Available modes
include:
- COMPRESS (for compression/decompression)
- CIPHER (for simply encryption/decryption)
- DIGEST (computing and verifying digests)
- AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
- ETA (combined auth and encryption using encrypt-then-authenticate)
Additional modes could be added in the future (e.g. if we wanted to
support TLS MtE for AES-CBC in the kernel we could add a new mode
for that. TLS modes might also affect how AAD is interpreted, etc.)
The flat structure also includes the key lengths and algorithms as
before. However, code doesn't have to walk the linked list and
switch on the algorithm to determine which key is the auth key vs
encryption key. The 'csp_auth_*' fields are always used for auth
keys and settings and 'csp_cipher_*' for cipher. (Compression
algorithms are stored in csp_cipher_alg.)
- Drivers no longer register a list of supported algorithms. This
doesn't quite work when you factor in modes (e.g. a driver might
support both AES-CBC and SHA2-256-HMAC separately but not combined
for ETA). Instead, a new 'crypto_probesession' method has been
added to the kobj interface for symmteric crypto drivers. This
method returns a negative value on success (similar to how
device_probe works) and the crypto framework uses this value to pick
the "best" driver. There are three constants for hardware
(e.g. ccr), accelerated software (e.g. aesni), and plain software
(cryptosoft) that give preference in that order. One effect of this
is that if you request only hardware when creating a new session,
you will no longer get a session using accelerated software.
Another effect is that the default setting to disallow software
crypto via /dev/crypto now disables accelerated software.
Once a driver is chosen, 'crypto_newsession' is invoked as before.
- Crypto operations are now solely described by the flat 'cryptop'
structure. The linked list of descriptors has been removed.
A separate enum has been added to describe the type of data buffer
in use instead of using CRYPTO_F_* flags to make it easier to add
more types in the future if needed (e.g. wired userspace buffers for
zero-copy). It will also make it easier to re-introduce separate
input and output buffers (in-kernel TLS would benefit from this).
Try to make the flags related to IV handling less insane:
- CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
member of the operation structure. If this flag is not set, the
IV is stored in the data buffer at the 'crp_iv_start' offset.
- CRYPTO_F_IV_GENERATE means that a random IV should be generated
and stored into the data buffer. This cannot be used with
CRYPTO_F_IV_SEPARATE.
If a consumer wants to deal with explicit vs implicit IVs, etc. it
can always generate the IV however it needs and store partial IVs in
the buffer and the full IV/nonce in crp_iv and set
CRYPTO_F_IV_SEPARATE.
The layout of the buffer is now described via fields in cryptop.
crp_aad_start and crp_aad_length define the boundaries of any AAD.
Previously with GCM and CCM you defined an auth crd with this range,
but for ETA your auth crd had to span both the AAD and plaintext
(and they had to be adjacent).
crp_payload_start and crp_payload_length define the boundaries of
the plaintext/ciphertext. Modes that only do a single operation
(COMPRESS, CIPHER, DIGEST) should only use this region and leave the
AAD region empty.
If a digest is present (or should be generated), it's starting
location is marked by crp_digest_start.
Instead of using the CRD_F_ENCRYPT flag to determine the direction
of the operation, cryptop now includes an 'op' field defining the
operation to perform. For digests I've added a new VERIFY digest
mode which assumes a digest is present in the input and fails the
request with EBADMSG if it doesn't match the internally-computed
digest. GCM and CCM already assumed this, and the new AEAD mode
requires this for decryption. The new ETA mode now also requires
this for decryption, so IPsec and GELI no longer do their own
authentication verification. Simple DIGEST operations can also do
this, though there are no in-tree consumers.
To eventually support some refcounting to close races, the session
cookie is now passed to crypto_getop() and clients should no longer
set crp_sesssion directly.
- Assymteric crypto operation structures should be allocated via
crypto_getkreq() and freed via crypto_freekreq(). This permits the
crypto layer to track open asym requests and close races with a
driver trying to unregister while asym requests are in flight.
- crypto_copyback, crypto_copydata, crypto_apply, and
crypto_contiguous_subsegment now accept the 'crp' object as the
first parameter instead of individual members. This makes it easier
to deal with different buffer types in the future as well as
separate input and output buffers. It's also simpler for driver
writers to use.
- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
This understands the various types of buffers so that drivers that
use DMA do not have to be aware of different buffer types.
- Helper routines now exist to build an auth context for HMAC IPAD
and OPAD. This reduces some duplicated work among drivers.
- Key buffers are now treated as const throughout the framework and in
device drivers. However, session key buffers provided when a session
is created are expected to remain alive for the duration of the
session.
- GCM and CCM sessions now only specify a cipher algorithm and a cipher
key. The redundant auth information is not needed or used.
- For cryptosoft, split up the code a bit such that the 'process'
callback now invokes a function pointer in the session. This
function pointer is set based on the mode (in effect) though it
simplifies a few edge cases that would otherwise be in the switch in
'process'.
It does split up GCM vs CCM which I think is more readable even if there
is some duplication.
- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
as an auth algorithm and updated cryptocheck to work with it.
- Combined cipher and auth sessions via /dev/crypto now always use ETA
mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
This was actually documented as being true in crypto(4) before, but
the code had not implemented this before I added the CIPHER_FIRST
flag.
- I have not yet updated /dev/crypto to be aware of explicit modes for
sessions. I will probably do that at some point in the future as well
as teach it about IV/nonce and tag lengths for AEAD so we can support
all of the NIST KAT tests for GCM and CCM.
- I've split up the exising crypto.9 manpage into several pages
of which many are written from scratch.
- I have converted all drivers and consumers in the tree and verified
that they compile, but I have not tested all of them. I have tested
the following drivers:
Chuck Tuffli [Fri, 27 Mar 2020 15:28:27 +0000 (15:28 +0000)]
bhyve: fix NVMe emulation update of SQHD
The SQHD field of a Completion Queue entry indicates the current
Submission Queue head pointer value. The head pointer represents the
next entry to be consumed and is updated after consuming the current
entry.
In the Admin queue processing, the current code updates the head pointer
after reporting the value to the host via the SQHD. This gives the
impression that the Controller is perpetually one command behind in its
processing of the Admin SQ. And while this doesn't appear to bother some
initiators, it is wrong.
Fix is to update the SQ head pointer prior to writing the SQHD value in
the completion.
While here, fix missed update of dword 0 (cdw0) in the completion
message.