chuck [Mon, 29 Jun 2020 00:32:11 +0000 (00:32 +0000)]
bhyve: add basic NVMe Firmware Commit support
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).
chuck [Mon, 29 Jun 2020 00:32:04 +0000 (00:32 +0000)]
bhyve: validate the NVMe LBA start and count
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.
Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.
chuck [Mon, 29 Jun 2020 00:32:01 +0000 (00:32 +0000)]
bhyve: implement NVMe SMART data I/O statistics
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).
This patch implements counters for:
- Data Units Read
- Data Units Written
- Host Read Commands
- Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.
chuck [Mon, 29 Jun 2020 00:31:58 +0000 (00:31 +0000)]
bhyve: validate NVMe deallocate range values
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.
chuck [Mon, 29 Jun 2020 00:31:54 +0000 (00:31 +0000)]
bhyve: base pci_nvme_ioreq size on advertised MDTS
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.
Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.
chuck [Mon, 29 Jun 2020 00:31:51 +0000 (00:31 +0000)]
bhyve: refactor NVMe I/O read/write
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.
chuck [Mon, 29 Jun 2020 00:31:47 +0000 (00:31 +0000)]
bhyve: implement NVMe Format NVM command
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.
chuck [Mon, 29 Jun 2020 00:31:41 +0000 (00:31 +0000)]
bhyve: add more compliant NVMe Get/Set Features
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.
Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.
Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.
chuck [Mon, 29 Jun 2020 00:31:34 +0000 (00:31 +0000)]
bhyve: fix NVMe Get Log Page command
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.
chuck [Mon, 29 Jun 2020 00:31:27 +0000 (00:31 +0000)]
bhyve: Consolidate NVMe CQ update
Consolidate the code which writes Completion Queue entries and updates
the CQ doorbell value. While in the neighborhood, convert the "toggle CQ
phase bit" code to use an XOR operation instead of an "if/else" branch.
chuck [Mon, 29 Jun 2020 00:31:24 +0000 (00:31 +0000)]
bhyve: add locks around NVMe queue accesses
The NVMe code attempted to ensure thread safety through a combination of
using atomics and a "busy" flag. But this approach leads to unavoidable
race conditions.
Fix is to use per-queue mutex locks to ensure thread safety within the
queue processing code. While in the neighborhood, move all the queue
initialization code to a common function.
chuck [Mon, 29 Jun 2020 00:31:17 +0000 (00:31 +0000)]
bhyve: implement NVMe Flush command
This adds support for the NVMe I/O command Flush. For block-based
devices, submit a DIOCGFLUSH to the backing storage. Otherwise, command
is treated like a NOP and completes with a Successful status.
chuck [Mon, 29 Jun 2020 00:31:14 +0000 (00:31 +0000)]
bhyve: refactor NVMe IO command handling
This refactors the NVMe I/O command processing function to make adding
new commands easier. The main change is to move command specific
processing (i.e. Read/Write) to separate functions for each NVMe I/O
command and leave the common per-command processing in the existing
pci_nvme_handle_io_cmd() function.
While here, add checks for some common errors (invalid Namespace ID,
invalid opcode, LBA out of range).
markj [Sun, 28 Jun 2020 21:35:04 +0000 (21:35 +0000)]
Fix UMA's first-touch policy on systems with empty domains.
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM. When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket. When the bucket is full, it gets placed in another
domain's bucket queue. However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty. This means that a non-empty
domain's bucket queues can grow very rapidly on such systems. For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.
Change cache_alloc() to follow a round-robin policy when running on an
empty domain.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25355
jilles [Sun, 28 Jun 2020 21:15:29 +0000 (21:15 +0000)]
sh/tests: Fix flaky execution/bg12.0
When job control is not enabled, the shell ignores SIGINT while waiting for
a foreground process unless that process exits on SIGINT. In this case, the
foreground process is sleep and it does not exit on SIGINT because the
signal is only sent to the shell. Depending on order of events, this could
cause the SIGINT to be unexpectedly ignored.
On lightly loaded bare metal, the chance of this happening tends to be less
than 0.01% but with higher loads and/or virtualization it becomes more
likely.
Starting the sleep in background and using the wait builtin ensures SIGINT
will not be ignored.
gonzo [Sun, 28 Jun 2020 21:11:10 +0000 (21:11 +0000)]
Configure rx_delay/tx_delay values for RK3399/RK3328 GMAC
For 1000Mb mode to work reliably TX/RX delays need to be configured
between the TX/RX clock and the respective signals on the PHY
to compensate for differing trace lengths on the PCB.
dim [Sun, 28 Jun 2020 18:02:12 +0000 (18:02 +0000)]
Remove older llvm-ranlib.1 entry from ObsoleteFiles.inc, as it has
gotten its own manpage now, and should be no longer be removed by "make
delete-old".
andrew [Sun, 28 Jun 2020 15:03:07 +0000 (15:03 +0000)]
Use EFI memory map to determine attributes for Acpi mappings on arm64.
AcpiOsMapMemory is used for device memory when e.g. an _INI method wants
to access physical memory, however, aarch64 pmap_mapbios is hardcoded to
writeback. Search for the correct memory type to use in pmap_mapbios.
Submitted by: Greg V <greg_unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D25201
pstef [Sat, 27 Jun 2020 19:09:33 +0000 (19:09 +0000)]
ps(1): reuse keyword "cpu" to show CPU number
This flag will now show the processor number on which a process is running.
This change was inspired by PR129965. Initially I didn't think that the
patch attached to it was correct -- it sacrificed ki_estcpu use in "cpu"
for ki_lastcpu and I thought that the old functionality should be kept and
the new (cpu#) one added to it. But I've since discovered that ki_estcpu is
sched_4bsd-specific. What's worse, it represents the same thing as
ki_pctcpu, except ki_pctcpu is universal -- so "%cpu" has been using it
successfully. Therefore, I've decided to replace information based on
ki_estcpu with information based on ki_oncpu/ki_lastcpu.
Key parts of the code and manual changes were borrowed from top(1).
se [Sat, 27 Jun 2020 12:02:01 +0000 (12:02 +0000)]
Import new 2-clause BSD licenced implementation of the bc and dc commands
These implementations of the bc and dc programs offer a number of advantages
compared to the current implementations in the FreeBSD base system:
- They do not depend on external large number functions (i.e. no dependency
on OpenSSL or any other large number library)
- They implements all features found in GNU bc/dc (with the exception of
the forking of sub-processes, which the author of this version considers
as a security issue).
- They are significantly faster than the current code in base (more than
2 orders of magnitude in some of my tests, e.g. for 12345^100000).
- They should be fully compatible with all features and the behavior of the
current implementations in FreeBSD (not formally verified).
- They support POSIX message catalogs and come with localized messages in
Chinese, Dutch, English, French, German, Japanese, Polish, Portugueze,
and Russian.
- They offer very detailed man-pages that provide far more information than
the current ones.
The upstream sources contain a large number of tests, which are not
imported with this commit. They could be integrated into our test
framework at a latter time.
Installation of this version is controlled by the option "MK_GH_BC=yes".
This option will be set to yes by default in 13-CURRENT, but will be off
by default in 12-STABLE.
fernape [Sat, 27 Jun 2020 11:28:11 +0000 (11:28 +0000)]
killall(1): Clarify -d, -s and -v options
-d and -v are not equivalent options. The former is more verbose than the
latter and the former does not actually send the signals while the latter does.
Let them have their own paragraphs.
From the point of view of the output, -v is equivalent to -s, so describe them
close to each other. The difference is that former actually sends the signals
and the latter doesn't.
adrian [Sat, 27 Jun 2020 02:59:51 +0000 (02:59 +0000)]
[ath_hal] Add KeyMiss for AR5212/AR5416 series chips.
This is a flag from the MAC that says the received packet didn't match
a keycache slot. This isn't technically a problem as WEP keys don't
match keycache slots (they're "global" keys), but it could be useful
for tracking down CCMP decryption failures.
Right now it's a no-op - it mirrors what the AR9300 HAL does and it
just increments a counter. But, hey, maybe one day I'll use it for
diagnosing keycache/CCMP decrypt issues.
imp [Fri, 26 Jun 2020 22:05:23 +0000 (22:05 +0000)]
Chroot actually appeared in 7th Edition Unix.
Chroot appeared during the development of 7th edition Unix. The FreeBSD jail
documents, incorrectly, that Bill Joy added this to 4.2BSD on 18 March
1982. That was when Bill Joy converted from a statically coded system call glue
to dynamically generated assembler. Chroot was present in 32V, 3BSD, 4.0BSD, 4.1BSD
and 4.1cBSD well in advance of this. Kirk McKusick agrees with this analysis.
mav [Fri, 26 Jun 2020 19:55:11 +0000 (19:55 +0000)]
Add mostly dummy hw.pci.enable_aspm tunable.
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
cy [Fri, 26 Jun 2020 14:18:08 +0000 (14:18 +0000)]
Add MATCH option for CONFIG_MATCH_IFACE.
If the interfaces on which wpa_supplicant is to run are not known or do
not exist, wpa_supplicant can match an interface when it arrives. Each
matched interface is separated with -M argument and the -i argument now
allows for pattern matching.
As an example, the following command would start wpa_supplicant for a
specific wired interface called lan0, any interface starting with wlan
and lastly any other interface. Each match has its own configuration
file, and for the wired interface a specific driver has also been given.
avg [Fri, 26 Jun 2020 09:46:03 +0000 (09:46 +0000)]
sound/hda: fix interrupt handler endless loop after r362294
Not all interrupt sources that affect CIS bit were acknowledged.
Specifically, bits in STATESTS (aka WAKESTS) were left set.
The fix is to disable WAKEEN and clear STATESTS bits before the HDA
interrupt is enabled. This way we should never get any STATESTS bits.
I also added placeholders for all event bits that we currently do not
enable, do not handle and do not clear. This might get useful when / if
we enable any of them.
Reported by: kib (Apollo Lake hardware)
Tested by: kib (earlier, different change)
MFC after: 2 weeks
X-MFC with: r362294
grehan [Fri, 26 Jun 2020 08:20:38 +0000 (08:20 +0000)]
Prevent calling USB backends multiple times.
The TRB processing loop could potentially call a back-end twice
with the same status transaction. While this was generally benign,
some code paths in the tablet backend weren't set up to handle
this case, resulting in a NULL dereference.
Fix by
- returning a STALL error when an invalid request was seen in the backend
- skipping a call to the backend if the number of packets in a status
transaction was zero (this code fragment was taken from the Intel ACRN
xhci backend)
delphij [Fri, 26 Jun 2020 04:46:45 +0000 (04:46 +0000)]
Don't log normal login_getpwclass(3) result.
The logging was introduced in r314527 but doesn't appear to be useful
for regular operation, and as the result, for users with no class set
(very common) the administrator would see a message like this in their
auth.log:
sshd[44251]: user root login class [preauth]
(note that the class was "" because that's what's typically configured
for most users; we would get 'default' if lc->lc_class is chosen)
Remove this log as it can be annoying as the lookup happen before
authentication and repeats, and our code is not acting upon lc_class
or pw_class directly anyways.
rmacklem [Fri, 26 Jun 2020 03:11:54 +0000 (03:11 +0000)]
Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs
should be used.
For KERN_TLS (and possibly some other future network interface) the mbuf
list passed into sosend() must be ext_pgs mbufs. The krpc could simply
copy all the mbuf data into ext_pgs mbufs before calling sosend(), but
that would be inefficient for large RPC messages.
This patch adds an argument to nfscl_reqstart() to indicate that it should
fill the RPC message into ext_pgs mbufs.
It also adds fields to "struct nfsrv_descript" needed for building NFS RPC
messages in ext_pgs mbufs, along with new flags for this.
Since the argument is always "false", this commit should not result in any
semantic change. However, this commit prepares the code
for future commits that will add support for building of NFS RPC messages
in ext_pgs mbufs.
jhb [Fri, 26 Jun 2020 00:01:31 +0000 (00:01 +0000)]
Reduce contention on per-adapter lock.
- Move temporary sglists into the session structure and protect them
with a per-session lock instead of a per-adapter lock.
- Retire an unused session field, and move a debugging field under
INVARIANTS to avoid using the session lock for completion handling
when INVARIANTS isn't enabled.
- Use counter_u64 for per-adapter statistics.
Note that this helps for cases where multiple sessions are used
(e.g. multiple IPsec SAs or multiple KTLS connections). It does not
help for workloads that use a single session (e.g. a single GELI
volume).
jhb [Thu, 25 Jun 2020 23:57:30 +0000 (23:57 +0000)]
Enter and exit the network epoch for async IPsec callbacks.
When an IPsec packet has been encrypted or decrypted, the next step in
the packet's traversal through the network stack is invoked from a
crypto worker thread, not from the original calling thread. These
threads need to enter the network epoch before passing packets down to
IP output routines or up to transport protocols.
dab [Thu, 25 Jun 2020 21:34:43 +0000 (21:34 +0000)]
Add CAP_EVENT to pidfiles.
CAP_EVENT was omitted on pidfiles (in
pidfile_open()). There seems no reason why a process that creates
and writes a pidfile cannot monitor events on that file. This mod adds
the capability.
markj [Thu, 25 Jun 2020 20:30:30 +0000 (20:30 +0000)]
Implement an approximation of Linux MADV_DONTNEED semantics.
Linux MADV_DONTNEED is not advisory: it has side effects for anonymous
memory, and some system software depends on that. In particular,
MADV_DONTNEED causes anonymous pages to be discarded. If the mapping is
a private mapping of a named object then subsequent faults are to
repopulate the range from that object, otherwise pages will be
zero-filled. For mappings of non-anonymous objects, Linux MADV_DONTNEED
can be implemented in the same way as our MADV_DONTNEED.
This implementation differs from Linux semantics in its handling of
private mappings, inherited through fork(), of non-anonymous objects.
After applying MADV_DONTNEED, subsequent faults will repopulate the
mapping from the parent object rather than the root of the shadow chain.
PR: 230160
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25330
jhb [Thu, 25 Jun 2020 20:17:34 +0000 (20:17 +0000)]
Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().
dim [Thu, 25 Jun 2020 20:04:35 +0000 (20:04 +0000)]
Fix copy/paste mistake in kvm_getswapinfo(3)
It seems this manpage was copied from kvm_getloadavg(3), but the
DIAGNOSTICS section was not updated completely. Update the section with
correct information about a return value of -1.
gordon [Thu, 25 Jun 2020 19:35:37 +0000 (19:35 +0000)]
Revert OPENSSL_NO_SSL3_METHOD to keep ABI compatibility.
This define caused a couple of symbols to disappear. To keep ABI
compatibility, we are going to keep the symbols exposed, but leave SSLv3 as
not in the default config (this is what OPENSSL_NO_SSL3 achieves). The
ramifications of this is an application can still use SSLv3 if it
specifically calls the SSLv3_method family of APIs.