Make linprocfs(5) create /proc/bus/pci/devices/, and linsysfs(5)
create /sys/class/power_supply/. This silences some warnings
from biology/linux-foldingathome.
Reported by: 0mp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25557
Rick Macklem [Sat, 4 Jul 2020 03:28:13 +0000 (03:28 +0000)]
Add support for ext_pgs mbufs to nfscl_reqstart() and nfsm_set().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
Since ND_EXTPG is never set yet, there is no semantic change at this time.
Add domain policy allocation for amd64 fpu_kern_ctx
Like other types of allocation, fpu_kern_ctx are frequently allocated per-cpu.
Provide the API and sketch some example consumers.
fpu_kern_alloc_ctx_domain() preferentially allocates memory from the
provided domain, and falls back to other domains if that one is empty
(DOMAINSET_PREF(domain) policy).
Maybe it makes more sense to just shove one of these in the DPCPU area
sooner or later -- left for future work.
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
cxgbe(4): changes in the Tx path to help increase tx coalescing.
- Ask the firmware for the number of frames that can be stuffed in one
work request.
- Modify mp_ring to increase the likelihood of tx coalescing when there
are just one or two threads that are doing most of the tx. Add teeth
to the abdication mechanism by pushing the consumer lock into mp_ring.
This reduces the likelihood that a consumer will get stuck with all
the work even though it is above its budget.
- Add support for coalesced tx WR to the VF driver. This, with the
changes above, results in a 7x improvement in the tx pps of the VF
driver for some common cases. The firmware vets the L2 headers
submitted by the VF driver and it's a big win if the checks are
performed for a batch of packets and not each one individually.
Rick Macklem [Fri, 3 Jul 2020 01:19:29 +0000 (01:19 +0000)]
Add support for ext_pgs mbufs to nfsm_build().
This is the first of a series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
Since ND_EXTPG is never set yet, there is no semantic change at this time.
Alan Somers [Thu, 2 Jul 2020 13:17:31 +0000 (13:17 +0000)]
Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I
can't reproduce this panic on demand, but this looks like the correct
solution.
The top 10 bits of a pte are reserved by specification[1] and are not part of
the PPN.
[1] 'Volume II: RISC-V Privileged Architectures V20190608-Priv-MSU-Ratified',
'4.4.1 Addressing and Memory Protection', page 72: "The PTE format for Sv39 is
shown in Figure 4.18. ... Bits 63–54 are reserved for future use and must be
zeroed by software for forward compatibility."
Andrew Turner [Wed, 1 Jul 2020 16:17:51 +0000 (16:17 +0000)]
Move ID reading signatures to a better header
The functions to read the common user and kernel ID registers should be
in cpu.h rather than undefined.h as they are related to CPU details and
used by undefined instruction handlers.
Update with the members of the 11th core team, core.xi
- Update the core-secretary role.
- Update the comment to mention that the sorting is done based on FreeBSD
login name
Andrew Turner [Wed, 1 Jul 2020 15:17:45 +0000 (15:17 +0000)]
Read the arm64 ID registers earlier in the boot process.
Also move parsing the registers to just after the secondary CPUs have
started. This means the kernel register view from all CPUs is available
after the CPU SYSINITs have finished, e.g. for use by ifunc resolvers.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25505
Andrew Turner [Wed, 1 Jul 2020 12:07:28 +0000 (12:07 +0000)]
Simplify the flow when getting/setting an isrc
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.
Rework linux accept(2). This makes the code flow easier to follow,
and fixes a bug where calling accept(2) could result in closing fd 0.
Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy. Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.
It fixes Redis, among other things.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25461
The "pid" field in the LinuxKPI task struct is typically set to the thread ID
and not the process ID. Make sure the linux_task_exiting() function uses tdfind()
to lookup the BSD procedure structure pointer by the "pid" field, and only
fallback to pfind() when no match is found! This makes linux_task_exiting()
in line with the rest of the code.
Ryan Moeller [Wed, 1 Jul 2020 02:32:41 +0000 (02:32 +0000)]
libifconfig: Add function to get bridge status
The new function operates similarly to ifconfig_lagg_get_lagg_status and
likewise is accompanied by a function to free the bridge status data structure.
I have included in this patch the relocation of some strings describing STP
parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they
are useful for consumers of libifconfig.
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.
Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234." However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.
Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0. This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered. That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.
(This change does not add support for the 'bootcode' verb to EBR.)
PR: 232463
Reported by: Manish Jain <bourne.identity AT hotmail.com>
Discussed with: ae (no objection)
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D24939
o cond.c: do not eval unnecessary terms of conditionals.
o meta.c: report error if lseek in filemon_read fails
o str.c: performance improvement for Str_Match for multiple '*'
o dieQuietly: supress the failure output from make
when failing node is a sub-make or a sibling failed.
This cuts down greatly on unhelpful noise at the end of
build log. Disabled by -dj or .MAKE.DIE_QUIETLY=no
SSLv3 has been deprecated since 2015 (and broken since 2014: "POODLE"); it
should not have shipped in FreeBSD 11 (2016) or 12 (2018). No one should use
it, and if they must, they can use some implementation outside of base.
There are three symbols removed with OPENSSL_NO_SSL3_METHOD:
These symbols exist to request an explicit SSLv3 connection to a server.
There is no good reason for an application to link or invoke these symbols
instead of TLS_method(), et al (née SSLv23_method, et al). Applications
that do so have broken cryptography.
Define these symbols for some pedantic definition of ABI stability, but
remove the functionality again (r361392) after r362620.
- Add CCM driver and clocks implementations for i.MX 8M
- Add GPC driver for iMX8
- Add clock tree for i.MX 8M Quad
- Add clocks support and new compat strings (where required) for existing i.MX 6 UART, I2C, and GPIO drivers
- Enable aarch64-compatible drivers form i.MX 6 in arm64 GENERIC kernel config
- Add dtb/imx8 kernel module with DTBs for Nitrogen8M and iMX8MQ EVK
With this patch both Nitrogen8M and iMX8MQ EVK boot with NFS root up to multiuser login prompt
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D25274
Adrian Chadd [Wed, 1 Jul 2020 00:23:49 +0000 (00:23 +0000)]
[net80211] Migrate HT/legacy protection mode and preamble calculation to per-VAP flags
The later firmware devices (including iwn!) support multiple configuration
contexts for a lot of things, leaving it up to the firmware to decide
which channel and vap is active. This allows for things like off-channel
p2p sta/ap operation and other weird things.
However, net80211 is still focused on a "net80211 drives all" when it comes to driving
the NIC, and as part of this history a lot of these options are global and not per-VAP.
This is fine when net80211 drives things and all VAPs share a single channel - these
parameters importantly really reflect the state of the channel! - but it will increasingly
be not fine when we start supporting more weird configurations and more recent NICs.
Yeah, recent like iwn/iwm.
Anyway - so, migrate all of the HT protection, legacy protection and preamble
stuff to be per-VAP. The global flags are still there; they're now calculated
in a deferred taskqueue that mirrors the old behaviour. Firmware based drivers
which have per-VAP configuration of these parameters can now just listen to the
per-VAP options.
What do I mean by per-channel? Well, the above configuration parameters really
are about interoperation with other devices on the same channel. Eg, HT protection
mode will flip to legacy/mixed if it hears ANY BSS that supports non-HT stations or
indicates it has non-HT stations associated. So, these flags really should be
per-channel rather than per-VAP, and then for things like "do i need short preamble
or long preamble?" turn into a "do I need it for this current operating channel".
Then any VAP using it can query the channel that it's on, reflecting the real
required state.
This patch does none of the above paragraph just yet.
I'm also cheating a bit - I'm currently not using separate taskqueues for
the beacon updates and the per-VAP configuration updates. I can always further
split it later if I need to but I didn't think it was SUPER important here.
So:
* Create vap taskqueue entries for ERP/protection, HT protection and short/long
preamble;
* Migrate the HT station count, short/long slot station count, etc - into per-VAP
variables rather than global;
* Fix a bug with my WME work from a while ago which made it per-VAP - do the WME
beacon update /after/ the WME update taskqueue runs, not before;
* Any time the HT protmode configuration changes or the ERP protection mode
config changes - schedule the task, which will call the driver without the
net80211 lock held and all correctly serialised;
* Use the global flags for beacon IEs and VAP flags for probe responses and
other IE situations.
The primary consumer of this is ath10k. iwn could use it when sending RXON,
but we don't support IBSS or AP modes on it yet, and I'm not yet sure whether
it's required in STA mode (ie whether the firmware parses beacons to change
protection mode or whether we need to.)
Tested:
* AR9280, STA/AP
* AR9380, DWDS STA+STA/AP
* ath10k work, STA/AP
* Intel 6235, STA
* Various rtwn / run NICs, DWDS STA and STA configurations
Toomas Soome [Tue, 30 Jun 2020 21:48:58 +0000 (21:48 +0000)]
boot1.efi: use malloc family from libsa
The zfs reader development did reach to the point where linking boot1,
we will get errors about duplicate symbols Malloc, Free, Calloc.
We can just use libsa version, just as loader.efi does. The only concern is,
libsa zalloc is using fixed size heap region, I did pick 64MB as other
stage instances are using, but this size is likely not optimal. In any case,
with limited memory setups, we should boot loader.efi directly.
Make linprocfs(5) create the /proc/<PID>/task/ directores.
This is to silence down some Chromium assertions.
PR: kern/240991
Analyzed by: Alex S <iwtcex@gmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25256
Andrew Turner [Tue, 30 Jun 2020 15:58:29 +0000 (15:58 +0000)]
Add dwc_otg_acpi
Create an acpi attachment for the DWC USB OTG device. This is present in
the Raspberry Pi 4 in the USB-C port normally used to power the board. Some
firmware presents the kernel with ACPI tables rather than FDT so we need
an ACPI attachment.
Submitted by: Greg V <greg_unrelenting.technology>
Approved by: hselasky (removal of All rights reserved)
Differential Revision: https://reviews.freebsd.org/D25203
Mark Johnston [Tue, 30 Jun 2020 15:56:54 +0000 (15:56 +0000)]
Remove CRYPTO_TIMING.
It was added a very long time ago. It is single-threaded, so only
really useful for basic measurements, and in the meantime we've gotten
some more sophisticated profiling tools.
Alan Somers [Mon, 29 Jun 2020 22:12:23 +0000 (22:12 +0000)]
savecore: accept device names without the /dev/ prefix
dumpon has accepted device names without the prefix ever since r291207.
Since dumpon and savecore are always paired, they ought to accept the same
arguments. Prior to this change, specifying 'dumpdev="da3"' in
/etc/rc.conf, for example, would result in dumpon working just fine but
savecore complaining that "Dump device does not exist".
Mitchell Horne [Mon, 29 Jun 2020 19:30:35 +0000 (19:30 +0000)]
Fix printf(3) output of long doubles on RISC-V
When the RISC-V port was initially committed to FreeBSD, GCC would
generate 64-bit long doubles, and the definitions in _fpmath.h reflected
that. This was changed to 128-bit in GCC later that year [1], but the
definitions were never updated, despite the documented workaround. This
causes printf(3) and friends to interpret only the low 64-bits of a long
double in ldtoa, thereby printing incorrect values.
Update the definitions now that both clang and GCC generate 128-bit long
doubles.
Kyle Evans [Mon, 29 Jun 2020 17:47:00 +0000 (17:47 +0000)]
linux: reposition the comment for bsd_to_linux_bits/linux_to_bsd_bits
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both. Move the comment up (or the definition
down, depending on your perspective on life) accordingly.
John Baldwin [Mon, 29 Jun 2020 17:19:08 +0000 (17:19 +0000)]
Stop using STATIC_CFLAGS.
This was added in r293648 to pass -mlong-calls for crt1.o and gcrt1.o.
The use of -mlong-calls was removed in r358851 for LLVM 10.0, leaving
STATIC_CFLAGS empty.
Conrad Meyer [Mon, 29 Jun 2020 16:54:00 +0000 (16:54 +0000)]
vm: Add missing WITNESS warnings for M_WAITOK allocation
vm_map_clip_{end,start} and lookup_clip_start allocate memory M_WAITOK
for !system_map vm_maps. Add WITNESS warning annotation for !system_map
callers who may be holding non-sleepable locks.
* Add examples showing the use of -f, -C, -s, -n
* Rework the two already present examples that were *format* examples
* Remove .Tn suggested by mandoc(1)
* Remove reference to gdb(1) since it is not present in current
Ed Maste [Mon, 29 Jun 2020 13:30:48 +0000 (13:30 +0000)]
Revert r362261, "Re-apply r333944 to unbreak ports"
A file update in 2018 broke many ports as it misidentified shared
libraries as PIE binaries. r333944 reverted part of the change,
restoring ports builds but misidentifying objects in the opposite
direction.
Earlier this month file 5.39 was imported, and then the change
originally from r333944 was recommitted as r362261. However, the
issue was fixed upstream, so r362261 serves no purpose.
PR: 246960, 247461 [exp-run]
Sponsored by: The FreeBSD Foundation
Andrew Turner [Mon, 29 Jun 2020 09:08:36 +0000 (09:08 +0000)]
Create a kernel arm64 ID register view
In preparation for using ifuncs in the kernel is is useful to have a common
view of the arm64 ID registers across all CPUs. Add this and extract the
logic for finding the lower value of two fields to a new helper function.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25463
Kyle Evans [Mon, 29 Jun 2020 03:09:14 +0000 (03:09 +0000)]
linuxolator: implement memfd_create syscall
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
Chuck Tuffli [Mon, 29 Jun 2020 00:32:24 +0000 (00:32 +0000)]
bhyve: fix NVMe Active Namespace list
The NVMe specification requires unused entries in the Identify, Active
Namespace ID data to be zero. Fix is bzero the provided page, similar to
what is done for the Namespace Descriptors list.
Chuck Tuffli [Mon, 29 Jun 2020 00:32:21 +0000 (00:32 +0000)]
bhyve: NVMe handle zero length DSM ranges
Dataset Management range specifications may have a zero length (a.k.a.
an empty range definition). Handle the case of all ranges being empty by
completing with Success (DSM commands are advisory only). For
Deallocate, skip empty range definitions when sending TRIM's to the
backing storage.
Chuck Tuffli [Mon, 29 Jun 2020 00:32:18 +0000 (00:32 +0000)]
bhyve: fix NVMe Get Features, Predictable Latency
If the Predictable Latency Mode is not supported, NVMe Controllers must
return Invalid Field in Command status for the Get Features command
with IDs:
- Predictable Latency Mode Config
- Predictable Latency Mode Window
Chuck Tuffli [Mon, 29 Jun 2020 00:32:11 +0000 (00:32 +0000)]
bhyve: add basic NVMe Firmware Commit support
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).
Chuck Tuffli [Mon, 29 Jun 2020 00:32:04 +0000 (00:32 +0000)]
bhyve: validate the NVMe LBA start and count
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.
Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.
Chuck Tuffli [Mon, 29 Jun 2020 00:32:01 +0000 (00:32 +0000)]
bhyve: implement NVMe SMART data I/O statistics
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).
This patch implements counters for:
- Data Units Read
- Data Units Written
- Host Read Commands
- Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:58 +0000 (00:31 +0000)]
bhyve: validate NVMe deallocate range values
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:54 +0000 (00:31 +0000)]
bhyve: base pci_nvme_ioreq size on advertised MDTS
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.
Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:51 +0000 (00:31 +0000)]
bhyve: refactor NVMe I/O read/write
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:47 +0000 (00:31 +0000)]
bhyve: implement NVMe Format NVM command
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:41 +0000 (00:31 +0000)]
bhyve: add more compliant NVMe Get/Set Features
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.
Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.
Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.
Chuck Tuffli [Mon, 29 Jun 2020 00:31:34 +0000 (00:31 +0000)]
bhyve: fix NVMe Get Log Page command
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.