
[net80211] Fix interrupted scan logic and ticks comparison

The scan task refactoring stuff circa 2014-2016 broke the blocking task
into a taskqueue with some async bits, but it apparently broke scans
being interrupted by traffic.

Notably - the new "field" SCAN_PAUSE sets both SCAN_INTERRUPT and SCAN_CANCEL,
and a bunch of existing code was checking for SCAN_CANCEL only and breaking
the scan. Unfortunately it was then (a) cancelling the scan entirely and
(b) not notifying userland that scan was done.

So:
* Update the calls to scan_end() to only pass in 1 (saying the scan is complete)
  if SCAN_CANCEL is set WITHOUT SCAN_INTERRUPT. If both are set then yes,
  the scan is interrupted, but it isn't canceled - it's just paused.
* Update the "did the scan flags change whilst the driver was called" logic
  to check for canceled scans, not interrupted scans.
* The "scan done" logic now explicitly checks for either interrupted or
  completed scans. This accounts for the situation where a scan is being
  aborted via traffic but it ALSO happens to have finished (ie the last
  channel was checked.)
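A minimal sketch of the corrected flag test, assuming hypothetical bit values (net80211's real constants may differ). The point is that a naive `flags & SCAN_CANCEL` check is also true for a paused scan, because SCAN_PAUSE sets both bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define SCAN_INTERRUPT	0x0004u
#define SCAN_CANCEL	0x0008u
#define SCAN_PAUSE	(SCAN_INTERRUPT | SCAN_CANCEL)

/* A real cancel: SCAN_CANCEL set WITHOUT SCAN_INTERRUPT. */
static bool
scan_is_canceled(uint32_t flags)
{
	return ((flags & (SCAN_CANCEL | SCAN_INTERRUPT)) == SCAN_CANCEL);
}

/* Interrupted by traffic (paused), not canceled. */
static bool
scan_is_paused(uint32_t flags)
{
	return ((flags & SCAN_PAUSE) == SCAN_PAUSE);
}
```

With this split, only `scan_is_canceled()` would justify telling scan_end() that the scan is complete.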

This doesn't ENTIRELY fix scanning as the resume function is broken
due to incorrect ticks math. Thus, the second half of this patch
changes the ieee80211_ticks_*() macros to use int instead of long,
matching what the TCP code does with ticks, which handles wrapping
and negative ticks values. With a cast to long, the wrapping math
doesn't work (i.e., once ticks has gone negative, which happens after
the system has been up for a while).

This allows contbgscan() to correctly calculate whether a scan should
continue based on ticks and ic->ic_lastdata.
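A minimal sketch of why the width matters, using hypothetical helper names rather than the real ieee80211_ticks_*() macros. `long long` stands in for the 64-bit `long` of LP64 FreeBSD platforms so the sketch behaves the same everywhere. The int version takes the difference modulo 2^32, so a tick value just past the wrap still compares as "later".

```c
#include <limits.h>
#include <stdbool.h>

/*
 * "a is later than b" for a wrapping 32-bit tick counter: subtract as
 * unsigned (no signed-overflow UB), then reinterpret as int.  The final
 * conversion assumes two's complement, as FreeBSD's platforms are.
 */
static bool
ticks_after_int(int a, int b)
{
	return ((int)((unsigned int)a - (unsigned int)b) > 0);
}

/* Widened variant: only correct until the counter wraps. */
static bool
ticks_after_wide(int a, int b)
{
	return ((long long)a - (long long)b > 0);
}
```

For example, with `before = INT_MAX - 5` and `after = INT_MIN + 10` (16 ticks later, across the wrap), only the int version reports `after` as later.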

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D25031

libifconfig: remove redundant NULL check

Submitted by: Puneeth_kumar.Jothaiah@emc.com
Reported by: Coverity
Sponsored by: Dell EMC Isilon

Properly check whether the divert(4) module is present in the relevant tests

Fix the netinet/netinet6 divert tests falsely reporting 'ipdivert module is
not loaded' when the divert module is built into the kernel.

Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D25026

linuxkpi: Add kstrtou16

This function converts a char * to a u16.
It simply uses strtoul() and detects ERANGE by casting the result to u16
and comparing it with the uncast value.
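A userland sketch of the strtoul()-then-cast approach. The name kstrtou16_sketch is hypothetical and this is not the linuxkpi source; it follows Linux's conventions of returning 0 or a negative errno, accepting one trailing newline, and leaving *res untouched on error.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

static int
kstrtou16_sketch(const char *cp, unsigned int base, uint16_t *res)
{
	char *end;
	unsigned long temp;

	temp = strtoul(cp, &end, base);
	if (*end == '\n')		/* tolerate a single trailing newline */
		end++;
	if (*cp == '\0' || *end != '\0')
		return (-EINVAL);	/* empty string or trailing garbage */
	if (temp != (uint16_t)temp)
		return (-ERANGE);	/* value does not survive the cast */
	*res = (uint16_t)temp;
	return (0);
}
```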

Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24996

linuxkpi: Add rcu_swap_protected

This macro swaps an RCU pointer with a normal pointer.
The condition only seems to be used for debugging/warnings under Linux, so
it is ignored for now.
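A userland sketch of the swap semantics using C11 atomics; the macro name is hypothetical and it is not the linuxkpi definition. It assumes the caller holds the update-side lock (so a relaxed load suffices), uses a release store so readers see a fully initialized object, and, as in the commit, ignores the condition argument. It relies on the GCC/Clang `__typeof__` extension.

```c
#include <stdatomic.h>

/* rcu_ptr must be an _Atomic pointer; c is accepted but never evaluated. */
#define rcu_swap_protected_sketch(rcu_ptr, ptr, c) do {			\
	__typeof__(ptr) _old =						\
	    atomic_load_explicit(&(rcu_ptr), memory_order_relaxed);	\
	atomic_store_explicit(&(rcu_ptr), (ptr), memory_order_release);	\
	(ptr) = _old;							\
} while (0)
```

After the call, the RCU pointer holds the new object and `ptr` holds the old one, ready to be freed after a grace period.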

Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24954

linuxkpi: Add overflow.h

Only add check_add_overflow and check_mul_overflow, as those are the only
two functions needed by DRM v5.3.
Both gcc and clang have builtins for this check, so use them directly,
but throw an error if the compiler/code checker doesn't support these builtins.
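A sketch of the builtin-or-error approach, not the actual linuxkpi overflow.h. The feature probe is an assumption: it trusts `__has_builtin` where available and otherwise accepts GCC 5 or newer, which has the builtins but predates `__has_builtin`.

```c
#include <limits.h>
#include <stdbool.h>

#if defined(__has_builtin)
#  if __has_builtin(__builtin_add_overflow) && __has_builtin(__builtin_mul_overflow)
#    define HAVE_OVERFLOW_BUILTINS 1
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 5
#  define HAVE_OVERFLOW_BUILTINS 1	/* GCC 5-9: builtins, but no __has_builtin */
#endif

#ifndef HAVE_OVERFLOW_BUILTINS
#error "compiler lacks __builtin_add_overflow/__builtin_mul_overflow"
#endif

/* Each returns true on overflow; *d receives the wrapped result either way. */
#define check_add_overflow(a, b, d)	__builtin_add_overflow((a), (b), (d))
#define check_mul_overflow(a, b, d)	__builtin_mul_overflow((a), (b), (d))
```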

Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D25015

ifconfig(8): spell "groupname" consistently with SYNOPSIS.

MFC after: 1 week

Support creating and using arm64 pmap at stage 2

Add minimal support for creating stage 2 IPA -> PA mappings. For this we
need to:

- Create a new vmid set to allocate a vmid for each Virtual Machine
- Add the missing stage 2 attributes
- Use these in pmap_enter to create a new mapping
- Handle stage 2 faults

The vmid set is based on the current asid set that was generalised in
r358328. It adds a function pointer for bhyve to use when the kernel needs
to reset the vmid set. This will need to call into EL2 and invalidate the
TLB.
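A simplified sketch of a generation-based ID allocator of the kind described, with every name hypothetical and not taken from the arm64 pmap. When the ID space runs out, the set starts a new generation, forgets all assignments, and calls a reset hook (the role the bhyve function pointer would play, e.g. calling into EL2 to invalidate the TLB). Unlike a real allocator, it does not preserve the IDs of currently running VMs across a rollover.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMID_BITS	4	/* tiny for illustration; arm64 VMIDs are 8 or 16 bits */
#define VMID_COUNT	(1u << VMID_BITS)

struct vmid_set {
	bool	 used[VMID_COUNT];
	uint32_t next;		/* where the next search starts */
	uint64_t generation;	/* bumped each time the set is recycled */
	void	(*reset)(void);	/* hypothetical hook, e.g. "call EL2, flush TLB" */
};

struct vm_vmid {
	uint32_t vmid;
	uint64_t generation;	/* 0 = never assigned */
};

/* Give vm a VMID valid in the current generation, recycling the set if full. */
static void
vmid_alloc(struct vmid_set *set, struct vm_vmid *vm)
{
	if (vm->generation == set->generation)
		return;		/* current VMID is still valid */
	for (;;) {
		for (uint32_t i = 0; i < VMID_COUNT; i++) {
			uint32_t id = (set->next + i) % VMID_COUNT;

			if (id == 0 || set->used[id])
				continue;	/* VMID 0 is reserved in this sketch */
			set->used[id] = true;
			set->next = id + 1;
			vm->vmid = id;
			vm->generation = set->generation;
			return;
		}
		/* Exhausted: new generation, so stale translations must go. */
		memset(set->used, 0, sizeof(set->used));
		set->generation++;
		if (set->reset != NULL)
			set->reset();
	}
}
```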

The stage 2 attributes have been added. To simplify setting these fields
two new functions are added to get the memory type and protection fields.
These are slightly different on stage 1 and stage 2 tables. We then use
them in pmap_enter to set the new level 3 entry to be stored.

The D-cache on all entries is cleaned to the point of coherency. This is
to allow the data to be visible to the VM. To allow userspace to load
code, an invalid entry is created when a new executable mapping is
entered. When the VM tries to use it, the I-cache is invalidated. As the
D-cache has already been cleaned, this ensures the I-cache is
synchronised with the D-cache.

When the hardware implements a VPIPT I-cache we need to either have the
correct VMID set or invalidate it from EL2. As the host kernel will have
the wrong VMID set we need to call into EL2 to clean it. For this a second
function pointer is added that is called when this invalidation is needed.

Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D23875

[ata_da] remove duplicate definition; it trips up ye olde gcc-6 on mips32

Checked first with: irc

Properly sort ifdef archs in vm_fault_soft_fast superpage guards.

Sort broken in r360887.

powerpc/mmu: Convert PowerPC pmap drivers to ifunc from kobj

With IFUNC support in the kernel, we can finally get rid of our poor-man's
ifunc for pmap, utilizing kobj. Since moea64 uses a second tier kobj as
well, for its own private methods, this adds a second pmap install function
(pmap_mmu_init()) to perform pmap 'post-install pre-bootstrap'
initialization, before the IFUNCs get initialized.

Reviewed by: bdragon

[PowerPC] Fix invalid asm in trap code

In this context, 0 actually means 0 (i.e. this is a li instruction).

While most assemblers will ignore this, I did have a compile failure at one
point when using an external toolchain.

In the future, we should use the li syntax to make this clearer.

Sponsored by: Tag1 Consulting, Inc.

ice(4): Introduce new driver for Intel E800 Ethernet controllers

The ice(4) driver is the driver for the Intel E8xx series Ethernet
controllers; currently with codenames Columbiaville and
Columbia Park.

These new controllers support 100G speeds, as well as introducing
more queues, better virtualization support, and more offload
capabilities. Future work will enable virtual functions (like
in ixl(4)) and the other functionality outlined above.

For full functionality, the kernel should be compiled with
"device ice_ddp" like in the amd64 NOTES file, and/or
ice_ddp_load="YES" should be added to /boot/loader.conf so that
the DDP package file included in this commit can be downloaded
to the adapter. Otherwise, the adapter will fall back to a single
queue mode with limited functionality.

A man page for this driver will be forthcoming.

MFC after: 1 month
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21959

x86: Detect new feature bits

Fix an off-by-one in AVX512VPOPCNTDQ identification. That was actually the
TME bit.
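A small illustration of the off-by-one. Per the Intel SDM, CPUID.(EAX=07H, ECX=0):ECX bit 13 is TME and bit 14 is AVX512_VPOPCNTDQ; the macro and function names below are illustrative, not FreeBSD's specialreg.h names.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID7_ECX_TME			(1u << 13)
#define CPUID7_ECX_AVX512_VPOPCNTDQ	(1u << 14)

static bool
has_avx512_vpopcntdq(uint32_t leaf7_ecx)
{
	/* Testing bit 13 here would be the off-by-one: that bit is TME. */
	return ((leaf7_ecx & CPUID7_ECX_AVX512_VPOPCNTDQ) != 0);
}
```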

Reported by: debdrup

Add version indicators to rtld.

It is wrong to rely on __FreeBSD_version, either from
include/param.h, kernel, or libc, to check for rtld features.
Rtld might be from a newer world than the running userspace.

Add special private symbols exported by rtld itself, to indicate the
changes in runtime behavior, and features that cannot be otherwise
detected or deduced at runtime.

Note that the symbols are not exported from libc, so they intentionally
cannot be linked against; they are exported from rtld's private namespace.
Consumers are required to use dlsym(3).  For instance, for
_rtld_version_laddr_offset, a consumer should do
    ptr = dlsym(RTLD_DEFAULT, "_rtld_version_laddr_offset");
or even
    ptr = dlvsym(RTLD_DEFAULT, "_rtld_version_laddr_offset",
        "FBSDprivate_1.0");
A non-NULL ptr means that the change is present.

Also add _rtld_version__FreeBSD_version indicator to report the
headers version used at time of the rtld build.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24982

Properly check kern_sg_entries for S/G list.

ctl_data_print() is called in core context, so it does not even know the
meaning of ext_sg_entries.

MFC after: 1 week
Sponsored by: iXsystems, Inc.

[PowerPC] Fix atomic_cmpset_masked().

A recent kernel change caused the previously unused atomic_cmpset_masked() to
be used.

It had a typo in it.

Instead of reading the old value from an uninitialized variable, read it
from the passed-in pointer as intended.

This fixes crashes on 64 bit Book-E.
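A C11 sketch of what a masked compare-and-set does, not the PowerPC assembly implementation; the function name is hypothetical and cmpval/newval are assumed to be pre-shifted into the field that mask selects. The key step, which the typo broke, is seeding the loop with the current contents of *p.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static bool
cmpset_masked_sketch(_Atomic uint32_t *p, uint32_t cmpval, uint32_t newval,
    uint32_t mask)
{
	/* Start from the current word at *p, not an uninitialized local. */
	uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

	for (;;) {
		if ((old & mask) != (cmpval & mask))
			return (false);
		uint32_t desired = (old & ~mask) | (newval & mask);
		/* On failure, old is refreshed with the current value of *p. */
		if (atomic_compare_exchange_weak(p, &old, desired))
			return (true);
	}
}
```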

Obtained from: jhibbits

Fix entering KDB with dtrace-enabled kernel.

Reviewed by: markj, jhb
Differential Revision: https://reviews.freebsd.org/D24018

Rename dmar_get_dma_tag() to acpi_iommu_get_dma_tag().
This is needed to support a new IOMMU controller.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D24943

Update ENA driver version to v2.2.0

The driver version upgrade is connected with support for new device
features, like Tx drops reporting or disabling meta caching.

Moreover, driver configuration through sysctl was reworked to
provide a safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.

In addition, many minor bug fixes and improvements were made.

Copyright date in the license of the modified files in this release was
updated to 2020.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Refactor ena_tx_map_mbuf() function

bus_dmamap_load_mbuf_sg() does not guarantee that the mbuf chain
segments match the DMA physical segments.

This patch ensures correct mapping of the LLQ header and DMA segments.
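A sketch of the general technique, not ena_tx_map_mbuf() itself: since segment boundaries cannot be assumed to match mbuf boundaries, walk the DMA segment list to copy the first hdr_len bytes into the LLQ header buffer and record exactly where the remaining payload begins. struct seg and split_header() are hypothetical names, and plain memory stands in for DMA segments.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct seg {
	const uint8_t	*addr;
	size_t		 len;
};

/* Returns header bytes copied; *pay_seg/*pay_off locate the payload start. */
static size_t
split_header(const struct seg *segs, size_t nsegs, uint8_t *hdr,
    size_t hdr_len, size_t *pay_seg, size_t *pay_off)
{
	size_t copied = 0, i = 0, off = 0;

	while (i < nsegs && copied < hdr_len) {
		size_t n = segs[i].len - off;

		if (n > hdr_len - copied)
			n = hdr_len - copied;
		memcpy(hdr + copied, segs[i].addr + off, n);
		copied += n;
		off += n;
		if (off == segs[i].len) {	/* segment used up, move on */
			i++;
			off = 0;
		}
	}
	*pay_seg = i;
	*pay_off = off;
	return (copied);
}
```

A header may end in the middle of a segment, in which case the first payload descriptor must start at that segment's offset rather than at a segment boundary.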

Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc.

Fix double-free bug within ena_detach()

ena_free_all_io_rings_resources() is called twice on device
detach:

ena_detach():
    ena_destroy_device():
        /* First call */
        ena_free_all_io_rings_resources()

    /* Second call */
    ena_free_all_io_rings_resources()

The double-free causes panic() on kldunload, for example.

As ena_destroy_device() is also called by ena_reset_task(), it is
better left unchanged. Thus, remove the "Second call" of the function.

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Allow disabling meta caching for ENA Tx path

Whether caching is disabled is determined by a flag passed from the
device. No metadata is set within ena_tx_csum when caching is disabled.

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Create ENA IO queues with optional backoff

If the requested IO queue size is not supported, try decreasing it
until the highest value that can be satisfied is found.
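A sketch of such a backoff loop under stated assumptions: the creation hook, its ENOMEM convention, and the power-of-two sizes are hypothetical stand-ins, not the ENA driver's API. The size is halved on each failure until it would drop below a minimum.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device hook: 0 on success, ENOMEM if this size is too big. */
typedef int (*create_queues_fn)(uint32_t size, void *arg);

/* On success *size holds the size actually used; on failure it is unchanged. */
static int
create_queues_with_backoff(uint32_t *size, uint32_t min_size,
    create_queues_fn create, void *arg)
{
	uint32_t cur = *size;

	for (;;) {
		int rc = create(cur, arg);

		if (rc == 0) {
			*size = cur;
			return (0);
		}
		if (rc != ENOMEM || cur / 2 < min_size)
			return (rc);
		cur /= 2;	/* assumes power-of-two ring sizes */
	}
}

/* Example stub standing in for the device: accepts sizes up to *(uint32_t *)arg. */
static int
fake_create(uint32_t size, void *arg)
{
	return (size <= *(const uint32_t *)arg ? 0 : ENOMEM);
}
```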

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add sysctl node for ENA IO queues number adjustment

By default, in ena_attach() the driver attempts to acquire
ena_adapter::max_num_io_queues MSI-X vectors for IO queues; however,
this is not guaranteed. The number of vectors acquired also depends on
the availability of system resources.

Regardless of that, allow the number of IO queues actually in use to
be further limited through the sysctl node.

Example: Assuming that there are 8 IO queues configured by default, the
command

$ sysctl dev.ena.0.io_queues_nb=4

will reduce the number of available IO queues to 4. Similarly, the value
can also be increased up to the maximum supported value. A value higher
than the maximum supported number of IO queues is ignored, as is zero.

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Fix assumptions about number of IO queues in the ENA

Make ena_adapter::num_io_queues the number of IO queues actually in
use. While ena_adapter::max_num_io_queues is an upper bound specified
by the HW, ena_adapter::num_io_queues may be lower than that,
depending on the runtime availability of system resources.

On reset, ena_destroy_device() is called, followed by
ena_restore_device(). The latter in turn calls ena_enable_msix(),
which attempts to acquire ena_adapter::max_num_io_queues MSI-X
vectors again.

Thus, the value of ena_adapter::num_io_queues may differ before
and after reset. For this reason, free the IO rings structures (drbr,
counters) in ena_destroy_device() and allocate them again in
ena_restore_device().

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Rework ENA Tx buffer ring size reconfiguration

This method has been aligned with the way the Rx queue size is
updated, so it is now done synchronously instead of by resetting the
device.

Moreover, the input parameter is now validated to be a power of 2.
Without this check, an invalid value could cause a kernel panic.
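A sketch of such a validation check; the helper name is hypothetical. `x & (x - 1)` clears the lowest set bit, so it is zero exactly when at most one bit is set. The explicit `x != 0` test rejects 0, which the powerof2() macro in sys/param.h would accept.

```c
#include <stdbool.h>
#include <stdint.h>

static bool
ring_size_valid(uint32_t x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}
```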

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Rework ENA Rx queue size configuration

This patch reworks how the Rx queue size is reconfigured and how
the information from the device is processed.

Reconfiguring the queues and resetting the device to make the changes
take effect isn't the best approach. The reconfiguration can be done
synchronously, which lets the driver report to the user whether it
was successful. It is now done in the ena_update_queue_size()
function.

To avoid reallocating the ring buffer and statistics counters and
reinitializing the mutexes when only a new size has to be assigned,
the IO queue initialization function has been split into 2 stages:
basic, which just copies the appropriate fields, and advanced, which
allocates and initializes the more complex structures for the IO rings.

Moreover, the maximum allowed Rx and Tx ring sizes are now kept
statically in the adapter, and the variables holding those values
have been changed to uint32_t everywhere.

Information about the IO queue sizes is now logged in the up routine
instead of in attach.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add le_connect command to connect to an LE device.

PR: 246664
Submitted by: Marc Veldman

Mark the ENA driver as epoch ready

Recent changes to epoch handling require drivers to declare that they
are epoch-aware, to prevent the packet input function from entering the
epoch each time a packet is received.

ENA is using NET_TASK for handling Rx, so it's entering epoch
automatically whenever this task is being executed.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Improve indentation in ena_up() and ena_down()

Negating the conditional check for ENA_FLAG_DEV_UP lets the body of the
function use less indentation, which makes the code cleaner.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Expose argument names for non static ENA driver functions

As functions declared in the header files are intended to be the
interface and are going to be used by other files, it's better to
include argument names in the declarations, so callers don't have to
check the .c file to learn their meaning and order.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Use single global lock in the ENA driver

Previously, the driver had 2 global locks: an sx lock used for
up/down synchronization, and a mutex used for link configuration
and the timer service callout.

It is better to have a single lock for that. We cannot use a mutex,
as up/down configuration can sleep and would cause witness errors,
so an sx lock seems to be the only choice.

A callout cannot use an sx lock, but the timer service is MP-safe, so
we just need to avoid a race between ena_down() and ena_detach(). That
can be done by acquiring the sx lock.

Simple macros that encapsulate the lock implementation were added,
which makes the code cleaner.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Add trigger reset function in the ENA driver

As triggering a reset is no longer a simple macro that just sets
the appropriate flag, a new function for triggering a reset was
added. It improves code readability considerably by avoiding
additional indentation.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Provide ENA driver version in a sysctl node

Usage example: $ sysctl hw.ena.driver_version

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Remove unused argument from static function in ena.c

The function ena_enable_msix_and_set_admin_interrupts takes two
arguments, but the second is unused and can be dropped. This is a
static function, so only ena.c is affected.

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Enable Tx drops reporting in the ENA driver

Tx drops statistics are fetched from HW every ena_keepalive_wd() call
and are observable using one of the commands:
* sysctl dev.ena.0.hw_stats.tx_drops
* netstat -I ena0 -d

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.

Adjust ENA driver to the new HAL

* Removed adaptive interrupt moderation (not supported on FreeBSD).
* Use ena_com_free_q_entries instead of ena_com_free_desc.
* Don't use ENA_MEM_FREE outside of the ena_com.
* Don't use barriers before calling doorbells as it's already done in
  the HAL.
* Add a function that generates a random RSS key, common to all of the
  driver's interfaces.
* Change admin stats sysctls to U64.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.

Fix fallout of r319722 in CTL HA.

ha_lso is a listening socket (unless bind() has failed), so should use
solisten_upcall_set(NULL, NULL), not soupcall_clear().

MFC after: 1 week
Sponsored by: iXsystems, Inc.

Fix AES-CTR compatibility issue in ipsec

r361390 decreased the blocksize of AES-CTR from 16 to 1.
Because of that, the ESP payload is no longer aligned to 16 bytes
before being encrypted and sent.
This is a good change, since RFC3686 specifies that the last block
doesn't need to be aligned.
Since FreeBSD versions before r361390 cannot decrypt partial blocks
encrypted with AES-CTR, we need to enforce 16-byte alignment in order
to preserve compatibility.
Add a sysctl (on by default) to control it.
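A sketch of the padding arithmetic involved, not FreeBSD's ESP output code; esp_pad_len() is a hypothetical helper. RFC 4303 requires payload + padding + the 2-byte trailer (Pad Length, Next Header) to end on at least a 4-byte boundary; enforcing 16-byte alignment for AES-CTR, as the compatibility knob does, corresponds to align = 16 here.

```c
#include <stddef.h>

/* Padding bytes so that payload + padding + 2-byte trailer is a multiple of align. */
static size_t
esp_pad_len(size_t payload_len, size_t align)
{
	size_t used = payload_len + 2;

	return ((align - used % align) % align);
}
```

For a 100-byte payload this gives 2 bytes of padding with 4-byte alignment but 10 bytes with 16-byte alignment.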

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D24999

Restore XHCI operation on Armada 38x

r347343 split the generic xhci driver into three files.
Include generic_xhci_fdt.c when building a kernel for Armada SoCs.
This brings back XHCI support on these platforms and on
others that use the GENERIC config.

Submitted by: Kornel Duleba
Obtained from: Semihalf
MFC after: 1 week
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D24944

Do not remove upcall if we haven't yet.

This fixes an assertion failure if we failed to bind the listening HA socket.

MFC after: 1 week
Sponsored by: iXsystems, Inc.

xen-locore: fix size in GDT descriptor

There was an off-by-one in the GDT descriptor size field used by the
early Xen boot code. The GDT descriptor size should be the size of the
GDT minus one. No functional change expected as a result of this
change.
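A small illustration of the "size minus one" rule, using an illustrative long-mode pseudo-descriptor layout (as loaded by lgdt) rather than the Xen locore code. The limit field holds the offset of the last valid byte, so a GDT of N 8-byte entries has limit 8 * N - 1. The packed attribute is a GCC/Clang extension.

```c
#include <stdint.h>

struct gdt_pseudo_descriptor {
	uint16_t limit;		/* size of the GDT in bytes, minus one */
	uint64_t base;
} __attribute__((packed));

static struct gdt_pseudo_descriptor
gdt_descriptor(const uint64_t *gdt, uint16_t nentries)
{
	struct gdt_pseudo_descriptor d;

	d.limit = (uint16_t)(nentries * sizeof(*gdt) - 1);	/* not nentries * 8 */
	d.base = (uint64_t)(uintptr_t)gdt;
	return (d);
}
```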

Sponsored by: Citrix Systems R&D

Fix build issue after r360292 when using both RSS and KERN_TLS options.

Sponsored by: Mellanox Technologies

Sync with Linux packet pacing enhancements in mlx5en(4).

Linux commit:
05d3ac978ed25b753bfe34fe76c50c31ee506a82

MFC after: 1 week
Sponsored by: Mellanox Technologies

Disable failing test cases in CI:

sys.netipsec.tunnel.aes_cbc_128_hmac_sha1.v4
sys.netipsec.tunnel.aes_cbc_256_hmac_sha2_256.v4
sys.netipsec.tunnel.aesni_aes_cbc_128_hmac_sha1.v4
sys.netipsec.tunnel.aesni_aes_cbc_256_hmac_sha2_256.v4

PR: 246737
Sponsored by: The FreeBSD Foundation

powerpc/booke pmap: Fix iteration for 64-bit kernel page table creation

Kernel page tables actually start at index 4096, given kernel base address
of 0xc008000000000000, not index 0, which would yield 0xc000000000000000.
Fix this by indexing at the real base, instead of the assumed base.

[PowerPC] Ensure ppc32 cpu_switch routines set up Secure-PLT.

This is a correctness fix needed to enable the ifunc conversion of the pmap
in D24993.

Since we are making function calls that may need to go through the PLT, ensure
r30 is set up correctly.

This fixes crashes when booting with D24993 applied.

Reviewed by: jhibbits (in IRC)
Sponsored by: Tag1 Consulting, Inc.

Update cryptocteon(4) and nlmsec(4) for changes in r361481.

This does not add support for separate output buffers but updates the
drivers to cope with the changes.

Pointy hat to: jhb

This commit enables a UFS filesystem to do a forcible unmount when
the underlying media fails or becomes inaccessible, for example
when a USB flash memory card hosting a UFS filesystem is unplugged.

The strategy for handling disk I/O errors when soft updates are
enabled is to stop writing to the disk of the affected file system
but continue to accept I/O requests and report that all future
writes by the file system to that disk actually succeed. Then
initiate an asynchronous forced unmount of the affected file system.

There are two cases for disk I/O errors:

   - ENXIO, which means that this disk is gone and the lower layers
     of the storage stack already guarantee that no future I/O to
     this disk will succeed.

   - EIO (or most other errors), which means that this particular
     I/O request has failed but subsequent I/O requests to this
     disk might still succeed.

For ENXIO, we can just clear the error and continue, because we
know that the file system cannot affect the on-disk state after we
see this error. For EIO or other errors, we arrange for the geom_vfs
layer to reject all future I/O requests with ENXIO, just as is
done when the geom_vfs is orphaned. In both cases, the file system
code can just clear the error and proceed with the forcible unmount.

This new treatment of I/O errors is needed for writes of any buffer
that is involved in a dependency. Most dependencies are described
by a structure attached to the buffer's b_dep field. But some are
created and processed as a result of the completion of the dependencies
attached to the buffer.

Clearing some dependencies requires a read. For example, if there
is a dependency that requires an inode to be written, the disk block
containing that inode must be read, the updated inode copied into
place in that buffer, and the buffer then written back to disk.

Often the needed buffer is already in memory and can be used. But
if it needs to be read from the disk, the read will fail, so we
fabricate a buffer full of zeroes and pretend that the read succeeded.
This zero'ed buffer can be updated and written back to disk.

The only case where a buffer full of zeros causes the code to do
the wrong thing is when reading an inode buffer containing an inode
that still has an inode dependency in memory that will reinitialize
the effective link count (i_effnlink) based on the actual link count
(i_nlink) that we read. To handle this case we now store the i_nlink
value that we wrote in the inode dependency so that it can be
restored into the zero'ed buffer thus keeping the tracking of the
inode link count consistent.

Because applications depend on knowing when an attempt to write
their data to stable storage has failed, the fsync(2) and msync(2)
system calls need to return errors if data fails to be written to
stable storage. So these operations return ENXIO for every call
made on files in a file system where we have otherwise been ignoring
I/O errors.

Coauthored by: mckusick
Reviewed by:   kib
Tested by:     Peter Holm
Approved by:   mckusick (mentor)
Sponsored by:  Netflix
Differential Revision:  https://reviews.freebsd.org/D24088

Update sec(4) for separate output buffers changes in r361481.

This does not add support for separate output buffers but updates the
driver to cope with the changes.

Pointy hat to: jhb

Update cesa(4) for separate output buffers changes in r361481.

This does not add support for separate output buffers but updates the
driver to cope with the changes.

Pointy hat to: jhb

Remove an extraneous line continuation from r361481.

Expand coverage of different buffer sizes.

- When -z is used, include small buffers from 1 to 32 bytes to test
  stream ciphers.  Note that while AES-XTS claims to support a block
  size of 1 in OpenSSL, it does require a minimum of 1 block of cipher
  text as it is not a stream cipher but depends on CTS to pad out the
  final partial block.

- Permit multiple AAD sizes to be set via multiple -A options, or via
  -z.  When -z is set, use small buffers from 0 to 32 bytes followed
  by powers of 2 up to 256.  When multiple sizes are specified, the
  ETA and AEAD algorithms perform the full matrix of AAD sizes by
  payload sizes.

- Only warn on unchanged ciphertext instead of erroring.  The
  currently generated plaintext and key for a couple of AES-CTR tests
  with a buffer size of 1 result in ciphertext that matches the
  plaintext.

Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25006

[ath] [ath_hal] Propagate the HAL_RESET_TYPE through to the chip reset; set it during ath_reset()

Although I added the reset type field to ath_hal_reset() years ago,
I never finished adding it both throughout the HALs and in if_ath.c.

This will eventually deprecate the ath_hal force_full_reset option
because it can be requested at the driver layer.

So:

* Teach ar5416ChipReset() and ar9300_chip_reset() about the HAL reset type
* Use it in ar5416Reset() and ar9300_reset() when doing a full chip reset
* Extend ath_reset() to include the HAL_RESET_TYPE parameter added in the above functions
* Use HAL_RESET_NORMAL in most calls to ath_reset()
* .. but use HAL_RESET_BBPANIC for the BB panics, and HAL_RESET_FORCE_COLD during fatal, beacon miss and other hardware related hangs.

This should be a glorified no-op outside of actual hardware issues.
I've tested things with ath_hal force_full_reset set to 1 for years now,
so I know that feature and a full reset works (albeit much slower than
a warm reset!) and it does unwedge hardware.

The eventual aim is to use this for all the places where the driver
detects a potential hang as well as if long calibration - ie, noise floor
calibration - fails to complete. That's one of the big hardware related
things that causes station mode operation to hang without easy recovery.

Differential Revision: https://reviews.freebsd.org/D24981

Support separate output buffers for aesni(4).

The backend routines aesni(4) call for specific encryption modes all
expect virtually contiguous input/output buffers.  If the existing
output buffer is virtually contiguous, always write to the output
buffer directly from the mode-specific routines.  If the output buffer
is not contiguous, then a temporary buffer is allocated whose output
is then copied to the output buffer.  If the input buffer is not
contiguous, then the existing buffer used to hold the input is also
used to hold temporary output.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545

Support separate output buffers in ccr(4).

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545

Add a sysctl knob to use separate output buffers for /dev/crypto.

This is a testing aid that permits testing a driver's support for
separate output buffers via cryptocheck.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545

Export the _kern_crypto sysctl node from crypto.c.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545

Add support for optional separate output buffers to in-kernel crypto.

Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer.  Using a separate output buffer avoids this copy.

- Create a new 'struct crypto_buffer' describing a crypto buffer
  containing a type and type-specific fields.  crp_ilen is gone;
  instead, buffers that use a flat kernel buffer have a cb_buf_len
  field for their length.  The length of other buffer types is
  inferred from the backing store (e.g. uio_resid for a uio).
  Requests now have two such structures: crp_buf for the input buffer,
  and crp_obuf for the output buffer.

- Consumers now use helper functions (crypto_use_*,
  e.g. crypto_use_mbuf()) to configure the input buffer.  If an output
  buffer is not configured, the request still modifies the input
  buffer in-place.  A consumer uses a second set of helper functions
  (crypto_use_output_*) to configure an output buffer.

- Consumers must request support for separate output buffers when
  creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
  only permitted to queue a request with a separate output buffer on
  sessions with this flag set.  Existing drivers already reject
  sessions with unknown flags, so this permits drivers to be modified
  to support this extension without requiring all drivers to change.

- Several data-related functions now have matching versions that
  operate on an explicit buffer (e.g. crypto_apply_buf,
  crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).

- Most of the existing data-related functions operate on the input
  buffer.  However crypto_copyback always writes to the output buffer
  if a request uses a separate output buffer.

- For the regions in input/output buffers, the following conventions
  are followed:
  - AAD and IV are always present in input only and their
    fields are offsets into the input buffer.
  - payload is always present in both buffers.  If a request uses a
    separate output buffer, it must set a new crp_payload_start_output
    field to the offset of the payload in the output buffer.
  - digest is in the input buffer for verify operations, and in the
    output buffer for compute operations.  crp_digest_start is relative
    to the appropriate buffer.

- Add a crypto buffer cursor abstraction.  This is a more general form
  of some bits in the cryptosoft driver that tried to always use uio's.
  However, compared to the original code, this avoids rewalking the uio
  iovec array for requests with multiple vectors.  It also avoids
  allocating an iovec array for mbufs and populating it, by instead
  walking the mbuf chain directly.

- Update the cryptosoft(4) driver to support separate output buffers
  making use of the cursor abstraction.
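A userland sketch of the cursor idea for the iovec case; iov_cursor and its functions are hypothetical names, not the kernel's crypto_cursor API. The cursor remembers which vector it is in and the offset within it, so consecutive reads resume where the last one stopped instead of rewalking the array from the start.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

struct iov_cursor {
	const struct iovec *iov;
	size_t iovcnt;
	size_t idx;	/* current vector */
	size_t off;	/* offset within iov[idx] */
};

static void
iov_cursor_init(struct iov_cursor *cc, const struct iovec *iov, size_t iovcnt)
{
	cc->iov = iov;
	cc->iovcnt = iovcnt;
	cc->idx = 0;
	cc->off = 0;
}

/* Copy up to len bytes out of the cursor and advance it; returns bytes copied. */
static size_t
iov_cursor_copydata(struct iov_cursor *cc, void *dst, size_t len)
{
	char *out = dst;
	size_t done = 0;

	while (done < len && cc->idx < cc->iovcnt) {
		const struct iovec *v = &cc->iov[cc->idx];
		size_t n = v->iov_len - cc->off;

		if (n > len - done)
			n = len - done;
		memcpy(out + done, (const char *)v->iov_base + cc->off, n);
		done += n;
		cc->off += n;
		if (cc->off == v->iov_len) {	/* vector exhausted */
			cc->idx++;
			cc->off = 0;
		}
	}
	return (done);
}
```

An mbuf-backed cursor would follow the same pattern, holding the current mbuf and offset and stepping along m_next instead of indexing an array.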

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545

copystr(9): Move to deprecate (attempt #2)

This reapplies logical r360944 and r360946 (reverting r360955), with fixed
copystr() stand-in replacement macro.  Eventually the goal is to convert
consumers and kill the macro, but for a first step it helps if the macro is
correct.

Prior commit message:

Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults.  It's just
an older incarnation of the now-more-common strlcpy().

Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.

Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy (with correction from brooks@ -- thanks).
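A function-form sketch of a copystr()-compatible wrapper around strlcpy(), not the actual systm.h macro; copystr_sketch and sketch_strlcpy are hypothetical names. A local strlcpy stand-in is included because glibc only gained strlcpy() in 2.38. As with copystr(9), it returns 0 or ENAMETOOLONG, and *done receives the bytes copied including the NUL, or len when the string did not fit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* strlcpy() stand-in: copies at most size - 1 bytes, NUL-terminates, returns strlen(src). */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size != 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

static int
copystr_sketch(const void *src, void *dst, size_t len, size_t *done)
{
	size_t r = sketch_strlcpy(dst, src, len);

	if (done != NULL)
		*done = (r >= len) ? len : r + 1;
	return ((r >= len) ? ENAMETOOLONG : 0);
}
```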

Remove N redundant MI implementations of copystr.  For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.
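
A copystr()-style wrapper around strlcpy can be sketched in function form. This is not the in-tree macro. It assumes copystr(9)'s contract: lencopied, when non-NULL, receives the bytes copied including the terminating NUL, and the return value is 0 or ENAMETOOLONG. sketch_strlcpy() is a local copy with BSD strlcpy semantics, so the example builds where libc lacks strlcpy. Because strlcpy always NUL-terminates, a truncated result here ends in a NUL byte.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local strlcpy with BSD semantics: returns strlen(src), always terminates. */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size > 0) {
		size_t n = (srclen < size - 1) ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/*
 * copystr()-like wrapper: copy at most maxlen bytes (including the NUL),
 * report the count through *lencopied, and return 0 or ENAMETOOLONG.
 */
static int
copystr_sketch(const void *from, void *to, size_t maxlen, size_t *lencopied)
{
	size_t r = sketch_strlcpy(to, from, maxlen);

	if (lencopied != NULL)
		*lencopied = (r >= maxlen) ? maxlen : r + 1;
	return ((r >= maxlen) ? ENAMETOOLONG : 0);
}
```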

Reviewed by: jhb (earlier version)
Discussed with: brooks (thanks!)
Differential Revision: https://reviews.freebsd.org/D24672

Introduce a driver for NXP LS1046A SoC AHCI.

Implement support for the AHCI controller found in
NXP QorIQ Layerscape SoCs.

Submitted by: Artur Rojek <ar@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24466

Introduce support for Epson RX-8803 RTC.

This patch introduces support for the Epson RX-8803 RTC controller, accessible
over the I2C bus. It has a resolution of 1 second.
Support for interrupt-based alarms is not implemented.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24364

Add TCA6416 GPIO expander support.

Add basic TCA6416 GPIO expander support over the I2C bus. The driver handles
enabling and disabling pins, setting the pin mode to IN or OUT, and
toggling the pins. External interrupts are not supported.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: manu, mmel
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24363

Introduce VF610 I2C controller support.

NXP LS1046A contains an I2C controller compatible with the Vybrid VF610.
The existing Vybrid MVF600 driver can be used to support it. For that purpose,
declare the driver as ofw_iicbus and add the methods associated with ofw_iicbus.

For VF610, add dynamic clock prescaler calculation using clock information
from the clock driver and the clock frequency requested in the device tree.
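
A dynamic prescaler calculation of this kind can be sketched as picking the smallest divider that keeps the bus clock at or below the requested frequency. The divider table and pick_prescaler() below are hypothetical. The real VF610 uses a hardware-defined divider table, which this sketch does not reproduce. The comparison is done in 64-bit multiplication form so integer rounding cannot accept a divider that is slightly too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical divider table, sorted ascending; not the real VF610 table. */
static const uint32_t divs[] = { 20, 32, 64, 128, 256, 512, 1024, 2048 };

/*
 * Return the index of the smallest divider for which clk_hz / div does
 * not exceed target_hz, or the largest divider if none is slow enough.
 */
static size_t
pick_prescaler(uint32_t clk_hz, uint32_t target_hz)
{
	size_t n = sizeof(divs) / sizeof(divs[0]);

	for (size_t i = 0; i < n; i++)
		if ((uint64_t)target_hz * divs[i] >= clk_hz)
			return (i);
	return (n - 1);
}
```

For example, a 100 MHz input clock and a 400 kHz request need a divider of at least 250, so the sketch picks 256.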

While here, add a detach function and additional error handling
in the i2c_attach function.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24361

Add GPIO support for QorIQ boards.

This patch adds GPIO controller support targeted at the NXP LS1046A
SoC. The driver implements the following features:
* setting direction of each pin (IN or OUT)
* setting the mode of output pins (PUSHPULL or OPENDRAIN)
* setting the state of each output pin (1 or 0)
* reading the state of each input pin (1 or 0)

Submitted by: Kamil Koczurek <kek@semihalf.com>,
	Dawid Gorecki <dgr@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24353

Add LS1046A clockgen driver.

The driver provides probe and attach functions for the LS1046A clockgen and
passes configuration information to the QorIQ clockgen class. It may be used
as a reference implementation for other QorIQ clockgen devices.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: mmel, manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24352

Add QorIQ platform clockgen driver.

This patch adds classes and functions that can be used with various NXP
QorIQ Layerscape SoCs.

As for the clock topology, there is a single platform PLL, which supplies
clocks for the peripheral bus, and additional PLLs for the CPU cores. There
can be multiple core PLLs (for example, LS1046A has two: CGAPLL1 and
CGAPLL2). Each PLL has fixed output dividers. The core PLLs are not
accessible from the DTS.

This is a preparation patch for NXP LS1046A SoC support.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: mmel
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24351

linuxkpi: Fix mod_timer and del_timer_sync

mod_timer is supposed to return 1 if the modified timer was pending, which
is exactly what callout_reset does, so return its value after checking
that it is a valid one in case the API changes.
del_timer_sync returns int, so add a function and handle that.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24983

linuxkpi: Add refcount.h

Implement some refcount functions needed by drm.
Just use the atomic_t struct and functions from linuxkpi for simplicity.
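
A refcount built directly on an atomic integer can be sketched in userland with C11 atomics standing in for linuxkpi's atomic_t. The sketch_refcount_* names are hypothetical and only mirror the shape of the Linux refcount_set/read/inc/dec_and_test operations; they are not the linuxkpi code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for an atomic_t-backed refcount (assumption: C11 atomics). */
typedef struct {
	atomic_int	refs;
} refcount_sketch_t;

static void
sketch_refcount_set(refcount_sketch_t *r, int n)
{
	atomic_store(&r->refs, n);
}

static int
sketch_refcount_read(refcount_sketch_t *r)
{
	return (atomic_load(&r->refs));
}

static void
sketch_refcount_inc(refcount_sketch_t *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/* Returns true when this call dropped the last reference. */
static bool
sketch_refcount_dec_and_test(refcount_sketch_t *r)
{
	return (atomic_fetch_sub(&r->refs, 1) == 1);
}
```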

Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24985

linuxkpi: Add __same_type and __must_be_array macros

The __same_type macro simply wraps __builtin_types_compatible_p, which
exists in both GCC and Clang and returns 1 if both types are the same.
The __must_be_array macro returns 1 if the argument is an array.
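
The trick behind these macros can be sketched with the same GCC/Clang builtin. An array's type differs from the type of a pointer to its first element, while a pointer's type does not, so comparing a and &(a)[0] distinguishes the two. SAME_TYPE_SKETCH and SKETCH_IS_ARRAY are hypothetical names chosen to avoid reserved identifiers; they are not the linuxkpi definitions.

```c
#include <assert.h>

/* 1 if the two expressions have compatible types (GCC/Clang builtin). */
#define	SAME_TYPE_SKETCH(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* 1 for arrays: T[N] is not compatible with T *, but a T * is. */
#define	SKETCH_IS_ARRAY(a)	(!SAME_TYPE_SKETCH((a), &(a)[0]))
```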

This is needed for DRM v5.3

Sponsored-by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D24953

proc: refactor clearing credentials into proc_unset_cred

Improve set progress parameters, SET PSV for HW TLS in mlx5en(4).

There is no need for a fence and there is no need to provide
the TCP sequence number.

Sponsored by: Mellanox Technologies

Correctly set the initial vector for TLS v1.3 for mlx5en(4).

For TLS v1.3, the 12 bytes of the initial vector (IV) should simply be copied
as-is from the kernel: the first 4 bytes go to the gcm_iv field and the
remaining 8 bytes go to the subsequent implicit_iv field.
There is no need to consider the byte order of the 12 IV bytes, as was
initially done.
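
The split can be sketched with a hypothetical structure holding only the two fields named above; it is not the mlx5en(4) layout. Both copies are plain byte copies, with no byte swapping.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout with just the two fields described above. */
struct tls_iv_sketch {
	unsigned char	gcm_iv[4];
	unsigned char	implicit_iv[8];
};

/* Copy the 12-byte TLS 1.3 IV as-is: bytes 0-3, then bytes 4-11. */
static void
set_tls13_iv(struct tls_iv_sketch *dst, const unsigned char iv[12])
{
	memcpy(dst->gcm_iv, iv, sizeof(dst->gcm_iv));
	memcpy(dst->implicit_iv, iv + sizeof(dst->gcm_iv),
	    sizeof(dst->implicit_iv));
}
```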

Sponsored by: Mellanox Technologies

Update the TLS capability bit after recent PRM changes in mlx5en(4).

A CX6-DX firmware version equal to or newer than 12.27.0372 is
now required.

Sponsored by: Mellanox Technologies

Add an example of formatting a floppy disk. Adding a more self-contained
example here in the fdformat man page will allow us to modernize and
streamline the FreeBSD Handbook by cutting out some of this legacy
material.

While here, address some other minor grammatical nits in this man page.

Reviewed by: bcr (mentor)
Approved by: bcr (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24971

Fix pci-passthru MSI issues with OpenBSD guests

- Return the two 16-bit registers in the correct byte order for a 4-byte
  read that spans the CMD/STATUS registers.  The reversal was hiding the
  capabilities list, which prevented the MSI capability from being found
  for XHCI passthru.

- Reorganize MSI/MSI-X config writes so that the read-only portion of a
  4-byte write at the capability offset is skipped.  Without this, MSI
  interrupts could not be enabled.
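
Both fixes come down to PCI config-space layout, which is little-endian. Command is at offset 0x04 and Status at 0x06, so a 4-byte read at 0x04 must put Command in the low half; Status bit 4 (0x10) is the Capabilities List bit that the reversal hid. At a capability offset, bytes 0-1 (capability ID and next pointer) are read-only, and for MSI bytes 2-3 are Message Control. The helpers below are a hypothetical sketch, not the bhyve code, and ignore the read-only bits inside Message Control itself.

```c
#include <assert.h>
#include <stdint.h>

/* 4-byte read at 0x04: Command in bits 15:0, Status in bits 31:16. */
static uint32_t
read_cmd_status(uint16_t cmd, uint16_t status)
{
	return (((uint32_t)status << 16) | cmd);
}

/*
 * 4-byte write at an MSI capability offset: skip the read-only ID and
 * next-pointer bytes and keep only the Message Control half.
 */
static uint16_t
msi_ctrl_from_write32(uint32_t val)
{
	return ((uint16_t)(val >> 16));
}
```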

Reported and extensively tested by Anatoli (me at anatoli dot ws)

PR: 245392
Reported by: Anatoli (me at anatoli dot ws)
Reviewed by: jhb (bhyve)
Approved by: jhb, bz (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24951

vfs: use atomic_{store,load}_long to manage f_offset

... instead of depending on the compiler not to mess them up
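
The difference can be sketched with C11 atomics standing in for the kernel's atomic_load_long()/atomic_store_long(). The structure and helpers are hypothetical. With an explicit atomic access, each load or store is a single indivisible operation, so a concurrent reader never observes a half-written offset. The sketch assumes long is as wide as the offset, which holds on LP64 platforms.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical file structure reduced to the offset field. */
struct file_sketch {
	_Atomic long	f_offset;
};

/* Relaxed ordering: only indivisibility is needed, not ordering. */
static long
foffset_read(struct file_sketch *fp)
{
	return (atomic_load_explicit(&fp->f_offset, memory_order_relaxed));
}

static void
foffset_write(struct file_sketch *fp, long off)
{
	atomic_store_explicit(&fp->f_offset, off, memory_order_relaxed);
}
```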

vfs: restore mtx-protected foffset locking for 32 bit platforms

They depend on it to accurately read the offset.

The new code is not used there, as it would add an interrupt enable/disable
trip on top of the atomic.

This also fixes a bug where a 32-bit nolock request would still lock the offset.

No changes for 64-bit.

Reported by: emaste

[skip ci] ip.4: fix typos

MFC after: 2 weeks

Chase r361344. Update unbound version strings.

Reported by: mike tancsa <mike@sentex.net>
MFC after: 1 day

Make i386 memstick images bootable.

This reverts the i386 part of r342283, "Rework UEFI ESP generation", and
the followup commit in r342690.

r342283 added an ESP to the i386 memstick image, and as a side effect
made the ESP the active partition, not the bootcode-containing UFS
partition. As a result the i386 memstick images would not boot in
either UEFI or legacy mode - UEFI failed because we do not support i386
UEFI booting, and legacy mode failed because the partition with legacy
bootcode was not active.

The bootcode-containing UFS partition is again the only, and active,
partition.

PR: 246494
Reported by: Jorge Maidana
Sponsored by: The FreeBSD Foundation

libprocstat: try to fix fallout from r361363

The revision caused libprocstat to have two undefined symbols:
- __start_set_pcpu
- __stop_set_pcpu
probably because of __GLOBL() used in sys/pcpu.h under _KERNEL.
The symbols are not accessed by anything and the linker in base does not
complain about them, but some ports are failing to build.
Hack around the problem by providing definitions for those symbols.

Probably there is a better solution, but I could not think of it yet.

Reported by: zeising
MFC after: 3 days
X-MFC with: r361363
Sponsored by: Panzura

vfs: scale foffset_lock by using atomics instead of serializing on mtx pool

Contending cases still serialize on sleepq (which would be taken anyway).
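
The general shape of an atomics-based offset lock can be sketched as a flags word with a "locked" bit and a "waiting" bit. The flag names and functions are hypothetical, not the in-tree ones. The uncontended path is a single compare-and-swap with no shared mutex pool; where the kernel would sleep on a sleepqueue and wake waiters on unlock, this userland sketch simply yields and reports whether a waiter was advertised.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define	FOFF_LOCKED	0x1u	/* hypothetical: offset is held */
#define	FOFF_WAITING	0x2u	/* hypothetical: a thread is waiting */

struct foff_sketch {
	atomic_uint	flags;
};

static void
foff_lock(struct foff_sketch *f)
{
	for (;;) {
		unsigned int old = atomic_load(&f->flags);

		if ((old & FOFF_LOCKED) == 0) {
			/* Uncontended: one CAS, no mutex pool involved. */
			if (atomic_compare_exchange_weak(&f->flags, &old,
			    old | FOFF_LOCKED))
				return;
			continue;
		}
		/*
		 * Contended: advertise a waiter and back off.  The kernel
		 * would sleep on a sleepqueue here; this sketch yields.
		 */
		atomic_fetch_or(&f->flags, FOFF_WAITING);
		sched_yield();
	}
}

/* Release the offset; returns true if a waiter had been advertised. */
static bool
foff_unlock(struct foff_sketch *f)
{
	unsigned int old = atomic_exchange(&f->flags, 0u);

	assert((old & FOFF_LOCKED) != 0);
	/* The kernel would wake sleepers here if FOFF_WAITING was set. */
	return ((old & FOFF_WAITING) != 0);
}
```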

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21626

Unbreak ARM64 kernel build after r361426

X-MFC-With: r361426

Update to Zstandard 1.4.5

As usual, the full release notes are found on GitHub:

  https://github.com/facebook/zstd/releases/tag/v1.4.5

Notable changes include:

* Improved decompression performance on amd64 and arm (5-10%
  and 15-50%, respectively).
* '--patch-from' zstd(1) CLI option, which provides something like a very fast
  version of bspatch(1) with slightly worse compression.  See release notes.

In this update, I dropped the 3-year-old -O0 workaround for an LLVM ARM bug;
the bug was fixed in LLVM SVN in 2017, but we didn't remove this workaround
from our tree until now.

MFC after: I won't, but feel free
Relnotes: yes

contrib/zstd: Revise Xlist for 1.4.5 import

Import Zstd 1.4.5

bbr: Use arc4random_uniform from libkern.

This unbreaks the LINT build.

Reported by: jenkins, melifaro

Move <add|del|change>_route() functions to route_ctl.c in preparation for
the multipath control plane changes described in D24141.

Currently route.c contains core routing init/teardown functions, route table
manipulation functions and various helper functions, resulting in a file of
more than 2 KLOC in total. This change moves most of the route table
manipulation parts to a dedicated file, simplifying the planned multipath
changes and making route.c more manageable.

Differential Revision: https://reviews.freebsd.org/D24870

linuxkpi: Add prandom_u32_max

This is just a wrapper around arc4random_uniform.
Needed by DRM v5.3.

Sponsored-by: The FreeBSD Foundation
Reviewed by: cem, hselasky
Differential Revision: https://reviews.freebsd.org/D24961

libkern: Add arc4random_uniform

This variant returns a random number less than the limit passed as the
argument. This is simply a copy of the libc version.
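
The rejection-sampling technique used by the long-standing BSD libc arc4random_uniform() can be sketched as follows. Values below 2^32 mod upper_bound are discarded, so the remaining range is an exact multiple of upper_bound and the final modulo is unbiased. sketch_random32() is a hypothetical xorshift stand-in for arc4random() and is NOT cryptographically secure; sketch_prandom_u32_max() mirrors the thin-wrapper shape of the linuxkpi prandom_u32_max.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in random source (xorshift32); NOT cryptographic, unlike arc4random. */
static uint32_t sketch_state = 2463534242u;

static uint32_t
sketch_random32(void)
{
	uint32_t x = sketch_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	sketch_state = x;
	return (x);
}

/* Uniform value in [0, upper_bound) without modulo bias. */
static uint32_t
sketch_uniform(uint32_t upper_bound)
{
	uint32_t r, min;

	if (upper_bound < 2)
		return (0);
	/* 2^32 % upper_bound == (2^32 - upper_bound) % upper_bound */
	min = (uint32_t)(0u - upper_bound) % upper_bound;
	do {
		r = sketch_random32();
	} while (r < min);
	return (r % upper_bound);
}

/* prandom_u32_max()-style wrapper around the uniform helper. */
static uint32_t
sketch_prandom_u32_max(uint32_t ep_ro)
{
	return (sketch_uniform(ep_ro));
}
```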

Sponsored-by: The FreeBSD Foundation
Reviewed by: cem, hselasky (previous version)
Differential Revision: https://reviews.freebsd.org/D24962

Remove refcounting from rtentry.

After making rtentry reclamation backed by epoch(9) in r361409, there is
no reason to keep the reference counting code.

Differential Revision: https://reviews.freebsd.org/D24867

Merge llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp
llvmorg-10.0.1-rc1-0-gf79cd71e145 (aka 10.0.1 rc1).

MFC after: 3 weeks

Use epoch(9) for rtentries to simplify control plane operations.

Currently the only reason for refcounting rtentries is the need to report
the rtable operation details immediately after execution.
Delaying rtentry reclamation allows us to stop refcounting and simplify the code.
Additionally, this change allows reimplementing rib_lookup_info(), which
is used by some of the consumers to get the matching prefix along
with nexthops, in a more efficient way.

The change keeps the per-vnet rtzone UMA zone. It adds an nh_vnet field to
nhop_priv to be able to reliably set curvnet even during vnet teardown.
The rest of the reference counting code will be removed in D24867.

Differential Revision: https://reviews.freebsd.org/D24866

Remove a workaround for GCM requests with an empty payload.

This was copied from ccr(4) (which does require the workaround), but
is reportedly not needed for ccp(4).

Discussed with: cem
Sponsored by: Netflix

Simplify the RISC-V kernel linker invocation

Remove our custom SYSTEM_LD definition. This generates program headers
that are more consistent with other architectures, and more importantly,
are in line with what loader(8) expects when loading a kernel.

As noted in https://reviews.freebsd.org/D22920, there is no apparent
reason why the kernel would need a writable text segment, so removal of
the -N flag isn't likely to cause issues.

Reviewed by: kp, br
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24909

Fix issues with FUSE_ACCESS when default_permissions is disabled

This patch fixes two issues relating to FUSE_ACCESS when the
default_permissions mount option is disabled:

* VOP_ACCESS() calls with VADMIN set should never be sent to a fuse server
  in the form of FUSE_ACCESS operations. The FUSE protocol has no equivalent
  of VADMIN, so we must evaluate such things kernel-side, regardless of the
  default_permissions setting.

* The FUSE protocol only requires FUSE_ACCESS to be sent for two purposes:
  for the access(2) syscall and to check directory permissions for
  searchability during lookup. FreeBSD sends it much more frequently, due to
  differences between our VFS and Linux's, for which FUSE was designed. But
  this patch does eliminate several cases not required by the FUSE protocol:

  * for any FUSE_*XATTR operation
  * when creating a new file
  * when deleting a file
  * when setting timestamps, such as by utimensat(2).

* Additionally, when default_permissions is disabled, this patch removes one
  FUSE_GETATTR operation when deleting a file.
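
The resulting decision for the default_permissions-disabled case can be sketched as a small predicate. The flag values, the enum of callers, and send_fuse_access() are all hypothetical; the real VADMIN and related flags come from the kernel headers. VADMIN is always evaluated kernel-side, the cases this patch removes never send FUSE_ACCESS, and other callers (including access(2) and lookup searchability) still do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; not the real sys/vnode.h definitions. */
#define	SK_VREAD	0x1
#define	SK_VWRITE	0x2
#define	SK_VEXEC	0x4
#define	SK_VADMIN	0x8

/* Hypothetical classification of why an access check is being made. */
enum access_reason {
	FROM_ACCESS_SYSCALL, FROM_LOOKUP, FROM_OTHER,
	FROM_XATTR, FROM_CREATE, FROM_DELETE, FROM_SETTIMES
};

static bool
send_fuse_access(int accmode, enum access_reason why)
{
	/* The FUSE protocol has no VADMIN equivalent: decide in the kernel. */
	if (accmode & SK_VADMIN)
		return (false);
	switch (why) {
	case FROM_XATTR:
	case FROM_CREATE:
	case FROM_DELETE:
	case FROM_SETTIMES:
		return (false);	/* cases eliminated by this patch */
	default:
		return (true);	/* access(2), lookup, and remaining callers */
	}
}
```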

PR: 245689
Reported by: MooseFS FreeBSD Team <freebsd@moosefs.pro>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24777

Do not try to fill socket send buffer to the last byte.

Setting so_snd.sb_lowat to at least 1/8 of the socket buffer size allows
the send thread to make more active use of PDU coalescing, which dramatically
reduces TCP lock congestion and the number of context switches when the
socket is full and PDUs are small.
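
The arithmetic can be sketched with a hypothetical helper that raises a send low-water mark to at least one eighth of the buffer size without lowering a larger existing value. With a 2 MiB send buffer, the sender is only woken once 256 KiB of space is free instead of after every small drain, which gives it room to coalesce several small PDUs per wakeup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: low-water mark of at least hiwat / 8. */
static size_t
lowat_at_least_eighth(size_t cur_lowat, size_t hiwat)
{
	size_t eighth = hiwat / 8;

	return ((cur_lowat > eighth) ? cur_lowat : eighth);
}
```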

MFC after: 1 week
Sponsored by: iXsystems, Inc.

Disable nullfs caching on top of fusefs

Nullfs caching can keep a large number of vnodes active. That results in
more active FUSE file handles, causing some FUSE servers to use extra
resources. Disable nullfs caching for fusefs, just like we already do for
NFSv4.

PR: 245688
Reported by: MooseFS FreeBSD Team <freebsd@moosefs.pro>
MFC after: 2 weeks