CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

amd64: stop using top of the thread' kernel stack for FPU user save area

Instead do one more allocation at the thread creation time. This frees
a lot of space on the stack.

Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn(). This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.

A useful experiment now would be to reduce KSTACK_PAGES.

Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954

linux32: add a hack to avoid redefining the type of the savefpu tag

when compiling in amd64 kernel environment with -m32. This is a temporal
workaround for some future proper (but unclear) fix.

Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954

exec_machdep.c: some style, use ANSI C definition for sys_sigreturn()

Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954

amd64: move signal handling and register structures manipulations into exec_machdep.c

from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.

Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.

Reviewed by: jhb, markj
Discussed with: jrtc27
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954

amd64: centralize definitions of CS_SECURE and EFL_SECURE

Requested by markj
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954

ipsec: enter epoch before calling into ipsec_run_hhooks

pfil_run_hooks which eventually can get called asserts on it.

Reviewed by: ae
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32007

dwmmc: Remove dwmmc_setup_bus call from start_cmd

There is no need to re-setup the bus before each commands.
Tested-on: Rock64, RockPro64
Reported by: avg

dwmmc: Properly implement power_off/power_up

Write to the PWREN register should be done in update_ios based
on the power_mode value in the ios struct.
Also none of the manual (RockChip and Altera) and Linux talks about
the needed for an inverted PWREN value so just remove this.
This fixes eMMC (and possibly SD) when u-boot didn't setup the controller.

Reported by: avg
Tested-on: Rock64, RockPro64

efi loader: Typo

MFC after: 3 days

clang-format: Add bitset loop macros

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation

cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it

This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.

Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029

bitset(9): Introduce BIT_FOREACH_ISSET and BIT_FOREACH_ISCLR

These allow one to non-destructively iterate over the set or clear bits
in a bitset.  The motivation is that we have several code fragments
which iterate over a CPU set like this:

while ((cpu = CPU_FFS(&cpus)) != 0) {
cpu--;
CPU_CLR(cpu, &cpus);
<do something>;
}

This is slow since CPU_FFS begins the search at the beginning of the
bitset each time.  On amd64 and arm64, CPU sets have size 256, so there
are four limbs in the bitset and we do a lot of unnecessary scanning.

A second problem is that this is destructive, so code which needs to
preserve the original set has to make a copy.  In particular, we have
quite a few functions which take a cpuset_t parameter by value, meaning
that each call has to copy the 32 byte cpuset_t.

The new macros address both problems.

Reviewed by: cem, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32028

syslog.conf.5: Fix the message priority order

PR: 219942
MFC after: 1 week

sctp: Simplify stream scheduler usage

Callers are getting the stcb send lock, so just KASSERT that.
No need to signal this when calling stream scheduler functions.
No functional change intended.

MFC after: 1 week

arm64: Handle 32bits breakpoint exception.

A different exception is raised when we hit a 32bits breakpoint, rather than
a 64bits one, so handle those as well when COMPAT_FREEBSD32 is defined.
This should fix SIGBUS at least when using breakpoints with thumb2 code.

PR: 256468
MFC After: 1 week

Fix the arm64 L2_BLOCK_MASK definition

It was missing the top 16 bits.

Sponsored by: The FreeBSD Foundation

bcm2835_sdhci: don't use DMA for kernel dumps

When handling a data irq, the sdhci driver calls the
sdhci_platform_will_handle() method, to determine if it should allow the
platform driver to handle the transfer or fall back to programmed I/O.
While dumping, the data irq path may be invoked directly (not from an
interrupt context), which the bcm2835_sdhci DMA code is not prepared to
handle. Return early in this case, to force the fallback to PIO.

Otherwise, the KASSERT that follows will be triggered, and the dump will
fail. On non-INVARIANTS kernels, the system will hang, waiting for a DMA
interrupt that will never arrive.

Reviewed by: kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31893

sctp: improve consistency when calling stream scheduler

Hold always the stcb send lock when calling sctp_ss_init() and
sctp_ss_remove_from_stream().

MFC after: 1 week

mvneta: split to FDT and generic part

Split some missing routines.

Obtained from: Semihalf

endian.h: Use the __bswap* versions

Make it possible to have all these macros work without bswap* being
defined. bswap* is part of the application namespace and applications
are free to redefine those functions.

Reviewed by: emaste,jhb,markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31964

camcontrol: depop command

Implement and document the new depop command. This command manages drive elements
for drives that support it. Storage elements are typically heads. Element status
can be discovered. Elements may be removed or restored. And the status of any
current depop operation can be assessed.

depop -d elm will remove element elm and truncate available capacity.
depop -l will list the current drive elements and their current status.
depop -r elm will try to restore all retired elements and rebuild capacity.

Changing storage elements may reinitialize the drive. This operation will lose
data and may take hours to complete. Use the drive provided timeout for
operations by default.

Reviewed by: gbe (manpages)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29018

libcam: Define depop structures and introduce scsi_wrap

Define structures related to the depop set of commands (GET PHYSICAL ELEMENT
STATUS, REMOVE ELEMENT AND TRUNCATE, and RESTORE ELEMENT AND REBUILD) as
well as the CDB construction routines.

Also create scsi_wrap.c. This will have convenience routines that will do all
the elements of allocating the ccb, generating the CDB, sending the command
(looping as necessary for cases where data is returned, but it's size isn't
known up front), etc. As this functionality is fleshed out, calling many
camcontrol commands programatically gets much easier.

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29017

aio_fsync_vnode: handle ERELOOKUP after VOP_FSYNC()

Reported by: tmunro
Reviewed by: jhb, tmunro
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32023

aio_fsync_vnode: use for(;;) loop instead of label

Reviewed by: jhb, tmunro
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32023

vt: call driver's postswitch when panicking on ttyv0

In vt_kms, the postswitch callback restores fbdev mode when
panicking or entering the debugger. This ensures that even when
a graphical applicatino was running on the first tty, simple framebuffer
mode would be restored and the panic would be visible instead
of the frozen GUI. But vt wouldn't call the postswitch callback
when we're already on the first tty, so running a GUI on it
would prevent you from reading any panics.

Reviewed by: tsoome
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29961

opencrypto: Allow kern.crypto.allow_soft to be specified as a tunable

MFC after: 1 week
Sponsored by: The FreeBSD Foundation

mmc: switch mmc_helper to device_ api

Add generic mmc_helper which uses newly introduced device_*_property
api. Thanks to this change the sd/mmc drivers will be capable
of parsing both DT and ACPI description.

Ensure backward compatibility for all mmc_fdt_helper users.

Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31598

device: add device_get_property and device_has_property

Generialize bus specific property accessors. Those functions allow driver code
to access device specific information.

Currently there is only support for FDT and ACPI buses.

Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31597

acpica: add ACPI_GET_PROPERTY to access Device Specific Data (DSD)

Add lazy acquiring of DSD package, which allows accessing Device
Specific Data.

Reviewed by: manu, mw
Sponsored by: Semihalf
Differential revision: https://reviews.freebsd.org/D31596

sctp: use a valid outstream when adding it to the scheduler

Without holding the stcb send lock, the outstreams might get
reallocated if the number of streams are increased.

Reported by: syzbot+4a5431d7caa666f2c19c@syzkaller.appspotmail.com
Reported by: syzbot+aa2e3b013a48870e193d@syzkaller.appspotmail.com
Reported by: syzbot+e4368c3bde07cd2fb29f@syzkaller.appspotmail.com
Reported by: syzbot+fe2f110e34811ea91690@syzkaller.appspotmail.com
Reported by: syzbot+ed6e8de942351d0309f4@syzkaller.appspotmail.com
MFC after: 1 week

Add ELF macros found in the aaelf64 spec

The arm64 aaelf64 spec [0] has DT_AARCH64_ that could be used with
dynamic linking. It also adds GNU_PROPERTY_AARCH64_FEATURE_1_AND used
to tell the kernel which CPU features the binary is compatible with,
but does not require to execute correctly.

Add these values so the kernel and elf tools can make use of them.

[0] https://github.com/ARM-software/abi-aa/blob/2021Q1/aaelf64/aaelf64.rst

Sponsored by: The FreeBSD Foundation

if_mvneta: Build the driver as a kernel module

Fix device detach and attach routine. Add required Makefile
to build as a module. Remove entry from GENERIC, since now
it can be loaded automatically.

Tested on EspressoBin.

Obtained from: Semihalf
Reviewed by: manu
Differential revision: https://reviews.freebsd.org/D31581

stress2: Update test to ensure propper cleanup of fifo files

Update leap-seconds to leap-seconds.3676924800.

Obtained from: ftp://ftp.nist.gov/pub/time/leap-seconds.3676924800.
MFC after: 3 days

The linux rc.d script mounts several filesystems related to Linux ABI
compatibility layer. When /compat is located on a ZFS other than /,
mount would fail because they were not mounted.

Solve this by moving `linux` to depend on `zfs` which mounts all ZFS
filesystems.

Differential Revision: https://reviews.freebsd.org/D31848
MFC after: 2 weeks

[fib_algo][dxr] Merge adjacent empty range table chunks.

MFC after: 3 days

style: Fix leading whitespace in bcache.c

MFC after: 2 weeks
X-MFC-with: Further bcache changes to come

vm_page_startup: correct calculation of the starting page

Also avoid unneded calculations when phys segment end is the phys_avail[]
start.

Submitted by: alc
Reviewed by: markj
MFC after: 1 week
Fixes: 181bfb42fd01bfa9f46
Differential revision: https://reviews.freebsd.org/D32009

ciss(4): Fix typo.

ciss(4): Properly handle data underrun.

For SCSI data underrun is a part of normal life. It should not be
reported as error. This fixes MODE SENSE used by modern CAM.

MFC after: 1 month

freebsd32: Fix a double copyin in sendmsg() and recvmsg()

freebsd32_sendmsg() and freebsd32_recvmsg() both copyin the message
header twice, once directly and once in freebsd32_copyinmsghdr(). The
iovec length from the former is used when copying in msg_iov, but the
rest of the kernel uses the iovec length from the latter. When
kern_sendit() and kern_recvit() iterate over the iovec to compute the
residual for I/O, they can therefore end up walking past the end of the
copied in iovec, either resulting in a system call error, userspace
memory corruption from uiomove() with invalid iovecs, or a kernel page
fault if the copied-in iovec is followed by an unmapped KVA region.

Reported by: syzbot+7cc64cd0c49605acd421@syzkaller.appspotmail.com
Reviewed by: kib, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32010

freebsd32: Provide an ANSI definition for freebsd32_recvmsg()

Fix style in the freebsd32_sendmsg() definition.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation

ls(1): Allow LSCOLORS to specify an underline

Allows capitalizing the background color character to enable an
underline instead of bold, capitalizing the foreground color char will
still do bold.

Differential Revision: https://reviews.freebsd.org/D30547

fstyp: bump WARNS to default and work around warnings

Differential Revision: https://reviews.freebsd.org/D31588

sh: improve command completion

When multiple matches are found, we keep the provided string on the
input line and print unique matches as suggestions.

But the multiple matches might be the same command found in different
directories, so we should deduplicate the matches first and then decide
whether to autocomplete the command or not, based on the number of
unique matches.

sctp: fix FCFS stream scheduler

Reported by: syzbot+c6793f0f0ce698bce230@syzkaller.appspotmail.com
MFC after: 1 week

Eliminate snaplk / bufwait LOR when creating UFS snapshots

Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update. As part of creating a
new UFS snapshot, it has to have its individual vnode lock replaced
with the filesystem's snapshot lock.

The lock order for regular vnodes with respect to buffer locks is that
they must first acquire the vnode lock, then a buffer lock. The order
for the snapshot lock is reversed: a buffer lock must be acquired before
the snapshot lock.

When creating a new snapshot, the snapshot file must retain its vnode
lock until it has allocated all the blocks that it needs before
switching to the snapshot lock. This update moves one final piece of
the initial snapshot block allocation so that it is done before the
newly created snapshot is switched to use the snapshot lock.

Reported by: Witness code
MFC after: 1 week
Sponsored by: Netflix

nfscl: Use vfs.nfs.maxalloclen to limit Deallocate RPC RTT

Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.

This patch uses the sysctl vfs.nfs.maxalloclen to set
the limit on the size of the Deallocate operation.
There is no way to know how long a server will take to do an
deallocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on.

For an 8Gbyte deallocation, the elapsed time for doing it in 64Mbyte
chunks was the same (within margin of variability) as the
elapsed time taken for a single large deallocation
operation for a FreeBSD server with a UFS file system.

vfs: add missing VIRF_MOUNTPOINT in vfs_mountroot_shuffle

Reported by: mav

vfs: add the missing vnode interlock in vfs_mountroot_shuffle

Around v_mountedhere assignment.

unix: Fix a use-after-free in unp_drop()

We need to load the socket pointer after locking the PCB, otherwise
the socket may have been detached and freed by the time that unp_drop()
sets so_error.

This previously went unnoticed as the socket zone was _NOFREE.

Reported by: pho
MFC after: 1 week

pf: always log nat rule and do it pre-rewrite

See also https://github.com/opnsense/core/issues/5005

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D31504

lockmgr: fix lock profiling of face adaptive spinning

cache: count vnodes in cache_purgevfs

vfs: retire VNODE_REFCOUNT_FENCE_* macros

They are unused as of last year.

calendar.freebsd: Fix off-by-one error

nvme/nda: Fail all nvme I/Os after controller fails

Once the controller has failed, fail all I/O w/o sending it to the
device. The reset of the nvme driver won't schedule any I/O to the
failed device, and the controller is in an indeterminate state and can't
accept I/O. Fail both at the top end of the sim and the bottom
end. Don't bother queueing up the I/O for failure in a different task.

Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31341

e1000: Consistently use FALLTHROUGH

Approved by: imp
MFC after: 1 week

e1000: Use C99 bool types

Approved by: imp
MFC after: 1 week

e1000: Catch up commit with DPDK

Various syncs with the e1000 shared code from DPDK:
"cid-gigabit.2020.06.05.tar.gz released by ND"

Approved by: imp
Obtained from: DPDK
MFC after: 1 week

e1000: prevent ULP flow if cable connected

Enabling ulp on link down when cable is connect caused an infinite
loop of linkup/down indications in the NDIS driver.
After discussed, correct flow is to enable ULP only when cable is
disconnected.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Approved by: imp
Obtained from: DPDK (4bff263d54d299269966365f9697941eecaa241b)
MFC after: 1 week

e1000: clean LTO warnings

During LTO build compiler reports some 'false positive' warnings about
variables being possibly used uninitialized. This patch silences these
warnings.

Exemplary compiler warning to suppress (with LTO enabled):
error: 'link' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
if (link) {

Signed-off-by: Andrzej Ostruszka <aostruszka@marvell.com>
Approved by: imp
Obtained from: DPDK (46136031f19107f4e9b6b3a952cb7f57877a7f0f)
MFC after: 1 week

e1000: fix multicast setting in VF

In function e1000_update_mc_addr_list_vf(), "msgbuf[0]" is used prior
to initialization at "msgbuf[0] |= E1000_VF_SET_MULTICAST_OVERFLOW".
And "msgbuf[0]" is overwritten at "msgbuf[0] = E1000_VF_SET_MULTICAST".

Fix it by moving the second line prior to the first one that mentioned
above.

Fixes: dffbaf7880a8 ("e1000: revert fix for multicast in VF")
Cc: stable@dpdk.org
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Approved by: imp
Obtained from: DPDK (f58ca2f9ef6)
MFC after: 1 week

e1000: fix timeout for shadow RAM write

This fixes the timed out for shadow RAM write EEWR can't be detected.

Fixes: 5a32a257f957 ("e1000: more NICs in base driver")
Cc: stable@dpdk.org
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Haiyue Wang <haiyue.wang@intel.com>
Approved by: imp
Obtained from: DPDK (4a8ab48ec47b3616272e50620b8e1a9599358ea6)
MFC after: 1 week

e1000: cleanup pre-processor tags

The codes has been exposed correctly, so remove pre-processor tags.

Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (a50e998a0fd94e5db508710868a3417b1846425c)
MFC after: 1 week

e1000: introduce DPGFR register

Defined DPGFR, Dynamic Power Gate Force Control Register.

Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (1469e5aceffbdcebe834292aadb40b1bd1602867)
MFC after: 1 week

e1000: expose FEXTNVM registers and masks

Adding defines for FEXTNVM8 and FEXTNVM12 registers with new masks for
future use.

Signed-off-by: Nir Efrati <nir.efrati@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (6d208ec099cd870a73c6b444b350a82c7a26c5e4)
MFC after: 1 week

e1000: add missed define for VFTA

VLAN filtering using the VFTA (VLAN Filter Table Array) and
should be initialized prior to setting rx mode.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (fc9933953c90e99970aa867c38f9c6e6c5d0488d)
MFC after: 1 week

e1000: increase timeout for ME ULP exit

Due timing issues in WHL and since recovery by host is
not always supported, increased timeout for Manageability Engine(ME)
to finish Ultra Low Power(ULP) exit flow for Nahum before timer expiration.

Signed-off-by: Nir Efrati <nir.efrati@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (cf1f3ca45d33e793ca581200b4000c39a798113e)
MFC after: 1 week

e1000: add missing register defines

Added defines for the EEC, SHADOWINF and FLFWUPDATE registers needed for
the nvmupd_validate_offset function to correctly validate the NVM update
offset.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (2c7fe65ab9a31e6ebf438dad7ccc59bcde83a89f)
MFC after: 1 week

e1000: add PCIm function state

Added define to pcim function state.

Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (7ee1a3b273c7f321b50e6ba17c3d9537b1b08347)
MFC after: 1 week

e1000: expose MAC functions

Now the functions are being accessed outside of the file, we need
to properly expose them for silicon families to use.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (df01c0ee277d51f81d7d72501dba97550d3b6c4a)
MFC after: 1 week

e1000: update for i210 slow system clock

This code is required for the update for system clock.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (3f0188c8f29847038bc9f306b2570ace57e3811c)
MFC after: 1 week

e1000: remove duplicated phy codes

Add two files base.c and base.h to reduce the redundancy
in the silicon family code.
Remove the code duplication from e1000_82575 files.
Clean family specific functions from base.
Fix up a stray and duplicate function declaration.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (44dddd14059f151f39f7e075b887decfc9a10f11)
MFC after: 1 week

e1000: modify HW level time sync mechanisms

Add additional configuration space access to allow HW
level time sync mechanism.

Signed-off-by: Evgeny Efimov <evgeny.efimov@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (d53391f1fe2e0eba8818517fdf285f893d95dcc8)
MFC after: 1 week

e1000: fix minor issues and improve code style

Fix typo in piece of code of NVM access for SPT.
And cleans up the remaining instances in the shared code
where it was not adhering to the Linux code standard.
Wrong description was found in the mentioned file, so fix them.
Remove shadowing variable declarations.

Relating to operands in bitwise operations having different sizes.
Unreachable code since *clock_in_i2c_* always return success.
Don't return unused s32 and don't check for constants.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Robert Konklewski <robertx.konklewski@intel.com>
Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (b8592c89c8fbc871d22313dcac0b86c89a7d5a62)
MFC after: 1 week

e1000: add function parameter descriptions

Add function parameter descriptions to address gcc 7 warnings.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (1bf35d435c9764e83be76042fa6489dd127b6c40)
MFC after: 1 week

e1000: expose xMDIO methods

Move read and write xmdio methods to e1000_phy.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (b14d20f1b2bb0e6d95f19963c5d7f55374e0ead9)
MFC after: 1 week

e1000: add missing device ID

Adding Intel(R) I210 Gigabit Network Connection 15F6 device ID for SGMII
flashless automotive device.

Signed-off-by: Kamil Bednarczyk <kamil.bednarczyk@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (586d770bfefc01d4af97c0ddf17c960c3e49ec22)
MFC after: 1 week

e1000: support flashless i211 PBA

Add support to print PBA when using flashless.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>
Approved by: imp
Obtained from: DPDK (d3c41d90dfd5b39dec14c74cf53086f4e6634aed)
MFC after: 1 week

e1000: Update copyrights and readme

Copyrights in sync with "cid-gigabit.2020.06.05.tar.gz released by ND"
(from DPDK).

README from the latest em-7.7.8 on intel.com

Approved by: imp
MFC after: 1 week

e1000: Revert Update intel shared code

This reverts commit fc7682b17f3738573099b8b03f5628dcc8148adb.

This will be done incrementally to help with bisecting an issue in
later I21x devices (ich8lan).

PR: 258153
Approved by: imp
MFC after: 1 day

vfs: Permit unix sockets to be opened with O_PATH

As with FIFOs, a path descriptor for a unix socket cannot be used with
kevent().

In principle connectat(2) and bindat(2) could be modified to support an
AT_EMPTY_PATH-like mode which operates on the socket referenced by an
O_PATH fd referencing a unix socket. That would eliminate the path
length limit imposed by sockaddr_un.

Update O_PATH tests.

Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31970

aio_test: Validate interactions between AIO on sockets and shutdown(2)

Reviewed by: asomers, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31976

socket: Synchronize soshutdown() with listen(2) and AIO

To handle shutdown(SHUT_RD) we flush the receive buffer of the socket.
This may involve searching for control messages of type SCM_RIGHTS,
since we need to close the file references.  Closing arbitrary files
with socket buffer locks held is undesirable, mainly due to lock
ordering issues, so we instead make a copy of the socket buffer and
operate on that without any locks.  Fields in the original buffer are
cleared.

This behaviour clobbered the AIO job queue associated with a receive
buffer.  It could also cause us to leak a KTLS session reference.
Reorder socket buffer fields to address this.

An alternate solution would be to remove the hack in sorflush(), but
this is not quite feasible (yet).  In particular, though sorflush()
flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to
be queued - the flag just prevents userspace from reading more data.  I
suspect we should fix this; SBS_CANTRCVMORE represents a terminal state
and protocols can likely just drop any data destined for such a buffer.
Many of them already do, but in some cases the check is racy, and some
KPI churn will be needed to fix everything.  This approach is more
straightforward for now.

Reported by: syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com
Reported by: syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31976

socket: Remove NOFREE from the socket zone

This flag was added during the transition away from the legacy zone
allocator, commit c897b81311792ccf6a93feff2a405e2ae53f664e.  The old
zone allocator effectively provided _NOFREE semantics, but it seems that
they are not required for sockets.  In particular, we use reference
counting to keep sockets live.

One somewhat dangerous case is sonewconn(), which returns a pointer to a
socket with reference count 0.  This socket is still effectively owned
by the listening socket.  Protocols must therefore be careful to
synchronize sonewconn() calls with their pru_close implementations,
since for listening sockets soclose() will abort the child sockets.  For
example, TCP holds the listening socket's PCB read locked across the
sonewconn() call, which blocks tcp_usr_close(), and sofree()
synchronizes with a concurrent soabort() of the nascent socket.
However, _NOFREE semantics are not required here.

Eliminating _NOFREE has several benefits: it enables use-after-free
detection (e.g., by KASAN) and lets the system reclaim memory from the
socket zone under memory pressure.  No functional change intended.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31975

socket: Add assertions around naked refcount decrements

Sockets in a listen queue hold a reference to the parent listening
socket. Several code paths release this reference manually when moving
a child socket out of the queue.

Replace comments about the expected post-decrement refcount value with
assertions. Use refcount_load() instead of a plain load. No functional
change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31974

socket: Fix a use-after-free in soclose()

After releasing the fd reference to a socket "so", we should avoid
testing SOLISTENING(so) since the socket may have been freed. Instead,
directly test whether the list of unaccepted sockets is empty.

Fixes: f4bb1869ddd2 ("Consistently use the SOLISTENING() macro")
Pointy hat: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31973

ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers

ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden. Fix this by using a separate output
parameter. Convert to the new socket buffer locking macros while here.

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31977

uma: Show the count of free slabs in each per-domain keg's sysctl tree

This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation

rpc: Convert an SOLISTENING check to an assertion

Per the comment, this socket should always be a listening socket.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation

kcov: Disable address and memory sanitizers in get_kinfo()

get_kinfo() is only called from the coverage sanitizer callbacks, which
are similarly uninstrumented.

Sponsored by: The FreeBSD Foundation

buffer pager: allow get_blksize method to return error

Reported and reviewed by: asomers
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31998

bhyve: Support setting the disk serial number for VirtIO block devices.

Reviewed by: allanjude
Obtained from: illumos
Differential Revision: https://reviews.freebsd.org/D31983

libc/locale: Fix races between localeconv(3) and setlocale(3)

Each locale embeds a lazily initialized lconv which is populated by
localeconv(3) and localeconv_l(3).  When setlocale(3) updates the global
locale, the lconv needs to be (lazily) reinitialized.  To signal this,
we set flag variables in the locale structure.  There are two problems:

- The flags are set before the locale is fully updated, so a concurrent
  localeconv() call can observe partially initialized locale data.
- No barriers ensure that localeconv() observes a fully initialized
  locale if a flag is set.

So, move the flag update appropriately, and use acq/rel barriers to
provide some synchronization.  Note that this is inadequate in the face
of multiple concurrent calls to setlocale(3), but this is not expected
to work regardless.

Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
demonstrating the race.

PR: 258360
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31899

readelf: document that -u / --unwind is not yet implemented

ELF tool chain readelf accepts -u / --unwind but just ignores the
option. This was previously undocumented, which could be confusing for
someone encountering `readelf -u` (in a script or GNU readelf example).

Reported by: markj (in D32003)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation

readelf: include notes (-n) and unwind (-u) in --all/-a

This matches the GNU and LLVM versions of readelf.

As markj noted in the review -u is not actually implemented yet and has
no effect. The option is accepted and just ignored.

Reported by: andrew
Reviewed by: andrew, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32003

proccontrol(1): Add wxmap control

Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779

procctl(2): Add PROC_WXMAP_CTL/STATUS

It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.

Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779

Style

Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779