Guinan Sun [Mon, 6 Jul 2020 08:12:05 +0000 (08:12 +0000)]
e1000: remove duplicated phy codes
Add two files base.c and base.h to reduce the redundancy
in the silicon family code.
Remove the code duplication from e1000_82575 files.
Clean family specific functions from base.
Fix up a stray and duplicate function declaration.
Guinan Sun [Mon, 6 Jul 2020 08:12:03 +0000 (08:12 +0000)]
e1000: fix minor issues and improve code style
Fix typo in piece of code of NVM access for SPT.
And cleans up the remaining instances in the shared code
where it was not adhering to the Linux code standard.
Wrong description was found in the mentioned file, so fix them.
Remove shadowing variable declarations.
Relating to operands in bitwise operations having different sizes.
Unreachable code since *clock_in_i2c_* always return success.
Don't return unused s32 and don't check for constants.
Mark Johnston [Fri, 17 Sep 2021 16:34:21 +0000 (12:34 -0400)]
vfs: Permit unix sockets to be opened with O_PATH
As with FIFOs, a path descriptor for a unix socket cannot be used with
kevent().
In principle connectat(2) and bindat(2) could be modified to support an
AT_EMPTY_PATH-like mode which operates on the socket referenced by an
O_PATH fd referencing a unix socket. That would eliminate the path
length limit imposed by sockaddr_un.
Update O_PATH tests.
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31970
Mark Johnston [Fri, 17 Sep 2021 16:29:28 +0000 (12:29 -0400)]
socket: Synchronize soshutdown() with listen(2) and AIO
To handle shutdown(SHUT_RD) we flush the receive buffer of the socket.
This may involve searching for control messages of type SCM_RIGHTS,
since we need to close the file references. Closing arbitrary files
with socket buffer locks held is undesirable, mainly due to lock
ordering issues, so we instead make a copy of the socket buffer and
operate on that without any locks. Fields in the original buffer are
cleared.
This behaviour clobbered the AIO job queue associated with a receive
buffer. It could also cause us to leak a KTLS session reference.
Reorder socket buffer fields to address this.
An alternate solution would be to remove the hack in sorflush(), but
this is not quite feasible (yet). In particular, though sorflush()
flags the sockbuf with SBS_CANTRCVMORE, it is possible for more data to
be queued - the flag just prevents userspace from reading more data. I
suspect we should fix this; SBS_CANTRCVMORE represents a terminal state
and protocols can likely just drop any data destined for such a buffer.
Many of them already do, but in some cases the check is racy, and some
KPI churn will be needed to fix everything. This approach is more
straightforward for now.
Reported by: syzbot+104d8ee3430361cb2795@syzkaller.appspotmail.com
Reported by: syzbot+5bd2e7d05f84a59d0d1b@syzkaller.appspotmail.com
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31976
Mark Johnston [Fri, 17 Sep 2021 16:27:26 +0000 (12:27 -0400)]
socket: Remove NOFREE from the socket zone
This flag was added during the transition away from the legacy zone
allocator, commit c897b81311792ccf6a93feff2a405e2ae53f664e. The old
zone allocator effectively provided _NOFREE semantics, but it seems that
they are not required for sockets. In particular, we use reference
counting to keep sockets live.
One somewhat dangerous case is sonewconn(), which returns a pointer to a
socket with reference count 0. This socket is still effectively owned
by the listening socket. Protocols must therefore be careful to
synchronize sonewconn() calls with their pru_close implementations,
since for listening sockets soclose() will abort the child sockets. For
example, TCP holds the listening socket's PCB read locked across the
sonewconn() call, which blocks tcp_usr_close(), and sofree()
synchronizes with a concurrent soabort() of the nascent socket.
However, _NOFREE semantics are not required here.
Eliminating _NOFREE has several benefits: it enables use-after-free
detection (e.g., by KASAN) and lets the system reclaim memory from the
socket zone under memory pressure. No functional change intended.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31975
Mark Johnston [Fri, 17 Sep 2021 16:26:56 +0000 (12:26 -0400)]
socket: Add assertions around naked refcount decrements
Sockets in a listen queue hold a reference to the parent listening
socket. Several code paths release this reference manually when moving
a child socket out of the queue.
Replace comments about the expected post-decrement refcount value with
assertions. Use refcount_load() instead of a plain load. No functional
change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31974
Mark Johnston [Fri, 17 Sep 2021 16:26:06 +0000 (12:26 -0400)]
socket: Fix a use-after-free in soclose()
After releasing the fd reference to a socket "so", we should avoid
testing SOLISTENING(so) since the socket may have been freed. Instead,
directly test whether the list of unaccepted sockets is empty.
Fixes: f4bb1869ddd2 ("Consistently use the SOLISTENING() macro")
Pointy hat: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31973
Mark Johnston [Fri, 17 Sep 2021 16:14:29 +0000 (12:14 -0400)]
ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden. Fix this by using a separate output
parameter. Convert to the new socket buffer locking macros while here.
Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31977
Mark Johnston [Fri, 17 Sep 2021 14:44:23 +0000 (10:44 -0400)]
libc/locale: Fix races between localeconv(3) and setlocale(3)
Each locale embeds a lazily initialized lconv which is populated by
localeconv(3) and localeconv_l(3). When setlocale(3) updates the global
locale, the lconv needs to be (lazily) reinitialized. To signal this,
we set flag variables in the locale structure. There are two problems:
- The flags are set before the locale is fully updated, so a concurrent
localeconv() call can observe partially initialized locale data.
- No barriers ensure that localeconv() observes a fully initialized
locale if a flag is set.
So, move the flag update appropriately, and use acq/rel barriers to
provide some synchronization. Note that this is inadequate in the face
of multiple concurrent calls to setlocale(3), but this is not expected
to work regardless.
Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
demonstrating the race.
PR: 258360
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31899
Ed Maste [Fri, 17 Sep 2021 13:59:41 +0000 (09:59 -0400)]
readelf: document that -u / --unwind is not yet implemented
ELF tool chain readelf accepts -u / --unwind but just ignores the
option. This was previously undocumented, which could be confusing for
someone encountering `readelf -u` (in a script or GNU readelf example).
Reported by: markj (in D32003)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.
Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779
Mike Karels [Sun, 5 Sep 2021 18:14:04 +0000 (13:14 -0500)]
Change lowest address on subnet (host 0) not to broadcast by default.
The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122. Until
now, we would broadcast the host zero address as well as the configured
address. Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it. Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.
See https:/datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316. Note, Linux
now implements this.
Colin Percival [Thu, 16 Sep 2021 16:22:42 +0000 (09:22 -0700)]
EC2: Default to UEFI booting
This reduces the FreeBSD boot time by approximately 5 seconds,
roughly equally divided betwenn two factors:
* Disk I/O is faster in the EFI loader since it can perform larger
I/Os. (The BIOS loader is limited due to the use of bounce buffers
in sub-1M memory.)
* The EFI console is much faster than the VGA console.
Note however that not all EC2 instance types support UEFI; as a
general rule the newer instances (based on Amazon's "Nitro" platform)
support UEFI but the older instances (based on Xen) do not.
X-MFC: TBD based on tradeoff between performance and compatibility
Relnotes: yes
Sponsored by: https://www.patreon.com/cperciva
Colin Percival [Thu, 16 Sep 2021 02:15:44 +0000 (19:15 -0700)]
EC2: Allow AMI boot mode to be specified
The default boot method for amd64 AMIs is BIOS, but at AMI creation
time a flag can be set to specify that UEFI should be used instead.
This commit adds a variable AMIBOOTMETHOD which, if set to "UEFI",
causes the appropriate flag to be set during AMI creation.
The only boot method supported by EC2 for arm64 is UEFI.
The names of AMIs are also amended to include the boot method; they
now look like "FreeBSD 14.0-CURRENT-amd64-20210915 UEFI".
Warner Losh [Thu, 16 Sep 2021 16:18:32 +0000 (10:18 -0600)]
nanobsd: Provide empty routines for new embedded scheme
calculate_partitioning and create_code_slice are now required in
nanobsd.sh. While things work with the ones provided by legacy.sh, it's
fighting embedded/common's other actions. Instead, replace them with
stubs.
Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
vm_phys: do not ignore phys_avail[] segments that do not fit completely into vm_phys segments
If phys_avail[] segment only intersect with some vm_phys segment, add
pages from it to the free list that belong to the given vm_phys_seg,
instead of dropping them.
The vm_phys segments are generally result of subdivision of phys_avail
segments, for instance DMA32 or LOWMEM boundaries split them. On
amd64, after UEFI in-place kernel activation (copy_staging disable)
was enabled, we typically have a large phys_avail[] segment below 4G
which crosses LOWMEM (1M) boundary. With the current way of requiring
phys_avail[] fully fit into vm_phys_seg, this memory was ignored.
Artur Rojek [Thu, 16 Sep 2021 12:14:54 +0000 (14:14 +0200)]
ena: fix building in-kernel driver
When building ENA as compiled into the kernel, the driver would fail to
build. Resolve the problem by introducing the following changes:
1. Add missing `ena_rss.c` entry in `sys/conf/files`.
2. Prevent SYSCTL_ADD_INT from throwing an assert due to an extra
CTLTYPE_INT flag.
Fixes: 986e7b92276 ("ena: Move RSS logic into its own source files") Fixes: 6d1ef2abd33 ("ena: Implement full RSS reconfiguration")
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
Rick Macklem [Thu, 16 Sep 2021 00:29:45 +0000 (17:29 -0700)]
nfscl: Add vfs.nfs.maxalloclen to limit Allocate/Deallocate RPC RTT
Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.
This patch adds a sysctl called vfs.nfs.maxalloclen to set
the limit on the size of the Allocate operation.
There is no way to know how long a server will take to do an
allocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on, so that is what
the default value for vfs.nfs.maxalloclen is set to.
For an 8Gbyte allocation, the elapsed time for doing it in 64Mbyte
chunks was the same as the elapsed time taken for a single large
allocation operation for a FreeBSD server with a UFS file system.
Adds a --color flag to diff(1) that supports the same options as GNU's
diff(1). The colors are customizable with the env var DIFFCOLORS in
a format similar to grep(1)'s GREPCOLORS. An example would be 04;36:41
for additions to be underlined light blue, and deletions have a red
background.
diff: move functions around and reduce their visibility
Most of them become static. There will be more such functions added in
upcoming commits, so they would be inconsistent with existing code.
Improve the existing code instead of reinforcing the unwanted pattern.
There will be more boolean flags added in upcoming commits and they
would have to be stored in ints in order to be consistent with existing
code. Change the existing code to use the bool type.
Kevin Bowling [Wed, 15 Sep 2021 14:47:19 +0000 (07:47 -0700)]
e1000: Fix up HW vlan ops
* Don't reset the entire adapter for vlan changes, fix up the problems
* Add some functions for vlan filter (vfta) manipulation
* Don't muck with the vfta if we aren't doing HW vlan filtering
* Disable interrupts when manipulating vfta on lem(4)-class NICs
* On the I350 there is a specification update (2.4.20) in which the
suggested workaround is to write to the vfta 10 times (if at first you
don't succeed, try, try again). Our shared code has the goods, use it
* Increase a VF's frame receive size in the case of vlans
From the referenced PR, this reduced vlan configuration from minutes
to seconds with hundreds or thousands of vlans and prevents wedging the
adapter with needless adapter reinitialization for each vlan ID.
John Baldwin [Wed, 15 Sep 2021 20:25:30 +0000 (13:25 -0700)]
iscsi: Abort data-out tasks queued on a terminating session.
cfiscsi_datamove_out() can race with cfiscsi_session_terminate_tasks()
and enqueue a new task after the latter function has aborted existing
tasks. This could result in a deadlock as
cfiscsi_session_terminate_tasks() waited forever for this task to
complete.
Use radix_mmu environment variable to select between Hash or Radix
MMU, when performing the CAS method call. This matches kernel's
behavior, by selecting Hash MMU by default and Radix if radix_mmu
is not zero, to make sure that both loader and kernel always select
the same MMU.
The device tree is queried to detect Radix/GTSE support and to
find out if CAS is supported, making the old CPU version and HV
bit checks unnecessary now.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31951
This finish (almost) the clocks implementations for the RK3328 SoC.
The clocks are now correctly implemented respecting the clock hiearchy.
The missing clocks are mostly the DDR clocks, implementing those is only
useful for debugging as we will never set them in the kernel.
The ARMCLK still needs to be rewritten so it looks closer to how the
hardware is done.
Some clocks in the RK3328 SoC (and possibly others) have registers not in
the CLKSEL_CON range. Add a macros for muxes which lives not in the range
of CLKSEL_CON which just takes a raw offset.
tcp: Avoid division by zero when KERN_TLS is enabled in tcp_account_for_send().
If the "len" variable is non-zero, we can assume that the sum of
"tp->t_snd_rxt_bytes + tp->t_sndbytes" is also non-zero.
It is also assumed that the 64-bit byte counters will never wrap around.
Differential Revision: https://reviews.freebsd.org/D31959
Reviewed by: gallatin, rrs and tuexen
Found by: "I told you so", also called hselasky
MFC after: 1 week
Sponsored by: NVIDIA Networking
John Baldwin [Wed, 15 Sep 2021 16:03:18 +0000 (09:03 -0700)]
infiniband: Disable -Wredundant-decl warnings.
ib_uverbs_flow_resources_free() is declard in two header files in
upstream OFED. Disable the warning to avoid introducing diffs to fix
the build on GCC 9.
While here, fix the ibcore module to disable the same warnings
disabled in OFED_CFLAGS.
John Baldwin [Wed, 15 Sep 2021 16:03:17 +0000 (09:03 -0700)]
<linux/overflow.h>: Don't use __has_builtin().
GCC only added support for __has_builtin in GCC 10. However, all
supported versions of GCC and clang include these builtins so just use
them unconditionally.
Martin Matuska [Wed, 15 Sep 2021 15:30:07 +0000 (17:30 +0200)]
zfs: merge openzfs/zfs@4a1195ca5 (master) into main
Notable upstream pull request merges:
#11312 Temporarily use root credentials to mount snapshots in .zfs
#12246 arc: Drop an incorrect assert
#12443 Fixed data integrity issue when underlying disk returns error
to zfs
#12522 Compressed receive with different ashift can result in incorrect
PSIZE on disk
#12535 Verify embedded blkptr's in arc_read()
#12541 Allow sending corrupt snapshots even if metadata is corrupted
Due to the quirky nature of the Synopsys Designware PCIe IP,
the type 0 configuration is broadcast and whatever device
is plugged into slot, will appear at each 32 device
positions of bus0. Mitigate the issue by filtering out
duplicated devices on this bus for both DT and ACPI cases.
Andrew Turner [Wed, 15 Sep 2021 09:13:41 +0000 (09:13 +0000)]
Use a 64 bit read to access GICR_TYPER
The GICv3 ITS only needs to implement 32 bit access to the GICR_TYPER
when the CPU implements AArch32. As this may not always be the case
use a 64 bit read when checking if the ITS is enabled on the CPU.
PR: 258217
Reported by: Olivier Delande <olivier.delande@provenrun.com>
Sponsored by: The FreeBSD Foundation
Alexander Motin [Wed, 15 Sep 2021 01:06:39 +0000 (21:06 -0400)]
ipmi(4): Limit maximum watchdog pre-timeout interval.
Previous code by default setting pre-timeout interval to 120 seconds
made impossible to set timeout interval below that, resulting in error
0xcc (Invalid data field in Request) at least on Supermicro boards.
To fix that limit maximum pre-timeout interval to ~1/4 of the timeout
interval, that sounds like a reasonable default: not too short to fire
too late, but also not too long to give many false reports.
Allan Jude [Tue, 14 Sep 2021 23:10:00 +0000 (19:10 -0400)]
Temporarily use root credentials to mount snapshots in .zfs
When mounting a snapshot in the .zfs/snapshots control directory,
temporarily assume roots credentials to perform the VFS_MOUNT().
This allows regular users and users inside jails to access these
snapshots.
The regular usermount code is not helpful here, since it requires
that the user performing the mount own the mountpoint, which won't
be the case for .zfs/snapshot/<snapname>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Allan Jude <allan@klarasystems.com> Sponsored-By: Modirum MDPay Sponsored-By: Klara Inc.
Closes #11312
John Baldwin [Tue, 14 Sep 2021 20:46:14 +0000 (13:46 -0700)]
cxgbe tom: Update rcv_nxt for a FIN after handle_ddp_close().
For TCP DDP, handle_ddp_close() needs to see the pre-FIN rcv_nxt to
determine how much data was placed in the local buffer before the FIN
was received. The changes in d59f1c49e26b broke this by updating
rcv_nxt before calling handle_ddp_close().
Fixes: d59f1c49e26b cxgbe tom: Permit rcv_nxt mismatches on FIN for iSCSI connections on T6.
Sponsored by: Chelsio Communications
John Baldwin [Tue, 14 Sep 2021 20:46:14 +0000 (13:46 -0700)]
cxgbe tom: Don't queue AIO requests on listen sockets.
This is similar to the fixes in 141fe2dceeae. One difference is that
TOE sockets do not change states (listen vs non-listen) once created,
so no lock is needed for SOLISTENING().
Ka Ho Ng [Tue, 14 Sep 2021 19:51:58 +0000 (03:51 +0800)]
ctl(4): Do hole-punching for UNMAP to file-backed LUNs
This adds support for SCSI UNMAP command to file-backed LUNs, if the
underlying file system has a non-zerofilling VOP_DEALLOCATE
implementation where some or all parts of the requested operation range
may be deallocated.
Sponsored by: The FreeBSD Foundation
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D31922
Andrew Turner [Tue, 14 Sep 2021 19:03:30 +0000 (19:03 +0000)]
Remove a bogus assertion from the gic drivers
When setting a message based interrupt range we set the base and count.
In an earlier the count was implemented as an end value, however the
asserts used to check this value was correct were incorrectly left in.
Reported by: emaste
Sponsored by: The FreeBSD Foundation
John Baldwin [Tue, 14 Sep 2021 18:43:41 +0000 (11:43 -0700)]
Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'. A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).
Previously, device driver ifnet methods switched on the type to call
type-specific functions. Now, those type-specific functions are saved
in the switch structure and invoked directly. In addition, this more
gracefully permits multiple implementations of the same tag within a
driver. In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.
Mark Johnston [Tue, 14 Sep 2021 18:29:27 +0000 (14:29 -0400)]
kcov: Integrate with KMSAN
- kern_kcov.c needs to be compiled with -fsanitize=kernel-memory when
KMSAN is configured since it calls into various other subsystems.
- Disable address and memory sanitizers in kcov(4)'s coverage sanitizer
callbacks, as they do not provide useful checking. Moreover, with
KMSAN we may otherwise get false positives since the caller (coverage
sanitizer runtime) is not instrumented.
- Disable KASAN and KMSAN interceptors in subr_coverage.c, as they do
not provide any benefit but do introduce overhead when fuzzing.