Stefan Eßer [Wed, 18 Nov 2020 19:44:30 +0000 (19:44 +0000)]
Add function getlocalbase() to libutil.
This function returns the path to the local software base directory, by
default "/usr/local" (or the value of _PATH_LOCALBASE in include/paths.h
when building the world).
The value returned can be overridden by 2 methods:
- the LOCALBASE environment variable (ignored by SUID programs)
- else a non-default user.localbase sysctl value
msun tests: use standard floating-point exception flags on lrint and fenv tests
Some platforms have additional architecture-specific floating-point flags.
Msun test cases lrint and test_fegsetenv (fenv) expects only standard flags,
so make sure to mask them appropriately.
This makes test pass on PowerPC64.
Reviewed by: jhibbits, ngie
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27202
Warner Losh [Wed, 18 Nov 2020 19:22:24 +0000 (19:22 +0000)]
mergemaster: handle symbolic links during update.
/etc/os-release is now a symbolic link to a generated file. Make
mergemaster cope with symbolic links generically. I'm no longer
a big mergemaster user, so this has only been lightly tested
by me, though Kimura-san has ran it through its paces.
Dimitry Andric [Wed, 18 Nov 2020 18:40:58 +0000 (18:40 +0000)]
When elftoolchain's objcopy (or strip) is rewriting a file in-place,
make it create the temporary file in the same directory as the source
file by default, instead of always using $TMPDIR or /tmp. If creating
that file fails because the directory is not writable, also fallback to
$TMPDIR or /tmp.
This has also been submitted upstream as:
https://sourceforge.net/p/elftoolchain/tickets/597/
Marcin Wojtas [Wed, 18 Nov 2020 15:17:55 +0000 (15:17 +0000)]
Add ENI metrics for the ENA driver
The new HAL allows the driver to read extra ENI stats. Exact meaning of
each of them can be found in base/ena_defs/ena_admin_defs.h file and
structure ena_admin_eni_stats.
Those stats are being updated inside of the timer service, which is
executed every second.
ENI metrics are turned off by default. They can be enabled, using the
sysctl node: dev.ena.X.eni_metrics.update_delay
0 value in this node means that the update is turned off. Other values
determine how many seconds must pass, before ENI metrics will be
updated.
Marcin Wojtas [Wed, 18 Nov 2020 15:07:34 +0000 (15:07 +0000)]
Add SPDX license tag to the ENA driver files
Refering to guide: https://wiki.freebsd.org/SPDX the SPDX tag should not
replace the standard license text, however it should be added over the
standard license text to make the automation easier.
Because of that, the old license was kept, but the SPDX tag was added
on top of every ENA driver file.
Marcin Wojtas [Wed, 18 Nov 2020 15:02:12 +0000 (15:02 +0000)]
Add Rx offsets support for the ENA driver
For the first descriptor in a chain the data may start at an offset.
It is optional feature of some devices, so the driver must ack that
it supports it.
The data pointer of the mbuf is simply shifted by the given value.
Marcin Wojtas [Wed, 18 Nov 2020 14:59:22 +0000 (14:59 +0000)]
Adjust ENA driver files to latest ena-com changes
* Use the new API of ena_trace_*
* Fix typo syndrom --> syndrome
* Remove validation of the Rx req ID (already performed in the ena-com)
* Remove usage of deprecated ENA_ASSERT macro
Andrew Gallatin [Wed, 18 Nov 2020 14:55:49 +0000 (14:55 +0000)]
LACP: When suppressing distributing, return ENOBUFS
When links come and go, lacp goes into a "suppress distributing" mode
where it drops traffic for 3 seconds. When in this mode, lagg/lacp
historiclally drops traffic with ENETDOWN. That return value causes TCP
to close any connection where it gets that value back from the lower
parts of the stack. This means that any TCP connection with active
traffic during a 3-second windown when an LACP link comes or goes
would get closed.
TCP treats return values of ENOBUFS as transient errors, and re-schedules
transmission later. So rather than returning ENETDOWN, lets
return ENOBUFS instead. This allows TCP connections to be preserved.
I've tested this by repeatedly bouncing links on a Netlfix CDN server
under a moderate (20Gb/s) load and overved ENOBUFS reported back to
the TCP stack (as reported by a RACK TCP sysctl).
Marcin Wojtas [Wed, 18 Nov 2020 14:50:12 +0000 (14:50 +0000)]
Fix completion descriptors alignment for the ENA
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly. For that
purpose adjust ena_dma_alloc() to support it.
It's a critical fix, especially for the arm64 EC2 instances.
Marcin Wojtas [Wed, 18 Nov 2020 14:30:59 +0000 (14:30 +0000)]
ena-com: Fix ena-com to allocate cdesc aligned to 4k
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly.
It's a critical fix, especially for the arm64 EC2 instances.
Alan Somers [Wed, 18 Nov 2020 04:35:49 +0000 (04:35 +0000)]
nfs: Mark unused statistics variable as reserved
FreeBSD's NFS exporter has long exported some unused statistics fields.
Revision r366992 removed them from nfsstat. This revision renames those
fields in the kernel's exported structures to make it clear to other
consumers that they are unused.
Alexander Motin [Wed, 18 Nov 2020 03:43:03 +0000 (03:43 +0000)]
Move ecmd memory allocation itto separate DMA tag.
Ecmd memory is not directly related to the request queue, only referenced
from it sometimes in target mode. Separate allocation should be easier
in case of fragmented memory and can be skipped when target is not built.
Upstream commit: 3928ec53395fcc26be7844dd6b63df757166c281
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
Conrad Meyer [Tue, 17 Nov 2020 21:20:11 +0000 (21:20 +0000)]
linux(4) clone(2): Correctly handle CLONE_FS and CLONE_FILES
The two flags are distinct and it is impossible to correctly handle clone(2)
without the assistance of fork1(). This change depends on the pwddesc split
introduced in r367777.
I've added a fork_req flag, FR2_SHARE_PATHS, which indicates that p_pd
should be treated the opposite way p_fd is (based on RFFDG flag). This is a
little ugly, but the benefit is that existing RFFDG API is preserved.
Holding FR2_SHARE_PATHS disabled, RFFDG indicates both p_fd and p_pd are
copied, while !RFFDG indicates both should be cloned.
In Chrome, clone(2) is used with CLONE_FS, without CLONE_FILES, and expects
independent fd tables.
The previous conflation of CLONE_FS and CLONE_FILES was introduced in
r163371 (2006).
Conrad Meyer [Tue, 17 Nov 2020 20:01:21 +0000 (20:01 +0000)]
unix(4): Enhance LOCAL_CREDS_PERSISTENT ABI
As this ABI is still fresh (r367287), let's correct some mistakes now:
- Version the structure to allow for future changes
- Include sender's pid in control message structure
- Use a distinct control message type from the cmsgcred / sockcred mess
Let's have two entries in the synopsis:
- chpass now lists options which can be used for non-NIS-specific
functionalities.
- ypchpass additionally lists the NIS-specific flags.
Technically, it is an artificial distinction, as chpass and ypchpass behave
identically. Nevertheless, it might help navigating the synopsis section.
Emmanuel Vadot [Tue, 17 Nov 2020 14:58:30 +0000 (14:58 +0000)]
arm64: allwinner: Init the Display Engine clock
In case u-boot was compiled without video support set the PLL
to 432Mhz (which allow us to use most of the HDMI resolution for
tcon) and set it as the parent for the DE clock.
When copying types from one CTF container to another, ensure that we
always copy intrinsic data types before copying bitfields which are
based on those types. This ensures the type ordering in the destination
CTF container matches the assumption made elsewhere in the CTF code
that instrinsic data types will always appear before bitfields based on
those types.
This resolves the following error message some users have seen after
r366908:
"/usr/lib/dtrace/ipfw.d", line 121: failed to copy type of 'ip6p':
Conflicting type is already defined
Leandro Lupori [Tue, 17 Nov 2020 11:36:31 +0000 (11:36 +0000)]
[PowerPC] Don't overwrite vm.pmap sysctl node
After r367417, both mmu_oea64 and mmu_radix were defining the vm.pmap
sysctl node, resulting in the later definition hiding the properties of
the previous one. Avoid this issue by defining vm.pmap in a common
source file and declaring it where needed.
This change also standardizes the tunable name used to enable superpages
and change its default to disabled on radix MMU, because it still has some
issues with superpages.
Reviewed by: bdragon, jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D27156
Clean up the synopsis section & fix mandoc warnings
The synopsis section had two very similar entries. The flags documented by
the first one were a strict subset of the second one. Let's just keep only
the second entry for simplicity.
Andrew Turner [Tue, 17 Nov 2020 10:27:42 +0000 (10:27 +0000)]
Stop calling gic_v3_detach when we haven't called gic_v3_attach
The former tries to dereference memory allocated by the latter. If counting
the redistributor fails it may try to dereference memory that was never
allocated.
Kyle Evans [Tue, 17 Nov 2020 03:36:58 +0000 (03:36 +0000)]
umtx_op: reduce redundancy required for compat32
All of the compat32 variants are substantially the same, save for
copyin/copyout (mostly). Apply the same kind of technique used with kevent
here by having the syscall routines supply a umtx_copyops describing the
operations needed.
umtx_copyops carries the bare minimum needed- size of timespec and
_umtx_time are used for determining if copyout is needed in the sem2_wait
case.
Kyle Evans [Tue, 17 Nov 2020 03:34:01 +0000 (03:34 +0000)]
_umtx_op: fix a compat32 bug in UMTX_OP_NWAKE_PRIVATE
Specifically, if we're waking up some value n > BATCH_SIZE, then the
copyin(9) is wrong on the second iteration due to upp being the wrong type.
upp is currently a uint32_t**, so upp + pos advances it by twice as many
elements as it should (host pointer size vs. compat32 pointer size).
Fix it by just making upp a uint32_t*; it's still technically a double
pointer, but the distinction doesn't matter all that much here since we're
just doing arithmetic on it.
Add a test case that demonstrates the problem, placed with the libthr tests
since one messing with _umtx_op should be running these tests. Running under
compat32, the new test case will hang as threads after the first 128 get
missed in the wake. it's not immediately clear how to hit it in practice,
since pthread_cond_broadcast() uses a smaller (sleepq batch?) size observed
to be around ~50 -- I did not spend much time digging into it.
The uintptr_t change makes no functional difference, but i've tossed it in
since it's more accurate (semantically).
Ruslan Bukin [Mon, 16 Nov 2020 21:55:52 +0000 (21:55 +0000)]
Introduce IOMMU support for arm64 platform.
This adds an arm64 iommu interface and a driver for Arm System Memory
Management Unit version 3.2 (ARM SMMU v3.2) specified in ARM IHI 0070C
document.
Hardware overview is provided in the header of smmu.c file.
The support is disabled by default. To enable add 'options IOMMU' to your
kernel configuration file.
The support was developed on Arm Neoverse N1 System Development Platform
(ARM N1SDP), kindly provided by ARM Ltd.
Currently, PCI-based devices and ACPI platforms are supported only.
The support was tested on IOMMU-enabled Marvell SATA controller,
Realtek Ethernet controller and a TI xHCI USB controller with a low to
medium load only.
Many thanks to Konstantin Belousov for help forming the generic IOMMU
framework that is vital for this project; to Andrew Turner for adding
IOMMU support to MSI interrupt code; to Mark Johnston for help with SMMU
page management; to John Baldwin for explaining various IOMMU bits.
Mitchell Horne [Mon, 16 Nov 2020 18:41:49 +0000 (18:41 +0000)]
bsdiff: fix off-by-one error
The program reads oldsize bytes from oldfile, and proceeds to initialize
a suffix array of oldsize elements using divsufsort(). As per the
function's API [1], array indices 0 through n-1 are initialized.
Later, search() is called, but with index bounds [0, n]. Depending on
the contents of the malloc'd buffer, accessing this uninitialized index
at the end of can result in a segmentation fault. Fix this by passing
oldsize-1 to search(), limiting the search bounds to [0, n-1].
This bug is a result of r303285, which introduced divsufsort() as an
alternate suffix sorting function to the existing qsufsort(). It seems
that qsufsort() did initialize the final empty element, meaning it could
be safely accessed. This difference in the implementations was missed at
the time.
[1] https://github.com/y-256/libdivsufsort
Discussed with: cperciva
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26911
APIs that have deferred callbacks should have some kind of cleanup
function that callers can use to fence the callbacks. Otherwise things
like module unloading can lead to dangling function pointers, or worse.
The IB MR code is the only place that calls this function and had a
really poor attempt at creating this fence. Provide a good version in
the core code as future patches will add more places that need this
fence.
Use mlx5core to create/destroy all Dynamically Connected Targets, DCTs.
To prevent a hardware memory leak when a DEVX DCT object is destroyed
without calling drain DCT before, (e.g. under cleanup flow), need to
manage its creation and destruction via mlx5 core.
Peter Grehan [Sun, 15 Nov 2020 12:59:24 +0000 (12:59 +0000)]
Fix regression in AHCI controller settings.
When the AHCI code was reworked to use FreeBSD struct
definitions, the valid element was mis-transcribed resulting
in the UMDA capability being hidden. This prevented Illumos
from using AHCI disk/cdrom drives.
Fix by using definitions that match the code pre-rework.
Scott Long [Sun, 15 Nov 2020 07:50:29 +0000 (07:50 +0000)]
Fix the previous revision, it suffered from an incomplete change to the
getlocalbase API. Also don't erroneously subtract the lenth from the
buffer a second time.
Scott Long [Sun, 15 Nov 2020 07:48:52 +0000 (07:48 +0000)]
Because getlocalbase() returns -1 on error, it needs to use a signed type
internally. Do that, and make sure that conversations between signed and
unsigned don't overflow
Mateusz Guzik [Sat, 14 Nov 2020 19:23:07 +0000 (19:23 +0000)]
zfs: disable periodic arc updates
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
The C.UTF-8 locales is the same as the actual C locale except it does support
the unicode character set. But the collation etc are still the same as the C
locale one.
Reviewed by: many
Approved by: many
Differential Revision: https://reviews.freebsd.org/D26973
Scott Long [Sat, 14 Nov 2020 17:57:50 +0000 (17:57 +0000)]
Add the library function getlocalbase and its manual page. This helps to
unify the retrieval of the various ways that the local software base directory,
typically "/usr/local", is expressed in the system.
Reviewed by: se
Differential Revision: https://reviews.freebsd.org/D27022
Fix implicit automatic local port selection for IPv6 during connect calls.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.
In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.
The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.
Sometimes users want to use freebsd-update(8) in a non-interactive way and
what they often miss is that they have to set PAGER to cat(1) in order to
avoid interactive prompts from less(1).