With the upcoming multipath changes described in D24141,
rt->rt_nhop can potentially point to a nexthop group instead of
an individual nhop.
To simplify caller handling of such cases, change ifa_rtrequest() callback
to pass changed nhop directly.
Michal Meloun [Wed, 29 Apr 2020 14:31:25 +0000 (14:31 +0000)]
Export tracing facility of GIC500 ITS block.
Possibility of tracing of processing message based interrupts is very
useful for debugging of PCIe driver, mainly for its MSI part.
Michal Meloun [Wed, 29 Apr 2020 14:14:15 +0000 (14:14 +0000)]
Don't try to set frequendcy for enumerated but newer started CPUs.
Openfirmare enumerates and installs the driver for all processors,
regardless of whether they will be started later (because of power
constrains for example).
Michal Meloun [Wed, 29 Apr 2020 14:06:42 +0000 (14:06 +0000)]
Don't allow to use FPU inside of rtld library.
Clang10 may use FPU instructions for optimizing operations with
memory blocks. But we don't want to do lengthy save/restore of all
FPU registers across each rtld_start() call.
Michal Meloun [Wed, 29 Apr 2020 13:45:21 +0000 (13:45 +0000)]
Don't try to re-initialize already preseted regulator.
Don't set initial voltage for regulators having their voltage already
in allowed range. As side effect of this change, we don't try to set
initial voltage for fixed voltage regulators - these don't have impemented
voltage set method so their initialization has always failed.
Michal Meloun [Wed, 29 Apr 2020 13:43:15 +0000 (13:43 +0000)]
Multiple fixes for rockchip iodomain driver:
- always initialize selector of voltage signaling standard.
Various versions of U-boot leaves voltage signaling standard settings
for PMUIO2 domain in different state. Always initialize it
into expected state.
- start the driver as early as possible, the IO domains should be
initialized before other drivers are attached.
- rename RK3399 register to its name founds in TRM.
This is the second part of fixes for serial port corruption observed after
DT 5.6 import.
This is a temporary hack to aid with config(8) changing in r360443.
It will not work for all cases.
env PATH is used because universe-toolchain is only built when worlds
are built, and then only if clang is needed, so it may not exist.
universe-toolchain needs to be expanded to always be built, inspected to
remove non-cross-build-safe tools, used for buildworld/buildkernel,
and potentially incremental build support.
As I noted in https://reviews.freebsd.org/D22756, INTOFF should be in effect
when calling ckmalloc/ckrealloc/ckfree to avoid memory leaks and double
frees. Therefore, change the functions to check if INTOFF is in effect
instead of applying it.
Move route-specific ddb commands to route/route_ddb.c
Currently functionality resides in rtsock.c, which is a controlling
interface, partially external to the routing subsystem.
Additionally, DDB-supporting functionality is > 100SLOC, which deserves
a separate file.
Given that, move this functionality to a newly-created net/route/ subdir.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D24561
Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
introduced sys/net/route directory intended to hold all
routing-related code. Move recently-introduced route_temporal.c and
private route_var.h header there.
One of the goals of the new routing KPI defined in r359823
is to entirely hide`struct rtentry` from the consumers.
It will allow to improve routing subsystem internals and deliver
features much faster.
This is one of the last changes, effectively moving struct rtentry
definition to a net/route_var.h header, internal to the routing subsystem.
Some files may purposely have debug info disabled or are *source files*
that attempt to run ctfconvert on them. Currently ctfconvert ignores
these errors but I have a change to make the errors real so we can
catch real problems like exceeding type limits.
Restore local kernel "prog" filtering lost in r332099.
This behavior is most relevant for ipfw(4) as documented in syslog.conf(5).
The recent addition of property-based regex filters in r359327 is a
fine workaround for this but the behavior was present since 1997 and
documented.
This only fixes local matching of the "kernel program". It does not
change the forwarded format at all. On the remote side it will still
be "kernel: ipfw:" and not be parsed as a kernel message. This matches
old behavior.
Mark Johnston [Tue, 28 Apr 2020 15:02:44 +0000 (15:02 +0000)]
Make sendfile(SF_SYNC)'s CV wait interruptible.
Otherwise, since the CV is not signalled until data is drained from the
socket, it is trivial to create an unkillable process using
sendfile(SF_SYNC) and a process-private PF_LOCAL socket pair. In
particular, the cv_wait() in sendfile() does not get interrupted until
data is drained from the receiving socket buffer.
Reported by: pho
Discussed with: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
diff(1): don't reject specifying the same format multiple times
This may happen, for instance, if one happens to have an alias of diff to
diff -up and attempts to specify the amount of context on top of that.
Aliases like this may cause other problems, but if they're really not ever
generating non-unified diffs then we should at least not break that
use-case.
In addition, we'll now pick up a format mismatch if -p is specified with
!contextual && !unified && !unset.
Fix up a small trailing whitespace nit in the tests while we're here, and
add tests to make sure that we can double up all the formatting options.
Mark Johnston [Tue, 28 Apr 2020 13:51:41 +0000 (13:51 +0000)]
Re-check for wirings after busying the page in vm_page_release_locked().
A concurrent unlocked lookup can wire the page after
vm_page_release_locked() releases the last wiring, in which case
vm_page_release_locked() must not free the page. Once the xbusy lock is
acquired, that, the object lock and the fact that the page is unmapped
ensure that the wire count cannot increase, so re-check for new wirings
after the page is xbusied.
Update the comment above vm_page_wired() to reflect the new
synchronization rules.
Reported by: glebius
Reviewed by: alc, jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24592
New fib[46]_lookup() functions support multipath transparently.
Given that, switch the last rtalloc_mpath_fib() calls to
dib4_lookup() and eliminate the function itself.
Note: proper flowid generation (especially for the outbound traffic) is a
bigger topic and will be handled in a separate review.
This change leaves flowid generation intact.
This debugging code printing routing table data was introduced in rS25723,
22+ years ago. The last functional commit to this code was rS67534, 19 years ago.
The code has been turned off by default all this time.
Lastly, this code directly iterates radix tree and rtentries, which is not
not a proper interaction with routing system.
Xin LI [Tue, 28 Apr 2020 05:10:34 +0000 (05:10 +0000)]
Do not overflow when calculating file system size.
Reported by: Hyeongseok Kim <hyeongseok kim lge com>
Reviewed by: cem, Hyeongseok Kim
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24603
While we're here, let's stylize these as functions instead of just raw text.
A future change may allow arbitrary data arguments to be passed some of
these, and the distinction is useful.
Rick Macklem [Tue, 28 Apr 2020 02:11:02 +0000 (02:11 +0000)]
Get rid of uio_XXX macros used for the Mac OS/X port.
The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX
left over from the port to Mac OS/X. Since that port is long forgotten,
replace the calls with the code generated by the FreeBSD macros for these
in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h
and I think makes the code more readable.
This patch should not result in any semantic change.
This is a straightforward match to the command used by many in forthloader;
it uses the newly-exported config.readConfFiles() to make sure that any
loader_conf_files gets done as appropriate.
In the process, change it slightly: readConfFiles will take a string like
loader_conf_files in addition to the loaded_files table that it normally
takes. This is to facilitate the addition of a read-conf CLI command, which
will just pass in the single file to read and an empty table.
lualoader config: don't call loader.getenv() as much
We don't actually need to fetch loader_conf_files as much as we do; we've
already fetched it once at the beginning, we only really need to fetch it
again after each file we've processed. If it changes, then we can stash that
off into our local prefiles.
While here, drop a note about the recursion so that I stop trying to
change it. It may very well make redundant some of the work we're doing, but
that's OK.
John Baldwin [Mon, 27 Apr 2020 23:59:42 +0000 (23:59 +0000)]
Add support for KTLS RX over TOE to T6.
This largely reuses the TLS TOE support added in r330884. However,
this uses the KTLS framework in upstream OpenSSL rather than requiring
Chelsio-specific patches to OpenSSL. As with the existing TLS TOE
support, use of RX offload requires setting the tls_rx_ports sysctl.
Rick Macklem [Mon, 27 Apr 2020 23:55:09 +0000 (23:55 +0000)]
Fix sosend_generic() so that it can handle a list of ext_pgs mbufs.
Without this patch, sosend_generic() will try to use top->m_pkthdr.len,
assuming that the first mbuf has a pkthdr.
When a list of ext_pgs mbufs is passed in, the first mbuf is not a
pkthdr and cannot be post-r359919. As such, the value of top->m_pkthdr.len
is bogus (0 for my testing).
This patch fixes sosend_generic() to handle this case, calculating the
total length via m_length() for this case.
There is currently nothing that hands a list of ext_pgs mbufs to
sosend_generic(), but the nfs-over-tls kernel RPC code in
projects/nfs-over-tls will do that and was used to test this patch.
Warner Losh [Mon, 27 Apr 2020 23:39:32 +0000 (23:39 +0000)]
Change the flags back to an enum
This was changed in the review process for the flags sysctl. The
reasons for the change are no longer valid as the code changed after
that. Cast the one place where it might make a difference (but I don't
think it does). This restores the ability to see flags for softc in
gdb.
John Baldwin [Mon, 27 Apr 2020 23:17:19 +0000 (23:17 +0000)]
Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
authentication algorithms and keys as well as the initial sequence
number.
- When reading from a socket using KTLS receive, applications must use
recvmsg(). Each successful call to recvmsg() will return a single
TLS record. A new TCP control message, TLS_GET_RECORD, will contain
the TLS record header of the decrypted record. The regular message
buffer passed to recvmsg() will receive the decrypted payload. This
is similar to the interface used by Linux's KTLS RX except that
Linux does not return the full TLS header in the control message.
- Add plumbing to the TOE KTLS interface to request either transmit
or receive KTLS sessions.
- When a socket is using receive KTLS, redirect reads from
soreceive_stream() into soreceive_generic().
- Note that this interface is currently only defined for TLS 1.1 and
1.2, though I believe we will be able to reuse the same interface
and structures for 1.3.
John Baldwin [Mon, 27 Apr 2020 22:27:35 +0000 (22:27 +0000)]
Update the cached MSI state when any MSI capability register is written.
bhyve uses cached copies of the MSI capability registers to generate
MSI interrupts for device models. Previously, these cached fields
were only set when the MSI capability control register was updated.
The Linux kernel recently adopted a change to deal with races in MSI
interrupt delivery that writes to the MSI capability address and data
registers to alter the destination of MSI interrupts without writing
to the MSI capability control register. bhyve was not updating its
cached registers for these writes and continued to send interrupts
with the old data value to the old address. Fix this by recomputing
the cached values for every write to any MSI capability register.
Reported by: Jason Tubnor, Ryan Moeller
Reported by: Marc Dionne (bisected the Linux kernel commit)
Reviewed by: grehan
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24593
Eric Joyner [Mon, 27 Apr 2020 22:02:44 +0000 (22:02 +0000)]
iflib: Stop interface before (un)registering VLAN
This patch is intended to solve a specific problem that iavf(4)
encounters, but what it does can be extended to solve other issues.
To summarize the iavf(4) issue, if the PF driver configures VLAN
anti-spoof, then the VF driver needs to make sure no untagged traffic is
sent if a VLAN is configured, and vice-versa. This can be an issue when
a VLAN is being registered or unregistered, e.g. when a packet may be on
the ring with a VLAN in it, but the VLANs are being unregistered. This
can cause that tagged packet to go out and cause an MDD event.
To fix this, include a new interface-dependent function that drivers can
implement named IFDI_NEEDS_RESTART(). Right now, this function is called
in iflib_vlan_unregister/register() to determine whether the interface
needs to be stopped and started when a VLAN is registered or
unregistered. The default return value of IFDI_NEEDS_RESTART() is true,
so this fixes the MDD problem that iavf(4) encounters, since the
interface rings are flushed during a stop/init.
A future change to iavf(4) will implement that function just in case the
default value changes, and to make it explicit that this interface reset
is required when a VLAN is added or removed.
Colin Percival [Mon, 27 Apr 2020 21:44:02 +0000 (21:44 +0000)]
Set use_nvd=0 in EC2 AMIs.
FreeBSD is in the process of switching from nvd(4) to nda(4) as the disk
device front-end to NVMe. Changing the default in the kernel is tricky
since existing systems may have /dev/nvd* hard-coded e.g. in /etc/fstab;
however, there's no reason to not change the default in HEAD for *new*
systems.
At present I have no intention of MFCing this to stable branches, since
someone might reasonably expect scripts they use for launching and
configuring FreeBSD 12.1 instances to work with FreeBSD 12.2 AMIs, for
example.
Reviewed by: gjb, imp
Relnotes: NVMe disks in EC2 instances launched from 13.0 and later
now show up as nda(4) devices.
Differential Revision: https://reviews.freebsd.org/D24583
John Baldwin [Mon, 27 Apr 2020 17:55:40 +0000 (17:55 +0000)]
Improve MACHINE_ARCH handling for hard vs soft-float on RISC-V.
For userland, MACHINE_ARCH reflects the current ABI via preprocessor
directives. For the kernel, the hw.machine_arch sysctl uses the ELF
header flags of the current process to select the correct MACHINE_ARCH
value.
Randall Stewart [Mon, 27 Apr 2020 16:30:29 +0000 (16:30 +0000)]
This change does a small prepratory step in getting the
latest rack and bbr in from the NF repo. When those come
in the OOB data handling will be fixed where Skyzaller crashes.
Mark Johnston [Mon, 27 Apr 2020 16:12:32 +0000 (16:12 +0000)]
Document handling of connection-mode sockets by sendto(2).
sendto(2), sendmsg(2) and sendmmsg(2) return ENOTCONN if a destination
address is specified and the socket is not connected and the socket
protocol does not automatically connect ("implied connect"). Document
that. Also document the fact that the destination address is ignored
for connection-mode sockets if the socket is already connected.
Mark Johnston [Mon, 27 Apr 2020 15:59:19 +0000 (15:59 +0000)]
Fix handling of EV_EOF for named pipes.
Contrary to the kevent man page, EV_EOF on a fifo is not cleared by
EV_CLEAR. Modify the read and write filters to clear EV_EOF when the
fifo's PIPE_EOF flag is clear, and update the man page to document the
new behaviour.
Modify the write filter to return the amount of buffer space available
even if no readers are present. This matches the behaviour for sockets.
When reading from a pipe, only call pipeselwakeup() if some data was
actually read. This prevents the continuous re-triggering of a
EVFILT_READ event on EOF when in edge-triggered mode.
Mark Johnston [Mon, 27 Apr 2020 15:59:07 +0000 (15:59 +0000)]
Call pipeselwakeup() after toggling PIPE_EOF.
This ensures that pipe_poll() and the pipe kqueue filters observe
PIPE_EOF and set EV_EOF accordingly. As a result an extra call to
knote() after setting PIPE_EOF is unnecessary.
Xin LI [Mon, 27 Apr 2020 02:01:48 +0000 (02:01 +0000)]
Fix a bug with dirty file system handling.
r356313 broke handling of dirty file system because we have restricted
the correction of "odd" byte sequences to checkfat(), and as a result
the dirty bit is never cleared. The old fsck_msdosfs code would write
FAT twice to fix the dirty bit, which is also not ideal.
Fix this by introducing a new rountine, cleardirty() which will perform
the set of clean bit only, and use it in checkfilesys() if we thought
the file system was dirty.
Mark Johnston [Sun, 26 Apr 2020 20:08:57 +0000 (20:08 +0000)]
Use a single VM object for kernel stacks.
Previously we allocated a separate VM object for each kernel stack.
However, fully constructed kernel stacks are cached by UMA, so there is
no harm in using a single global object for all stacks. This reduces
memory consumption and makes it easier to define a memory allocation
policy for kernel stack pages, with the aim of reducing physical memory
fragmentation.
Add a global kstack_object, and use the stack KVA address to index into
the object like we do with kernel_object.
Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24473
psm(4): Fix wrong key-release event occuring after trackpoint use.
Some models of laptops e.g. "X1 Carbon 3rd Gen Thinkpad" have LRM buttons
wired as so called "Synaptic touchpads extended buttons" rather thah real
trackpoint buttons. Handle this case with merging of events from both
sources.
Tentatively apply https://reviews.llvm.org/D78877 (by Dave Green):
[ARM] Only produce qadd8b under hasV6Ops
When compiling for a arm5te cpu from clang, the +dsp attribute is
set. This meant we could try and generate qadd8 instructions where we
would end up having no pattern. I've changed the condition here to be
hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem
to use for similar instructions.
Fixed PR45677.
This fixes "fatal error: error in backend: Cannot select: t37: i32 =
ARMISD::QADD8b t43, t44" when compiling sys/dev/sound/pcm/feeder_mixer.c
for armv5. For some reason we do not encounter this on head, but this
error popped up while building universes for stable/12.
Introduce new fib[46]_lookup_debugnet() functions serving as a
special interface for the crash-time operations. Underlying
implementation will try to return lookup result if
datastructures are not corrupted, avoding locking.
Convert debugnet to use fib4_lookup_debugnet() and switch it
to use nexthops instead of rtentries.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D24555
The pf_frag_mtx mutex protects the fragments queue. The fragments queue
is virtualised already (i.e. per-vnet) so it makes no sense to block
jail A from accessing its fragments queue while jail B is accessing its
own fragments queue.
We used to have an issue with recursive locking with
net.link.bridge.inherit_mac. This causes us to send an ARP request while
we hold the BRIDGE_LOCK, which used to cause us to acquire the
BRIDGE_LOCK again. We can't re-acquire it, so this caused a panic.
Now that we no longer need to acquire the BRIDGE_LOCK for
bridge_transmit() this should no longer panic. Test this.
PR: 216510
Reviewed by: emaste, philip
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24251
Run the bridge datapath under epoch, rather than under the
BRIDGE_LOCK().
We still take the BRIDGE_LOCK() whenever we insert or delete items in
the relevant lists, but we use epoch callbacks to free items so that
it's safe to iterate the lists without the BRIDGE_LOCK.
Tests on mercat5/6 shows this increases bridge throughput significantly,
from 3.7Mpps to 18.6Mpps.
Reviewed by: emaste, philip, melifaro
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24250
Alan Somers [Sun, 26 Apr 2020 15:51:46 +0000 (15:51 +0000)]
mac_bsdextended: ATFify the tests
The new tests have more complete setup and cleanup, are more granular, and
correctly annotate expected failures and skipped tests. A follow-up commit
will resolve a conflict with the fusefs tests (bug 244229).
Fix order of arguments in fib[46]_lookup calls in SCTP.
r360292 introduced the wrong order, resulting in returned
nhops not being referenced, despite the fact that references
were requested. That lead to random GPF after using SCTP sockets.
Special defined macro like IPV[46]_SCOPE_GLOBAL will be introduced
soon to reduce the chance of putting arguments in wrong order.
Eric van Gyzen [Sun, 26 Apr 2020 00:41:29 +0000 (00:41 +0000)]
Fix handling of NMIs from unknown sources (BMC, hypervisor)
Release kernels have no KDB backends enabled, so they discard an NMI
if it is not due to a hardware failure. This includes NMIs from
IPMI BMCs and hypervisors.
Furthermore, the interaction of panic_on_nmi, kdb_on_nmi, and
debugger_on_panic is confusing.
Respond to all NMIs according to panic_on_nmi and debugger_on_panic.
Remove kdb_on_nmi. Expand the meaning of panic_on_nmi by making
it a bitfield. There are currently two bits: one for NMIs due to
hardware failure, and one for all others. Leave room for more.
If panic_on_nmi and debugger_on_panic are both true, don't actually panic,
but directly enter the debugger, to allow someone to leave the debugger
and [hopefully] resume normal execution.
The latter needs the former, but with a multi-job build on a fast
machine, the race is sometimes lost. This leads to "ld: error: unable to
find library -lsbuf", when linking libgeom.so.
arm64: rockchip: rk805: Use a tailq for the attached regulator
Store the attached regulator in a tailq to later find them in ofw_map.
While here, do not attempt to attach a regulator without a name, a node
might exists but if it doesn't have a name the regulator is unused.
In r326576 ("use @@@ instead of @@ in __sym_default"), an earlier version of
the phabricator-discussed patch was inadvertently committed. The commit
message claims that @@@ means that weak is not needed, but that was due to a
misunderstanding of the use of weak symbols in this context by the submitted
in the first draft of the patch; the description text was not updated to
match the discussion. As discussed in phabricator, weak is needed for
symbol interposing because of the behavior of our rtld, and is widely used
elsewhere in libc.
This partial revert restores the approved version of the patch and permits
symbol interposing for openat.
Reported by: Raymond Ramsden <rramsden AT isilon.com>
Reviewed by: dim, emaste, kib (2017)
Discussed with: kib (2020)
Differential Revision: https://reviews.freebsd.org/D11653
Michal Meloun [Sat, 25 Apr 2020 09:17:49 +0000 (09:17 +0000)]
Reorder initialization steps for given pin.
If pin is switched from fixed function to GPIO, it should have prepared
direction, pull-up/down and default value before function gets switched.
Otherwise we may produce unwanted glitch on output pin.
Right order of drive strength settings is questionable, but I think that
is slightly safer to do it also before function switch.
This fixes serial port corruption observed after DT 5.6 import.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information
to perform packet forwarding such as gateway interface and mtu. Nexthops
are shared among the routes, providing more pre-computed cache-efficient
data while requiring less memory. Splitting the LPM code and the attached
data solves multiple long-standing problems in the routing layer,
drastically reduces the coupling with outher parts of the stack and allows
to transparently introduce faster lookup algorithms.
Route caching was (re)introduced to minimise (slow) routing lookups, allowing
for notably better performance for large TCP senders. Caching works by
acquiring rtentry reference, which is protected by per-rtentry mutex.
If the routing table is changed (checked by comparing the rtable generation id)
or link goes down, cache record gets withdrawn.
Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
cached object, which is mostly mechanical. Other moving parts like cache
cleanup on rtable change remains the same.
Rick Macklem [Sat, 25 Apr 2020 02:18:59 +0000 (02:18 +0000)]
Remove Mac OS/X macros that did nothing for FreeBSD.
The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X
port. The first of these macros was a no-op for FreeBSD and the second
is no longer used.
This patch gets rid of them. It also deletes the "mbuf_t" typedef which
is no longer used in the FreeBSD code from nfskpiport.h