John Baldwin [Thu, 1 Oct 2020 16:45:11 +0000 (16:45 +0000)]
Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared. While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.
This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.
Enji Cooper [Thu, 1 Oct 2020 16:37:49 +0000 (16:37 +0000)]
Eliminate duplicate `afterinstallconfigs` target
Define separate dependent targets which `afterinstallconfigs` relies on, in
order to modify `${DESTDIR}/etc/master.passwd` and
`${DESTDIR}/etc/nsswitch.conf`.
Mark these targets .PHONY, since they manipulate configurations on the fly and
the generation logic isn't 100% defined in terms of the source files/logic,
and is variable, based on MK_foo flags.
Remove nfsstat. Running nfsstat in crashinfo will give the stats of the
running kernel instead of the stats of the crashed kernel. The current
version uses sysctls to query the stats and does not work at all (anymore)
on crash dumps.
This version incorporates many fixes in particular a fix for vi -w
Another approach was proposed to merge those fixes (see review), I find
it easier to track changes if we keep importing snapshot on regular
basis
Kyle Evans [Thu, 1 Oct 2020 01:10:51 +0000 (01:10 +0000)]
Do a sweep and remove most WARNS=6 settings
Repeating the default WARNS here makes it slightly more difficult to
experiment with default WARNS changes, e.g. if we did something absolutely
bananas and introduced a WARNS=7 and wanted to try lifting the default to
that.
Drop most of them; there is one in the blake2 kernel module, but I suspect
it should be dropped -- the default WARNS in the rest of the build doesn't
currently apply to kernel modules, and I haven't put too much thought into
whether it makes sense to make it so.
Rick Macklem [Thu, 1 Oct 2020 00:47:35 +0000 (00:47 +0000)]
Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one
successful RPC.
Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would
loop until the copy "len" was completed. The problem with doing this is
that it might take a considerable time to complete for a large "len".
By returning after a single successful Copy RPC that copied some of the
data, the application that did the copy_file_range(2) syscall will be
more responsive to signal delivery for large "len" copies.
Rick Macklem [Thu, 1 Oct 2020 00:33:44 +0000 (00:33 +0000)]
Clip the "len" argument to vn_generic_copy_file_range() at a
hole size boundary.
By clipping the len argument of vn_generic_copy_file_range() to end at
an exact multiple of hole size, holes are more likely to be maintained
during the copy.
A hole can still straddle the boundary at the end of the
copy range, resulting in a block being allocated in the
output file as it is being grown in size, but this will reduce the
likelyhood of this happening.
While here, also modify setting of blksize to better handle the
case where _PC_MIN_HOLE_SIZE is returned as 1.
John Baldwin [Wed, 30 Sep 2020 17:49:06 +0000 (17:49 +0000)]
Avoid a dubious assignment to bio_data in aio_qbio().
A user pointer is not a suitable value for bio_data and the next block
of code always overwrites bio_data anyway. Just use cb->aio_buf
directly in the call to vm_fault_quick_hold_pages().
gdb(4): Don't escape GDB special characters at application layer
In r351368, we introduced this XML- and GDB-encoded data. The protocol
'offset' should reflex the logical XML data offset, but unfortunately we
counted the GDB escapes as well.
In fact, we cannot safely do GDB character escaping at this layer at
all, because we don't know what will be flushed in a packet. It is
bogus to send only the first character of a two-character escape
sequence.
This patch "corrects" the problem by squashing these characters in the
transmitted XML document. It would be nice to transmit the characters
faithfully, but that is a more complicated change. Thread names are a
nice convenience feature for the GDB client, but one can always inspect
td_name or p_comm directly to find the true name.
Reported by: Ka Ho Ng <khng300 AT gmail.com>
Tested by: Ka Ho Ng
Reviewed by: emaste, markj, rlibby
Differential Revision: https://reviews.freebsd.org/D26599
Load/store/fetch access exceptions always indicate a violation of a PMP
rule. We can't treat those as page faults, because updating the page
table and trying again will only result in exactly the same access
exception recurring. This leaves us in an endless exception loop.
We cannot recover from these exceptions, so panic instead.
Every other architecture defines this and this is required for
interrupts to work when using QEMU's PCI VirtIO devices (which all
report an interrupt line of 0) for two reasons.
Firstly, interrupt line 0 is wrong; they use one of 0x20-0x23 with the
lines being cycled across devices like normal. Moreover, RISC-V uses
INTRNG, whose IRQs are virtual as indices into its irq_map, so even if
we have the right interrupt line we still need to try and route the
interrupt in order to ultimately call into intr_map_irq and get back a
unique index into the map for the given line, otherwise we will use
whatever happens to be in irq_map[line] (which for QEMU where the line
is initialised to 0 results in using the first allocated interrupt,
namely the RTC on IRQ 11 at time of commit).
Note that pci_assign_interrupt will still do the wrong thing for INTRNG
when using a tunable, as it will bypass INTRNG entirely and use the
tunable's value as the index into irq_map, when it should instead
(indirectly) call intr_map_irq to allocate a new entry for the given
IRQ and treat the tunable as stating the physical line in use, which is
what one would expect. This, however, is a problem shared by all INTRNG
architectures, and not exclusive to RISC-V.
Rick Macklem [Wed, 30 Sep 2020 02:18:09 +0000 (02:18 +0000)]
Make copy_file_range(2) Linux compatible for overflow of offset + len.
Without this patch, if a call to copy_file_range(2) specifies an input file
offset + len that would wrap around, EINVAL is returned.
I thought that was the Linux behaviour, but recent testing showed that
Linux accepts this case and does the copy_file_range() to EOF.
This patch changes the FreeBSD code to exhibit the same behaviour as
Linux for this case.
This appears to be a typo. The AdvSIMD field encodes support for
half-precision floating point SIMD instructions, which corresponds to
HWCAP_ASIMDHP, not HWCAP_ASIMDDP.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
John Baldwin [Tue, 29 Sep 2020 21:51:32 +0000 (21:51 +0000)]
Fallback to software for more GCM and CCM requests.
ccr(4) uses software to handle GCM and CCM requests not supported by
the crypto engine (e.g. with only AAD and no payload). This change
adds a fallback for a few more requests such as those with more SGL
entries than can fit in a work request (this can happen for GCM when
decrypting a TLS record split across 15 or more packets).
Rather than placing the epoch around the entire receive loop which
might call into rtwn_rx_frame() and USB and sleep, split the loop
into two[1] and leave us with one unlock/lock cycle as well.
PR: 249925
Reported by: thj, (rkoberman gmail.com)
Tested by: thj
Suggested by: adrian [1]
Reviewed by: adrian
MFC after: 3 days
Sponsored by: The FreeBSD Foundation (initially, paniced my iwl lab host)
Differential Revision: https://reviews.freebsd.org/D26554
Add EXAMPLES section to the man page showing the use of all flags except for
-S.
While here, clarify -f description. It not only suppresses diagnostic messages
but it also affects the exit status of the command itself. This is shown in two
of the examples.
Warner Losh [Tue, 29 Sep 2020 16:29:50 +0000 (16:29 +0000)]
Implement some time variables from kernel
OpenZFS will start using some of the kernel timekeeping bits
shortly. This implements the bare minimum of that which currently
is just the time_seconds variable.
Ruslan Bukin [Tue, 29 Sep 2020 15:10:56 +0000 (15:10 +0000)]
o Rename acpi_iommu_get_dma_tag() -> iommu_get_dma_tag().
This function isn't ACPI dependent and we may use it on FDT systems
as well.
o Don't repeat the function declaration, include iommu.h instead.
LLVM 11 changed the meaning of '-O' from '-O2' to '-O1', which resulted
in debug kernels (with 'makeoptions DEBUG=-g') being built with inlining
disabled, causing severe performance hit.
The -O2 was already being used for building amd64, powerpc, and powerpcspe.
Michael Tuexen [Tue, 29 Sep 2020 09:36:06 +0000 (09:36 +0000)]
Improve the input validation and processing of cookies.
This avoids setting the association in an inconsistent
state, which could result in a use-after-free situation.
This can be triggered by a malicious peer, if the peer
can modify the cookie without the local endpoint recognizing
it.
Thanks to Ned Williamson for reporting the issue.
cxgbe(4): Avoid unnecessary work in the firmware during netmap tx.
Bind the netmap tx queues to a special '0xff' scheduling class which
makes the firmware skip some processing related to rate limiting on the
outgoing traffic. Future firmwares will do this automatically.
cxgbe(4): fixes for netmap operation with only some queues active.
- Only active netmap receive queues should be in the RSS lookup table.
- The RSS table should be restored for NIC operation when the last
active netmap queue is switched off, not the first one.
- Support repeated netmap ON/OFF on a subset of the queues. This works
whether the the queues being enabled and disabled are the only ones
active or not. Some kring indexes have to be reset in the driver for
the second case.
Ed Maste [Mon, 28 Sep 2020 16:54:39 +0000 (16:54 +0000)]
add SIOCGIFDATA ioctl
For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state. While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.
SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison. This implementation mirrors that of other
similar ioctls in FreeBSD.
Warner Losh [Mon, 28 Sep 2020 16:19:29 +0000 (16:19 +0000)]
For mulitcons boot, report it and which console is primary
Until we can do proper /etc/rc output on both consoles in multicons
boot (or all of them if we ever generalize), report when we are
booting multicons. Also report the primary console. This will be a big
hint why output stops after this line (though some slow USB discovery
still happens after mountroot / init starts).
Warner Losh [Mon, 28 Sep 2020 16:19:21 +0000 (16:19 +0000)]
Report the kernel console on the boot screen
Report what console the boot loader is telling the kernel to use:
o Dual (Serial Primary)
o Dual (Video Primary)
o Serial
o Video
This allows the user to interrupt the boot and tweak the cosnole, if
needed, in a trivial way. Useful for installs where the default
selected may not be quite what you want, or when you are running a
dual setup and need to toggle over to the other console being primary.
The 'c'/'C' keys will do the cycling through the consoles. Note:
you'll still have to drop into the loader to set details about serial
consoles. And this doesn't change the console the loader is using.
Reviewed by: kevans@
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D26573
Michal Meloun [Mon, 28 Sep 2020 09:16:27 +0000 (09:16 +0000)]
Fix booting arm64 EFI with LINUX_BOOT_ABI enabled.
Use address of the pointer passed to kernel to determine whether the pointer
is a FDT block (physical address) or a module pointer (virtual kernel address).
This fragment was supposed to be committed before r366196, but I accidentally
skipped it in a patch series.
Warner Losh [Mon, 28 Sep 2020 06:00:56 +0000 (06:00 +0000)]
Speciy the dev in an easily changed variable
Rather than hard coding ada0 everywhere, use ${dev}. Also, set
dev=vtbd0 since both qemu and bhyve support this. More work
should be done to use labels instead for fstab.
qemu scripts likely need adjustment. And we should also
likely generate byhve scripts too.
Warner Losh [Mon, 28 Sep 2020 06:00:39 +0000 (06:00 +0000)]
Fix video on PCI heuristic
The video on PCI heuristic was broken. It was supposed to infer a
video device when the last element of the path was a PCI DEVICE PATH
node. However, the last node in the device path is an END node, so
this heuristic never fired.
This leads, among other things, to bhyve only producing output in the
serial connection once we leave the boot loader. This restores the
dual headed boot on bhyve + UEFI (as we did in 11.2), but will favor
serial in the absence of other config which may be a change from 11.2.
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D26572
In order to prioritize iSCSI traffic across a network,
DSCP can be used. In order not to rely on "ipfw setdscp"
or in-network reclassification, this adds the dscp value
directly to the portal group (where TCP sessions are accepted).
The incoming iSCSI session is first handled by ctld for any
CHAP authentication and the socket is then handed off to the
in-kernel iscsi driver without modification of the socket
parameters. Simply setting up the socket in ctld is sufficient
to keep sending outgoing iSCSI related traffic with the
configured DSCP value.
Michael Tuexen [Sun, 27 Sep 2020 13:24:01 +0000 (13:24 +0000)]
Improve the handling of receiving unordered and unreliable user
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associaitons not
making any user visible progress.
Thanks to Yutaka Takeda for reporting this issue for the the
userland stack in https://github.com/pion/sctp/issues/138.
Michal Meloun [Sun, 27 Sep 2020 11:37:17 +0000 (11:37 +0000)]
Don't send a signal with uninitialized 'sig' and 'code' fields.
We have a few shortcuts in the arm trap code to speed up obvious "must fail"
cases. In these situations, make sure that we fill in the "sig" and "code"
fields of the generated signal.
Michal Meloun [Sun, 27 Sep 2020 10:15:03 +0000 (10:15 +0000)]
Add LINUX_BOOT_ABI back to arm64 GENERIC kernel.
It was removed in r355289 but forgot to return it back when new u-boot booti
support was committed. Although booti is not the preferred method of
booting the kernel, it is very useful for the initial phase of porting
FreeBSD to a new platform or booting the kernel on various embedded boards
in an industrial environment.
Michal Meloun [Sun, 27 Sep 2020 09:27:39 +0000 (09:27 +0000)]
Reapply r366193 with proper commit log.
Don't map same physical memory multiple times with different cache attributes.
This is explicitly stated as architectural undefined behavior, leading to
coherency issues sooner or later.
Michal Meloun [Sun, 27 Sep 2020 09:14:16 +0000 (09:14 +0000)]
Don't map same physical memory multiple times with different cache attributes.
This is explicitly stated as architectural undefined behavior, leadint to
coherencz issues sonner or later.
Rick Macklem [Sat, 26 Sep 2020 23:05:38 +0000 (23:05 +0000)]
Bjorn reported a problem where the Linux NFSv4.1 client is
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.
For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).
For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.
Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.
Check for the only 32-bit MIPS ABIs we support, rather than !n64
There may be additional 64-bit ABIs supported, so use a positive check rather
than a negative check.
Suggested by: imp
MFC after: 1 week
Sponsored by: Juniper Networks, Inc
John Baldwin [Fri, 25 Sep 2020 21:19:56 +0000 (21:19 +0000)]
Revert most of r360179.
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
Warner Losh [Fri, 25 Sep 2020 20:51:07 +0000 (20:51 +0000)]
Adjustments to includes for openzfs in _STANDALONE
Allow the necessary parts of systm.h to be visible in the _STANDALONE
environnment. Limit the reset to only being visible for _KERNEL
builds. Map KASSERT, etc to printf on failure in the bootloader until
we have more confidence things won't break and leave systems
unbootable. Eventually, this should map to a full panic in the
bootloader, but that also needs some enhancement to be more useful.
Original patch was against FreeBSD 12, and a test compile wasn't run against
head. md_tls_tcb_offset field was moved from mdthread to mdproc in the
meantime.
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Warner Losh [Fri, 25 Sep 2020 19:02:49 +0000 (19:02 +0000)]
Dont let kernel and standalone both be defined at the same time
_KERNEL and _STANDALONE are different things. They cannot both be true
at the same time. If things that are normally visible only to _KERNEL
are needed for the _STANDALONE environment, you need to also make them
visible to _STANDALONE. Often times, this will be just a subset of the
required things for _KERNEL (eg global variables are but one example).
sys/cdefs.h is included by pretty much everything in both the loader
and the kernel, so is the ideal choke point.
Mark Johnston [Fri, 25 Sep 2020 18:55:50 +0000 (18:55 +0000)]
ng_l2tp: Fix callout synchronization in the rexmit timeout handler
A received control packet may cause the transmit queue to be flushed, in
which case ng_l2tp_seq_recv_nr() cancels the transmit timeout handler.
The handler checks to see if it was cancelled before doing anything, but
did so before acquiring the node lock, so a small race window could
cause ng_l2tp_seq_rack_timeout() to attempt to flush an empty queue,
ultimately causing a null pointer dereference.
Summary:
Two bugs:
* Elf32_Auxinfo is broken, using pointers in the union, which are 64-bits not
32.
* freebsd32_sysarch() doesn't update the 'user local' register when handling
MIPS_SET_TLS, leading to a NULL pointer dereference in the 32-bit
application.
Michal Meloun [Fri, 25 Sep 2020 16:44:01 +0000 (16:44 +0000)]
Refine locking inside of syscon driver.
In some cases, the syscon driver may be used by consumer requiring better
control about locking (ie. it may be used as registe file provider for clock
driver which needs locked access to multiple registers).
Add fine locking protocol methods together with bunch of helper functions
in syscon driver and implement this functionality in syscon_generic driver.
Michal Meloun [Fri, 25 Sep 2020 13:52:31 +0000 (13:52 +0000)]
Correctly handle nodes compatible with "syscon", "simple-bus".
Syscon can also have child nodes that share a registration file with it.
To do this correctly, follow these steps:
- subclass syscon from simplebus and expose it if the node is also
"simple-bus" compatible.
- block simplebus probe for this compatible string, so it's priority
(bus pass) doesn't colide with syscon driver.
While I'm in, also block "syscon", "simple-mfd" for the same reason.