Justin Hibbits [Sat, 2 Mar 2019 01:51:41 +0000 (01:51 +0000)]
powerpc: Scale intrcnt by mp_ncpus
On very large powerpc64 systems (2x22x4 power9) it's very easy to run out of
available IRQs and crash the system at boot. Scale the count by mp_ncpus,
similar to x86, so this doesn't happen. Further work can be done in the future
to scale the I/O IRQs as well, but that's left for the future.
John Baldwin [Fri, 1 Mar 2019 20:43:48 +0000 (20:43 +0000)]
Fix missed posted interrupts in VT-x in bhyve.
When a vCPU is HLTed, interrupts with a priority below the processor
priority (PPR) should not resume the vCPU while interrupts at or above
the PPR should. With posted interrupts, bhyve maintains a bitmap of
pending interrupts in PIR descriptor along with a single 'pending'
bit. This bit is checked by a CPU running in guest mode at various
places to determine if it should be checked. In addition, another CPU
can force a CPU in guest mode to check for pending interrupts by
sending an IPI to a special IDT vector reserved for this purpose.
bhyve had a bug in that it would only notify a guest vCPU of an
interrupt (e.g. by sending the special IPI or by resuming it if it was
idle due to HLT) if an interrupt arrived that was higher priority than
PPR and no interrupts were currently pending. This assumed that if
the 'pending' bit was set, any needed notification was already in
progress. However, if the first interrupt sent to a HLTed vCPU was
lower priority than PPR and the second was higher than PPR, the first
interrupt would set 'pending' but not notify the vCPU, and the second
interrupt would not notify the vCPU because 'pending' was already set.
To fix this, track the priority of pending interrupts in a separate
per-vCPU bitmask and notify a vCPU anytime an interrupt arrives that
is above PPR and higher than any previously-received interrupt.
This was found and debugged in the bhyve port to SmartOS maintained by
Joyent. Relevant SmartOS bugs with more background:
Conrad Meyer [Fri, 1 Mar 2019 19:21:45 +0000 (19:21 +0000)]
Fortuna: push CTR-mode loop down into randomdev hash.h interface
As a step towards adding other potential streaming ciphers. As well as just
pushing the loop down into the rijndael APIs (basically 128-bit wide AES-ICM
mode) to eliminate some excess explicit_bzero().
Remove sv_pagesize, originally introduced with r100384.
In all of the architectures we have today, we always use PAGE_SIZE.
While in theory one could define different things, none of the
current architectures do, even the ones that have transitioned from
32-bit to 64-bit like i386 and arm. Some ancient mips binaries on
other systems used 8k instead of 4k, but we don't support running
those and likely never will due to their age and obscurity.
Reviewed by: imp (who also contributed the commit message)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19280
Bjoern A. Zeeb [Fri, 1 Mar 2019 14:33:20 +0000 (14:33 +0000)]
Add ushort and ulong to linux/types.h.
When porting code once written for Linux we find not only uints but also ushort and ulong.
Provide central typedefs as part of the linuxkpi for those as well.
Reviewed by: hselasky, emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19405
Emmanuel Vadot [Fri, 1 Mar 2019 13:05:37 +0000 (13:05 +0000)]
arm64: rockchip: rk3399_pll: Fix the recalc function
The plls frequency are now correctly calculated in fractional mode
and integer mode.
While here add some debug printfs (disabled by default)
Tested with powerd on the little cluster on a RockPro64.
The send_packets() function was using ring->cur as index to scan
the transmit ring. This function may also set ring->cur ahead of
ring->head, in case no more slots are available. However, the function
also uses nm_ring_space() which looks at ring->head to check how many
slots are available. If ring->head and ring->cur are different, this
results in pkt-gen advancing ring->cur beyond ring->tail.
This patch fixes send_packets() (and similar source locations) to
use ring->head as a index, rather than using ring->cur.
Kristof Provost [Fri, 1 Mar 2019 07:37:45 +0000 (07:37 +0000)]
pf: IPv6 fragments with malformed extension headers could be erroneously passed by pf or cause a panic
We mistakenly used the extoff value from the last packet to patch the
next_header field. If a malicious host sends a chain of fragmented packets
where the first packet and the final packet have different lengths or number of
extension headers we'd patch the next_header at the wrong offset.
This can potentially lead to panics or rule bypasses.
Security: CVE-2019-5597
Obtained from: OpenBSD
Reported by: Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv
Justin Hibbits [Fri, 1 Mar 2019 04:36:55 +0000 (04:36 +0000)]
powerpc/powernv: Add OPAL flash device driver
Firmware needed by petitboot, for example, GPU firmware, can be installed to
a partition in the flash filesystem. This driver exposes the full flash
given by the device tree, letting the user manage firmware, etc, from
FreeBSD.
To use the partitions provided by the flash module, the fdt_slicer module is
needed, but the module isn't needed for raw access, so there's no direct
dependency link in here.
Ian Lepore [Fri, 1 Mar 2019 04:17:43 +0000 (04:17 +0000)]
Add another required header file.
For some reason this seems to be required on aarch64, but I can build armv7
from clean without needing this in the list. (The file does get included,
so the mystery is why armv7 works.)
Justin Hibbits [Fri, 1 Mar 2019 02:49:47 +0000 (02:49 +0000)]
powerpc/powernv: Add asynchronous token management for powernv
The OPAL firmware only supports a finite number of in-flight asynchronous
operations. Rather than have each subsystem try to manage its own, use a
central management service to hand out tokens.
More work can be done to improve asynchronous behavior, such as funneling
things through a future OPAL heartbeat handler, but capabilities will be
added as needed.
Augment the existing consumers (i2c and sensors) to use this new API.
Kyle Evans [Fri, 1 Mar 2019 01:20:21 +0000 (01:20 +0000)]
patch(1): Exit successfully if we're fed a 0-length patch
This change is made in the name of GNU patch compatibility. If GNU patch is
fed a zero-length patch, it will exit successfully with no output. This is
used in at least one port to date (comms/wsjtx), and we break on this usage.
It seems unlikely that anyone relies on patch(1) calling their completely
empty patch garbage and failing, and GNU compatibility is a plus if it helps
with porting, so make the switch.
Marcin Wojtas [Fri, 1 Mar 2019 01:18:39 +0000 (01:18 +0000)]
Prevent detaching driver if the attach is not finished
When the device is in attaching state, detach should return
EBUSY instead of success. In other case, there could be race
between attach and detach during the driver unloading.
If driver goes sleep and releases GIANT lock during attaching,
unloading module could start. In such case when attach continues
after module unload, page fault "supervisor read instruction,
page not present" occurred.
This patch works around the real issue, which is a locking
deficiency of the busses.
Justin Hibbits [Thu, 28 Feb 2019 23:00:47 +0000 (23:00 +0000)]
GEOM: Add fdt_slicer to the GEOM flashmap module for fdt-based platforms
geom_flashmap depends on a slicer being available in order to do any
work. On fdt platforms this is provided by fdt_slicer, but this needs
to be available. Often it's compiled into the kernel for platforms that
boot from the relevant media, but this is not always the case. Add the
file to the geom_flashmap module so that it can be used on platforms
which don't always need this functionality available.
John Baldwin [Thu, 28 Feb 2019 22:10:19 +0000 (22:10 +0000)]
Don't assume all children of a nexus are ports.
Specifically, ccr(4) devices are also children of cxgbe nexus devices.
Rather than making assumptions about the child device's softc, walk
the list of ports from the nexus' softc to determine if a child is a
port in t4_child_location_str(). This fixes a panic when detaching a
ccr device.
Alexander Motin [Thu, 28 Feb 2019 21:07:16 +0000 (21:07 +0000)]
Limit 24xx adapters to only MSI interrupts by default.
This was actually the known good configuration we used before.
Single MSI-X configuration doesn't even work there on my tests, just due
to lack of documentation not sure whether by design or I am doing something
wrong.
Bryan Drewery [Thu, 28 Feb 2019 20:48:18 +0000 (20:48 +0000)]
bsd.nls.mk isn't optional.
It is protected by MK_NLS. If it should really be optional then
it needs to be documented as such in share/mk/bsd.README and
.sinclude used where needed.
Invalidate cache for the PDPTE page when using PAE paging but PAT is
not supported.
According to SDM rev. 69 vol. 3, for PDPTE registers loads:
- when PAT is not supported, access to the PDPTE page is performed as
UC, see 4.9.1;
- when PAT is supported, the access is WB, see 4.9.2.
So potentially CPU might load stale memory as PDPTEs if both PAT and
self-snoop are not implemented. To be safe, add total local cache
flush to pmap_cold() before initial load of cr3, and flush PDPTE page
in pmap_pinit(), if PAT is not implemented.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D19365
Enji Cooper [Thu, 28 Feb 2019 18:12:14 +0000 (18:12 +0000)]
Remove references to pdwait4(2) and `CAP_PDWAIT` from rights(4)
@cem removed references to pdwait4(2) (a nonexistent syscall) in
r320058.
This change removes references to pdwait4(2) and `CAP_PDWAIT` in
rights(4) to not mislead the user into thinking that pdwait4(2)/`CAP_PDWAIT` is
actually implemented in the stock FreeBSD kernel.
The goal of this functionality was to simplify monitoring/manipulating
processes started with `pdfork`, et al, and avoid races with waiting on pids.
The syscall was never completed though--just discussed on the capsicum mailing
list back in 2015:
https://lists.cam.ac.uk/pipermail/cl-capsicum-discuss/2015-May/msg00012.html
. That being said, there are members of the project (@rwatson, etc) who
have longterm goals to implement this syscall to better secure pdfork(2)
calls.
Thomas Munro [Thu, 28 Feb 2019 09:13:41 +0000 (09:13 +0000)]
truss: Add support for fsync(2) and fdatasync(2).
The default handling showed the argument as hex. Add explicit handling so
we can show it as decimal, since that's how we show file descriptors
everywhere else.
Eric Joyner [Wed, 27 Feb 2019 22:26:18 +0000 (22:26 +0000)]
ixgbe(4): Fix panic triggered by assertion from interrupt
r344162 exposed a bug in one of ixgbe's interrupt filters; they are never
supposed to return 0. Fix the interrupt filter to return the proper nonzero
return value.
Warner Losh [Wed, 27 Feb 2019 22:16:59 +0000 (22:16 +0000)]
Unconditionally support unmapped BIOs. This was another shim for
supporting older kernels. However, all supported versions of FreeBSD
have unmapped I/Os (as do several that have gone EOL), remove it. It's
unlikely the driver would work on the older kernels anyway at this
point.
Warner Losh [Wed, 27 Feb 2019 22:05:01 +0000 (22:05 +0000)]
Remove #ifdef code to support FreeBSD versions that haven't been
supported in years. A number of changes have been made to the driver
that likely wouldn't work on those older versions that aren't properly
ifdef'd and it's project policy to GC such code once it is stale.
Alexander Motin [Wed, 27 Feb 2019 21:29:21 +0000 (21:29 +0000)]
Refactor command ordering/blocking mechanism in CTL.
Replace long per-LUN queue of blocked commands, scanned on each command
completion and sometimes even twice, causing up to O(n^^2) processing cost,
by much shorter per-command blocked queues, scanned only when respective
command completes, and check only commands before the previous blocker,
reducing cost to O(n).
While there, unblock aborted commands to make them "complete" ASAP to be
removed from the OOA queue and so not waste time ordering other commands
against them. Aborted commands that were not sent to execution yet should
have no visible side effects, so this is safe and easy optimization now,
comparing to commands already in processing, which are a still pain.
Together those two optimizations should fix quite pathological case, when
due to backend slowness CTL accumulated many thousands of blocked requests,
partially aborted by initiator and so supposedly not even existing, but
still wasting CTL CPU time.
Emmanuel Vadot [Wed, 27 Feb 2019 21:04:40 +0000 (21:04 +0000)]
xhci_mv: Move the driver to generic_xhci
Marvell XHCI is in fact generic-xhci, so move the driver and
add the compatible string.
While here, get and enable the phy if the dtb provide one.
The xhci bindings state that phys should be in a 'phys' property but
Marvell DTS uses 'usb-phy', only add support for 'usb-phy' for now.
John Baldwin [Wed, 27 Feb 2019 20:24:23 +0000 (20:24 +0000)]
Various cleanups to the management of multiple TCP stacks.
- Use strlcpy() with sizeof() instead of strncpy().
- Simplify initialization of TCP functions structures.
init_tcp_functions() was already called before the first call to
register a stack. Just inline the work in the SYSINIT and remove
the racy helper variable. Instead, KASSERT that the rw lock is
initialized when registering a stack.
- Protect the default stack via a direct pointer comparison.
The default stack uses the name "freebsd" instead of "default" so
this protection wasn't working for the default stack anyway.
Emmanuel Vadot [Wed, 27 Feb 2019 17:30:28 +0000 (17:30 +0000)]
mmc: dwmmc: Match on "rockchip,rk3288-dw-mshc" compatible
This is the common denominator for rockchip compatible from RK3288 to RK3399.
The other compatible are generally present in the DTS but the controllers
are the same.
Emmanuel Vadot [Wed, 27 Feb 2019 14:20:28 +0000 (14:20 +0000)]
arm64: rockchip: clk_pll: Multiple improvement
Remove the mode_val from the clock definition as it's a bit unreadable.
Use mode_shift to represent which bit control the mode in the register.
Simplify some case where we can avoid a register read before changing it.
Set the PLL back to normal mode after the PLL have stabilized.
Import a fix from illumos (thanks Toomas Soomas for pointing at it)
See https://www.illumos.org/issues/10205 for more details
Illumos commit: https://github.com/illumos/illumos-gate/commit/247b7da039fd88350c50e3d7fef15bdab6bef215
Leandro Lupori [Wed, 27 Feb 2019 13:24:42 +0000 (13:24 +0000)]
Fix kldxref on PowerPC64
When using kldxref on kernel modules built with clang8 + lld8,
kldxref would be unable to find the modules metadata information,
because PowerPC64 was using the ef_nop.c implementation of
ef_reloc().
When GNU LD was used, it was also relocating the metadata section of
the .ko file. LLD does not do this, but only generate dynamic
relocations for it. With minor changes, ef_powerpc.c can now work
for PowerPC64 too.
Ian Lepore [Wed, 27 Feb 2019 04:58:18 +0000 (04:58 +0000)]
Add support to fdt_slicer for the new style partition data documented in
devicetree/bindings/mtd/partition.txt.
In the old style, all the children of the device node which did not have a
compatible property were the partitions. In the new style, there is a child
node of the device which has a compatible string of "fixed-partitions", and
its children are the individual partitions.
Also, support the read-only property by setting the corresponding slice flag.
Ian Lepore [Wed, 27 Feb 2019 04:16:32 +0000 (04:16 +0000)]
Rename some functions and variables to have shorter names, which allows
unwrapping multiple lines of code. Also, convert some short multiline
comments into single-line comments. Change old-school FALSE to false.
All in all, no functional changes, it's just more compact and readable.
Justin Hibbits [Wed, 27 Feb 2019 03:30:49 +0000 (03:30 +0000)]
powerpc/mpc85xx: Synchronize timebase the platform correct way
Summary:
To safely synchronize timebase we need to disable the timebase on all
cores, set timebase, and resynchronize. This adds two new devices, mutually
exclusive, which attach on the SoC simplebus, to freeze and unfreeze the
timebase. The devices are singletons, and platform-specific, so no reason
to make them optional and in separate files.
This was found to be necessary for top(1) to work correctly on an AmigaOne
X5000 (P5020 SoC). It also fixes bufdaemon and bufspacedaemon hangs at
shutdown.
Test Plan: Regression test on various Book-E hardware.
Embedded lzma decompression library becomes a module usable by other
consumers, in addition to geom_uzip.
Most important code changes are
- removal of XZ_DEC_SINGLE define, we need the code to work
with XZ_DEC_DYNALLOC;
- xz_crc32_init() call is removed from geom_uzip, xz module handles
initialization on its own.
xz is no longer embedded into geom_uzip, instead the depend line for
the module is provided, and corresponding kernel option is added to
each MIPS kernel config file using geom_uzip.
The commit also carries unrelated cleanup by removing excess "device geom_uzip"
in places which were missed in r344479.
Mark Johnston [Tue, 26 Feb 2019 16:34:43 +0000 (16:34 +0000)]
Remove illumos-specific code from the x86 fasttrap_isa.c.
The file has not been touched upstream in over a decade, and the nature
of the code means that a lot of FreeBSD-specific bits are required. Remove
the dead code to improve readability. No functional change intended.
Discussed with: cem
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 26 Feb 2019 16:31:47 +0000 (16:31 +0000)]
Remove stub fasttrap implementations.
No platforms except i386, amd64 and powerpc implement fasttrap; the
fasttrap files for other arches do not contain any code and bloat
the output from cscope, so just remove them.
Emmanuel Vadot [Tue, 26 Feb 2019 13:18:14 +0000 (13:18 +0000)]
arm64: rockchip: rk805: Map the regulator
No map function was provided before so every regulator lookup resolved
the regulator with id 1, as it uses the default mapper, which is wrong.
Correctly map the regulators.
While here remove some debug printfs and make them disable by default.
Emmanuel Vadot [Tue, 26 Feb 2019 13:16:05 +0000 (13:16 +0000)]
arm64: rockchip: rk3328_pll: Multiple improvement
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here, when switching PLL frequency, first switch it to slow mode.
When set to slow mode the PLL clock will be the external oscillator.
Changing the PLL parameters while its output is used can cause hang (sometimes).
Emmanuel Vadot [Tue, 26 Feb 2019 13:15:31 +0000 (13:15 +0000)]
arm64: rockchip: clk: ARM CLK improvement
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here set the parent after changing its freqeuncy, this reduce the time
between changing the parent and changing the divider for the arm clock.
Emmanuel Vadot [Tue, 26 Feb 2019 13:14:49 +0000 (13:14 +0000)]
arm64: rockchip: clk: rk_clk_composite: Properly use the mask bits
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here add some debug printf useful for debuging clock problems
i386 PAE: avoid atomic for pte_store() where possible.
Instead carefully write upper word, and only than the lower word with
PG_V, for previously invalid ptes. It provides some measurable system
time saving on buildworld.
Reviewed by: markj
Tested by: pho
Measured by: bde (early version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D19226
Bruce Evans [Tue, 26 Feb 2019 09:44:10 +0000 (09:44 +0000)]
Attempt to fix build breakage in r344458.
Non-x86 arches use an inconsistently named header for the file containing
"pc" attributes, and the ifdef messes to include the right header were out
of date in the 2 files that I added to the MI files list.
Only amd64, arm, i386, mips, powerpc and sparc64 are supposed to support
syscons. Only arm and mips were out of date in the ifdef. Test
coverage for of syscons in arm is broken (turned off) in NOTES, but
syscons is in some other arm config files which universe detects as broken.
arm64 and riscv remain broken due to the opposite bug of not turning off
sc in NOTES, the same as before r344458 (see r344443).
The header is MD to contain possibly-non-"pc" encodings of attributes, but
since the attributes are essentially virtual in graphics mode and non-x86
arches only support graphics mode, the header has always been the same on
all arches except for different style bugs, so there should be only 1 MI
copy of it for syscons' use. It was used in pcvt and still gives an an
API and an ABI, so it should be public and MI near or in sys/consio.h.