ps(1): Fix formatting of the "command" field for kernel threads.
When -H is specified, for kernel threads the command is formatted as
"<proc name>/<td name>" and truncated to MAXCOMLEN. But each of the
proc name and td name may be up to MAXCOMLEN bytes in length.
Also handle the ki_moretdname field to ensure that the full thread name
gets printed. This is already handled correctly when formatting for
"-o tdname".
andrew [Tue, 28 Jul 2020 11:32:45 +0000 (11:32 +0000)]
Add a workaround for a bug when setting the Raspberry GIO config and state
The Raspberry Pi GPIO config and state messages incorrectly return with
the tag length set to 0. We then check this value to have the response
flag set. Work around this by setting the response flag when setting the
GPIO config or state and this value is zero.
PowerPC support was fixed in r357596 by changing PCI bustag to BE as
part of the solution, but this caused regression on mips. This change
implements byte swapping of virtio PCI config area in the driver,
leaving lower layer untouched.
Submittnd by: Fernando Valle <fernando.valle@eldorado.org.br>
Reported by: arichardson
Reviewed by: alfredo, arichardson
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25416
andrew [Tue, 28 Jul 2020 10:45:29 +0000 (10:45 +0000)]
Switch the bcm2835 cpufreq driver to use the firmware interface
Use the new Raspberry Pi firmware driver in the cpufreq driver. It is
intended all drivers that need to interact with the firmware will move to
use the firmware driver, this is the first.
Reviewed by: manu
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25609
Provide missing rules for ena_datapath.c and ena_netmap.c,
which prevented the ENA driver from building.
This issue was showing up only when building the driver statically
into the kernel.
Use the existing PMC_CPUID_LEN to size pmc_cpuid in the kernel and various
buffers for reading it in libpmc. This avoids some extra syscalls and
malloc/frees.
While in here, use strlcpy to copy a user-provided cpuid string instead of
memcpy, to make sure we terminate the buffer.
ssh: Remove AES-CBC ciphers from default server and client lists
A base system OpenSSH update in 2016 or so removed a number of ciphers
from the default lists offered by the server/client, due to known
weaknesses. This caused POLA issues for some users and prompted
PR207679; the ciphers were restored to the default lists in r296634.
When upstream removed these ciphers from the default server list, they
moved them to the client-only default list. They were subsequently
removed from the client default, in OpenSSH 7.9p1.
The change has persisted long enough. Remove these extra ciphers from
both the server and client default lists, in advance of FreeBSD 13.
Reviewed by: markm, rgrimes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25833
Add initial driver for ACPI Platform Error Interfaces.
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
Restrict definition of CTL_P1003_1B_MAXID to the kernel
This constant is only used to size an array within the kernel. There are
probably no legitimate uses in userland. Worse, since the kernel's array
could theoretically change size over time, any use of that symbol in
userland wouldn't be forwards compatible to new kernel versions.
Reviewed by: jhb
MFC after: Never
Differential Revision: https://reviews.freebsd.org/D25816
Don't include T_USER in si_trapno reported to userland.
Signals are only reported for user traps, so T_USER is redundant. It
is also a software convention and not included in the value reported
by the hardware.
vm_page_free_invalid(): Relax the xbusy assertion.
vm_page_assert_xbusied() asserts that the busying thread is the current
thread. For some uses of vm_page_free_invalid() (e.g., error handling
in vnode_pager_generic_getpages_done()), this condition might not hold.
Reported by: Jenkins via trasz
Reviewed by: chs, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25828
makesyscalls.sh: spit out a deprecation notice to stderr
This has for a while been replaced by makesyscalls.lua in the stock FreeBSD
build. Ensure downstreams get some notice that it'a going away if they're
reliant on it, maybe.
Fix the NFSv4 client so that it checks for support of TimeCreate before
trying to set it.
r362490 added support for setting of the TimeCreate (va_birthtime) attribute,
but it does so without checking to see if the server supports the attribute.
This could result in NFSERR_ATTRNOTSUPP error replies to the Setattr operation.
This patch adds code to check that the server supports TimeCreate before
attempting to do a Setattr of it to avoid these error returns.
r362490 marked that the NFSv4 attribute TimeCreate (va_birthtime) is supported,
but it did not change the NFS server code to actually do it.
As such, errors could occur when unrolling a tarball onto an NFSv4 mounted
volume, since setting TimeCreate would fail with a NFSERR_ATTRNOTSUPP reply.
This patch fixes the server so that it does TimeCreate and also makes
sure that TimeCreate will not be set for a DS file for a pNFS server.
A separate commit will add a check to the NFSv4 client for support of
the TimeCreate attribute before attempting to set it, to avoid a problem
when mounting a server that does not support the attribute.
The failures will still occur for r362490 or later kernels that do not
have this patch, since they indicate support for the attribute, but do not
actually support the attribute.
The caller from kernel is expected to provide an valid g_class
pointer, instead of traversing the global g_class list, just
use that pointer directly instead.
ian [Sun, 26 Jul 2020 18:33:29 +0000 (18:33 +0000)]
Describe the value in the 're' column of vmstat(8) in terms of freebsd's vm
implementation. The old description was left over from the 4.4 BSD Lite
import in 1994, and was a bit misleading (not all arches use simulated
reference bits, some implement reference tracking in hardware).
riscv: Include syscon_power device driver in GENERIC kernel config
QEMU's RISC-V virt machine provides syscon-power and syscon-reset
devices as the means by which to shutdown and reboot. We also need to
ensure that we have attached the syscon_generic device before attaching
any syscon_power devices, and so we introduce a new riscv_syscon device
akin to aw_syscon added in r327936. Currently the SiFive test finisher
is used as the specific implementation of such a syscon device.
This device driver supports both syscon-power and syscon-reset devices,
as specified in [1] and [2]. These provide a very simple interface for
power and reset control, and among other things are used by QEMU's virt
machine on RISC-V. A separate commit will enable this on RISC-V, as that
requires adding a RISC-V-specific riscv_syscon akin to r327936's
aw_syscon.
loader: Avoid -Wpointer-to-int cast warnings for Arm and RISC-V
On RISC-V, Clang warns with:
cast to smaller integer type 'unsigned int' from 'void (*)(void *)'
Instead, use %p as the standard format specifier for printing pointers.
Whilst Arm's pointer size is the same as unsigned, it's still cleaner to
use the right thing there too.
This device was originally used as part of the goldfish virtual hardware
platform used for emulating Android on QEMU, but is now also used as the
RTC for the RISC-V virt machine in QEMU. It provides a simple 64-bit
nanosecond timer exposed via a pair of memory-mapped 32-bit registers,
although only with 1s granularity.
ian [Sun, 26 Jul 2020 17:50:39 +0000 (17:50 +0000)]
Remove commented-out lines describing the old never-implemented -t option.
In 2018, r338094 removed the commented-out code for supporting the -t
command line option which had been present since the BSD 4.4 Lite import,
but was never implemented for freebsd.
This patch uses a slightly different algorithm for nfsrv_adj()
since ext_pgs mbuf lists are not permitted to have m_len == 0 mbufs.
As such, the code now frees mbufs after the adjustment in the list instead
of setting their m_len field to 0.
Since mbuf(s) may be trimmed off the tail of the list, the function now
returns a pointer to the last mbuf in the list. This saves the caller
from needing to use m_last() to find the last mbuf.
It also implies that it might return a nul list, which required a check for
that in nfsrvd_readlink().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.
geom_label: Make glabel labels more trivial by separating the tasting
routines out.
While there, also simplify the creation of label paths a little bit
by requiring the / suffix for label directory prefixes (ld_dir renamed
to ld_dirprefix to indicate the change) and stop defining macros for
these when they are only used once.
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25597
Fix an overflow bug in the blist allocator that needlessly capped max
swap size by dividing a value, which was always a multiple of 64, by
64. Remove the code that reduced max swap size down to that cap.
Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX. Call them both BLIST_RADIX.
Make improvments to the blist self-test code to silence compiler
warnings and to test larger blists.
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.
Inner workings are explained in the comment above cache_fplookup.
Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.
Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.
Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total
Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before: 2165816
after: 151216530
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25578
Revert r363123.
As Emanuel poited me the Linux processes these clock assignments in forward
order, not in reversed. I misread the original code.
Tha problem with wrong order for assigned clocks found in tegra (and some imx)
DT should be reanalyzed and solved by different way.
Add support for ext_pgs mbufs to nfsm_uiombuflist() and nfsm_split().
This patch uses a slightly different algorithm for nfsm_uiombuflist() for
the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to
the current location that mbuf data can be filled into. This avoids use of
mtod(mp, char *) + mp->m_len to calculate the location, since this does
not work for ext_pgs mbufs and I think it makes the algorithm more readable.
This change should not result in semantic changes for the non-ext_pgs case.
The patch also deletes come unneeded code.
It also adds support for anonymous page ext_pgs mbufs to nfsm_split().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
At this time for this case, use of ext_pgs mbufs cannot be enabled, since
ktls_encrypt() replaces the unencrypted data with encrypted data in place.
Until such time as this can be enabled, there should be no semantic change.
Also, note that this code is only used by the NFS client for a mirrored pNFS
server.
Make it possible to get/set MMC frequency from camcontrol
Enhance camcontrol(8) so that it's possible to manually set frequency for SD/MMC cards.
While here, display more information about the current controller, such as
supported operating modes and VCCQ voltages, as well as current VCCQ voltage.
Reviewed by: manu
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D25795
Make lapic_ipi_vectored(APIC_IPI_DEST_SELF) NMI safe.
Sending IPI to self or all CPUs does not require write into upper part of
the ICR, prone to races. Previously the code disabled interrupts, but it
was not enough for NMIs. Instead of that when possible write only lower
part of the register, or use special SELF IPI register in x2APIC mode.
This also removes ICR reads used to preserve reserved bits on write.
It was there from the beginning, but I failed to find explanation why,
neither I see Linux doing it. Specification even tells that ICR content
may be lost in deep C-states, so if hardware does not bother to preserve
it, why should we?
cem [Fri, 24 Jul 2020 17:34:04 +0000 (17:34 +0000)]
Add unlocked/SMR fast path to getblk()
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.
cem [Fri, 24 Jul 2020 17:32:10 +0000 (17:32 +0000)]
Use SMR to provide safe unlocked lookup for pctries from SMR zones
Adapt r358130, for the almost identical vm_radix, to the pctrie subsystem.
Like that change, the tree is kept correct for readers with store barriers
and careful ordering. Existing locks serialize writers.
Add a PCTRIE_DEFINE_SMR() wrapper that takes an additional smr_t parameter
and instantiates a FOO_PCTRIE_LOOKUP_UNLOCKED() function, in addition to the
usual definitions created by PCTRIE_DEFINE().
Interface consumers will be introduced in later commits.
As future work, it might be nice to add vm_radix algorithms missing from
generic pctrie to the pctrie interface, and then adapt vm_radix to use
pctrie.
Remove reference to nlist(3) missed in SCCS revision 5.26 by mckusick
when converting rwhod(8) to using kern.boottime ather than extracting
the boot time from kernel memory directly.
Document that force_depend() supports only /etc/rc.d scripts
Currently, force_depend() from rc.subr(8) does not support depending on
scripts outside of /etc/rc.d (like /usr/local/etc/rc.d). The /etc/rc.d path
is hard-coded into force_depend().
vm: fix swap reservation leak and clean up surrounding code
The code did not subtract from the global counter if per-uid reservation
failed.
Cleanup highlights:
- load overcommit once
- move per-uid manipulation to dedicated routines
- don't fetch wire count if requested size is below the limit
- convert return type from int to bool
- ifdef the routines with _KERNEL to keep vm.h compilable by userspace
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.
Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).
Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317