Linux device tree binding, whose usage is obligatory,
comprises faulty representation of Marvell cryptographic
engine (CESA) - two engines are artificially gathered into
single DT node, in order to avoid certain SW limitation.
This patch improves the cesa driver to support above binding,
depending on compatible string, which helps to ensure
backward compatibility.
The code now has a single, consistant flow for all three ioctl
variants. ifdefs and for pre-FreeBSD-7 compatability are moved to
functions and macros. So the flow is alwasy the same, we impose
the cost of allocating, copying to, updating from, and freeing a
copy of struct pci_conf_io on all paths.
This change will allow PCIOCGETCONF32 support currently in
sys/compat/freebsd32/freebsd32_ioctl.c to be moved here.
Align OF_getencprop_alloc API with OF_getencprop and OF_getprop_alloc
Change OF_getencprop_alloc semantics to be combination of malloc and
OF_getencprop and return size of the property, not number of elements
allocated.
For the use cases where number of elements is preferred introduce
OF_getencprop_alloc_multi helper function that copies semantics
of OF_getencprop_alloc prior to this change.
This is to make OF_getencprop_alloc and OF_getencprop_alloc_multi
function signatures consistent with OF_getencprop_alloc and
OF_getencprop_alloc_multi.
Functionality-wise this patch is mostly rename of OF_getencprop_alloc
to OF_getencprop_alloc_multi except two calls in ofw_bus_setup_iinfo
where 1 was used as a block size.
if_awg: Add support for allwinner,{tx,rx}-delay-ps bindings
Split out delay parsing into a separate function; we'll support both
{tx,rx}-delay as well as the new versions.
While here, validate that they're within the expected range and fail to
attach if they are not. Assuming that we can clamp the delay is a bad idea
that might result in a non-working awg anyways, so we'll fail early to make
it easier to catch.
This version also unsets the tx and rx delay registers unconditionally and
then sets them if we read a non-zero delay. These delay properties should
default to 0 if not specified, as declared in the binding documentation.
Presumably the delays will be set via hardware configuration if they're not
explicitly set in FDT.
Changelist:
- remove unused nkr_slot_flags
- new nm_intr adapter callback to enable/disable interrupts
- remove unused sysctls and document the other sysctls
- new infrastructure to support NS_MOREFRAG for NIC ports
- support for external memory allocator (for now linux-only),
including linux-specific changes in common headers
- optimizations within netmap pipes datapath
- improvements on VALE control API
- new nm_parse() helper function in netmap_user.h
- various bug fixes and code clean up
OF_getprop_alloc takes element size argument and returns number of
elements in the property. There are valid use cases for such behavior
but mostly API consumers pass 1 as element size to get string
properties. What API users would expect from OF_getprop_alloc is to be
a combination of malloc + OF_getprop with the same semantic of return
value. This patch modifies API signature to match these expectations.
For the valid use cases with element size != 1 and to reduce
modification scope new OF_getprop_alloc_multi function has been
introduced that behaves the same way OF_getprop_alloc behaved prior to
this patch.
Reviewed by: ian, manu
Differential Revision: https://reviews.freebsd.org/D14850
Add the ability to control the CPU topology of created VMs
from userland without the need to use sysctls, it allows the old
sysctls to continue to function, but deprecates them at
FreeBSD_version 1200060 (Relnotes for deprecate).
The command line of bhyve is maintained in a backwards compatible way.
The API of libvmmapi is maintained in a backwards compatible way.
The sysctl's are maintained in a backwards compatible way.
Added command option looks like:
bhyve -c [[cpus=]n][,sockets=n][,cores=n][,threads=n][,maxcpus=n]
The optional parts can be specified in any order, but only a single
integer invokes the backwards compatible parse. [,maxcpus=n] is
hidden by #ifdef until kernel support is added, though the api
is put in place.
Powerpc64: Add the facility unavailable trap subsystem
Summary:
This code adds the basic infrastructure for the facility subsystem. A facility
trap is raised when an unavailable instruction is executed. One example is
executing a Hardware Transactional Memory instruction while the MSR[TM] is
disabled. In the past, there was a specific interrupt for it (FP, VEC), but the
new instructions seem to be multiplexed on this facility interrupt.
The root cause of the trap is provided on Facility Status and Control Register
(FSCR) register.
ian [Sun, 8 Apr 2018 17:06:30 +0000 (17:06 +0000)]
Allow hinted attachment on FDT-based systems. Instead of returning ENXIO
when the FDT data doesn't enable the device instance, return
BUS_PROBE_NOWILDCARD, the same as for non-FDT systems.
Summary:
Print current MSR on printtrap(). Currently, printtrap just prints srr1, which
contains part of the MSR prior to the exception. I find useful to dump the
current value of the MSR, since it changes when there is an interruption.
With this patch, this is the new printtrap model:
handled user trap:
exception = 0x700 (program)
srr0 = 0x100008a0 (0x100008a0)
srr1 = 0x800000000002f032
current msr = 0x8000000000009032
lr = 0x1000089c (0x1000089c)
curthread = 0x7a50000
pid = 714, comm = ttrap2
Summary:
It is not necessary to call isync() after calling mtmsr() function, mainly
because the mtmsr() calls 'isync' internally to synchronize the machine state
register. Other than that, isync() just calls the 'isync' instruction, thus,
the 'isync' instruction is being called twice, and that seems to be unnecessary.
This patch just remove the unecessary calls to isync() after mtmsr().
Previous limits were chosen when locking primitives had spurious lock
accesses.
Flipping the starting point to 1 (or rather 2 as the first call shifts it)
provides a modest win when mild contention is seen while not hurting worse
cases. Tested on a bunch of one, two and four socket old and new systems
(Westmere, Skylake, Threadreaper and others) by doing concurrent page faults,
buildkernel/buildworld and other stuff (although not all systems got all the
tests).
Another thing is the upper limit. It is semi-arbitrarily chosen as it was
getting out of hand for slightly less small systems (e.g. a 128-thread one).
Note that backoff is fundamentally a speculative bandaid and this change just
makes it fit a little bit better. It remains completely oblivious to the
hardware topology or the contention pattern. This is being experimented with.
When using the fsdb `blocks' command, replace the long and ugly list of
blocks with the much more concise and readable block list shown by the
prtblknos() function imported from tools/diag/prtblknos.
The ufs_disk_write() function is used to upgrade a read-only descriptor
to a read-write descriptor. Do not close the read-only descriptor until
the read-write is successfully obtained. Before this fix, a failed upgrade
left no usable descriptor with which to work.
Split tools/diag/prtblknos into two parts:
main.c - opens disk and processes the argument list
of inodes to be printed
prtblknos.c - prints out the list of blocks used by an inode
This change allows the fsdb program to import prtblknos() to use when
printing out the set of blocks used by an inode.
This program was switched to using the libufs library to ease its
integration with fsdb and any other filesystem utility that might
want to use it in the future.
Update VMCI license based on comments from core, the FreeBSD Foundation,
and VMware legal:
- Add a dual BSD-2 Clause/GPLv2 LICENSE file in the VMCI directory
- Remove the use of "All Rights Reserved"
- Per best practice, remove copyright/license info from Makefile
Reviewed by: imp, emaste, jhb, Vishnu Dasa <vdasa@vmware.com>
Approved by: VMware legal via Mark Peek <markpeek@vmware.com>
Differential Revision: https://reviews.freebsd.org/D14979
[rpi] Add fdt_pinctrl(4) support to Raspberry Pi GPIO driver
On Raspberry Pi platform GPIO controller also responsible for pins
multiplexing. Pi code predates proper FDT support in FreeBSD so a
lot of pinmux info is hardcoded. This patch:
- Implements pinctl methods in bcm2835_gpio
- Converts all devices with ad-hoc pinmux info to proper pin control
mechanisms and adds pinmux info in FreeBSD's custom dts files.
- Adds fdt_pinctrl option to RPI2 and RPI-B kernels
- Adds SPI pinmux config to FreeBSD's customization of GNU DTS.
Reviewed by: imp, manu
Differential Revision: https://reviews.freebsd.org/D14104
The sun8i-a83t-bananapi-m3-emac overlay technically doesn't match what will
be coming from upstream. The tx-delay and rx-delay should be specified in
terms of allwinner,tx-delay-ps and allwinner,rx-delay-ps respectively. The
values are still technically correct for what we write in if_awg, and
support for the new bindings will be coming soon.
ian [Sat, 7 Apr 2018 22:21:06 +0000 (22:21 +0000)]
Cast the data pointer to the correct type for the data being accessed (as
opposed to one that accidentally worked on the one arch I test-compiled for
on my first try).
Reported by: np@, O. Hartmann <ohartmann@walstatt.org>
Pointy hat: ian@
ian [Sat, 7 Apr 2018 20:38:01 +0000 (20:38 +0000)]
Add an ioctl to get/set the SPI transfer mode. Also, make the bus clock
frequency ioctl actually set the corresponding ivar instead of just storing
the value locally in the softc (and then not using it for anything). Also,
return the correct error code if the ioctl cmd is not recognized.
ian [Sat, 7 Apr 2018 20:04:03 +0000 (20:04 +0000)]
Remove the existing identify() hack to force-add a spigen device on
FDT-based systems, and instead add proper FDT probe code. Because this
driver is freebsd-specific and just provides generic userland access to run
spibus transactions, there is no bindings document to mandate a compatible
string, so just arbitrarily use "freebsd,spigen".
ian [Sat, 7 Apr 2018 18:09:31 +0000 (18:09 +0000)]
Add support for writing/changing spi device ivars. The SPI mode (polarity
and phase) and the maximum bus speed can be changed. The chip select
number cannot be changed, because the device instances which are children
of spibus are inherently associated with the chip select number they were
instantiated for.
SKZ63 Processor May Hang When Executing Code In an HLE Transaction
Region
Problem: Under certain conditions, if the processor acquires an HLE
(Hardware Lock Elision) lock via the XACQUIRE instruction in the Host
Physical Address range between 40000000H and 403FFFFFH, it may hang
with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.
Move the pages from the range into the blacklist. Add a tunable to
not waste 4M if local DoS is not the issue.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15001
manu [Sat, 7 Apr 2018 14:17:17 +0000 (14:17 +0000)]
axp81x: Do not fail if regulators aren't properly defined
If a regulator is missing a mandatory property (like 'regulator-name'), do
not fail, regulator_parse_ofw_stdparam is returning a non-zero value so just
skip this regulator.
Also if any regulator fails to attach continue with the rest of the regulators
instead of returning ENXIO in axp8xx_attach
Add a way to temporarily suspend and resume virtual CPUs.
This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).
The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.
ifconf(): correct handling of sockaddrs smaller than struct sockaddr.
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.
I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice. If it did, the result would be an 8 byte heap leak on
current architectures.
These have become unsorted from everything else. This is desync'd from
stable/11 due to some hand-merging that was done there, so the MFC of this
will look slightly different.
Ensure that multiplications for memory allocations cannot overflow, and
that we'll not try to allocate M_WAITOK for potentially overly large
allocations.
There was a memory leak in the DIOCRADDTABLES ioctl() code which could
be triggered by trying to add tables with the same name.
Try to provoke this memory leak. It was fixed in r331225.
pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
There's no obvious limit to the request size for these, so we limit the
requests to something which won't overflow. Change the memory allocation
to M_NOWAIT so excessive requests will fail rather than stall forever.
Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
If a user closes the socket before we call tcp_usr_abort(), then
tcp_drop() may unlock the INP. Currently, tcp_usr_abort() does not
check for this case, which results in a panic while trying to unlock
the already-unlocked INP (not to mention, a use-after-free violation).
Make tcp_usr_abort() check the return value of tcp_drop(). In the case
where tcp_drop() returns NULL, tcp_usr_abort() can skip further steps
to abort the connection and simply unlock the INP_INFO lock prior to
returning.
This caching has existed since the CSRG import, but serves no obvious
purpose. Sure, setlogin() is called rarely, but calls to getlogin()
should also be infrequent. The required invalidation was not
implemented on aarch64, arm, mips, amd riscv so updates would never
occur if getlogin() was called before setlogin().
Push RFC 5424 message format from logmsg() into fprintlog().
Now that all of parsemsg() parses both RFC 3164 and 5424 messages and
hands them to logmsg(), alter the latter to properly forward all RFC
5424 message attributes to fprintlog(). While there, make some minor
cleanups to this code:
- Instead of extending the existing code that compares hostnames and
message bodies for deduplication, print all of the relevant message
fields into a single string that we can compare ('saved').
- No longer let the behaviour of fprintflog() depend on whether
'msg == NULL' to print repetition messages, Simply decompose this
function into fprintlog_first() and fprintlog_successive(). This
makes the interpretation of function arguments less magical and also
allows us to get consistent behaviour across RFC 3164 and 5424 when
adding support for the RFC 5424 output format.
- As RFC 5424 syslog messages have a dedicated application name field,
alter the repetition messages to be printed on behalf of syslogd on
the current system. Change these messages to use the local hostname,
so that it's obvious which syslogd instance detected the repetition.
Remove f_prevhost, as it has now become unnecessary.
- Remove a useless strdup(). Deconsting the message string is safe in
this specific case.
Pat the watchdog less while producing a coredump. Prior to this change,
we patted the watchdog approximately once per 4KB page of memory. After
this change, we pat the watchdog approximately once per 128MB of memory.
On a sample machine, this translated to patting the watchdog approximately
every 5.4 seconds, which "seems reasonable". We can choose a different
value in the future, if warranted.
This has extensive field experience. It is a performance improvement, and
has not caused any known problems.
Check that in_pcbfree() is only called once for each PCB. If that
assumption is violated, "bad things" could follow.
I believe such an assert would have detected some of the problems jch@
was chasing in PR 203175 (see r307551). We also use it in our internal
TCP development efforts. And, in case a bug does slip through to
released code, this change silently ignores subsequent calls to
in_pcbfree().
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D14990
pf tests: Basic ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
Validate the DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and
DIOCRSETTFLAGS ioctls with invalid values. These may succeed (because
the kernel uses the minimally required size, not the specified size),
but should not trigger kernel panics.
pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.
lualoader: Fix menu skipping with loader.conf(5) vars
Earlier efforts to stop loading the menu broke the ability to skip the menu
with, e.g., beastie_disable in loader.conf(5) as it was decided before
configuration was read.
Defer bringing in the menu module until we've loaded configuration so that
we can make a more informed decision on whether the menu should be skipped
or not.
aw_sid(4): Use prctl read for all reads when it's required
It was later found that some operation on the OrangePi one will cause
direct accesses to the eeprom to return wrong data again, so reading it all
once via prctl at attach time is no longer sufficient.
In cases where an application issues certain IPMI commands at a high
enough rate, the IPMI code can print large numbers of messages to the
console, such as:
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
These seem to be innocuous from a system standpoint, and the user-
space code can deal with the failures. Therefore, suppress printing
these messages to the console unless bootverbose is enabled.
pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES
The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of
tables at a time, and as such try to allocate <number of tables> *
sizeof(struct pfr_table). This multiplication can overflow. Thanks to
mallocarray() this is not exploitable, but an overflow does panic the
system.
Arbitrarily limit this to 65535 tables. pfctl only ever processes one
table at a time, so it presents no issues there.
With r332099 changing syslogd(8) to parse RFC 5424 formatted syslog
messages, go ahead and also change the syslog(3) libc function to
generate them. Compared to RFC 3164, RFC 5424 has various advantages,
such as sub-second precision for log entry timestamps.
As this change could have adverse effects when not updating syslogd(8)
or using a different system logging daemon, add a notice to UPDATING and
increase __FreeBSD_version.
Syslogd currently uses the RFC 3164 format for its log messages.One
limitation of RFC 3164 is that it cannot be used to log entries with
sub-second precision timestamps. One of our users has expressed a desire
for doing this for doing some basic performance measurements.
This change attempts to make a first cut at switching to RFC 5424 based
logging. The first step is to alter syslogd's input path to properly
parse such messages. It alters the logmsg() prototype to match the
fields of RFC 5424. The parsemsg() function is extended to parse both
RFC 3164 and 5424 messages and call into logmsg() accordingly.
Additional changes include:
- Introducing proper parsing of timestamps, so that they can be printed
in any desired output format. This means we need to infer the year and
timezone for RFC 3164 timestamps.
- Removing ISKERNEL. This can now be realised by simply providing an
APP-NAME (== "kernel").
- Extending RFC 3164 parsing to trim off the TAG prefix and using that
to derive APP-NAME and PROCID.
- Increase MAXLINE. RFC 5424 mentions we should support 2k messages.
stand: pass --no-rosegment for i386 bits when linking with lld
btxld does not correctly handle input with other than 2 PT_LOAD
segments. Passing --no-rosegment lets lld produce output eqivalent to
ld.bfd: 2 PT_LOAD segments and no PT_GNU_RELRO.
PR: 225775
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14956
Add 32-bit compat for ioctls that take struct ifgroupreq.
Use an accessor to access ifgr_group and ifgr_groups.
Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.
The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.
Modify makesyscalls.sh to strip out SAL annotations.
No functional change.
This is based on work I started in CheriBSD and use to validate fat
pointers at the syscall boundary. Tal Garfinkel reviewed the changes,
added annotations to COMPAT* syscalls and is using them in a record and
playback framework. One can envision other uses such as a WITNESS-like
validator for copyin/out as speculated on in the review.
As this time we are only annotating sys/kern/syscalls.master as that is
sufficient for userspace work. If kernel use cases materialize, we can
annotate other syscalls.master as needed.
Submitted by: Tal Garfinkel <talg@cs.stanford.edu>
Sponsored by: DARPA, AFRL (in part)
Differential Revision: https://reviews.freebsd.org/D14285
When booted via isoboot(8) loader will be handed a disk that simply contains
an ISO9660 image. Currently this confuses it greatly. Teach it how to spot
that it's in this situation and that ISO9660 has one "partition" covering
the whole disk.
Reviewed by: imp
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14915