Add a way to temporarily suspend and resume virtual CPUs.
This is used as part of implementing run control in bhyve's debug
server. The hypervisor now maintains a set of "debugged" CPUs.
Attempting to run a debugged CPU will fail to execute any guest
instructions and will instead report a VM_EXITCODE_DEBUG exit to
the userland hypervisor. Virtual CPUs are placed into the debugged
state via vm_suspend_cpu() (implemented via a new VM_SUSPEND_CPU ioctl).
Virtual CPUs can be resumed via vm_resume_cpu() (VM_RESUME_CPU ioctl).
The debug server suspends virtual CPUs when it wishes them to stop
executing in the guest (for example, when a debugger attaches to the
server). The debug server can choose to resume only a subset of CPUs
(for example, when single stepping) or it can choose to resume all
CPUs. The debug server must explicitly mark a CPU as resumed via
vm_resume_cpu() before the virtual CPU will successfully execute any
guest instructions.
ifconf(): correct handling of sockaddrs smaller than struct sockaddr.
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.
I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice. If it did, the result would be an 8 byte heap leak on
current architectures.
These have become unsorted from everything else. This is desync'd from
stable/11 due to some hand-merging that was done there, so the MFC of this
will look slightly different.
Ensure that multiplications for memory allocations cannot overflow, and
that we'll not try to allocate M_WAITOK for potentially overly large
allocations.
There was a memory leak in the DIOCRADDTABLES ioctl() code which could
be triggered by trying to add tables with the same name.
Try to provoke this memory leak. It was fixed in r331225.
pf: Improve ioctl validation for DIOCIGETIFACES and DIOCXCOMMIT
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
There's no obvious limit to the request size for these, so we limit the
requests to something which won't overflow. Change the memory allocation
to M_NOWAIT so excessive requests will fail rather than stall forever.
Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
If a user closes the socket before we call tcp_usr_abort(), then
tcp_drop() may unlock the INP. Currently, tcp_usr_abort() does not
check for this case, which results in a panic while trying to unlock
the already-unlocked INP (not to mention, a use-after-free violation).
Make tcp_usr_abort() check the return value of tcp_drop(). In the case
where tcp_drop() returns NULL, tcp_usr_abort() can skip further steps
to abort the connection and simply unlock the INP_INFO lock prior to
returning.
This caching has existed since the CSRG import, but serves no obvious
purpose. Sure, setlogin() is called rarely, but calls to getlogin()
should also be infrequent. The required invalidation was not
implemented on aarch64, arm, mips, amd riscv so updates would never
occur if getlogin() was called before setlogin().
Push RFC 5424 message format from logmsg() into fprintlog().
Now that all of parsemsg() parses both RFC 3164 and 5424 messages and
hands them to logmsg(), alter the latter to properly forward all RFC
5424 message attributes to fprintlog(). While there, make some minor
cleanups to this code:
- Instead of extending the existing code that compares hostnames and
message bodies for deduplication, print all of the relevant message
fields into a single string that we can compare ('saved').
- No longer let the behaviour of fprintflog() depend on whether
'msg == NULL' to print repetition messages, Simply decompose this
function into fprintlog_first() and fprintlog_successive(). This
makes the interpretation of function arguments less magical and also
allows us to get consistent behaviour across RFC 3164 and 5424 when
adding support for the RFC 5424 output format.
- As RFC 5424 syslog messages have a dedicated application name field,
alter the repetition messages to be printed on behalf of syslogd on
the current system. Change these messages to use the local hostname,
so that it's obvious which syslogd instance detected the repetition.
Remove f_prevhost, as it has now become unnecessary.
- Remove a useless strdup(). Deconsting the message string is safe in
this specific case.
Pat the watchdog less while producing a coredump. Prior to this change,
we patted the watchdog approximately once per 4KB page of memory. After
this change, we pat the watchdog approximately once per 128MB of memory.
On a sample machine, this translated to patting the watchdog approximately
every 5.4 seconds, which "seems reasonable". We can choose a different
value in the future, if warranted.
This has extensive field experience. It is a performance improvement, and
has not caused any known problems.
Check that in_pcbfree() is only called once for each PCB. If that
assumption is violated, "bad things" could follow.
I believe such an assert would have detected some of the problems jch@
was chasing in PR 203175 (see r307551). We also use it in our internal
TCP development efforts. And, in case a bug does slip through to
released code, this change silently ignores subsequent calls to
in_pcbfree().
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D14990
pf tests: Basic ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
Validate the DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and
DIOCRSETTFLAGS ioctls with invalid values. These may succeed (because
the kernel uses the minimally required size, not the specified size),
but should not trigger kernel panics.
pf: Improve ioctl validation for DIOCRGETTABLES, DIOCRGETTSTATS, DIOCRCLRTSTATS and DIOCRSETTFLAGS
These ioctls can process a number of items at a time, which puts us at
risk of overflow in mallocarray() and of impossibly large allocations
even if we don't overflow.
Limit the allocation to required size (or the user allocation, if that's
smaller). That does mean we need to do the allocation with the rules
lock held (so the number doesn't change while we're doing this), so it
can't M_WAITOK.
lualoader: Fix menu skipping with loader.conf(5) vars
Earlier efforts to stop loading the menu broke the ability to skip the menu
with, e.g., beastie_disable in loader.conf(5) as it was decided before
configuration was read.
Defer bringing in the menu module until we've loaded configuration so that
we can make a more informed decision on whether the menu should be skipped
or not.
aw_sid(4): Use prctl read for all reads when it's required
It was later found that some operation on the OrangePi one will cause
direct accesses to the eeprom to return wrong data again, so reading it all
once via prctl at attach time is no longer sufficient.
In cases where an application issues certain IPMI commands at a high
enough rate, the IPMI code can print large numbers of messages to the
console, such as:
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
ipmi0: KCS: Failed to read completion code
ipmi0: KCS error: ff
These seem to be innocuous from a system standpoint, and the user-
space code can deal with the failures. Therefore, suppress printing
these messages to the console unless bootverbose is enabled.
pf: Improve ioctl validation for DIOCRADDTABLES and DIOCRDELTABLES
The DIOCRADDTABLES and DIOCRDELTABLES ioctls can process a number of
tables at a time, and as such try to allocate <number of tables> *
sizeof(struct pfr_table). This multiplication can overflow. Thanks to
mallocarray() this is not exploitable, but an overflow does panic the
system.
Arbitrarily limit this to 65535 tables. pfctl only ever processes one
table at a time, so it presents no issues there.
With r332099 changing syslogd(8) to parse RFC 5424 formatted syslog
messages, go ahead and also change the syslog(3) libc function to
generate them. Compared to RFC 3164, RFC 5424 has various advantages,
such as sub-second precision for log entry timestamps.
As this change could have adverse effects when not updating syslogd(8)
or using a different system logging daemon, add a notice to UPDATING and
increase __FreeBSD_version.
Syslogd currently uses the RFC 3164 format for its log messages.One
limitation of RFC 3164 is that it cannot be used to log entries with
sub-second precision timestamps. One of our users has expressed a desire
for doing this for doing some basic performance measurements.
This change attempts to make a first cut at switching to RFC 5424 based
logging. The first step is to alter syslogd's input path to properly
parse such messages. It alters the logmsg() prototype to match the
fields of RFC 5424. The parsemsg() function is extended to parse both
RFC 3164 and 5424 messages and call into logmsg() accordingly.
Additional changes include:
- Introducing proper parsing of timestamps, so that they can be printed
in any desired output format. This means we need to infer the year and
timezone for RFC 3164 timestamps.
- Removing ISKERNEL. This can now be realised by simply providing an
APP-NAME (== "kernel").
- Extending RFC 3164 parsing to trim off the TAG prefix and using that
to derive APP-NAME and PROCID.
- Increase MAXLINE. RFC 5424 mentions we should support 2k messages.
stand: pass --no-rosegment for i386 bits when linking with lld
btxld does not correctly handle input with other than 2 PT_LOAD
segments. Passing --no-rosegment lets lld produce output eqivalent to
ld.bfd: 2 PT_LOAD segments and no PT_GNU_RELRO.
PR: 225775
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D14956
Add 32-bit compat for ioctls that take struct ifgroupreq.
Use an accessor to access ifgr_group and ifgr_groups.
Use an macro CASE_IOC_IFGROUPREQ(cmd) in place of case statements such
as "case SIOCAIFGROUP:". This avoids poluting the switch statements
with large numbers of #ifdefs.
The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.
Modify makesyscalls.sh to strip out SAL annotations.
No functional change.
This is based on work I started in CheriBSD and use to validate fat
pointers at the syscall boundary. Tal Garfinkel reviewed the changes,
added annotations to COMPAT* syscalls and is using them in a record and
playback framework. One can envision other uses such as a WITNESS-like
validator for copyin/out as speculated on in the review.
As this time we are only annotating sys/kern/syscalls.master as that is
sufficient for userspace work. If kernel use cases materialize, we can
annotate other syscalls.master as needed.
Submitted by: Tal Garfinkel <talg@cs.stanford.edu>
Sponsored by: DARPA, AFRL (in part)
Differential Revision: https://reviews.freebsd.org/D14285
When booted via isoboot(8) loader will be handed a disk that simply contains
an ISO9660 image. Currently this confuses it greatly. Teach it how to spot
that it's in this situation and that ISO9660 has one "partition" covering
the whole disk.
Reviewed by: imp
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14915
Add isoboot(8) for booting BIOS systems from HDDs containing ISO images.
This is part of a project for adding the ability to create hybrid CD/USB boot
images. In the BIOS case when booting from something that isn't a CD we need
some extra boot code to actually find our next stage (loader) within an
ISO9660 filesystem. This code will reside in a GPT partition (similar to
gptboot(8) from which it is derived) and looks for /boot/loader in an
ISO9660 filesystem on the image.
Reviewed by: imp
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14914
Exit with usage when extra arguments are on command line
preventing mistakes such as "halt 0p" for "halt -p".
Approved by: bde (mentor), phk (mentor)
MFC after: 1 week
So that it doesn't rely on physmap[1] containing an address below
1MiB. Instead scan the full physmap and search for a suitable address
to place the trampoline code (below 1MiB) and the initial memory pages
(below 4GiB).
Make the INTO instruction operational in 32bit mode.
Having the IDT entry specify ring 0 DPL caused delivery of #GP instead
of #OF.
The instruction is not valid in 64bit mode, which probably explains
why the IDT entry for #OF was initially set this way. It is
interesting to note that the BOUND instruction works with the IDT #BR
entry DPL 0, most likely CPU considers #BR from BOUND as generated by
a machine, not user.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
remove special handling for stale ptrace dependencies
r318957 added special handling for stale ptrace dependency files to
support a -DNO_CLEAN build in a tree last built before r305012. That
revision is now over a year and a half old, so retire the special case.
cxgbe(4): Always display an error message if SIOCSIFFLAGS will leave
IFF_UP and IFF_DRV_RUNNING out of sync. ifhwioctl in the kernel pays no
attention to the return code from the driver ioctl during SIOCSIFFLAGS
so these messages are the only indication that the ioctl was called but
failed.
netmap: align if_ptnet guest driver to the upstream code (commit 0e15788)
The change upgrades the driver to use the split Communication Status
Block (CSB) format. In this way the variables written by the guest
and read by the host are allocated in a different cacheline than
the variables written by the host and read by the guest; this is
needed to avoid cache thrashing.
strcpy was used to copy a string into a buffer copied to userland, which
left uninitialized data after the terminating 0-byte. Use the same
approach as in tcp_subr.c: strncpy and explicit '\0'.
admbugs: 765, 822
MFC after: 1 day
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reported by: Vlad Tsyrklevich
Security: Kernel memory disclosure
Sponsored by: The FreeBSD Foundation
Fix kernel memory disclosure in linux_ioctl_socket
strlcpy is used to copy a string into a buffer to be copied to userland,
previously leaving uninitialized data after the terminating NUL. Zero
the buffer first to avoid a kernel memory disclosure.
admbugs: 765, 811
MFC after: 1 day
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reported by: Vlad Tsyrklevich
Sponsored by: The FreeBSD Foundation
Enable Marvell gpio driver to work with many controllers
This patch moves all global data structures into mv_gpio_softc,
and puts device_t parameter to functions calls everywhere where needed.
As a result, we can create multiple driver instances.
Removed names in function declaration to keep style.
Improve interrupt and resource allocation in Marvell GPIO driver
This patch adds support for more than one interrupts
in GPIO controller. It reads necessary information (such as cell size)
from FDT, so there are no magic numbers.
Note that interrupts are still not working, but this patch makes
one good step in correct direction
Introduce port debouncing mechanism in mv_gpio driver
This patch introduces gpio debouncing mechanism
with fixed memory allocation in critical section.
When you press button, value at gpio pin connected to button
is changing many times which will cause in unexpected behaviour.
Debouncing mechanism will prevent this phenomenon
Ranges in pcie-controller are unused, so could be changed to match Linux
device tree represntation. Same with interrupt-cells and interrupt-parent.
In PCI controller driver ocd_data are used for matching driver and
choose proper resources acquisition function.
fdt_win_process_child have new argument which provide information
about fdt node containing addresses of MMIO registers.
Add api for creating resource list based on 'assigned-addresses'
According to device tree binding 'assigned-addresses' can refer to PCIE MMIO
register space. New function ofw_bus_assigned_addresses_to_rl is
provided to support it.
In Linux FDT pcie does not have compatible string.
Configuration of windows in mv_common was based on fdt compatible.
Now pcie windows are configured by their parent: pcie_controller.
Processing is moved to fdt_win_process_child. fdt_win_process now
only walk through the tree. SOC_NODE_PCI is position of pcie function in
soc_node_spec array.
PCIe probe cannot use ofw_bus_search_compatible, because it needs to
check also device type and parents compatible.
GENERIC ARM config use NEW_PCIB driver (https://wiki.freebsd.org/NEW_PCIB).
To satisfy it, allocation and deallocation of PCI_RES_BUS is necessary.
Conditional compilation is added for backward compatibility with ARMv5
configs.
Make Marvell armv7 timer and wdt registers definitions common
Define timers registers for both SoCs and choose proper one during runtime
based on information from FDT.
In WDT driver there are different function for ArmadaXP and other ARMv5 SoCs.
In timer driver registers definitions are stored in resource_spec structure
and chosen during runtime.
Use PLATFORM for initializing Marvell ArmadaXP and Armada38X
Spliting armv5 and armv7 machdep is necessary for adding Armada38X and
ArmadaXP to GENERIC config.
PLATFORM framework checks SOC type in FDT and will select proper
initialization function implementation during runtime.
Pointers to SoC specific implementation are stored in array of
platform_method_t and provided to framework by FDT_PLATFORM_DEF macro.
PLATFORM framework supports also reset function. To simplify implementation
cpu_reset is moved from mv_common to armv5 and armv7 machdep.
Armada38X and ArmadaXP share now common list of files, so resolve all
dependencies as well.
manu [Wed, 4 Apr 2018 06:44:24 +0000 (06:44 +0000)]
regulator: Disable unused regulator
bootloaders such as u-boot might enable regulators, or simply regulators could
be enabled by default by the PMIC, even if we don't have a driver for
the device or subsystem.
Disable unused regulators just before going to userland.
A tunable hw.regulator.disable_unused is added to not disable them in case
this causes problems on some board but the default behavior is to disable
everything unused.
I prefer to break thinks now and fix them rather than never switch to the
case were we disable regulators.
Tested on : Pine64-LTS (an idle board goes from ~0.33A to ~0.27A)
Tested on : BananaPi M2
Differential Revision: https://reviews.freebsd.org/D14781
gordon [Wed, 4 Apr 2018 05:21:46 +0000 (05:21 +0000)]
Limit glyph count in vtfont_load to avoid integer overflow.
Invalid font data passed to PIO_VFONT can result in an integer overflow
in glyphsize. Characters may then be drawn on the console using glyph
map entries that point beyond the end of allocated glyph memory,
resulting in a kernel memory disclosure.
Submitted by: emaste
Reported by: Dr. Silvio Cesare of InfoSect
Security: CVE-2018-6917
Security: FreeBSD-SA-18:04.vt
Sponsored by: The FreeBSD Foundation
TLB1 can handle ranges up to 4GB (through e5500, larger in e6500), but
ilog2() took a unsigned int, which maxes out at 4GB-1, but truncates
silently. Increase the input range to the largest supported, at least for
64-bit targets. This lets the DMAP be completely mapped, instead of only
1GB blocks with it assuming being fully mapped.
These have been found to be practically useless. We were actually
following the Android bionic library and had some interest in replicating
the same warnings and behaviour but Android has since removed them.
We are still keeping some uses of nullability attributes in other headers,
somewhat in line with Apple's libc.
Two modules with the same name cannot be loaded, so Marvell specific drivers
cannot have the same name as the generic drivers.
Files with the same name, even in different folders overlaps their .o files,
so in order to prepare for supporting Marvell platforms in GENERIC armv7
config, modify conflicting names.
Split out delay code in Marvell timer driver for PLATFORM
The PLATFORM code will perform the software loop in the early boot,
so extract the actual delay code to handle situation, when
the timers are already initialized.
Store pointers to SoC specific functions in mv_timer_config structure
and determine proper config in runtime based on compatible string from FDT.
Compatible string for ArmadaXP timers is changed to match Linux FDT.
Armada 38x uses generic Cortex-A9 timer and separate watchdog drivers, so
it does not need to be supported by timer driver.
Store platform dependent functions and constants in mv_wdt_config
structure and select proper configuration on runtime based on
compatible string provided in FDT.
Marvell Armada38X and ArmadaXP non-repetitive registers are moved to
generic part of code.
To support armv5 as well, use proper compatible string:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/watchdog/orion_wdt.c?h=v4.13-rc3#n456
Make get_tclk and get_cpu_freq generic for Marvell armv7 SoCs
In GENERIC kernel choosing proper get_tclk and get_cpu_freq implementation must
be done in runtime. Kernel for both SoC need to have implementation of each
other functions, so common file list mv/files.arm7 is added.
Marvell armv5 SoC have their own non-generic implementation of those function.
Make mv_common.c generic for Marvell Armada38X and ArmadaXP
Preparation for adding Armada38X and ArmadaXP SoC to GENERIC config.
Supported platform are listed in soc_family enum.
struct decode_win_spec contains platform specific functions and constants.
Function mv_check_soc_family checks SoC type and chooses proper structure
in runtime, as well as platform-dependent functions.
Unnecessary dummy functions are removed.
Because of changing registers name to more generic new definition of
FDT_DEVMAP_MAX in mv_machdep is added.
Split get_sar_value function for Marvell ArmadaXP and Armada38X
get_sar_value is implemented only for ArmadaXP and Armada38X. Splitting it for
two different functions and change registers names result in more generic code.
PCI ports differ between Marvell SoCs, but have the same compatible in FDT.
Identification is made based on parent compatible during attach.
For ArmadaXP skipping enable procedure is necessary. To achieve it
sc_skip_enable_procedure flag is used.
For Armada38x find root procedure is necessary. For other SoCs root link is
always at slot 0. sc_enable_find_root_slot flag is used to select proper
behaviour.
Marvell armv5 platforms does not support msi.
Defining INTRNG remove some necessary registers and declarations of
pic_init_secondary, pic_ipi_send, pic_ipi_read and pic_ipi_clear.
Because Marvell ArmadaXP and Armada38X always use INTRNG, include all
INTRNG code and remove code that does not use it.
Separate pic registers declarations for Armada38X are unnecessary, it
works properly with ArmadaXP config.
9434 Speculative prefetch is blocked by device removal code.
Device removal code does not set spa_indirect_vdevs_loaded for pools
that never experienced device removal. At least one visual consequence
of it is completely blocked speculative prefetcher. This patch sets
the variable in such situations.