Colin Percival [Sun, 31 Dec 2017 09:24:41 +0000 (09:24 +0000)]
Use the TSLOG framework to record entry/exit timestamps for DELAY and
_vprintf; these functions are called in many places and can contribute
meaningfully to the total time spent booting.
Colin Percival [Sun, 31 Dec 2017 09:24:11 +0000 (09:24 +0000)]
Instrument thread creations for the benefit of the TSLOG framework.
This assists in tracking time spent while the boot is being "held" waiting
for something to happen.
Colin Percival [Sun, 31 Dec 2017 09:23:52 +0000 (09:23 +0000)]
Instrument "boot holds" for the benefit of the TSLOG framework. These
are places where the "main thread" of the booting kernel (either the
thread which later becomes swapper or the thread which later becomes
init) has to stop and wait for action to take place in another thread
before continuing.
There are currently three such holds:
1. The intr_config_hooks SYSINIT waits for hooks registered via the
config_intrhook_establish function; this allows (typically) devices
which need interrupts enabled to complete their initialization to do
so before root is mounted.
2. The g_waitidle function waits for the GEOM event queue to be empty;
this ensures that all of the disks which have been attached have been
tasted before we attempt to mount root.
3. The vfs_mountroot_wait function (in addition to calling g_waitidle)
waits for holds registered via root_mount_hold; among other things, this
is used by the USB subsystem to ensure that we don't fail to mount root
if it's located on a USB disk which takes a while to probe.
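A boot hold amounts to a counter plus a wait. Subsystems take a hold while they are still probing, and the main boot thread blocks until every hold has been released. The following is a minimal userland sketch of that pattern using POSIX threads. It is not the kernel's root_mount_hold() or config_intrhook code, and all names (hold_acquire, hold_release, hold_wait, probe_thread) are hypothetical. Timestamping the moment hold_wait() starts blocking and the moment it returns is the kind of span the TSLOG instrumentation is after.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical boot-hold counter; not the kernel implementation. */
static pthread_mutex_t hold_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hold_cv = PTHREAD_COND_INITIALIZER;
static int holds;	/* number of outstanding holds */

static void
hold_acquire(void)
{
	pthread_mutex_lock(&hold_mtx);
	holds++;
	pthread_mutex_unlock(&hold_mtx);
}

static void
hold_release(void)
{
	pthread_mutex_lock(&hold_mtx);
	assert(holds > 0);
	if (--holds == 0)
		pthread_cond_broadcast(&hold_cv);	/* wake the boot thread */
	pthread_mutex_unlock(&hold_mtx);
}

/* Called by the "main thread": blocks until no holds remain. */
static void
hold_wait(void)
{
	pthread_mutex_lock(&hold_mtx);
	while (holds > 0)
		pthread_cond_wait(&hold_cv, &hold_mtx);
	pthread_mutex_unlock(&hold_mtx);
}

/* Example worker: finishes "probing" in another thread and drops its hold. */
static void *
probe_thread(void *arg)
{
	(void)arg;
	hold_release();
	return (NULL);
}
```

On older systems this may need to be linked with -pthread.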
Colin Percival [Sun, 31 Dec 2017 09:23:19 +0000 (09:23 +0000)]
Teach makeobjops.awk to accept PROLOG and EPILOG blocks before
METHOD and STATICMETHOD declarations; that code will be inserted
into the dispatch function before and after the method call.
Use this functionality and the TSLOG framework to record DEVICE_ATTACH
and DEVICE_PROBE entry/exit timestamps.
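To show where PROLOG and EPILOG code lands, here is a hand-written sketch of a dispatch function with both blocks spliced in. It is an assumption about the general shape, not the exact output of makeobjops.awk. The real generated code looks the method up through the kobj method cache, whereas here the method is simply passed in. log_event() is a hypothetical stand-in for the TSLOG calls, and every other name is illustrative as well.

```c
#include <assert.h>
#include <string.h>

struct device { const char *name; };	/* stand-in for the kernel's device_t */
typedef struct device *device_t;
typedef int device_attach_fn(device_t dev);

/* Tiny event log used in place of TSLOG calls in this sketch. */
static const char *events[8];
static int nevents;

static void
log_event(const char *what)
{
	if (nevents < 8)
		events[nevents++] = what;
}

/* Shape of a dispatch function once PROLOG/EPILOG code has been inserted. */
static int
DEVICE_ATTACH_sketch(device_attach_fn *method, device_t dev)
{
	int rc;

	log_event("enter");	/* PROLOG block: before the method call */
	rc = method(dev);
	log_event("exit");	/* EPILOG block: after the call, before returning */
	return (rc);
}

/* A trivial driver attach method for the demo. */
static int
demo_attach(device_t dev)
{
	log_event(dev->name);
	return (0);
}
```

Because the timestamps live in the dispatch function, every driver's attach and probe methods are covered without touching the drivers themselves.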
Colin Percival [Sun, 31 Dec 2017 09:22:31 +0000 (09:22 +0000)]
Use the TSLOG framework to record entry/exit timestamps for machine
independent functions with important roles in the early boot process:
mi_startup (with the "exit" recorded when it becomes swapper),
start_init (with the "exit" recorded when the thread is about to
"return" into the newly created init process), vfs_mountroot, and
vfs_mountroot_wait.
Colin Percival [Sun, 31 Dec 2017 09:22:07 +0000 (09:22 +0000)]
Use the TSLOG framework to record entry/exit timestamps for hammer_time.
The entry must be logged "manually" using TSRAW rather than TSENTER
since PCPU data structures have not yet been initialized and thus
curthread cannot be accessed; &thread0 is what will become curthread
later in hammer_time.
Other MD initialization code should be similarly instrumented in order
to gain visibility into the time spent before entering mi_startup; this
will require some care and testing from people with access to such
hardware.
Colin Percival [Sun, 31 Dec 2017 09:21:34 +0000 (09:21 +0000)]
Connect kern_tslog.c to the build and add TSLOG / TSLOGSIZE kernel options.
These are intended for debugging purposes and should not be added to
"generic" kernel configurations since they result in a nontrivial amount
of memory being set aside for this purpose, can break if kernel modules are
unloaded, and can potentially leak a dangerous amount of information about
timestamps used as a source of kernel entropy.
Colin Percival [Sun, 31 Dec 2017 09:21:01 +0000 (09:21 +0000)]
Code for recording timestamps of events, especially function entries/exits.
This is a very primitive system, intended for use in measuring performance
during the early system boot, before more sophisticated tools like DTrace
or infrastructure like kernel memory allocation and mutexes are available.
Because this code records pointers to strings rather than copying strings
(in order to keep the memory usage more manageable), if a kernel module is
unloaded after logging an event, Bad Things can happen. Users are advised
to not do that.
Since cycle counts from the early kernel boot are used as an initial entropy
source, publishing this information to userland could result in inadequate
entropy being kept private to the kernel RNG. Users are advised to not
enable this on systems with untrusted users.
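The recording scheme can be sketched as a preallocated array of fixed-size records, where a slot is claimed with a single atomic increment. Neither kernel memory allocation nor mutexes are needed, and strings are stored by pointer. This userland sketch is not kern_tslog.c. The record layout, the names (ts_record, ts_count, TS_LOGSIZE), and the fake counter used in place of reading the CPU cycle counter are all assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; field names are illustrative only. */
enum ts_type { TS_ENTER, TS_EXIT, TS_EVENT };

struct ts_record {
	void		*td;	/* thread that logged the event */
	enum ts_type	 type;
	const char	*f;	/* pointer to a string, NOT a copy of it */
	const char	*s;	/* optional second string, also by pointer */
	uint64_t	 tsc;	/* timestamp; a cycle count in the kernel */
};

#define TS_LOGSIZE 8	/* tiny for the demo; a kernel option sizes the real buffer */

static struct ts_record ts_log[TS_LOGSIZE];	/* preallocated, no allocator needed */
static atomic_size_t ts_next;			/* next free slot */
static atomic_uint_fast64_t fake_tsc;		/* stand-in for the cycle counter */

static void
ts_record(void *td, enum ts_type type, const char *f, const char *s)
{
	/* Claim a slot without taking a lock; mutexes may not exist yet. */
	size_t pos = atomic_fetch_add(&ts_next, 1);

	if (pos >= TS_LOGSIZE)
		return;		/* buffer full: drop the event */
	ts_log[pos] = (struct ts_record){ td, type, f, s,
	    atomic_fetch_add(&fake_tsc, 1) + 1 };
}

/* Number of records actually stored. */
static size_t
ts_count(void)
{
	size_t n = atomic_load(&ts_next);

	return (n < TS_LOGSIZE ? n : TS_LOGSIZE);
}
```

Since only the pointer is kept, the string must outlive the log. This is exactly the module-unload hazard described above.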
Nathan Whitehorn [Sun, 31 Dec 2017 06:10:07 +0000 (06:10 +0000)]
Use data from the boot loader to pick the appropriate output graphics mode
instead of hard-coding a default. This information is passed implicitly by
the PS3 firmware and can be relied upon. Also adjust the default mode, if
somehow firmware doesn't pass one, to 1920x1080 from 720x480 since it is
2017.
Kyle Evans [Sun, 31 Dec 2017 05:22:26 +0000 (05:22 +0000)]
stand/fdt: Make fdt_overlay_apply signature-compatible with libfdt
libfdt will assume a writable fdt overlay blob has been passed in, so make
ours compatible to allow easier review when we try to drop libfdt into
place. The overlay from the calling context is writable, making it safe to
simply rip out everything related to copying the overlay blob in
fdt_overlay_apply.
I note here that we still have problems: fdt_overlay_apply, both our version
and libfdt's, may fail and have already clobbered the base fdt to some
extent. Future work will make sure we don't apply a potentially bogus fdt,
instead discarding the base fdt if we had an error.
Alan Cox [Sun, 31 Dec 2017 04:01:47 +0000 (04:01 +0000)]
Previously, swap_pager_copy() freed swap blocks one at a time, via
swp_pager_meta_ctl(), with no opportunity to recognize freeing of
consecutive blocks and free fewer block ranges. To open that opportunity,
this change removes the SWM_FREE option from swp_pager_meta_ctl(), and
compels the caller to do the freeing when a valid block address is returned.
In swap_pager_copy(), these frees are aggregated, so that a sequence of them
can be done at one time.
The only other caller to swp_pager_meta_ctl() that passed SWM_FREE,
swp_pager_unswapped(), is also modified to handle its single free
explicitly.
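Aggregating the frees works like this: keep one pending run of consecutive block addresses, and issue a single range free whenever the next block does not extend that run. The following sketch shows the idea in isolation. It is not the swap_pager.c code. blk_t, free_range(), and the batch helpers are hypothetical, and free_range() only records what would have been freed.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t blk_t;		/* stand-in for a swap block address */
#define NOBLK ((blk_t)-1)	/* "no pending run" marker */

/* Records each range released; stands in for a real "free n blocks" call. */
static blk_t freed_start[16], freed_len[16];
static int nfreed;

static void
free_range(blk_t start, blk_t n)
{
	assert(nfreed < 16);
	freed_start[nfreed] = start;
	freed_len[nfreed] = n;
	nfreed++;
}

/* A pending run of consecutive blocks that has not been freed yet. */
struct free_batch {
	blk_t start;	/* first block of the run, or NOBLK */
	blk_t n;	/* length of the run */
};

/* Queue one block; flush the pending run first if blk does not extend it. */
static void
batch_add(struct free_batch *b, blk_t blk)
{
	if (b->start != NOBLK && blk == b->start + b->n) {
		b->n++;
		return;
	}
	if (b->start != NOBLK)
		free_range(b->start, b->n);
	b->start = blk;
	b->n = 1;
}

/* Free whatever run is still pending (call once at the end). */
static void
batch_flush(struct free_batch *b)
{
	if (b->start != NOBLK)
		free_range(b->start, b->n);
	b->start = NOBLK;
	b->n = 0;
}
```

For example, feeding blocks 10, 11, 12, 20, 21, 5 produces three range frees, (10, 3), (20, 2), and (5, 1), instead of six single-block frees.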
Pedro F. Giffuni [Sun, 31 Dec 2017 03:34:00 +0000 (03:34 +0000)]
sysv_{ipc|shm}: update the NetBSD VCS tags to more closely match our files.
Both files originated in NetBSD:
sysv_ipc.c CVS 1.9:
Most of their changes don't apply to us as we already have similar
changes. This is a better reference for future merges.
sysv_shm.c CVS 1.39:
Most of their changes don't apply to our code but interestingly this
revision merged our changes and is a better point for reference.
Move the VCS tags to the position recommended in our committer's guide
(section 8).
Mateusz Guzik [Sun, 31 Dec 2017 00:47:04 +0000 (00:47 +0000)]
locks: re-check the reason to go to sleep after locking sleepq/turnstile
In both rw and sx locks we always go to sleep if the lock owner is not
running.
We do spin for some time if the lock is read-locked.
However, if we decide to go to sleep because the lock owner is off CPU,
and the lock is found to be read-locked after the sleepq/turnstile has been
acquired, we should fall back to the aforementioned spinning.
Mateusz Guzik [Sun, 31 Dec 2017 00:33:28 +0000 (00:33 +0000)]
mtx: pre-read the lock value in thread_lock_flags_
Since this function is effectively a slow path, if we get here the lock is
most likely already taken, in which case it is cheaper not to blindly attempt
the atomic op.
While here move hwpmc probe out of the loop to match other primitives.
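Pre-reading the lock value is the classic test-and-test-and-set idea: look at the lock word with a plain load first, and only issue the expensive compare-and-swap when the lock appears to be free. The following sketch uses C11 atomics instead of the kernel's atomic(9) primitives and is not the thread_lock_flags_ code. UNOWNED, cas_attempts, and both try_lock functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UNOWNED ((uintptr_t)0)	/* hypothetical "lock is free" value */

static int cas_attempts;	/* counts the expensive atomic operations */

/* Blind attempt: always issues a compare-and-swap, even if the lock is held. */
static bool
try_lock_blind(_Atomic uintptr_t *lk, uintptr_t tid)
{
	uintptr_t expected = UNOWNED;

	cas_attempts++;
	return (atomic_compare_exchange_strong(lk, &expected, tid));
}

/* Pre-read: only try the compare-and-swap when the lock appears free. */
static bool
try_lock_preread(_Atomic uintptr_t *lk, uintptr_t tid)
{
	uintptr_t v = atomic_load_explicit(lk, memory_order_relaxed);

	if (v != UNOWNED)
		return (false);	/* likely case on a slow path: skip the CAS */
	cas_attempts++;
	return (atomic_compare_exchange_strong(lk, &v, tid));
}
```

When the lock is already owned, the pre-read variant returns without issuing a compare-and-swap. The blind variant always issues one, which typically pulls the cache line into an exclusive state only to fail.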
Nathan Whitehorn [Sat, 30 Dec 2017 20:27:13 +0000 (20:27 +0000)]
Garbage-collect loader.ps3. It is currently disconnected from the build and
is superseded by either direct loading of the kernel by petitboot (soon to
become the installer default) or loader.kboot.
Nathan Whitehorn [Sat, 30 Dec 2017 20:23:14 +0000 (20:23 +0000)]
Check more aggressively for whether the desired properties actually exist.
If they don't, the code would look up some random part of the device tree
and seize the console inappropriately.
The ep(4) driver is the only consumer of the two functions from
elink.c. I removed the standalone module as well, and most likely,
the module metadata is not needed anywhere, but this is for later
cleanup.
Discussed with: imp, jhb
Sponsored by: The FreeBSD Foundation
The i386 FPU (AKA npx) code does not depend on ISA devices at all,
after the support for IRQ13 FPU exceptions was removed. Put the file
into the expected place in the kernel source tree.
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Pedro F. Giffuni [Sat, 30 Dec 2017 02:07:18 +0000 (02:07 +0000)]
geom_ccd.c: Fix the licenses properly
The license merging in r109471 didn't take into account that licensing
could change. Just removing the 3rd clause obviates the copyright
assignment to the NetBSD Foundation.
We do have plenty of files that carry two or more licenses, as in this
case, so fix this properly by splitting the licenses back apart as they are
upstream.
Pedro F. Giffuni [Sat, 30 Dec 2017 01:37:08 +0000 (01:37 +0000)]
geom_ccd.c: Update the license with changes from upstream.
Part of this file originated in NetBSD, with the original file
carrying two versions of 4-clause BSD licenses. r109471 attempted to
simplify the situation by putting both licenses together.
Meanwhile, NetBSD dropped Clauses 3 and 4 from their own license, and
eventually NetBSD got permission from the University of Utah to drop the
3rd clause.
Keep the license "simple" by dropping the third clause, since TNF,
Utah/Berkeley, and phk all agree in principle that it can be dropped.
- Add some basic checks for i_fc* bits (ToDS, FromDS, MoreFrag, Protected);
those are used / checked across various places in Tx path.
- Mark injected 802.11 frame as encapsulated (just as it should be).
- Classify 802.11 frames properly (extract ether_type from the LLC header
for Data frames; use the AC_BE queue for the others: NoData / Management / Control).
- Subtract header length from tx_bytes statistics (so it will correspond
to the comment).
Tested with RTL8188EU (AP) + Intel 6205 (STA).
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D13161
Ian Lepore [Sat, 30 Dec 2017 00:20:49 +0000 (00:20 +0000)]
Make kernel option KERNVIRTADDR optional, remove it from std.<platform>
files that can use the default value.
It used to be required that the low-order bits of KERNVIRTADDR matched
the low-order bits of the physical load address for all arm platforms.
That hasn't been a requirement for armv6 platforms since FreeBSD 10.
There is no longer any relationship between load addr and KERNVIRTADDR
except that both must be aligned to a 2 MiB boundary.
This change makes the default KERNVIRTADDR value 0xc0000000, and removes the
options from all the platforms that can use the default value. The default
is now defined in vmparam.h, and that file is now included in a few new
places that reference KERNVIRTADDR, since it may not come in via the
forced-include of opt_global.h on the compile command line.
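The one remaining constraint, 2 MiB alignment, is easy to check at compile time. In the sketch below, the default is provided through an #ifndef guard, which is an assumption about how an overridable default in vmparam.h might look; the actual header text may differ. ALIGN_2MIB and is_2mib_aligned() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Allow an override; otherwise fall back to the default named above. */
#ifndef KERNVIRTADDR
#define KERNVIRTADDR 0xc0000000u
#endif

#define ALIGN_2MIB (2u * 1024u * 1024u)	/* 0x200000 */

/* Compile-time check of the only remaining constraint on KERNVIRTADDR. */
_Static_assert((KERNVIRTADDR & (ALIGN_2MIB - 1)) == 0,
    "KERNVIRTADDR must be aligned to a 2 MiB boundary");

/* The same check at run time, e.g. for a physical load address. */
static bool
is_2mib_aligned(uint32_t addr)
{
	return ((addr & (ALIGN_2MIB - 1)) == 0);
}
```

For example, a load address of 0x80200000 satisfies the rule, while 0x80100000 (only 1 MiB aligned) does not.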
Nathan Whitehorn [Fri, 29 Dec 2017 21:09:17 +0000 (21:09 +0000)]
Enhance the CHRP/pSeries platform layer:
- Densely number CPUs so that systems whose CPUs have very high ID numbers still get small, contiguous CPU IDs.
- Always have the BSP be CPU 0 to avoid remnant brokenness with non-0 BSPs
in other parts of the kernel.
- Improve parsing of the device tree CPU listings on SMT systems.
- Allow reboot via RTAS as well as OF for pSeries systems booted by FDT
without functioning Open Firmware.
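The first two items amount to a simple renumbering rule. The boot processor always gets logical ID 0, and every other CPU gets the next unused small integer, however large or sparse its hardware ID is. The sketch below applies that rule to a flat array of hardware IDs. It is an illustration under that assumption, not the platform_chrp.c code, and it ignores the SMT-specific device tree parsing mentioned above. dense_cpu_ids() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assign dense logical IDs: the BSP gets 0, all other CPUs get 1, 2, ...
 * in listing order.  Returns false if the BSP was not found.
 */
static bool
dense_cpu_ids(const uint32_t *hwid, size_t n, uint32_t bsp_hwid, int *logical)
{
	int next = 1;
	bool found = false;

	for (size_t i = 0; i < n; i++) {
		if (hwid[i] == bsp_hwid && !found) {
			logical[i] = 0;
			found = true;
		} else
			logical[i] = next++;
	}
	return (found);
}
```

For hardware IDs { 0x40, 0x10, 0x1000, 0x20 } with the BSP at 0x1000, the logical IDs come out as { 1, 2, 0, 3 }.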
Neither swapout_procs() nor swapout() accesses the map. Since the
process' vmspace is referenced only to obtain the pointer to the
vm_map, the reference is not needed either.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13681
Nathan Whitehorn [Fri, 29 Dec 2017 20:30:10 +0000 (20:30 +0000)]
Add support for 64-bit PowerPC kernels to be directly loaded by kexec, which
is used as the bootloader on a number of PPC64 platforms. This involves the
following pieces:
- Making the first instruction a valid kernel entry point, since kexec
ignores the ELF entry value. This requires a separate section and linker
magic to prevent the linker from filling the beginning of the section
with stubs.
- Adding an entry point at 0x60 past the first instruction for systems
lacking firmware CPU shutdown support (notably PS3).
- Linker script changes to support the above.
Nathan Whitehorn [Fri, 29 Dec 2017 20:25:15 +0000 (20:25 +0000)]
Maintain alignment of in-code 64-bit quantities by design rather than luck.
If these are not aligned, the linker has to emit a different type of
relocation that the early boot self-relocation code cannot handle, even
in principle, resulting in them being set to zero and the kernel crashing.
Ian Lepore [Fri, 29 Dec 2017 20:00:19 +0000 (20:00 +0000)]
Correct a mistake and reword a couple sentences to clarify that "the value"
refers to the scale value, not the kmem_arena size that results from scaling.
Marius Strobl [Fri, 29 Dec 2017 19:07:50 +0000 (19:07 +0000)]
- Don't allow userland to switch partitions; it's next to impossible
to recover from that, especially when something goes wrong.
- When userland changes EXT_CSD, update the kernel copy before using
relevant EXT_CSD bits in mmcsd_switch_part().
Alan Somers [Fri, 29 Dec 2017 18:42:55 +0000 (18:42 +0000)]
geli: fix the resize test on arm64
The resize test used bsdlabel(8), which is not available on all
architectures. Change it to use gpart(8) instead, which should be available
everywhere.
Warner Losh [Fri, 29 Dec 2017 18:08:35 +0000 (18:08 +0000)]
Fix ubldr. uboot/lib uses defines for the loader. It's part of the
loader, but it is not compiled as the loader (it builds a library), so we
can't just include loader.mk for the defines. Move LOADER_DISK_SUPPORT
back to defs.mk for the moment.
Alan Somers [Fri, 29 Dec 2017 16:06:10 +0000 (16:06 +0000)]
Fix potential TOCTTOU bug in the geli tests
This change mostly reverts r293436, which introduced the bug due to a belief
that geli(8) would allocate md(4) devices by itself. However, that belief is
incorrect. Instead of using linear probing to find available md(4) numbers,
it's best to use the existing attach_md function.
Marius Strobl [Fri, 29 Dec 2017 12:48:19 +0000 (12:48 +0000)]
- There is no need to keep the tuning error and re-tuning interrupts
enabled (albeit with interrupt generation disabled for them) all the
time once (re-)tuning is supported; only enable them, and let them
generate interrupts, when actually using (re-)tuning.
- Also disable all interrupts except SDHCI_INT_DATA_AVAIL ones while
executing tuning and not just their signaling.
Eitan Adler [Fri, 29 Dec 2017 04:49:59 +0000 (04:49 +0000)]
bsd-family-tree: add HardenedBSD
This adds HardenedBSD, which is a pseudo-fork of FreeBSD. It hasn't had a
release yet, but it does have active users and a community. As such,
document it as a branch off of FreeBSD-stable. Ideally this adds enough
space so that future releases are easy enough to add.
Nathan Whitehorn [Thu, 28 Dec 2017 23:49:53 +0000 (23:49 +0000)]
Remove ELF note for Open Firmware. It is marked optional in a single 1996
draft of a never-finalized standard (CHRP) and is irrelevant in practice
on FreeBSD since we load the kernel with loader(8) on Open Firmware
platforms anyway. Moreover, loader(8), which is directly loaded by Open
Firmware, has never had an equivalent note.
Bartek Rutkowski [Thu, 28 Dec 2017 22:57:34 +0000 (22:57 +0000)]
humanize_number(3): fix math edge case in rounding large numbers
Fix for a remainder overflow: in rare cases, adding the remainder to the divider
exceeded 1 and turned the total into 1000 in the final formatting, taking up
the space for the unit character.
The fix continues the division of the original number if the above case
happens -- added the appropriate check to the for loop performing
the division. This lowers the value shown, to make it fit into the buffer
space provided (1.0M for 4+1 character buffer, as used by ls).
Add test case for the reported bug and extend test program to support
providing buffer length (ls -lh uses 5, tests hard-coded 4).
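The essence of the fix is to decide whether a value fits only after rounding, and to move to the next unit when it does not. The following is a simplified, 1024-based re-implementation written to show that idea. It is not libutil's humanize_number(3) code, which supports more flags and uses its own rounding rules. fmt_size() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format bytes into buf (len includes the NUL) with 1024-based units.
 * The fit test is made AFTER rounding, so a value that rounds up to
 * "1024K" moves on to the next unit instead of overflowing the buffer.
 * Returns the string length, or -1 if no unit fits.
 */
static int
fmt_size(char *buf, size_t len, uint64_t bytes)
{
	static const char units[] = "BKMGTPE";
	uint64_t d = 1;

	assert(buf != NULL && len > 0);
	for (int i = 0; units[i] != '\0'; i++, d *= 1024) {
		uint64_t q = bytes / d, r = bytes % d;
		uint64_t tenths = q * 10 + (r * 10 + d / 2) / d;	/* rounded */
		int n;

		if (i > 0 && tenths < 100)	/* one decimal, e.g. "1.5K" */
			n = snprintf(buf, len, "%u.%u%c", (unsigned)(tenths / 10),
			    (unsigned)(tenths % 10), units[i]);
		else				/* rounded integer, e.g. "512B" */
			n = snprintf(buf, len, "%llu%c",
			    (unsigned long long)(q + (2 * r >= d)), units[i]);
		if (n >= 0 && (size_t)n < len)
			return (n);		/* the rounded result fits */
	}
	return (-1);
}
```

With a 5-byte buffer (as used by ls -lh), 1048575 bytes would round to "1024K", which is too long, so the loop continues and produces "1.0M".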
In vm_swapout_map_deactivate_pages(), it is enough to lock the map for read.
Reviewed by: alc, markj (as part of the larger patch)
Tested by: pho (again, as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13671
Marius Strobl [Thu, 28 Dec 2017 21:46:09 +0000 (21:46 +0000)]
With the advent of interrupt remapping, Intel has repurposed bit 11
(now: Interrupt_Index[15]) and assigned the previously reserved bits
55:48 (Interrupt_Index[14:0] goes into 63:49 while Destination Field
used 63:56 and bit 48 now is Interrupt_Format) in the IO redirection
tables (see the VT-d specification, "5.1.5.1 I/OxAPIC Programming").
Thus, when not using interrupt remapping, ensure that all previously
reserved bits in the high part of the RTEs are zero instead of doing
a read-modify-write for their Destination Field bits only.
Otherwise, on machines based on Apollo Lake and its derivatives such
as Denverton, typically some of the previously reserved bits remain
set after boot when not employing interrupt remapping. The result is
that INTx interrupts are not getting delivered.
Note: With an AMD IOMMU, interrupt remapping apparently bypasses the
IO APIC altogether.
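The difference between the old and new handling of the RTE's high 32-bit word can be shown in a few lines. The Destination Field occupies RTE bits 63:56, which are bits 31:24 of the high word. A read-modify-write that replaces only those bits keeps whatever firmware left in bits 23:0 (RTE bits 55:32), whereas the new approach writes zeros everywhere else. The macro and function names below are hypothetical, not the FreeBSD io_apic.c identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_DEST_SHIFT	24		/* RTE bits 63:56 = high-word bits 31:24 */
#define RTE_DEST_MASK	0xff000000u

/* Old approach: only the Destination Field is replaced; stale bits survive. */
static uint32_t
rte_hi_rmw(uint32_t cur_hi, uint8_t apic_id)
{
	return ((cur_hi & ~RTE_DEST_MASK) | ((uint32_t)apic_id << RTE_DEST_SHIFT));
}

/* New approach (no interrupt remapping): all other high-word bits are zero. */
static uint32_t
rte_hi_clean(uint8_t apic_id)
{
	return ((uint32_t)apic_id << RTE_DEST_SHIFT);
}
```

For example, if firmware left RTE bit 48 (now Interrupt_Format, high-word bit 16) set, the read-modify-write keeps it (0x03010000 for APIC ID 3), while the clean write yields 0x03000000.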
Sean Bruno [Thu, 28 Dec 2017 21:26:40 +0000 (21:26 +0000)]
e1000: Add support for Ice Lake and Cannon Lake
This adds initial support for Ice Lake and Cannon Lake ethernet devices.
This also addresses errata 1.5.4.4 for Sky Lake and Kaby Lake devices:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561