Mark Johnston [Tue, 2 Jan 2018 18:11:54 +0000 (18:11 +0000)]
Fix some I/O ordering issues in gmirror.
- BIO_FLUSH requests were dispatched to the disks directly from
g_mirror_start() rather than going through the mirror's I/O request
queue, so they could have been reordered with preceding writes.
Address this by processing such requests from the queue, avoiding
direct dispatch.
- Handling for collisions with synchronization requests was too
fine-grained and could cause reordering of writes. In particular,
BIO_ORDERED was not being honoured. Address this by effectively
freezing the request queue any time a collision with a synchronization
request occurs. The queue is unfrozen once the collision with the
first frozen request is over.
- The above-mentioned collision handling allowed reads to jump ahead
of writes to the same offset. Address this by freezing all request
types when a collision occurs, not just BIO_WRITEs and BIO_DELETEs.
Also add some more fail points for use in testing error handling.
Conrad Meyer [Tue, 2 Jan 2018 17:25:13 +0000 (17:25 +0000)]
rpcbind: Fix race in signal termination
If a signal was delivered while the main thread was not in poll(2) and after
the check was performed, we could reenter poll and never detect termination. Fix
this with the pipefd trick. (This race was introduced very recently, in
r327482.)
There has been some fallout from the change. The change itself was not valuable
enough to spend time investigating the corner cases, so let's just back it out.
Ed Maste [Tue, 2 Jan 2018 14:07:55 +0000 (14:07 +0000)]
elfcopy: copy raw (untranslated) contents to binary output
Previously elfcopy used elf_getdata to obtain data from ELF sections
being copied to binary output, but elf_getdata returns data that has
been translated - that is, data is in host byte order. When the host and
target differ in endianness (e.g., converting a big-endian MIPS ELF
object to binary on an x86 host) this resulted in byte-swapped data in
certain sections such as .dynamic.
Instead use elf_rawdata to keep data in the original, target endianness.
Reported by: Hiroki Mori <yamori83@yahoo.co.jp>, Bill Yuan
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Add installer support for PS3 and PowerNV systems, also laying the
foundation for invoking efibootmgr as part of new-style EFI booting on
x86. On PS3 and PowerNV, which are booted using Linux kexec from petitboot
rather than by loader(8), install the kernel and the rest of /boot to a
FAT partition and set up the appropriate petitboot configuration file
there.
The new bootconfig installer stage can do platform-dependent modifications
more complex than partition layout and installation of boot blocks and can
be used to (as here) set up some special configuration files, run efibootmgr,
or boot0cfg.
Skip errors from being unable to set modification and creation times. If
one of the directories in the filesystem hierarchy is a FAT mountpoint,
setting its times will fail, which would cause installation to abort.
Instead, make this a best-effort thing.
Handling this error is a hack and a better internal scheme for handling
this should be added to libarchive.
Conrad Meyer [Tue, 2 Jan 2018 00:48:19 +0000 (00:48 +0000)]
rpcbind: Do not use signal-unsafe functions in SIGTERM handler
syslog(3), routines used in write_warmstart(), and exit(3) are all
signal-unsafe. Instead, set a signal-safe flag and check the flag in the
rpcbind main loop to shut down safely.
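A minimal sketch of the flag-based pattern, assuming a volatile sig_atomic_t flag that the handler sets and the main loop tests. All names here (terminate_requested, handle_term, main_loop, demo_work) are hypothetical and are not the rpcbind code.

```c
#include <signal.h>

/* Hypothetical names; this is not the rpcbind source. */
static volatile sig_atomic_t terminate_requested = 0;

/* The handler only stores to a sig_atomic_t, which is signal-safe. */
static void
handle_term(int sig)
{
	(void)sig;
	terminate_requested = 1;
}

/*
 * Run passes of the service loop until the flag is seen. Returns the
 * number of passes completed. `work` stands in for one pass of the
 * real loop.
 */
static int
main_loop(void (*work)(int))
{
	int n = 0;

	while (!terminate_requested) {
		work(n);
		n++;
	}
	/*
	 * Normal (non-signal) context: this is where logging, saving
	 * state and exiting can be done safely.
	 */
	return (n);
}

/* Demonstration pass: requests termination during the third pass. */
static void
demo_work(int iter)
{
	if (iter == 2)
		raise(SIGTERM);
}
```

With handle_term installed for SIGTERM, main_loop(demo_work) finishes the pass in which the signal arrived and then leaves the loop.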
Adrian Chadd [Tue, 2 Jan 2018 00:07:28 +0000 (00:07 +0000)]
[net80211] convert all of the WME use over to a temporary copy of WME info.
This removes the direct WME info access in the ieee80211com struct and instead
provides a method of fetching the data. Right now it's a no-op but eventually
it'll turn into a per-VAP method for drivers that support it (eg iwn, iwm,
upcoming ath10k work) as things like p2p support require this kind of behaviour.
Tested:
* ath(4), STA and AP mode
TODO:
* yes, this is slightly stack size-y, but it is an important first step
to get drivers migrated over to a sensible WME API. A lot of per-phy things
need to be converted to per-VAP before P2P, 11ac firmware, etc stuff shows up.
Eitan Adler [Mon, 1 Jan 2018 22:33:57 +0000 (22:33 +0000)]
shutdown: Assume absolute time is in the future
The original bug describes it best:
When an absolute time is specified to shutdown, the program's
behavior depends on whether that time has passed during the
current calendar day. POLA would suggest that for shutdown,
whose time argument is always supposed to be in the future,
absolute times specified without a specific date should refer
to the next occurrence of that time, rather than erroring out
if that time has already passed during the current day.
PR: 32411
Submitted by: wollman@khavrinen.lcs.mit.edu
Submitted on: 2001-11-30 20:30:01 UTC
Reviewed by: asmodai (at time of bug submission)
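The behaviour requested in the bug report can be sketched roughly as follows, assuming an hh:mm argument with no date. The helper next_occurrence is hypothetical and is not the code in shutdown(8).

```c
#include <time.h>

/*
 * Given the current time and an hh:mm wall-clock target, return the
 * next occurrence of that time. Hypothetical helper, not the
 * shutdown(8) implementation.
 */
static time_t
next_occurrence(time_t now, int hour, int min)
{
	struct tm tm;
	time_t when;

	localtime_r(&now, &tm);
	tm.tm_hour = hour;
	tm.tm_min = min;
	tm.tm_sec = 0;
	tm.tm_isdst = -1;	/* let mktime(3) determine DST */
	when = mktime(&tm);
	if (when <= now) {
		/* Already passed today: move to tomorrow and renormalize. */
		tm.tm_mday++;
		tm.tm_isdst = -1;
		when = mktime(&tm);
	}
	return (when);
}
```

Incrementing tm_mday and calling mktime(3) again lets the C library handle month and year rollover.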
It does not change anything in the behavior of trap_pfault(), while
eliminating the obfuscation of jumping to code which checks for the
reverse of the condition that caused the goto. Also avoid forcibly
initializing the rv variable, since it is now only accessed after storing
the vm_fault() return value.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13725
On a load where a single anonymous object consumes almost all memory on
a large system, the swapout code iterates over the corresponding object
page queue for a long time while owning the map and object locks. This
blocks the pagedaemon, which tries to lock the object, and blocks other
threads in the process in vm_fault() waiting for the map lock.
Handle the issue by terminating the deactivation loop if it has executed
for too long and by yielding at the top level in vm_daemon.
Reported by: peterj, pho
Reviewed by: alc
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13671
Warner Losh [Mon, 1 Jan 2018 05:13:03 +0000 (05:13 +0000)]
Remove sys/mips/rmi. It's been unmaintained since 2011. This hardware
is now unobtanium. It's only had API changes in the last 7 years, and
is responsible for a very large number of them. In addition, there's a
lot of code that reimplements base FreeBSD functionality, diminishing
the chances it still works. Without hardware to test it on, or
prospects of obtaining such hardware and without vendor support, it's
time to move on.
Warner Losh [Mon, 1 Jan 2018 04:10:36 +0000 (04:10 +0000)]
Remove support for IDT. Only the RouterBoard RB533 used this chip, and
it's at least 5 years out of production. I couldn't find a used one on
ebay and other secondary markets just now, nor when I tried 4 years
ago. It dates from the initial project/mips2 merge 8 years ago, and
hasn't been updated since.
Warner Losh [Mon, 1 Jan 2018 04:10:31 +0000 (04:10 +0000)]
Retire old ADM 5120 port. It never grew much beyond the original port.
It came into the tree with the project/mips merge 8 years ago. At the
time, it was hard to find a board with enough RAM to run. Now FreeBSD
requires at least 2x the RAM it did then. No changes have happened to
this port apart from API churn and license tagging since then. It ran
OK at the time it was committed, but no sightings in the wild have
happened since shortly after it was committed.
https://www.linux-mips.org/wiki/Adm5120_devices lists a bunch of
boards that were available 5 years ago (but are no longer
available). The beefiest one had only 64MB of RAM which is too
small. The Mikrotik RB1xx never had more than 32MB.
Also remove confusing QEMU config file that never ever worked in QEMU
for mips. MALTA is used for that. Another of my past mistakes, false
starts that never amounted to anything.
Warner Losh [Mon, 1 Jan 2018 04:10:25 +0000 (04:10 +0000)]
Remove sys/mips/alchemy. It was still-born when I committed it and it
never got better. It never worked on real hardware and is still mostly
stubs 8 years after I added it. It has had no real update in that
time apart from API churn. It was added just so it didn't get lost in
the project/mips merge, but maybe it should have been lost as nothing
has come of it. It is time to give up the ghost on this one.
Approved by: me, shooting my own dog
Discussed on: mips@
After removal of loader.ps3, change petitboot configuration in release media
to directly kexec the kernel. Unlike the old loader.ps3 code, this also works
on PowerNV systems, which also use petitboot.
Ian Lepore [Sun, 31 Dec 2017 22:43:24 +0000 (22:43 +0000)]
Add a validbcd() routine that uses the bcd2bin_data[] array and returns a
bool indicating whether the input value represents a valid BCD byte.
The existing bcd2bin() routine will KASSERT if asked to convert a bad value,
but sometimes the kernel has to handle BCD data from untrusted sources, so
this will provide a mechanism to validate data before attempting conversion.
This would have been easier/cleaner if the bcd2bin_data[] array contained an
out-of-range value (such as 0xff) in the infill locations that aren't valid,
but it's a global symbol that might be referenced by out-of-tree code
relying on the current scheme, so I'm leaving that alone.
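A standalone sketch of the validate-then-convert idea. It checks the two nibbles directly rather than consulting bcd2bin_data[] as the commit does, and the names bcd_byte_is_valid and bcd_to_bin_checked are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A byte is valid BCD when both nibbles are in the range 0..9.
 * Standalone check; the commit itself uses the bcd2bin_data[] array.
 */
static bool
bcd_byte_is_valid(uint8_t v)
{
	return ((v >> 4) <= 9 && (v & 0x0f) <= 9);
}

/*
 * Convert untrusted BCD input. Returns 0 and stores the binary value
 * in *out on success, or -1 if the input is not valid BCD.
 */
static int
bcd_to_bin_checked(uint8_t v, uint8_t *out)
{
	if (!bcd_byte_is_valid(v))
		return (-1);
	*out = (uint8_t)((v >> 4) * 10 + (v & 0x0f));
	return (0);
}
```

A caller handling data from an untrusted source would check the return value instead of relying on an assertion firing.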
Kyle Evans [Sun, 31 Dec 2017 22:35:32 +0000 (22:35 +0000)]
aw_sid: Add support for a64
Newer Allwinner SoCs have nearly identical SID controllers with efuse space
starting at 0x200 into their register space and thermal data available at
0x234, making all of these fairly trivial additions.
The h3 will be added at a later time after some testing, due to a silicon
bug that causes the rootkey (at least) to be read incorrectly unless first
read via the control register.
Alan Cox [Sun, 31 Dec 2017 21:36:42 +0000 (21:36 +0000)]
The variable "minslptime" is pointless and always has been, ever since its
introduction in r83366. (At that time, this code appeared in vm/vm_glue.c,
because vm/vm_swapout.c did not exist.) When the FOREACH_THREAD loop
completes, we know that the sleep time for every thread is above whichever
threshold is being applied.
Colin Percival [Sun, 31 Dec 2017 21:00:21 +0000 (21:00 +0000)]
Wrap includes in sys/tslog.h with #ifdef TSLOG.
This is necessary because some non-kernel code #defines _KERNEL and then
includes kernel headers; as a result, it was getting conflicting versions
of curthread and curproc. Non-kernel code should probably refrain from
defining _KERNEL, but for now hiding these indirect inclusions fixes the
build.
Nathan Whitehorn [Sun, 31 Dec 2017 20:23:39 +0000 (20:23 +0000)]
Remove PIR from PCPU data. It has an implementation-defined meaning that
is of limited utility outside of platform-specific code and can vary
at runtime when running as a hypervisor guest, so does not even have the
virtue of being a static identifier.
vt(4): add support for configurable console palette
Introduce a new set of loader tunables kern.vt.color.N.rgb, where N is a
number from 0 to 15. The value is either a comma-separated list of decimal
numbers ranging from 0 to 255 that represent the values of the red, green, and
blue components respectively (e.g. "128,128,128") or a 6-digit hex triplet
commonly used to represent colors in HTML or xterm settings (e.g. #808080).
Each tunable overrides one of the 16 hardcoded palette codes and can be set
in loader.conf(5).
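To illustrate the two accepted value formats, here is a hedged userland sketch of a parser. It is not the vt(4) code; the names hexval, parse_component and parse_rgb are hypothetical, and the strictness (no whitespace, exactly three components) is an assumption of this sketch.

```c
#include <string.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	return (-1);
}

/* Parse 1 to 3 decimal digits with a value of at most 255; advance *sp. */
static int
parse_component(const char **sp, unsigned char *out)
{
	const char *s = *sp;
	unsigned int v = 0;
	int digits = 0;

	while (*s >= '0' && *s <= '9' && digits < 3) {
		v = v * 10 + (unsigned int)(*s - '0');
		s++;
		digits++;
	}
	if (digits == 0 || v > 255)
		return (-1);
	*out = (unsigned char)v;
	*sp = s;
	return (0);
}

/* Parse "R,G,B" (decimal) or "#RRGGBB" (hex). Returns 0 on success. */
static int
parse_rgb(const char *s, unsigned char rgb[3])
{
	unsigned char tmp[3];
	int i, hi, lo;

	if (s[0] == '#') {
		if (strlen(s) != 7)
			return (-1);
		for (i = 0; i < 3; i++) {
			hi = hexval(s[1 + 2 * i]);
			lo = hexval(s[2 + 2 * i]);
			if (hi < 0 || lo < 0)
				return (-1);
			tmp[i] = (unsigned char)(hi * 16 + lo);
		}
	} else {
		for (i = 0; i < 3; i++) {
			if (parse_component(&s, &tmp[i]) != 0)
				return (-1);
			/* Expect ',' after R and G, end of string after B. */
			if (*s != (i < 2 ? ',' : '\0'))
				return (-1);
			if (i < 2)
				s++;
		}
	}
	memcpy(rgb, tmp, sizeof(tmp));
	return (0);
}
```

Both example values from the commit message, "128,128,128" and "#808080", parse to the same mid-grey triplet with this sketch.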
Nathan Whitehorn [Sun, 31 Dec 2017 20:10:08 +0000 (20:10 +0000)]
Make newer binutils happy by using a bl-type branch instead of b, which
displeases it for some reason. LR is not relevant in this code, so just
do what it wants.
Nathan Whitehorn [Sun, 31 Dec 2017 20:08:16 +0000 (20:08 +0000)]
Provide relative, as well as absolute, addresses in trap panics. This
makes it easier to cross-correlate them with instruction listings without
worrying about where the kernel was relocated to.
Ian Lepore [Sun, 31 Dec 2017 18:53:13 +0000 (18:53 +0000)]
Allow use of .WAIT in the LOCAL_DIRS and LOCAL_LIB_DIRS lists.
A comment in Makefile.inc1 has long stated that LOCAL_DIRS are built last,
after the base system. Incremental improvements in parallel building over
the years have led to LOCAL_DIRS being built in parallel with base system
directories. This change allows the .WAIT directive to appear in LOCAL_DIRS
and LOCAL_LIB_DIRS lists to give the user some control over parallel
building of local additions.
Colin Percival [Sun, 31 Dec 2017 09:24:41 +0000 (09:24 +0000)]
Use the TSLOG framework to record entry/exit timestamps for DELAY and
_vprintf; these functions are called in many places and can contribute
meaningfully to the total time spent booting.
Colin Percival [Sun, 31 Dec 2017 09:24:11 +0000 (09:24 +0000)]
Instrument thread creations for the benefit of the TSLOG framework.
This assists in tracking time spent while the boot is being "held" waiting
for something to happen.
Colin Percival [Sun, 31 Dec 2017 09:23:52 +0000 (09:23 +0000)]
Instrument "boot holds" for the benefit of the TSLOG framework. These
are places where the "main thread" of the booting kernel (either the
thread which later becomes swapper or the thread which later becomes
init) has to stop and wait for action to take place in another thread
before continuing.
There are currently three such holds:
1. The intr_config_hooks SYSINIT waits for hooks registered via the
config_intrhook_establish function; this allows (typically) devices
which need interrupts enabled to complete their initialization to do
so before root is mounted.
2. The g_waitidle function waits for the GEOM event queue to be empty;
this ensures that all of the disks which have been attached have been
tasted before we attempt to mount root.
3. The vfs_mountroot_wait function (in addition to calling g_waitidle)
waits for holds registered via root_mount_hold; among other things, this
is used by the USB subsystem to ensure that we don't fail to mount root
if it's located on a USB disk which takes a while to probe.
Colin Percival [Sun, 31 Dec 2017 09:23:19 +0000 (09:23 +0000)]
Teach makeobjops.awk to accept PROLOG and EPILOG blocks before
METHOD and STATICMETHOD declarations; that code will be inserted
into the dispatch function before and after the method call.
Use this functionality and the TSLOG framework to record DEVICE_ATTACH
and DEVICE_PROBE entry/exit timestamps.
Colin Percival [Sun, 31 Dec 2017 09:22:31 +0000 (09:22 +0000)]
Use the TSLOG framework to record entry/exit timestamps for machine
independent functions with important roles in the early boot process:
mi_startup (with the "exit" recorded when it becomes swapper),
start_init (with the "exit" recorded when the thread is about to
"return" into the newly created init process), vfs_mountroot, and
vfs_mountroot_wait.
Colin Percival [Sun, 31 Dec 2017 09:22:07 +0000 (09:22 +0000)]
Use the TSLOG framework to record entry/exit timestamps for hammer_time.
The entry must be logged "manually" using TSRAW rather than TSENTER
since PCPU data structures have not yet been initialized and thus
curthread cannot be accessed; &thread0 is what will become curthread
later in hammer_time.
Other MD initialization code should be similarly instrumented in order
to gain visibility into the time spent before entering mi_startup; this
will require some care and testing from people with access to such
hardware.
Colin Percival [Sun, 31 Dec 2017 09:21:34 +0000 (09:21 +0000)]
Connect kern_tslog.c to the build and add TSLOG / TSLOGSIZE kernel options.
These are intended for debugging purposes and should not be added to
"generic" kernel configurations since they result in a nontrivial amount
of memory being set aside for this purpose, can break if kernel modules are
unloaded, and can potentially leak a dangerous amount of information about
timestamps used as a source of kernel entropy.
Colin Percival [Sun, 31 Dec 2017 09:21:01 +0000 (09:21 +0000)]
Code for recording timestamps of events, especially function entries/exits.
This is a very primitive system, intended for use in measuring performance
during the early system boot, before more sophisticated tools like DTrace
or infrastructure like kernel memory allocation and mutexes are available.
Because this code records pointers to strings rather than copying strings
(in order to keep the memory usage more manageable), if a kernel module is
unloaded after logging an event, Bad Things can happen. Users are advised
to not do that.
Since cycle counts from the early kernel boot are used as an initial entropy
source, publishing this information to userland could result in inadequate
entropy being kept private to the kernel RNG. Users are advised to not
enable this on systems with untrusted users.
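The design constraints described above (no allocation, no mutexes, string pointers stored rather than copied) can be sketched in userland C roughly as follows. This is not kern_tslog.c: the names evlog, evlog_rec and evlog_record are hypothetical, the caller supplies the cycle count, and dropping events once the buffer is full is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define EVLOG_SIZE 8	/* hypothetical; the real size comes from TSLOGSIZE */

struct evlog_rec {
	uint64_t	cycles;	/* timestamp supplied by the caller */
	const char	*func;	/* pointer only: the string is not copied */
	const char	*what;	/* e.g. "ENTER" or "EXIT" */
};

/* Statically allocated, so no memory allocator is needed. */
static struct evlog_rec evlog[EVLOG_SIZE];
static atomic_uint evlog_next;

/*
 * Record one event without taking a lock: an atomic fetch-and-add
 * reserves a slot. Returns the slot index, or -1 if the buffer is
 * full and the event was dropped.
 */
static int
evlog_record(uint64_t cycles, const char *func, const char *what)
{
	unsigned int slot = atomic_fetch_add(&evlog_next, 1);

	if (slot >= EVLOG_SIZE)
		return (-1);
	evlog[slot].cycles = cycles;
	evlog[slot].func = func;
	evlog[slot].what = what;
	return ((int)slot);
}
```

Since only the pointers are stored, a record becomes dangling if the memory holding the string goes away, which is the module-unload hazard the commit message warns about.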
Nathan Whitehorn [Sun, 31 Dec 2017 06:10:07 +0000 (06:10 +0000)]
Use data from the boot loader to pick the appropriate output graphics mode
instead of hard-coding a default. This information is passed implicitly by
the PS3 firmware and can be relied upon. Also adjust the default mode, if
somehow firmware doesn't pass one, to 1920x1080 from 720x480 since it is
2017.
Kyle Evans [Sun, 31 Dec 2017 05:22:26 +0000 (05:22 +0000)]
stand/fdt: Make fdt_overlay_apply signature-compatible with libfdt
libfdt will assume a writable fdt overlay blob has been passed in, so make
ours compatible to allow easier review when we try to drop libfdt into
place. overlay from the calling context is writable, making it safe to
simply rip out everything related to copying the overlay blob in
fdt_overlay_apply.
I note here that we still have problems: fdt_overlay_apply, both our version
and libfdt's, may fail and have already clobbered the base fdt to some
extent. Future work will make sure we don't apply a potentially bogus fdt,
instead discarding the base fdt if we had an error.
Alan Cox [Sun, 31 Dec 2017 04:01:47 +0000 (04:01 +0000)]
Previously, swap_pager_copy() freed swap blocks one at a time, via
swp_pager_meta_ctl(), with no opportunity to recognize freeing of
consecutive blocks and free fewer block ranges. To open that opportunity,
this change removes the SWM_FREE option from swp_pager_meta_ctl(), and
compels the caller to do the freeing when a valid block address is returned.
In swap_pager_copy(), these frees are aggregated, so that a sequence of them
can be done at one time.
The only other caller to swp_pager_meta_ctl() that passed SWM_FREE,
swp_pager_unswapped(), is also modified to handle its single free
explicitly.
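The aggregation idea can be illustrated with a small standalone sketch that merges consecutive block addresses into ranges, so that each range could be released with one call. It is not the swap pager code; struct blk_range and coalesce_frees are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* A run of consecutive blocks: [start, start + count). */
struct blk_range {
	uint64_t start;
	uint64_t count;
};

/*
 * Merge block addresses, in the order they are produced, into runs of
 * consecutive blocks. `out` must have room for n entries. Returns the
 * number of ranges written, which is the number of free operations
 * actually needed.
 */
static size_t
coalesce_frees(const uint64_t *blks, size_t n, struct blk_range *out)
{
	size_t i, nranges = 0;

	for (i = 0; i < n; i++) {
		if (nranges > 0 &&
		    blks[i] == out[nranges - 1].start + out[nranges - 1].count) {
			/* Extends the current run: no new free needed. */
			out[nranges - 1].count++;
		} else {
			/* Not adjacent: start a new run. */
			out[nranges].start = blks[i];
			out[nranges].count = 1;
			nranges++;
		}
	}
	return (nranges);
}
```

For the block sequence 10, 11, 12, 20, 21, 30 this yields three ranges instead of six individual frees.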
Pedro F. Giffuni [Sun, 31 Dec 2017 03:34:00 +0000 (03:34 +0000)]
sysv_{ipc|shm}: update the NetBSD VCS tags to match nearer our files.
Both files originated in NetBSD:
sysv_ipc.c CVS 1.9:
Most of their changes don't apply to us as we already have similar
changes. This is a better reference for future merges.
sysv_shm.c CVS 1.39:
Most of their changes don't apply to our code but interestingly this
revision merged our changes and is a better point for reference.
Move the VCS tags to the position recommended in our committers guide
(section 8).
Mateusz Guzik [Sun, 31 Dec 2017 00:47:04 +0000 (00:47 +0000)]
locks: re-check the reason to go to sleep after locking sleepq/turnstile
In both rw and sx locks we always go to sleep if the lock owner is not
running.
We do spin for some time if the lock is read-locked.
However, if we decide to go to sleep due to the lock owner being off cpu
and after sleepq/turnstile gets acquired the lock is read-locked, we should
fall back to the aforementioned wait.
Mateusz Guzik [Sun, 31 Dec 2017 00:33:28 +0000 (00:33 +0000)]
mtx: pre-read the lock value in thread_lock_flags_
Since this function is effectively a slow path, if we get here the lock is most
likely already taken in which case it is cheaper to not blindly attempt the
atomic op.
While here move hwpmc probe out of the loop to match other primitives.
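The pre-read idea can be sketched with C11 atomics: load the lock word first and attempt the atomic compare-and-swap only if the lock looks free. This is a generic illustration and not the kernel's thread_lock_flags_ code; LOCK_UNOWNED, lock_try_preread and lock_release are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_UNOWNED ((uintptr_t)0)	/* hypothetical "free" value */

/*
 * Try to take the lock for owner `tid`. A plain load comes first; the
 * more expensive atomic read-modify-write is attempted only when the
 * lock appears to be free.
 */
static bool
lock_try_preread(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = atomic_load_explicit(lock, memory_order_relaxed);

	if (v != LOCK_UNOWNED)
		return (false);	/* already taken: skip the atomic op */
	return (atomic_compare_exchange_strong_explicit(lock, &v, tid,
	    memory_order_acquire, memory_order_relaxed));
}

/* Release the lock by storing the "free" value. */
static void
lock_release(_Atomic uintptr_t *lock)
{
	atomic_store_explicit(lock, LOCK_UNOWNED, memory_order_release);
}
```

When the lock is usually held on entry, as the commit message expects for a slow path, the load avoids a failed compare-and-swap that would otherwise pull the cache line in exclusive mode.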
Nathan Whitehorn [Sat, 30 Dec 2017 20:27:13 +0000 (20:27 +0000)]
Garbage-collect loader.ps3. It is currently disconnected from the build and
is superseded by either direct loading of the kernel by petitboot (soon to
become the installer default) or loader.kboot.
Nathan Whitehorn [Sat, 30 Dec 2017 20:23:14 +0000 (20:23 +0000)]
Check more aggressively for whether the desired properties actually exist.
If they don't, the code would look up some random part of the device tree
and seize the console inappropriately.
The ep(4) driver is the only consumer of the two functions from
elink.c. I removed the standalone module as well, and most likely,
the module metadata is not needed anywhere, but this is for later
cleanup.
Discussed with: imp, jhb
Sponsored by: The FreeBSD Foundation
The i386 FPU (AKA npx) code does not depend on ISA devices at all,
after the support for IRQ13 FPU exceptions was removed. Put the file
into the expected place in the kernel source tree.
Discussed with: jhb
Sponsored by: The FreeBSD Foundation