When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire). Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.
Do not leak rv->psind in some specific situations.
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver). Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting. Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld. In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.
Clean psind on vm_reserv_break() to avoid the situation.
Do not use C constant suffixes. Bit values are small enough to not
require typing, despite they are used for 64bit MSR writes. The added
cast in hw_ibrs_recalculate() is redundand but I prefer to add it for
clarity.
Reported by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Justin Hibbits [Tue, 13 Feb 2018 03:44:50 +0000 (03:44 +0000)]
Unify metadata load files for arm, mips, powerpc, sparc64
Summary:
All metadata.c files are very similar, with only trivial changes. Unify them
into a single common file, with minor special-casing where needed.
Jeff Roberson [Mon, 12 Feb 2018 22:53:00 +0000 (22:53 +0000)]
Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
Eric van Gyzen [Mon, 12 Feb 2018 19:49:20 +0000 (19:49 +0000)]
Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU. Update the routes so they do not provide
stale MTUs.
This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.
Landon J. Fuller [Mon, 12 Feb 2018 19:36:26 +0000 (19:36 +0000)]
siba(4): Ignore disabled per-core address match entries.
Previously, the address regions described by disabled admatch entries would
be treated as being mapped to the given core; while incorrect, this was
essentially harmless given that the entries describe unused address space
on the few affected devices.
We now perform parsing of per-core admatch registers and interrupt flags in
siba_erom, correctly skip any disabled admatch entries, and use the
siba_erom API in siba_add_children() to perform enumeration of attached
cores.
Ian Lepore [Mon, 12 Feb 2018 17:41:11 +0000 (17:41 +0000)]
Add a new sysctl, debug.clock_do_io, to allow manully triggering a one-shot
read or write of all registered realtime clocks. In the read case, the
values read are simply discarded. For writes, there's no alternative but
to actually write the current system time to the device.
Mark the pages used for the initial page-table entries as wired. This
makes them consistent with the way other page-table pages are allocated.
It also provides the rest of the VM system a good clue that these pages
are used.
Ian Lepore [Mon, 12 Feb 2018 16:25:56 +0000 (16:25 +0000)]
Replace the existing print_ct() private debugging function with a set of
three public functions to format and print the three major data structures
used by realtime clock drivers (clocktime, bcd_clocktime, and timespec).
Warner Losh [Mon, 12 Feb 2018 15:31:53 +0000 (15:31 +0000)]
Add Lua as a scripting langauge to /boot/loader
liblua glues the lua run time into the boot loader. It implements all
the runtime routines that lua expects. In addition, it has a few
standard 'C' headers that nueter various aspects of the LUA build that
are too specific to lua to be in libsa. Many refinements from the
original code to improve implementation and the number of included lua
libraries. Use int64_t for lua_Number. Have "/boot/lua" be the default
module path. Numerous cleanups from the original GSoC project,
including hacking libsa to allow lua to be built with only one change
outside luaconf.h.
Add the final bit of lua glue to bring in liblua and plug into the
multiple interpreter framework, previously committed.
Add LOADER_LUA option, currently off by default.
Presently, this is an experimental option. One must opt-in to using
this by defining WITH_LOADER_LUA and WITHOUT_FORTH. It's been
lightly tested, so keep a backup copy of your old loader handy.
The menu code, coming in the next commit, hasn't been exhaustively
tested. A LUA boot loader is 60k larger than a FORTH one, which is
80k larger than a no-interpreter one. Subtle changes in size
may tip things past some subtle limit (the binary is ~430k now
when built with LUA). A future version may offer coexistance.
Bump FreeBSD version to 1200058 to mark the milestone.
Pedro Souza's 2014 Summer of Code project. Rui Paulo, Pedro Arthur,
Zakary Nafziger and Wojciech A. Koszek also contributed. Warner Losh
reworked it extensively into its current form.
Obtained from: https://wiki.freebsd.org/SummerOfCode2014/LuaLoader
Sponsored by: Google Summer of Code
Relnotes: Yes
MFC After: 1 month
Differential Review: https://reviews.freebsd.org/D14295
Warner Losh [Mon, 12 Feb 2018 14:48:05 +0000 (14:48 +0000)]
Use standard pattern for stdargs.h
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.
Provide further mitigation against CVE-2017-5715 by flushing the
return stack buffer (RSB) upon returning from the guest.
This was inspired by this linux commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=117cc7a908c83697b0b737d15ae1eb5943afe35b
Warner Losh [Mon, 12 Feb 2018 06:51:20 +0000 (06:51 +0000)]
Turn devmatch on by default.
Turn devmatch on by default. However, use 'start' instead of
'onestart' in the devmatch.conf file so the setting of
'devmatch_enable' is honored. Give an example of what to put in
devd.conf if you want to disable just the run-time part of devmatch.
Warner Losh [Mon, 12 Feb 2018 04:45:26 +0000 (04:45 +0000)]
Switch to using devmatch to autoload drivers. Remove usb.conf
as obsolete because devmatch gets its information from the same
place as the genration scripts.
Warner Losh [Sun, 11 Feb 2018 17:45:38 +0000 (17:45 +0000)]
Consistent macro indentation is the hobgoblin of small minds
Line up the macro definitions and names here like the machine/stdarg.h
files that this replaced. This is easier to read and also makes it
easier to match up with other includes. Also two space indent va_end
to match the rest of the surrounding if block.
Devin Teske [Sun, 11 Feb 2018 03:02:29 +0000 (03:02 +0000)]
Fix typo in dtrace_tcp(4)
Using args[2]->tcps_state as-documented results in error:
operator -> cannot be applied to pointer to type "void"
This error is accurate as the synopsis for tcp:::state-change is:
tcp:::state-change(void *, csinfo_t *, void *, tcpsinfo_t *, void *,
tcplsinfo_t *);
args[2] refers to the third argument which is always NULL (as-
documented). The to-state for the TCP connection state transition is
actually in the fourth argument, args[3]->tcps_state.
Justin Hibbits [Sat, 10 Feb 2018 17:17:15 +0000 (17:17 +0000)]
Fix uninitialized warning, and work around a bug in gcc over clobbering
Summary:
r329077 caused gcc to emit uninitialized use warnings. Attempting to
fix those warnings yielded the following warnings:
usr.bin/tftp/main.c: In function 'main':
usr.bin/tftp/main.c:181: warning: variable 'el' might be clobbered by
'longjmp' or 'vfork'
usr.bin/tftp/main.c:182: warning: variable 'hist' might be clobbered by
'longjmp' or 'vfork'
This is a known bug in gcc, found at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24239
Work around that by simply marking hist and el as static.
Eugene Grosbein [Sat, 10 Feb 2018 17:09:51 +0000 (17:09 +0000)]
ppp(8): fix code producing debugging logs
Fix several cases when long buffer is copied to shorter one
using snprintf that results in contents truncation and
clobbering unsaved errno value and creation of misleading logs.
Alexander Motin [Sat, 10 Feb 2018 00:55:46 +0000 (00:55 +0000)]
Teach mps(4) and mpr(4) drivers to autotune chain frames.
This is a first part of the change. It makes the drivers to calculate
the required number of chain frames to satisfy worst case scenarios, but
it does not change existing overly strict limits on them. The next step
will be to rewrite the allocator to not require megabytes of physically
contiguous address space, that may be problematic if done after boot,
after doing which the limits can be removed. Until that this code can
just correct user set limits, if they are set too high.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14261
Ed Maste [Sat, 10 Feb 2018 00:22:35 +0000 (00:22 +0000)]
Promote llvm-cov to a standalone option
Introduce WITH_/WITHOUT_LLVM_COV to match GCC's WITH_/WITHOUT_GCOV.
It is intended to provide a superset of the interface and functionality
of gcov.
It is enabled by default when building Clang, similarly to gcov and GCC.
This change moves one file in libllvm to be compiled unconditionally.
Previously it was included only when WITH_CLANG_EXTRAS was set, but the
complexity of a new special case for (CLANG_EXTRAS | LLVM_COV) is not
worth avoiding a tiny increase in build time.
Reviewed by: dim, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D142645
Fix PowerMac G5 thermal management, plus likely other bugs, introduced in
r328113 and affecting SMP systems.
The way the time is set on PowerMacs is racy and relies on all the
CPUs in the system setting a register simultaneously in a rendezvous. A
few-cycle delay can result in out-of-sync times, which can break the
scheduler and result in calls like mtx_sleep() and pause() never timing out
if the thread is migrated while sleeping. r328113 added a call to a no-op
function between the beginning of the rendezvous and setting the time that
was only called on APs and added enough cycles to cause a problematic offset.
For some reason, the fan-management code was the first place this appeared.
Kirk McKusick [Fri, 9 Feb 2018 19:50:47 +0000 (19:50 +0000)]
Merge biodone_finish() back into biodone(). The primary purpose is
to make the order of operations clearer to avoid the race condition
that was fixed in r328914. In particular, this commit corrects a
similar race that existed in the soft updates callback.
Doing some sleuthing through the SVN repository, it appears that
bufdone_finish() was added to support XFS:
Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
- 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
- b_pin_count, count of pinned buffer
- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions
Patches provided by: kan
Reviewed by: phk
Silence on: arch@
Conrad Meyer [Fri, 9 Feb 2018 19:46:51 +0000 (19:46 +0000)]
tftp(1): Fix libedit state corruption involving signals
This bug was first reported 14 years ago. The problem was understood 8.5
years ago. A patch that is functionally identical to this one was proposed
almost 8 years ago and languished in the PR system / Bugzilla.
PR: 63197
Submitted by: lxv AT omut.org, fernando.apesteguia AT gmail.com
Reported by: freebsd AT nbritton.org
Each contiguous range of blocks is printed on a line.
The distance metric is the size of the gap from the end of the
previous set of blocks to the beginning of the next set of blocks.
Short distances are desirable.
On bootup, the amd64 pmap initialization code creates page-table
mappings for the pages used for the kernel and some initial allocations
used for the page table. It maps the kernel and the blocks used for
these initial allocations using 2MB pages.
However, if the kernel does not end on a 2MB boundary, it still maps the
last portion using a 2MB page, but reports that the unused 4K blocks
within this 2MB allocation are free physical blocks. This means that
these same physical blocks could also be mapped elsewhere - for example,
into a user process. Given the proximity to the kernel text and data
area, it seems wise to avoid allowing someone to write data to physical
blocks also mapped into these virtual addresses.
(Note that this isn't a security vulnerability: the direct map makes
most/all memory on the system mapped into kernel space. And, nothing
in the kernel should be trying to access these pages, as the virtual
addresses are unused. It simply seems wise to avoid reusing these
physical blocks while they are mapped to virtual addresses so close
to the kernel text and data area.)
Consequently, let's reserve the physical blocks covered by the
page-table mappings for these initial allocations.
r328536 broke symbol loading on amd64 at least (and probably other
arches). r328826 contained the problem to ppc only by adding
pre-processors guards.
Fix this properly by moving the endianness conversion to separate
helper functions, and make the conversion more robust by using sizeof
instead of having to manually code the size of each field.
Finally list the fields in each structure in a macro in order to avoid
code repetition.
Gleb Smirnoff [Fri, 9 Feb 2018 04:45:39 +0000 (04:45 +0000)]
Fix boot_pages exhaustion on machines with many domains and cores, where
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch
off of this zone from startup_alloc() until full launch of VM.
o Always supply number of VM zones to uma_startup_count(). On machines
with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
a page. In the latter case account VM zones for number of allocations
from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
itself any zone that is already capable of running real alloc.
In worst case scenario we may leak a single page here. See comment
in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
pages calculation, we are guaranteed to use all of the boot_pages
in the early single threaded stage.
Warner Losh [Fri, 9 Feb 2018 00:36:55 +0000 (00:36 +0000)]
Set script.lang in the environment to either 'forth' or 'simple' to
reflect what scripting language was compiled into the loader. I
anticipate that being able to find this out quickly from the OK prompt
will be useful in troubleshooting in the future.
Eric van Gyzen [Fri, 9 Feb 2018 00:13:05 +0000 (00:13 +0000)]
Fix ICMPv6 redirects
icmp6_redirect_input() validates that a redirect packet came from the
current gateway for the respective destination. To do this, it compares
the source address, which has an embedded scope zone id, to the next-hop
address, which does not. If the address is link-local, which should be
the case, the comparison fails and the redirect is ignored.
Insert the scope zone id into the next-hop address so the comparison
is accurate.
Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases.
Kirk McKusick [Thu, 8 Feb 2018 23:06:58 +0000 (23:06 +0000)]
The goal of this change is to prevent accidental foot shooting by
folks running filesystems created on check-hash enabled kernels
(which I will call "new") on a non-check-hash enabled kernels (which
I will call "old). The idea here is to detect when a filesystem is
run on an old kernel and flag the filesystem so that when it gets
moved back to a new kernel, it will not start getting a slew of
check-hash errors.
Back when the UFS version 2 filesystem was created, it added a file
flag FS_INDEXDIRS that was to be set on any filesystem that kept
some sort of on-disk indexing for directories. The idea was precisely
to solve the issue we have today. Specifically that a newer kernel
that supported indexing would be able to tell that the filesystem
had been run on an older non-indexing kernel and that the indexes
should not be used until they had been rebuilt. Since we have never
implemented on-disk directory indicies, the FS_INDEXDIRS flag is
cleared every time any UFS version 2 filesystem ever created is
mounted for writing.
This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH
flag. Thus, the FS_METACKHASH is definitively known to have always
been cleared. The FS_INDEXDIRS flag has been moved to a new block
of flags that will always be cleared starting with this commit
(until they get used to implement some future feature which needs
to detect that the filesystem was mounted on a kernel that predates
the new feature).
If a filesystem with check-hashes enabled is mounted on an old
kernel the FS_METACKHASH flag is cleared. When that filesystem is
mounted on a new kernel it will see that the FS_METACKHASH has been
cleared and clears all of the fs_metackhash flags. To get them
re-enabled the user must run fsck (in interactive mode without the
-y flag) which will ask for each supported check hash whether it
should be rebuilt and enabled. When fsck is run in its default preen
mode, it will just ignore the check hashes so they will remain
disabled.
The kernel has always disabled any check hash functions that it
does not support, so as more types of check hashes are added, we
will get a non-surprising result. Specifically if filesystems get
moved to kernels supporting fewer of the check hashes, those that
are not supported will be disabled. If the filesystem is moved back
to a kernel with more of the check-hashes available and fsck is run
interactively to rebuild them, then their checking will resume.
Otherwise just the smaller subset will be checked.
A side effect of this commit is that filesystems running with
cylinder-group check hashes will stop having them checked until
fsck is run to re-enable them (since none of them currently have
the FS_METACKHASH flag set). So, if you want check hashes enabled
on your filesystems after booting a kernel with these changes, you
need to run fsck to enable them. Any newly created filesystems will
have check hashes enabled. If in doubt as to whether you have check
hashes emabled, run dumpfs and look at the list of enabled flags
at the end of the superblock details.
Warner Losh [Thu, 8 Feb 2018 22:59:51 +0000 (22:59 +0000)]
Fix build of userboot.so
Since it's not possible to unset a variable easily, create a new
variable 'PIC' to signal that we are creating a shared object that we
want to install. defs.mk refains from defining NO_PIC and ITNERALLIB
when PIC is defined. This unbreaks userboot.so building.
Dimitry Andric [Thu, 8 Feb 2018 21:11:48 +0000 (21:11 +0000)]
Pull in r324594 from upstream clang trunk (by Alexander Ivchenko):
Fix for #31362 - ms_abi is implemented incorrectly for values >=16
bytes.
Summary:
This patch is a fix for following issue:
https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by
front end lowering C calling conventions without taking into account
calling conventions enforced by attribute. In this case win64cc was
no correctly lowered on targets other than Windows.
This fixes clang 6.0.0 assertions when building the emulators/wine and
emulators/wine-devel ports, and should also make it use the correct
Windows calling conventions. Bump __FreeBSD_version to make the fix
easy to detect.
Warner Losh [Thu, 8 Feb 2018 17:07:27 +0000 (17:07 +0000)]
Move to tabs for indentation and to 8-space notches, per style(9).
4 space indentation with a mix of tabs and spaces is a hassle. Update
to project-standard hard-tabs with 8-space indentation in these files.
This matches the new code coming in better as well.
Justin Hibbits [Thu, 8 Feb 2018 05:18:30 +0000 (05:18 +0000)]
Temporarily widen count for interrupt rate calculations on 32-bit archs
If the interrupt count is very high (greater than ~42M), notably on one-shot
execution on long running systems, the intermediate multiplication step in the
rate calculation will overflow the width of a 32-bit architecture long (32
bits), causing the rest of the calculation to calculate with a truncated value,
and report very low rates (sometimes 0).
Andriy Gapon [Wed, 7 Feb 2018 21:51:59 +0000 (21:51 +0000)]
exec_map_first_page: fix an inverse condition introduced in r254138
While the bug itself was serious, as we could either pass a non-busied
page to vm_pager_get_pages() or leak a busy page, it could only be
triggered under a very rare condition where the page is already inserted
into the object, but it is not valid yet.