Scott Long [Tue, 6 Feb 2018 06:42:25 +0000 (06:42 +0000)]
Return a C errno for cam_periph_acquire().
There's no compelling reason to return a cam_status type for this
function and doing so only creates confusion with normal C
coding practices. It's technically an API change, but the periph API
isn't widely used. No efffective change to operation.
Gleb Smirnoff [Tue, 6 Feb 2018 04:16:00 +0000 (04:16 +0000)]
Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.
o Introduce another stage of UMA startup, which is entered after
vm_page_startup() finishes. After this stage we don't yet enable buckets,
but we can ask VM for pages. Rename stages to meaningful names while here.
New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
BOOT_RUNNING.
Enabling page alloc earlier allows us to dramatically reduce number of
boot pages required. What is more important number of zones becomes
consistent across different machines, as no MD allocations are done before
the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
startup_alloc(), however that may change, so vm_page_startup() provides
its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
functions calculates sizes of zones zone and kegs zone, and calculates how
many pages UMA will need to bootstrap.
It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
but also as args.size.
Kirk McKusick [Tue, 6 Feb 2018 00:19:46 +0000 (00:19 +0000)]
Occasional cylinder-group check-hash errors were being reported on
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.
The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:
- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
never mark it dirty, thus do not write it out, and hence never
update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
the old check hash is still in the VM cache, but some other pages
are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
error to be printed.
The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.
The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.
- When a read completes we had the following sequence:
- bufdone()
-- b_ckhashcalc (calculates check hash)
-- bufdone_finish()
--- vfs_vmio_iodone() (replaces bogus pages with the cached ones)
- When we are reading a buffer where one or more pages are already
in memory (but not all pages, or we wouldn't be doing the read),
the I/O is done with bogus_page mapped in for the pages that exist
in the VM cache. This mapping is done to avoid corrupting the
cached pages if there is any I/O overrun. The vfs_vmio_iodone()
function is responsible for replacing the bogus_page(s) with the
cached ones. But we were calculating the check hash before the
bogus_page(s) were replaced. Hence, when we were calculating the
check hash, we were partly reading from bogus_page, which means
we calculated a bad check hash (e.g., because multiple pages have
been mapped to bogus_page, so its contents are indeterminate).
The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.
With these two changes, the occasional cylinder-group check-hash
errors are gone.
Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
bwn(4): migrate bwn(4) to the native bhnd(9) interface, and drop siba_bwn.
- Remove the shim interface that allowed bwn(4) to use either siba_bwn or
bhnd(4), replacing all siba_bwn calls with their bhnd(4) bus equivalents.
- Drop the legay, now-unused siba_bwn bus driver.
- Clean up bhnd(4) board flag defines referenced by bwn(4).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13518
John Baldwin [Mon, 5 Feb 2018 23:35:33 +0000 (23:35 +0000)]
Ignore relocation tables for non-memory-resident sections.
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident. For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero). For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on: mips, mips64
MFC after: 1 month
Sponsored by: DARPA / AFRL
John Baldwin [Mon, 5 Feb 2018 23:27:42 +0000 (23:27 +0000)]
Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.
Adrian Chadd [Mon, 5 Feb 2018 20:37:29 +0000 (20:37 +0000)]
[arswitch] disable ARP copy-to-CPU port for AR9340 for now.
I'll have to go double check to see if it does indeed pass ARP frames between
switch ports with this disabled, but it seems required for the CPU port to see
ARP traffic.
Ed Maste [Mon, 5 Feb 2018 17:29:12 +0000 (17:29 +0000)]
Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.
This was a hack to be able to mount ext4 filesystems read-only while not
supporting all the features. We now support all those features so it
doesn't make sense to keep the undocumented hack.
It is possible, for complex fork()/collapse situations, to have
sibling address spaces to partially share shadow chains. If one
sibling performs wiring, it can happen that a transient page, invalid
and busy, is installed into a shadow object which is visible to other
sibling for the duration of vm_fault_hold(). When the backing object
contains the valid page, and the wiring is performed on read-only
entry, the transient page is eventually removed.
But the sibling which observed the transient page might perform the
unwire, executing vm_object_unwire(). There, the first page found in
the shadow chain is considered as the page that was wired for the
mapping. It is really the page below it which is wired. So we unwire
the wrong page, either triggering the asserts of breaking the page'
wire counter.
As the fix, wait for the busy state to finish if we find such page
during unwire, and restart the shadow chain walk after the sleep.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14184
Modify ip6_get_prevhdr() to be able use it safely.
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.
In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.
Reported by: Maxime Villard <max at m00nbsd dot net>
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14158
Adrian Chadd [Mon, 5 Feb 2018 07:05:28 +0000 (07:05 +0000)]
[arswitch] Enable ATU dump support for the AR9340.
This indeed uses the same registers as the AR8216 and later chips.
There seems to be an issue with ARP requests being sent out from the CPU
through this switch here, so figuring that out is next. Learning works fine on
the AR8327 ethernet switch on the /other/ gigabit ethernet port, so I don't
think it's the network stack or ethernet driver.
Marius Strobl [Mon, 5 Feb 2018 00:18:21 +0000 (00:18 +0000)]
Flesh out the creation of sparc64 UFS images. This has only been verified
to yield working images in a native build as rootgen.sh generally doesn't
support cross-testing so far.
psm(4): Fix panic occuring soon after PS/2 packet has been rejected by
synaptics or elantech sanity checker.
After packet has been rejected contents of packet buffer is not cleared
with setting of inputbytes counter to 0. So when this packet buffer is
filled again being an element of circular queue, new data appends to old
data rather than overwrites it. This leads to packet buffer overflow
after 10 rounds.
Fix it with setting of packet's inputbytes counter to 0 after rejection.
Dimitry Andric [Sun, 4 Feb 2018 20:33:47 +0000 (20:33 +0000)]
Bump clang's __FreeBSD_cc_version, to cope with r328816, which removed
-Wno-error=tautological-constant-compare again (this flag is now out of
-Wextra after upstream https://reviews.llvm.org/rL322901). Otherwise
the MK_SYSTEM_COMPILER logic will not build a cross-tools compiler.
Justin Hibbits [Sun, 4 Feb 2018 20:07:08 +0000 (20:07 +0000)]
Only look for L2 cache controllers for mpc85xx_cache
The L3 cache controller (Corenet Platform Cache) is listed with one of its
compatible strings as "cache", which this driver can't attach to. Restrict
to a known list of primary cache controller strings, as found in the l2cache
devicetree binding.
Justin Hibbits [Sun, 4 Feb 2018 15:37:58 +0000 (15:37 +0000)]
Minimal changes for MPR to build on architectures with physical addresses larger than virtual
Summary:
Some architectures use large (36-bit) physical addresses, with smaller
virtual addresses. Casting between vm_paddr_t (or bus_addr_t) and void * is
considered illegal, so cast through uintptr_t. No functional change on existing
platforms.
Alan Somers [Sun, 4 Feb 2018 14:49:55 +0000 (14:49 +0000)]
geom: don't write stack garbage in disk labels
Most consumers of g_metadata_store were passing in partially unallocated
memory, resulting in stack garbage being written to disk labels. Fix them by
zeroing the memory first.
gvirstor repeated the same mistake, but in the kernel.
Also, glabel's label contained a fixed-size string that wasn't
initialized to zero.
Fix regression introduced in r328806, preventing boot at least on all
PowerPC Apple hardware, and likely all Open Firmware systems.
The loader would allocate memory for its heap at whatever address Open
Firmware gave it, which would in general be the lowest unallocated address,
usually starting a page or two above 0. As the kernel is linked at 1 MB,
and loader insists on running the kernel at its link address, any heap
larger than 1 MB would overlap the kernel, causing loader memory allocations
to corrupt the kernel and vice versa.
Although r328806 made this problem much worse by increasing the heap size
to 8 MB, causing 88% of the loader heap to overlap with the kernel, the
problem has always existed. The old heap size was 1 MB and, unless that
started exactly at zero, which would cause other problems, some number of
pages of the loader heap still overlapped with the kernel.
This patch solves the issue in two ways and cleans up some related code:
- Moves the loader heap inside of the loader. This guarantees that the
heap will be contiguous with the loader and simplifies the heap
allocation code at no cost, since the heap lives in BSS.
- Moves the loader, previously at 28 MB and dangerously close to the kernel
it loads, a bit higher to 44 MB. This has the effect of breaking loader
on non-embedded PPC machines with < 48 MB of RAM, but we did not support
those anyway.
The fundamental problem is that the way loader loads ELF files is
incredibly fragile, but that can't be fixed without fundamental
architectural changes.
Marius Strobl [Sat, 3 Feb 2018 23:14:11 +0000 (23:14 +0000)]
o Let rtld(1) set up psABI user trap handlers prior to executing the
objects' init functions instead of doing the setup via a constructor
in libc as the init functions may already depend on these handlers
to be in place. This gets us rid of:
- the undefined order in which libc constructors as __guard_setup()
and jemalloc_constructor() are executed WRT __sparc_utrap_setup(),
- the requirement to link libc last so __sparc_utrap_setup() gets
called prior to constructors in other libraries (see r122883).
For static binaries, crt1.o still sets up the user trap handlers.
o Move misplaced prototypes for MD functions in to the MD prototype
section of rtld.h.
o Sprinkle nitems().
Xin LI [Sat, 3 Feb 2018 09:15:13 +0000 (09:15 +0000)]
After r328426, g_label depends on UFS (option FFS) code to read UFS
superblock, and the kernel will fail to link when UFS is not built
in. This commit makes it depend on a small portion of FFS bits and
thereby fixes build for this situation.
This is intended as an interim bandaid, and the actual superblock
reading code should probably be made independent of UFS, so we do
not need to depend on it (see kib@'s comment in the review for
details), and we will revisit this once the superblock check hashes
are all in place.
Ed Maste [Sat, 3 Feb 2018 01:23:48 +0000 (01:23 +0000)]
Make cross-endian loader changes apply only to powerpc
The cross-endian loader change in r328536 (review D12422) broke symbol
loading on (at least) amd64 kernels. Temporarily paper over the issue
by restricting the cross-endian support to only powerpc, until a proper
fix arrives.
Kirk McKusick [Fri, 2 Feb 2018 23:26:52 +0000 (23:26 +0000)]
Check and report error returns from sbput(3) calls.
Convert to using cgput(3) for writing cylinder groups.
Check and report error returns from cgput(3).
Dimitry Andric [Fri, 2 Feb 2018 22:28:12 +0000 (22:28 +0000)]
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
6.0.0 (branches/release_60 r324090).
This introduces retpoline support, with the -mretpoline flag. The
upstream initial commit message (r323155 by Chandler Carruth) contains
quite a bit of explanation. Quoting:
Introduce the "retpoline" x86 mitigation technique for variant #2 of
the speculative execution vulnerabilities disclosed today,
specifically identified by CVE-2017-5715, "Branch Target Injection",
and is one of the two halves to Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that
this is a very incomplete description, please see the Project Zero
blog post for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative
execution of the processor to some "gadget" of executable code by
poisoning the prediction of indirect branches with the address of
that gadget. The gadget in turn contains an operation that provides a
side channel for reading data. Most commonly, this will look like a
load of secret data followed by a branch on the loaded value and then
a load of some predictable cache line. The attacker then uses timing
of the processors cache to determine which direction the branch took
*in the speculative execution*, and in turn what one bit of the
loaded value was. Due to the nature of these timing side channels and
the branch predictor on Intel processors, this allows an attacker to
leak data only accessible to a privileged domain (like the kernel)
back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In
many cases, the compiler can simply use directed conditional branches
and a small search tree. LLVM already has support for lowering
switches in this way and the first step of this patch is to disable
jump-table lowering of switches and introduce a pass to rewrite
explicit indirectbr sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as a
trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures
the processor predicts the return to go to a controlled, known
location. The retpoline then "smashes" the return address pushed onto
the stack by the call with the desired target of the original
indirect call. The result is a predicted return to the next
instruction after a call (which can be used to trap speculative
execution within an infinite loop) and an actual indirect branch to
an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this
device. For 32-bit ABIs there isn't a guaranteed scratch register
and so several different retpoline variants are introduced to use a
scratch register if one is available in the calling convention and to
otherwise use direct stack push/pop sequences to pass the target
address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the
retpoline thunk by the compiler to allow for custom thunks if users
want them. These are particularly useful in environments like
kernels that routinely do hot-patching on boot and want to hot-patch
their thunk to different code sequences. They can write this custom
thunk and use `-mretpoline-external-thunk` *in addition* to
`-mretpoline`. In this case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are
from precompiled runtimes (such as crt0.o and similar). The ones we
have found are not really attackable, and so we have not focused on
them here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z
retpolineplt` (or use similar functionality from some other linker).
We strongly recommend also using `-z now` as non-lazy binding allows
the retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typic al workloads, and relatively minor hits (approximately
2%) even for extremely syscall-heavy applications. This is largely
due to the small number of indirect branches that occur in
performance sensitive paths of the kernel.
When using these patches on statically linked applications,
especially C++ applications, you should expect to see a much more
dramatic performance hit. For microbenchmarks that are switch,
indirect-, or virtual-call heavy we have seen overheads ranging from
10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically
reduce the impact of hot indirect calls (by speculatively promoting
them to direct calls) and allow optimized search trees to be used to
lower switches. If you need to deploy these techniques in C++
applications, we *strongly* recommend that you ensure all hot call
targets are statically linked (avoiding PLT indirection) and use both
PGO and ThinLTO. Well tuned servers using all of these techniques saw
5% - 10% overhead from the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality
available as soon as possible. Happy for more code review, but we'd
really like to get these patches landed and backported ASAP for
obvious reasons. We're planning to backport this to both 6.0 and 5.0
release streams and get a 5.0 release with just this cherry picked
ASAP for distros and vendors.
This patch is the work of a number of people over the past month:
Eric, Reid, Rui, and myself. I'm mailing it out as a single commit
due to the time sensitive nature of landing this and the need to
backport it. Huge thanks to everyone who helped out here, and
everyone at Intel who helped out in discussions about how to craft
this. Also, credit goes to Paul Turner (at Google, but not an LLVM
contributor) for much of the underlying retpoline design.
Adrian Chadd [Fri, 2 Feb 2018 22:05:36 +0000 (22:05 +0000)]
[arswitch] begin tidying up the learning and ATU management, introduce ATU APIs.
* Refactor the initial learning configuration (port learning, address expiry,
handling address moving between ports, etc, etc) into a separate HAL routine
* and ensure that it's consistent between switch chips - the AR8216,8316,724x,9331
SoCs all share the same switch code.
* .. the AR8327 needs doing - the defaults seem OK for now
* .. the AR9340 is different but it's also programmed now.
* Add support for flushing a single port worth of ATU entries
* Add support for fetching the ATU table from AR8216 and derived chips
Tested:
* AR9344, Carambola 2
TODO:
* Further testing on other chips
* Add AR9340 support
* Add AR8327 support
Warner Losh [Fri, 2 Feb 2018 15:40:49 +0000 (15:40 +0000)]
Invent new LDR_INTERP for the loader interpreter to use. Use this in
preference to LIBFICL{,32}. LIBFICL{,32} are now always defined, but
LDR_INTERP{,32} is defined empty when building w/o forth (aka the
simple interpreter) and defined to LIBFICL{,32} when we are building
forth.
Warner Losh [Fri, 2 Feb 2018 15:01:49 +0000 (15:01 +0000)]
Remove pcibios forth support.
I had thought that this would be useful. However it was committed too
late, and wound up being unused. It's in the way of future work now,
so retire it rather than bring it forward.
Warner Losh [Fri, 2 Feb 2018 15:01:44 +0000 (15:01 +0000)]
These 4th words were an attempt to allow integration into the boot
loader scripts. However, that path won't be taken after all it
seems. Remove this code before it decays into uselessness. Also remove
build dependencies on forth no longer needed.
Warner Losh [Fri, 2 Feb 2018 15:01:33 +0000 (15:01 +0000)]
Retire pnp.4th and the code needed only for 4th words used here.
This has never been installed. It was added to the tree disconnected
to the build in FreeBSD 5 (17 years ago) and has never been used as
far as I can tell. The desired improvements never really happened
(despite a couple minor cleanups along the way). It's relevance is
long past, so better to retire it.
On pageout, in vnode generic pager, for partially dirty page, only
clear dirty bits for completely invalid blocks.
Otherwise we might not write out the last chunk that is shorter than
512 bytes, if the file end is not aligned on disk block boundary.
This become important after the r324794.
PR: 225586
Reported by: tris_vern@hotmail.com
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Merge r1.120 from NetBSD:
Fix a pretty simple, yet pretty tragic typo: we should return IPPROTO_DONE,
not IPPROTO_NONE. With IPPROTO_NONE we will keep parsing the header chain
on an mbuf that was already freed.
Reported by: Maxime Villard <max at m00nbsd dot net>
MFC after: 3 days
Warner Losh [Fri, 2 Feb 2018 06:32:26 +0000 (06:32 +0000)]
Centralize several variables.
MK_CTF, MK_SSP, MK_PROFILE, NO_PIC, and INTERNALLIB are always the
same, so set them in defs.mk. MAN= is common, so set it here too.
This removes a lot of boring repetition from the Makefiles that added
almost no value.
Adrian Chadd [Fri, 2 Feb 2018 02:05:14 +0000 (02:05 +0000)]
[etherswitch] add the first pass of a simple API to flush and fetch the L2 address table from the ethernet switch.
This stuff may be a bit fluid during this -HEAD cycle as various other
switch features are added, but the current stuff is enough to drive
initial development and features on the atheros range of integrated
and external switches.
* add a method to flush the whole address table;
* add a method to flush all addresses on a given port;
* add a method to download the address table;
* .. and then a method to fetch entries from the address table.
The table fetch/read methods pass through to the drivers for now since
the drivers may implement different ways of fetching/caching the address
table data. The atheros devices for example fetch the table by
iterating over the table through a set of registers and so you need
to keep that locked whilst you iterate otherwise you may have the table
flushed half way by a port status change.
This is a no-op until the userland and arswitch code shows up.
Adrian Chadd [Thu, 1 Feb 2018 21:58:52 +0000 (21:58 +0000)]
[atheros] Fix-up the base address stuff after I did a drive-by with the calibration data location.
The old way required the data to be present really early and copied it from
memory mapped NOR flash; this only worked during kernel boot but not for
ath/ath_hal modules.