Andriy Gapon [Wed, 7 Feb 2018 21:51:59 +0000 (21:51 +0000)]
exec_map_first_page: fix an inverse condition introduced in r254138
While the bug itself was serious, as we could either pass a non-busied
page to vm_pager_get_pages() or leak a busy page, it could only be
triggered under a very rare condition where the page is already inserted
into the object, but it is not valid yet.
Rework ipfw dynamic states implementation to be lockless on fast path.
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.
Warner Losh [Wed, 7 Feb 2018 18:33:53 +0000 (18:33 +0000)]
Cull Atmel board configs no longer relevant.
Remove most of the Atmel at91 boards. Most of them are no longer
relevant or used by people. Kept ATMEL since it should work on all the
boards that still work (I've not confirmed this, since I don't have
all these boards). Also kept SAM9G20EK, since I have several boards
that it is used on. If I've deleted a kernel in error, please let me
know.
Gleb Smirnoff [Wed, 7 Feb 2018 18:32:51 +0000 (18:32 +0000)]
Fix three miscalculations in amount of boot pages:
o Most of startup zones have struct uma_slab embedded into the slab,
so provide macro UMA_SLAB_SPACE and use it instead of UMA_SLAB_SIZE,
when calculating how many pages would certain kind of allocations
require. Some zones are offpage, so we might have a positive inaccuracy.
o The keg for the zone of zones is allocated "dynamically", so we
need +1 when calculating amount of pages for kegs. [1]
o The zones of zones and zones of kegs have arbitrary alignment of 32,
and this also needs to be accounted for. [2]
While here, spread more comments and improve diagnostic messages.
Alex Richardson [Wed, 7 Feb 2018 16:58:01 +0000 (16:58 +0000)]
Fix compilation of mips_postboot_fixup() with a C11 compiler
The _Alignas specifier must come before the declaration and not after. It
works if _Alignas() expands to __attribute__(aligned(x)) which was the only
case I tested before.
Mark Johnston [Wed, 7 Feb 2018 16:57:10 +0000 (16:57 +0000)]
Dequeue wired pages lazily.
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.
The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.
The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.
Warner Losh [Wed, 7 Feb 2018 16:28:26 +0000 (16:28 +0000)]
Add a note about why we have the conditional before including
bsd.compiler.mk. It's so fmake from older 9.x systems still
works (still a supported build config, and having the note here
will let us know when we can cull it more easily).
Also pull in a related change from include to sinclude from
arichardson@'s cross building work, as well as it's companion in
Makefile.inc1 with a note about why we do the odd thing there.
Adrian Chadd [Wed, 7 Feb 2018 09:37:22 +0000 (09:37 +0000)]
[ath] Use the BSSID address logic for STA VAPs too.
For DWDS VAPs on ath(4) we need to ensure that the STA vap and hostap VAP
have different MAC addresses. If the STA code path doesn't utilise the
address assign / reclaim path then it doesn't update the bitmap with which
address was allocated.
This should fix a bunch of corner issues I've been seeing with DWDS STA + AP
VAPs that I was working around with manual MAC address assignment.
Kyle Evans [Wed, 7 Feb 2018 01:54:13 +0000 (01:54 +0000)]
if_awg: Skip emac reset if configured for internal PHY
On the OrangePi One at least, emac reset when an ethernet cable is not
plugged in seems to break ethernet. Soft reset will fail, even with
increasing the delay and retries to wait for up to 20 seconds. This can be
reproduced across at least two different OrangePi One's by simply leaving
ethernet cable unplugged when awg attaches. Whether it's plugged in or not
through u-boot process makes no difference.
Skipping the reset in this configuration doesn't seem to cause any problems,
tried across many many reboots with and without ethernet cable plugged in.
Tested on: OrangePi One
Tested on: Other boards (manu)
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D13974
Warner Losh [Tue, 6 Feb 2018 23:12:16 +0000 (23:12 +0000)]
Avoid find -s, use find | sort instead.
find -s was introduced to make the metalog more
deterministic. However, find -s is not portable. find | sort is
portable and accomplishes the same goals, even if it isn't
pedantically the same. TZS is the same before / after the change so
any fussy differences between the two are moot and there won't be
METALOG churn across this change.
Jeff Roberson [Tue, 6 Feb 2018 22:10:07 +0000 (22:10 +0000)]
Use per-domain locks for vm page queue free. Move paging control from
global to per-domain state. Protect reservations with the free lock
from the domain that they belong to. Refactor to make vm domains more
of a first class object.
Gleb Smirnoff [Tue, 6 Feb 2018 22:06:59 +0000 (22:06 +0000)]
Fix boot_pages calculation for machines that don't have UMA_MD_SMALL_ALLOC.
o Call uma_startup1() after initializing kmem, vmem and domains.
o Include 8 eight VM startup pages into uma_startup_count() calculation.
o Account for vmem_startup() and vm_map_startup() preallocating pages.
o Account for extra two allocations done by kmem_init() and vmem_create().
o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT
allowed several other SYSINITs to sneak in before it, thus bumping
requirement for amount of boot pages.
Scott Long [Tue, 6 Feb 2018 21:01:38 +0000 (21:01 +0000)]
Cache the value of the request and reply frame size since it's used quite
a bit in the normal operation of the driver. Covert it to represent bytes
instead of 32bit words. Fix what I believe to be is a bug in this respect
with the Tri-mode cards.
Mark Felder [Tue, 6 Feb 2018 20:12:05 +0000 (20:12 +0000)]
Fix firstboot fs mount logic
The firstboot logic has an error which causes the filesystem to be
mounted readonly even though root_rw_mount=YES. This fixes the error to
ensure that the root filesystem is mounted rw as expected after the run
of the firstboot scripts.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D14226
John Baldwin [Tue, 6 Feb 2018 17:01:10 +0000 (17:01 +0000)]
Use a workaround to compile the crt init functions correctly with clang.
The MIPS assembly parser treats forward-declared local symbols as global
symbols. This results in CALL16 relocations being used against local
(private) symbols which then fail to resolve when linking binaries.
Add .local to force the init and fini functions to be treated as local as
a workaround.
Mark Johnston [Tue, 6 Feb 2018 16:02:33 +0000 (16:02 +0000)]
Simplify synchronization read error handling.
Since synchronization reads are performed by submitting a request to
the external mirror provider, we know that the request returns with an
error only when gmirror was unable to read a copy of the block from any
mirror. Thus, there is no need to retry the request from the
synchronization error handler.
Alexander Motin [Tue, 6 Feb 2018 16:02:25 +0000 (16:02 +0000)]
Fix queue length reporting in mps(4) and mpr(4).
Both drivers were found to report CAM bigger queue depth then they really
can handle. It made them later under high load with many disks return
some of submitted requests back with CAM_REQUEUE_REQ status for later
resubmission.
Diagnostic buffer fixes for the mps(4) and mpr(4) drivers.
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.
There were several issues here:
o No checking of the bus_dmamap_load() return value. If the load
failed or got deferred, mp{r,s}_diag_register() continued on as if
nothing had happened. We now check the return value and bail
out if it fails.
o No waiting for a deferred load callback. bus_dmamap_load()
calls a supplied callback when the mapping is done. This is
generally done immediately, but it can be deferred.
mp{r,s}_diag_register() did not check to see whether the callback
was already done before proceeding on. We now sleep until the
callback is done if it is deferred.
o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
memory is allocated and loaded. This is necessary on some
platforms to synchronize host memory that is going to be updated
by a device.
Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress. This fixes that problem
as well. (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)
mp{r,s}var.h:
Add a new structure, struct mpr_busdma_context, that is
used for deferred busdma load callbacks.
Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
This provides synchronization for callers that want to
wait on a deferred bus_dmamap_load() callback.
mp{r,s}_user.c:
In bus_dmamap_register(), add a call to bus_dmamap_sync()
with the BUS_DMASYNC_PREREAD flag set after an allocation
is loaded.
Also, check the return value of bus_dmamap_load(). If it
fails, bail out. If it is EINPROGRESS, wait for the
callback to happen. We use an interruptible sleep (msleep
with PCATCH) and let the callback clean things up if we get
interrupted.
In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
the data out to make sure the data is in stable storage.
In mp{r,s}_post_fw_diag_buffer() and
mp{r,s}_release_fw_diag_buffer(), check the reply to see
whether it is NULL. It can be NULL (and the command non-NULL)
if the controller gets reinitialized while we're waiting for
the command to complete but the driver structures aren't
reallocated. The driver structures generally won't be
reallocated unless there is a firmware upgrade that changes
one of the IOCFacts.
When freeing diagnostic buffers in mp{r,s}_diag_register()
and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
freeing it. This will prevent a duplicate free in some
situations.
Alex Richardson [Tue, 6 Feb 2018 15:41:45 +0000 (15:41 +0000)]
crossbuild: Make the CHECK_TIME variable work on Linux
Linux /usr/bin/find doesn't understand the -mtime -0s flag.
Instead create a temporary file and compare that file's mtime to
sys/sys/param.h to check whether the clock is correct.
Alex Richardson [Tue, 6 Feb 2018 15:41:26 +0000 (15:41 +0000)]
Allow compiling usr.bin/find on Linux and Mac
When building FreeBSD the makefiles invoke find with various flags such as
`-s` that aren't supported in the native /usr/bin/find. To fix this I
build the FreeBSD version of find and use that when crossbuilding.
Inserting lots if #ifdefs in the code is rather ugly but I don't see a
better solution.
Alex Richardson [Tue, 6 Feb 2018 15:41:15 +0000 (15:41 +0000)]
Make mips_postboot_fixup work when building the kernel with clang+lld
The compiler/linker can align fake_preload anyway it would like. When
building the kernel with gcc+bfd this always happened to be a multiple of 8.
When I built the kernel with clang and linked with lld fake_preload
happened to only be aligned to 4 bytes which caused a an ADDRS trap because
the compiler will emit sd instructions to store to this buffer.
Kyle Evans [Tue, 6 Feb 2018 14:57:03 +0000 (14:57 +0000)]
dtb/allwinner: Add sun7i-a20-lamobo-r1.dts (Banana Pi R1)
FreeBSD boots on this board, but the ethernet switch is not currently
supported, resulting in no ethernet.
A U-Boot port will be added once the ethernet switch is at least basically
supported, but we add its DTS to the build here to lower the barrier-to-boot
while work is underway.
Remove libreadline from the source tree, all consumers but gdb
has been switched to libedit long ago, libreadline was built as an
internallib for a while and kept only for gdbtui which was broken using
libreadline.
Since gdb has been mostly deorbitted in all arches, gdbtui was only installed
on arm and sparc64, given it has been removed, gdb has been switched to use
libedit, no consumers are left for libreadline. Thus this removal
Remove gdbtui, it was already not installed on every arches
only installed on arm and sparc64.
It is the only bits that keeps us having libreadline in base
The rest of gdb can be switched to libedit and will be in another
commit
Scott Long [Tue, 6 Feb 2018 06:55:55 +0000 (06:55 +0000)]
Fix a case where a request frame can be composed that requires 2 or more
SGList elements, but there's only enough space in the request frame for
either 1 element or a chain frame pointer. Previously, the code would
hit the wrong case, add the SGList element, but then fail to add the
chain frame due to lack of space. Re-arrange the code to catch this case
earlier and handle it.
Scott Long [Tue, 6 Feb 2018 06:42:25 +0000 (06:42 +0000)]
Return a C errno for cam_periph_acquire().
There's no compelling reason to return a cam_status type for this
function and doing so only creates confusion with normal C
coding practices. It's technically an API change, but the periph API
isn't widely used. No efffective change to operation.
Gleb Smirnoff [Tue, 6 Feb 2018 04:16:00 +0000 (04:16 +0000)]
Followup on r302393 by cperciva, improving calculation of boot pages required
for UMA startup.
o Introduce another stage of UMA startup, which is entered after
vm_page_startup() finishes. After this stage we don't yet enable buckets,
but we can ask VM for pages. Rename stages to meaningful names while here.
New list of stages: BOOT_COLD, BOOT_STRAPPED, BOOT_PAGEALLOC, BOOT_BUCKETS,
BOOT_RUNNING.
Enabling page alloc earlier allows us to dramatically reduce number of
boot pages required. What is more important number of zones becomes
consistent across different machines, as no MD allocations are done before
the BOOT_PAGEALLOC stage. Now only UMA internal zones actually need to use
startup_alloc(), however that may change, so vm_page_startup() provides
its need for early zones as argument.
o Introduce uma_startup_count() function, to avoid code duplication. The
functions calculates sizes of zones zone and kegs zone, and calculates how
many pages UMA will need to bootstrap.
It counts not only of zone structures, but also of kegs, slabs and hashes.
o Hide uma_startup_foo() declarations from public file.
o Provide several DIAGNOSTIC printfs on boot_pages usage.
o Bugfix: when calculating zone of zones size use (mp_maxid + 1) instead of
mp_ncpus. Use resulting number not only in the size argument to zone_ctor()
but also as args.size.
Kirk McKusick [Tue, 6 Feb 2018 00:19:46 +0000 (00:19 +0000)]
Occasional cylinder-group check-hash errors were being reported on
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.
The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:
- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
never mark it dirty, thus do not write it out, and hence never
update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
the old check hash is still in the VM cache, but some other pages
are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
error to be printed.
The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.
The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.
- When a read completes we had the following sequence:
- bufdone()
-- b_ckhashcalc (calculates check hash)
-- bufdone_finish()
--- vfs_vmio_iodone() (replaces bogus pages with the cached ones)
- When we are reading a buffer where one or more pages are already
in memory (but not all pages, or we wouldn't be doing the read),
the I/O is done with bogus_page mapped in for the pages that exist
in the VM cache. This mapping is done to avoid corrupting the
cached pages if there is any I/O overrun. The vfs_vmio_iodone()
function is responsible for replacing the bogus_page(s) with the
cached ones. But we were calculating the check hash before the
bogus_page(s) were replaced. Hence, when we were calculating the
check hash, we were partly reading from bogus_page, which means
we calculated a bad check hash (e.g., because multiple pages have
been mapped to bogus_page, so its contents are indeterminate).
The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.
With these two changes, the occasional cylinder-group check-hash
errors are gone.
Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
bwn(4): migrate bwn(4) to the native bhnd(9) interface, and drop siba_bwn.
- Remove the shim interface that allowed bwn(4) to use either siba_bwn or
bhnd(4), replacing all siba_bwn calls with their bhnd(4) bus equivalents.
- Drop the legay, now-unused siba_bwn bus driver.
- Clean up bhnd(4) board flag defines referenced by bwn(4).
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13518
John Baldwin [Mon, 5 Feb 2018 23:35:33 +0000 (23:35 +0000)]
Ignore relocation tables for non-memory-resident sections.
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident. For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero). For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.
Reported by: Mori Hiroki <yamori813@yahoo.co.jp>, adrian
Tested on: mips, mips64
MFC after: 1 month
Sponsored by: DARPA / AFRL
John Baldwin [Mon, 5 Feb 2018 23:27:42 +0000 (23:27 +0000)]
Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.
Adrian Chadd [Mon, 5 Feb 2018 20:37:29 +0000 (20:37 +0000)]
[arswitch] disable ARP copy-to-CPU port for AR9340 for now.
I'll have to go double check to see if it does indeed pass ARP frames between
switch ports with this disabled, but it seems required for the CPU port to see
ARP traffic.
Ed Maste [Mon, 5 Feb 2018 17:29:12 +0000 (17:29 +0000)]
Linuxolator whitespace cleanup
A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.
This was a hack to be able to mount ext4 filesystems read-only while not
supporting all the features. We now support all those features so it
doesn't make sense to keep the undocumented hack.
It is possible, for complex fork()/collapse situations, to have
sibling address spaces to partially share shadow chains. If one
sibling performs wiring, it can happen that a transient page, invalid
and busy, is installed into a shadow object which is visible to other
sibling for the duration of vm_fault_hold(). When the backing object
contains the valid page, and the wiring is performed on read-only
entry, the transient page is eventually removed.
But the sibling which observed the transient page might perform the
unwire, executing vm_object_unwire(). There, the first page found in
the shadow chain is considered as the page that was wired for the
mapping. It is really the page below it which is wired. So we unwire
the wrong page, either triggering the asserts of breaking the page'
wire counter.
As the fix, wait for the busy state to finish if we find such page
during unwire, and restart the shadow chain walk after the sleep.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14184
Modify ip6_get_prevhdr() to be able use it safely.
Instead of returning pointer to the previous header, return its offset.
In frag6_input() use m_copyback() and determined offset to store next
header instead of accessing to it by pointer and assuming that the memory
is contiguous.
In rip6_input() use offset returned by ip6_get_prevhdr() instead of
calculating it from pointers arithmetic, because IP header can belong
to another mbuf in the chain.
Reported by: Maxime Villard <max at m00nbsd dot net>
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D14158
Adrian Chadd [Mon, 5 Feb 2018 07:05:28 +0000 (07:05 +0000)]
[arswitch] Enable ATU dump support for the AR9340.
This indeed uses the same registers as the AR8216 and later chips.
There seems to be an issue with ARP requests being sent out from the CPU
through this switch here, so figuring that out is next. Learning works fine on
the AR8327 ethernet switch on the /other/ gigabit ethernet port, so I don't
think it's the network stack or ethernet driver.
Marius Strobl [Mon, 5 Feb 2018 00:18:21 +0000 (00:18 +0000)]
Flesh out the creation of sparc64 UFS images. This has only been verified
to yield working images in a native build as rootgen.sh generally doesn't
support cross-testing so far.
psm(4): Fix panic occuring soon after PS/2 packet has been rejected by
synaptics or elantech sanity checker.
After packet has been rejected contents of packet buffer is not cleared
with setting of inputbytes counter to 0. So when this packet buffer is
filled again being an element of circular queue, new data appends to old
data rather than overwrites it. This leads to packet buffer overflow
after 10 rounds.
Fix it with setting of packet's inputbytes counter to 0 after rejection.
Dimitry Andric [Sun, 4 Feb 2018 20:33:47 +0000 (20:33 +0000)]
Bump clang's __FreeBSD_cc_version, to cope with r328816, which removed
-Wno-error=tautological-constant-compare again (this flag is now out of
-Wextra after upstream https://reviews.llvm.org/rL322901). Otherwise
the MK_SYSTEM_COMPILER logic will not build a cross-tools compiler.
Justin Hibbits [Sun, 4 Feb 2018 20:07:08 +0000 (20:07 +0000)]
Only look for L2 cache controllers for mpc85xx_cache
The L3 cache controller (Corenet Platform Cache) is listed with one of its
compatible strings as "cache", which this driver can't attach to. Restrict
to a known list of primary cache controller strings, as found in the l2cache
devicetree binding.