Michal Meloun [Sat, 21 Oct 2017 12:06:18 +0000 (12:06 +0000)]
Make elf_aux_info() as public libc function.
- Teach elf aux vector functions about newly added AT_HWCAP and AT_HWCAP2
vectors.
- Export _elf_aux_info() as new public libc function elf_aux_info(3)
The elf_aux_info(3) should be considered as FreeBSD counterpart of glibc
getauxval() with more robust interface.
Note:
We cannot name this new function as getauxval(), with glibc compatible
interface. Some ports autodetect its existence and then expects that all
Linux specific AT_<*> vectors are defined and implemented.
Michal Meloun [Sat, 21 Oct 2017 12:05:01 +0000 (12:05 +0000)]
Add AT_HWCAP2 ELF auxiliary vector.
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
- expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
same way as for AT_HWCAP.
Bjoern A. Zeeb [Fri, 20 Oct 2017 21:40:59 +0000 (21:40 +0000)]
With r181803 on 2008-08-17 23:27:27Z the first VIMAGE commit went into
HEAD. Enable VIMAGE in GENERIC kernels and some others (where GENERIC does
not exist) on HEAD.
Disable building LINT-VIMAGE with VIMAGE being default.
This should give it a lot more exposure in the run-up to 12 to help
us evaluate whether to keep it on by default or not.
We are also hoping to get better performance testing.
The feature can be disabled using nooptions.
Requested by: many
Reviewed by: kristof, emaste, hiren
X-MFC after: never
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D12639
Mark Johnston [Fri, 20 Oct 2017 14:56:13 +0000 (14:56 +0000)]
Avoid the nbp lookup in the final loop iteration in flushbuflist().
The end of the loop must re-lookup the next buf since the bufobj lock
is dropped in the loop body. If the lookup fails, the loop is restarted.
This mechanism non-obviously also terminates the loop when the end of
the buf list is reached. Split up the two loops termination cases to
make the code a bit less fragile. No functional change intended.
If filesystem block size is less than the page size, it is possible
that the page-out run contains partially clean pages. E.g., the chunk
of the page might be bdwrite()-ed, or some thread performed bwrite()
on a buffer which references a chunk of the paged out page. As
result, the assertion added in r319975, which checked that all pages
in the run are dirty, does not hold on such filesystems.
One solution is to remove the assert, but it is undesirable, because
we do overwrite the valid on-disk content. I cannot provide a scenario
where such write would corrupt the file data, but I do not like it on
principle. Another, in my opinion proper, solution is to only write
parts of the pages still marked dirty. The patch implements this, it
skips clean blocks and only writes the dirty block runs.
Note that due to clustering, write one page might clean other pages in
the run, so the next write range must be calculated only after the
current range is written out.
More, due to a possible invalidation, and the fact that the object
lock is dropped and reacquired before the checks, it is possible that
the whole page-out pages run appears to consist of only clean pages.
For this reason, it is impossible to assert that there is some work
for the pageout method to do (i.e. assert that there is at least one
dirty page in the run). But such clearing can only occur due to
invalidation, and not due to a parallel write, because we own the
vnode lock exclusive.
Reported by: fsu
In collaboration with: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D12668
The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore
Mateusz Guzik [Fri, 20 Oct 2017 03:38:58 +0000 (03:38 +0000)]
amd64: __exclusive_cache_line pv_chunks_mutex and pv_list_locks
Note that pv_list_locks is an array and currently it fits 2 locks per line.
Resizing it and/or putting more locks in different lines requires several tests.
Mateusz Guzik [Fri, 20 Oct 2017 00:30:35 +0000 (00:30 +0000)]
mtx: clean up locking spin mutexes
1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing
Ed Maste [Thu, 19 Oct 2017 16:40:17 +0000 (16:40 +0000)]
psci: change bootverbose string to 'PSCI 0.2 compatible'
Prior to r324754 we treated PSCI 0.2 and 1.0 as identical, and r324754
extended that to include all PSCI 1.x revisions. Change the string
emitted under bootverbose to reference '0.2 compatible' to avoid
confusion when the system includes a later PSCI rev.
Discussed with: andrew
Sponsored by: The FreeBSD Foundation
Andriy Gapon [Thu, 19 Oct 2017 16:36:07 +0000 (16:36 +0000)]
remove spa_sync_on assert from spa_async_thread_vd
Unlike spa_async_thread that can get started only from spa_sync()
spa_async_thread_vd can get started from other contexts.
Additionally, spa_async_thread_vd does not really depend on
spa sync being enabled.
The incorrect assert could be triggered by importing a pool in the
read-only mode and then disconnecting one of its disks.
In this case spa_sync_on was false because the pool was read-only
and spa_async_thread_vd was started to handle SPA_ASYNC_REMOVE event.
Note: spa_async_thread_vd() currently exists only in FreeBSD, it was
split out of spa_async_thread() in r253990.
Andrew Turner [Thu, 19 Oct 2017 13:22:52 +0000 (13:22 +0000)]
Allow later PSCI revisions to also work. The latest ARM Trusted Firmware
reports version 1.1 so the check was failing. As thjis is a minor change
from 1.0, and future 1.x revisions are also expected to be backwards
compatible just ignore the minor revision in the init handler.
Alexander Motin [Thu, 19 Oct 2017 09:01:15 +0000 (09:01 +0000)]
Relax per-ifnet cif_vrs list double locking in carp(4).
In all cases where cif_vrs list is modified, two locks are held: per-ifnet
CIF_LOCK and global carp_sx. It means to read that list only one of them
is enough to be held, so we can skip CIF_LOCK when we already have carp_sx.
This fixes kernel panic, caused by attempts of copyout() to sleep while
holding non-sleepable CIF_LOCK mutex.
Alan Cox [Thu, 19 Oct 2017 04:13:47 +0000 (04:13 +0000)]
Batch atomic updates to the number of active, inactive, and laundry
pages by vm_object_terminate_pages(). For example, for a "buildworld"
workload, this batching reduces vm_object_terminate_pages()'s average
execution time by 12%. (The total savings were about 11.7 billion
processor cycles.)
Justin Hibbits [Thu, 19 Oct 2017 03:38:53 +0000 (03:38 +0000)]
Add some more devices to the MPC85XX-based configs
These devices bring the configs closer to a desktop-like (GENERIC) kernel
config.
* The Freescale DIU support was added to the config in r306358.
Without keyboard support video support is nearly pointless, so add ukbd and
ums.
* The AmigaOne X5000, and P1022 devboard, both use a variant of the ds1307 RTC
* cpufreq scaling is currently supported by the p1022. More SoCs will be added
eventually.
Cy Schubert [Thu, 19 Oct 2017 03:17:50 +0000 (03:17 +0000)]
Anticongestion refinements for ntpd rc script. This reverts r324681
and checks if ntp leapfile needs fetching before entering into the
anticongestion sleep.
Unfortunately some ports still use their own sleeps so, this commit
doesn't address the complete problem which is compounded by every
port that uses its own anticongestion mechanism.
Mateusz Guzik [Thu, 19 Oct 2017 01:38:31 +0000 (01:38 +0000)]
sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE
The previous limit of just one page is hit by ps.
The entire mechanism should be reworked, if not whacked. It seems the intent
is to reduce kernel dos-ability - some handlers wire the amount of memory
passed here. Handlers should probably stop wiring in the first place or in
the worst case indicate they are doing so so that the check is done only if
necessary. It should also probably be a counter, not a lock.
Mateusz Guzik [Wed, 18 Oct 2017 22:00:44 +0000 (22:00 +0000)]
Don't take Giant for SMP status and cpu topology sysctls.
Not only this lock doesn't play any role here, dirtying it slows down
other things a little bit as giant-held checks (e.g. DROP_GIANT) are
spread all over the kernel.
Ryan Libby [Wed, 18 Oct 2017 19:28:28 +0000 (19:28 +0000)]
ql*_def.h: fix QL_ALIGN parenthesization
QL_ALIGN is a set of copies of roundup2, but it was missing an outer set
of parentheses, which began to matter with r324538. Now, fully copy the
parenthesization of roundup2.
Ed Schouten [Wed, 18 Oct 2017 19:22:53 +0000 (19:22 +0000)]
Import the latest CloudABI definitions, version 0.16.
The most important change in this release is the removal of the
poll_fd() system call; CloudABI's equivalent of kevent(). Though I think
that kqueue is a lot saner than many of its alternatives, our
experience is that emulating this system call on other systems
accurately isn't easy. It has become a complex API, even though I'm not
convinced this complexity is needed. This is why we've decided to take a
different approach, by looking one layer up.
We're currently adding an event loop to CloudABI's C library that is API
compatible with libuv (except when incompatible with Capsicum).
Initially, this event loop will be built on top of plain inefficient
poll() calls. Only after this is finished, we'll work our way backwards
and design a new set of system calls to optimize it.
Interesting challenges will include integrating asynchronous I/O into
such a system call API. libuv currently doesn't aio(4) on Linux/BSD, due
to it being unreliable and having undesired semantics.
John Baldwin [Wed, 18 Oct 2017 17:23:16 +0000 (17:23 +0000)]
Remove CPU_HAVEFPU.
Instead, use a runtime decision to handle COP1 traps. If floating point
support is present in the current CPU, enable saving of the floating point
state. If support is not present, fail with SIGILL.
Mark Johnston [Wed, 18 Oct 2017 15:38:05 +0000 (15:38 +0000)]
Move kernel dump offset tracking into MI code.
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.
This is needed to implement support for kernel dump compression in the
MI kernel dump code.
Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.
Now that OBJS has grown an OBJS_SRCS_FILTER variable, use this variable
in the computation of BCOBJS and LLOBJS too. Also move BCOBJS and LLOBJS
computation to be next to the OBJS computation: this should both make
the parallel structure clearer and serve to remind people changing OBJS
that parallel changes are required in BCOBJS and LLOBJS.
A side effect of this change is that BCOBJS and LLOBJS will be available
even when LLVM_LINK has not been defined, but that seems like a positive
change: there's no reason we can't ask "what bitcode files would you
generate" just because we can't link those files together into a
complete bitcode representation of a binary or library.
Improve logic of CLEANFILES+=${PROG_FULL}.{bc,ll}.
The build rule describing how to create ${PROG_FULL}.{bc,ll} is only
dependent on LLVM_LINK being defined, not on MK_DEBUG_FILES being "yes".
Move the addition of ${PROG_FULL}.{bc,ll} out of the conditional block
under `.if ${MK_DEBUG_FILES} != "no"` and up next to where the build
rules for ${PROG_FULL}.{bc,ll} are defined.
Warner Losh [Tue, 17 Oct 2017 23:38:27 +0000 (23:38 +0000)]
Revert "Unify boot1 with loader" change r324646
Back out the unification commit to boot1. There's some issues on the
arm and arm64 platforms that need to be addressed with code
changes. There's also a discussion on arch@ about the future of
boot1.efi vs just using loader.efi that needs to play out. So take a
pause on these changes until the arm issues can be fixed and it's
clear boot1.efi will survive into FreeBSD 12.
Ed Maste [Tue, 17 Oct 2017 21:13:26 +0000 (21:13 +0000)]
embed_mfs: add error handling, usage
Ensure that we are called with two arguments, and that the output file
is writable. Also, if we cannot find the mfs section report the output
file name rather than "kernel", as this script may be used with other
than kernels.
Ryan Libby [Tue, 17 Oct 2017 20:37:31 +0000 (20:37 +0000)]
cxgbe: delete now-redundant vnet decls
r324539 gathered some vnet decls into netinet/tcp_var.h, so that they
are now redundant in dev/cxgbe/tom/{t4_cpl_io.c,t4_ddp.c}. This triggers
gcc -Wredundant-decls.
Mark Johnston [Tue, 17 Oct 2017 19:41:45 +0000 (19:41 +0000)]
Fix a racy VI_DOOMED check in MNT_VNODE_FOREACH_ALL().
MNT_VNODE_FOREACH_ALL() is supposed to avoid returning doomed vnodes,
but the VI_DOOMED check it used was done without the vnode interlock
held, so it could race with a concurrent vgone().
Ed Maste [Tue, 17 Oct 2017 19:11:29 +0000 (19:11 +0000)]
loader.mk: clean md.o even if MD_IMAGE_SIZE not defined
We don't normally provide special handling for optionally-included src
files, but md.o depends on both md.c and the value of ${MD_IMAGE_SIZE}.
Previously if one built with MD_IMAGE_SIZE, executed "make clean", and
then built with a different MD_IMAGE_SIZE md.o would not be rebuilt.
Reported by: Zakary Nafziger
Sponsored by: The FreeBSD Foundation
Gordon Tetlow [Tue, 17 Oct 2017 17:22:36 +0000 (17:22 +0000)]
Update wpa_supplicant/hostapd for 2017-01 vulnerability release.
hostapd: Avoid key reinstallation in FT handshake
Prevent reinstallation of an already in-use group key
Extend protection of GTK/IGTK reinstallation of WNM-Sleep Mode cases
Fix TK configuration to the driver in EAPOL-Key 3/4 retry case
Prevent installation of an all-zero TK
Fix PTK rekeying to generate a new ANonce
TDLS: Reject TPK-TK reconfiguration
WNM: Ignore Key Data in WNM Sleep Mode Response frame if no PMF in use
WNM: Ignore WNM-Sleep Mode Response if WNM-Sleep Mode has not been used
WNM: Ignore WNM-Sleep Mode Response without pending request
FT: Do not allow multiple Reassociation Response frames
TDLS: Ignore incoming TDLS Setup Response retries
Andriy Gapon [Tue, 17 Oct 2017 16:03:59 +0000 (16:03 +0000)]
never retry oustanding requests when terminating iscsi session
CAM_REQ_ABORTED sounds natural for aborting outstanding requests when
tearing down a session, but that status actually causes eligible
requests to be tried again. That's completely useless, so let's use
CAM_DEV_NOT_THERE instead. Perhaps there is a better status, but this
should be good enough. The change should affect only the session
termination.
Andriy Gapon [Tue, 17 Oct 2017 15:39:38 +0000 (15:39 +0000)]
iscsi: do not hold the global lock while tearing down a session
It should be sufficient to hold the lock just for removing the session
from the session list. Everything else should be covered by the session
specific lock.
On top of that, at present we can get a deadlock caused by waiting on
the CAM SIM reference count while holding the global lock. A specific
scenario involving ZFS is this:
- concurrent termination of two sessions, S1 and S2
- session S1 completed all I/Os and sleeps in CAM waiting for device
close by ZFS;
- session S2 is also dead now, but can not forcefully complete
outstanding requests by calling iscsi_session_cleanup() from
iscsi_maintenance_thread_terminate(), since it can't get the same
global sc_lock;
- as soon as there are unfinished requests, ZFS can not do
spa_config_enter() as writer, and so can not close the device for
session S1;
- deadlock.
Ed Maste [Tue, 17 Oct 2017 02:51:45 +0000 (02:51 +0000)]
write.2: correct maximum nbytes size for EINVAL error
In FreeBSD 11 and later debug.iosize_max_clamp defaults to 0, and the
maximum nbytes count for write(2) is SSIZE_MAX. Update the man page to
document this, and mention the sysctl that can be set to obtain the
previous behaviour.
PR: 196666
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Ryan Libby [Tue, 17 Oct 2017 01:12:17 +0000 (01:12 +0000)]
gdb kernel server: fixup Search:memory style
This is a NFC patch to move around the Search:memory implementation so
that it doesn't exceed the standard column width and doesn't take so
much vertical space in gdb_trap.
Rick Macklem [Mon, 16 Oct 2017 23:28:12 +0000 (23:28 +0000)]
Use taskqueue(9) to do writes/commits to mirrored DSs concurrently.
When the NFSv4.1 pNFS client is using a Flexible File Layout specifying
mirrored Data Servers, it must do the writes and commits to all mirrors.
This patch modifies the client to use a taskqueue to perform these writes
and commits concurrently.
The number of threads can't be changed for taskqueue(9), so it is set
to 4 * mp_ncpus by default, but this can be overridden by setting the
sysctl vfs.nfs.pnfsiothreads.
- Fix it by replacing m_cat() with m_prev->m_next = m_new
(m_cat() will try to append data - as a result, there will be no
fragmentation).
- Move some constants out of the loop.
Re-evaluate thread' signal mask after ptracestop().
The stop drops process lock, which allows the signal mask to be
changed and our selected signal might become blocked, i.e. should be
returned to the process queue instead of delivery.
Also, for the existing check of the process no longer having an
attached debugger, we should not loose the signal, but requeue it.
Reported and tested by: bdrewery
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Romain Tartière [Mon, 16 Oct 2017 17:21:52 +0000 (17:21 +0000)]
Add a quick description of the geom_getxml(3), geom_xml2tree(3),
geom_gettree(3) and geom_deletetree(3) functions provided by libgeom and are
not documented in libgeom(3).
Matt Joras [Mon, 16 Oct 2017 16:14:50 +0000 (16:14 +0000)]
Properly reset the fields in clean_unrhdr.
In r324542 I neglected to reset the first and last fields of struct
unrhdr. This causes a tmpfs to fail the unr(9) consistency checks with
DIAGNOSTIC on. Fix this by resetting the fields by calling init_unrhdr.
While here, change a loop to use TAILQ_FOREACH_SAFE since it is more
readable and equally fast.
Fix the pv_chunks pc_lru tailq handling in reclaim_pv_chunk().
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid. Then the pmap lock is
released, which allows for other thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.
Similarly, the chunk is inserted into the private tailq new_tail
temporary. Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.
This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.
As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration. We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.
Note that the fix lost an optimization which was present in the
previous algorithm. Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation. An additional change is planned which would improve
this.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Kristof Provost [Mon, 16 Oct 2017 15:03:45 +0000 (15:03 +0000)]
pf tests: Basic IPv6 forwarding tests
Pass/block packets in the forwarding path with pf.
Introduce the pft_set_rules() helper function, because we need to
remember to flush states between individual tests. If not we can get
packets passing despite rules blocking them because they match states
created in a previous test.
Extend pft_ping.py to be able to send IPv6 echo requests.