ae [Wed, 26 Nov 2014 17:44:49 +0000 (17:44 +0000)]
Do not use xform_ipip as decapsulation fallback.
xform_ipip was used as fallback with low priority for IPIP
encapsulated packets that were decrypted. In some cases
it can decapsulate packets, that it shouldn't. This leads to situations,
when wrong configurations are magically working. Also it can propagate
wrong ingress interface and this can break security.
Now we redesigned the IPSEC code and IPIP encapsulation is called directly
from ipsec_output, and decapsulation is done in the ipsec_input with m_striphdr.
kib [Wed, 26 Nov 2014 14:10:00 +0000 (14:10 +0000)]
The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason of this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.
- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().
- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().
- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().
The split is done mostly for code clarity, and should not affect
scalability.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Wed, 26 Nov 2014 14:09:04 +0000 (14:09 +0000)]
Fix SA_SIGINFO | SA_RESETHAND handling. The sysent' sv_sendsig()
method needs pre-reset state of the ps_siginfo to correctly construct
signal frame.
Move sigdflt() call after the sv_sendsig() invocation in postsig().
Simultaneously extract common code from trapsignal() and postsig()
into new helper postsig_done().
bapt [Tue, 25 Nov 2014 22:25:13 +0000 (22:25 +0000)]
Reduce overlinking
The framework now ensure by itself that pthread is added to the link chain
as the last component if linked to kerberos hence avoid with out any explicit
addition prevent issue like CVE-2014-8475
jamie [Tue, 25 Nov 2014 21:01:08 +0000 (21:01 +0000)]
In preparation for using clang's -Wcast-qual:
Use __DECONST (instead of my own attempted re-invention) for the iov
parameters to jail_get/set(2). Similarly remove the decost-ish hack
from execvp's argv, except the __DECONST is only added at very end.
While I'm at it, remove an unused variable and fix a comment typo.
delphij [Tue, 25 Nov 2014 20:59:22 +0000 (20:59 +0000)]
Reinstitate send() after syslogd restarts.
In r228193 the test of CONNPRIV have been moved to before the _usleep
and send in vsyslog(). When syslogd restarts, this would prevent the
message being logged after the disconnect/connect dance for
scenario #1.
mav [Tue, 25 Nov 2014 17:53:35 +0000 (17:53 +0000)]
Coalesce last data move and command status for read commands.
Make CTL core and block backend set success status before initiating last
data move for read commands. Make CAM target and iSCSI frontends detect
such condition and send command status together with data. New I/O flag
allows to skip duplicate status sending on later fe_done() call.
For Fibre Channel this change saves one of three interrupts per read command,
increasing performance from 126K to 160K IOPS. For iSCSI this change saves
one of three PDUs per read command, increasing performance from 1M to 1.2M
IOPS.
avg [Tue, 25 Nov 2014 15:24:05 +0000 (15:24 +0000)]
whitespace and cosmetic changes in callout_reset family of macros
- add parentheses around macro parameters for consistent style
- remove redundant parentheses around an expression
- use tab before a line continuation symbol
jhb [Tue, 25 Nov 2014 12:44:18 +0000 (12:44 +0000)]
Only pass 6 arguments to the 'run' function on amd64. amd64's
makecontext on FreeBSD only supports a maximum of 6 arguments. This
fixes the setcontext_link test on amd64.
des [Tue, 25 Nov 2014 09:47:15 +0000 (09:47 +0000)]
The fallback flag in nsdispatch prevents the fallback implementation of
getgroupmembership() from invoking the correct backend in the compat case.
Replace it with a nesting depth counter so it only blocks one level (the
first is the group -> group_compat translation, the second is the actual
backend). This is one of two bugs that break getgrouplist() in the compat
case, the second being that the backend's own getgroupmembership() method
is ignored. Unfortunately, that is not easily fixable without a redesign
of our nss implementation (which is also needed to implement the +@group
syntax in /etc/passwd).
markj [Tue, 25 Nov 2014 07:01:38 +0000 (07:01 +0000)]
Adjust some checks missed in r274637, now that pi_rname can be NULL.
Additionally fix a misparenthesization in the same check, noticed while
fixing the first bug. This bug only appears to cause problems if the same
USDT probe appears twice within a static function.
markj [Tue, 25 Nov 2014 06:43:17 +0000 (06:43 +0000)]
The module load address always needs to be included when setting the dm_*_va
fields of dt_module_t. Previously, this was only done on architectures where
kernel modules have type ET_REL; this change fixes that. As a result, symbol
name resolution in the stack() action now works properly for kernel modules
on i386.
rpaulo [Mon, 24 Nov 2014 21:49:40 +0000 (21:49 +0000)]
Import libgpio.
This is a thin wrapper around the kernel interface which should make
it easier to write GPIO applications. gpioctl(8) will be converted to
use this library in a separate commit.
bapt [Mon, 24 Nov 2014 21:31:08 +0000 (21:31 +0000)]
Implement LIBADD
LIBADD will automatically set DPADD and LDADD when needed including their
dependencies, LIBADD automatically handles private and internal libs so that
the end user Makefile does not have to care about it.
This allows to reduce overlinking on the base system leaving the framework get
the dependencies properly.
It also allows to built components binaries statically.
dim [Mon, 24 Nov 2014 20:07:09 +0000 (20:07 +0000)]
For now, disable using -fsanitize=bounds for the libc ssp tests, when
using clang 3.5.0, until the runtime support (via compiler-rt) is added.
Otherwise, this would lead to link errors about missing support
libraries.
jhb [Mon, 24 Nov 2014 19:55:45 +0000 (19:55 +0000)]
Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add
a new per-device '%domain' sysctl node that returns the NUMA domain a
device is associated with if it is associated with one.
Note that this API is still a WIP and might change before 11.0 actually
ships.
Differential Revision: https://reviews.freebsd.org/D930
Reviewed by: kib, adrian
ian [Mon, 24 Nov 2014 16:12:11 +0000 (16:12 +0000)]
Add busdma sync ops before reading and after modifying the descriptor rings.
This was previously working by accident because BUSDMA_COHERENT_MEMORY has
always been set to strongly-ordered on arm. Now we're moving towards
normal-uncacheable (what might be called write-combining on other platforms)
and using the proper sync ops will be more important. Of course, that
opens the question of just what is the "proper" sync op for shared
concurrent dma access as opposed to accesses where the handoff of control
of the memory has well-defined sequence points that match the available
busdma sync operations.
philip [Mon, 24 Nov 2014 14:00:27 +0000 (14:00 +0000)]
Add a sysctl `net.link.tap.deladdrs_on_close' to configure whether tap
should delete configured addresses and routes when the interface is
closed. Default is enabled (preserve current behaviour).
mav [Mon, 24 Nov 2014 11:37:27 +0000 (11:37 +0000)]
Replace home-grown CTL IO allocator with UMA.
Old allocator created significant lock congestion protecting its lists
of preallocated I/Os, while UMA provides much better SMP scalability.
The downside of UMA is lack of reliable preallocation, that could guarantee
successful allocation in non-sleepable environments. But careful code
review shown, that only CAM target frontend really has that requirement.
Fix that making that frontend preallocate and statically bind CTL I/O for
every ATIO/INOT it preallocates any way. That allows to avoid allocations
in hot I/O path. Other frontends either may sleep in allocation context
or can properly handle allocation errors.
On 40-core server with 6 ZVOL-backed LUNs and 7 iSCSI client connections
this change increases peak performance from ~700K to >1M IOPS! Yay! :)