imp [Thu, 16 Feb 2017 21:56:46 +0000 (21:56 +0000)]
Remove references to EISA support from the vx driver, along with EISA
support. Fix a comment block that's shared with both vx and ep. Remove
obsolete refernce to statically compiling a kernel with a fixed number
of vx devices. Have not removed EISA from the title of the document
the register definitions were originally derived from (though no doubt
more recent docments were also consulted).
rstone [Thu, 16 Feb 2017 21:18:31 +0000 (21:18 +0000)]
Revert r313814 and r313816
Something evidently got mangled in my git tree in between testing and
review, as an old and broken version of the patch was apparently submitted
to svn. Revert this while I work out what went wrong.
vangyzen [Thu, 16 Feb 2017 20:47:41 +0000 (20:47 +0000)]
Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.
vangyzen [Thu, 16 Feb 2017 20:44:44 +0000 (20:44 +0000)]
pf: use inet_ntoa_r() instead of inet_ntoa(); maybe fix IPv6 OS fingerprinting
inet_ntoa() cannot be used safely in a multithreaded environment
because it uses a static local buffer. Instead, use inet_ntoa_r()
with a buffer on the caller's stack.
This code had an INET6 conditional before this commit, but opt_inet6.h
was not included, so INET6 was never defined. Apparently, pf's OS
fingerprinting hasn't worked with IPv6 for quite some time.
This commit might fix it, but I didn't test that.
Reviewed by: gnn, kp
MFC after: 2 weeks
Relnotes: yes (if I/someone can test pf OS fingerprinting with IPv6)
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9625
rstone [Thu, 16 Feb 2017 20:06:21 +0000 (20:06 +0000)]
Fix a typo in my previous commit
Somehow in the late stages of testing my sched_ule patch, a character was
accidentally deleted from the file. Correct this.
While I'm committing anyway, the previous commit message requires some
clarification: in the normal case of unlending priority after releasing
a mutex, the thread that was doing the lending will be woken up and
immediately become the highest-priority thread, and in that case no
priority inversion would take place. However, if that thread is pinned
to a different CPU, then the currently running thread that just had its
priority lowered will not be preempted and then priority inversion can
occur.
Reported by: O. Hartmann (typo), jhb (scheduler clarification)
MFC after: 1 month
Pointy hat to: rstone
rstone [Thu, 16 Feb 2017 19:41:13 +0000 (19:41 +0000)]
Check for preemption after lowering a thread's priority
When a high-priority thread is waiting for a mutex held by a
low-priority thread, it temporarily lends its priority to the
low-priority thread to prevent priority inversion. When the mutex
is released, the lent priority is revoked and the low-priority
thread goes back to its original priority.
When the priority of that thread is lowered (through a call to
sched_priority()), the schedule was not checking whether
there is now a high-priority thread in the run queue. This can
cause threads with real-time priority to be starved in the run
queue while the low-priority thread finishes its quantum.
Fix this by explicitly checking whether preemption is necessary
when a thread's priority is lowered.
https://www.illumos.org/issues/7500
With the integration of:
commit 0f6d88aded0d165f5954688a9b13bac76c38da84
Author: Alex Reece <alex@delphix.com>
Date: Sat Jul 26 13:40:04 2014 -0800
4873 zvol unmap calls can take a very long time for larger datasets
the dnode's dn_bufs field was changed from a list to a tree. As a result,
the dn_unlisted_l0_blkid field is no longer necessary.
Author: Stephen Blinick <stephen.blinick@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
ed [Thu, 16 Feb 2017 06:52:53 +0000 (06:52 +0000)]
Remove unnecessary #includes from the kqueue(2) man page.
Now that <sys/event.h> can be included on its own, adjust the manual
page accordingly. Remove both unnecessary #include statements from the
synopsis and the example code.
While there, also add a note to the BUGS section to mention that
previous versions of this header file still depend on <sys/types.h>.
mjg [Wed, 15 Feb 2017 23:33:14 +0000 (23:33 +0000)]
rwlock: tidy up r313392
While a new bit was added and thread alignment got shifted to accomodate it,
RW_READERS_SHIFT was not modified accordingly and clashed with the new flag.
This was surprisingly harmless. If the lock was taken for writing, other flags
were tested. If the lock was taken for reading, it would correctly work for
readers > 1 and this was the only relevant test performed.
imp [Wed, 15 Feb 2017 23:04:25 +0000 (23:04 +0000)]
Remove Micro Channel Architecture support. Of the commonly available
machines, only a few 486 machines that used it, and those haven't had
enough memory to run FreeBSD for quite some time (often limited to
16MB).
Not to be confused with the Machine Check Architecture, which is still
very much alive and used (and untouched by this commit).
mav [Wed, 15 Feb 2017 19:46:00 +0000 (19:46 +0000)]
Fix handling of negative sbspace() return values.
I found that at least with Chelsio NICs TOE sockets quite often report
negative sbspace() values. Using unsigned variable to store it resulted
in attempts to aggregate too much data in one sosend() call, that caused
errors and following connection termination.
trasz [Wed, 15 Feb 2017 16:52:21 +0000 (16:52 +0000)]
Change the "devfs_fsync: vop_stdfsync failed" from panic to a printf.
It's not a proper fix, but should be better than what we have now.
Since it got broken some six months ago it results in an incredibly
annoying and trivially reproducible panic every time eg an USB disk
gets disconnected.
emaste [Wed, 15 Feb 2017 15:32:29 +0000 (15:32 +0000)]
localtime: return NULL if time_t out of range of struct tm
Previously we would truncate tm.tm_year for any time_t corresponding to
a year that does not fit in int. This issue was discovered because it
caused the bash-static build to fail when linking with LLD.
As reported by Rafael EspĂndola:
Configure has
AC_FUNC_MKTIME
which expands to a test of mktime that fails with the freebsd
implementation. Given that, bash compiles a mktime.o file that
defines just mktime and uses localtime. That goes in a .a file
that is before libc.
The freebsd libc defines mktime in localtime.o, which also defines
localtime among other functions.
When lld sees an undefined reference to mktime from libc, it uses
the bash provided one and then tries to find a definition of
localtime. It is found on libc's localtime.o, but now we have a
duplicated error.
The reason it works with bfd is that bash doesn't use mktime
directly and the undefined reference from libc is resolved to the
libc implementation. It would also fail to link if bash itself
directly used mktime.
The bash-static configure test verifies that, for many values of t, either
localtime(t) returns NULL or mktime(localtime(t)) == t. This test failed
when localtime returned a truncated tm_year.
This was fixed in tzcode in 2004 but has persisted in our tree since
rS2708.
Reported by: Rafael EspĂndola
Reviewed by: bapt
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D9534
andrew [Wed, 15 Feb 2017 14:56:47 +0000 (14:56 +0000)]
Load the new sp_el0 with interrupts disabled in fork_trampoline. If an
interrupt arrives in fork_trampoline after sp_el0 was written we may then
switch to a new thread, enter userland so change this stack pointer, then
return to this code with the wrong value. This fixes this case by moving
the load of sp_el0 until after interrupts have been disabled.
Reported by: Mark Millard (markmi@dsl-only.net)
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D9593
royger [Wed, 15 Feb 2017 14:34:40 +0000 (14:34 +0000)]
bxe: enable usage with NetXtreme II BCM57840 2x20GbE chip
Current bxe probe function won't attach to devices with the NetXtreme II
BCM57840 2x20GbE chip, enable it by adding it's chip ID to the list of
supported chips.
Tested on: HP ProLiant WS460c Gen9
Reviewed by: gnn
MFC after: 1 week
Sponsored by: Citrix Systems R&D
Differential Revision: https://reviews.freebsd.org/D9609
andrew [Wed, 15 Feb 2017 13:56:04 +0000 (13:56 +0000)]
Port the Linux AMX 10G network driver to FreeBSD as axgbe. It is unlikely
we will import a newer version of the Linux code so the linuxkpi was not
used.
This is still missing 10G support, and multicast has not been tested.
gonzo [Wed, 15 Feb 2017 02:52:43 +0000 (02:52 +0000)]
[psm] Fix calculation for clickpad softbuttons at the top
On laptops like the ThinkPad X240, ClickPad buttons are located at the
top. The hw.psm.synaptics.softbuttons_y sysctl was supposed to allow this
by setting the value to a negative one (e.g. -1700). However, the
condition was wrong (double negative), and doing that placed the buttons
in an unreachable area.
markj [Wed, 15 Feb 2017 01:50:58 +0000 (01:50 +0000)]
Apply MADV_FREE to exec_map entries only after a lowmem event.
This effectively provides the same benefit as applying MADV_FREE inline
upon every execve, since the page daemon invokes lowmem handlers prior to
scanning the inactive queue. It also has less overhead; the cost of
applying MADV_FREE is very noticeable on many-CPU systems since it includes
that of a TLB shootdown of global PTEs. For instance, this change nearly
halves the system CPU usage during a buildkernel on a 128-vCPU EC2
instance (with some other patches applied).
avg [Tue, 14 Feb 2017 22:46:39 +0000 (22:46 +0000)]
mca: use time_uptime instead of ticks for CMCI throttling
This solves several problems.
First of all, cmc_throttle is specified in seconds and there was no
conversion between ticks and seconds when they were mixed together.
Second, we avoid potential problems with ticks wrapping around.
Resolution of time_uptime should be sufficient for the throttling
purposes.
markj [Tue, 14 Feb 2017 21:51:19 +0000 (21:51 +0000)]
Register nss_atexit() before parsing nsswitch.conf for the first time.
NSS modules are loaded when nsswitch.conf is parsed and may register their
own atexit handlers with libc. nss_atexit() unloads any dynamically loaded
NSS modules, so it should run only after the modules' atexit handlers have
been invoked.
dchagin [Tue, 14 Feb 2017 19:13:27 +0000 (19:13 +0000)]
Replace Linuxulator implementation of readdir(), getdents() and
getdents64() with wrapper over kern_getdirentries().
The patch was originally written by emaste@ and then adapted by trasz@
and me.
Note:
1. I divided linux_getdents() and linux_readdir() as in case when the
getdents() called with count = 1 (readdir() case) it can overwrite
user stack (by writing to user buffer pointer more than 1 byte).
2. Linux returns EINVAL in case when user supplied buffer is not enough
to contain fetched dirent.
3. Linux returns ENOTDIR in case when fd points to not a directory.
avg [Tue, 14 Feb 2017 17:49:08 +0000 (17:49 +0000)]
add svcpool_close to handle killed nfsd threads
This patch adds a new function to the server krpc called
svcpool_close(). It is similar to svcpool_destroy(), but does not free
the data structures, so that the pool can be used again.
This function is then used instead of svcpool_destroy(),
svcpool_create() when the nfsd threads are killed.
kib [Tue, 14 Feb 2017 17:44:30 +0000 (17:44 +0000)]
Add RLIM_SAVED_MAX and RLIM_SAVED_CUR symbols.
Define them as RLIM_INFINITY. This is allowed by POSIX in case all
resource limits are representable in an object of type rlim_t. Since
we do not allow negative rlim_t, with some strength this definition is
conforming.
We are not conforming fully still because POSIX requires rlim_t to be
unsigned type. Fixing this without breaking ABI to redefine
RLIM_INFINITY is impossible.
badger [Tue, 14 Feb 2017 17:13:23 +0000 (17:13 +0000)]
sleepq_catch_signals: do thread suspension before signal check
Since locks are dropped when a thread suspends, it's possible for another
thread to deliver a signal to the suspended thread. If the thread awakens from
suspension without checking for signals, it may go to sleep despite having
a pending signal that should wake it up. Therefore the suspension check is
done first, so any signals sent while suspended will be caught in the
subsequent signal check.
mav [Tue, 14 Feb 2017 16:33:42 +0000 (16:33 +0000)]
Do not rely on data alignment after m_pullup().
In general case m_pullup() does not really guarantee any data alignment.
Instead of depenting on side effects caused by data being always copied
out of mbuf cluster (which is probably a bug by itself), always allocate
aligned BHS buffer and read data there directly from socket.
While there, reuse new icl_conn_receive_buf() function to read digests.
The code could probably be even more optimized to aggregate those reads,
but until that done, this is still easier then the way it was before.
avg [Tue, 14 Feb 2017 13:54:05 +0000 (13:54 +0000)]
try to fix RACCT_RSS accounting
There could be a race between the vm daemon setting RACCT_RSS based on
the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to
zero. In that case we can get a zombie process with non-zero RACCT_RSS.
If the process is jailed, that may break accounting for the jail.
There could be other consequences.
Fix this race in the vm daemon by updating RACCT_RSS only when a process
is in the normal state. Also, make accounting a little bit more
accurate by refreshing the page resident count after calling
vm_pageout_map_deactivate_pages().
Finally, add an assert that the RSS is zero when a process is reaped.
bz [Tue, 14 Feb 2017 01:20:03 +0000 (01:20 +0000)]
Use %s __func__ to print the actual function name (been looking at
the wrong one for too often lately at first), and also use %#lx to
get the 0x prefix for the address.
gonzo [Tue, 14 Feb 2017 00:04:36 +0000 (00:04 +0000)]
[sdhci_acpi] Add support for Bay Trail SDHC SD card slot
Add ACPI device 80860F14 with _UID 3 to the list of known devices. It
make SD card available on NUCs and Minnowboard. Previously added _UID 1
covered only eMMC devices.
dim [Mon, 13 Feb 2017 20:56:53 +0000 (20:56 +0000)]
Fix build of BSD dtc when NDEBUG is defined (MK_ASSERT_DEBUG=no):
* Initialize correct parent in binary_operator's constructor.
* Include <errno.h> explicitly, otherwise errno is undefined (without
NDEBUG, this is accidentally 'fixed' by including <iostream>).
mav [Mon, 13 Feb 2017 20:36:28 +0000 (20:36 +0000)]
Remove M_PKTHDR from m_getm2() in icl_pdu_append_data().
ip_data_mbuf is always appended to ip_bhs_mbuf, so it does not need own
packet header. This change first avoids allocation/initialization of the
header, and then avoids dropping one when it later gets to socket buffer.
ed [Mon, 13 Feb 2017 19:00:09 +0000 (19:00 +0000)]
Make <sys/event.h> work on its own.
Right now this header file doesn't want to build when included on its
own, as it depends on some integer types that are only declared
internally. Switch it over to use <sys/_types.h> and the __*
counterparts.
ae [Mon, 13 Feb 2017 11:37:52 +0000 (11:37 +0000)]
Remove IPsec related PCB code from SCTP.
The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.
SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, becasue we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
becasue SCTP uses own locks to protect inpcb.
To fix these problems remove IPsec related PCB code from SCTP.
This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will be not applicable to SCTP sockets. To be able correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.
kib [Mon, 13 Feb 2017 09:04:38 +0000 (09:04 +0000)]
Rework r313352.
Rename kern_vm_* functions to kern_*. Move the prototypes to
syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as
needed, to avoid headers pollution.
Requested by: alc, jhb
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9535
avos [Mon, 13 Feb 2017 02:15:56 +0000 (02:15 +0000)]
iwi: add 12-14 2GHz channels into channel list.
Return full channel list via iwi_getradiocaps() method
(ieee80211_init_channels() was replaced with iwi_getradiocaps()
to be consistent with other drivers).
PR: 216923
Submitted and tested by: ds@ukrhub.net (original patch)
MFC after: 5 days
sbruno [Sun, 12 Feb 2017 23:06:41 +0000 (23:06 +0000)]
Only trigger em_local_timer on queue index 0. This was causing continuous
em_local_timer() executions during normal operation and was very likely
to cause a lock up on igb(4) devices.
kib [Sun, 12 Feb 2017 21:05:44 +0000 (21:05 +0000)]
Consistently handle negative or wrapping offsets in the mmap(2) syscalls.
For regular files and posix shared memory, POSIX requires that
[offset, offset + size) range is legitimate. At the maping time,
check that offset is not negative. Allowing negative offsets might
expose the data that filesystem put into vm_object for internal use,
esp. due to OFF_TO_IDX() signess treatment. Fault handler verifies
that the mapped range is valid, assuming that mmap(2) checked that
arithmetic gives no undefined results.
For device mappings, leave the semantic of negative offsets to the
driver. Correct object page index calculation to not erronously
propagate sign.
In either case, disallow overflow of offset + size.
Update mmap(2) man page to explain the requirement of the range
validity, and behaviour when the range becomes invalid after mapping.
Reported and tested by: royger (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
dchagin [Sun, 12 Feb 2017 15:22:50 +0000 (15:22 +0000)]
Fix r313284.
Members of the syscall argument structures are padded to a word size. So,
for COMPAT_LINUX32 we should convert user supplied system call arguments
which is 32-bit in that case to the array of register_t.