Andriy Gapon [Thu, 19 Jan 2017 18:46:41 +0000 (18:46 +0000)]
fix a thread preemption regression in schedulers introduced in r270423
Commit r270423 fixed a regression in sched_yield() that was introduced
in earlier changes. Unfortunately, at the same time it introduced a
new regression. The problem is that SWT_RELINQUISH (6), like all other
SWT_* constants and unlike SW_* flags, is not a bit flag. So, (flags &
SWT_RELINQUISH) is true in cases where that was not really intended,
for example, with SWT_OWEPREEMPT (2) and SWT_REMOTEPREEMPT (11).
A straightforward fix would be to use (flags & SW_TYPE_MASK) ==
SWT_RELINQUISH, but my impression is that the switch types are designed
mostly for gathering statistics, not for influencing scheduling
decisions.
So, I decided that it would be better to check for SW_PREEMPT flag
instead. That's also the same flag that was checked before r239157.
I double-checked how that flag is used and I am confident that the flag
is set only in the places where we really have the preemption:
- critical_exit + td_owepreempt
- sched_preempt in the ULE scheduler
- sched_preempt in the 4BSD scheduler
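The difference between the three tests can be sketched in userland C. The SWT_* values are the ones quoted above; the SW_TYPE_MASK and SW_PREEMPT values and the helper function names are assumptions made for illustration, not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Switch types, with the values quoted in the commit message. */
#define SWT_OWEPREEMPT      2
#define SWT_RELINQUISH      6
#define SWT_REMOTEPREEMPT  11

/* Assumed layout: the low byte holds the type, SW_* bit flags sit above it. */
#define SW_TYPE_MASK  0xff
#define SW_PREEMPT    0x0400

/* The types are enumerated values, so unrelated types can share bits. */
static_assert((SWT_RELINQUISH & SWT_OWEPREEMPT) != 0,
    "SWT_RELINQUISH and SWT_OWEPREEMPT overlap in bit 1");

/* The regressed test: treats an enumerated type as if it were a bit flag. */
static bool
buggy_is_relinquish(int flags)
{
    return ((flags & SWT_RELINQUISH) != 0);
}

/* The straightforward fix: mask out the type and compare for equality. */
static bool
masked_is_relinquish(int flags)
{
    return ((flags & SW_TYPE_MASK) == SWT_RELINQUISH);
}

/* The approach chosen by the commit: test a genuine bit flag. */
static bool
is_preemption(int flags)
{
    return ((flags & SW_PREEMPT) != 0);
}
```

With these definitions, buggy_is_relinquish(SW_PREEMPT | SWT_OWEPREEMPT) is true even though the switch is a preemption, while masked_is_relinquish() rejects it and is_preemption() accepts it.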
Fix problem with suspend and resume when using Skylake chipsets. Make
sure the XHCI controller is reset after halting it. The problem is
clearly a BIOS bug as the suspend and resume is failing without
loading the XHCI driver. The same happens when using Linux and the
XHCI driver is not loaded.
Conrad Meyer [Thu, 19 Jan 2017 16:46:05 +0000 (16:46 +0000)]
ffs_vnops: Simplify extattr access
As suggested in r167010, use the structure type and macros to access and
modify UFS2 extended attributes. Add assertions that pointers are
aligned in places where we now access the data through a structure
pointer, instead of character-by-character.
Remove TMPFS_ASSERT_ELOCKED(). Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Alan Somers [Wed, 18 Jan 2017 20:24:37 +0000 (20:24 +0000)]
Fix several Coverity CIDs in devd
CID 1362055, 1362054: File descriptor leaks during shutdown
CID 1362013: Potential null-termination fail with long network device names
CID 1362097: Uncaught exception during memory pressure
CID 1362017, 1362016: Unchecked errors, possibly resulting in weird behavior
if two devd instances start at the same time.
CID 1362015: Unchecked error that will probably never fail
Conrad Meyer [Wed, 18 Jan 2017 17:55:49 +0000 (17:55 +0000)]
ufs/extattr.h: Fix documentation of ea_name termination
The ea_name string is not nul-terminated. Correct the documentation.
Because the subsequent field is padded to 8 bytes, and the padding is
zeroed, the ea_name string will appear to be nul-terminated whenever the
length isn't exactly one (mod eight).
This was introduced in r167010 (2007).
Additionally, mark the length fields as unsigned. This particularly
matters for the single byte ea_namelength field, which can represent
extended attribute names up to 255 bytes long.
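The "one (mod eight)" remark can be checked with a little arithmetic. The sketch below assumes that seven bytes of fixed header (a 4-byte length and three 1-byte fields) precede ea_name; that size and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 4-byte record length plus three 1-byte fields precede ea_name. */
#define EA_HEADER_BYTES 7u

/* An unsigned single-byte length field covers names up to 255 bytes. */
static_assert(UINT8_MAX == 255, "uint8_t holds lengths up to 255");

/* Zero bytes between the end of ea_name and the next 8-byte boundary. */
static size_t
ea_name_padding(uint8_t namelength)
{
    size_t end = EA_HEADER_BYTES + namelength;

    return ((8 - end % 8) % 8);
}

/* Nonzero when at least one zero pad byte follows the name. */
static int
ea_name_looks_terminated(uint8_t namelength)
{
    return (ea_name_padding(namelength) > 0);
}
```

Under that assumption a name of length 1 or 9 ends exactly on an 8-byte boundary and gets no padding, so nothing resembling a terminator follows it; every other length is followed by at least one zero byte.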
Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.
- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API supports rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.
- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().
- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.
- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.
- How rate limiting works:
1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.
2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.
3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. On the next ip_output() call, allocation of a new "snd_tag" will be attempted.
4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.
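From userspace, step 1 might look roughly like the sketch below. It assumes the option takes an unsigned 32-bit value in bytes per second, which matches the 1 to 4Gbytes/s range stated above; the helper names are hypothetical, and the #ifdef fallback only exists so the sketch compiles on systems without the option.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Convert megabits per second to bytes per second (1 Mbit/s = 125000 B/s). */
static uint32_t
mbit_to_bytes_per_sec(uint32_t mbit)
{
    /* The result must fit in the 32-bit value passed to the kernel. */
    assert(mbit <= UINT32_MAX / 125000u);
    return (mbit * 125000u);
}

/*
 * Request a pacing rate on an already connected or accepted socket.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
set_max_pacing_rate(int fd, uint32_t bytes_per_sec)
{
#ifdef SO_MAX_PACING_RATE
    return (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
        &bytes_per_sec, sizeof(bytes_per_sec)));
#else
    (void)fd;
    (void)bytes_per_sec;
    errno = ENOPROTOOPT;    /* option not available on this system */
    return (-1);
#endif
}
```

A caller would typically invoke set_max_pacing_rate(fd, mbit_to_bytes_per_sec(100)) right after connect(2) or accept(2); whether the rate is enforced in hardware depends on the kernel being built with RATELIMIT and on the driver.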
Andrew Turner [Wed, 18 Jan 2017 13:27:24 +0000 (13:27 +0000)]
Use the kernel stack in the ARM FBT DTrace provider. This is used to find
the fifth argument to functions being traced, however there was an error
where the userspace stack was being used. This may be invalid leading to
a kernel panic if this address is unmapped.
Justin Hibbits [Wed, 18 Jan 2017 03:42:21 +0000 (03:42 +0000)]
Use the explicit expanded form of cmp.
Clang apparently requires the explicit form of this instruction, and rejects
uses which ignore the optional cmpD register. This was the only use of the
shorthand form of the instruction, so just fix it up to match the others.
PR: kern/215681
Submitted by: Mark Millard
Reported by: Mark Millard <markmi _AT_ dsl-only.net>
MFC after: 2 weeks
Ed Schouten [Tue, 17 Jan 2017 22:03:08 +0000 (22:03 +0000)]
Sync in the latest CloudABI generated source files.
Languages like C++17 and Go provide direct support for slice types:
pointer/length pairs. The CloudABI generator now has more complete support for
this, meaning that for the C binding, pointer/length pairs now use an
automatic naming scheme of ${name} and ${name}_len.
Apart from this change and some reformatting, the ABI definitions are
identical. Binary compatibility is preserved entirely.
Alexander Motin [Tue, 17 Jan 2017 18:32:47 +0000 (18:32 +0000)]
Remove writing 'residual' field of struct ctl_scsiio.
This field has no practical use and is never read. Initiators already
receive respective residual size from frontends. Removed field had
different semantics, which looks useless, and was never passed through
by any frontend.
While there, fix kern_data_resid field support in case of HA, missed in
r312291.
Initialize IPFW static rules rmlock with RM_RECURSE flag.
This lock was converted from an rwlock in r272840. But unlike rwlock, rmlock
doesn't allow recursion on rm_rlock(), so for now fix this with the
RM_RECURSE flag. Later we need to change ipfw to avoid such recursions.
Gleb Smirnoff [Tue, 17 Jan 2017 03:52:57 +0000 (03:52 +0000)]
Fix regression from r310655, which broke operation of bsnmpd if it is bound
to a non-wildcard address. As documented in ip(4), doing sendmsg(2) with
IP_SENDSRCADDR on a socket that is bound to non-wildcard address is
completely different to using this control message on a wildcard one.
A fix is to add a bool to mark whether we did setsockopt(IP_RECVDSTADDR)
on the socket, and use IP_SENDSRCADDR control message only if we did.
While here, garbage collect absolutely useless udp_recv() function that
establishes some structures on stack to never use them later.
[zynq] Fix panic on USB PHY initialization failure
The Zedboard has a hardware bug where initialization of the USB PHY
occasionally fails on boot-up. Fix a regression in -CURRENT where the
kernel panics on such an occasion. The 11-RELEASE branch works fine.
PR: 215862
Submitted by: Thomas Skibo <thoma555-bsd@yahoo.com>
Ed Maste [Mon, 16 Jan 2017 20:34:42 +0000 (20:34 +0000)]
disambiguate msleep KASSERT diagnostics
Previously "panic: msleep" could happen for a few different reasons.
Break the KASSERTs out into individual cases to identify the failing
condition. Found during the investigation that resulted in r308288.
Reviewed by: kib, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8604
Maxim Sobolev [Mon, 16 Jan 2017 17:46:38 +0000 (17:46 +0000)]
Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:
o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).
In addition to this, this option provides a unified interface to get bintime
(equivalent of using SO_BINTIME), except it is also supported with IPv6, where
SO_BINTIME has never been supported. The long term plan is to deprecate
SO_BINTIME and move everything to using SO_TS_CLOCK.
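A minimal sketch of selecting the monotonic clock follows. The commit message does not name the option values; SO_TS_MONOTONIC is the constant as I understand it from the FreeBSD socket header, so treat it as an assumption. The helper name is hypothetical, and the #if fallback only lets the sketch compile where the option is absent.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Enable receive timestamps on `fd` and ask for the monotonic clock.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
enable_monotonic_timestamps(int fd)
{
#if defined(SO_TS_CLOCK) && defined(SO_TS_MONOTONIC)
    int on = 1;
    int clock = SO_TS_MONOTONIC;

    /* Timestamps must be enabled before the clock source matters. */
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on)) == -1)
        return (-1);
    return (setsockopt(fd, SOL_SOCKET, SO_TS_CLOCK, &clock,
        sizeof(clock)));
#else
    (void)fd;
    errno = ENOPROTOOPT;    /* SO_TS_CLOCK not available on this system */
    return (-1);
#endif
}
```

After this call the timestamps arrive as control messages on recvmsg(2); the type of the control message depends on the clock selected.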
The idea for this enhancement was briefly discussed at the Net session
during dev summit in Ottawa last June and the general input was positive.
This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.
There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implements a totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.
Sean Bruno [Mon, 16 Jan 2017 16:58:12 +0000 (16:58 +0000)]
Change startup order for the no EARLY_AP_STARTUP case to initialize
gtaskqueue bits at SI_SUB_INIT_IF instead of waiting until SI_SUB_SMP
which is far too late.
Add an assertion in taskqgroup_attach() to catch startup initialization
failures in the future.
Ian Lepore [Mon, 16 Jan 2017 16:44:13 +0000 (16:44 +0000)]
Remove arm's cpuconf.h, and references to it, after moving a few lines from
it into pmap-v4.h where they are used. Other than those few lines of
support for different MMU types, nothing in cpuconf.h has been used in our
code for quite a while.
The file existed to set up a variety of symbols to describe the
architecture. Over the past few years we have converted all of our source
to use the new architecture symbols standardized by ARM Inc, and predefined
by both clang and gcc.
Alexander Motin [Mon, 16 Jan 2017 16:19:55 +0000 (16:19 +0000)]
Make CTL frontends report kern_data_resid for under-/overruns.
It seems like kern_data_resid was never really implemented. This change
finally does it. Now frontends update this field while transferring data,
while CTL/backends getting it can more flexibly handle the result.
At this point behavior should not change significantly, still reporting
errors on write overrun, but that may be changed later, if we decide so.
CAM target frontend still does not properly handle overruns due to CAM API
limitations. We may need to add some fields to struct ccb_accept_tio to
pass information about initiator requested transfer size(s).
Michael Zhilin [Mon, 16 Jan 2017 15:36:36 +0000 (15:36 +0000)]
[gpioths] new driver for temperature/humidity sensor DHT11
This patch adds a driver for a temperature/humidity sensor connected via GPIO.
To compile it into the kernel, add "device gpioths". To activate the driver, use
hints (.at and .pins) for gpiobus. As a result, it will provide temperature &
humidity values via sysctl.
DHT11 is a cheap & popular temperature/humidity sensor used via GPIO on ARM
or MIPS devices like Raspberry Pi or Onion Omega.
Reviewed by: adrian
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D9185
Ed Maste [Mon, 16 Jan 2017 14:49:29 +0000 (14:49 +0000)]
rtld: do not rely on a populated GOT on amd64
On rela architectures GNU BFD ld and gold store the relocation addend
in GOT entries (in addition to the relocation's r_addend field).
rtld previously relied on this to access its own _DYNAMIC symbol in
order to apply its own relocations.
However, recording addends in the GOT is not specified by the ABI,
and some versions of LLVM's LLD linker leave the GOT uninitialized on
rela architectures.
BFD ld does not populate the GOT on sparc64, and sparc64 rtld has a
machine-dependent rtld_dynamic_addr() function that returns the
_DYNAMIC address. Use the same approach on amd64, obtaining the %rip-
relative _DYNAMIC address following a suggestion from Rafael Espíndola.
Architectures other than amd64 should be addressed in future work.
PR: 214972
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D9180
Toomas Soome [Sun, 15 Jan 2017 20:03:13 +0000 (20:03 +0000)]
loader.efi: find_currdev() can leak memory
find_currdev() uses the variable "copy" to store the reference to the trimmed
devpath pointer. If for some reason efi_devpath_handle() fails, we will
leak this copy.
Jilles Tjoelker [Sun, 15 Jan 2017 13:40:14 +0000 (13:40 +0000)]
skel: Do not set -o emacs in .shrc.
sh has defaulted to 'set -o emacs' since FreeBSD 9.0. Therefore, do not set
this again in .shrc, since that only serves to prevent invocations like
'sh -o vi' and 'sh +o emacs' from having the intended effect.
PR: 215958
Submitted by: Andras Farkas
MFC after: 1 week
Kristof Provost [Sun, 15 Jan 2017 10:21:25 +0000 (10:21 +0000)]
arswitch: Ensure the lock is always held when calling arswitch_modifyreg()
arswitch_setled() and a number of _global_setup functions did not acquire the
lock before calling arswitch_modifyreg(). With WITNESS enabled this would
instantly panic.
Discovered on a TPLink-3600:
("panic: mutex arswitch not owned at sys/dev/etherswitch/arswitch/arswitch_reg.c:236")
Reviewed by: adrian, kan
Differential Revision: https://reviews.freebsd.org/D9187
Enji Cooper [Sun, 15 Jan 2017 09:13:41 +0000 (09:13 +0000)]
Mark testcases which use cap_enter as expected failures until the
PR is resolved so those of us that run the tests don't have the
bogus failures counted against our overall results
Enji Cooper [Sun, 15 Jan 2017 09:05:26 +0000 (09:05 +0000)]
Turn COMPILER_VERSION/COMPILER_TYPE make check into a compile-time check
of the clang version
This works around breakage on ^/stable/10 when running installworld from
a ^/stable/10 host where the test wouldn't be compiled on the first
go-around and would be missing when make installworld is run.
Mark Johnston [Sun, 15 Jan 2017 03:50:08 +0000 (03:50 +0000)]
Avoid unnecessary page lookups in vm_object_madvise().
vm_object_madvise() is frequently used to apply advice to a contiguous
set of pages in an object with no backing object. Optimize this case by
skipping non-resident subranges in constant time, and by iterating over
resident pages using the object memq, thus avoiding radix tree lookups on
each page index in the specified range.
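The idea of walking only resident pages can be sketched in userland as follows. The sorted array stands in for the object's page queue, and every name here is hypothetical; this illustrates the iteration pattern and is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM object: resident page indices, ascending. */
struct object {
    const unsigned long *resident;
    size_t nresident;
};

/*
 * Apply `advise` to every resident page with an index in [start, end).
 * Non-resident subranges cost nothing because they are never enumerated.
 * Returns the number of pages visited.
 */
static size_t
advise_range(const struct object *obj, unsigned long start,
    unsigned long end, void (*advise)(unsigned long pindex))
{
    size_t i, visited = 0;

    assert(start <= end);

    /* Skip to the first resident page at or after `start`. */
    for (i = 0; i < obj->nresident && obj->resident[i] < start; i++)
        continue;

    /* Follow the ordered list instead of looking up each index. */
    for (; i < obj->nresident && obj->resident[i] < end; i++) {
        if (advise != NULL)
            advise(obj->resident[i]);
        visited++;
    }
    return (visited);
}
```

For an object with resident pages {2, 3, 1000000}, advising the range [0, 2000000) touches three pages rather than performing two million per-index lookups. The linear skip to the first page is a simplification of this sketch.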
While here, move MADV_WILLNEED handling to vm_page_advise(), and rename the
"advise" parameter to vm_object_madvise() to "advice."
Michael Zhilin [Sat, 14 Jan 2017 23:24:50 +0000 (23:24 +0000)]
[etherswitch] Support Micrel KSZ8995MA switch chip
This is Micrel KSZ8995MA driver code. KSZ8995MA uses SPI bus to control.
This code is written & tested on @SRCHACK's ksz8995ma board and FON2100
with gpiospi.
etherswitchcfg supports the commands: addtag, ingress, striptag, dropuntagged.
Submitted by: Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by: mizhka, adrian
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D8790
This patch adds missing hints for ath0 (eepromaddr) and GPIO (mask & leds).
ath0 doesn't work without eeprom hints, so this commit should make wifi
work on Onion Omega.
GPIO mask is required if you want to use gpiobus and GPIO pins on your
board. Onion Omega has several leds connected to gpio pins (one on board,
one color on dock).
This commit adds a mask for gpiobus and allows you to turn leds off/on via
/dev/leds/{board,blue,green,red} (on by default).
Tested on Onion Omega 1.
Reviewed by: adrian
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D9107
Alexander Motin [Sat, 14 Jan 2017 20:41:44 +0000 (20:41 +0000)]
Similarly to r312190, decouple iSCSI connection limits from defaults.
Connection parameters should remain at defaults until negotiated.
While there, remove synthetic limits, applied if the kernel provided none.
iscsid has no limitations of its own, no configuration, and no idea what
values are good. Assume the kernel knows what it requests.
Alexander Motin [Sat, 14 Jan 2017 18:04:12 +0000 (18:04 +0000)]
Decouple iSCSI connection limits from defaults.
If the initiator does not negotiate some parameter, it expects one to get
the default value, not some unknown remote hardware limit. On the other side,
if some parameter is negotiated, its default value from RFC should not
be used for anything.
Enji Cooper [Sat, 14 Jan 2017 12:55:32 +0000 (12:55 +0000)]
Remove CFLAGS for sha2_test
The previous code used to grab definitions from openssl/openssh,
but this is no longer needed and is no longer correct. libnetbsd
provides all of the needed definitions.
libnetbsd is added to CFLAGS automatically via netbsd-tests.test.mk --
hence all of CFLAGS can be cleared