Eric Badger [Tue, 14 Feb 2017 17:13:23 +0000 (17:13 +0000)]
sleepq_catch_signals: do thread suspension before signal check
Since locks are dropped when a thread suspends, it's possible for another
thread to deliver a signal to the suspended thread. If the thread awakens from
suspension without checking for signals, it may go to sleep despite having
a pending signal that should wake it up. Therefore the suspension check is
done first, so any signals sent while suspended will be caught in the
subsequent signal check.
Alexander Motin [Tue, 14 Feb 2017 16:33:42 +0000 (16:33 +0000)]
Do not rely on data alignment after m_pullup().
In general case m_pullup() does not really guarantee any data alignment.
Instead of depenting on side effects caused by data being always copied
out of mbuf cluster (which is probably a bug by itself), always allocate
aligned BHS buffer and read data there directly from socket.
While there, reuse new icl_conn_receive_buf() function to read digests.
The code could probably be even more optimized to aggregate those reads,
but until that done, this is still easier then the way it was before.
Andriy Gapon [Tue, 14 Feb 2017 13:54:05 +0000 (13:54 +0000)]
try to fix RACCT_RSS accounting
There could be a race between the vm daemon setting RACCT_RSS based on
the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to
zero. In that case we can get a zombie process with non-zero RACCT_RSS.
If the process is jailed, that may break accounting for the jail.
There could be other consequences.
Fix this race in the vm daemon by updating RACCT_RSS only when a process
is in the normal state. Also, make accounting a little bit more
accurate by refreshing the page resident count after calling
vm_pageout_map_deactivate_pages().
Finally, add an assert that the RSS is zero when a process is reaped.
Bjoern A. Zeeb [Tue, 14 Feb 2017 01:20:03 +0000 (01:20 +0000)]
Use %s __func__ to print the actual function name (been looking at
the wrong one for too often lately at first), and also use %#lx to
get the 0x prefix for the address.
[sdhci_acpi] Add support for Bay Trail SDHC SD card slot
Add ACPI device 80860F14 with _UID 3 to the list of known devices. It
make SD card available on NUCs and Minnowboard. Previously added _UID 1
covered only eMMC devices.
Dimitry Andric [Mon, 13 Feb 2017 20:56:53 +0000 (20:56 +0000)]
Fix build of BSD dtc when NDEBUG is defined (MK_ASSERT_DEBUG=no):
* Initialize correct parent in binary_operator's constructor.
* Include <errno.h> explicitly, otherwise errno is undefined (without
NDEBUG, this is accidentally 'fixed' by including <iostream>).
Alexander Motin [Mon, 13 Feb 2017 20:36:28 +0000 (20:36 +0000)]
Remove M_PKTHDR from m_getm2() in icl_pdu_append_data().
ip_data_mbuf is always appended to ip_bhs_mbuf, so it does not need own
packet header. This change first avoids allocation/initialization of the
header, and then avoids dropping one when it later gets to socket buffer.
Ed Schouten [Mon, 13 Feb 2017 19:00:09 +0000 (19:00 +0000)]
Make <sys/event.h> work on its own.
Right now this header file doesn't want to build when included on its
own, as it depends on some integer types that are only declared
internally. Switch it over to use <sys/_types.h> and the __*
counterparts.
The inpcb structure has inp_sp pointer that is initialized by
ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec
security policies associated with a specific socket.
An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket
options to configure these security policies. Then ip[6]_output()
uses inpcb pointer to specify that an outgoing packet is associated
with some socket. And IPSEC_OUTPUT() method can use a security policy
stored in the inp_sp. For inbound packet the protocol-specific input
routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms
to inbound security policy configured in the inpcb.
SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends
packets. Thus IPSEC_OUTPUT() method does not consider such packets as
associated with some socket and can not apply security policies
from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY()
method is called from protocol-specific input routine, it can specify
inpcb pointer and associated with socket inbound policy will be
checked. But there are two problems:
1. Such check is asymmetric, becasue we can not apply security policy
from inpcb for outgoing packet.
2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and
access to inp_sp is protected. But for SCTP this is not correct,
becasue SCTP uses own locks to protect inpcb.
To fix these problems remove IPsec related PCB code from SCTP.
This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options
will be not applicable to SCTP sockets. To be able correctly check
inbound security policies for SCTP, mark its protocol header with
the PR_LASTHDR flag.
Rename kern_vm_* functions to kern_*. Move the prototypes to
syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as
needed, to avoid headers pollution.
Requested by: alc, jhb
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9535
Return full channel list via iwi_getradiocaps() method
(ieee80211_init_channels() was replaced with iwi_getradiocaps()
to be consistent with other drivers).
PR: 216923
Submitted and tested by: ds@ukrhub.net (original patch)
MFC after: 5 days
Sean Bruno [Sun, 12 Feb 2017 23:06:41 +0000 (23:06 +0000)]
Only trigger em_local_timer on queue index 0. This was causing continuous
em_local_timer() executions during normal operation and was very likely
to cause a lock up on igb(4) devices.
Consistently handle negative or wrapping offsets in the mmap(2) syscalls.
For regular files and posix shared memory, POSIX requires that
[offset, offset + size) range is legitimate. At the maping time,
check that offset is not negative. Allowing negative offsets might
expose the data that filesystem put into vm_object for internal use,
esp. due to OFF_TO_IDX() signess treatment. Fault handler verifies
that the mapped range is valid, assuming that mmap(2) checked that
arithmetic gives no undefined results.
For device mappings, leave the semantic of negative offsets to the
driver. Correct object page index calculation to not erronously
propagate sign.
In either case, disallow overflow of offset + size.
Update mmap(2) man page to explain the requirement of the range
validity, and behaviour when the range becomes invalid after mapping.
Reported and tested by: royger (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Dmitry Chagin [Sun, 12 Feb 2017 15:22:50 +0000 (15:22 +0000)]
Fix r313284.
Members of the syscall argument structures are padded to a word size. So,
for COMPAT_LINUX32 we should convert user supplied system call arguments
which is 32-bit in that case to the array of register_t.
Ian Lepore [Sun, 12 Feb 2017 00:52:22 +0000 (00:52 +0000)]
Enable usb low and full speed devices connected to the imx6 root hubs.
This enables the PHY circuitry for UTMI+ level 2 and 3, and sets the
flag to tell the ehci code that the root hub has a transaction translator
in it. For imx6 we can use the standard ehci_get_port_speed_portsc()
function to find out what speed device is connected to the port.
Change type of the prot parameter for kern_vm_mmap() from vm_prot_t to int.
This makes the code to pass whole word of the mmap(2) syscall argument
prot to the syscall helper kern_vm_mmap(), which can validate all
bits. The change provides temporal fix for sys/vm/mmap_test
mmap__bad_arguments, which was broken after r313352.
PR: 216976
Reported and tested by: ngie
Sponsored by: The FreeBSD Foundation
Enji Cooper [Sat, 11 Feb 2017 20:27:06 +0000 (20:27 +0000)]
Use SRCTOP to refer to awk source in contrib/awk and remove unnecessary AWKSRC prefix
for maketab.c
The former simplifies pathing in make/displayed output, whereas the latter was just
unnecessarily superfluous since .PATH referenced the path to maketab.c earlier on in
the Makefile.
Ryan Stone [Sat, 11 Feb 2017 17:05:08 +0000 (17:05 +0000)]
Don't zero out srtt after excess retransmits
If the TCP stack has retransmitted more than 1/4 of the total
number of retransmits before a connection drop, it decides that
its current RTT estimate is hopelessly out of date and decides
to recalculate it from scratch starting with the next ACK.
Unfortunately, it implements this by zeroing out the current RTT
estimate. Drop this hack entirely, as it makes it significantly more
difficult to debug connection issues. Instead check for excessive
retransmits at the point where srtt is updated from an ACK being
received. If we've exceeded 1/4 of the maximum retransmits,
discard the previous srtt estimate and replace it with the latest
rtt measurement.
Toomas Soome [Sat, 11 Feb 2017 15:25:49 +0000 (15:25 +0000)]
loader: implement MEDIA_FILEPATH_DP support in efipart
The efipart rework did break the ARM systems as the new code is
using more exact filters to sort the devices and we need to
add support for MEDIA_FILEPATH_DP device paths.
Ian Lepore [Sat, 11 Feb 2017 01:07:46 +0000 (01:07 +0000)]
Stop including sys/types.h from arm's machine/atomic.h, fix the places
where atomic.h was being included without ensuring that types.h (via
param.h) was included first, as required by atomic(9).
Kenneth D. Merry [Fri, 10 Feb 2017 22:02:45 +0000 (22:02 +0000)]
Change the isp(4) driver to not adjust the tag type for REQUEST SENSE.
The isp(4) driver was changing the tag type for REQUEST SENSE
commands to Head of Queue, when the CAM CCB flag
CAM_TAG_ACTION_VALID was NOT set. CAM_TAG_ACTION_VALID is set
when the tag action in the XPT_SCSI_IO is not CAM_TAG_ACTION_NONE
and when the target has tagged queueing turned on.
In most cases when CAM_TAG_ACTION_VALID is not set, it is because
the target is not doing tagged queueing. In those cases, trying to
send a Head of Queue tag may cause problems. Instead, default to
sending a simple tag.
IBM tape drives claim to support tagged queueing in their standard
Inquiry data, but have the DQue bit set in the control mode page
(mode page 10). CAM correctly detects that these drives do not
support tagged queueing, and clears the CAM_TAG_ACTION_VALID flag
on CCBs sent down to the drives.
This caused the isp(4) driver to go down the path of setting the
tag action to a default value, and for Request Sense commands only,
set the tag action to Head of Queue.
If an IBM tape drive does get a Head of Queue tag, it rejects it with
Invalid Message Error (0x49,0x00). (The Qlogic firmware translates that
to a Transport Error, which the driver translates to an Unrecoverable
HBA Error, or CAM_UNREC_HBA_ERROR.) So, by default, it wasn't possible
to get a good response from a REQUEST SENSE to an FC-attached IBM
tape drive with the isp(4) driver.
IBM tape drives (tested on an LTO-5 with G9N1 firmware and a TS1150
with 4470 firmware) also have a bug in that sending a command with a
non-simple tag attribute breaks the tape drive's Command Reference
Number (CRN) accounting and causes it to ignore all subsequent
commands because it and the initiator disagree about the next
expected CRN. The drives do reject the initial command with a head
of queue tag with an Invalid Message Error (0x49,0x00), but after that
they ignore any subsequent commands. IBM confirmed that it is a bug,
and sent me test firmware that fixes the bug. However tape drives in
the field will still exhibit the bug until they are upgraded.
Request Sense is not often sent to targets because most errors are
reported automatically through autosense in Fibre Channel and other
modern transports. ("Modern" meaning post SCSI-2.) So this is not
an error that would crop up frequently. But Request Sense is useful on
tape devices to report status information, aside from error reporting.
This problem is less serious without FC-Tape features turned on,
specifically precise delivery of commands (which enables Command
Reference Numbers), enabled on the target and initiator. Without
FC-Tape features turned on, the target would return an error and
things would continue on.
And it also does not cause problems for targets that do tagged
queueing, because in those cases the isp(4) driver just uses the
tag type that is specified in the CCB, assuming the
CAM_TAG_ACTION_VALID flag is set, and defaults to sending a Simple
tag action if it isn't an ordered or head of queue tag.
sys/dev/isp/isp.c:
In isp_start(), don't try to send Request Sense commands
with the Head of Queue tag attribute if the CCB doesn't
have a valid tag action. The tag action likely isn't valid
because the target doesn't support tagged queueing.
John Baldwin [Fri, 10 Feb 2017 19:25:52 +0000 (19:25 +0000)]
Drop the "created from" line from files generated by makesyscalls.sh.
This information is less useful when the generated files are included in
source control along with the source. If needed it can be reconstructed
from the $FreeBSD$ tag in the generated file. Removing this information
from the generated output permits committing the generated files along
with the change to the system call master list without having inconsistent
metadata in the generated files.
Gleb Smirnoff [Fri, 10 Feb 2017 17:46:26 +0000 (17:46 +0000)]
Move tcp_fields_to_net() static inline into tcp_var.h, just below its
friend tcp_fields_to_host(). There is third party code that also uses
this inline.
When using Blue-Flame, BF, the QPN overrides the VLAN, CV, and SV
fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7
unset.
The current ethernet driver code reserves a TX QP range with 256b
alignment.
This is wrong because if there are more than 64 TX QPs in use, QPNs >=
base + 65 will have bits 6/7 set.
This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.
The new mechanism introduced here will support reservation for "Eth
QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:
1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
and request "BF enabled QPs" if BF is supported for the function
2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0] - number of QPs
b. param1[31-24] - flags controlling QPs reservation
Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits
6 and 7 unset in order to be used in Ethernet.
Bits 24-30 of the flags are currently reserved.
When a function tries to allocate a QP, it states the required
attributes for this QP. Those attributes are considered "best-effort".
If an attribute, such as Ethernet BF enabled QP, is a must-have
attribute, the function has to check that attribute is supported
before trying to do the allocation.
In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those
attributes and masks out unsupported attributes as well. In order to
notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP
command. This command's mailbox is filled by the PF, which notifies
which QP allocation attributes it supports.
Obtained from: Linux (dual BSD/GPLv2 licensed)
Submitted by: Dexuan Cui @ microsoft . com
Differential Revision: https://reviews.freebsd.org/D8868
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
Flexible and asymmetric allocation of EQs and MSI-X vectors for PF/VFs.
Previously, the mlx4 driver queried the firmware in order to get the
number of supported EQs. Under SRIOV, since this was done before the
driver notified the firmware how many VFs it actually needs, the
firmware had to take into account a worst case scenario and always
allocated four EQs per VF, where one was used for events while the
others were used for completions. Now, when the firmware supports the
asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (-->
MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query
the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X
vectors per function. Moreover, when running in the new
firmware/driver mode, the limitation that the number of EQs should be
a power of two is lifted.
Obtained from: Linux (dual BSD/GPLv2 licensed)
Submitted by: Dexuan Cui @ microsoft . com
Differential Revision: https://reviews.freebsd.org/D8867
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
The file type DTYPE_VNODE can be assigned as a fallback if VOP_OPEN()
did not initialized file type. This is a typical code path used by
normal file systems.
Also, change error returned for inappropriate file type used for
O_EXLOCK to EOPNOTSUPP, as declared in the open(2) man page.
Reported by: cy, dhw, Iblis Lin <iblis@hs.ntnu.edu.tw>
Tested by: dhw
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
These examples show expected behavior of indent(1). They are meant to be used
together with a regression test mechanism, either Kyua, a Makefile or perhaps
something else. The mechanism should in essence do this:
indent -P${test}.pro < ${test}.0 > ${test}.0.run
and compare ${test}.0.stdout to ${test}.0.run. If the files differ or the exit
status isn't 0, the test failed.
* ${test}.pro is an indent(1) profile: a list of options passed through a file.
The program doesn't complain if the file doesn't exist.
* ${test}.0 is a C source file which acts as input for indent(1). It doesn't
have to have any particular formatting, since it's the output that matters.
* ${test}.0.stdout contains expected output. It doesn't have to be formatted in
Kernel Normal Form as the point of the tests is to check for regressions in
the program and not to check that it always produces KNF.
Ermal Luçi [Fri, 10 Feb 2017 05:16:14 +0000 (05:16 +0000)]
The patch provides the same socket option as Linux IP_ORIGDSTADDR.
Unfortunately they will have different integer value due to Linux value being already assigned in FreeBSD.
The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.
This allows/improves implementation of transparent proxies on UDP sockets due to having the whole information on forwarded packets.
Sponsored-by: rsync.net
Differential Revision: D9235 Reviewed-by: adrian
Mark Johnston [Fri, 10 Feb 2017 02:01:32 +0000 (02:01 +0000)]
When patching USDT probes, use non-unique names for aliases of weak symbols.
Aliases are normally given names that include a key that's unique for each
input object file. This, for example, ensures that aliases for identically
named local symbols in different object files don't conflict. However, in
some cases the static linker will leave an undefined alias after merging
identical weak symbols, resulting in a link error. A non-unique name allows
the aliases to be merged as well.
Eric Joyner [Fri, 10 Feb 2017 01:04:11 +0000 (01:04 +0000)]
ixl(4): Update to 1.7.12-k
Refresh upstream driver before impending conversion to iflib.
Major new features:
- Support for Fortville-based 25G adapters
- Support for I2C reads/writes
(To prevent getting or sending corrupt data, you should set
dev.ixl.0.debug.disable_fw_link_management=1 when using I2C
[this will disable link!], then set it to 0 when done. The driver implements
the SIOCGI2C ioctl, so ifconfig -v works for reading I2C data,
but there are read_i2c and write_i2c sysctls under the .debug sysctl tree
[the latter being useful for upper page support in QSFP+]).
- Addition of an iWARP client interface (so the future iWARP driver for
X722 devices can communicate with the base driver).
- Compiling this option in is enabled by default, with "options IXL_IW" in
GENERIC.
Increase a chance of devfs_close() calling d_close cdevsw method.
If a file opened over a vnode has an advisory lock set at close,
vn_closefile() acquires additional vnode use reference to prevent
freeing the vnode in vn_close(). Side effect is that for device
vnodes, devfs_close() sees that vnode reference count is greater than
one and refuses to call d_close(). Create internal version of
vn_close() which can avoid dropping the vnode reference if needed, and
use this to execute VOP_CLOSE() without acquiring a new reference.
Note that any parallel reference to the vnode would still prevent
d_close call, if the reference is not from an opened file, e.g. due to
stat(2).
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Do not establish advisory locks when doing open(O_EXLOCK) or open(O_SHLOCK)
for files which do not have DTYPE_VNODE type.
Both flock(2) and fcntl(2) syscalls refuse to acquire advisory lock on
a file which type is not DTYPE_VNODE. Do the same when lock is
requested from open(2).
Restructure the block in vn_open_vnode() which handles O_EXLOCK and
O_SHLOCK open flags to make it easier to quit its execution earlier
with an error.
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Adrian Chadd [Thu, 9 Feb 2017 23:15:11 +0000 (23:15 +0000)]
[ath] initial station side quiet IE support.
This implements hardware assisted quiet IE support. Quiet time is
an optional interval on DFS channels (but doesn't have to be DFS
only channels! sigh) where the station and AP can be quiet in order
to allow for channel utilisation measurements. Typically that's
stuff like radar detection, spectral scan, other-BSS frame sniffing,
checking how busy the air is, etc.
The hardware implements it as one of the generic timers, which is
supplied a period, offset from the trigger period and duration
to stay quiet. The AP can announce quiet time configurations which
change, and so this code also tracks that.
Implementation details:
* track the current quiet time IE
* compare the new one against the previous one - if only the TBTT
counter changes, don't update things
* If tbttcount=1 then program it into the hardware - that is when
it is easiest to program the correct starting offset (one TBTT +
configured offset).
* .. later on check to see if it can be done on any tbttcount
* If the IE goes away then remove the quiet timer and clear the
config
* Upon reset, state change, new beacon - clear quiet time IE
and just let it resync from the next beacon.
History:
This was work done initially by sibridgetech.com in 2011/2012/2013
as part of some FreeBSD wifi DFS contracting work they had for a
third party. They implemented the net80211 quiet time IE pieces
and had some test code for the station side which didn't entirely
use the timers correctly.
I figured out how to use the timers correctly without stopping/starting
the transmit DMA engine each time. When done correctly, the timer
just needs to be programmed once and left alone until the next
configuration change.
So, thanks to Himali Patel and Parthiv Shah for their work way
back then. I finally figured it out and finished it!
TODO:
* Now, I'd rather net80211 did the quiet time IE tracking and parsing,
pushing configurations into the driver is needed. I'll look at
doing that in a subsequent update.
* This doesn't handle multiple quiet time IEs, which will currently
just mess things up. I'll look into supporting that in the future
(at least by only obeying "one" of them, and then ignoring
subsequent IEs in a beacon/probe frame.)
* This also implements the STA side and not the AP side - the AP
side will come later, and involves taking various other intervals
into account (eg the beacon offset for multi-VAP modes, the
SWBA time, etc, etc) as well as obtaining the configuration when
a beacon is configured/generated rather than "hearing" an IE.
Alan Somers [Thu, 9 Feb 2017 21:30:53 +0000 (21:30 +0000)]
Fix setting birthtime in ZFS
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
* In zfs_freebsd_setattr, if the caller wants to set the birthtime,
set the bits that zfs_settattr expects
* In zfs_setattr, if XAT_CREATETIME is set, set xoa_createtime,
expected by zfs_xvattr_set. The two levels of indirection seem
excessive, but it minimizes diffs vs OpenZFS.
* In zfs_setattr, check for overflow of va_birthtime (from delphij)
* Remove red herring in zfs_getattr
sys/cddl/contrib/opensolaris/uts/common/sys/vnode.h
* Un-booby-trap some macros
New tests are under review at https://github.com/pjd/pjdfstest/pull/6