gonzo [Sun, 2 Oct 2016 03:20:31 +0000 (03:20 +0000)]
Modularize evdev
- Convert "options EVDEV" to "device evdev" and "device uinput", add
modules for both new devices. They are isolated subsystems and do not
require any compile-time changes to general kernel subsystems
- For hybrid drivers that have evdev as an optional way to deliver input
events add option EVDEV_SUPPORT. Update all existing hybrid drivers
to use it instead of EVDEV
- Remove no-op DECLARE_MODULE in evdev, it's not required, MODULE_VERSION
is enough
- Add evdev module dependency to uinput
Submitted by: Vladimir Kondratiev <wulf@cicgroup.ru>
vangyzen [Sun, 2 Oct 2016 01:42:45 +0000 (01:42 +0000)]
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.
Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695
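
The backoff schedule described above can be sketched in a few lines of C. This is an illustration of the arithmetic only, not the kernel code: the names GARP_REXMIT_CAP, garp_clamp_count(), garp_rexmit_delay() and garp_total_span() are hypothetical, and the sketch assumes the listed sequence {1, 2, 4, ...} starts with the first retransmission.

```c
#include <stdint.h>

/* Hypothetical name for the cap of 16 retransmissions mentioned above. */
#define GARP_REXMIT_CAP 16u

/* Clamp a requested retransmission count to the cap. */
static unsigned
garp_clamp_count(unsigned requested)
{
	return (requested > GARP_REXMIT_CAP ? GARP_REXMIT_CAP : requested);
}

/*
 * Delay in seconds before retransmission number n (0-based), doubling
 * each time: 1, 2, 4, 8, ...  Callers must keep n below 32.
 */
static uint32_t
garp_rexmit_delay(unsigned n)
{
	return ((uint32_t)1 << n);
}

/* Seconds between the initial GARP and the last retransmission. */
static uint32_t
garp_total_span(unsigned count)
{
	uint32_t total = 0;

	count = garp_clamp_count(count);
	for (unsigned n = 0; n < count; n++)
		total += garp_rexmit_delay(n);
	return (total);
}
```

With a count of 3, for example, the sketch yields delays of 1, 2 and 4 seconds, so the last retransmission goes out 7 seconds after the initial GARP.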
markj [Sun, 2 Oct 2016 00:56:21 +0000 (00:56 +0000)]
rtsold: Log messages about unexpected RAs at LOG_DEBUG.
Because rtsold listens for RAs on a raw socket, it may receive RAs from
interfaces that it does not manage. Such events can result in excessive
logging.
jhb [Sat, 1 Oct 2016 22:12:33 +0000 (22:12 +0000)]
Use timercmp() and timersub() in kdump.
Previously, kdump used the kernel-only timervalsub() macro which required
defining _KERNEL when including <sys/time.h>. Now, kdump uses the existing
userland API. The timercmp() usage to check for a backwards timestamp is
also clearer and simpler than the previous code which checked the result of
the subtraction for a negative value.
While here, take advantage of the 3-arg timersub() to store the subtraction
results in a temporary timeval instead of overwriting the timestamp in the
ktrace record and then having to restore it.
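
A minimal userland sketch of the pattern described above, using the timercmp() and 3-arg timersub() macros from <sys/time.h>. The helper name elapsed_since() is hypothetical and this is not the kdump code itself; the _DEFAULT_SOURCE define is only an assumption to keep the macros visible on glibc under strict -std modes.

```c
#define _DEFAULT_SOURCE	/* assumption: exposes the BSD timer macros on glibc */
#include <stdbool.h>
#include <sys/time.h>

/*
 * Store cur - prev in *delta and return true.  If cur is earlier than
 * prev (a backwards timestamp), clear *delta and return false.  Neither
 * input is modified, so the caller has nothing to restore afterwards.
 */
static bool
elapsed_since(const struct timeval *prev, const struct timeval *cur,
    struct timeval *delta)
{
	if (timercmp(cur, prev, <)) {
		timerclear(delta);
		return (false);
	}
	timersub(cur, prev, delta);
	return (true);
}
```

Because the result goes into a separate timeval, the comparison and the subtraction both read the original timestamps unchanged.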
jhb [Sat, 1 Oct 2016 22:08:07 +0000 (22:08 +0000)]
Expose kernel-only errno values if _WANT_KERNEL_ERRNO is defined.
The kernel uses a few negative errno values for internal conditions
such as requesting a system call restart. Normally these errno values
are not exposed to userland. However, kdump needs access to these
values as some of them can be present in a ktrace system call return
record. Previously kdump was defining _KERNEL to gain access to these
values, but was then having to manually declare 'errno' (and doing it
incorrectly). Now, kdump uses _WANT_KERNEL_ERRNO instead of _KERNEL
and uses the system-provided declaration of errno.
jhb [Sat, 1 Oct 2016 22:01:41 +0000 (22:01 +0000)]
Handle 64-bit system call arguments (off_t, id_t).
In particular, 64-bit system call arguments use up two register_t
arguments for 32-bit processes. They must also be aligned on a 64-bit
boundary on 32-bit powerpc processes. This fixes the decoding of
lseek(), procctl(), and wait6() arguments for 32-bit processes (both
native and via freebsd32).
Note that the ktrace system call return record only returns a single
register, so the return value of lseek is always truncated to the low
32-bits for 32-bit processes.
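
The argument handling described above can be sketched as follows. This is an illustration, not the kdump implementation: fetch_arg64() and its parameters are hypothetical, and the word order (low word first on little-endian, high word first on big-endian) is an assumption about the calling convention rather than something stated in this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reassemble one 64-bit argument from two consecutive 32-bit argument
 * slots.  *idx is the index of the next unread slot and is advanced past
 * the argument.  When align_pair is set, an odd index is first rounded up
 * to the next even slot (the 64-bit alignment described for 32-bit
 * powerpc).  The caller must ensure the slots exist.
 */
static uint64_t
fetch_arg64(const uint32_t *args, size_t *idx, bool align_pair,
    bool big_endian)
{
	uint32_t first, second;

	if (align_pair && (*idx % 2) != 0)
		(*idx)++;		/* skip the padding slot */
	first = args[*idx];
	second = args[*idx + 1];
	*idx += 2;
	if (big_endian)			/* assumed: high word first */
		return (((uint64_t)first << 32) | second);
	return (((uint64_t)second << 32) | first);
}
```

For an lseek()-style call with arguments (fd, offset, whence), the offset would start at slot 1 without alignment, or at slot 2 when a padding slot is required.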
rmacklem [Sat, 1 Oct 2016 19:39:09 +0000 (19:39 +0000)]
r297225 broke udp_output() for the case where the "addr" argument
is NULL and the function jumps to the "release:" label.
For this case, the "inp" was write locked, but the code attempted to
read unlock it. This patch fixes the problem.
This case could occur for NFS over UDP mounts, where the server was
down for a few minutes under certain circumstances.
gonzo [Sat, 1 Oct 2016 17:57:32 +0000 (17:57 +0000)]
Use VM_MEMATTR_WRITE_COMBINING memattr for mmap(2) on framebuffer
VM_MEMATTR_WRITE_COMBINING sets write-through cache flag for framebuffer
memory that prevents pixel data from being stuck in cache until eviction
happens
gonzo [Sat, 1 Oct 2016 17:43:02 +0000 (17:43 +0000)]
Provide way for framebuffer driver to request mmap(2) mapping type
On ARM if memattr is not overridden mmap(2) maps framebuffer
memory as WBWA which means part of changes to content in userland
end up in cache and appear on screen gradually as cache lines are
evicted. This change adds configurable memattr that hardware fb
implementation can set to get the memory mapping type it
requires:
- Add new flag FB_FLAG_MEMATTR that indicates that framebuffer
driver overrides default memattr
- Add new field fb_memattr to struct fb_info to specify requested
memattr
Reviewed by: ray
Differential Revision: https://reviews.freebsd.org/D8064
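
The override described above amounts to a flag check at mapping time. The sketch below is self-contained and uses stand-ins: struct fb_info_sketch, the sk_memattr_t type, the SK_ constants and fb_pick_memattr() are all hypothetical, and the numeric values are placeholders rather than the kernel's. Only the FB_FLAG_MEMATTR flag and the fb_memattr field come from the commit message.

```c
#include <stdint.h>

typedef int sk_memattr_t;		/* stand-in for the kernel memattr type */

#define SK_FB_FLAG_MEMATTR	0x01	/* placeholder value for FB_FLAG_MEMATTR */
#define SK_MEMATTR_DEFAULT	0	/* placeholder: default mapping type */
#define SK_MEMATTR_WC		1	/* placeholder: write-combining */

/* Reduced stand-in for struct fb_info with only the fields used here. */
struct fb_info_sketch {
	uint32_t	fb_flags;
	sk_memattr_t	fb_memattr;
};

/*
 * Pick the mapping type for mmap(2): the driver's requested memattr if
 * it set the override flag, otherwise the default.
 */
static sk_memattr_t
fb_pick_memattr(const struct fb_info_sketch *info)
{
	if ((info->fb_flags & SK_FB_FLAG_MEMATTR) != 0)
		return (info->fb_memattr);
	return (SK_MEMATTR_DEFAULT);
}
```

A driver that leaves the flag clear keeps the previous behavior, so existing framebuffer drivers would not need to change.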
markj [Sat, 1 Oct 2016 01:30:34 +0000 (01:30 +0000)]
nd6_dad_timer(): don't assert that the address is tentative.
It appears that this assertion can be tripped in some cases when
multiple interfaces are on the same link. Until this is resolved, revert a
part of r306305 and simply log a message if the DAD timer fires on a
non-tentative address.
Declare a module for evdev and add dependency to ukbd(4) and ums(4)
Prepare for making evdev a module. "Pure" evdev device drivers (like
touchscreen) and evdev itself can be built as modules regardless of
"options EVDEV" in kernel config. So if people do not require evdev
functionality in hybrid drivers like ums and ukbd they can, for instance,
kldload evdev and utouchscreen to run FreeBSD in kiosk mode.
cam_periph_ccbwait could return while ccb in progress
In cam_periph_runccb, cam_periph_ccbwait was using the value of the ccb
pinfo.index and status fields to determine whether the ccb was done,
but these fields are updated without a contending lock and could glitch
into states that would be erroneously interpreted as done. Instead,
have cam_periph_ccbwait look for the explicit result of the function
cam_periph_done.
Add NetBSD 5.1.4, 5.2.2 & 7.0.1 releases to the tree.
Amend the position of NetBSD 6.0.2 release in the tree as it came
after OpenBSD[1] & DragonFlyBSD[2] release according to the release
information.
The entries for the 6.0.5 & 6.1.5 releases were incorrect (fetched from
NetBSD CVS copy) and confirmed with history page[3]
adrian [Fri, 30 Sep 2016 19:59:56 +0000 (19:59 +0000)]
Add librss, a simple wrapper around RSS APIs so applications can begin auto-tuning.
I've used this in a handful of RSS test applications. It is just some
very simple functions to fetch the RSS configuration, query the per-bucket
CPU set, and mark sockets as local to an RSS bucket. It should be sufficient
for both thread-based and process-based workloads.
(Yes, I wrote a manpage.)
This is based on some early RSS API and wrapper API work I did whilst
I was at Netflix. Thanks to Netflix for the very original work that
spawned this; thanks to Peter Grehan for his feedback about RSS APIs
and thanks to Jack Vogel and Navdeep Parhar for the NIC-facing side of the
APIs. These fed into the simple userland API I wrote up here.
Restore pre-r300383 behavior when a frame is sent:
- stop scan;
- send frame;
- when beacon arrives and our bit in TIM is not set - restart the scan.
NOTE:
Ideally, this should introduce a new interface (ieee80211_pause_anyscan());
however, since ieee80211_cancel_anyscan() is not used by drivers and is only
called by ieee80211_start_pkt(), the current patch overrides its behavior
instead.
Tested with Intel 3945BG, STA mode
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D7979
cem [Fri, 30 Sep 2016 18:12:16 +0000 (18:12 +0000)]
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.
On a Westmere system (2 sockets x 4 cores x 1 thread), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one. (Running a basic file
server workload.)
Submitted by: Anton Rang <rang at acm.org>
Reviewed by: cem (earlier version), kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8041
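
The idea of per-CPU completion flags can be illustrated with a userland analogue in C11 atomics. This is a sketch under stated assumptions, not the x86 kernel code: SK_NCPU, SK_CACHE_LINE, struct sk_percpu_done, sk_shootdown_ack() and sk_shootdown_complete() are hypothetical names, and the generation-number scheme is one possible way to signal completion.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NCPU		8	/* assumed CPU count for the sketch */
#define SK_CACHE_LINE	64	/* assumed cache line size in bytes */

/* One flag per CPU, each on its own cache line to avoid sharing. */
struct sk_percpu_done {
	_Alignas(SK_CACHE_LINE) atomic_uint_fast32_t gen;
};

static struct sk_percpu_done sk_done[SK_NCPU];

/* Target CPU: record that invalidation request 'gen' has been handled. */
static void
sk_shootdown_ack(int cpu, uint32_t gen)
{
	atomic_store_explicit(&sk_done[cpu].gen, gen, memory_order_release);
}

/* Initiator: check whether every other CPU has acknowledged 'gen'. */
static bool
sk_shootdown_complete(int self, uint32_t gen)
{
	for (int cpu = 0; cpu < SK_NCPU; cpu++) {
		if (cpu == self)
			continue;
		if (atomic_load_explicit(&sk_done[cpu].gen,
		    memory_order_acquire) != gen)
			return (false);
	}
	return (true);
}
```

Each acknowledging CPU writes only its own cache line, so the targets do not contend on a single shared counter; the initiator pays for reading one line per CPU instead.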
Compute two new metrics. Disk load, the average number of transactions
we have queued up normalized to the queue size. Also compute buckets
of latency to help compute, in userland, estimates of Median, P90, P95
and P99 values.
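
A userland consumer could estimate percentiles from such buckets roughly as follows. The bucket layout is an assumption made for this sketch (power-of-two microsecond ranges, 20 buckets); the commit message does not specify it, and the names SK_LAT_BUCKETS, sk_lat_bucket() and sk_lat_percentile_upper() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_LAT_BUCKETS	20	/* assumed number of buckets */

/*
 * Map a latency to a bucket index.  Bucket i covers [2^i, 2^(i+1))
 * microseconds; bucket 0 also takes 0 us and the last bucket takes
 * everything larger.
 */
static size_t
sk_lat_bucket(uint64_t usec)
{
	size_t b = 0;

	while (usec > 1 && b < SK_LAT_BUCKETS - 1) {
		usec >>= 1;
		b++;
	}
	return (b);
}

/*
 * Estimate the pct-th percentile as the upper bound, in microseconds,
 * of the first bucket at which the cumulative count reaches pct percent
 * of all samples.  Returns 0 when there are no samples.
 */
static uint64_t
sk_lat_percentile_upper(const uint64_t counts[SK_LAT_BUCKETS], unsigned pct)
{
	uint64_t total = 0, target, seen = 0;

	for (size_t i = 0; i < SK_LAT_BUCKETS; i++)
		total += counts[i];
	if (total == 0)
		return (0);
	target = (total * pct + 99) / 100;	/* round up */
	if (target == 0)
		target = 1;
	for (size_t i = 0; i < SK_LAT_BUCKETS; i++) {
		seen += counts[i];
		if (seen >= target)
			return ((uint64_t)1 << (i + 1));
	}
	return ((uint64_t)1 << SK_LAT_BUCKETS);
}
```

The estimate is only as precise as the bucket width, which is the usual trade-off for keeping the per-transaction bookkeeping to a single counter increment.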
Previously free vnodes would always be directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.
syncer runs always return the batch regardless of its size.
While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.
The blacklistd daemon attempted to restore the filtering rules
before the database of blocked addresses was opened, so no rules
were being reloaded. Now the rules are properly recreated when the
daemon is started with '-r'.
This bug was fixed locally, and then sent upstream to NetBSD.
This changeset is the import of the NetBSD version of the change,
which added debugging output to alert about a null database.
Reviewed by: emaste
Obtained from: NetBSD
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Fix a cluster of bugs in listing EFI environment variables:
1. Size returned for variable name is in bytes, not CHAR16 (the
UEFI standard is unclear on this, where it is clear on the size of
the variable).
2. Dynamically allocate the buffers so we can grow them if someone
defines a super-long variable name.
These two fixes allow me to examine all the variables in my BIOS and
also remove the repeated printing of variables.
des [Fri, 30 Sep 2016 13:05:32 +0000 (13:05 +0000)]
Reinstate Xr macros that were accidentally removed in a previous
commit. Add some missing cross-references to the SEE ALSO section.
Bump date now that there are content changes.
Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4
like other PCI network drivers. The sys/ofed directory is now mainly
reserved for generic infiniband code, with exception of the mthca driver.
- Add new manual page, mlx4en(4), describing how to configure and load
mlx4en.
- All relevant driver C-files are now prefixed mlx4, mlx4_en and
mlx4_ib respectively to avoid object filename collisions when compiling
the kernel. This also fixes an issue with proper dependency file
generation for the C-files in question.
- Device mlxen is now device mlx4en and depends on device mlx4, see
mlx4en(4). Only the network device name remains unchanged.
- The mlx4 and mlx4en modules are now built by default on i386 and
amd64 targets. Only building the mlx4ib module depends on
WITH_OFED=YES .
FreeBSD supports lazy allocation of PCI BAR, that is, when a device
driver's attach method is invoked, even if the device's PCI BAR
address wasn't initialized, the invocation of bus_alloc_resource_any()
(the call chain: pci_alloc_resource() -> pci_alloc_multi_resource() ->
pci_reserve_map() -> pci_write_bar()) would allocate a proper address
for the PCI BAR and write this 'lazy allocated' address into the PCI
BAR.
This model works fine for native FreeBSD device drivers, but _not_ for
device drivers shared with Linux (e.g. dev/mlx5/mlx5_core/mlx5_main.c
and ofed/drivers/net/mlx4/main.c). Both of them use
pci_request_regions(), which doesn't work properly with the PCI BAR
lazy allocation, because pci_resource_type() -> _pci_get_rle() always
returns NULL, so pci_request_regions() doesn't have the opportunity to
invoke bus_alloc_resource_any(). We now use pci_find_bar() in
pci_resource_type(), which is able to locate all available PCI BARs
even if some of them will be lazy allocated.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: hps
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8071
TEGRA: Prepare Tegra subtree for inclusion into ARM generic kernel.
- use DEFINE_CLASS_0() for driver classes
- unify driver names
- cleanup driver definitions and bindings
Replace explicit TUNABLE_INT to sysctl with CTLFLAG_TUN
- Replace tunables-only hw.psm.synaptics_support, hw.psm.trackpoint_support,
and hw.psm.elantech_support with respective sysctls declared with
CTLFLAG_TUN. This simplifies checking them in userland and also makes them
easier for users to discover
- Get rid of debug.psm.loglevel and hw.psm.tap_enabled TUNABLE_INT
declarations by adding CTLFLAG_TUN to read/write sysctls that were
already declared for these tunables.
Use the cell-index property as the unit number if available.
Summary:
NXP/Freescale, among others, includes an optional cell-index property
on nodes to denote the SoC block number of the node. This can be useful if, for
example, a node is disabled or nonexistent in the fdt, or the blocks are not
organized in address-sorted order. For instance, on the P1022, DMA2 is located
at CCSR offset 0xC000, while DMA1 is located at 0x21000.
hiren [Fri, 30 Sep 2016 00:10:57 +0000 (00:10 +0000)]
This adds a sysctl which allows you to disable the TCP hostcache. This is handy
during testing of network related changes where cached entries may pollute your
results, or during known congestion events where you don't want to unfairly
penalize hosts.
Prior to r232346 this would have meant you would break any connection with a
sub-1500 MTU, as the hostcache was authoritative. All entries as they stand today
should simply be used to prepopulate values for efficiency.
In icmp6_reflect() use original source address of erroneous packet as
destination address for source selection algorithm when original
destination address is not one of our own.
Reported by: Mark Kamichoff <prox at prolixium com>
Tested by: Mark Kamichoff <prox at prolixium com>
MFC after: 1 week
Restructure code slightly to save ip_tos bits earlier. Fix the bug
where the ip_tos field is zeroed out before assigning to the iptos
variable. Restore the ip_tos and ip_ver fields only if they have
been zeroed during the pseudo-header checksum calculation.
The IORESOURCE_XXX defines should resemble a bitmask while SYS_RES_XXX
are not bitmasks. Fix return value of pci_resource_flags() to reflect
this change.
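
The distinction between sequential resource type IDs and bitmask flags can be shown with a small self-contained sketch. All SK_ names and sk_resource_flags() are hypothetical stand-ins, and the numeric values are chosen for illustration; this is not the pci_resource_flags() implementation.

```c
/* Stand-ins for sequential resource type IDs (not usable as bitmasks). */
enum sk_sys_res {
	SK_SYS_RES_IRQ = 1,
	SK_SYS_RES_MEMORY = 3,
	SK_SYS_RES_IOPORT = 4
};

/* Stand-ins for bitmask-style flags: one distinct bit per type. */
#define SK_IORESOURCE_IRQ	(1u << SK_SYS_RES_IRQ)
#define SK_IORESOURCE_MEM	(1u << SK_SYS_RES_MEMORY)
#define SK_IORESOURCE_IO	(1u << SK_SYS_RES_IOPORT)

/* Translate a sequential type ID into its single-bit flag, or 0. */
static unsigned
sk_resource_flags(enum sk_sys_res type)
{
	switch (type) {
	case SK_SYS_RES_IRQ:
		return (SK_IORESOURCE_IRQ);
	case SK_SYS_RES_MEMORY:
		return (SK_IORESOURCE_MEM);
	case SK_SYS_RES_IOPORT:
		return (SK_IORESOURCE_IO);
	}
	return (0);
}
```

With the sequential IDs, a test such as `SK_SYS_RES_MEMORY & SK_SYS_RES_IRQ` is nonzero (3 & 1), so a caller masking the raw ID would mistake a memory resource for an IRQ; the translated flags do not overlap.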