emaste [Mon, 3 Oct 2016 17:49:26 +0000 (17:49 +0000)]
Retire WITHOUT_ELFCOPY_AS_OBJCOPY option
In FreeBSD 11 ELF Tool Chain's elfcopy is installed as objcopy by
default, with the option to switch back to GNU objcopy by setting
WITHOUT_ELFCOPY_AS_OBJCOPY in make.conf.
We plan to remove the outdated in-tree binutils in FreeBSD 12, so
remove the temporary transition aid.
Reviewed by: brooks, imp
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D7337
gallatin [Mon, 3 Oct 2016 13:23:43 +0000 (13:23 +0000)]
Conditionally move initial vfs bio alloc above 4G
On machines with just the wrong amount of physical memory (enough to
have a lot of bufs, but not enough to use VM_FREELIST_DMA32) it is
possible for 32-bit address limited devices to have little to no
memory left when attaching, due to potentially large vfs bio configs
consuming all memory below 4GB not protected by VM_FREELIST_ISADMA.
This causes the 32-bit devices to allocate from VM_FREELIST_ISADMA,
leaving that freelist emtpy when ISA devices need DMAable memory.
Rather than decrease VM_DMA32_NPAGES_THRESHOLD, use the time honored
technique of putting initially allocated kernel data structs
at the end (or at least not the beginning) of memory.
Since this allocation is done at boot and is wired, is not freed,
so the system is low on 32-bit (and ISA) dma'ble memory forever.
So it is a good candidate to move above 4GB.
While here, remove an unneeded round_page() from kmem_malloc's size
argument as suggested by alc. The first thing kmem_malloc() does
is a round_page(size), so there is no need to do it before the call.
emaste [Mon, 3 Oct 2016 13:12:44 +0000 (13:12 +0000)]
libc arc4_stir: use only kern.arandom sysctl
The sysctl cannot fail. If it does fail on some FreeBSD derivative or
after some future change, just abort() so that the problem will be found
and fixed.
It's preferable to provide an arc4random() function that cannot fail and
cannot return poor quality random data. While abort() is not normally
suitable for a library, it makes sense here.
Reviewed by: ed, jonathan, markm
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8077
marcel [Mon, 3 Oct 2016 02:37:28 +0000 (02:37 +0000)]
Prefer <stdint.h> over <sys/types.h>. While here remove redundant
inclusion of <sys/queue.h>.
Move the inclusion of the disk partitioning headers out of order
and inbetween standard headers and local header. They will change
in a subsequent commit.
marcel [Mon, 3 Oct 2016 01:46:47 +0000 (01:46 +0000)]
Replace STAILQ with TAILQ. TAILQs are portable enough that they can
be used on both macOS and Linux. STAILQs are not. In particular,
STAILQ_LAST does not next on Linux. Since neither STAILQ_FOREACH_SAFE
nor TAILQ_FOREACH_SAFE exist on Linux, replace its use with a regular
TAILQ_FOREACH. The _SAFE variant was only used for having the next
pointer in a local variable.
gonzo [Mon, 3 Oct 2016 01:07:06 +0000 (01:07 +0000)]
Fix attach/detach methods
- Initialize lock before starting worker process
- Do not hold lock when destroying evdev. By that time ther should be no
other active code pathes that can access softc
sevan [Mon, 3 Oct 2016 00:25:15 +0000 (00:25 +0000)]
Amend history to mention predecessor originated from 386BSD[1] & current implementation from NetBSD[2].
Reword history since the utility was renamed once more in FreeBSD 5.0.
Separate out author & historical information regarding character code conversion.
Add AUTHORS section.
avos [Sun, 2 Oct 2016 20:35:55 +0000 (20:35 +0000)]
net80211: ieee80211_ratectl*: switch to reusable KPI
Replace various void * / int argument combinations with common structures:
- ieee80211_ratectl_tx_status for *_tx_complete();
- ieee80211_ratectl_tx_stats for *_tx_update();
While here, improve amrr_tx_update() for a bit:
1. In case, if receiver is not known (typical for Ralink USB drivers),
refresh Tx rate for all nodes on the interface.
2. There was a misuse:
- otus(4) sends non-decreasing counters (as originally intended);
- but ural(4), rum(4) and run(4) are using 'read & clear' registers
to obtain statistics for some period of time (and those 'last period'
values are used as arguments for tx_update()). If arguments are not big
enough, they are just discarded after the next call.
Fix: move counting into *_tx_update()
(now otus(4) will zero out all node counters after every tx_update() call)
Tested with:
- Intel 3945BG (wpi(4)), STA mode.
- WUSB54GC (rum(4)), STA / HOSTAP mode.
- RTL8188EU (urtwn(4)), STA mode.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D8037
avos [Sun, 2 Oct 2016 19:39:23 +0000 (19:39 +0000)]
net80211: add one-vap version of ieee80211_iterate_nodes()
- Add a counter into 'struct ieee80211_node_table' to save current number
of allocated nodes.
(allows to remove array overflow checking in ieee80211_iterate_nodes()).
- Add ieee80211_iterate_nodes_vap() function; unlike non-vap version,
it iterates on nodes for specified vap only.
In addition to the above:
- Remove ieee80211_iterate_nt(); it is not used by drivers / net80211
outside ieee80211_iterate_nodes() function + cannot be separated due
to structural changes in code.
Since size of 'struct ieee80211_node_table' (part of ieee80211com,
which is a part of driver's softc) is changed all wireless drivers /
kernel need to be recompiled.
Tested with wpi(4), STA mode.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D7996
kib [Sun, 2 Oct 2016 17:02:59 +0000 (17:02 +0000)]
Export the mq_getfd_np() symbol from librt.so, which allows to get
file descriptor for the given posix mqueue. Export the
timer_oshandle_np() symbol to get ktimer id for the given posix timer.
Requested by: Lewis Donzis <lew@perftech.com>
Reviewed by: jilles
Discussed with: kan
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
gonzo [Sun, 2 Oct 2016 03:20:31 +0000 (03:20 +0000)]
Modularize evdev
- Convert "options EVDEV" to "device evdev" and "device uinput", add
modules for both new devices. They are isolated subsystems and do not
require any compile-time changes to general kernel subsytems
- For hybrid drivers that have evdev as an optional way to deliver input
events add option EVDEV_SUPPORT. Update all existing hybrid drivers
to use it instead of EVDEV
- Remove no-op DECLARE_MODULE in evdev, it's not required, MODULE_VERSION
is enough
- Add evdev module dependency to uinput
Submitted by: Vladimir Kondratiev <wulf@cicgroup.ru>
vangyzen [Sun, 2 Oct 2016 01:42:45 +0000 (01:42 +0000)]
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.
Submitted by: David A. Bright <david.a.bright@dell.com>
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D7695
markj [Sun, 2 Oct 2016 00:56:21 +0000 (00:56 +0000)]
rtsold: Log messages about unexpected RAs at LOG_DEBUG.
Because rtsold listens for RAs on a raw socket, it may receive RAs from
interfaces that it does not manage. Such events can result in excessive
logging.
jhb [Sat, 1 Oct 2016 22:12:33 +0000 (22:12 +0000)]
Use timercmp() and timersub() in kdump.
Previously, kdump used the kernel-only timervalsub() macro which required
defining _KERNEL when including <sys/time.h>. Now, kdump uses the existing
userland API. The timercmp() usage to check for a backwards timestamp is
also clearer and simpler than the previous code which checked the result of
the subtraction for a negative value.
While here, take advantage of the 3-arg timersub() to store the subtraction
results in a tempory timeval instead of overwriting the timestamp in the
ktrace record and then having to restore it.
jhb [Sat, 1 Oct 2016 22:08:07 +0000 (22:08 +0000)]
Expose kernel-only errno values if _WANT_KERNEL_ERRNO is defined.
The kernel uses a few negative errno values for internal conditions
such as requesting a system call restart. Normally these errno values
are not exposed to userland. However, kdump needs access to these
values as some of then can be present in a ktrace system call return
record. Previously kdump was defining _KERNEL to gain access to ehse
values, but was then having to manually declare 'errno' (and doing it
incorrectly). Now, kdump uses _WANT_KERNEL_ERRNO instead of _KERNEL
and uses the system-provided declaration of errno.
jhb [Sat, 1 Oct 2016 22:01:41 +0000 (22:01 +0000)]
Handle 64-bit system call arguments (off_t, id_t).
In particular, 64-bit system call arguments use up two register_t
arguments for 32-bit processes. They must also be aligned on a 64-bit
boundary on 32-bit powerpc processes. This fixes the decoding of
lseek(), procctl(), and wait6() arguments for 32-bit processes (both
native and via freebsd32).
Note that the ktrace system call return record only returns a single
register, so the return value of lseek is always truncated to the low
32-bits for 32-bit processes.
rmacklem [Sat, 1 Oct 2016 19:39:09 +0000 (19:39 +0000)]
r297225 broke udp_output() for the case where the "addr" argument
is NULL and the function jumps to the "release:" label.
For this case, the "inp" was write locked, but the code attempted to
read unlock it. This patch fixes the problem.
This case could occur for NFS over UDP mounts, where the server was
down for a few minutes under certain circumstances.
gonzo [Sat, 1 Oct 2016 17:57:32 +0000 (17:57 +0000)]
Use VM_MEMATTR_WRITE_COMBINING memattr for mmap(2) on framebuffer
VM_MEMATTR_WRITE_COMBINING sets write-through cache flag for framebuffer
memory that prevents pixel data from being stuck in cache until evicition
happens
gonzo [Sat, 1 Oct 2016 17:43:02 +0000 (17:43 +0000)]
Provide way for framebuffer driver to request mmap(2) mapping type
On ARM if memattr is not overriden mmap(2) maps framebuffer
memory as WBWA which means part of changes to content in userland
end up in cache and appear on screen gradually as cache lines are
evicted. This change adds configurable memattr that hardware fb
implementation can set to get the memory mapping type it
requires:
- Add new flag FB_FLAG_MEMATTR that indicates that framebuffer
driver overrides default memattr
- Add new field fb_memattr to struct fb_info to specify requested
memattr
Reviewed by: ray
Differential Revision: https://reviews.freebsd.org/D8064
markj [Sat, 1 Oct 2016 01:30:34 +0000 (01:30 +0000)]
nd6_dad_timer(): don't assert that the address is tentative.
It appears that this assertion can be tripped in some cases when
multiple interfaces are on the same link. Until this is resolved, revert a
part of r306305 and simply log a message if the DAD timer fires on a
non-tentative address.
Declare a module for evdev and add dependency to ukbd(4) and ums(4)
Prepare for making evdev a module. "Pure" evdev device drivers (like
touchscreen) and evdev itself can be built as a modules regardless of
"options EVDEV" in kernel config. So if people does not require evdev
functionality in hybrid drivers like ums and ukbd they can, for instance,
kldload evdev and utouchscreen to run FreeBSD in kiosk mode.
cam_periph_ccbwait could return while ccb in progress
In cam_periph_runccb, cam_periph_ccbwait was using the value of the ccb
pinfo.index and status fields to determine whether the ccb was done,
but these fields are updated without a contending lock and could glitch
into states that would be erroneously interpreted as done. Instead,
have cam_periph_ccbwait look for the explicit result of the function
cam_periph_done.
Add NetBSD 5.1.4, 5.2.2 & 7.0.1 releases to the tree.
Ammend the position of NetBSD 6.0.2 release in the tree as it came
after OpenBSD[1] & DragonFlyBSD[2] release according to the release
information.
The entries for the 6.0.5 & 6.1.5 releases were incorrect (fetched from
NetBSD CVS copy) and confirmed with history page[3]
adrian [Fri, 30 Sep 2016 19:59:56 +0000 (19:59 +0000)]
Add librss, a simple wrapper around RSS APIs so applications can begin auto-tuning.
I've used this in a handful of RSS test applications. It is just some
very simple functions to fetch the RSS configuration, query the per-bucket
CPU set, and mark sockets as local to an RSS bucket. It should be sufficient
for both thread-based and process-based workloads.
(Yes, I wrote a manpage.)
This is based on some early RSS API and wrapper API work I did whilst
I was at Netflix. Thanks to Netflix for the very original work that
spawned this; thanks to Peter Grehan for his feedback about RSS APIs
and thanks to Jack Vogel and Navdeep Parhar for the NIC-facing side of the
APIs. These fed into the simple userland API I wrote up here.
Restore pre-r300383 behavior when a frame is sent:
- stop scan;
- send frame;
- when beacon arrives and our bit in TIM is not set - restart the scan.
NOTE:
Ideally, this should introduce new interface (ieee80211_pause_anyscan());
however, since ieee80211_cancel_anyscan() is not used by drivers and only
called by ieee80211_start_pkt() the current patch overrides it's behavior
instead.
Tested with Intel 3945BG, STA mode
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D7979
cem [Fri, 30 Sep 2016 18:12:16 +0000 (18:12 +0000)]
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.
On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one. (Running a basic file
server workload.)
Submitted by: Anton Rang <rang at acm.org>
Reviewed by: cem (earlier version), kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8041
Compute two new metrics. Disk load, the average number of transactions
we have queued up normaliazed to the queue size. Also compute buckets
of latency to help compute, in userland, estimates of Median, P90, P95
and P99 values.
Previously free vnodes would always by directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.
syncer runs always return the batch regardless of its size.
While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.