Add the lb program, which is able to load-balance input traffic
received from a netmap port over M groups, with N netmap pipes in
each group. Each received packet is forwarded to one of the pipes
chosen from each group (using an L3/L4 connection-consistent hash function).
This also adds a man page for lb and some cross-references in related
man pages.
Eugene Grosbein [Mon, 19 Nov 2018 06:33:38 +0000 (06:33 +0000)]
MFC r339465: rc.initdiskless: add support for auxiliary NVRAM.
Currently, rc.inidiskless assumes that local system configuration
changes are kept in some mountable file system. For example,
nanobsd uses dedicated partition mounted as /cfg for this.
However, small embedded devices like MIPS routers may have no enough flash
space to keep full-blown file system but have only one or couple
small flash blocks to keep persistent local configuration overrides.
This change extends rc.initdiskless and introduces ability to run auxiliary
command /conf/T/M/extract that is supposed to extract configuration overrides
from such local storage.
For example, the command /conf/default/etc/extract may contain something like:
cd "$1" && bsdcpio --quiet -idu < /dev/map/cfg
bsdcpio command extracts compressed archive from the storage to /etc
assuming the storage is exposed by the kernel as /dev/map/cfg to userland.
Rick Macklem [Mon, 19 Nov 2018 00:49:08 +0000 (00:49 +0000)]
MFC: r339999
Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.
Ed Maste [Sun, 18 Nov 2018 14:52:16 +0000 (14:52 +0000)]
MFC r340329: build(7): clarify buildenv target can be used for non-cross builds
make buildenv can be used for building for the same architecture as
the host (perhaps this is a degenerate case of cross-building).
TARGET and TARGET_ARCH do not need to be set in this case.
Kristof Provost [Sun, 18 Nov 2018 12:09:26 +0000 (12:09 +0000)]
MFC r340067:
pfsync: Ensure uninit is done before pf
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.
Kristof Provost [Sun, 18 Nov 2018 12:04:24 +0000 (12:04 +0000)]
MFC r340066:
Notify that the ifnet will go away, even on vnet shutdown
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.
Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.
Kristof Provost [Sun, 18 Nov 2018 10:57:31 +0000 (10:57 +0000)]
MFC r339676:
pf: Fix copy/paste error in IPv6 address rewriting
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.
Kristof Provost [Sun, 18 Nov 2018 10:47:36 +0000 (10:47 +0000)]
MFC r339470:
pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.
MFC r339554:
Rework if_ipsec(4) to use epoch(9) instead of rmlock.
* use CK_LIST and FNV hash to keep chains of softc;
* read access to softc is protected by epoch();
* write access is protected by ipsec_ioctl_sx. Changing of softc fields
is allowed only when softc is unlinked from CK_LIST chains.
* linking/unlinking of softc is allowed only when ipsec_ioctl_sx is
exclusive locked.
* the plain LIST of all softc is replaced by hash table that uses ingress
address of tunnels as a key.
* added support for appearing/disappearing of ingress address handling.
Now it is allowed configure non-local ingress IP address, and thus the
problem with if_ipsec(4) configuration that happens on boot, when
ingress address is not yet configured, is solved.
MFC r339555:
Follow the fix in r339532 (by glebius):
Fix exiting an epoch(9) we never entered. May happen only with MAC.
MFC r339642:
Remove softc from idhash when interface is destroyed.
MFC r339646:
Add the check that current VNET is ready and access to srchash is
allowed.
ipsec_srcaddr() callback can be called during VNET teardown, since
ingress address checking subsystem isn't VNET specific. And thus
callback can make access to already freed memory. To prevent this,
use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make
epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to
be sure that ipsec_srcaddr() is finished its work.
MFC r339551:
Add handling for appearing/disappearing of ingress addresses to if_gif(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
* remove the note about ingress address from BUGS section.
MFC r339552:
Add handling for appearing/disappearing of ingress addresses to if_gre(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
MFC r339553:
Add handling for appearing/disappearing of ingress addresses to if_me(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
Sponsored by: Yandex LLC
MFC r339649:
Add the check that current VNET is ready and access to srchash is allowed.
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.
MFC r339550,339556:
Add KPI that can be used by tunneling interfaces to handle IP addresses
appearing and disappearing on the host system.
Such handling is need, because tunneling interfaces must use addresses,
that are configured on the host as ingress addresses for tunnels.
Otherwise the system can send spoofed packets with source address, that
belongs to foreign host.
The KPI uses ifaddr_event_ext event to implement addresses tracking.
Tunneling interfaces register event handlers and then they are
notified by the kernel, when an address disappears or appears.
ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event()
in the ip_encap.c
MFC r339537:
Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.
MFC r339539:
Add IPFW_RULE_JUSTOPTS flag, that is used by ipfw(8) to mark rule,
that was added using "new rule format". And then, when the kernel
returns rule with this flag, ipfw(8) can correctly show it.
Reported by: lev
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17373
MFC r339535:
Do not allow use `create` keyword as hostname when ifconfig(8) is invoked
for already existing interface.
It appeared, that ifconfig(8) assumes `create` keyword as hostname and
tries to resolve it, when `ifconfig ifname create` invoked for already
existing interface. This can produce some unexpected results, when hostname
resolving has successfully happened. This patch adds check for such case.
When an interface is already exists, and create is only one argument,
return error message. But when there are some other arguments, just remove
create keyword from the arguments list.
MFC r339533:
Add sadb_x_sa2 extension to SADB_ACQUIRE requests.
SADB_ACQUIRE requests are send by kernel, when security policy doesn't
have corresponding security association for outbound packet. IKE daemon
usually registers its handler for such messages and when the kernel asks
for SA it can handle this request. Now such requests will contain
additional fields that can help IKE daemon to create SA. And IKE now
can create SAs using only information from SADB_ACQUIRE request, this
is useful when many if_ipsec(4) interfaces are in use and IKE doesn track
security policies that was installed by kernel.
MFC r339542:
Retire IPFIREWALL_NAT64_DIRECT_OUTPUT kernel option. And add ability
to switch the output method in run-time. Also document some sysctl
variables that can by changed for NAT64 module.
NAT64 had compile time option IPFIREWALL_NAT64_DIRECT_OUTPUT to use
if_output directly from nat64 module. By default is used netisr based
output method. Now both methods can be used, but they require different
handling by rules.
evdev: Use console lock as evdev lock for all supported keyboard drivers.
Now evdev part of keyboard drivers does not take any locks if corresponding
input/eventN device node is not opened by userland consumers.
Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
MFC r339824:
evdev: disable evdev if it is invoked from KDB or panic context
This allow to prevent deadlock on entering KDB if one of evdev locks is
already taken by userspace process.
Also this change discards all but LED console events produced by KDB as
unrelated to userspace.
Alan Somers [Thu, 15 Nov 2018 19:06:07 +0000 (19:06 +0000)]
MFC r340314:
libjail: fix handling of allow.mount.fusefs in jailparam_init
fusefs is inconsistently named. The kernel module is named "fuse", but the
mount helper is named "mount_fusefs" and the jail(8) parameter is named
"allow.mount.fusefs". Special case it in libjail.
Reviewed by: jamie
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D17929
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet in a functional state.
Changelist:
- handle errors returned by virtqueue_enqueue() properly (they were
previously ignored)
- make sure netmap XOR rest of the kernel access each virtqueue.
- compute the number of netmap slots for TX and RX separately, according to
whether indirect descriptors are used or not for a given virtqueue.
- make sure sglist are freed according to their type (mbufs or netmap
buffers)
- add support for mulitiqueue and netmap host (aka sw) rings.
- intercept VQ interrupts directly instead of intercepting them in txq_eof
and rxq_eof. This simplifies the code and makes it easier to make sure
taskqueues are not running for a VQ while it is in netmap mode.
- implement vntet_netmap_config() to cope with changes in the number of queues.
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D17916
Approved by: re (gjb)
Michael Tuexen [Thu, 15 Nov 2018 17:25:32 +0000 (17:25 +0000)]
MFC r340361:
Fix printing of 64-bit counters on 32-bit ppc platforms.
Several statistic counters are uint64_t values and are printed by systat
using %lu. This results in displaying wrong numbers. Use PRIu64 instead.
While there, print variables of size_t using %zd.
Approved by: re (gjb@)
Differential Revision: https://reviews.freebsd.org/D17838
Glen Barber [Thu, 15 Nov 2018 16:42:59 +0000 (16:42 +0000)]
MFC r340406:
The roff ascii.gz documentation installed to /usr/share/doc
was removed in r318881 when roff was removed from the base
system.
This results in the doc.txz distribution set containing a
single directory (./) which is empty.
Remove the "Additional documentation" option from the menu
selection of bsdinstall(8), as the plain-text documentation
installed in /usr/share/doc is installed as part of the
packageworld target.
The doc entry has not been removed from EXTRA_DISTRIBUTIONS
in Makefile.inc1, in case its removal triggers an issue with
freebsd-update(8), which is currently aware of the world/doc
component, so the empty doc.txz continues to be created as
a precaution.
Approved by: re (rgrimes)
Sponsored by: The FreeBSD Foundation
Kyle Evans [Thu, 15 Nov 2018 16:03:52 +0000 (16:03 +0000)]
MFC r340334: libbe(3): Set canmount properly when activating a new BE
The previously activated BE should have canmount=noauto set on it upon
activation of the new BE, but we previously did not touch canmount on either
old or new BE.
Eric van Gyzen [Wed, 14 Nov 2018 21:31:26 +0000 (21:31 +0000)]
MFC r340425 (by cem)
amdsmn(4)/amdtemp(4): Attach to Ryzen 2 hostbridges
As reported, tested, and patch supplied by Johannes.
There may be future work to do to support multiple sensors, but for now, any
sensor at all is a strict improvement for Ryzen 2 systems.
PR: 228480
Submitted by: Johannes Lundberg <johalun0 AT gmail.com> (earlier version)
Reported by: deischen@, Johannes, and numerous others
Early MFC approved by: cem
Approved by: re (kib)
Relnotes: yes
Stefan Eßer [Wed, 14 Nov 2018 20:35:04 +0000 (20:35 +0000)]
MFC S340428: Prepare move of ctm from base to a port (misc/ctm) by:
- Adding a note to UPDATING
- Adding a note to the history section of the manpage ctm.1
- Adding a message printed to STDERR to the ctm program
This version is meant for release in FreeBSD-12.0 and should remain in
FreeBSD-12 over its life-time.
A follow-up commit will remove ctm from -CURRENT after the MFC to 12
has happened.
Approved by: re
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D17969
Brooks Davis [Mon, 12 Nov 2018 21:51:36 +0000 (21:51 +0000)]
MFC r340302:
Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument. 64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations. They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips. Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.
The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.
Use uint32_t consistently as the type of the old dev_t. This matches
the old definition.
Brooks Davis [Mon, 12 Nov 2018 18:23:51 +0000 (18:23 +0000)]
Regen after r340377: MFC r340272, r340274, r340294
r340272: Make __sysctl follow the freebsd32_foo convention.
r340274: Make freebsd32_umtx_op follow the freebsd32_foo convention.
r340294: Fix a number of bugs in freebsd32's capabilities.conf.
Brooks Davis [Mon, 12 Nov 2018 18:21:17 +0000 (18:21 +0000)]
MFC r340272, r340274, r340294
r340272:
Make __sysctl follow the freebsd32_foo convention.
Sponsored by: DARPA, AFRL
r340274:
Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by: DARPA, AFRL
r340294:
Fix a number of bugs in freebsd32's capabilities.conf.
Bugs range from failure to update after changing syscall implementaion
names to using the wrong name. Somewhat confusingly, the name in
capabilities.conf is exactly the string that appears in syscalls.master,
not the name with a COMPAT* prefix which is the actual function name.
Found while making a change to use the default capabilities.conf.
Stephen Hurd [Mon, 12 Nov 2018 16:28:07 +0000 (16:28 +0000)]
MFC r340310:
Fix first-packet completion
The first packet after the ring is initialized was never
completed as isc_txd_credits_update() would not include it in the
count of completed packets. This caused netmap to never complete
a batch. See PR 233022 for more details.
PR: 233022
Reported by: lev
Reviewed by: lev
Approved by: re (kib)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17931
Stephen Hurd [Mon, 12 Nov 2018 16:08:14 +0000 (16:08 +0000)]
MFC r340236:
Fix rxcsum issue introduced in r338838
r338838 attempted to fix issues with rxcsum and rxcsum6.
However, the rxcsum bits were set as though if_setcapenablebit() was
being called, not if_togglecapenable() which is in use. As a result,
it was not possible to disable rxcsum when rxcsum6 was supported.
PR: 233004
Reported by: lev
Reviewed by: lev
Approved by: re (kib)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17881
Devin Teske [Sun, 11 Nov 2018 06:05:28 +0000 (06:05 +0000)]
MFC r339971: Add new rc keywords: enable, disable, delete
This adds new keywords to rc/service to enable/disable a service's
rc.conf(5) variable and "delete" to remove the variable.
When the "service_delete_empty" variable in rc.conf(5) is set to "YES"
(default is "NO") an rc.conf.d file (in /etc/ or /usr/local/etc) is
deleted if empty after modification using "service $foo delete".
MFC r340212:
Sometimes the complete split packet may be queued too early and the
transaction translator will return a NAK. Ignore this message and
retry the complete split instead.
Approved by: re (kib)
Sponsored by: Mellanox Technologies
MFC r340100:
Do not use bzero() for the O_ICMP6TYPE opcode.
The buffer is already zeroed in compile_rule() function, and also it
may contain configured F_NOT flag in o.len field. This fixes the
filling for "not icmp6types" opcode.
MFC r340175:
Do not print "ip6" keyword in print_icmp6types() for O_ICMP6TYPE opcode.
It produces incompatibility when rules listing is used again to
restore saved ruleset, because "ip6" keyword produces separate opcode.
The kernel already has the check and only IPv6 packets will be checked
for matching.
John Baldwin [Thu, 8 Nov 2018 22:39:38 +0000 (22:39 +0000)]
MFC 340164,340168,340170: Add custom cpu_lock_delay() for x86.
340164:
Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI. Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms. However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.
340168:
Add a delay_tsc() static function for when DELAY() uses the TSC.
This uses slightly simpler logic than the existing code by using the
full 64-bit counter and thus not having to worry about counter
overflow.
340170:
Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC. For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond. If the TSC does not exist, read from I/O port 0x84 to
delay instead.
Glen Barber [Thu, 8 Nov 2018 21:58:23 +0000 (21:58 +0000)]
MFC r340260 (emaste):
Avoid buffer underwrite in icmp_error
icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.
Include the ip header in the size passed to m_align. On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster. This will result in slightly pessimal alignment for
the ICMP data copy.
Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.
Approved by: re (kib)
Security: CVE-2018-17156
Sponsored by: The FreeBSD Foundation
Tijl Coosemans [Thu, 8 Nov 2018 19:56:29 +0000 (19:56 +0000)]
MFC r340181, r340185:
On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers. The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver. Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.
Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration. These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.
Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again. That is a much bigger effort
however and this commit is meant to be MFCable.
This enables linux64 support in x11/nvidia-driver*.