MFC r341538:
ipoib: Notify on modify QP failure only when relevant
Modify QP can fail and it can be acceptable, like when moving from RST to
ERR state, all the rest are not acceptable and a message to the log
should be printed.
The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.
MFC r341537:
ipoib: increase the non-cm queue length
When a packet needs fragmentation, it might generate more than 3 fragments.
With the queue length 3, all fragments are generated faster than the
queue is drained, which effectively drops fourth and later fragments on
the floor.
MFC r341536:
ipoib: Don't do a light flush when MTU is unchanged.
When changing the MTU of ibX network interfaces, check that the MTU was really
changed before requesting an update of the multicast rules. Else we might go
into an infinite loop joining and leaving ibX multicast groups towards the
opensm master interface.
MFC r341535:
ipoib: correct setting MTU from inside ipoib(4).
It is not enough to set ifnet->if_mtu to change the interface MTU.
System saves the MTU for route in the radix tree, and route cache keeps
the interface MTU as well. Since addition of the multicast group causes
recalculation of MTU, even bringing the interface up changes MTU from
4042 to 1500, which makes the system configuration inconsistent. Worse,
ip_output() prefers route MTU over interface MTU, so large packets are
not fragmented and dropped on floor.
Fix it for ipoib(4) using the same approach (or hack) as was applied
for it_tun/if_tap in r339012. Thanks to bz@ for giving the hint.
MFC r341534:
ibcore: Fix clearing of bound device interface.
Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally, else link local addresses won't work.
Trying to validate loopback fails because rtalloc1() resolves system
local addresses to the loopback network interface, lo0. Fix this by
explicitly checking for loopback during validation of the source
and destination network address. If the source address belongs to
a local network interface and is equal to the destination address,
there is no need to run the destination address through rtalloc1().
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:
Overrunning array class->method_table of 80 8-byte elements at element index 127
(byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class)
(which evaluates to 127).
The port number in the listen_id_priv has been observed to be zero which
means no port has been selected. The current code lacks a check for invalid
port number.
MFC r340360:
Add ability to use dynamic external prefix in ipfw_nptv6 module.
Now an interface name can be specified for nptv6 instance instead of
ext_prefix. The module will track if_addr_ext events and when suitable
IPv6 address will be added to specified interface, it will be configured
as external prefix. When address disappears instance becomes unusable,
i.e. it doesn't match any packets.
Changelist:
- Replace netmap passthrough host support with a more general
mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop.
No kernel threads are used to use this feature: the application
is required to spawn a thread (or a process) and issue a
SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The
kernel loop is executed by the ioctl implementation, which returns
to userspace only when a different thread calls SYNC_KLOOP_STOP
or the netmap file descriptor is closed.
- Update the if_ptnet driver to cope with the new data structures,
and prune all the obsolete ptnetmap code.
- Add support for "null" netmap ports, useful to allocate netmap_if,
netmap_ring and netmap buffers to be used by specialized applications
(e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
- Various fixes and code refactoring.
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D18015
Cy Schubert [Tue, 11 Dec 2018 01:49:06 +0000 (01:49 +0000)]
As part of the general cleanup of the ipfilter code, special cases
are committed separately to document fixing them separately from
the general cleanup. In this case we don't want to hide the utter
brokenness of what is being fixed.
Clean up a discombobulated block of #if's, with one block unreachable.
ip_fil.c is used in ipftest which is used to dry-run test ipfilter
rules in userspace without loading them in the kernel. The call to
(*ifp->if_output) matches that in the FreeBSD kernel.
Further testing and work will be required to make ipftest fully
functional.
Prevent periodic/etc/weekly/340.noid from descending into root directories
of jails. Jails have their own user/group databases and this script
can produce multiple false warnings, not to mention significant extra
load in case of large jailed subtrees. Leave this check for jailed
invocations of the same script.
Eugene Grosbein [Mon, 10 Dec 2018 14:09:15 +0000 (14:09 +0000)]
MFC r340321: Move definition of $jail_conf variable to /etc/defaults/rc.conf
from jail startup script so it can be successfully queried
with the command "sysrc jail_conf".
Eugene Grosbein [Mon, 10 Dec 2018 13:41:28 +0000 (13:41 +0000)]
MFC r340319: jail(8): introduce new command option -e to exhibit
a list of configured non-wildcard jails with their parameters,
no matter running or not.
The option -e takes separator argument that is used
to separate printed parameters. It will be used with following
additions to system periodic scripts to differentiate parts
of directory tree belonging jails as opposed to host's.
If word in ${param?word} is missing, the shell shall write a default error
message. So expanding ${param?} when param is not set should write an error
message like
Matcher function incorrectly assumed that moffset that we get from
findmust is in bytes. Fix this by introducing a stepback function,
taking short path if MB_CUR_MAX is 1, and going back byte-by-byte,
checking if we have a legal character sequence otherwise.
Cy Schubert [Sat, 8 Dec 2018 17:50:00 +0000 (17:50 +0000)]
MFC r341377, r341388 (fixup):
Restore handling of PMTU discovery, removed through an unifdef(1)
following the MFV of r254219 into r255332. In addition the 'FreeBSD'
macro was never defined in ipfilter 5.1.2 thus it never would have
been enabled in the first place.
This work is prompted by a general cleanup of the IP Filter code
prompted by working to resolve a PR. More to follow.
Cy Schubert [Sat, 8 Dec 2018 17:28:52 +0000 (17:28 +0000)]
MFC r341384:
Remove IFF_DRVRLOCK as it is used in IRIX only (and we all know IRIX
is dead). This includes collaterally removing code shared by HP/UX,
SGI, and Linux, where IP Filter will in all likelihood for various
reasons never run again.
Mike Karels [Sat, 8 Dec 2018 14:54:33 +0000 (14:54 +0000)]
MFC r340474:
Fix flags collision causing inability to enable CBQ in ALTQ
The CBQ BORROW flag conflicts with the RMCF_CODEL flag; the
two sets of definitions actually define the same things. The symptom
is that a kernel with CBQ support and not CODEL fails to load a QoS
policy with the obscure error "pfctl: DIOCADDALTQ: Cannot allocate memory."
If ALTQ_DEBUG is enabled, the error becomes a little clearer:
"rmc_newclass: CODEL not configured for CBQ!" is printed by the kernel.
There really shouldn't be two sets of macros that have to be defined
consistently, but the include structure isn't right for exporting
CBQ flags to altq_rmclass.h. Re-align the definitions, and add
CTASSERTs in the kernel to ensure that the definitions are consistent.
MFC r341008:
Fix possible panic during ifnet detach in rtsock.
The panic can happen, when some application does dump of routing table
using sysctl interface. To prevent this, set IFF_DYING flag in
if_detach_internal() function, when ifnet under lock is removed from
the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
ifnet detach during routes enumeration. In case, if some interface was
detached in the time before we take the lock, add the check, that ifnet
is not DYING. This prevents access to memory that could be freed after
ifnet is unlinked.
MFC r341334:
Adapt the fix in r341008 to correctly work with EBR.
IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding
it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because
unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it
doesn check that concurrent code is running in epoch section. But while
we are in epoch section, we should be able to do access to ifnet's
fields, even it was unlinked. Thus do not change if_addr and if_hw_addr
fields in ifnet_detach_internal() to NULL, since rtsock code can do
access to these fields and this is allowed while it is running in epoch
section.
This should fix the race, when ifnet_detach_internal() unlinks ifnet
after we checked it for IFF_DYING in sysctl_dumpentry.
Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the
NULL check for ifp->if_description, since free(9) can correctly handle
NULL pointer.
Yuri Pankov [Thu, 6 Dec 2018 10:48:46 +0000 (10:48 +0000)]
MFC r340204:
Cleanup locale tools:
- Simplify the source dir specification, and update README
appropriately
- Drop the LC (doonly) processing, it's broken, and even if fixed, not
really useful
- Don't remove the target directories while installing new data as it
removes Makefile.depend which we don't manage; only rm the files we
are going to add/replace/delete instead
- Restrict adding bsd.endian.mk to colldef and ctypedef Makefiles, it's
not needed in other (text-only) categories
- GC unused scripts; they don't seem to be particularly helpful standalone
as well
Yuri Pankov [Thu, 6 Dec 2018 10:41:22 +0000 (10:41 +0000)]
MFC r340144:
Add hybrid C.UTF-8 locale being identical to default C locale except
that it uses the same ctype maps and functions as other UTF-8 locales.
Yuri Pankov [Wed, 5 Dec 2018 17:10:06 +0000 (17:10 +0000)]
MFC r339827:
localedef: define characters in "space" class also as "print", except
for the known conflicts ("control" characters can't be "print"able).
POSIX doesn't explicitly forbid this, and actually includes <space>
character in "print".
Michael Tuexen [Tue, 4 Dec 2018 22:14:18 +0000 (22:14 +0000)]
MFC r340782:
A TCP stack is required to check SEG.ACK first, when processing a
segment in the SYN-SENT state as stated in Section 3.9 of RFC 793,
page 66. Ensure this is also done by the TCP RACK stack.
Reviewed by: rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18034
Michael Tuexen [Tue, 4 Dec 2018 22:11:41 +0000 (22:11 +0000)]
MFC r340777:
Ensure that the default RTT stack can make an RTT measurement if
the TCP connection was initiated using the RACK stack, but the
peer does not support the TCP RACK extension.
This ensures that the TCP behaviour on the wire is the same if
the TCP connection is initated using the RACK stack or the default
stack.
Reviewed by: rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18032
Michael Tuexen [Tue, 4 Dec 2018 22:05:36 +0000 (22:05 +0000)]
MFC r340738:
Improve two KASSERTs in the TCP RACK stack.
There are two locations where an always true comparison was made in
a KASSERT. Replace this by an appropriate check and use a consistent
panic message. Also use this code when checking a similar condition.
Gordon Tetlow [Tue, 4 Dec 2018 18:31:21 +0000 (18:31 +0000)]
MFC r341484
Always treat firmware request and response sizes as unsigned.
This fixes an incomplete bounds check on the guest-supplied request
size where a very large request size could be interpreted as a negative
value and not be caught by the bounds check.
Submitted by: jhb
Reported by: Reno Robert
Approved by: so
Security: FreeBSD-SA-18:14.bhyve
Security: CVE-2018-17160
Eugene Grosbein [Tue, 4 Dec 2018 07:39:54 +0000 (07:39 +0000)]
MFC r340135: Make ng_pptpgre(8) netgraph node be able to restore order
for packets reordered in transit instead of dropping them altogether.
It uses sequence numbers of PPtPGRE packets.
A set of new sysctl(8) added to control this ability or disable it:
net.graph.pptpgre.reorder_max (1) defines maximum length of node's
private reorder queue used to keep data waiting for late packets.
Zero value disables reordering. Default value 1 allows the node to restore
the order for two packets swapped in transit. Greater values allow the node
to deliver packets being late after more packets in sequence
at cost of increased kernel memory usage.
net.graph.pptpgre.reorder_timeout (1) defines time value in miliseconds
used to wait for late packets. It may be useful to increase this
if reordering spot is distant.
Stephen Hurd [Mon, 3 Dec 2018 15:18:35 +0000 (15:18 +0000)]
MFC r341156:
Fix first-packet completion
The first packet after the ring is initialized was never
completed as isc_txd_credits_update() would not include it in the
count of completed packets. This caused netmap to never complete
a batch. See PR 233022 for more details.
This is the same fix as the r340310 for e1000
PR: 233607
Reported by: lev
Reviewed by: lev
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D18368
Ian Lepore [Mon, 3 Dec 2018 04:07:18 +0000 (04:07 +0000)]
MFC r341071, r341160
r341071:
Restore the ability to override the disk unit/partition at the boot: prompt
in gptboot.
When arch-independent geli support was added, a new static 'gdsk' struct
was added, but there was still a static 'dsk' struct, and when you typed
in an alternate disk/partition, the string was parsed into that struct,
which was then never used for anything. Now the string gets parsed into
gdsk.dsk, the struct that's actually used.
r341160:
Add comments describing the bootargs handoff between loader(8) and gptboot
or zfsboot, when loader(8) is the BTX loader. No functional changes.
Ed Maste [Mon, 3 Dec 2018 02:32:39 +0000 (02:32 +0000)]
MFC r340095: Remove apparently unused 0-byte files that cause grief on Windows
r235274 added a sort regression test (it operates by comparing output
against GNU sort). The commit included a number of 0-byte files, one
of which ends in a trailing . which reportedly breaks svn/git checkouts
on Windows.
It appears these were added accidentally, so just remove them.