MFC r341472:
Add ability to request listing and deleting only for dynamic states.
This can be useful, when net.inet.ip.fw.dyn_keep_states is enabled, but
after rules reloading some state must be deleted. Added new flag '-D'
for such purpose.
Retire '-e' flag, since there can not be expired states in the meaning
that this flag historically had.
Also add "verbose" mode for listing of dynamic states, it can be enabled
with '-v' flag and adds additional information to states list. This can
be useful for debugging.
MFC r341471:
Reimplement how net.inet.ip.fw.dyn_keep_states works.
Turning on of this feature allows to keep dynamic states when parent
rule is deleted. But it worked only when the default rule is
"allow from any to any".
Now when rule with dynamic opcode is going to be deleted, and
net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference
named objects corresponding to this rule, and also reference the rule.
And when ipfw_dyn_lookup_state() will find state for deleted parent rule,
it will return the pointer to the deleted rule, that is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.
The refcnt field was added to struct ip_fw to keep reference, also
next pointer added to be able iterate rules and not damage the content
when deleted rules are chained.
Named objects are referenced only when states are going to be deleted to
be able reuse kidx of named objects when new parent rules will be
installed.
ipfw_dyn_get_count() function was modified and now it also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.
ipfw_free_rule() was changed to be global, since now dynamic state can
free rule, when it is expired and references counters becomes 1.
External actions subsystem also modified, since external actions can be
deregisterd and instances can be destroyed. In these cases deleted rules,
that are referenced by orphaned states, must be modified to prevent access
to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance()
functions added for these purposes.
Kristof Provost [Fri, 1 Feb 2019 10:04:53 +0000 (10:04 +0000)]
MFC r343418:
pf: Fix use-after-free of counters
When cleaning up a vnet we free the counters in V_pf_default_rule and
V_pf_status from shutdown_pf(), but we can still use them later, for example
through pf_purge_expired_src_nodes().
Free them as the very last operation, as they rely on nothing else themselves.
Build fix for missing NET_EPOCH_XXX() dependencies after r343650.
This patch is to be reverted when the relevant changes are MFC'ed.
This is a direct commit.
MFC r343395:
Fix refcounting leaks in IPv6 MLD code leading to loss of IPv6
connectivity.
Looking at past changes in this area like r337866, some refcounting
bugs have been introduced, one by one. For example like calling
in6m_disconnect() and in6m_rele_locked() in mld_v1_process_group_timer()
where previously no disconnect nor refcount decrement was done.
Calling in6m_disconnect() when it shouldn't causes IPv6 solitation to no
longer work, because all the multicast addresses receiving the solitation
messages are now deleted from the network interface.
This patch reverts some recent changes while improving the MLD
refcounting and concurrency model after the MLD code was converted
to using EPOCH(9).
List changes:
- All CK_STAILQ_FOREACH() macros are now properly enclosed into
EPOCH(9) sections. This simplifies assertion of locking inside
in6m_ifmultiaddr_get_inm().
- Corrected bad use of in6m_disconnect() leading to loss of IPv6
connectivity for MLD v1.
- Factored out checks for valid inm structure into
in6m_ifmultiaddr_get_inm().
MFC r343394:
When detaching a network interface drain the workqueue freeing the inm's
because the destructor will access the if_ioctl() callback in the ifnet
pointer which is about to be freed. This prevents use-after-free.
MFC r343392:
Fix duplicate acquiring of refcount when joining IPv6 multicast groups.
This was observed by starting and stopping rpcbind(8) multiple times.
Brooks Davis [Wed, 30 Jan 2019 23:36:02 +0000 (23:36 +0000)]
MFC r340242:
Add a top-level make target to rebuild all sysent files.
The sysent target is useful when changing makesyscalls.sh, when
making paired changes to syscalls.master files, or in a future where
freebsd32 sysent entries are built from the default syscalls.master.
Marius Strobl [Wed, 30 Jan 2019 11:56:10 +0000 (11:56 +0000)]
MFC: r343481
- In _iflib_fl_refill(), don't mark an RX buffer as available in the
corresponding bitmap before adding an mbuf has actually succeeded.
Previously, m_gethdr(M_NOWAIT, ...) failing caused a "hole" in the
RX ring but not in its bitmap. One implication of such a hole was
that in a subsequent call to _iflib_fl_refill() with the RX buffer
accounting still indicating another reclaimable buffer, bit_ffc(3)
nevertheless returned -1 in frag_idx which in turn caused havoc
when used as an index. Thus, additionally assert that frag_idx is
0 or greater.
Another possible consequence of a hole in the RX ring was a NULL-
dereference when trying to use the unallocated mbuf, for example
in iflib_rxd_pkt_get().
This bug was introduced with r341095, MFCed to stable/12 in r343304.
While at it, make the variable declarations in _iflib_fl_refill()
conform to style(9) and remove redundant checks already performed
by bit_ffc{,_at}(3).
- In iflib_queues_alloc(), don't pass redundant M_ZERO to bit_alloc(3).
Kristof Provost [Tue, 29 Jan 2019 17:49:38 +0000 (17:49 +0000)]
MFC r343295:
pf: Validate psn_len in DIOCGETSRCNODES
psn_len is controlled by user space, but we allocated memory based on it.
Check how much memory we might need at most (i.e. how many source nodes we
have) and limit the allocation to that.
Andrew Gallatin [Mon, 28 Jan 2019 14:34:59 +0000 (14:34 +0000)]
MFC r343430
Fix an iflib driver unload panic introduced in r343085
The new loop to sync and unload descriptors was indexed
by "i", rather than "j". The panic was caused by "i"
being advanced rather than "j", and eventually becoming
out of bounds.
Pedro F. Giffuni [Mon, 28 Jan 2019 02:26:05 +0000 (02:26 +0000)]
MFC r342379, r342383:
gai_strerror() - Update string error messages according to RFC 3493.
Error messages in gai_strerror(3) vary largely among OSs.
For new software we largely replaced the obsoleted EAI_NONAME and
with EAI_NODATA but we never updated the corresponding message to better
match the intended use. We also have references to ai_flags and ai_family
which are not very descriptive for non-developer end users.
Bring new error messages based on informational RFC 3493, which has
obsoleted RFC 2553, and make them consistent among the header and
manpage.
MFC r343238:
urtw(4): add length checks in Rx path.
- Check if buffer can contain Rx descriptor before accessing it.
- Verify upper / lower bounds for frame length.
- Do not pass too short frames into ieee80211_find_rxnode().
While here:
- Move cleanup to the function end.
- Reuse IEEE80211_IS_DATA() macro.
Pedro F. Giffuni [Mon, 28 Jan 2019 01:36:55 +0000 (01:36 +0000)]
MFC r343459:
ext2fs: Add some extra consistency checks for the superblock.
Maliciously formed, or badly corrupted, filesystems can cause kernel
panics. In general, such acts of foot-shooting can only be accomplished
by root, but in a world with VM images that is moving towards automated
mounts it is important to have some form of prevention.
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert
of Fraunhofer FKIE.
Incidentaly this should also fix a memory corruption issue reported by
Dr Silvio Cesare of InfoSect.
Huge thanks to all reseachers for making us aware of the issue.
MFC r343234:
run(4): add more length checks in Rx path.
- Discard frames that are bigger than MCLBYTES (to prevent buffer overrun).
- Check buffer length before accessing its contents.
- Fix len <-> dmalen check - the last includes Rx Wireless information
structure size.
- Fix out-of-bounds read during Rx node search for ACK / CTS frames
(monitor mode only).
While here:
- Mark few suspicious places with comments.
- Move common cleanup to the function end.
MFC r343340:
net80211: fix channel list construction for non-auto operating mode.
Change the way how channel list mode <-> desired mode match is done:
- Match channel list mode for next non-auto desired modes:
* 11b: 11g, 11ng, 11acg;
* 11a: 11na, 11ac
- Add pre-defined channels only when one of the next conditions met:
* the desired channel mode is 'auto' or
* the desired channel and selected channel list modes are exactly
the same or
* the previous rule (11g / 11n / 11ac promotion) applies.
Before r275875 construction work properly for all except
11ng / 11na / 11acg / 11ac modes - these were broken at all
(i.e., the scan list was empty); after r275875 all checks were removed,
so scan table was populated by all device-compatible channels
(desired mode was ignored).
For example, if I will set 'ifconfig wlan0 mode 11ng' for RTL8821AU:
- pre-r275875: nothing, scan will not work;
- after r275875: both 11ng and 11na bands were scanned; also, since 11b
channel list was used, 14th channel was scanned too.
- after this change: only 11ng - 1-13 channels - are used for scanning.
Marius Strobl [Sun, 27 Jan 2019 19:04:02 +0000 (19:04 +0000)]
MFC: r342634
o Don't allocate resources for SDMA in sdhci(4) if the controller or the
front-end doesn't support SDMA or the latter implements a platform-
specific transfer method instead. While at it, factor out allocation
and freeing of SDMA resources to sdhci_dma_{alloc,free}() in order to
keep the code more readable when adding support for ADMA variants.
o Base the size of the SDMA bounce buffer on MAXPHYS up to the maximum
of 512 KiB instead of using a fixed 4-KiB-buffer. With the default
MAXPHYS of 128 KiB and depending on the controller and medium, this
reduces the number of SDHCI interrupts by a factor of ~16 to ~32 on
sequential reads while an increase of throughput of up to ~84 % was
seen.
Front-ends for broken controllers that only support an SDMA buffer
boundary of a specific size may set SDHCI_QUIRK_BROKEN_SDMA_BOUNDARY
and supply a size via struct sdhci_slot. According to Linux, only
Qualcomm MSM-type SDHCI controllers are affected by this, though.
Requested by: Shreyank Amartya (unconditional bump to 512 KiB)
o Introduce a SDHCI_DEPEND macro for specifying the dependency of the
front-end modules on the sdhci(4) one and bump the module version
of sdhci(4) to 2 via an also newly introduced SDHCI_VERSION in order
to ensure that all components are in sync WRT struct sdhci_slot.
o In sdhci(4):
- Make pointers const were applicable,
- replace a few device_printf(9) calls with slot_printf() for
consistency, and
- sync some local functions with their prototypes WRT static.
Since r287197 ieee80211com is a part of drivers softc; as a result,
after detach all pointers to it (iv_ic, ni_ic) are invalid. Most
possible users (tasks, interrupt handlers) are blocked / removed
when device is stopped; however, ioctl handlers were not tracked
and may crash if ieee80211com structure is accessed.
Since ieee80211com pointer access from ieee80211vap structure is not
protected by lock (constant after interface creation) and used in
many other places just use reference counting for ioctl handlers;
on detach set 'detached' flag and wait until reference counter goes to 0.
For KBI stability the last element of iv_spare[] array was reused.
Add a bunch of examples on how to use ZFS features like:
- listing available space,
- setting and displaying a userquota,
- displaying pool I/O statistics and pool history,
- displaying the compression ratio for a dataset,
- various list options (sorting, removing headers),
- performing a dry-run of a snapshot delete,
- removing a range of snapshots,
- setting a custom property,
- preventing removal of a snapshot with ZFS holds,
- permission sets for zfs send/receive.
Additionally, clarify the existing examples a bit when
it comes to displaying space by mentioning UFS explicitly.
Other examples include displaying I/O in top(1), querying
sysctl(8) for active CPUs and available RAM. Mention systat(1)
and its options, too.
While here, reformat the example to upload a dmesg(8) a bit
to wrap properly.
Thanks to Allan Jude for his help with some of the ZFS examples.
MFC r343249:
Fix duplicate wpa_supplicant(8) / hostapd(8) startup with devd(8)
Do not invoke 'wlan_up' function from devd(8) on interface
creation event (an example to create such event:
'ifconfig wlan0 create wlandev rtwn0');
they're typically produced during 'service netif (re)start'
and result in duplicate interface initialization.
From the user side if WPA option is used, this result in messages like:
- /etc/rc.d/wpa_supplicant: WARNING: failed to start wpa_supplicant
or
- wpa_supplicant already running? (pid=xxxx).
(for HOSTAP interfaces this race may result in startup failure).
As a side effect, wpa_supplicant(8) / hostapd(8) will not be
invoked when new wlan(4) interface is created manually and
corresponding configuration for it is present in rc.conf(5).
This change does not affect device attach / removal events.
MFC r343190:
net80211: drop m_pullup call from ieee80211_crypto_decap.
For most wireless drivers Rx mbuf is allocated as one
contiguous chunk; only few are using chains for allocations -
but even then at least MCLBYTES (minus Rx descriptor size) is
available in the first mbuf.
In addition to the above, m_pullup was never called here - otherwise,
reallocation will break post-crypto_decap logic (ieee80211_decap,
ieee80211_deliver_data...), so just remove it; length check is left
in case if some truncated frame appears here.
Alexander Motin [Fri, 25 Jan 2019 19:56:06 +0000 (19:56 +0000)]
MFC r342557, r342559: Reimplement nvd(4) detach handling.
Previous code typically crashed in case of NVMe device unplug or even clean
detach while some I/Os are still in flight. To fix this the new code calls
disk_gone() and waits for confirmation of all references gone before calling
disk_destroy(), freeing other resources and allowing controller detach.
While there, fix disk lists locking and reimplement unit numbers assignment.
Stephen Hurd [Fri, 25 Jan 2019 18:30:12 +0000 (18:30 +0000)]
MFC r343047:
Fix window update issue when scaling disabled
When the TCP window scale option is not used, and the window
opens up enough in one soreceive, a window update will not be sent.
For example, if recwin == 65535, so->so_rcv.sb_hiwat >= 262144, and
so->so_rcv.sb_hiwat <= 524272, the window update will never be sent.
This is because recwin and adv are clamped to TCP_MAXWIN << tp->rcv_scale,
and so will never be >= so->so_rcv.sb_hiwat / 4
or <= so->so_rcv.sb_hiwat / 8.
This patch ensures a window update is sent if the window opens by
TCP_MAXWIN << tp->rcv_scale, which should only happen when the window
size goes from zero to the max expressible.
This issue looks like it was introduced in r306769 when recwin was clamped
to TCP_MAXWIN << tp->rcv_scale.
Michael Tuexen [Fri, 25 Jan 2019 15:44:53 +0000 (15:44 +0000)]
MFC r342879:
Fix getsockopt() for IP_OPTIONS/IP_RETOPTS.
r336616 copies inp->inp_options using the m_dup() function.
However, this function expects an mbuf packet header at the beginning,
which is not true in this case.
Therefore, use m_copym() instead of m_dup().
This issue was found by syzkaller.
Reviewed by: mmacy@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18753
Jilles Tjoelker [Thu, 24 Jan 2019 22:34:30 +0000 (22:34 +0000)]
MFC r343105: libedit: Avoid out of bounds read in 'bind' command
This is CVS revision 1.31 from NetBSD lib/libedit/chartype.c:
Make sure that argv is NULL terminated since functions like tty_stty rely
on it to be so (Gerry Swinslow)
This broke when the wide-character support was enabled in libedit. The
conversion from multibyte to wide-character did not supply the apparently
expected terminating NULL in the new argv array.
Alexander Motin [Wed, 23 Jan 2019 01:23:19 +0000 (01:23 +0000)]
MFC r342558: Switch from mutexes to atomics in GEOM_DEV I/O path.
Mutexes in I/O path there were used twice per I/O to atomically access
several variables to close and/or destroy the device on last request
completion. I found the way to fit all required info into one integer,
suitable for atomic operations. It opened race window on device close,
but addition of timeout to the msleep() there should cover it.
Profiling shows removal of significant spinning time on those mutexes
and IOPS increase from ~600K to >800K to NVMe on 72-core systems.
Alexander Motin [Wed, 23 Jan 2019 00:56:53 +0000 (00:56 +0000)]
Increase MTX_POOL_SLEEP_SIZE from 128 to 1024.
This value remained unchanged for 15 years, and now this bump reduces
lock spinning in GEOM and BIO layers while doing ~1.6M IOPS to 4 NVMe
on 72-core system from ~25% to ~5% by the cost of additional 28KB RAM.
While there, align struct mtx_pool fields to cache lines.
Alexander Motin [Wed, 23 Jan 2019 00:54:08 +0000 (00:54 +0000)]
MFC r342399: Remove CAM SIM lock from NVMe SIM.
CAM does not require SIM lock since FreeBSD 10.4, and NVMe code never
required it at all, using per-queue locks instead. This formally allows
parallel request submission in CAM mode as much as single per-device and
per-queue locks of CAM allow.
Alexander Motin [Tue, 22 Jan 2019 21:04:03 +0000 (21:04 +0000)]
MFC r342977 (by cem): amdtemp(4): Add support for Family 15h, Model >=60h
Family 15h is a bit of an oddball. Early models used the same temperature
register and spec (mostly[1]) as earlier CPU families.
Model 60h-6Fh and 70-7Fh use something more like Family 17h's Service
Management Network, communicating with it in a similar fashion. To support
them, add support for their version of SMU indirection to amdsmn(4) and use
it in amdtemp(4) on these models.
While here, clarify some of the deviceid macros in amdtemp(4) that were
added with arbitrary, incorrect family numbers, and remove ones that were
not used. Additionally, clarify intent and condition of heterogenous
multi-socket system detection.
[1]: 15h adds the "adjust range by -49°C if a certain condition is met,"
which previous families did not have.
Reported by: D. C. <tjoard AT gmail.com>
PR: 234657
Tested by: D. C. <tjoard AT gmail.com>
Andrew Gallatin [Tue, 22 Jan 2019 17:34:53 +0000 (17:34 +0000)]
MFC r341095:
Use busdma unconditionally in iflib
- Remove the complex mechanism to choose between using busdma
and raw pmap_kextract at runtime. The reduced complexity makes
the code easier to read and maintain.
- Fix a bug in the small packet receive path where clusters were
repeatedly mapped but never unmapped. We now store the cluster's
bus address and avoid re-mapping the cluster each time a small
packet is received.
This patch fixes bugs I've seen where ixl(4) will not even
respond to ping without seeing DMAR faults.
I see a small improvement (14%) on packet forwarding tests using
a Haswell based Xeon E5-2697 v3. Olivier sees a small
regression (-3% to -6%) with lower end hardware.
Kristof Provost [Tue, 22 Jan 2019 01:07:18 +0000 (01:07 +0000)]
MFC r343041
pf: silence a runtime warning
Sometimes, for negated tables, pf can log 'pfr_update_stats: assertion failed'.
This warning does not clarify anything for users, so silence it, just as
OpenBSD has.