Kornel Duleba [Thu, 27 Jan 2022 10:25:25 +0000 (11:25 +0100)]
enetc: Wait for pending transmissions before disabling TX queues
According to the RM it's not safe to disable a TX ring while it is busy
transmitting frames.
In order to be safe wait until the ring is empty. (cidx==pidx)
Use this opportunity to remove a set-but-unused variable.
Obtained from: Semihalf
Sponsored by: Alstom Group
Kornel Duleba [Thu, 27 Jan 2022 09:24:26 +0000 (10:24 +0100)]
enetc: Simply TX ring credits counting logic
According to the RM rings can hold at most ring_size - 1 descriptors at any time.
No additional logic is needed since iflib already respects this constrain.
Thanks to that the pidx == cidx situation is not ambiguous and indicates an
empty ring.
Use that to simplify the logic that calculates the amount of processed frames.
Obtained from: Semihalf
Sponsored by: Alstom Group
Kornel Duleba [Thu, 27 Jan 2022 08:26:07 +0000 (09:26 +0100)]
enetc: Disable HW IP packet alignment
The NIC can IP align received packets.
It was observed that it caused some rare stalls, that required full board reset.
Disable this feature for now. It doesn't provide any significant performance
improvement anyway.
Obtained from: Semihalf
Sponsored by: Alstom Group
ufs: be more persistent with finishing some operations
when the vnode is doomed after relock. The mere fact that the vnode is
doomed does not prevent us from doing UFS operations on it while it is
still belongs to UFS, which is determined by non-NULL v_data. Not
finishing some operations, e.g. not syncing the inode block only because
the vnode started reclamation, is not correct.
Add macro IS_UFS() which incapsulates the v_data != NULL, and use it
instead of VN_IS_DOOMED() for places where the operation completion is
important.
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC()
The lock is unneccessary since the mount point is busied, which prevents
unmount and syncer vnode deallocation. Having the vnode locked causes
innocent LoRs and complicates debugging.
Also stop starting write accounting around it. Any caller of
VOP_FSYNC() must do it already, and sync_vnode() does.
Reported and tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
The buffer must not be accessed by any other thread, it is freshly
allocated. As such, LK_NOWAIT should be nop but also it prevents
recording the order between the buffer lock and any other locks we might
own in the call to getnewbuf(). In particular, if we own FFS snap lock,
it should avoid triggering false positive warning.
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34072
Kirk McKusick [Mon, 31 Jan 2022 01:20:10 +0000 (17:20 -0800)]
In GEOM debugging output, show consumer for cloned and duplicated bio's.
When using bio's created by g_clone_bio() or g_duplicate_bio()
their consumer device (the device to which their I/O requests
are sent) is listed by the geom debugging facility as [unknown].
If available, this update lists the consumer associated with
the bio's parent.
Dimitry Andric [Sun, 30 Jan 2022 20:41:24 +0000 (21:41 +0100)]
Apply clang fix for assertion failure compiling science/chrono
Merge commit 6b0f35931a44 from llvm git (by Jennifer Yu):
Fix signal during the call to checkOpenMPLoop.
The root problem is a null pointer is accessed during the call to
checkOpenMPLoop, because loop up bound expr is an error expression
due to error diagnostic was emit early.
To fix this, in setLCDeclAndLB, setUB and setStep instead return false,
return true when LB, UB or Step contains Error, so that the checking is
stopped in checkOpenMPLoop.
Note this only fixes the assertion reported in bug 261567; some other
tweaks for port dependencies are probably still required to make it
build to completion.
For historical reasons, the integer is stored with an offset of plus 14.
That means, for a given max path length of 1024 the valid values
are -1009 .. 1037 and not -1023 .. 1023
Alexander Motin [Sun, 30 Jan 2022 02:59:03 +0000 (21:59 -0500)]
GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers.
All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held. It makes no sense
to delegate the I/O to g_down/g_up threads.
This removes many of context switches during disk retaste.
If the NVMe Controller doesn't support Namespace Management, it should
return "Invalid Namespace or Format" when the Host request Identify
Namespace with the global NSID value.
Chuck Tuffli [Sun, 30 Jan 2022 07:10:59 +0000 (23:10 -0800)]
bhyve nvme: Fix Set Features, AEN
NVMe Controllers which do not support Endurance Groups must return an
error when the Endurance Group Event Aggregate Log Change Notices bit is
set in Set Features, Asynchronous Event Configuration.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:57 +0000 (23:09 -0800)]
bhyve nvme: Fix LBA out-of-range calculation
The function which checks for a valid LBA range mistakenly named an
input value as NLB ("Number of Logical Blocks") instead of "number of
blocks". The NVMe specification defines NLB as a zero-based value (i.e.
NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed
parameter is a 1's-based value.
Fix is to rename the variable to avoid future confusion.
While in the neighborhood, also check that the starting LBA is less than
the size of the backing storage to avoid an integer overflow.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:10 +0000 (23:09 -0800)]
bhyve nvme: Update v1.4 Identify Controller data
Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE).
Also, do not advertise secure erase functionality in the Format NVM
Attributes field of the Identify Controller data structure as the
Controller does not implement secure erase.
Chuck Tuffli [Sun, 30 Jan 2022 07:08:47 +0000 (23:08 -0800)]
bhyve nvme: Add Temperature Threshold support
This adds the ability for a guest OS to send Set / Get Feature,
Temperature Threshold commands. The implementation assumes a constant
temperature and will generate an Asynchronous Event Notification if the
specified threshold is above/below this value. Although the
specification allows 9 temperature values, this implementation only
implements the Composite Temperature.
While in the neighborhood, move the clear of the CSTS register in the
reset function after all other cleanup. This avoids a race with the
guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe
emulation is actually complete with the reset.
Chuck Tuffli [Sun, 30 Jan 2022 07:07:29 +0000 (23:07 -0800)]
bhyve nvme: Remove redundant AER Limit checks
The NVMe emulation checked if the Asynchronous Event Request Limit
(a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function
is only called from nvme_opc_async_event_req() which also checks for
exceeding the AERL.
Chuck Tuffli [Sun, 30 Jan 2022 07:05:58 +0000 (23:05 -0800)]
bhyve nvme: Fix NVM Format completion status
The NVM Format command is unique among the Admin commands in that it
needs to finish asynchronously. For this reason, the emulation code
invented a synthetic completion status (NVME_NO_STATUS) to indicate that
the command was still in progress and the command processing loop should
not generate a completion message. The implementation used the value
0xffff for the synthetic value as this set both the Status Code and
Status Code Type fields to reserved values.
Format initialized the completion status to this value and expected
error cases to override it with a status code/type appropriate to the
situation. The macros used to set the NVMe status are careful not to
modify bit 0 (i.e. the phase bit), which with the synthetic completion
status, causes the phase bit to get out of sync. When running tests in a
guest with illegal NVM Format commands, Admin commands would eventually
hang because it appeared there were no completions due to the incorrect
phase bit value.
Fix is to only set NVME_NO_STATUS if the blockif delete command
succeeds. While in the neighborhood, add a missing break statement when
NVM Format is not supported.
Dimitry Andric [Sat, 29 Jan 2022 21:28:12 +0000 (22:28 +0100)]
Apply llvm fix for assertion failure compiling recent libc++
Merge commit c7c84b90879f from llvm git (by Adrian Prantl):
[DwarfDebug] Refuse to emit DW_OP_LLVM_arg values wider than 64 bits
DwarfExpression::addUnsignedConstant(const APInt &Value) only supports
wider-than-64-bit values when it is used to emit a top-level DWARF
expression representing the location of a variable. Before this change,
it was possible to call addUnsignedConstant on >64 bit values within a
subexpression when substituting DW_OP_LLVM_arg values.
This can trigger an assertion failure (e.g. PR52584, PR52333) when it
happens in a fragment (DW_OP_LLVM_fragment) expression, as
addUnsignedConstant on >64 bit values splits the constant into separate
DW_OP_pieces, which modifies DwarfExpression::OffsetInBits.
This change papers over the assertion errors by bailing on overly wide
DW_OP_LLVM_arg values. A more comprehensive fix might be to be to split
wide values into pointer-sized fragments.
- Use Ar macros for arguments
- Stylize the argument synopsis to the -d flag
- Change the width of the list to one of the actual tags in the list
- Stylize "ugen" and "/dev/ugen" with Cm as those are constant strings,
which are usually treated as command modifiers.
- Break long lines to reduce the number of warnings from linters
- Fix examples; the -d flag is now required when specifying the unit and
the address with the "dot notation".
Peter Jeremy [Sat, 29 Jan 2022 10:15:51 +0000 (21:15 +1100)]
geom_gate: Distinguish between classes of errors
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.
The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases indentically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error). Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure". The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).
This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.
- stop on first error
- improve awk script: print the last two characters for bigram - not the second word
- remove unnecessary checks
- use mktemp
- refactor
Kristof Provost [Thu, 27 Jan 2022 21:01:09 +0000 (22:01 +0100)]
mbuf: do not restore dying interfaces
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.
However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.
This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).
Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.
Ed Maste [Fri, 28 Jan 2022 22:15:02 +0000 (17:15 -0500)]
dma: exit if invoked with invalid (zero) argc
This was prompted by the recent pkexec vulnerability (CVE-2021-4034).
This change is being made on general principle for setuid/setgid
binaries and is not in response to an actual issue.
Reviewed by: kevans, markj (both earlier)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34087
Mark Johnston [Fri, 28 Jan 2022 17:49:28 +0000 (12:49 -0500)]
sort: Fix message catalogue usage
- Check that catopen() succeeded before calling catclose(). musl will
crash in the latter if the catalogue descriptor is -1.
- Keep the message catalogue open for most of sort(1)'s actual
operation.
- Don't use catgets(3) to print error messages if catopen(3) had failed.
Reviewed by: arichardson, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34081
John Baldwin [Fri, 28 Jan 2022 21:14:03 +0000 (13:14 -0800)]
Make <vm/vm_extern.h> more self-contained.
Add a nested include of <sys/systm.h> for recently added assertions.
Without this, existing code (such as in drm-kmod) needs to be patched
to add the newly required header.
While here, rewrite the assertions using KASSERT().
John Baldwin [Fri, 28 Jan 2022 21:07:04 +0000 (13:07 -0800)]
iscsi: Allocate a dummy PDU for the internal nexus reset task.
When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request. The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io. However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU. If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.
To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.
Alexander Motin [Fri, 28 Jan 2022 19:22:41 +0000 (14:22 -0500)]
glabel: Set G_CF_DIRECT_SEND/RECEIVE for taste consumer.
All I/O requests through the taste consumer are synchronous, done
with g_read_data() and without any locks held. It makes no sense
to delegate the I/O to g_down/g_up threads.
This removes many of context switches during disk retaste.
Alexander Motin [Fri, 28 Jan 2022 19:12:29 +0000 (14:12 -0500)]
GEOM: Relax direct dispatch for GEOM threads.
The only cases when direct dispatch does not make sense is for I/O
submission from down thread and for completion from up thread. In
all other cases, if both consumer and producer are OK about it, we
can save on context switches.
Alexander Motin [Fri, 28 Jan 2022 16:09:30 +0000 (11:09 -0500)]
graid: Set G_CF_DIRECT_SEND for task consumer.
Unlike normal consumers all taste consumer I/O is synchronous, done
with g_read_data() and without any locks held. It makes no sense to
delegate I/O submission to g_down thread.
This should remove number of context switches during disk retaste.
Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().
Kirk McKusick [Fri, 28 Jan 2022 07:00:51 +0000 (23:00 -0800)]
ufs: handle LoR between snap lock and vnode lock
When a filesystem is mounted all of its associated snapshots must
be activated. It first allocates a snapshot lock (snaplk) that will
be shared by all the snapshot vnodes associated with the filesystem.
As part of each snapshot file activation, it must replace its own
ufs vnode lock with the snaplk. In this way acquiring the snaplk
gives exclusive access to all the snapshots for the filesystem.
A write to a ufs vnode first acquires the ufs vnode lock for the
file to be written then acquires the snaplk. Once it has the snaplk,
it can check all the snapshots to see if any of them needs to make
a copy of the block that is about to be written. This ffs_copyonwrite()
code path establishes the ufs vnode followed by snaplk locking
order.
When a filesystem is unmounted it has to release all of its snapshot
vnodes. Part of doing the release is to revert the snapshot vnode
from using the snaplk to using its original vnode lock. While holding
the snaplk, the vnode lock has to be acquired, the vnode updated
to reference it, then the snaplk released. Acquiring the vnode lock
while holding the snaplk violates the ufs vnode then snaplk order.
Because the vnode lock is unused, using LK_EXCLUSIVE | LK_NOWAIT
to acquire it will always succeed and the LK_NOWAIT prevents the
reverse lock order from being recorded.
This change was made in January 2021 (173779b98f) to avoid an LOR
violation in ffs_snapshot_unmount(). The same LOR issue was recently
found again when removing a snapshot in ffs_snapremove() which must
also revert the snaplk to the original vnode lock as part of freeing it.
The unwind in ffs_snapremove() deals with the case in which the
snaplk is held as a recursive lock holding multiple references.
Specifically an equal number of references are made on the vnode
lock. This change factors out the lock reversion operations into a
new function revert_snaplock() which handles both the recursive
locks and avoids the LOR. The new revert_snaplock() function is
then used in both ffs_snapshot_unmount() and in ffs_snapremove().
Rick Macklem [Thu, 27 Jan 2022 23:30:26 +0000 (15:30 -0800)]
nfsclient: Delete unused function nfscl_getcookie()
The function nfscl_getcookie(), which is essentially the
same as ncl_getcookie(), is never called, so delete it.
This is probably cruft left over from the port of the
NFSv4 code to FreeBSD several years ago.
Found while modifying the code to better use the
directory offset cookies.
John Baldwin [Thu, 27 Jan 2022 22:42:40 +0000 (14:42 -0800)]
Change the return value of _Unwind_GetCFA in include/unwind.h.
I tested the original commit as part of a series that culminates in
removing this header and installing LLVM libunwind's unwind.h in its
place so missed updating this header as was done in b84693501af6.
Pointy hat to: jhb
Reported by: kevans
Fixes: 3a502289d316 Use uintptr_t for return type of _Unwind_GetCFA.
tcp: Tidying up the conditionals for unwinding a spurious RTO
- Use the semantically correct TSTMP_xx macro when comparing
timestamps. (No functional change)
- check for bad retransmits only when TSopt is present in ACK
(don't assume there will be a valid TSopt in the TCP options struct)
- exclude tsecr == 0, since that most likely indicates an
invalid ts echo return (tsecr) value.
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34062
tcp: Rewind erraneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retranmission is
incorrectly detected and rewound, messing up various tcpcb values,
which can lead to a panic when SACK is in use.
Reviewed By: tuexen, chengc_netapp.com, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D33979
Kyle Evans [Thu, 27 Jan 2022 18:02:17 +0000 (12:02 -0600)]
cp: fix some cases with infinite recursion
As noted in the PR, cp -R has some surprising behavior. Typically, when
you `cp -R foo bar` where both foo and bar exist, foo is cleanly copied
to foo/bar. When you `cp -R foo foo` (where foo clearly exists), cp(1)
goes a little off the rails as it creates foo/foo, then discovers that
and creates foo/foo/foo, so on and so forth, until it eventually fails.
POSIX doesn't seem to disallow this behavior, but it isn't very useful.
GNU cp(1) will detect the recursion and squash it, but emit a message in
the process that it has done so.
This change seemingly follows the GNU behavior, but it currently doesn't
warn about the situation -- the author feels that the final product is
about what one might expect from doing this and thus, doesn't need a
warning. The author doesn't feel strongly about this.
Gleb Smirnoff [Thu, 27 Jan 2022 17:41:31 +0000 (09:41 -0800)]
rtsock: always set m_pkthdr.rcvif when queueing on netisr
netisr uses global workstreams and after dequeueing an mbuf it
uses rcvif to get the VNET of the mbuf. Of course, this is not
needed when kernel is compiled without VIMAGE. It came out that
routing socket does not set rcvif if compiled without VIMAGE.
Make this assignment not depending on VIMAGE option.
Tom Jones [Thu, 27 Jan 2022 17:24:45 +0000 (17:24 +0000)]
Remove SMALL conditionals from gzip
gzip has SMALL conditionals which enable building a reduced size version
of the binary. These exist as part of the introduction of BSD licensed
gzip in 2004 in NetBSD and appear to have been required to reach a size
for inclusion in their install media. For more information see commits
to gzip in the NetBSD tree on the 28th of March 2004.
SMALL doesn't appear to be hooked up to our build system and
complicates gzip quite a bit.
This test was written because execvp was found to improperly handle the
argc == 0 case when it falls back from an ENOEXEC. We could probably
mostly revert it now, but let's just fix the test for the time being and
circle back later to decide if we want to simplify execvp. The test
will likely remain either way just to make sure execvp isn't working
around the newly enforced restriction with the fallback.
Fixes: 301cb491ea41 ("execvp: fix up the ENOEXEC fallback")
Reported by: jenkins via lwhsu@
Tom Jones [Thu, 27 Jan 2022 17:10:06 +0000 (17:10 +0000)]
bsdinstall: Add quotes around error message argument
When error is called with a message with spaces (and probably multiple
lines) these are passed into dialog unquoted and an error message was
presented, wrap with quotes.
Andriy Gapon [Thu, 27 Jan 2022 16:49:27 +0000 (18:49 +0200)]
mmc_da: create disk(9) for pre-2.0 SD cards
It does not look like there is anything in mmc_da code that actually
requires protocol 2.0 or later. dev/mmc code also does not have such a
restriction.
Tested with a very old 2GB mini-SD card. Prior to this change mmc_da
would claim the card but would not expose it to GEOM.
Without MMCCAM:
mmc0: <MMC/SD bus> on sdhci_pci0
mmc0: Probing bus
mmc0: SD probe: OK (OCR: 0x00ff8000)
mmc0: Current OCR: 0x00ff8000
mmc0: CMD8 failed, RESULT: 1
mmc0: Probing cards
mmc0: New card detected (CID 1c53565344432020100002982e007600)
mmc0: New card detected (CSD 005e00325f5a83d02db7ffbf96800000)
mmc0: Card at relative address 0xb368 added:
mmc0: card: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
mmc0: quirks: 0
mmc0: bus: 4bit, 50MHz (high speed timing)
mmc0: memory: 3998720 blocks, erase sector 256 blocks
mmc0: setting transfer rate to 50.000MHz (high speed timing)
GEOM: new disk mmcsd0
mmcsd0: 2GB <SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV> at mmc0 50.0MHz/4bit/65535-block
mmc0: setting bus width to 4 bits high speed timing
With MMCCAM and this change:
sdda0 at sdhci_slot0 bus 0 scbus2 target 0 lun 0
sdda0: Relative addr: 0000b368
Card features: <Memory>
sdda0: Serial Number 0002982E
sdda0: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV
GEOM: new disk sdda0
Mateusz Guzik [Thu, 27 Jan 2022 16:24:32 +0000 (16:24 +0000)]
Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Andrew Gallatin [Thu, 27 Jan 2022 15:35:03 +0000 (10:35 -0500)]
tcp: fix leaks in tcp_chg_pacing_rate error paths
tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.
Andrew Gallatin [Thu, 27 Jan 2022 15:28:15 +0000 (10:28 -0500)]
Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues
When ip_output_send() returns EAGAIN due to issues with send tags (route
change, lagg failover, etc), it must free the mbuf. This is because
ip_output_send() was written as a wrapper/replacement for a direct
call to if_output(), and the contract with if_output() has
historically been that it owns the mbufs once called. When
ip_output_send() failed to free mbufs, it violated this assumption
and lead to leaked mbufs.
This was noticed when using NIC TLS in combination with hardware
rate-limited connections. When seeing lots of NIC output drops
triggered ratelimit send tag changes, we noticed we were leaking
ktls_sessions, send tags and mbufs. This was due ip_output_send()
leaking mbufs which held references to ktls_sessions, which in
turn held references to send tags.
Many thanks to jbh, rrs, hselasky and markj for their help in
debugging this.