There is probably a PR for this, but I can't find this, or remember who
submitted it. The patch got lost in the noise of another that wasn't
ready to commit.
Mark Johnston [Tue, 27 Apr 2021 00:04:25 +0000 (20:04 -0400)]
aesni: Avoid modifying session keys in hmac_update()
Otherwise aesni_process() is not thread-safe for AES+SHA-HMAC
transforms, since hmac_update() updates the caller-supplied key directly
to create the derived key. Use a buffer on the stack to store a copy of
the key used for computing inner and outer digests.
This is a direct commit to stable/12 as the bug is not present in later
branches.
iwnstats: fix build with clang and allow install under /usr/local/sbin
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
Alexander Motin [Tue, 13 Apr 2021 15:19:10 +0000 (11:19 -0400)]
Fix race in case of device destruction.
During device destruction it is possible that open() succeed, but
fdevname() return NULL, that can't be assigned to string variable.
Fix that by adding explicit NULL check.
Also while there switch from fdevname() to fdevname_r().
"zgrep --version" is expected to print the version information in the
same way as "zgrep -V". However, the case handling the --version flag
is never reached, so "zgrep --version" prints:
zgrep: missing pattern
instead of:
grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
Rick Macklem [Sat, 10 Apr 2021 22:50:25 +0000 (15:50 -0700)]
nfsd: fix replies from session cache for multiple retries
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
Commit 05a39c2c1c18 fixed replying with the cached reply in
in the session slot if same session slot sequence#.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.
This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times. Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.
Alexander Motin [Sat, 17 Apr 2021 14:41:35 +0000 (10:41 -0400)]
mpt(4): Remove incorrect S/G segments limits.
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.
Alexander Motin [Fri, 16 Apr 2021 19:39:01 +0000 (15:39 -0400)]
pms(4): Limit maximum I/O size to 256KB instead of 1MB.
There is a weird limit of AGTIAPI_MAX_DMA_SEGS (128) S/G segments per
I/O since the initial driver import. I don't know why it was added,
can only guess some hardware limitation, but in worst case it means
maximum I/O size of 508KB. Respect it to be safe, rounding to 256KB.
Rick Macklem [Thu, 8 Apr 2021 21:04:22 +0000 (14:04 -0700)]
nfsd: fix replies from session cache for retried RPCs
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
The FreeBSD server failec to reply using the cached
reply in the session slot when an RPC was retried on
the session slot, as indicated by same slot sequence#.
This patch fixes this. It should also fix a similar
failure for NFSv4.0 mounts, when the sequence# in
the open/lock_owner requires a reply be done from
an entry locked into the DRC.
This fix affects the fairly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation. Note that retries only occur after the
client has needed to create a new TCP connection.
Improve size readability.
Preserve more space for swap devise names.
Prevent line overflow with long devise name.
Don't draw a bar when swap is not used at all.
Simplify and optimize code.
Change the label to end at end of 100%.
PR: 251655
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27496
Mark Johnston [Wed, 21 Apr 2021 18:50:48 +0000 (14:50 -0400)]
safexcel: Fix the SHA-HMAC digest computation when AAD is present
The driver would fail to include the AAD in the input stream, resulting
in incorrect digests for requests combining SHA-HMAC with AES-CBC or
-CTR. Ensure that the AAD is included in the processor's input stream,
and fix the corresponding instruction sequence to include the AAD as
input to the digest computation.
This is a direct commit to stable/12 since the bug was introduced while
merging there and is not present in later branches.
When we find a state for packets that was created by a reply-to rule we
still need to process the packet. The state may require us to modify the
packet (e.g. in rdr or nat cases), which we won't do with the shortcut.
pf: change pf_route so pf only runs when packets enter and leave the stack.
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsibly for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.
the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.
another reason for this commit is is to make handling of the changes
that route-to makes consistent with other changes that are made to
packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.
the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@
Rick Macklem [Mon, 5 Apr 2021 01:15:54 +0000 (18:15 -0700)]
nfsd: make the server repeat CB_RECALL every couple of seconds
Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. This will fix the back channel, but the
first attempt at a callback like CB_RECALL will already have
failed. Without this patch, a CB_RECALL will not be retried
and that can result in a 5 minute delay until the delegation
times out.
This patch modifies the code so that it will retry the
CB_RECALL every couple of seconds, often avoiding the
5 minute delay.
This is not critical for correct behaviour, but avoids
the 5 minute delay for the case where the Linux client
re-binds the back channel via BindConnectionToSession.
Rick Macklem [Sun, 4 Apr 2021 22:05:39 +0000 (15:05 -0700)]
nfsd: fix BindConnectionToSession so that it clears "cb path down"
Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. It will do this for every RPC
reply until it no longer sees the flag.
Without that patch, this will happen until the client does
an Open, which will clear LCL_CBDOWN.
This patch clears LCL_CBDOWN right away, so that
NFSV4SEQ_CBPATHDOWN will no longer be sent to the client
in Sequence replies and the Linux client will not repeat
the BindConnectionToSession RPCs.
This is not critical for correct behaviour, but reduces
RPC overheads for cases where the Open will not be done
for a while.
tcp: For hostcache performance, use atomics instead of counters
As accessing the tcp hostcache happens frequently on some
classes of servers, it was recommended to use atomic_add/subtract
rather than (per-CPU distributed) counters, which have to be
summed up at high cost to cache efficiency.
tcp: Make hostcache.cache_count MPSAFE by using a counter_u64_t
Addressing the underlying root cause for cache_count to
show unexpectedly high values, by protecting all arithmetic on
that global variable by using counter(9).
tcp: reduce memory footprint when listing tcp hostcache
In tcp_hostcache_list, the sbuf used would need a large (~2MB)
blocking allocation of memory (M_WAITOK), when listing a
full hostcache. This may stall the requestor for an indeterminate
time.
A further optimization is to return the expected userspace
buffersize right away, rather than preparing the output of
each current entry of the hostcase, provided by: @tuexen.
This makes use of the ready-made functions of sbuf to work
with sysctl, and repeatedly drain the much smaller buffer.
While sbuf_drain was an internal function, two
KASSERTS checked the sanity of it being called.
However, an external caller may be ignorant if
there is any data to drain, or if an error has
already accumulated. Be nice and return immediately
with the accumulated error.
Rick Macklem [Thu, 1 Apr 2021 22:09:03 +0000 (15:09 -0700)]
nfsd: silence rpcb_unset noise for NFSv4 only servers
An NFSv4 only configuration does not register with
rpcbind(). Without this patch a failure to rpcb_unset()
is reported when the daemon is terminated for this case.
This is harmless noise, but this patch avoids calling
rpcb_unset() for the NFSv4 only case, avoiding the noise.
When called with "-d", it still does the rpcb_unset(),
assuming that the configuration might have been
changed to NFSv4 only and unregistering with
rpcbind() might still be needed.
Avoid raising unexpected floating point exceptions in libm
When using clang with x86_64 CPUs that support AVX, some floating point
transformations may raise exceptions that would not have been raised by
the original code. To avoid this, use the -fp-exception-behavior=maytrap
flag, introduced in clang 10.0.0.
In particular, this fixes a number of test failures with ctanhf(3) and
ctanf(3), when libm is compiled with -mavx. An unexpected FE_INVALID
exception is then raised, because clang emits vdivps instructions to
perform certain divides. (The vdivps instruction operates on multiple
single-precision float operands simultaneously, but the exceptions may
be influenced by unused parts of the XMM registers. In this particular
case, it was calculating 0 / 0, which results in FE_INVALID.)
If -fp-exception-behavior=maytrap is specified however, clang uses
vdivss instructions instead, which work on one operand, and should not
raise unexpected exceptions.
Only use -fp-exception-behavior=maytrap on x86, for now
After 3b00222f156d, it turns out that clang only supports strict
floating point semantics for SystemZ and x86 at the moment, while for
other architectures it is still experimental.
Therefore, only use -fp-exception-behavior=maytrap on x86 for now,
otherwise this option results in "error: overriding currently
unsupported use of floating point exceptions on this target
[-Werror,-Wunsupported-floating-point-opt]" on other architectures.
Avoid -pedantic warnings about using _Generic in __fp_type_select
When compiling parts of math.h with clang using a C standard before C11,
and using -pedantic, it will result in warnings similar to:
bug254714.c:5:11: warning: '_Generic' is a C11 extension [-Wc11-extensions]
return !isfinite(1.0);
^
/usr/include/math.h:111:21: note: expanded from macro 'isfinite'
^
/usr/include/math.h:82:39: note: expanded from macro '__fp_type_select'
^
This is because the block that enables use of _Generic is conditional
not only on C11, but also on whether the compiler advertises support for
C generic selections via __has_extension(c_generic_selections).
To work around the warning without having to pessimize the code, use the
__extension__ keyword, which is supported by both clang and gcc. While
here, remove the check for __clang__, as _Generic has been supported for
a long time by gcc too now.
Kristof Provost [Thu, 25 Mar 2021 12:59:14 +0000 (13:59 +0100)]
libnv: Allow use in non-sleepable contexts
44c125c4cebc2fd87c6260b90eddae11201f5232 switched the nvlist allocations
to be M_WAITOK, but this precludes the use in non-sleepable contexts.
(E.g. with a nonsleepable lock held).
All callers for these allocation functions already cope with memory
alloation failures, so there's no reason to allow sleeping during
allocations.
pf tests: make synproxy and nat work correctly even if inetd is running
tests/sys/netfil/pf/synproxy fails if inetd has been running
outside of the jail because pidfile_open() fails with EEXIST.
tests/sys/netfil/pf/nat has the same problem but the test succeeds
because whether inetd is running is not so important.
Fix the problem by changing the pidfile path from the default
location.
Ryan Libby [Wed, 20 Jan 2021 21:59:49 +0000 (13:59 -0800)]
pms: handle maximum size IO with any alignment
Define the maximum numbers of segments to allow for non-page alignment
at the beginning and end of a maxphys size transfer. Also set
ccb_pathinq.maxio consistent with maxphys.
Ed Maste [Mon, 5 Apr 2021 01:01:28 +0000 (21:01 -0400)]
readelf: return error in case of invalid file
GNU readelf exits with an error for a number of invalid file cases.
Previously ELF Tool Chain readelf always exited with 0. Now we exit 1
upon detecting an error with one or more input files, but in any case
all of them are processed.
This should catch common failure cases. We still do not report an error
for some types of malformed ELF files, but this is consistent with GNU
readelf.
PR: 252727
Reviewed by: jkoshy, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29377
Ed Maste [Sun, 4 Apr 2021 00:57:26 +0000 (20:57 -0400)]
freebsd-update: improve mandoc db generation
freebsd-update compares the dates on man pages with mandoc.db, and if
any newer pages are found it regenerates mandoc.db.
Previously, if mandoc.db did not already exist the check failed and
freebsd-update then failed to create one. Now, check that mandoc.db
exists before performing the check for newer pages.
Reported by: bdrewery (in D10482)
Reviewed by: gordon
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29575
Gordon Bergling [Fri, 9 Apr 2021 15:28:18 +0000 (17:28 +0200)]
sysctl.conf(5): Mention sysctl.conf.local in the sysctl.conf(5) manual page
The possibility of using a sysctl.conf.local on a machine that has a shared
sysctl.conf(5) isn't documented. So mention the sysctl.conf.local in the
manual page.
PR: 254901
Submitted by: Jose Luis Duran <jlduran at gmail dot com>
Reported by: Jose Luis Duran <jlduran at gmail dot com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D29673
Rick Macklem [Tue, 30 Mar 2021 21:31:05 +0000 (14:31 -0700)]
nfsd: do not implicitly bind the back channel for NFSv4.1/4.2 mounts
The NFSv4.1 (and 4.2 on 13) server incorrectly binds
a new TCP connection to the back channel when first
used by an RPC with a Sequence op in it (almost all of them).
RFC5661 specifies that only the fore channel should be bound.
This was done because early clients (including FreeBSD)
did not do the required BindConnectionToSession RPC.
Unfortunately, this breaks the Linux client when the
"nconnects" mount option is used, since the server
may do a callback on the incorrect TCP connection.
This patch converts the server behaviour to that
required by the RFC. It also makes the server test/indicate
failure of the back channel more aggressively.
Until this patch is applied to the server, the
"nconnects" mount option is not recommended for a Linux
NFSv4.1/4.2 client mount to the FreeBSD server.
Rick Macklem [Tue, 23 Mar 2021 20:04:37 +0000 (13:04 -0700)]
nfsv4 client: fix forced dismount when sleeping in the renew thread
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.
This is the second of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.
This patch adds a 5second timeout on the msleep()s and checks for
the forced dismount flag so that the renew thread will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 5seconds, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
The patch also adds the "mp" argument to nfsv4_lock() so that it
will return when the forced dismount flag is set.
While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call.
Stefan Eßer [Tue, 6 Apr 2021 08:44:52 +0000 (10:44 +0200)]
[bc] Update to version 4.0.0
This version fixes an issue (missing pop of top-of-stack value in the
"P" command of the dc program).
This issue did not affect the bc program, since it does not use dc as
an back-end to actually perform the calculations as was the case with
the traditional bc and dc programs.
The major number has been bumped due to Windows support that has been
added to this version. It does not correspond to a major change that
might affect FreeBSD.
By this time, if_com_free[if_alloctype] is NULL since it's already
nullified by if_deregister_com_alloc(); hence, firewire_free() won't
have a chance to release the allocated fw_com.
Alan Somers [Wed, 3 Mar 2021 20:06:38 +0000 (13:06 -0700)]
Modernize geom_stats_snapshot_get
* A logically useless memset() is used to fault in some memory pages.
Change it to explicit_bzero so the compiler won't eliminate it.
* Eliminate the second memset. It made sense in the days of the Big
Kernel Lock, but not in the days of fine-grained SMP and especially
not in the days of VDSO.
Alan Somers [Sat, 27 Feb 2021 15:59:40 +0000 (08:59 -0700)]
Speed up geom_stats_resync in the presence of many devices
The old code had a O(n) loop, where n is the size of /dev/devstat.
Multiply that by another O(n) loop in devstat_mmap for a total of
O(n^2).
This change adds DIOCGMEDIASIZE support to /dev/devstat so userland can
quickly determine the right amount of memory to map, eliminating the
O(n) loop in userland.
This change decreases the time to run "gstat -bI0.001" with 16,384 md
devices from 29.7s to 4.2s.
Also, fix a memory leak first reported as PR 203097.
Alan Somers [Fri, 12 Feb 2021 18:30:52 +0000 (11:30 -0700)]
mount_nullfs: rename a local variable
The "source" variable was introduced in r26072, probably as the
traditional counterpart to "target". But the "source"/"target" names
suggest the opposite of their actual meaning. With ln, for example, the
source is the real file and the target is the newly created link. In
mount_nullfs the meaning is the opposite: the target is the existing
file system and the source is the newly created mountpoint. Better to
use "target"/"mountpoint" terminology, which matches the man page.
netmap: bridge: fix transmission in busy-wait mode
In busy-wait mode (BUSYWAIT defined), NIOCTXSYNC should be
performed after packets have been moved to the TX ring
(rather than before).
Before the change, moved packets may stall for an indefinite
time in the TX ring.
Kristof Provost [Tue, 9 Mar 2021 15:44:26 +0000 (16:44 +0100)]
dummynet: Move timekeeping information into dn_cfg
Just like with the packet counters move the timekeeping information into
dn_cfg. This reduces the global name space use for dummynet and will
make subsequent work to add vnet support and re-use in pf easier.
Kristof Provost [Tue, 9 Mar 2021 15:27:31 +0000 (16:27 +0100)]
dummynet: Move packet counters into dn_cfg
Move the packets counters into the dn_cfg struct. This reduces the
global name space use for dummynet and will make future work for things
like vnet support and re-use in pf easier.
Alexander Motin [Wed, 10 Mar 2021 18:39:15 +0000 (13:39 -0500)]
Move time math out of disabled interrupts sections.
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
Alexander Motin [Mon, 8 Mar 2021 23:43:47 +0000 (18:43 -0500)]
Do not read timer extra time when MWAIT is used.
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
Alexander Motin [Mon, 8 Mar 2021 22:57:46 +0000 (17:57 -0500)]
Change mwait_bm_avoidance use to match Linux.
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
ipdivert: check that PCB is still valid after taking INPCB_RLOCK.
We are inspecting PCBs of divert sockets under NET_EPOCH section,
but PCB could be already detached and we should check INP_FREED flag
when we took INP_RLOCK.
This per-driver callback is invoked by netmap when it wants
to align the number of TX/RX netmap rings and/or the number of
TX/RX netmap slots to the actual state configured in the hardware.
The alignment happens when netmap mode is switched on (with no
active netmap file descriptors for that netmap port), or when
collecting netmap port information.
Rick Macklem [Fri, 19 Mar 2021 21:09:33 +0000 (14:09 -0700)]
nfsv4 client: fix forced dismount when sleeping on nfsv4lck
During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.
This is the first of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.
nfsv4_lock() checks for forced dismount, but only after it wakes up
from msleep(). Without this patch, a wakeup() call was required.
This patch adds a 1second timeout on the msleep(), so that it will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 1second, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.
While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call and make the same change for nfsv4_getref().
Rick Macklem [Thu, 18 Mar 2021 19:20:25 +0000 (12:20 -0700)]
nfsv4 pnfs client: fix updating of the layout stateid.seqid
During a recent NFSv4 testing event a test server was replying
NFSERR_OLDSTATEID for layout stateids presented to the server
for LayoutReturn operations. Upon rereading RFC5661, it was
apparent that the FreeBSD NFSv4.1/4.2 pNFS client did not
maintain the seqid field of the layout stateid correctly.
This patch is believed to correct the problem. Tested against
a FreeBSD pNFS server with diagnostics added to check the stateid's
seqid did not indicate problems. Unfortunately, testing aginst
this server will not happen in the near future, so the fix may
not be correct yet.
Kyle Evans [Wed, 3 Mar 2021 03:38:37 +0000 (21:38 -0600)]
init: use explicit_bzero() for clearing passwords
This is a nop in practice, because it cannot be proven that this
particular bzero() is not significant. Make it explicit anyways, rather
than relying on an implementation detail of how the password is
collected.
Discussed with: Andrew Gierth <andrew tao146 riddles org uk>
ipf_proxy_check() returns -1 for an error and 0 or 1 for success.
ipf_proxy_check()'s callers check for error and if the return code
is 0, they change it to 1 prior to returning to their callers. Simply
by returning -1 or 1 we reduce complexity and cycles burned changing
0 to 1.
The computed IPOIB_CM_RX_SG is too small. It doesn't account for fallback
to mbuf clusters when jumbo frames are not available and it also doesn't
account for the packet header and trailer mbuf.
This causes a memory overwrite situation when IPOIB_CM is configured.
While at it add a kernel assert to ensure the mapping array is not overwritten.