Contrary to what the message says, this is not only executed in an EFI
context- it provides functions for use in an EFI environment. I don't think
there's much reason to broadcast this fact when we haven't in the past, so
just remove it.
Increase the fdtmemreserv array limit to boot on POWER9
Discussing with others, this needs to be at least 20 to boot on some POWER9
nodes. Linux made a similar change for the same reason, so increase to 32
to give us some extra breathing room as well. The input and output arrays
are sized at 256, so much greater than the increase in the property array
size.
Mark Johnston [Tue, 24 Apr 2018 21:15:54 +0000 (21:15 +0000)]
Improve VM page queue scalability.
Currently both the page lock and a page queue lock must be held in
order to enqueue, dequeue or requeue a page in a given page queue.
The queue locks are a scalability bottleneck in many workloads. This
change reduces page queue lock contention by batching queue operations.
To detangle the page and page queue locks, per-CPU batch queues are
used to reference pages with pending queue operations. The requested
operation is encoded in the page's aflags field with the page lock
held, after which the page is enqueued for a deferred batch operation.
Page queue scans are similarly optimized to minimize the amount of
work performed with a page queue lock held.
Extend ap_boot_mtx scope to also cover mca_init().
Otherwise, under bootverbose, the lapic_enable_cmc() banner 'lapicX:
CMCI unmasked' is printed by several CPUs in parallel, causing garbled
output for the LAPIC dumps.
Reported by: royger
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15157
Mark Johnston [Tue, 24 Apr 2018 20:05:45 +0000 (20:05 +0000)]
Add a UMA zone flag to disable the use of buckets.
This allows the creation of zones which don't do any caching in front of
the keg. If the zone is a cache zone, this means that UMA will not
attempt any memory allocations when allocating an item from the backend.
This is intended for use after a panic by netdump, but likely has other
applications.
Ed Maste [Tue, 24 Apr 2018 19:51:05 +0000 (19:51 +0000)]
Add deprecation notice for lmc(4)
We intend to remove support before FreeBSD 12 is branched. These are
available only as 32-bit PCI devices. The driver has an ambiguous
license and I have not been successful in contacting the driver's author
in order to address this.
The planned deprecation has been announced on -current and -stable; if
we receive feedback that the driver is still useful and we are able to
resolve the license issue this deprecation notice can be reverted.
Reviewed by: bapt, brooks, imp, rgrimes
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15182
Ed Maste [Tue, 24 Apr 2018 19:26:58 +0000 (19:26 +0000)]
lldb: remove assertion that target_arch is FreeBSD
The target is not necessarily a FreeBSD binary - for example, it may be
a Linux binary running under the linuxulator. Basic ptrace (live)
debugging already worked in this case, except for the assertion.
Ed Maste [Tue, 24 Apr 2018 19:23:26 +0000 (19:23 +0000)]
kdump: simplify/remove per-arch #ifdefs
It is acceptable for syscallabi to map SV_ABI to SYSDECODE_ABI on all
architectures; libsysdecode will return not-found sentinel values if
it does not have a syscall name or errno mapping for a given
architecture.
Also, use __LP64__ for the SV_ILP32 -> SYSDECODE_ABI_LINUX32 mapping,
for any future 32- on 64-bit linuxulator implementation.
Reviewed by: jhb
Sponsored by: Turing Robotic Industries Inc.
Do not totally silence suppressed secondary kasserts unless debug.kassert.do_log is disabled
To totally silence and ignore secondary kassert violations after a primary
panic, set debug.kassert.do_log=0 and debug.kassert.suppress_in_panic=1.
Additional assertion warnings shouldn't block core dump and may alert the
developer to another erroneous condition. Secondary stack traces may be
printed, identically to the unsuppressed case where panic() is reentered --
controlled via debug.trace_all_panics.
To diagnose and fix secondary panics, it is useful to have a stack trace.
When panic tracing is enabled, optionally trace secondary panics as well.
The option is configured with the tunable/sysctl debug.trace_all_panics.
(The original concern that inspired only tracing the primary panic was
likely that the secondary trace may scroll the original panic message or trace
off the screen. This is less of a concern for serial consoles with logging.
Not everything has a serial console, though, so the behavior is optional.)
Update r332860 by changing the default from suppressing post-panic
assertions to not suppressing post-panic assertions.
There are some post-panic assertions that are valuable and we shouldn't
default to disabling them. However, when a user trips over them, the
user can still adjust the tunable/sysctl to suppress them temporarily to
get conduct troubleshooting (e.g. get a core dump).
r313683 introduced new lockmgr APIs that missed the panic-time neutering
present in the rest of our locks. Correct that by adding the usual check.
Additionally, move the __lockmgr_args neutering above the assertions at the
top of the function. Drop the interlock unlock because we shouldn't have
an unneutered interlock either. No point trying to unlock it.
John Baldwin [Tue, 24 Apr 2018 17:53:16 +0000 (17:53 +0000)]
Fix PT_STEP single-stepping for mips.
Note that GDB at least implements single stepping for MIPS using software
breakpoints explicitly rather than using PT_STEP, so this has only been
tested via tests in ptrace_test which now pass rather than fail.
- Fix several places to use uintptr_t instead of int for virtual addresses.
- Check for errors from ptrace_read_int() when setting a breakpoint for a
step.
- Properly check for errors from ptrace_write_int() as it returns non-zero,
not negative values on failure.
- Change the error returns for ptrace_read_int() and ptrace_write_int() from
ENOMEM to EFAULT.
- Clear a single step breakpoint when it traps rather than waiting for it
to be cleared from ptrace(). This matches the behavior of the arm port
and in general seems a bit more reliable than waiting for ptrace() to
clear it via FIX_SSTEP.
- Drop the PROC_LOCK around ptrace_write_int() in ptrace_clear_single_step()
since it can sleep.
- Reorder the breakpoint handler in trap() to only read the instruction if
the address matches the current thread's breakpoint address.
- Replace various #if 0'd debugging printfs with KTR_PTRACE traces.
This is necessary to make sure that functions that can have stack
protection are not used to update the stack guard. If not, the stack
guard check would fail when it shouldn't.
guard_setup() calls elf_aux_info(), which, in turn, calls memcpy() to
update stack_chk_guard. If either elf_aux_info() or memcpy() have
stack protection enabled, __stack_chk_guard will be modified before
returning from them, causing the stack protection check to fail.
This change uses a temporary buffer to delay changing
__stack_chk_guard until elf_aux_info() returns.
We must ensure that accesses occur, they do not have any other
compiler-visible effects. Bruce found some situations where
optimization could remove an access, and provided a patch to use
volatile qualifier for the state variables. Since volatile behaviour
there is the compiler-specific interpretation of the keyword, use
relaxed atomics instead, which gives exactly the desired semantic.
Noted by and discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
add a new ACPI suspend debugging knob, debug.acpi.suspend_deep_bounce
This sysctl allows a deeper dive into the sleep abyss comparing to
debug.acpi.suspend_bounce. When the new sysctl is set the system will
execute the suspend sequence up to the call to AcpiEnterSleepState().
That includes saving processor contexts and parking APs. Then, instead
of actually entering the sleep state, the BSP will call resumectx() to
emulate the wakeup. The APs should get restarted by the sequence of
Init and Startup IPIs that BSP sends to them.
John Baldwin [Tue, 24 Apr 2018 05:42:10 +0000 (05:42 +0000)]
Relock PROC_LOCK before one failure case in ptrace_single_step().
The MIPS ptrace_single_step() unlocks the PROC_LOCK while reading and
writing instructions from userland. One failure case was not reacquiring
the lock before returning.
John Baldwin [Tue, 24 Apr 2018 05:33:17 +0000 (05:33 +0000)]
Report proper signal codes for SIGTRAP traps on MIPS.
- Use TRAP_TRACE for traps after stepping via PT_STEP.
- Use TRAP_BRKPT for software breakpoint traps and watchpoint traps.
This was tested via the recently added siginfo ptrace() tests. PT_STEP on
MIPS has several bugs that prevent it from working yet, but this does fix
the ptrace__breakpoint_siginfo test on MIPS.
John Baldwin [Tue, 24 Apr 2018 05:30:05 +0000 (05:30 +0000)]
Add two tests for TRAP_* signal codes for SIGTRAP.
- ptrace__breakpoint_siginfo tests that a SIGTRAP for a software breakpoint
in userland triggers a SIGTRAP with a signal code of TRAP_BRKPT.
- ptrace__step_siginfo tests that a SIGTRAP reported for a step after
stepping via PT_STEP or PT_SETSTEP has a signal code of TRAP_TRACE.
John Baldwin [Tue, 24 Apr 2018 05:20:16 +0000 (05:20 +0000)]
Extend support for ptrace() tests using breakpoints.
- Use a single list of platforms to define HAVE_BREAKPOINT for platforms
that expose a functional breakpoint() inline to userland. Replace
existing lists of platform tests with HAVE_BREAKPOINT instead.
- Add support for advancing PC past a breakpoint inserted via breakpoint()
to support the existing ptrace__PT_CONTINUE_different_thread test on
non-x86 platforms (x86 advances the PC past the breakpoint instruction,
but other platforms do not). This is implemented by defining a new
SKIP_BREAK macro which accepts a pointer to a 'struct reg' as its sole
argument and modifies the contents to advance the PC. The intention is
to use it in between PT_GETREGS and PT_SETREGS.
Tested on: amd64, i386, mips (after adding a breakpoint() to mips)
MFC after: 1 month
Ed Maste [Tue, 24 Apr 2018 01:22:57 +0000 (01:22 +0000)]
pwd_mkdb: default to network (big) endian hash order
For cross-architecture reproducibility. The db(3) functions work with
hashes of either endianness, and the current (v4) version password db
entries already store integers in network order. Do so with the hash as
well so that identical password databases can be created on big- and
little-endian hosts.
The -B and -L flags exist to set the endianness for legacy (v3) entries
when the -l flag is used, and they will still control hash endianness
(at least until the backwards compatibility infrastructure is removed).
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Pull in r329771 from upstream llvm trunk (by Craig Topper):
[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need
to emit an explicit MOV8mr instruction.
Previously the code only knew how to handle setcc to a register.
This should fix a crash in the chromium build.
This fixes various assertion failures while building ports targeting
i386:
* www/firefox: isReg() && "This is not a register operand!"
* www/iridium, www/qt5-webengine: (I.atEnd() || std::next(I) ==
def_instr_end()) && "getVRegDef assumes a single definition or no
definition"
* devel/powerpc64-gcc: FromReg != ToReg && "Cannot replace a reg with
itself"
Reading mti_zone it wastes a cacheline to hold mti_probes + mti_zone
(which we know is 0) + part of malloc stats of the first cpu which on top
induces false-sharing.
Sean Bruno [Mon, 23 Apr 2018 19:51:00 +0000 (19:51 +0000)]
Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).
John Baldwin [Mon, 23 Apr 2018 17:00:15 +0000 (17:00 +0000)]
Implement 32-bit atomic_fcmpset() in userland for armv4/v5.
- Add an implementation of atomic_fcmpset_32() using RAS for armv4/v5.
This fixes recent world breakage due to use of atomic_fcmpset() in
userland.
- While here, be more careful to not expose wrapper macros for 64-bit
atomic_*cmpset to userland for armv4/v5 as only 32-bit cmpset is
implemented.
This has been reviewed, but not runtime-tested, but should fix the arm.arm
and arm.armeb worlds that have been broken for a while.
John Baldwin [Mon, 23 Apr 2018 16:50:37 +0000 (16:50 +0000)]
Fix some harmless type mismatches in the ARM atomic_cmpset implementations.
The return value of atomic_cmpset() and atomic_fcmpset() is an int (which
is really a bool) that has the values 0 or 1. Some of the inlines were
using the type being operated on (e.g. uint32_t) as either the return type
of the function, or the type of a local 'ret' variable used to hold the
return value. Fix all of these to just use plain 'int'. Due to C promotion
rules and the fact that the value can only be 0 or 1, these should all be
harmless.
icmp6_reflect() sends ICMPv6 message with new IPv6 header. So, it is
considered as originated by our host packet. And thus rcvif should be
NULL, since it is used by ipfw(4) to determine that packet was originated
from this host. Some of icmp6_reflect() consumers reuse mbuf and m_pkthdr
without resetting rcvif pointer. To avoid this always reset m_pkthdr.rcvif
pointer to NULL in icmp6_reflect(). Also remove such line and comment
describing this from icmp6_error(), since it does not longer matters.
This combined with previous changes significantly depessimizes the behaviour
under contentnion.
In particular the lock1_processes test (locking/unlocking separate files)
from the will-it-scale suite was executed with 128 concurrency on a
4-socket Broadwell with 128 hardware threads.
Operations/second (lock+unlock) go from ~750000 to ~45000000 (6000%)
For reference single-process is ~1680000 (i.e. on stock kernel the resulting
perf is less than *half* of the single-threaded run),
Note this still does not really scale all that well as the locks were just
bolted on top of the current implementation. Significant room for improvement
is still here. In particular the top performance fluctuates depending on the
extent of false sharing in given run (which extends beyond the file).
Added chain+lock pairs were not padded w.r.t. cacheline size.
One big ticket item is the hash used for spreading threads: it used to be the
process pid (which basically serialized all threaded ops). Temporarily the
vnode addr was slapped in instead.
Update account and given names in committers-src.dot and calendar.freebsd
I have changed my given name from Bruce to Rebecca, and my FreeBSD account
from brucec to bcran.
Update committers-src.dot and calendar.freebsd to show these changes.
Make bufdaemon and bufspacedaemon use kthread_suspend_check instead of
kproc_suspend_check. In r329612 bufspacedaemon was turned into a thread
of the bufdaemon process causing both to call kproc_suspend_check with the
same proc argument and that function contains the following while loop:
So one thread wakes up the other and the other wakes up the first again,
locking up UP machines on shutdown.
Also register the shutdown handlers with SHUTDOWN_PRI_LAST + 100 so they
run after the syncer has shutdown, because the syncer can cause a
situation where bufdaemon help is needed to proceed.
1. check if P_ADVLOCK is already set and if so, don't lock to set it
(stolen from DragonFly)
2. when trying for fast path unlock, check that we are doing unlock
first instead of taking the interlock for no reason (e.g. if we want
to *lock*). whilere make it more likely that falling fast path will
not take the interlock either by checking for state
Note the code is severely pessimized both single- and multithreaded.
sysentvec::sv_hwcap/sv_hwcap2 are pointers to u_long, so cpu_features* need
to be u_long to use the pointers. This also requires a temporary cast in
printing the bitfields, which is fine because the feature flag fields are
only 32 bits anyway.
dwatch(1): Remove the line used to demonstrate `-dev' option
In recently added sendrecv profile, there was a line purposefully
added to introduce a compilation error in which `-dev' is used to
debug the entry. Removing the entry.
dwatch(1): Add `-dev' option to aid debugging of profiles
The options `-d' (debug), `-e' (exit after compile), and `-v' (verbose)
when combined in any order (though best remembered as `-dev') will run
the conflated script through dtrace(1), test for error conditions, and
show the line that dtrace(1) failed at (with context).
If no errors are found, the output is the same as `-e[v]'.
When writing a new profile for dwatch(1), you can quickly test to
make sure it compiles by running `dwatch -devX profile_name' where
profiles live in /usr/libexec/dwatch or /usr/local/libexec/dwatch
(the latter being where profiles installed via ports should go).
When running with INVARIANTS, the kernel contains extra checks. However,
these assumptions may not hold true once we've panic'd. Therefore, the
checks hold less value after a panic. Additionally, if one of the checks
fails while we are already panic'd, this creates a double-panic which can
interfere with debugging the original panic.
Therefore, this commit allows an administrator to suppress a response to
KASSERT checks after a panic by setting a tunable/sysctl. The
tunable/sysctl (debug.kassert.suppress_in_panic) defaults to being
enabled.
Reviewed by: kib
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D12920
FreeBSD exports the AT_HWCAP* auxvec items if provided by the ELF sysentvec
structure. Add the CPU features to be exported, so user space can more
easily check for them without using the hw.cpu_features and hw.cpu_features2
sysctls.
Not all feature flags are synced. Those for processors we don't currently
support are ignored currently. Those that are supported are synced best I
can tell. One flag was renamed to match the Linux flag name
(PPC_FEATURE2_VCRYPTO -> PPC_FEATURE2_VEC_CRYPTO).
Ed Maste [Fri, 20 Apr 2018 22:23:38 +0000 (22:23 +0000)]
makefs: tidy up reach-over source
- cd9660 relies on an #include "iso.h" but does not build any .c files
out of source, so remove reach-over .PATH
- ffs does not rely on any sys/ headers, so remove -I from CFLAGS.
- ffs_tables from sys/ is used by ffs; move the SRCS entry from the top-
level Makefile to ffs' Makefile.inc.
When disabling regulator when they are unused, check before is they are
enabled.
While here don't check the enable_cnt on the regulator entry as it is
checked by regnode_stop.
This solve the panic on any board using a fixed regulator that is driven
by a gpio when the regulator is unused.
Tested On: OrangePi One
Pointy Hat to: myself
Reported by: kevans, Milan Obuch (freebsd-arm@dino.sk)
Recommit r332501, with an additional upstream fix for "Cannot lower
EFLAGS copy that lives out of a basic block!" errors on i386.
Pull in r325446 from upstream clang trunk (by me):
[X86] Add 'sahf' CPU feature to frontend
Summary:
Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
`+sahf` feature for the backend, for bug 36028 (Incorrect use of
pushf/popf enables/disables interrupts on amd64 kernels). This was
originally submitted in bug 36037 by Jonathan Looney
<jonlooney@gmail.com>.
As described there, GCC also uses `-msahf` for this feature, and the
backend already recognizes the `+sahf` feature. All that is needed is
to teach clang to pass this on to the backend.
The mapping of feature support onto CPUs may not be complete; rather,
it was chosen to match LLVM's idea of which CPUs support this feature
(see lib/Target/X86/X86.td).
I also updated the affected test case (CodeGen/attr-target-x86.c) to
match the emitted output.
Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
[x86] Expose more of the condition conversion routines in the public
API for X86's instruction information. I've now got a second patch
under review that needs these same APIs. This bit is nicely
orthogonal and obvious, so landing it. NFC.
Pull in r329414 from upstream llvm trunk (by Craig Topper):
[X86] Merge itineraries for CLC, CMC, and STC.
These are very simple flag setting instructions that appear to only
be a single uop. They're unlikely to need this separation.
Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
[x86] Introduce a pass to begin more systematically fixing PR36028
and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the
necessary state in a GPR. In the vast majority of cases, these uses
are cmovCC and jCC instructions. For such cases, we can very easily
save and restore the necessary information by simply inserting a
setCC into a GPR where the original flags are live, and then testing
that GPR directly to feed the cmov or conditional branch.
However, things are a bit more tricky if arithmetic is using the
flags. This patch handles the vast majority of cases that seem to
come up in practice: adc, adcx, adox, rcl, and rcr; all without
taking advantage of partially preserved EFLAGS as LLVM doesn't
currently model that at all.
There are a large number of operations that techinaclly observe
EFLAGS currently but shouldn't in this case -- they typically are
using DF. Currently, they will not be handled by this approach.
However, I have never seen this issue come up in practice. It is
already pretty rare to have these patterns come up in practical code
with LLVM. I had to resort to writing MIR tests to cover most of the
logic in this pass already. I suspect even with its current amount
of coverage of arithmetic users of EFLAGS it will be a significant
improvement over the current use of pushf/popf. It will also produce
substantially faster code in most of the common patterns.
This patch also removes all of the old lowering for EFLAGS copies,
and the hack that forced us to use a frame pointer when EFLAGS copies
were found anywhere in a function so that the dynamic stack
adjustment wasn't a problem. None of this is needed as we now lower
all of these copies directly in MI and without require stack
adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things
tripping me up while working on this.
Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
[x86] Model the direction flag (DF) separately from the rest of
EFLAGS.
This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting
EFLAGS actually modify DF (other than things like popf) and so this
needlessly creates uses of EFLAGS that aren't really there.
In fact, DF is so restrictive it is pretty easy to model. Only STD,
CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
this.
I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.
Adding this extra register also uncovered a failure to use the
correct datatype to hold X86 registers, and I've corrected that as
necessary here.
Pull in r330264 from upstream llvm trunk (by Chandler Carruth):
[x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite
uses across basic blocks in the limited cases where it is very
straight forward to do so.
This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow
managing arbitrary PHI-ing of EFLAGS with general PHI insertion.
Neither of these seem at all promising so if those cases come up,
we'll almost certainly need to rewrite the parts of LLVM that produce
those patterns.
We do now require dominator trees in order to reliably diagnose
patterns that would require PHI nodes. This is a bit unfortunate but
it seems better than the completely mysterious crash we would get
otherwise.
Together, these should ensure clang does not use pushf/popf sequences to
save and restore flags, avoiding problems with unrelated flags (such as
the interrupt flag) being restored unexpectedly.
Split the matching and non-matching cases out into their own functions to
reduce future complexity. As the name implies, procmatches will eventually
process more than one match itself in the future.
Rename PROC_PDEATHSIG_SET -> PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_GET
-> PROC_PDEATHSIG_STATUS for consistency with other procctl(2)
operations names.
Requested by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
call racct_proc_ucred_changed() under the proc lock
The lock is required to ensure that the switch to the new credentials
and the transfer of the process's accounting data from the old
credentials to the new ones is done atomically. Otherwise, some updates
may be applied to the new credentials and then additionally transferred
from the old credentials if the updates happen after proc_set_cred() and
before racct_proc_ucred_changed().
The problem is especially pronounced for RACCT_RSS because
- there is a strict accounting for this resource (it's reclaimable)
- it's updated asynchronously by the vm daemon
- it's updated by setting an absolute value instead of applying a delta
I had to remove a call to rctl_proc_ucred_changed() from
racct_proc_ucred_changed() and make all callers of latter call the
former as well. The reason is that rctl_proc_ucred_changed, as it is
implemented now, cannot be called while holding the proc lock, so the
lock is dropped after calling racct_proc_ucred_changed. Additionally,
I've added calls to crhold / crfree around the rctl call, because
without the proc lock there is no gurantee that the new credentials,
owned by the process, will stay stable. That does not eliminate a
possibility that the credentials passed to the rctl will get stale.
Ideally, rctl_proc_ucred_changed should be able to work under the proc
lock.
Many thanks to kib for pointing out the above problems.
Rick Macklem [Fri, 20 Apr 2018 11:38:29 +0000 (11:38 +0000)]
Fix use of pointer after being set NULL.
Using a pointer after setting it NULL is probably not a good plan.
Spotted by inspection during changes for Flexible File Layout Ioerr handling.
This code path obviously isn't normally executed.
Add dead_bpf_if structure, that should be used as fake bpf_if
during ifnet detach.
Since destroying interface is not atomic operation and due to the
lack of synhronization during destroy, it is possible, that in the
time between bpfdetach() and if_free() some queued on destroying
interface mbuf will be used by ether_input_internal() and
bpf_peers_present() can dereference NULL bpf_if pointer. To protect
from this, assign pointer to empty bpf_if_ext structure instead of
NULL pointer after bpfdetach().