Default loader.conf: Drop efi_max_resolution to 1x1
Effectively disabling the mode changing bits in the loader. No matter which
way we go with it, it seems to be wrong- either the firmware doesn't change
the resolution and reports the resolution we requested, or the firmware
changes the resolution and doesn't report the resolution we requested. It
some cases, it does the right thing, but the bad cases outweight those.
Interested individuals can still set efi_max_resolution to 1080p or whatnot
in loader.conf(5) to restore the new behavior, but the new behavior does not
work out well for many cases.
cxgbe: Implement tcp_info handler for connections handled by t4_tom.
The TCB is read using a memory window right now. A better alternate to
get self-consistent, uncached information would be to use a GET_TCB
request but waiting for a reply from hw while holding non-sleepable
locks is quite inconvenient.
Import CK as of commit b19ed4c6a56ec93215ab567ba18ba61bf1cfbac8
It should fix ck_pr_[load|store]_ptr on mips and riscv, make sure no
*fence instructions are used on i386, as older cpus don't support it, and
make sure we don't rely on gcc builtins that can lead to calls to
libatomic when linked with -O0.
Ensure the background laundering threshold is positive after a scan.
The division added in r331732 meant that we wouldn't attempt a
background laundering until at least v_free_target - v_free_min clean
pages had been freed by the page daemon since the last laundering. If
the inactive queue is depleted but not completely empty (e.g., because
it contains busy pages), it can thus take a long time to meet this
threshold. Restore the pre-r331732 behaviour of using a non-zero
background laundering threshold if at least one inactive queue scan has
elapsed since the last attempt at background laundering.
unify amd64 and i386 cpu_reset() in x86/cpu_machdep.c
Because I didn't see any reason not too.
I've been making some changes to the code and couldn't help but notice
that the i386 and am64 code was nearly identical.
x86 cpu_reset: if failed to switch to BSP proceed to cpu_reset_real
If cpu_reset() is called on an AP and if it somehow fails to wake the
BSP, then it's better to attempt the reset on the AP than just sit there
spinning on an unusable and undebuggable system.
x86 cpu_reset_proxy: no need to stop_cpus() the original processor
The processor is "parked" in a spin-loop already and that's sufficient
for the reset. There is nothing that stop_cpus() would add here, only
extra complexity and fragility.
The original processor does not need to enable interrupts now, in fact,
it must not do that.
In uma_startup_count() handle special case when zone will fit into
single slab, but with alignment adjustment it won't. Again, when
there is only one item in a slab alignment can be ignored. See
previous revision of this file for more info.
Handle a special case when a slab can fit only one allocation,
and zone has a large alignment. With alignment taken into
account uk_rsize will be greater than space in a slab. However,
since we have only one item per slab, it is always naturally
aligned.
Code that will panic before this change with 4k page:
ian [Sun, 1 Apr 2018 18:53:27 +0000 (18:53 +0000)]
Fix the build on arches with default unsigned char. Capture the fubyte()
return value in an int as well as the char, and test the full int value
for fubyte() failure.
jeff [Sun, 1 Apr 2018 04:50:05 +0000 (04:50 +0000)]
Add a uma cache of free pages in the DEFAULT freepool. This gives us
per-cpu alloc and free of pages. The cache is filled with as few trips
to the phys allocator as possible by the use of a new
vm_phys_alloc_npages() function which allocates as many as N pages.
This code was originally by markj with the import function rewritten by
me.
This commit splits all of the logodefs/graphics out into their own own files
and provides a method for these files to register their logodefs with the
drawer. Graphics are now loaded on demand if they don't exist in the current
set of logodefs.
The drawer module becomes a little easier to navigate through without all of
the graphics mixed in. It's also easy to do one-off graphics like the
9.2 Die Hard tribute by dteske@ without adding even more to our memory
requirements.
- No need for a 'goto' when our entire loop body is then wrapped in a
conditional.
- No need to leave commented out prints laying around
- If an expression is clearly going to be either nil or an expression that
isn't likely to be a boolean, we might as well use `or` to specify a
default value for the expression. e.g. `loader.getenv(...) or "no"`
kevans [Sat, 31 Mar 2018 23:49:00 +0000 (23:49 +0000)]
lualoader: Don't assume that {module}_load is set
The previous iteration of this assumed that {module}_load was set. In the
old world order of default loader.conf(5), this was probably a safe
assumption given that we had almost every module explicitly not-loaded in
it.
In the new world order, this is no longer the case, so one could delete a
_load line inadvertently while leaving a _name, _type, _flags, _before,
_after, or _error. This would have caused a confusing Lua error and borked
module loading.
imp [Sat, 31 Mar 2018 22:02:59 +0000 (22:02 +0000)]
fwohcireg.h is 99% the same between the boot loader and the
kernel. Delete it and fix up the 1% difference because there's no need
for them to be different.
benno [Sat, 31 Mar 2018 15:04:41 +0000 (15:04 +0000)]
Synchronise with NetBSD's version of EFI handling for El Torito images.
When I implemented my EFI support I failed to check if the upstream version
of makefs in NetBSD had done the same. Override my version with theirs to
make it easier to stay in sync with them in the future.
jah [Sat, 31 Mar 2018 05:17:12 +0000 (05:17 +0000)]
Remove MK_AUTO_OBJ from env passed to PORTS_MODULES
This fixes a failure to resolve object file paths seen when buildkernel
(which sets MK_AUTO_OBJ=yes) and installkernel (which sets MK_AUTO_OBJ=no)
are run as separate steps. r329232 partially fixed this scenario by removing
MAKEOBJDIR, but it seems the AUTO_OBJ setting also needs to be on the same
page for the build and install steps.
brooks [Fri, 30 Mar 2018 21:38:53 +0000 (21:38 +0000)]
Document and enforce assumptions about struct (in6_)ifreq.
- The two types must be type-punnable for shared members of ifr_ifru.
This allows compatibility accessors to be shared.
- There must be no padding gap between ifr_name and ifr_ifru. This is
assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
broadly portable. This is true for all current architectures, but very
large (256-bit) fat-pointers could violate this invariant.
brooks [Fri, 30 Mar 2018 20:24:29 +0000 (20:24 +0000)]
Fall back to ether_ioctl() by default.
The common pratice in ethernet device drivers is to fall back to
ether_ioctl() to implement generic ioctls not implemented by the driver
and to fail if no handler exists.
Convert these drivers to follow that practice rather than calling
ether_ioctl() for specific cases.
vxge(4) aready had the default case, but it was only called on failure
to match.
hselasky [Fri, 30 Mar 2018 19:45:48 +0000 (19:45 +0000)]
Reorganize health recovery in mlx5core.
- Move the semaphore locking and unlocking to the same function.
- Flags are no longer needed if the reset and crdump will be done in the
same function.
hselasky [Fri, 30 Mar 2018 19:43:15 +0000 (19:43 +0000)]
Prepare for FW dump in error state in mlx5core.
- Move firmware dump prep and cleanup to init_one() and remove_one() so that
the init and cleanup will happen only upon driver reload.
- Add some prints to indicate firmware dump.
gjb [Fri, 30 Mar 2018 19:08:37 +0000 (19:08 +0000)]
Add logic for "families" for GCE images.
This allows for GCE consumers to easily detect the latest major
version of FreeBSD when using the gcloud command line utility.
To ensure snapshot builds do not conflict with release-style
builds (ALPHA, BETA, RC, RELEASE), the '-snap' suffix is appended
to the GCE image family name.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
brooks [Fri, 30 Mar 2018 18:50:13 +0000 (18:50 +0000)]
Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size). This is believed to be sufficent to
fully support ifconfig on 32-bit systems.
manu [Fri, 30 Mar 2018 16:37:08 +0000 (16:37 +0000)]
efinet: Do not return only if ReceiveFilter fails
If the network interface or the uefi implementation do not support the
ReceiveFilter interface do not return only and just print a message.
U-Boot doesn't support is and likely never will. Also even if this fails
it doesn't mean that network in EFI isn't supported.
ken [Fri, 30 Mar 2018 15:28:25 +0000 (15:28 +0000)]
Bring in the Broadcom/Emulex Fibre Channel driver, ocs_fc(4).
The ocs_fc(4) driver supports the following hardware:
Emulex 16/8G FC GEN 5 HBAS
LPe15004 FC Host Bus Adapters
LPe160XX FC Host Bus Adapters
Emulex 32/16G FC GEN 6 HBAS
LPe3100X FC Host Bus Adapters
LPe3200X FC Host Bus Adapters
The driver supports target and initiator mode, and also supports FC-Tape.
Note that the driver only currently works on little endian platforms. It
is only included in the module build for amd64 and i386, and in GENERIC
on amd64 only.
kib [Fri, 30 Mar 2018 10:55:31 +0000 (10:55 +0000)]
Make vm_map_max/min/pmap KBI stable.
There are out of tree consumers of vm_map_min() and vm_map_max(), and
I believe there are consumers of vm_map_pmap(), although the later is
arguably less in the need of KBI-stable interface. For the consumers
benefit, make modules using this KPI not depended on the struct vm_map
layout.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14902
emaste [Fri, 30 Mar 2018 03:38:08 +0000 (03:38 +0000)]
makefs: sync fragment and block size with newfs
r222319 in newfs raised the default blocksize for UFS/FFS filesystems
from 16K to 32K and the default fragment size from 2K to 4K, with a
rationale that most disks were now running with 4K sectors.
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
landonf [Thu, 29 Mar 2018 19:44:15 +0000 (19:44 +0000)]
bhnd(4): include a subset of the ChipCommon capability flags in bhnd_chipid;
this provides early access to device capability flags required by bhnd(4)
bus and bhndb(4) bridge drivers.
davidcs [Thu, 29 Mar 2018 17:36:34 +0000 (17:36 +0000)]
1. Add additional debug prints.
2. Break transmit when IFF_DRV_RUNNING is OFF.
3. set desc_count=0 for default case in switch in ql_rcv_isr()
MFC after:5 days
markj [Thu, 29 Mar 2018 17:19:59 +0000 (17:19 +0000)]
Have TD_LOCKS_DEC() assert that td_locks is positive.
This makes it easier to catch lock accounting bugs, since the problem
is otherwise only detected upon a return to user mode (or never, for
kernel threads).
brooks [Thu, 29 Mar 2018 15:58:49 +0000 (15:58 +0000)]
GC never enabled support for SIOCGADDRROM and SIOCGCHIPID.
When de(4) was imported in 1997 the world was not ready for these ioctls.
In over 20 years that hasn't changed so it seems safe to assume their
time will never come.
markj [Thu, 29 Mar 2018 14:27:40 +0000 (14:27 +0000)]
Fix the background laundering mechanism after r329882.
Rather than using the number of inactive queue scans as a metric for
how many clean pages are being freed by the page daemon, have the
page daemon keep a running counter of the number of pages it has freed,
and have the laundry thread use that when computing the background
laundering threshold.
jeff [Thu, 29 Mar 2018 02:54:50 +0000 (02:54 +0000)]
Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.
Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.
kevans [Thu, 29 Mar 2018 00:55:11 +0000 (00:55 +0000)]
stand: Add workaround for HP BIOS issues
hrs@ and kuriyama@ have found that on some HP BIOS, a system will fail to
boot immediately after installation with the claim that it can't work out
which disk they are booting from.
They tracked it down to a buffer overrun, and found that it could be
alleviated by doing a dummy read before-hand.
jhb [Thu, 29 Mar 2018 00:12:50 +0000 (00:12 +0000)]
Reformat the enum of syscall argument types.
List enum values on separate lines to minimize diffs as new types are
added. Split the enum values up into groups and use some simple sorting
within groups (scalar enums are sorted by size, then base, all other
groups are generally sorted alphabetically).
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matt Ahrens <Matt.Ahrens@delphix.com>
With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect
blocks, under a workload of random cached reads. To reduce this decompression
cost, we would like to increase the size of the dbuf cache so that more
indirect blocks can be stored uncompressed.
If we are caching entire large files of recordsize=8K, the indirect blocks
use 1/64th as much memory as the data blocks (assuming they have the same
compression ratio). We suggest making the dbuf cache be 1/32nd of all memory,
so that in this scenario we should be able to keep all the indirect blocks
decompressed in the dbuf cache. (We want it to be more than the 1/64th that
the indirect blocks would use because we need to cache other stuff in the
dbuf cache as well.)
In real world workloads, this won't help as dramatically as the example
above, but we think it's still worth it because the risk of decreasing
performance is low. The potential negative performance impact is that we
will be slightly reducing the size of the ARC (by ~3%).
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>
arc_loan_compressed_buf() increments arc_loaned_bytes by psize unconditionally
In the case of zfs_compressed_arc_enabled=0, when the buf is returned via
arc_return_buf(), if ARC_BUF_COMPRESSED(buf) is false, then arc_loaned_bytes
is decremented by lsize, not psize.
Switch to using arc_buf_size(buf), instead of psize, which will return
psize or lsize, depending on the result of ARC_BUF_COMPRESSED(buf).
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Allan Jude <allanjude@freebsd.org>
We want to be able to pass various settings during import/open of a pool,
which are not only related to rewind. Instead of adding a new policy and
duplicate a bunch of code, we should just rename rewind_policy to a more
generic term like load_policy.
For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we want
those flags not only for the import case, but also for the open case. One
such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would
allow zfs to open a pool when logs are missing.
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
ztest failed with uncorrectable IO error despite having the fix for #7163.
Both sides of the mirror have CANT_OPEN_BAD_LABEL, which also distinguishes
it from that issue.
Definitely seems like a racing condition between the vdev_validate and spa_sync:
1. Thread A (spa_sync): vdev label is updated to latest txg
2. Thread B (vdev_validate): vdev label's txg is compared to spa_last_synced_txg and is ahead.
3. Thread A (spa_sync): spa_last_synced_txg is updated to latest txg.
Solution: do not check txg in vdev_validate unless config lock is held.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that. It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data). It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it. Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure. She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state. Otherwise, she discards it. With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
We do not have libfakekernel, but need to reduce code divergence.
jeff [Wed, 28 Mar 2018 18:47:35 +0000 (18:47 +0000)]
Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all
platforms. Original commit message as follows:
Only use CPUs in the domain the device is attached to for default
assignment. Device drivers are able to override the default assignment
if they bind directly. There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.
The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with
exactly that. It can be thought of as a “pool-wide snapshot” (or a
variation of extreme rewind that doesn’t corrupt your data). It remembers
the entire state of the pool at the point that it was taken and the user
can revert back to it later or discard it. Its generic use case is an
administrator that is about to perform a set of destructive actions to ZFS
as part of a critical procedure. She takes a checkpoint of the pool before
performing the actions, then rewinds back to it if one of them fails or puts
the pool into an unexpected state. Otherwise, she discards it. With the
assumption that no one else is making modifications to ZFS, she basically
wraps all these actions into a “high-level transaction”.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
hselasky [Wed, 28 Mar 2018 17:39:23 +0000 (17:39 +0000)]
Fix for regression issue in USB keyboard driver after r304735.
A series of zero delay callouts can happen causing high CPU usage of the
timer subsystem when trying to repeat keys, because the time of the
absolute timeout is not moving forward. The condition clears when all
keys are released.
ae [Wed, 28 Mar 2018 12:44:28 +0000 (12:44 +0000)]
Rework ipfw rules parsing and printing code.
Introduce show_state structure to keep information about printed opcodes.
Split show_static_rule() function into several smaller functions. Make
parsing and printing opcodes into several passes. Each printed opcode
is marked in show_state structure and will be skipped in next passes.
Now show_static_rule() function is simple, it just prints each part
of rule separately: action, modifiers, proto, src and dst addresses,
options. The main goal of this change is avoiding occurrence of wrong
result of `ifpw show` command, that can not be parsed by ipfw(8).
Also now it is possible to make some simple static optimizations
by reordering of opcodes in the rule.
avg [Wed, 28 Mar 2018 08:55:31 +0000 (08:55 +0000)]
ZFS vn_rele_async: catch up with the use of refcount(9) for the vnode use count
It's not sufficient nor required to use the vnode interlock when
checking if we are going to drop the last use count as the code in
vputx() uses refcount (atomic) operations for both checking and
decrementing the use code. Apply the same method to vn_rele_async().
While here, remove vn_rele_inactive(), a wrapper around vrele() that
didn't add any value.
Also, the change required making vfs_refcount_release_if_not_last()
public. I've made vfs_refcount_acquire_if_not_zero() public as well.
They are in sys/refcount.h now. While making the move I've dropped the
vfs_ prefix.