allanjude [Sat, 16 Jun 2018 15:16:02 +0000 (15:16 +0000)]
Avoid reading past the end of the disk in zfsboot.c and biosdisk.c
The GELI boot code rounds reads up to 4k, since the encrypted sectors are
4k, and must be decrypted as a unit. With oddball sized disks (almost
always virtual), this can lead to reading past the end of the disk.
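A minimal sketch of the clamping idea described above, assuming a byte-addressed disk and a 4k encrypted sector. The helper name `clamped_read_len` and its parameters are hypothetical and are not taken from zfsboot.c or biosdisk.c:

```c
#include <stdint.h>

#define GELI_SECTOR 4096u	/* encrypted sector size mentioned in the text */

/*
 * Hypothetical helper: round the end of a read up to a 4k boundary, then
 * clamp it to the size of the disk. Returns the number of bytes that can
 * be read starting at `offset` without running past the end of the disk.
 */
static uint64_t
clamped_read_len(uint64_t offset, uint64_t len, uint64_t disk_bytes)
{
	uint64_t end;

	if (offset >= disk_bytes)
		return (0);
	end = offset + len;
	/* Round the end of the read up to a multiple of 4k. */
	end = (end + GELI_SECTOR - 1) / GELI_SECTOR * GELI_SECTOR;
	/* An oddly sized disk may end before that boundary. */
	if (end > disk_bytes)
		end = disk_bytes;
	return (end - offset);
}
```

On a 10000-byte disk, a 512-byte read at offset 8192 would be rounded up to end at 12288 and then clamped so that it stops at byte 10000.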
ae [Sat, 16 Jun 2018 08:26:23 +0000 (08:26 +0000)]
Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9).
Using rwlock with multiqueue NICs for IP forwarding at high pps
produces high lock contention and is inefficient. Rmlock fits such
workloads better.
mmel [Sat, 16 Jun 2018 08:25:38 +0000 (08:25 +0000)]
Fix handling of enable counter for shared GPIO line in fixed regulator.
For most regulators, the regulator_stop() method can be transformed to
regulator disable. But, in some cases, we need to maintain shared data
across multiple regulators (e.g. single GPIO pin which works as enable
for multiple regulators). In this case, the implementation of the regulator
should perform its own enable counting; therefore it is necessary to
distinguish between the regulator enable/disable method (which
increments/decrements the enable counter for the shared resource) and the
regulator stop method (which doesn't affect it).
So:
- add regnode_stop() method to regulator framework and default it to
regnode_enable(..., false, ...)
- implement it in regulator_fixed with proper enable counting.
While I'm here, also fix handling of the always_on property. If any of the
regulators sharing the same GPIO pin has it enabled, then none of them can
disable the regulator.
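The distinction between enable/disable (which count) and stop (which does not) can be sketched as below. This is an illustration only: `struct shared_gpio` and the three functions are hypothetical names, not the regulator_fixed implementation.

```c
#include <stdbool.h>

/* Hypothetical state shared by every regulator wired to one GPIO pin. */
struct shared_gpio {
	int	enable_cnt;	/* number of regulators that enabled the pin */
	bool	always_on;	/* set if any sharing regulator is always_on */
	bool	pin_active;	/* current state of the enable pin */
};

/* Enable: take a reference and drive the pin active. */
static void
shared_enable(struct shared_gpio *g)
{
	g->enable_cnt++;
	g->pin_active = true;
}

/* Disable: drop a reference; release the pin only when nobody needs it. */
static void
shared_disable(struct shared_gpio *g)
{
	if (g->enable_cnt > 0)
		g->enable_cnt--;
	if (g->enable_cnt == 0 && !g->always_on)
		g->pin_active = false;
}

/* Stop: never touches the counter; releases the pin only if it is unused. */
static void
shared_stop(struct shared_gpio *g)
{
	if (g->enable_cnt == 0 && !g->always_on)
		g->pin_active = false;
}
```

With this split, stopping one regulator cannot turn off a pin that another regulator still holds enabled.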
allanjude [Sat, 16 Jun 2018 04:50:40 +0000 (04:50 +0000)]
biosdisk.c remove redundant variable
`rdev` and `disk` serve the same purpose: reading the partition table without
the `d_offset` or `d_slice` set, so the read is relative to the start of
the disk. Reuse the already initialized `disk` instead of making another
copy later.
np [Fri, 15 Jun 2018 23:42:22 +0000 (23:42 +0000)]
cxgbe(4): Add a hw.cxgbe.starve_fl sysctl that can be used to starve the
freelists of netmap receive queues. This is primarily to test various
congestion scenarios in the chip.
glebius [Fri, 15 Jun 2018 21:36:16 +0000 (21:36 +0000)]
Since 'ticks' is an int, it may wrap around and cr_ticks at a certain
counter_rate will be greater than ticks, resulting in counter_ratecheck()
failure. To fix this take an absolute value of the difference between
ticks and cr_ticks.
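A sketch of a wraparound-safe distance between two tick values, assuming nothing about the actual counter_ratecheck() code. The names `tick_distance` and `interval_elapsed` are hypothetical; the subtraction is done in unsigned arithmetic so that the wrap is well defined in C:

```c
#include <limits.h>

/*
 * Hypothetical helper: absolute distance between two tick counters that
 * may have wrapped. Unsigned subtraction is modular, so the result is
 * correct even when `now` has wrapped to a negative value.
 */
static unsigned int
tick_distance(int now, int last)
{
	unsigned int d = (unsigned int)now - (unsigned int)last;

	/* A difference above INT_MAX means `last` is ahead; take the magnitude. */
	if (d > (unsigned int)INT_MAX)
		d = 0u - d;
	return (d);
}

/* Returns nonzero once at least `hz` ticks separate the two samples. */
static int
interval_elapsed(int now, int last, int hz)
{
	return (tick_distance(now, last) >= (unsigned int)hz);
}
```

A plain `now - last >= hz` comparison would misbehave once `now` wraps below `last`; the distance above stays small and positive across the wrap.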
np [Fri, 15 Jun 2018 21:23:03 +0000 (21:23 +0000)]
cxgbe(4): Track the number of received frames separately from the number
of descriptors processed. Add the ability to gather a certain maximum
number of frames in the driver's rx before waking up netmap rx. If
there aren't enough frames then netmap rx will be woken up as usual.
rmacklem [Fri, 15 Jun 2018 19:45:15 +0000 (19:45 +0000)]
Add a command that copies or migrates a data file from one DS to another.
This command can be used by a sysadmin to either copy or migrate a data
file on one DS to another DS.
Its main use is to recover data files onto a mirrored DS after the DS has
been repaired and brought back online.
rmacklem [Fri, 15 Jun 2018 19:35:08 +0000 (19:35 +0000)]
Add a command that displays and modifies the pNFS server's extended attribute.
This command allows a sysadmin to display or modify the pnfsd.dsfile extended
attribute used by the pNFS MDS server in various ways.
Its main use is to set a DS's IP address to 0.0.0.0 when that DS has failed,
so that it will not be used for the file when brought back online after
being repaired.
imp [Fri, 15 Jun 2018 19:07:37 +0000 (19:07 +0000)]
There's no need to walk through the tables looking for the smbios
table if we're just going to ignore it on arm, so expand, slightly,
the reach of the ifdef. Move the buffer to the inner block so we
don't have a separate #ifdef far away from these lines.
The issue on arm is that smbios_detect does unaligned accesses, which
cause a crash when u-boot provides the EFI implementation.
imp [Fri, 15 Jun 2018 19:07:26 +0000 (19:07 +0000)]
Provide a more direct interface to tell ZFS what the preferred handle
is. We tell the ZFS code now, and it checks rather than having a
callback to do the checks.
This will allow us to have more graceful fallback code. In the
future, it's anticipated that we may fall back to a more global search
(or implement a command to do so) when requested by the user, or we
detect a violation of the UEFI Boot Manager protocol severe enough to
warrant this backstop. For now, it just allows us to get rid of img as
a global.
cem [Fri, 15 Jun 2018 19:02:53 +0000 (19:02 +0000)]
Retain offset compatibility with pre-12.0 dumps
As a follow-up to r324965, which adds support for compressed kernel dumps,
readjust dump header members slightly to mostly preserve ABI with earlier
(11.x and older) dumps.
jhibbits [Fri, 15 Jun 2018 18:55:02 +0000 (18:55 +0000)]
Check for a 'pci' prefix rather than a full match in get_addr_props
Summary:
Newer OPAL device trees, such as those on POWER9 systems, use 'pciex' for
device_type, not 'pci'. Rather than enumerating all possible variants, just
check for a 'pci' prefix.
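The prefix check can be sketched with strncmp(3). The function name `is_pci_device_type` is hypothetical and is not the code in get_addr_props:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true when device_type starts with "pci", which
 * covers both "pci" and variants such as "pciex".
 */
static bool
is_pci_device_type(const char *device_type)
{
	return (strncmp(device_type, "pci", 3) == 0);
}
```

A full strcmp() match would reject "pciex"; comparing only the first three bytes accepts any variant sharing the prefix.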
bdrewery [Fri, 15 Jun 2018 18:50:24 +0000 (18:50 +0000)]
lib32: Fix lib/libpmc/pmu-events files ending up in source directory.
This could happen with either WITHOUT_AUTO_OBJ=yes or MAKELEVEL>0 for
the initial 'make buildworld' command.
This now ensures that build-tools targets have 'make obj' run if needed.
This is especially problematic for pmu-events since it is not directly
connected in the build. Normally the 'make includes' call right before
this implicitly creates the objdir with a 'make obj' already but
misses pmu-events because it is disconnected from lib/libpmc. Fixing that
would make this new 'make obj' pointless but it is being added to avoid
this problem in the future should another tool be connected like this.
kevans [Fri, 15 Jun 2018 17:29:32 +0000 (17:29 +0000)]
extres/regulator: Switch boot_on/always_on sysctl to uint8
These are represented as booleans on the kernel-side, but were being exposed
as int. This was causing some funky things to happen when read later with
sysctl(8), e.g. randomly reading super-high when the value was actually
'0'/false.
emaste [Fri, 15 Jun 2018 16:28:50 +0000 (16:28 +0000)]
ldd: reference readelf instead of objdump in warning message
We have an obsolete GNU objdump 2.17.50 in the base system, which will
be removed in the future. Suggest readelf(1) for examining ELF files
instead; for most use cases it is the preferred tool anyhow.
PR: 229046
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
emaste [Fri, 15 Jun 2018 16:14:42 +0000 (16:14 +0000)]
elf.5: add readelf cross-reference
objdump is sometimes used in cases where readelf is more appropriate,
but the obsolete GNU objdump we have in the base system will be removed
in the future.
.Xr readelf from elf.5 to improve the odds the more appropriate tool
will be found.
PR: 229046
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
chuck [Fri, 15 Jun 2018 15:22:27 +0000 (15:22 +0000)]
Add linprocfs support for min_free_kbytes
This adds linprocfs support for proc/sys/vm/min_free_kbytes which the
free program requires for correct operation. The approach mirrors the
approach used in illumos.
emaste [Fri, 15 Jun 2018 14:41:51 +0000 (14:41 +0000)]
linuxulator: do not include legacy syscalls on arm64
Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.
Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files. We may need finer grained control in the future but this is
sufficient for now.
emaste [Fri, 15 Jun 2018 14:29:41 +0000 (14:29 +0000)]
Correct debug control for linuxulator faccessat
The Linuxulator provides per-syscall debug control via the
compat.linux.debug sysctl. There's generally a 1:1 mapping between
sysctl setting and syscall, but faccessat was controlled by the access
setting, perhaps due to copy-paste.
kevans [Fri, 15 Jun 2018 13:14:45 +0000 (13:14 +0000)]
Revert r335173 at request of mmel@
This was the wrong solution to the problem; regulator_shutdown invokes
regnode_stop. regulator_stop is not a refcounting method, but it invokes
regnode_enable, which is.
mmel@ has a proposed patch/solution to instead provide regnode_fixed_stop
behavior that properly takes shared GPIO pins into account.
tuexen [Fri, 15 Jun 2018 12:28:43 +0000 (12:28 +0000)]
When retransmitting TCP SYN-ACK segments with the TCP timestamp option
enabled use an updated timestamp instead of reusing the one used in
the initial TCP SYN-ACK segment.
This patch ensures that an updated timestamp is used when sending the
SYN-ACK from the syncache code. It was already done if the
SYN-ACK was retransmitted from the generic code.
This makes the behaviour consistent and also conformant with
the TCP specification.
Reviewed by: jtl@, Jason Eggleston
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D15634
rmacklem [Fri, 15 Jun 2018 11:52:34 +0000 (11:52 +0000)]
Add a command that disables a pNFS server mirrored DS.
This command can be used by a sysadmin to disable a malfunctioning pNFS server
mirrored DS. It is safe to use when a mirrored DS has already been disabled
via an I/O or network partitioning error.
manu [Fri, 15 Jun 2018 08:36:21 +0000 (08:36 +0000)]
allwinner: ccung: Fully subclass the clock drivers
Each clock driver is now fully subclassed; this has the advantage that
we can control the probe order.
Some clocks can have parents from other drivers, for example clocks in the
sun8i_r driver use clocks from the main clock driver.
This worked before because the sun8i_r node is after the main ccu node in the
DTB and drivers are probed in DTB order. This cannot work with the Display
Engine clocks as it is the first node in the DTB.
bdrewery [Fri, 15 Jun 2018 00:36:41 +0000 (00:36 +0000)]
proc0_post: Fix some locking issues
- Filter out PRS_NEW procs as rufetch() tries taking the thread lock
which may not yet be initialized.
- Hold PROC_LOCK to ensure stability of iterating the threads.
- p_rux fields are protected by the process statlock as well.
jhb [Thu, 14 Jun 2018 22:31:30 +0000 (22:31 +0000)]
Exit with an error if a linker hints file can't be found.
Continuing with a NULL hints variable just triggers a segfault later on.
The other error cases in this function all exit with an error rather than
warning.
brooks [Thu, 14 Jun 2018 21:27:25 +0000 (21:27 +0000)]
Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestige of a need for it to be called something else (e.g. obreak)
was removed in r225617 which consistently prefixed all syscall
implementations.
regnode::enable_cnt is generally used to refcount regulator nodes. For
GPIOs, the refcount was done on the gpio_entry since more than one regulator
can share a GPIO.
GPIO regulators were not taking part in the node refcount, since they had
their own mechanism. This caused some fallout after manu started disabling
everybody's unused regulators in r331989.
rmacklem [Thu, 14 Jun 2018 20:36:55 +0000 (20:36 +0000)]
Add the "-p" and "-m" options to nfsd.c for the pNFS service.
The "-p" option specifies that the nfsd should run a pNFS service instead
of a regular NFS service. The "-m" option is only meaningful when used with
"-p" to specify that mirroring on the DSs should be done and on how many of
them.
This change requires the kernel changes committed as r334930.
The man page update will be committed as a separate commit soon.
kib [Thu, 14 Jun 2018 19:41:02 +0000 (19:41 +0000)]
Handle the race between fork/vm_object_split() and faults.
If a fault started before vmspace_fork() locked the map, and then during
fork, vm_map_copy_entry()->vm_object_split() is executed, it is
possible that the fault instantiates the page into the original object
when the page was already copied into the new object (see
vm_map_split() for the orig/new objects terminology). This can happen
if split found a busy page (e.g. from the fault) and slept dropping
the objects lock, which allows the swap pager to instantiate
read-behind pages for the fault. Then the restart of the scan can see
a page in the scanned range, where it was already copied to the upper
object.
Fix it by instantiating the read-ahead pages before
swap_pager_getpages() method drops the lock to allocate pbuf. The
object scan would see the whole range prefilled with the busy pages
and not proceed the range.
Note that vm_fault rechecks the map generation count after the object
unlock, so that it restarts the handling if raced with split, and
re-lookups the right page from the upper object.
In collaboration with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kevans [Thu, 14 Jun 2018 18:34:02 +0000 (18:34 +0000)]
a10_ahci: Correct clock indices for new bindings
r329104 imported 4.15 DTS which brought CCU to a10/a20. In the process, they
swapped the ordering of 'clocks' for allwinner,sun4i-a10-ahci on both
sun4i-a10 and sun7i-a20 from PLL, Gate to Gate, PLL.
kevans [Thu, 14 Jun 2018 17:50:29 +0000 (17:50 +0000)]
aw_ccung: Add a10/a20 support
Note: At this time, this has only been tested on a single board from one of
the supported SoCs. This is enough to boot the board from MMC and have
functional USB, which is still an improvement over where we were just
before with no functional clocks.
jhibbits [Thu, 14 Jun 2018 17:23:51 +0000 (17:23 +0000)]
Split the PowerISA 3.0 HPT implementation from historic
PowerISA 3.0 makes several changes to not only the format of the HPT but
also the behavior surrounding it. For instance, TLBIE no longer requires
serialization. Removing this lock cuts buildworld time in half on an
18-core/72-thread POWER9 system, demonstrating that this lock is highly
contended on such a system.
There was odd behavior observed trying to make this change in a
backwards-compatible manner in moea64_native.c, so the best option was to
fully split it, and largely revert the original changes adding POWER9
support to the original file.
manu [Thu, 14 Jun 2018 17:18:15 +0000 (17:18 +0000)]
arm timer: Add workaround for Allwinner A64 timer
The timer present in the Allwinner A64 SoC is unstable; its value can jump
backward or forward.
It was found that when bit 11 and above roll over, the low bits can sometimes
be read as all 1s or all 0s.
Simply ignore the values for those cases.
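One way to express "ignore the values for those cases" is to re-read the counter whenever the low 11 bits (bits 0 through 10) are all ones or all zeros. The sketch below assumes that interpretation; `counter_value_suspect`, `read_stable_counter`, and the `read_counter` callback are hypothetical names, not the arm timer driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

#define A64_LOW_MASK	0x7ffull	/* bits 0..10, below bit 11 */

/* True when the low bits are all 0s or all 1s, i.e. possibly a bad read. */
static bool
counter_value_suspect(uint64_t val)
{
	uint64_t low = val & A64_LOW_MASK;

	return (low == 0 || low == A64_LOW_MASK);
}

/*
 * Read the counter through a caller-supplied function until the value is
 * not suspect. Legitimate values with such low bits are skipped as well,
 * at the cost of an occasional extra read.
 */
static uint64_t
read_stable_counter(uint64_t (*read_counter)(void))
{
	uint64_t val;

	do {
		val = read_counter();
	} while (counter_value_suspect(val));
	return (val);
}
```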
ken [Thu, 14 Jun 2018 17:08:44 +0000 (17:08 +0000)]
Fix da(4) locking when probing SMR drives.
Probing host aware and host managed SMR drives got broken in revision
330796.
The added cam_periph_lock() calls were in areas in dadone() where
the peripheral lock was already held.
Since then, dadone() has been split into separate functions that are
dedicated to each probe state.
The result is that when probing a host aware drive, I ran into a recursive
lock acquisition in dadone_probeatalogdir(). I would have run into the
same problem in dadone_probeataiddir(), and in dadone_probeatasup() and
dadone_probeatazone() in the error paths had the probe continued.
The solution is to take out all of the extra cam_periph_lock() calls. I
also added cam_periph_assert(periph, MA_OWNED) near the top of each of
the dadone_* calls. These make it clear to anyone coming along in
the future that the lock is held in the probe done functions.
Also add a locking assert in daprobedone(), to make it clear that it must
be called with the periph lock held.
kevans [Thu, 14 Jun 2018 16:09:29 +0000 (16:09 +0000)]
devmatch: Address some rc nits
- devmatch_enable in rc.conf(5) was not gating the start of devmatch
- Use quietstart in devd/devmatch to suppress dozens of 'Cannot start'
messages and other spurious messages from rc.subr(8) that aren't
necessarily helpful.
jhibbits [Thu, 14 Jun 2018 16:01:11 +0000 (16:01 +0000)]
Fix CTR formatting for moea64_native bootstrap
On very large memory systems 'size' can become 2GB or larger, resulting in a
negative value being formatted. Also, moea64_pteg_count is already a long, so
format it as such.
ae [Thu, 14 Jun 2018 11:15:39 +0000 (11:15 +0000)]
In m_megapullup() use m_getjcl() to allocate 9k or 16k mbuf when requested.
It is better to try to allocate a big mbuf than to just silently drop a big
packet. A better solution could be reworking the libalias modules to be
able to use m_copydata()/m_copyback() instead of requiring a single
contiguous buffer.
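The size selection can be sketched as picking the smallest buffer class that holds the whole packet and giving up only when none fits. The constants below are assumptions chosen to mirror common FreeBSD cluster sizes (page-sized, 9k, 16k), and `pick_cluster_size` is a hypothetical name; this is not the m_megapullup() code itself:

```c
#include <stddef.h>

/* Assumed buffer classes, in bytes; labeled SKETCH_ because they are illustrative. */
#define SKETCH_PAGE_BYTES	4096u
#define SKETCH_9K_BYTES		(9u * 1024u)
#define SKETCH_16K_BYTES	(16u * 1024u)

/*
 * Hypothetical helper: return the smallest buffer size able to hold `len`
 * contiguous bytes, or 0 when the packet is too large for any class and
 * would have to be dropped.
 */
static size_t
pick_cluster_size(size_t len)
{
	if (len <= SKETCH_PAGE_BYTES)
		return (SKETCH_PAGE_BYTES);
	if (len <= SKETCH_9K_BYTES)
		return (SKETCH_9K_BYTES);
	if (len <= SKETCH_16K_BYTES)
		return (SKETCH_16K_BYTES);
	return (0);
}
```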