Ian Lepore [Mon, 11 Sep 2017 23:47:49 +0000 (23:47 +0000)]
Add a default implementation that returns ENODEV for start, repeat_start,
stop, read, and write methods. Some controllers don't implement these
individual operations and have only a transfer method. In that case, we
should return an indication that the device is present but doesn't support
the method, as opposed to the kobj default error ENXIO which makes it
look like the whole device is missing. Userland tools such as i2c(8) can
use the differing return values to switch between the two different i2c
IO mechanisms.
Ian Lepore [Mon, 11 Sep 2017 21:49:38 +0000 (21:49 +0000)]
Make i2c -s (device scan) work on hardware that supports only full xfers.
The existing scan code is based on sending an i2c START condition and if
there is no error it assumes there is a device at that i2c address. Some
i2c controllers don't support sending individual start/stop signals on the
bus, they can only perform complete data transfers with start/stop handled
in the silicon.
This adds a fallback mechanism that attempts to read a single byte from each
i2c address. It's less reliable than looking for an an ACK repsonse to a
start, because some devices will NAK an attempt to read that isn't preceeded
by a write of a register address. Writing to devices to probe them is too
dangerous to even consider. The user is told that a less-reliable scan is
being done, so even if the read-scan comes up empty too, it's still a vast
improvement over the old situation where it would just claim there were no
devices on the bus even though the devices were there and working fine.
If the i2c controller responds with a proper ENODEV (device doesn't support
operation) or an almost-proper EOPNOTSUPP, the START/STOP scan is switched
to a read-scan right away. Most controllers respond with ENXIO or EIO if
they don't support START/STOP, so no quick-out is available. For those,
if a scan of all 127 addresses and come up empty, the scan is re-done using
the read method.
Ed Maste [Mon, 11 Sep 2017 17:39:21 +0000 (17:39 +0000)]
Ignore error return from newaliases(1)
This was originally added as "exit $SUCCESS" but with nothing to set the
SUCCESS variable. Thus it became an exit with no argument, which just
exits with the status of the preceding command.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Ed Maste [Mon, 11 Sep 2017 14:41:57 +0000 (14:41 +0000)]
make-memstick.sh: use UFSv2
There's not much practical difference as far as install media is
concerned but newfs creates UFSv2 by default and it is sensible to use
the contemporary UFS version.
I also intend to change makefs to create UFSv2 by default (to match
newfs) so we'll want make-memstick.sh to be explicit, rather than
relying on the host tool's default.
Reviewed by: andrew, gjb, jhibbits
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12231
Ed Maste [Mon, 11 Sep 2017 14:33:04 +0000 (14:33 +0000)]
boot1: remove BOOT1_MAXSIZE default value
This Makefile relies on Makefile.fat providing the correct value for
BOOT1_MAXSIZE and BOOT1_OFFSET. Since BOOT1_OFFSET had no default value
here the build would already fail if Makefile.fat did not provide
correct values.
https://www.illumos.org/issues/8569
C [C99] has peculiar rules for inline functions that are different from the
C++ rules. Unlike C++ where inline is "fire and forget", in C a programmer
must pay attention to the function's storage class / visibility. The main
problem is with the case where a compiler decides to not inline a call to the
function declared as inline.
Some relevant links:
- http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15831.html
- http://www.drdobbs.com/the-new-c-inline-functions/184401540
The summary is that either the inline functions should be declared 'static
inline' or one of the compilation units (.c files) must provide a callable
externally visible function definition. In the former case, the compiler would
automatically create a local non-inlined function instance in every compilation
unit where it's needed. In the latter case the single external definition is
used to satisfy any non-inlined calls in all compilation units. As things
stand right now, we can get an undefined reference error under certain
combinations of compilers and compiler options. For example, this is what I
get on FreeBSD when compiling with clang 4.0.0 and -O1:
In function `abd_free': /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:385:
undefined reference to `abd_is_linear'
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/8558
On a system with more than 80K ZFS filesystems, we've seen cases where
lwp_create() will start to fail by returning EAGAIN. The problem being,
for each of those 80K ZFS filesystems, a taskq will be created for each
dataset as part of the ZIL for each dataset.
For each of these taskq's, a kernel thread will be created which results
in 24KB being allocated for each thread. With enough of these 24KB
allocations, we eventually exhaust the memory region set aside for these
allocations. Currently, segkpsize is set to a value of 2GB, which means
we can only support about 80K filesystems; 2GB / 24KB = ~80K.
The lwp_create() failure comes into play due to the fact that LWP
creation also allocates 24KB from this same region of memory. Thus, if
we've exhausted this region of memory due to the number of ZIL taskq's,
there won't be any memory avaible to allow the call to lwp_create() to
succeed.
FreeBSD note: I haven't created sysctl-s for the new ZIL clean
parameters. Let's add them if anyone requires to tune them.
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after: 3 weeks
Marcin Wojtas [Mon, 11 Sep 2017 10:41:42 +0000 (10:41 +0000)]
Improve HW type checking in mv_ehci driver
This patch adds hwtype parameter which keeps information about hardware
revision of Marvell EHCI controller. It allows to replace multiple
calls to ofw_bus_is_compatible with comparing hwtype value during driver
initialization.
Ed Maste [Mon, 11 Sep 2017 00:37:00 +0000 (00:37 +0000)]
boot1 generate-fat: generate all templates at once
In advance of other changes to the fat template generation process, have
generate-fat.sh create all template files at the same time so that they
cannot get out of sync.
Also correct a longstanding but where BOOT1_OFFSET was overwritten on
each invocation. A previous version of this patch stored a per-arch
offset (e.g. BOOT1_arm64_OFFSET) but that was deemed unnecessary.
Instead just hardcode the known offset that applies to all archs (0x2d)
and fail if the offset happens to be different.
Ongiong work (using newfs_msdos in bsdinstall and adding msdosfs support
to makefs) will eventually allow us to do away with this fat template
hack altogether, but in the near term we have a few improvements that
will build on this.
Reviewed by: allanjude, imp, Eric McCorkle
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10931
Ed Maste [Mon, 11 Sep 2017 00:14:04 +0000 (00:14 +0000)]
newvers.sh: speed up failing git-svn revision search
In the case of running newvers.sh on a git tree w/o git-svn-id notes we
previously piped the entire 'git log' to grep. Add --grep to the log
invocation to avoid processing log entries of no interest.
This saves about 2-3 seconds of newvers.sh run time on my SSD laptop.
Later changes will bring further speedups.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Ed Maste [Sun, 10 Sep 2017 19:12:01 +0000 (19:12 +0000)]
newvers.sh: accept "git-svn-id:" at the start of a line only
This prevents incorrect subversion revision detection when "git svn" is
not being used to get the sources but git is available. Previously old
subversion revisions included in commit messages were favoured over the
more recent and correct revisions in git notes.
For example cf1f35574722 represents r315395 but was treated as r313908
which is referenced in the commit message. Commits following
r315395/cf1f35574722 but before another commit with a git-svn-id
reference in the commit message would be treated as r313908 as well.
Patch from PR updated to accommodate the initial four space indent in
`git log` ouptut.
Ian Lepore [Sun, 10 Sep 2017 18:08:25 +0000 (18:08 +0000)]
Add gpio methods to read/write/configure up to 32 pins simultaneously.
Sometimes it is necessary to combine several gpio pins into an ad-hoc bus
and manipulate the pins as a group. In such cases manipulating the pins
individualy is not an option, because the value on the "bus" assumes
potentially-invalid intermediate values as each pin is changed in turn. Note
that the "bus" may be something as simple as a bi-color LED where changing
colors requires changing both gpio pins at once, or something as complex as
a bitbanged multiplexed address/data bus connected to a microcontroller.
In addition to the absolute requirement of simultaneously changing the
output values of driven pins, a desirable feature of these new methods is to
provide a higher-performance mechanism for reading and writing multiple
pins, especially from userland where pin-at-a-time access incurs a noticible
syscall time penalty.
These new interfaces are NOT intended to abstract away all the ugly details
of how gpio is implemented on any given platform. In fact, to use these
properly you absolutely must know something about how the gpio hardware is
organized. Typically there are "banks" of gpio pins controlled by registers
which group several pins together. A bank may be as small as 2 pins or as
big as "all the pins on the device, hundreds of them." In the latter case, a
driver might support this interface by allowing access to any 32 adjacent
pins within the overall collection. Or, more likely, any 32 adjacent pins
starting at any multiple of 32. Whatever the hardware restrictions may be,
you would need to understand them to use this interface.
In additional to defining the interfaces, two example implementations are
included here, for imx5/6, and allwinner. These represent the two primary
types of gpio hardware drivers. imx6 has multiple gpio devices, each
implementing a single bank of 32 pins. Allwinner implements a single large
gpio number space from 1-n pins, and the driver internally translates that
linear number space to a bank+pin scheme based on how the pins are grouped
into control registers. The allwinner implementation imposes the restriction
that the first_pin argument to the new functions must always be pin 0 of a
bank.
Alan Cox [Sun, 10 Sep 2017 17:46:03 +0000 (17:46 +0000)]
To analyze the allocation of swap blocks by blist functions, add a method
for analyzing the radix tree structures and reporting on the number, and
sizes, of maximal intervals of free blocks. The report includes the number
of maximal intervals, and also the number of them in each of several size
ranges, from small (size 1, or 3 to 4) to large (28657 to 46367) with size
boundaries defined by Fibonacci numbers. The report is written in the test
tool with the 's' command, or in a running kernel by sysctl.
The analysis of the radix tree frequently computes the position of the lone
bit set in a u_daddr_t, a computation that also appears in leaf allocation.
That computation has been moved into a function of its own, and optimized
for cases where an inlined machine instruction can replace the usual binary
search.
Toomas Soome [Sun, 10 Sep 2017 13:53:42 +0000 (13:53 +0000)]
loader.efi: chain loader should provide proper device handle
Since the efipart rewrite, the chain command was looking for device
handle using interface applicable only for net devices. Disk
partitions and zfs pools need their own approach to find the proper handle.
namecache_ts differs from mere namecache by few fields placed mid struct.
The access to the last element (the name) is thus special-cased.
The standard solution is to put new fields at the very beginning anad
embedd the original struct. The pointer shuffled around points to the
embedded part. If needed, access to new fields can be gained through
__containerof.
Marius Strobl [Sun, 10 Sep 2017 01:25:15 +0000 (01:25 +0000)]
MFV: r323381
Permit a deflateParams() parameter change as soon as possible.
This change fixes compression errors seen when the embedded Tomcat
web server of a UniFi Controller zlib compresses responses. Given
that Tomcat just uses Java/OpenJDK which in turn employs zlib for
its compression/decompression support, this bug might very well
affect other applications, too.
Marius Strobl [Sun, 10 Sep 2017 00:56:28 +0000 (00:56 +0000)]
Permit a deflateParams() parameter change as soon as possible.
This commit allows a parameter change even if the input data has
not all been compressed and copied to the application output
buffer, so long as all of the input data has been compressed to
the internal pending output buffer. This also allows an immediate
deflateParams change so long as there have been no deflate calls
since initialization or reset.
Warner Losh [Sat, 9 Sep 2017 21:33:43 +0000 (21:33 +0000)]
It's been pointed out that init_script at least is useful w/o
re-rooting. Remove deprecation notice for it. init_chroot likely is
still better served with reroot.
Warner Losh [Sat, 9 Sep 2017 20:14:18 +0000 (20:14 +0000)]
Don't build uart_dev_mvebu unless we're on arm64.
This module is specific to a single Marvel board that we currently
only support in 64-bit mode. Remove it from the build otherwise. It
likely should be completely removed, but this unbreaks x86 building.
Sean Bruno [Sat, 9 Sep 2017 19:19:13 +0000 (19:19 +0000)]
r323359 instroduced an ARMv8 only uart(4) device to the tree but placed
the driver in a place where it will be built for all targets. x86 doesn't
have all the required build bits for this device.
Add a vm_page_change_lock() helper, the common code to not relock page
lock if both old and new pages use the same underlying lock. Convert
existing places to use the helper instead of inlining it. Use the
optimization in vm_object_page_remove().
Suggested and reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Only search the scope ID in ip6_find_dev() for IPv6 addresses which
have a scope ID. Change size of the searched scope ID to the full
16-bits. There can typically be more than 255 interfaces.
Marcin Wojtas [Sat, 9 Sep 2017 11:54:04 +0000 (11:54 +0000)]
Add support for Armada 3700 in the NETA driver
This patch enables using NETA driver on Marvell Armada 3700 SoC
by introducing new compatible string, modifying clock source
obtaining and also excluding unnecessary parts.
The driver is added as a build option for arm64 platforms as well.
Marcin Wojtas [Sat, 9 Sep 2017 11:49:36 +0000 (11:49 +0000)]
Store virtual address of buffer in mvneta_rx_ring
Now the virtual address of received buffer is taken from a software ring.
Thanks to this, we can use the NETA driver on 64 bits architecture and
avoid 32-bit buf_cookie descriptor field limitation.
Marcin Wojtas [Sat, 9 Sep 2017 11:06:58 +0000 (11:06 +0000)]
Add support for Armada 3700 EHCI
This patch reuses ehci_mv driver by adding a support for the new
compatible string and adding ehci_mv.c to list of available options
for arm64 platforms.
FreeBSD note: rather than merging the zpool.8 update I copied the zpool
scrub section from the illumos zpool.1m to FreeBSD zpool.8 almost
verbatim. Now that the illumos page uses the mdoc format, it was an
easier option. Perhaps the change is not in perfect compliance with the
FreeBSD style, but I think that it is acceptible.
https://www.illumos.org/issues/8414
This issue tracks the port of scrub pause from ZoL: https://github.com/zfsonlinux/zfs/pull/6167
Currently, there is no way to pause a scrub. Pausing may be useful when
the pool is busy with other I/O to preserve bandwidth.
Description
This patch adds the ability to pause and resume scrubbing. This is achieved
by maintaining a persistent on-disk scrub state. While the state is 'paused'
we do not scrub any more blocks. We do however perform regular scan
housekeeping such as freeing async destroyed and deadlist blocks while paused.
Motivation and Context
Scrub pausing can be an I/O intensive operation and people have been asking
for the ability to pause a scrub for a while. This allows one to preserve scrub
progress while freeing up bandwidth for other I/O.
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Alek Pinchuk <apinchuk@datto.com>
Marcin Wojtas [Sat, 9 Sep 2017 10:54:13 +0000 (10:54 +0000)]
Add support for xhci in Armada 3700 and 7k/8k
This driver will be used by Marvell Armada 3700 and 7k/8k SoC families.
The same, generic xhci device also appears in Armada 380, so we are reusing
driver.
This patch also adds xhci_mv.c entry to the arm64 files list.
Resolve IPv6 scope ID issues when using ip6_find_dev() in the LinuxKPI.
Workaround problem that ifa_ifwithaddr() also matches the scope ID of
the IPv6 address when searching for a maching IPv6 address. For now
simply try all valid scope IDs until a match is found.
Properly implement poll_wait() in the LinuxKPI. This prevents direct
use of the linux_poll_wakeup() function from unsafe contexts, which
can lead to use-after-free issues.
Instead of calling linux_poll_wakeup() directly use the wake_up()
family of functions in the LinuxKPI to do this.
Bump the FreeBSD version to force recompilation of external kernel modules.
Enji Cooper [Sat, 9 Sep 2017 06:24:21 +0000 (06:24 +0000)]
Unbreak :broken_pipe
- Capture exit code in pipeline and test in output.
- Drop awk use in favor of `sleep 2`. This helps guarantee the EPIPE
behavior without the potential race.
cxgbe(4): Fix a couple of problems in the sge_wrq data path.
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
pending. This can happen when a t4_wrq_tx runs between two
start_wrq_wr.
- commit_wrq_wr must examine the cookie's pidx and ndesc with the
queue's lock held. Otherwise there is a bad race when incomplete WRs
are being completed and commit_wrq_wr for the WR that is ahead in the
queue updates the next incomplete WR's cookie's pidx/ndesc but the
commit_wrq_wr for the second one is using stale values that it read
without the lock.
Gordon Tetlow [Sat, 9 Sep 2017 03:09:02 +0000 (03:09 +0000)]
The purge option hasn't been implemented since 1994 when we imported this
code. I think it is safe to say it's not going to be. I'm also working to
de-orbit catman, so remove the reference in the manpage.
Add P5021 and P5040 conditions for LAW count check.
P5040/P5021 have the same number of LAWs as P5020. There may be a better way of
getting the count from the FDT (fsl,num-laws property on soc/corenet-law or
soc/ecm-law), but that's not supported everywhere, so we still need this check
for those other cases.
These processors may not be supported yet, but add them for completion.
POWER9 is planned for support. e300 may work (based on 603e core).
P5040/P5021 are similar to P5020, so should work as well. One addition is
needed for P5040, to support the number of LAWs, and will be a separate commit.
In integrity mode, a larger logical sector (e.g., 4096 bytes) spans several
physical sectors (e.g., 512 bytes) on the backing device. Due to hash
overhead, a 4096 byte logical sector takes 8.5625 512-byte physical sectors.
This means that only 288 bytes (256 data + 32 hash) of the last 512 byte
sector are used.
The memory allocation used to store the encrypted data to be written to the
physical sectors comes from malloc(9) and does not use M_ZERO.
Previously, nothing initialized the final physical sector backing each
logical sector, aside from the hash + encrypted data portion. So 224 bytes
of kernel heap memory was leaked to every block :-(.
This patch addresses the issue by initializing the trailing portion of the
physical sector in every logical sector to zeros before use. A much simpler
but higher overhead fix would be to tag the entire allocation M_ZERO.
Enhance qpi.c to make it usable on all Core-microarchitecture Xeons.
Scan all buses for CSR bus, not stopping on the first failed
match. Scan all slots for function 0 on the found bus, for instance on
IvyBridge the slot 0 is not decoded at all. Since the scan is quite
unsafe, and access to the buses is mostly useful for developers,
enable the csr buses scan with the tunable.
Current qpi.c makes too many assumptions about the uncore
configuration buses location and about slots occupied. Also it
restricts itself only to Nehalem CPUs. It is needed on all Core-based
Xeons. On the 2600 v2 (IvyBridge) machine I have access to, the CSR
buses have numbers 31 (BSP socket) and 63 (second socket), and there
is no functions pci0.31.0.0 or pci0.63.0.0. According to the CPU
datasheet, all devices on the uncore bus occupy slots >= 8.
Practically, the attach to config buses is required for the intel-pcm
pcm-memory.x tool to work, for instance.
Use IOAPIC PCI rid as the interrupt TLP source id for DMAR interrupt
remapping.
VT-d specification requires use of PCI rid as source id for IOAPICs
enumerated by PCI bus. The values from the DMAR ACPI table should be
only used when IOAPIC is not on PCI.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Hardware provided by: Intel
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12205
Add an ioapic_get_rid() function to obtain PCIe TLP requester-id for
the interrupt messages from given IOAPIC, if the IOAPIC can be
enumerated on PCI bus.
If IOAPIC has PCI binding, match the PCI device against MADT
enumerated IOAPIC. Match is done first by registers window physical
address, then by IOAPIC ID as read from the APIC ID register.
PCI bsf address of the matched PCI device is the rid.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Hardware provided by: Intel
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12205
Scott Long [Fri, 8 Sep 2017 17:40:29 +0000 (17:40 +0000)]
As with r323317, hold off on releasing the intrhook during boot until
we're ready to accept probing from GEOM. Untested, but the pattern is
the same as with aac.
Scott Long [Fri, 8 Sep 2017 16:52:59 +0000 (16:52 +0000)]
Move the intrhook release to later in the function so that GEOM knows to wait longer
for possible root devices to come online. This fixes a race that seems to be
triggered by EARLY_AP_STARTUP.
Audit userspace geom code for leaking memory to disk
Any geom class using g_metadata_store, as well as geom_virstor which
duplicated g_metadata_store internally, would dump sectorsize - mdsize bytes
of userspace memory following the metadata block stored. This is most or all
geom classes (gcache, gconcat, geli, gjournal, glabel, gmirror, gmultipath,
graid3, gshsec, gstripe, and geom_virstor).