jhb [Tue, 27 Mar 2018 00:35:35 +0000 (00:35 +0000)]
MFC 330711:
Permit sysctl(8) to set an array of numeric values for a single node.
Most sysctl nodes only return a single value, but some nodes return an
array of values (e.g. kern.cp_time). sysctl(8) understand how to display
the values of a node that returns multiple values (it prints out each
numeric value separated by spaces). However, until now sysctl(8) has
only been able to set sysctl nodes to a single value. This change
allows sysctl to accept a new value for a numeric sysctl node that contains
multiple values separated by either spaces or commas. sysctl(8) parses
this list into an array of values and passes the array as the "new" value
to sysctl(2).
hselasky [Mon, 26 Mar 2018 21:06:23 +0000 (21:06 +0000)]
MFC r330660:
Add call to setup firmware data dump structure during device load in
mlx5core.
Do not consider the inability to create a firmware dump fatal, but
inform about the situation and allow the driver to attach. The device
might not implement the needed VSC, or we might not know the layout of
the registers map. In either case, only firmware dump functionality is
limited, the network operations should be fine.
hselasky [Mon, 26 Mar 2018 21:03:33 +0000 (21:03 +0000)]
MFC r330658:
Fix mlx5en(4) driver to properly call m_defrag().
When the mlx5en(4) driver was converted to using BUSDMA(9) the call to
m_defrag() was moved after the part of the TX routine that strips the
header from the mbuf chain. Before it called m_defrag it first trimmed
off the now-empty mbufs from the start of the chain. This has the side
effect of also removing the head of the chain that has M_PKTHDR set.
m_defrag() will not defrag a chain that does not have M_PKTHDR set,
thus it was effectively never defragging the mbuf chains.
As it turns out, trimming the mbufs in this fashion is unnecessary since
the call to bus_dmamap_load_mbuf_sg doesn't map empty mbufs anyway, so
remove it.
hselasky [Mon, 26 Mar 2018 21:02:20 +0000 (21:02 +0000)]
MFC r330657:
Use vport rather than physical-port MTU in mlx5en(4).
Set and report vport MTU rather than physical MTU,
The driver will set both vport and physical port mtu
and will rely on the query of vport mtu.
SRIOV VFs have to report their MTU to their vport manager (PF),
and this will allow them to work with any MTU they need
without failing the request.
Also for some cases where the PF is not a port owner, PF can
work with MTU less than the physical port mtu if set physical
port mtu didn't take effect.
hselasky [Mon, 26 Mar 2018 21:00:57 +0000 (21:00 +0000)]
MFC r330656:
Use the device unit number for naming the ifnet interface in mlx5en(4).
Currently the ifnet interface is named mceX, where X is a monotonically
incremented value. If the device is reset due to a fatal error, then the
interface name will change. Using the device unit number will keep the
naming consistent across the reset logic.
Submitted by: Matthew Finlay <matt@mellanox.com>
Sponsored by: Mellanox Technologies
hselasky [Mon, 26 Mar 2018 20:59:26 +0000 (20:59 +0000)]
MFC r330653:
Add kernel and userspace code to dump the firmware state of supported
ConnectX-4/5 devices in mlx5core.
The dump is obtained by reading a predefined register map from the
non-destructive crspace, accessible by the vendor-specific PCIe
capability (VSC). The dump is stored in preallocated kernel memory and
managed by the mlx5tool(8), which communicates with the driver using a
character device node.
The utility allows to store the dump in format
<address> <value>
into a file, to reset the dump content, and to manually initiate the
dump.
A call to mlx5_fwdump() should be added at the places where a dump
must be fetched automatically. The most likely place is right before a
firmware reset request.
hselasky [Mon, 26 Mar 2018 20:50:28 +0000 (20:50 +0000)]
MFC r330649:
Add support for per priority flow control, PFC, to mlx5en(4).
Add support for PFC and implement reading the per priority statistics
using the sysctl(8) interface. PFC is used together with VLAN priority
and can be enabled and disabled on a per priority basis.
Global pause frames and PFC are incompatible features and surrounding
logic has been added to warn the user about misconfiguration.
Update relevant mlx5core APIs for PFC configuration.
hselasky [Mon, 26 Mar 2018 20:25:24 +0000 (20:25 +0000)]
MFC r330606:
Implement support for querying the current port rate in mlx5core.
The mlx5ib(4) part will be merged separately.
- Factor out port speed definitions into new port.h header file,
similarly as done in Linux upstream.
- Correct two existing port speed definitions in mlx5en according to
Linux upstream.
hselasky [Mon, 26 Mar 2018 20:08:21 +0000 (20:08 +0000)]
MFC r330600:
Add timeout handle to commands with callback in mlx5core.
The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get firmware response. Add delayed callback timeout work
before posting the command to firmware. In case of real firmware
command completion we will cancel the delayed work. In case of
firmware command timeout the callback timeout handler will be called
and it will simulate firmware completion with timeout error.
hselasky [Mon, 26 Mar 2018 20:06:37 +0000 (20:06 +0000)]
MFC r330599:
Fix potential deadlock in command mode change in mlx5core.
Call command completion handler in case of timeout when working in
interrupts mode. Avoid flushing the commands workqueue after acquiring
the semaphores to prevent a potential deadlock.
Decrease latency by not wrapping the idle loop's potentially lengthy
search for a thread to steal inside a critical section. Since this
allows the search to be preempted, restart the search if preemption
happens since the search results found earlier may no longer be
valid.
Decrease the latency of starting a thread that may be assigned to
this CPU during the search by polling for incoming threads during
the search and switching to that thread instead of continuing the
search.
Test for stale search results and restart the search before going
through the expense of calling tdq_lock_pair(). Retry some tests
after grabbing the locks since things may have changed while waiting
to get both locks.
Eliminate special case handling for stealing from an SMT peer that
uses 1 as the steal threshold. This can only succeed if a thread
has been assigned but our SMT peer has not yet started executing
it. This is quite rare and when it happens the other SMT thread
is generally waiting for the same tdq lock that we hold. Basically
both SMT threads are racing to grab the same spin lock.
Add the kern.sched.always_steal knob from a ULE patch by jeff@.
Incorporate another idea from Jeff's ULE patch. If the sched_switch()
detects that the CPU is about to go idle, try to steal a thread
before switching to the idle thread. Since the search for a thread
to steal has to be done inside a critical section in this context,
limit the impact on latency by adding the knob kern.sched.trysteal_limit
to limit the topological distance of the search and don't restart
the search if we detect stale results. If this search can't find
an stealable thread, the idle loop can do a more complete search.
Also poll for threads being assigned to this CPU during the search
and switch to them instead of continuing the search. This change
is responsibile for the majority of the improvement in parallel
buildworld times.
In sched_balance_group() change the minimum threshold from stealing
a thread from 1 to 2. Poaching a newly assigned thread from a CPU
that is waking up hasn't yet switched to that thread from idle is
likely very rare and is likely to have the same lock race as is
seen when stealing threads in the idle loop. Also use tdq_notify()
to kick the destintation CPU instead of always sending an IPI.
Update a stale comment, the number of transferable threads is not
calculated.
ae [Sun, 25 Mar 2018 03:50:38 +0000 (03:50 +0000)]
MFC r330781:
Update pfkey_open() function to set socket's write buffer size to
128k and receive buffer size to 2MB. In case if system has bigger
default values, do not lower them.
This should partially solve the problem, when setkey(8) returns
EAGAIN error on systems with many SAs or SPs.
ae [Sun, 25 Mar 2018 03:45:02 +0000 (03:45 +0000)]
MFC r330779:
Rework key_sendup_mbuf() a bit:
o count in_nomem counter when we have failed to allocate mbuf for
promisc socket;
o count in_msgtarget counter when we have secussfully sent data to socket;
o Since we are sending messages in a loop, returning error on first fail
interrupts the loop, and all remaining sockets will not receive this
message. So, do not return error when we have failed to send data to ALL
or REGISTERED target. Return error only for KEY_SENDUP_ONE case. Now,
when some socket has overfilled its receive buffer, this will not break
other sockets.
ian [Sun, 25 Mar 2018 02:04:44 +0000 (02:04 +0000)]
MFC r329989, r330044
r329989:
Add support for booting into kdb on arm platforms when the RB_KDB is set
(using "boot -d" at the loader propmt or setting boot_ddb in loader.conf).
Submitted by: Thomas Skibo <thomasskibo@yahoo.com>
Differential Revision: https://reviews.freebsd.org/D14428
r330044:
Add a hw.model sysctl oid for armv6/7 which reports the CPU model, similar
to what other arches (all except riscv and armv4/5) do.
r331123:
Do not overwrite the contents of BIO_WRITE buffers. SPI inherently
transfers data in both directions at once. When writing to the device,
use a dummy buffer for the incoming data, not the same buffer as the
outgoing data. Writes are done in FLASH_PAGE_SIZE chunks, which is only
256 bytes, so just put the dummy buffer into the softc.
r331126:
Remove a pointless KASSERT and reword a comment a bit. The KASSERT tested
for the same condition that the preceeding lines checked for and would have
returned EIO, so the assert could never possibly trigger (sc_sectorsize must
inherently be an integer multiple of FLASH_PAGE_SIZE).
r331129:
Eliminate some unneeded intermediate variables. Eliminate some redundant
parens in shift-and-mask expressions. Reword and reflow some comments.
r331132:
Bugfix: wait for writes/erases to complete after starting them, instead of
before starting them.
Using the wait-before logic would make sense if there was useful time-
consuming work that could be done between the end of one write and the
beginning of the next, but it also requires doing the wait-for-ready before
reading, because a prior write or erase could still be in progress. Reading
is the far more common case, so adding a whole extra bus transaction to
check for ready before each read would soak up any small gains that might be
had from doing async writes.
r331136:
Add sc_parent to the softc and use it in place of device_get_parent() calls
all over the place. Also pass the softc as the arg to all the internal
functions instead of passing a device_t and calling device_get_softc() in
each function.
r331138:
Make all internal routines return an int error status, and check the
status at all call points. Combine the get_status and wait_for_ready
routines, since waiting for ready is the only reason to ever get status.
r331139:
Add support for 4K and 32K erase block sizes. Many of the supported chips
have these flags set in the ident table, but there was no code to support
using the smaller erase sizes.
r331141:
Add the device/chip type to the disk d_descr field, and print more info
about the chip including the erase block size at attach time.
Also add myself to the copyrights since at this point svn blame would point
to me as the culprit for much of this.
ian [Sun, 25 Mar 2018 01:55:17 +0000 (01:55 +0000)]
MFC r331068:
Use EFI RTC capabilities info when registering, add bootverbose diagnostics.
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.
Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.
ian [Sun, 25 Mar 2018 01:52:38 +0000 (01:52 +0000)]
MFC r330773, r330778, r330782, r330797
r330773:
Use separate mutexes for atrtc and i8254 locking. Change all the strange
un-function-like RTC_LOCK/UNLOCK macro usage into normal function calls.
Since there is no longer any need to handle register access from a debugger
context, those function calls can just be regular mutex lock/unlock calls.
Requested by: bde
r330778:
Everywhere that multiple registers are accessed in sequence, lock/unlock
just once around the whole group of accesses.
r330782:
Remove MTX_NOPROFILE from atrtc_lock, it was inappropriately copy/pasted
from the i8254 driver when I created separate mutexes for each. The i8254
driver could be the active timecounter, leading to recursion during mutex
profiling, but the atrtc driver cannot be a timecounter, so it isn't needed.
ian [Sun, 25 Mar 2018 01:47:57 +0000 (01:47 +0000)]
MFC r330050:
Initialize all members of vm_page::md_page for armv4/5 systems. This fixes
a hang in SI_SUB_KMEM sysinit, and is apparently required after r323290.
Inspired by the commit message for r323676.
ian [Sun, 25 Mar 2018 01:47:17 +0000 (01:47 +0000)]
MFC r330437-r330438, r330440, r331045
r330437:
Do not stop the loop that configures gpio chipselect pins on the first
error, just ignore pins that don't configure and keep setting up the ones
that do. (But when bootverbose is on, whine about the errors.)
r330438:
Defer attaching the spibus until timers and interrupts are working. The
driver requires interrupts to do transfers, and the drivers for the SPI
devices on the bus quite reasonably expect to be able to do IO while probing
and attaching.
r330440:
Switch imx_gpio to attach at BUS_PASS_INTERRUPT + BUS_PASS_ORDER_LATE.
Pretty much any other device might need to manipulate a gpio pin during its
probe or attach routines, so these devices must be available as early as
possible.
The gpio device is an interrupt controller, but I didn't choose the
INTERRUPT pass for that reason (it works fine as an interrupt controller as
long as it attaches any time before interrupts are enabled). That just
looked like the right place in the passes to ensure that it attaches before
any type of device that might need gpio pin manipulations.
sevan [Sun, 25 Mar 2018 01:34:44 +0000 (01:34 +0000)]
MFC r322665
Add caveat to kinfo_getvmmap(3) explaining high CPU utilisation.
Based on kib's reply on https://lists.freebsd.org/pipermail/freebsd-hackers/2016-July/049710.html
sevan [Sun, 25 Mar 2018 01:24:52 +0000 (01:24 +0000)]
MFC 321881
For the udp-client example, instruct user to add an entry for a udp based
service.
For tcp-client & udp-client, use the same port in configuration snippet as used
in the comment prior to remove any ambiguity on the port number which needs to
be specified.
sevan [Sun, 25 Mar 2018 01:03:26 +0000 (01:03 +0000)]
MFC r331274
Extend the description of ALTQ to call it a system which is a framework in
altq(4) to match altq(9). This makes preserving the history section as the
author of ALTQ easier in the history section, rather than calling it a framework
in the description & a system in the history.
Add a history section to altq(4) and extend the history section in altq(9)
ian [Sun, 25 Mar 2018 00:50:27 +0000 (00:50 +0000)]
MFC r330385:
Flag the first interface on a KTLINK FTDI-based jtag+uart device as being
the jtag port, so that a tty is not created for it.
This is based on information in the PR and from the vendor website. When
the PR was first opened we had no facility for flagging the jtag ports. I
stumbled across the still-open PR with the idea of closing it, and noticed
that this wee update was needed.
r310229:
ofw_spi: Parse property for the SPI mode and CS polarity.
As cs is stored in a uint32_t, use the last bit to store the
active high flag as it's unlikely that we will have that much CS.
Reported by: rpokala
Reviewed by: rpokala, adrian
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D8795
r327260:
SPDX: fix wrong license ID tag in dev/spibus.
r329539:
Provide public declarations for ofw_spibus_driver and ofw_spibus_devclass
so other drivers can refer to them in DRIVER_MODULE() decls.
r329544:
Add modules/spi as a gathering point for SPI-related modules, analagous to
modules/i2c for i2c/iicbus modules. Build spibus as a module.
r329545:
Add ofw_bus_if.h to SRCS.
r329546:
Build at45d and mx25l SPI flash drivers as modules.
r329620:
Add missing MODULE_DEPENDS().
r329729:
Remove some files that snuck in via cut and paste.
Having these compiled into the module causes the kobj method descriptors
to be resolved incorrectly (by the compile-time linker instead of the
kernel linker), which then leads to hours of frustrating debugging.
r329911:
Add a functional detach() routine, to make things kldunload-friendly.
r329999:
Add a SPI driver for imx5 and imx6.
It can be compiled into the kernel with "device imx_spi" or loaded as a
module, which is also named "imx_spi".
ian [Sat, 24 Mar 2018 23:03:59 +0000 (23:03 +0000)]
MFC r329537:
Provide a public function to acquire a gpio pin by giving the property name
and index. A private function to do exactly that already existed, so this
renames gpio_pin_get_by_ofw_impl() to gpio_pin_get_by_ofw_propidx() and
provides a declaration for it in a public header.
Previously there were functions to get a pin by property name (assuming
there would only be one pin defined for the name), or by index (asuming
the property has the standard name "gpios"). It turns out there are
devicetree bindings that describe properties with names other than "gpios"
which can describe multiple pins. Hence the need to retrieve the Nth item
from a named property.
r325233:
[i2c/clock] add support for EPSON RTC-8583
RTC-8583 is time-of-day clock used in some SOHO routers. This clock has
only 2 bits for year values, but thanks to user SRAM it's possible to save
year value and keep it up to date via driver code.
Tested on Planex_MZK-W300NAG (SoC is RT2880)
Submitted by: Hiroki Mori <yamori83@yahoo.co.jp>
Differential Revision: https://reviews.freebsd.org/D12833
r328956:
Use const pointers for input data not modified by clock utility functions.
r329170:
Replace the existing print_ct() private debugging function with a set of
three public functions to format and print the three major data structures
used by realtime clock drivers (clocktime, bcd_clocktime, and timespec).
r329172:
Add a set of convenience routines for RTC drivers to use for debug output,
and a debug.clock_show_io sysctl to control debugging output.
r329173:
Add a new sysctl, debug.clock_do_io, to allow manully triggering a one-shot
read or write of all registered realtime clocks. In the read case, the
values read are simply discarded. For writes, there's no alternative but
to actually write the current system time to the device.
r329224:
Fix bad indentation. Whitespace only, no functional changes.
Reported by: bde@
r330403:
Add calls to the new clock_dbgprint_xxx() functions.
r330404:
Add calls to the new clock_dbgprint_xxx() functions.
r330405:
Oops, fix a paste-o.
r330406:
Add calls to the new clock_dbgprint_xxx() functions.
r330407:
Add calls to the new clock_dbgprint_xxx() functions. Also, stop applying
a local .5 second adjustment to the time, since that is now done by the
code in subr_rtc.c.
r330411:
Convert to the new(ish) bcd_clocktime conversion functions, add calls to
the new debug output functions, and when setting the clock, propagate the
timespec nsecs to the 1/100ths register.
r330412:
Build iicbus/rtc8583 as a module.
r330416:
The year is stored in a single byte in sram, in binary format, as a count
of years since the century, so strip the century out when converting to or
from bcd_clocktime format (the conversion routines will infer century by
pivoting on 70).
r330430:
Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
r330431:
Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
r330433:
Switch to the new bcd_clocktime conversion routines, and add calls to the
new clock_dbgprint_xxx() functions.
r330528:
Fix a paste-o that broke the build. There is no softc pointer here, just
use the dev arg.
Reported by: Jonathan Looney <jonlooney@gmail.com>
Pointy hat: ian@
r330529:
Build the ds1672 driver as a module. Add a detach() to unregister the rtc.
r330767:
Convert atrtc the new style rtc debugging output. Remove the db show
command handler which provided much the same information. Removing the
possibility of accessing the hardware regs from the debugger context
paves the way for simplifying the locking code in the driver.
r329479:
Do not try to deallocate memory that wasn't allocated (you'd think that
would be safe, but the function also tries to destroy mutexes that never
got created).
I guess this can only happen when imx_ehci_detach() is called on the
error-exit path from imx_ehci_attach(), and that path never got exercised
before today.
r329480:
Don't call sdhci_cleanup_slot() if sdhci_init_slot() never got called.
Also, do callout_init() very early in attach, so that callout_drain()
can be called in detach without worrying about whether it ever got init'd.
r329483:
Fix fallout from the import of fresh dts source files from linux 4.15. It
appears that node names no longer include leading zeroes in the @address
qualifiers, so we have to search for the nodes involved in interrupt fixup
using both flavors of name to be compatible with old and new .dtb files.
(You know you're in a bad place when you're applying a workaround to code
that exists only as a workaround for another problem.)
r329506:
Add a detach method so that this can be a kldunload-friendly module.
r329507:
Build modules specific to imx5/imx6 only when building those kernels.
This adds sys/modules/imx with a SUBDIR makefile to make the whole
collection of modules that are specific to these SoCs. Initially, that
"whole collection" consists of the if_ffec and imx_i2c drivers.
The if_ffec driver is referenced in its existing home in ../ffec rather
than moving it into the imx directory, because it's used by powerpc too,
but it is no longer built for all armv6/7 systems.
The imx_i2c driver is newly added as a module.
r329526:
Allow i2c hardware drivers to declare their own relationships to ofw_iicbus
rather than relying on a set of canned EARLY_DRIVER_MODULE() statements in
the ofw_iicbus source. This means hw drivers will no longer be required to
use one of a few predefined driver names. They will also now be able to
decide themselves if they want to use DRIVER_MODULE or EARLY_DRIVER_MODULE
and to set which pass to attach on for early modules.
Mainly, this adds extern declarations for the driver and devclass variables.
It also renames ofwiicbus_devclass to ofw_iicbus_devclass to be consistant
with the way we use ofw_ prefixes on this stuff.
r329529:
Give the imx_i2c driver its own name, set up its relationship to ofw_iicbus.
Previously it called itself 'iichb' to link up with the EARLY_DRIVER_MODULE
declaration in ofw_iicbus.c.
r329536:
Add the MODULE_DEPEND()s needed so that the kernel linker can resolve all
the symbols at load time when iicbus is not compiled into the kernel.
r329541:
Build ofw_iicbus as a module if OPT_FDT is defined.
r329730:
Add required header files.
Reported by: andreast@
r329841:
Add a missing line continuation.
How many commits does it take to get a simple module makefile working?
Apparently at least three.
Pointy hat to: ian
r329988:
Instead of building ofw_iicbus as a separate module, just compile it in to
the iicbus module for FDT-based systems.
The primary motivation for this is that host controller drivers which
declare DRIVER_MODULE(ofw_iicbus, thisdriver, etc, etc) now only need a
single MODULE_DEPEND(thisdriver, ofw_iicbus) for runtime linking to resolve
all the symbols. With ofw_iicbus and iicbus in separate modules, drivers
would need to declare a MODULE_DEPEND() on both, because symbol lookup is
non-recursive through the dependency chain. Requiring a driver to have
MODULE_DEPENDS() on both amounts to requiring the drivers to understand the
kobj inheritence details of how ofw_iicbus is implemented, which seems like
something they shouldn't have to know (and could even change some day).
Also, this is somewhat analogous to how the drivers get built when compiled
into the kernel. You don't have to ask for ofw_iicbus separately, it just
gets built in along with iicbus when option FDT is in effect.
r330397:
Fix a paste-o: use the IICBUS version constants, not IICBB bitbang driver's.
ian [Sat, 24 Mar 2018 21:27:10 +0000 (21:27 +0000)]
MFC r328345, r328349, r328405, r328407, r328442
r328345:
Reformat indentation to match other imx5/6 register definition headers, and
tweak some comments. No functional changes.
r328349:
Make the trivial imx_soc_family() function an inline in imx_machdep.h.
The imx_machdep.c file is on the fast path to non-existance and this would
be the only thing left in it after some watchdog changes are completed.
r328405:
Minor cleanups... Move DRIVER_MODULE() and other boilerplate stuff to the
bottom of the file, where it is in most imx5/6 drivers. Switch from an RD2
macro using bus_space_read_2() to an inline function using bus_read_2();
likewise for WR2. Use RESOURCE_SPEC_END to end the resource_spec list.
Net effect should be no functional changes.
r328407:
Fix return style in RD2. Remove bogus return value from a void function
in WR2 (I have no idea why that didn't result in a compile error).
r328442:
Add support to the imx5/6 watchdog for the external reset signal. Also, if
the "power down" watchdog used by the ROM boot code is still active when the
regular watchdog is activated, turn off the power-down watchdog.
This adds support for the "fsl,ext-reset-output" FDT property. When
present, that property indicates that a chip reset is accomplished by
asserting the WDOG1_B external signal, which is supposed to trigger some
external component such as a PMIC to ready the hardware for reset (for
example, adjusting voltages from idle to full-power levels), and assert the
POR signal to SoC when ready. To guard against misconfiguation leading to a
non-rebootable system, the external reset signal is backstopped by code
that asserts a normal internal chip reset if nothing responds to the
external reset signal within one second.
ian [Sat, 24 Mar 2018 21:22:28 +0000 (21:22 +0000)]
MFC r328307, r328311-r328312
r328307:
Fix a bug introduced with recursive bus ownership support in r321584.
The recursive ownership support added in r321584 was unconditionally in
effect all the time -- whenever a given i2c slave device instance tried to
lock the i2c bus for exclusive use when it already owned the bus, the call
returned immediately without waiting. However, many i2c slave drivers use
bus ownership to enforce that only a single thread at a time can be using
the slave device. The recursive locking changes broke this use case.
Now there is a new flag, IIC_RECURSIVE, which can be mixed in with the
other flags passed to iicbus_acquire_bus() to allow drivers to indicate
when recursive locking is desired. Using the flag implies that the driver
is managing concurrent access to the device by different threads in some way.
This immediately fixes all existing i2c slave drivers except for the two
i2c RTC drivers which use the recursive locking feature; those will be
fixed in a followup commit.
r328311 and r328312:
Follow changes in r328307 by using new IIC_RECURSIVE flag.
The driver now ensures only one thread at a time is running in the API
functions (clock_gettime() and clock_settime()) by specifically requesting
ownership of the i2c bus without using IIC_RECURSIVE, then it does all IO
using IIC_RECURSIVE so that each individual IO operation doesn't try to
re-acquire the bus.
The other IO done by the driver happens at attach or intr_config_hooks time,
when there can't be multiple threads running with the same device instance.
So, the IIC_RECURSIVE flag can be safely ORed into the wait flags for all IO
done by the driver, because it's all either done in a single-threaded
environment, or protected within a block bounded by explict
iicbus_acquire_bus() and iicbus_release_bus() calls.
'compat' can never be NULL, because the compatible check loop ends when
compat->ocd_str is NULL. This causes ds1307 to attach to any unclaimed i2c
device.
r314936:
Validate values read from the RTC before trying BCD decoding
Submitted by: cem
Reported by: Michael Gmelin <freebsd@grem.de>
Tested by: Oleksandr Tymoshenko <gonzo@bluezbox.com>
Sponsored by: Dell EMC
r325527:
DS1307: Add the mcp7941x enable bit
Summary:
Existing code recognizes the mcp7941x RTC, but this RTC has an
enable bit at the same location as the "Clock Halt" bit on the ds1307, with an
opposite assertion (set == on, whereas CH set == clock stopped). Thus the
current code halts the clock, with no way to enable it.
Reviewed By: ian
Differential Revision: https://reviews.freebsd.org/D12961
r327971:
Add RTC clock conversions for BCD values, with non-panic validation.
RTC clock hardware frequently uses BCD numbers. Currently the low-level
bcd2bin() and bin2bcd() functions will KASSERT if given out-of-range BCD
values. Every RTC driver must implement its own code for validating the
unreliable data coming from the hardware to avoid a potential kernel panic.
This change introduces two new functions, clock_bcd_to_ts() and
clock_ts_to_bcd(). The former validates its inputs and returns EINVAL if any
values are out of range. The latter guarantees the returned data will be
valid BCD in a known format (4-digit years, etc).
A new bcd_clocktime structure is used with the new functions. It is similar
to the original clocktime structure, but defines the fields holding BCD
values as uint8_t (uint16_t for year), and adds a PM flag for handling hours
using AM/PM mode.
PR: 224813
Differential Revision: https://reviews.freebsd.org/D13730 (no reviewers)
r328005:
Convert the x86 RTC driver to use new validated BCD<->timespec conversions.
New common routines were added to kern/subr_clock.c for converting between
calendrical time expressed in BCD and struct timespec. The new functions
return EINVAL on error, as expected when the clock hardware does not provide
valid time.
PR: 224813
Differential Revision: https://reviews.freebsd.org/D13731 (no reviewers)
r328039:
Add static inline rtcin_locked() and rtcout_locked() functions for doing a
related series of operations without doing a lock/unlock for each byte.
Use them when reading and writing the entire set of time registers.
The original rtcin() and writertc() functions which do lock/unlock on each
byte still exist, because they are public and called by outside code.
r328068:
Move some code around and rename a couple variables; no functional changes.
The static atrtc_set() function was called only from clock_settime(), so
just move its contents entirely into clock_settime() and delete atrtc_set().
Rename the struct bcd_clocktime variables from 'ct' to 'bct'. I had
originally wanted to emphasize how identical the clocktime and bcd_clocktime
structs were, but things evolved to the point where the structs are not at
all identical anymore, so now emphasizing the difference seems better.
r328069:
Remove redundant critical_enter/exit() calls. The block of code delimited
by these calls is now protected by a spin mutex (obscured within the
RTC_LOCK/RTC_UNLOCK macros).
Reported by: bde@
r328301:
Switch to using the bcd_clocktime conversion functinos that validate the BCD
data without panicking, and have common code for handling AM/PM mode.
r328302:
Switch to using the bcd_clocktime conversion functions that validate the BCD
data without panicking, and have common code for handling AM/PM mode.
r328303:
Switch to using the bcd_clocktime conversion functions that validate the BCD
data without panicking, and have common code for handling AM/PM mode.
ian [Sat, 24 Mar 2018 19:57:08 +0000 (19:57 +0000)]
MFC r327756-r327758
r327756:
Bugfix: on RTC chips with a 32-bit binary counter, after reading the time,
return immediately rather than falling through to the logic that reads
BCD-encoded time.
r327757:
Bugfix: don't lose the am/pm mode flag when setting the time. Unlike most
RTC chips that have a control register bit for am/pm mode, the DS13xx series
uses one of the high bits in the hour register. Thus, when setting the time
in am/pm mode, the am/pm mode flag has to be ORed into the hour.
r327758:
Convert a collection of unrelated bitwise flags to a collection of boolean
vars in the softc. It makes the code more compact and readable, and
actually uses less memory too.
ian [Sat, 24 Mar 2018 18:42:45 +0000 (18:42 +0000)]
MFC r324186:
Define a single instance of ahci_devclass and reference it from all the
attachment code for various SOCs and busses. Remove all the static and
should-have-been-static and named-differently instances of it.
This should eliminate the recently-grown build warnings about multiple
definitions when building arm kernels.
Add the BSD-licensed diff from OpenBSD, which is optionally built and
installed when WITHOUT_GNU_DIFF is set.
r315051:
Import diff from OpenBSD
Some of the modifications from the previous summer of code has been integrated
Modification for compatibility with GNU diff output has been added
Main difference with OpenBSD:
Implement multiple GNU diff options:
* --ignore-file-name-case
* --no-ignore-file-name-case
* --normal
* --tabsize
* --strip-trailing-cr
Make diff -p compatible with GNU diff
Implement diff -l
Make diff -r compatible with GNU diff
Capsicumize diffing 2 regular files
Add a simple test suite
r315103:
Implement a stub --horizon-lines=NUM for compatibility with GNU diff3
some options of GNU diff3 would call diff with --horizon-lines, rcs is depending
on that.
Reported by: antoine
r315107:
Fix building with recent gcc
Reported by: lwhsu, ngie
r315180:
Readd codes that creates a tmp file for diffing stdout or devices
r315197:
Do not die if cap_rights_limit reports ENOSYS
Reported by: mmel
r315293:
Integrate contrib/netbsd-tests/usr.bin/diff/t_diff.sh in as
.../usr.bin/diff/diff_test
Some minor adjustment needed to be done for :same as it currently
has the test script hardcoded into the test, instead of using an
idiom like $(dirname $0)
Sponsored by: Dell EMC Isilon
r315319:
diff(1): sort long options under -D example in SYNOPSYS
Sponsored by: Dell EMC Isilon
r315590:
diff(1): add --strip-trailing-cr to last example in the SYNOPSIS
This syncs the last example in the SYNOPSIS with the other examples.
r316639:
add a stub --speed-large-files for compatibility with GNU diff
There is no intention to implement it, but lots of scripts/tools using
diff(1) passes GNU diff option
r316959:
Clean up headers declaration
r317187:
Add a regression test for diff -D
r317194:
Implement a basic --changed-group-format
etcupdate(8) requires that option, while GNU diff supports many more variation
of that options, their behaviour beside the simple verion implemented here are
quite inconsistent as such I do not plan to implement those.
The only special keyword supported by this implementation are: %< and %>
%= is not implemented as the documentation of GNU diff says: common lines, but
it actually when tested print the changes from the first file
r317205:
Document all long options
r317206:
Update the TODO list to reflect what has been changed
r317207:
Cross reference pr(1) which diff might call with -l option
r317381:
Fix the following warning from gcc 4.2 in usr.bin/diff:
usr.bin/diff/diffreg.c: In function 'change':
usr.bin/diff/diffreg.c:1085: warning: 'i' may be used uninitialized in this function
This version of gcc is not smart enough to see that 'i' cannot actually
be used unitialized. However, the variable is confusingly re-used, so
it is better to give it another name, and clearly initialize it before
attempting to use it.
r319847:
Add some testcases for `diff --side-by-side` support
These are were created proactively, in anticipation of the support being
fully implemented sometime in the future.
The tests currently fail on ^/head@r319845, however. Expect them to fail.
PR: 219933
Tested with: gdiff
r321076:
Don't emit "diff: diff <options> arguments" when diffing files if
-q is specified.
This improves compatibility with GNU diff.
Found by accident with `diff -Nrq /usr/tests /usr/tests.new | grep Kyuafile`.
Relnotes: yes
r321077:
Add some tests for brief (--brief/-q) format
MFC with: r321076
r321078:
Fix exit status with -rq when there is a file in one directory but not another,
i.e., when print_only is called.
Prior to this change, -rq was always returning 0. After this change it will
return 1 if there is a difference between two directories.
This fixes compatibility with GNU diff and unbreaks backwards compatibility
expectations.
Found when trying to extend diff_test:brief_format_test.
MFC with: r321076, r321077
r321079:
Add tests that exercise -q, like -rq and add tests that test -q like -Nrq
MFC with: r321076, r321077, r321078
r321227:
Use more flexible expression for replacing t_diff in
contrib/netbsd-tests/usr.bin/diff/t_diff.sh with the name of the script via
`basename $0`.
This was a change I forgot to port over from
^/head/gnu/usr.bin/diff/tests/Makefile@r272787.
r326822:
Replace homemade equivalent of tolower(3) by towlower(3)
This will help in the futur making diff -i works with multibyte
Support for WITHOUT_GNU_DIFF and WITHOUT_GNU_GREP, plus manually
regenerated src.conf.5, which seems to have picked up a couple changes
beyond what was in this MFC.
r307656:
Put each SUBDIR on a separate line for ease of maintenance
Additional patches to this file are in progress, and having each SUBDIR
entry on a separate line makes it easier to change the order in which
the patches are reviewed, tested, and applied.
r307659:
Switch gnu/usr.bin/Makefile to SUBDIR.${MK_*} optional subdir style
r307674:
Add knobs to make GNU diff and GNU grep optional
This is added to facilitate experiments building FreeBSD without
copyleft software.
If WITHOUT_GNU_DIFF is set no /usr/bin/diff or /usr/bin/diff3 will
be built.
If WITHOUT_GNU_GREP is set then BSD grep will be installed as
/usr/bin/bsdgrep or /usr/bin/grep, depending on the WITH_BSD_GREP
knob.
Reviewed by: brooks (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: Differential Revision: https://reviews.freebsd.org/D8288
r307675:
Remove trailing whitespace from r307674
r307679:
Build libgnuregex only if necessary for other components
jhb [Fri, 23 Mar 2018 19:30:00 +0000 (19:30 +0000)]
MFC 328909: Always give ELF brands a chance to veto a match.
If a brand provides a header_supported hook, check it when trying to
find a brand based on a matching interpreter as well as in the final
loop for the fallback brand. Previously a brand might reject a binary
via a header_supported hook in one of the earlier loops, but still be
chosen by one of these later loops.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8154
r307737:
Fix few sentence in the man page.
Pointed out by: wblock
r309366:
capsicum_helpers: Squash errors from closed fds
Squash EBADF from closed stdin, stdout, or stderr in caph_limit_stdio().
Any program used during special shell scripts may commonly be forked
from a parent process with closed standard stream. Do the common sense
thing for this common use.
Reported by: Iblis Lin <iblis AT hs.ntnu.edu.tw>
Reviewed by: oshogbo@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8657
r310135:
capsicum_helpers: Add LOOKUP flag
Add a helper routine for opening a directory that is restricted to being
used for opening relative files as stdio streams.
I think this will really help basic adaptation of multi-file programs to
Capsicum. Rather than having each program initialize a rights object and
ioctl/fcntl arrays for their root fd for relative opens, consolidate in the
logical place.
emaste [Fri, 23 Mar 2018 02:37:08 +0000 (02:37 +0000)]
MFC r331333: Fix kernel memory disclosure in drm_infobufs
drm_infobufs() has a structure on the stack, fills it out and copies it
to userland. There are 2 elements in the struct that are not filled out
and left uninitialized. This will leak uninitialized kernel stack data
to userland.
emaste [Fri, 23 Mar 2018 02:33:30 +0000 (02:33 +0000)]
MFC r331339: Correct signedness bug in drm_modeset_ctl
drm_modeset_ctl() takes a signed in from userland, does a boundscheck,
and then uses it to index into a structure and write to it. The
boundscheck only checks upper bound, and never checks for nagative
values. If the int coming from userland is negative [after conversion]
it will bypass the boundscheck, perform a negative index into an array
and write to it, causing memory corruption.
Note that this is in the "old" drm driver; this issue does not exist
in drm2.
Reported by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by: cem
Sponsored by: The FreeBSD Foundation
gonzo [Fri, 23 Mar 2018 01:43:27 +0000 (01:43 +0000)]
Fix VERSATILEPB boot after r331402
r331402 MFCed switch from custom DTS to upstream one. stable/11 version
still has main bus compatibility as "arm,amba-bus" while in Linux and HEAD
it has been changed to "simple-bus". simplebus(4) supports only the latter
so VERSATILEPB boot was broken by r331402. To fix this make manual change
to stable/11 version of versatile-ab.dts
gonzo [Fri, 23 Mar 2018 01:37:31 +0000 (01:37 +0000)]
MFC r316370-r316371
r316370:
[versatilepb] Convert VERSATILEPB kernel to INTRNG and switch to upstream DTB
Scope of this change is somewhat larger than just converting to INTRNG.
The reason for this is that INTRNG support required switching from custom
to upstream DTS because custom DTS didn't have interrup routing information.
This switch caused rewrite of PCI and CLCD drivers and adding SCM module.
List of changes in this commit:
- Enable INTRNG and switch to versatile-pb.dts
- Add SCM driver that controls various peripheral devices like LCD or
PCI controller. Previously registers required for power-up and
configuring peripherals were part of their respective nodes. Upstream
DTS has dedicated node for SCM
- Convert PL190 driver to INTRNG
- Convert Versatile SIC (secondary interrupt controller) to INTRNG
- Refactor CLCD driver to use SCM API to power up and configuration
- Refactor PCI driver to use SCM API to enable controller
- Refactor PCI driver to use interrupt map provided in DTS for
interrupt routing. As a result it fixes broken IRQ routing and
it's no longer required to run QEMU with "-global versatile_pci.broken-irq-mapping=1"
command-line arguments
r316371:
[versatilepb] Fix keyboard driver after switching to upstream DTS
FreeBSD's DTS contained only one PL050 node and driver considered it to
be PS/2 keyboard. In reality PL050 is a PS/2 port that pushes bytes to/from
the periphers connected to it. New DTS contains two nodes and QEMU emulates
keyboard connected to port #0 and mouse connected to port #1. Since there
is no way to say what's connected to port by checking DTS we hardcode
this knowledge in the driver: it assumes keyboard on port #0 and ignores
port #1 altogether.
Also QEMU defaults emulated keyboard to scan code set 2 while driver used
to work with scan code set 1 so when initializing driver make sure keyboard
is switched to scan code set 1
https://www.illumos.org/issues/8081
zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
which uses -WError, the only way to build it is to disable all compiler
warnings. This makes it much harder to detect newly introduced bugs. We should
cleanup all the warnings.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
https://www.illumos.org/issues/6939
Originally created https://smartos.org/bugview/OS-4489
sysevents should be fired in the kernel from ZFS whenever a command
is run that is logged in zpool history.
Example output
Terminal 1
root - gz sunos ~ # zfs create zones/foobar
root - gz sunos ~ # zfs set quota=10g zones/foobar
root - gz sunos ~ # zfs destroy zones/foobar
Terminal 2
root - gz sunos ~ # sysevent EC_zfs
nvlist version: 0
date = 2016-04-28T14:50:08.964Z
vendor = SUNW
publisher = zfs
class = EC_zfs
subclass = ESC_ZFS_history_event
pid = 0
data = (embedded nvlist)
nvlist version: 0
pool_name = zones
pool_guid = 0x40c964e8f9a7a694
history_record = (embedded nvlist)
nvlist version: 0
dsname = zones/foobar
dsid = 0x1525
history internal str =
internal_name = create
history txg = 0x4c4ef3
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Joshua M. Clulow <jmc@joyent.com>
Reviewed by: Josh Wilsdon <jwilsdon@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Alan Somers <asomers@gmail.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Dave Eddy <dave@daveeddy.com>
https://www.illumos.org/issues/6396
LVM = SVM = Solaris Volume Manager
dead code and not using with ZFS based platform.
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
https://www.illumos.org/issues/7446
Since we support whole-disk configuration for boot pool, we also will need
whole disk support with UEFI boot and for this, zpool create should create efi-
system partition.
I have borrowed the idea from oracle solaris, and introducing zpool create -
B switch to provide an way to specify that boot partition should be created.
However, there is still an question, how big should the system partition be.
For time being, I have set default size 256MB (thats minimum size for FAT32
with 4k blocks). To support custom size, the set on creation "bootsize"
property is created and so the custom size can be set as: zpool create B -
o bootsize=34MB rpool c0t0d0
After pool is created, the "bootsize" property is read only. When -B switch is
not used, the bootsize defaults to 0 and is shown in zpool get output with
value ''. Older zfs/zpool implementations are ignoring this property.
https://www.illumos.org/rb/r/219/
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
This commit makes no sense for FreeBSD, that is why I blocked the option,
but it should be good to stay closer to upstream.
https://www.illumos.org/issues/7745
The problem is that consumers of `libZFS_Core` that forget to call
`libzfs_core_init()` before calling any other function of the library
are having a hard time realizing their mistake. The library's internal
file descriptor is declared as global static, which is ok, but it is not
initialized explicitly; therefore, it defaults to 0, which is a valid
file descriptor. If `libzfs_core_init()`, which explicitly initializes
the correct fd, is skipped, the ioctl functions return errors that do
not have anything to do with `libZFS_Core`, where the problem is
actually located.
Even though assertions for that existed within `libZFS_Core` for debug
builds, they were never enabled because the `-DDEBUG` flag was missing
from the compiler flags.
This patch applies the following changes:
1. It adds `-DDEBUG` for debug builds of `libZFS_Core` and `libzfs`,
to enable their assertions on debug builds.
2. It corrects an assertion within `libzfs`, where a function had
been spelled incorrectly (`zpool_prop_unsupported()`) and nobody
knew because the `-DDEBUG` flag was missing, and the preprocessor
was taking that part of the code away.
3. The library's internal fd is initialized to `-1` and `VERIFY`
assertions have been placed to check that the fd is not equal to
`-1` before issuing any ioctl. It is important here to note, that
the `VERIFY` assertions exist in both debug and non-debug builds.
4. In `libzfs_core_fini` we make sure to never increment the
refcount of our fd below 0, and also reset the fd to `-1` when no
one refers to it. The reason for this, is for the rare case that
the consumer closes all references but then calls one of the
library's functions without using `libzfs_core_init()` first, and
in the mean time, a previous call to `open()` decided to reuse
our previous fd. This scenario would have passed our assertion in
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
https://www.illumos.org/issues/7730
antares:root:~# mdb /usr/sbin/zpool
> ::sysbp _exit
> ::run import
pool: data
id: 2093977168778024605
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
https://www.illumos.org/issues/7604
If a zvol has the default setting for the "volblocksize" property, it is
8KB. However, it is displayed as "-" (not present), rather than "8K".
The problem was introduced by:
commit 25228e830e86924a41243343b1de9daf2d7dd43a
Author: Matthew Ahrens <mahrens@delphix.com>
Date: Thu Nov 17 14:37:24 2016 -0800
7571 non-present readonly numeric ZFS props do not have default value
which changed changed get_numeric_property() to indicate that readonly
default properties are not present. However, zfs_prop_readonly() returns
TRUE for both readonly and set-once properties (e.g. volblocksize).
Amusingly, that commit essentially reverted: 6900484 default volblocksize is no longer being reported correctly
from November 2009. However, that change was not correct either; the
correct solution is to only do this check for "truly readonly" (i.e. not
setonce) properties.
$ zfs list -t volume -o name,volblocksize
NAME
VOLBLOCK
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
archive -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
datafile -
domain0/group-100/appdata_container-101/appdata_windows_timeflow-102/
external -
rpool/dump
128K
rpool/swap
4K
rpool/swap1
===============================================================================
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/7542
libshare keeps a cached copy of the sharetab listing in memory, which can
become out of date if shares are destroyed or created while leaving a libzfs
handle open. This results in a spurious unmounting failure when an NFS share
exists but isn't in the stale libshare cache.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
https://www.illumos.org/issues/7336
We can run into a problem where we call into zfs_mount, which in turn calls
is_dir_empty, which opens the directory to try and make sure it's empty. The
issue with the current approach is that it holds the directory open while it
traverses it with readdir, which, due to subtle interaction with the Java JVM,
vfork, and exec can cause a tricky race condition resulting in zfs_mount
failures.
The approach to resolving the issue in this patch is to drop the usage of
readdir altogether, and instead rely on the fact that ZFS stores the number of
entries contained in a directory using the st_size field of the stat structure.
Thus, if the directory in question is a ZFS directory, we can check to see if
it's empty by calling stat() and inspecting the st_size field of structure
returned.
===============================================================================
The root cause appears to be an interesting race between vfork, exec, and
zfs_mount's usage of O_CLOEXEC when calling openat. Here's what is going on:
1. We call zfs_mount, and this in turn calls openat to check if the directory
is empty, which results in opening the directory we're trying to mount onto,
and increment v_count.
2. As we're in the middle of reading the directory, vfork is called by the JVM
and proceeds to exec the jspawnhelper utility. As a result of the vfork, we
take an additional hold on the directory, which increments v_count a second
time. The semantics of vfork mean the parent process will wait for the child
process to exit or exec before the parent can continue; at this point the
parent is in the middle of zfs_mount, reading the directory to determine if
it's empty or not.
3. The child process exec-ing jspawnhelper gets to the relvm call within
exec_args (which is called by exec_common). relvm is the function that releases
the parent process, allowing the parent to proceed. The problem is, at this
point of calling relvm, the child hasn't yet called close_exec which is
responsible for closing the file descriptors inherited from the parent process
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Prakash Surya <prakash.surya@delphix.com>
https://www.illumos.org/issues/7233
This fixes a race where one thread is executing zfs_mount() while another
thread forks and execs. If the fork occurs while the directory is open, the
child process will inherit (but not necessarily close immediately) the open fd
for the directory, preventing the mount.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alex Reece <alex@delphix.com>
https://www.illumos.org/issues/7502
Right now ztest executes zdb without -G, so when it has errors, the messages
are often not very helpful:
Executing zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest
zdb: can't open 'ztest': Operation not supported
ztest: '/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache ztest' exit
code 1
With -G, we'd have:
/usr/sbin/amd64/zdb -bccsv -d -U /rpool/tmp/zpool.cache -G ztest
zdb: can't open 'ztest': Operation not supported
ZFS_DBGMSG(zdb):
spa_open_common: opening ztest
spa_load(ztest): LOADING
spa_load(ztest): FAILED: unable to parse config [error=48]
spa_load(ztest): UNLOADING
Which indicates where the error came from
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
https://www.illumos.org/issues/7812
This change removes all gendered language that did not refer specifically
to an individual person or pet. The convention taken was to use
variations on "they" when referring to users and/or human beings, while
using "it" when referring to code, functions, and/or libraries.
Additionally, we took the liberty to fix up any whitespace issues that
were found in any files that were already being modified.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Daniel Hoffman <dj.hoffman@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Author: Prakash Surya <prakash.surya@delphix.com>
https://www.illumos.org/issues/8081
zdb(8) is full of minor problems that generate compiler warnings. On FreeBSD,
which uses -WError, the only way to build it is to disable all compiler
warnings. This makes it much harder to detect newly introduced bugs. We should
cleanup all the warnings.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Alan Somers <asomers@gmail.com>
https://www.illumos.org/issues/8502
The code in lib/libzfs/common/libzfs_mount.c already basically handles
the case when libshare is not installed. We just need to not fail in
zfs_init_libshare_impl. I tested this in lx and things work as
expected. I also tested there trying to set sharenfs and sharesmb on
the delegated dataset. Neither is allowed from within a zone. The
spew of msgs from a native zone is not ZFS specific. I see the same
spew simply running the share command.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Jerry Jelinek <jerry.jelinek@joyent.com>