Marcin Wojtas [Thu, 10 Aug 2017 13:45:56 +0000 (13:45 +0000)]
Enable OF_setprop API function to add property in FDT
This patch modifies function ofw_fdt_setprop (called by OF_setprop),
so that it can add a property when replacing is not possible.
Adding a property is needed to fix up FDTs that have missing
properties.
Glen Barber [Thu, 10 Aug 2017 13:32:04 +0000 (13:32 +0000)]
Further revise r322327 and r322352 in release/packages/kernel.ucl.
Use PPID and PID to kill off the pre-install and parent pkg(8)
processes unless 'Y' or 'y' are entered at the prompt if the user
wants to proceed with upgrading the kernel and userland at the same
time.
This restores some of the logic and intent of r322327, with the
caveat of printing "child process terminated unexpectedly."
MFC after: 5 days
MFC with: r322327, r322352
Sponsored by: The FreeBSD Foundation
Fixes for wait event in the LinuxKPI. These are regression issues
after r319757.
1) Correct the return value from __wait_event_common() from 1 to 0 in
case the timeout is specified as MAX_SCHEDULE_TIMEOUT. In the other
case __ret is zero and will be substituted in the last part of the
macro with the appropriate value before return.
2) Make sure the "timeout" argument is cast to "int" before
evaluating negativity. Else the signedness of a "long" might be
checked instead of the signedness of an integer.
3) The wait_event() function should not have a return value.
Make sure the linux_wait_event_common() function in the LinuxKPI properly
handles a timeout value of MAX_SCHEDULE_TIMEOUT which basically means there
is no timeout. This is a regression issue after r319757.
While at it change the type of returned variable from "long" to "int" to
match the actual return type.
Glen Barber [Thu, 10 Aug 2017 12:30:34 +0000 (12:30 +0000)]
Revise part of r322327 in release/packages/kernel.ucl.
It appears I misunderstand process forking and signal handling in
how the pre-/post-install scripts are executed internally by pkg(8).
In some cases (not all), ^C when prompted to cancel the kernel
package update will stop the pre-install script from executing, but
allow pkg(8) to continue extracting the package when it is not the
intent.
In order to keep somewhat of an anti-footshooting measure in place,
print the recommendation to install the kernel package first if
ASSUME_ALWAYS_YES is false and TERM is set, then sleep for 5 seconds
to allow the user to see the message.
MFC after: 5 days
MFC with: r322327
X-MFC-Note: Maybe not until I am happy with this..
Sponsored by: The FreeBSD Foundation
Roger Pau Monné [Thu, 10 Aug 2017 09:16:40 +0000 (09:16 +0000)]
x86: bump MAX_APIC_ID to 512
Introduce a new define to take into account the xAPIC ID limit, for
systems where x2APIC is not available/reliable.
Also change some of the usages of the APIC ID to use an unsigned int
(which is the correct storage type to deal with x2APIC IDs as found in
x2APIC MADT entries).
This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to
295:
Roger Pau Monné [Thu, 10 Aug 2017 09:16:03 +0000 (09:16 +0000)]
x86: make the arrays that depend on MAX_APIC_ID dynamic
So that MAX_APIC_ID can be bumped without wasting memory.
Note that the usage of MAX_APIC_ID in the SRAT parsing forces the
parser to allocate memory directly from the phys_avail physical memory
array, which is not the best approach probably, but I haven't found
any other way to allocate memory so early in boot. This memory is not
returned to the system afterwards, but at least it's sized according
to the maximum APIC ID found in the MADT table.
Roger Pau Monné [Thu, 10 Aug 2017 09:15:18 +0000 (09:15 +0000)]
apic_enumerator: only set mp_ncpus and mp_maxid at probe cpus phase
Populate the lapics arrays and call cpu_add/lapic_create in the setup
phase instead. Also store the max APIC ID found in the newly
introduced max_apic_id global variable.
This is a requirement in order to make the static arrays currently
using MAX_LAPIC_ID dynamic.
Ryan Libby [Wed, 9 Aug 2017 22:58:42 +0000 (22:58 +0000)]
Pick 'Remove external linkage for spin_adaptive' from upstream jemalloc
Apply the changes from upstream jemalloc 048c6679. This is actually not
quite a cherry pick due to makefile difference and because FreeBSD does
not carry the msvc project files which were also modified in that
commit.
Ryan Libby [Wed, 9 Aug 2017 20:13:49 +0000 (20:13 +0000)]
i386/boot2: -fno-asynchronous-unwind-tables for gcc
The amd64 build of boot2 was failing with gcc 6.3.0 due to being more
than 1 kB too large. It was apparently generating a .eh_frame section
which was not being removed by objcopy -S. The .eh_frame section seems
to be mandatory per the amd64 ABI, but boot2 is compiled for i386 (uses
-m32), and therefore should be optional in this context. Suppress
generation of .eh_frame with the -fno-asynchronous-unwind-tables flag to
gcc. This saves 1348 bytes (the limit is 7680 bytes).
key_msg2sp() is used for parsing data from a setsockopt(IP[V6]_IPSEC_POLICY)
call. This socket option is usually used to configure IPsec bypass for a
socket. Only a privileged user can set this socket option.
The message syntax is described here
http://www.kame.net/newsletter/20021210/
and our libipsec is usually used to create the correct request.
Add additional checks:
* that sadb_x_ipsecrequest_len is not out of bounds of user supplied buffer
* that src/dst's sa_len is the same
* that 2*sa_len is not out of bounds of user supplied buffer
* that 2*sa_len fits into bounds of sadb_x_ipsecrequest
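The four checks above can be sketched as a standalone validator. This is only an illustration under stated assumptions, not the code in key_msg2sp(): struct req_hdr, struct sa_hdr and check_ipsecrequest() are hypothetical, simplified stand-ins for the real sadb_x_ipsecrequest and sockaddr definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct sadb_x_ipsecrequest. */
struct req_hdr {
	uint16_t len;		/* length of the whole request, header included */
	uint16_t pad[3];
};

/* Hypothetical stand-in for the first two bytes of a struct sockaddr. */
struct sa_hdr {
	uint8_t sa_len;
	uint8_t sa_family;
};

/*
 * Validate one request stored at buf, where buflen is the number of bytes
 * the user supplied. Returns 0 if all checks pass and -1 otherwise.
 */
int
check_ipsecrequest(const uint8_t *buf, size_t buflen)
{
	struct req_hdr hdr;
	struct sa_hdr src, dst;
	size_t payload;

	if (buflen < sizeof(hdr))
		return (-1);
	memcpy(&hdr, buf, sizeof(hdr));

	/* The request length must stay inside the user-supplied buffer. */
	if (hdr.len < sizeof(hdr) || hdr.len > buflen)
		return (-1);
	payload = hdr.len - sizeof(hdr);
	if (payload == 0)
		return (0);		/* no address pair attached */

	if (payload < sizeof(src))
		return (-1);
	memcpy(&src, buf + sizeof(hdr), sizeof(src));
	if (src.sa_len < sizeof(src))
		return (-1);

	/* 2*sa_len must fit in the user buffer and in the request itself. */
	if (2 * (size_t)src.sa_len > buflen - sizeof(hdr) ||
	    2 * (size_t)src.sa_len > payload)
		return (-1);

	/* Source and destination must have the same sa_len. */
	memcpy(&dst, buf + sizeof(hdr) + src.sa_len, sizeof(dst));
	if (dst.sa_len != src.sa_len)
		return (-1);
	return (0);
}
```

Every length is checked before the bytes it describes are read, so a malformed request is rejected without touching memory outside the buffer.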
Glen Barber [Wed, 9 Aug 2017 19:16:54 +0000 (19:16 +0000)]
Add a dependency on the kernel package for the runtime package.
The idea here is that, provided upstream pkg(8) maintainers accept
the proposed change, the kernel.ucl will contain a post-install
script causing pkg(8) to emit a message informing to reboot the
system after the kernel is upgraded using 'pkg upgrade', so the
new userland is installed on the running new kernel. At present,
this functionality does not exist in pkg(8), but will help ensure
the upgrade path follows that from UPDATING. To work around this
for now, evaluate ASSUME_ALWAYS_YES, and prompt the user if they
wish to proceed if not set to true.
Since there is a kernel dependency, and a non-GENERIC kernel may
be in use, update Makefile.inc1 to replace '%KERNCONF%' in the
runtime.ucl with the first-built kernel set either via command line
or in make.conf(5).
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Ed Maste [Wed, 9 Aug 2017 19:09:23 +0000 (19:09 +0000)]
lldb: Make i386-*-freebsd expression work on JIT path
* Enable i386 ABI creation for freebsd
* Added an extra argument in ABISysV_i386::PrepareTrivialCall for mmap
syscall
* Unlike linux, the last argument of mmap is actually 64-bit (off_t).
This requires us to push an additional word for the higher order bits.
* Prior to this change, ktrace dump will show mmap failures due to
invalid argument coming from the 6th mmap argument.
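To illustrate the extra word: a 64-bit off_t argument takes two 32-bit stack slots on i386. The sketch below only shows the split; split_off64() is a hypothetical helper and not part of lldb.

```c
#include <stdint.h>

/*
 * Split a 64-bit file offset into the two 32-bit words that occupy
 * consecutive stack slots on i386 (low word at the lower address).
 */
void
split_off64(int64_t off, uint32_t words[2])
{
	uint64_t u = (uint64_t)off;

	words[0] = (uint32_t)(u & 0xffffffffu);	/* low-order bits */
	words[1] = (uint32_t)(u >> 32);		/* high-order bits */
}
```

If only the low word is pushed, whatever happens to follow it on the stack is read as the high-order bits of the offset, which is consistent with the invalid-argument failures described above.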
Kyle Evans [Wed, 9 Aug 2017 18:15:07 +0000 (18:15 +0000)]
capsicum_helpers: Add FIODTYPE to default ioctls allowed
FIODTYPE will be needed by hexdump(1) to speed up the -s flag on devices
that should be able to support fseek(3); specifically, in an attempt to
correct for the fact that most tape drives don't support seeking yet don't
indicate as such when fseeko(3) is invoked. Related: D10939
Jung-uk Kim [Wed, 9 Aug 2017 18:09:09 +0000 (18:09 +0000)]
Split identify_cpu() into two functions for amd64 as we do for i386. This
reduces diff between amd64 and i386. Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.
Kyle Evans [Wed, 9 Aug 2017 18:06:27 +0000 (18:06 +0000)]
libusb(3): Expose device caps as libusb_bos_descriptor::dev_capability
Some libusb consumers in Linux-land (in this case, libusb4java) expect a
dev_capability member that they can use to enumerate the device
capabilities.
No particular layout is expected of this, just that it can be traversed
using the bLength member until bNumDeviceCapabilities are read and that the
consumer may then use one of the libusb_get_*_descriptor methods to extract
specific (usb 2.0 vs. ss) capability information.
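The traversal described above can be sketched over a raw byte buffer. This is a minimal illustration, assuming the capability descriptors are packed back to back and follow the USB layout of bLength in byte 0 and bDevCapabilityType in byte 2. find_dev_cap() is a hypothetical helper, not a libusb function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk ncaps device capability descriptors packed back to back in buf
 * and return a pointer to the first one whose bDevCapabilityType (byte 2)
 * equals type, or NULL if none is found or the data is malformed.
 */
const uint8_t *
find_dev_cap(const uint8_t *buf, size_t len, unsigned ncaps, uint8_t type)
{
	size_t off = 0;

	while (ncaps-- > 0) {
		uint8_t blen;

		if (len - off < 3)
			return (NULL);
		blen = buf[off];	/* bLength of this descriptor */
		if (blen < 3 || blen > len - off)
			return (NULL);
		if (buf[off + 2] == type)
			return (buf + off);
		off += blen;		/* step to the next descriptor */
	}
	return (NULL);
}
```

A consumer would pass bNumDeviceCapabilities as ncaps and hand the returned descriptor to the matching libusb_get_*_descriptor call.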
Warner Losh [Wed, 9 Aug 2017 16:15:24 +0000 (16:15 +0000)]
Mark geom classes as deprecated.
geom_bsd, geom_mbr and geom_sunlabel have been obsolete since Marcel
Moolenaar's geom_part was in FreeBSD 7. They haven't been in GENERIC
since FreeBSD 8. Add warning when used.
geom_vol_ffs has been obsolete since ufs support to geom_label was
committed in FreeBSD 5. It hasn't been in GENERIC since FreeBSD 5.
Add warning when used.
geom_fox has been obsolete since gmultipath was committed in FreeBSD 7.
(no warning added, since this is a very obscure class).
These will all be removed in FreeBSD 12.
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D11935
Add to if_enc(4) ability to capture packets via BPF after pfil processing.
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.
How network VF works with hn(4) on Hyper-V in transparent mode:
- Each network VF has a corresponding hn(4).
- The network VF and its corresponding hn(4) have the same hardware
address.
- Once the network VF is attached, the corresponding hn(4) waits several
seconds to make sure that the network VF attach routine completes, then:
o Set the intersection of the network VF's if_capabilities and the
corresponding hn(4)'s if_capabilities to the corresponding hn(4)'s
if_capabilities. And adjust the corresponding hn(4) if_capable and
if_hwassist accordingly. (*)
o Make sure that the corresponding hn(4)'s TSO parameters meet the
constraints posed by both the network VF and the corresponding hn(4).
(*)
o The network VF's if_input is overridden. The overriding if_input
changes the input packet's rcvif to the corresponding hn(4). The
network layers are tricked into thinking that all packets are
received by the corresponding hn(4).
o If the corresponding hn(4) was brought up, bring up the network VF.
The transmissions dispatched to the corresponding hn(4) are
redispatched to the network VF.
o Bringing down the corresponding hn(4) also brings down the network
VF.
o All IOCTLs issued to the corresponding hn(4) are passed through to
the network VF; the corresponding hn(4) changes its internal state
if necessary.
o The media status of the corresponding hn(4) solely relies on the
network VF.
o If there are multicast filters on the corresponding hn(4), allmulti
will be enabled on the network VF. (**)
- Once the network VF is detached, undo all damage done to the
corresponding hn(4) in the above item.
NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled. The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1. The network
VF transparent mode is _not_ enabled by default, as of this commit.
The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.
The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.
(*)
These decisions were made so that things will not be messed up too much
during the transition period.
(**)
This does _not_ need to go through the fancy multicast filter management
stuff like what vlan(4) has, at least currently:
- As of this writing, multicast does not work in Azure.
- As of this writing, multicast packets go through the corresponding hn(4).
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11803
Kirk McKusick [Wed, 9 Aug 2017 05:21:57 +0000 (05:21 +0000)]
Add an entry to UPDATING for r322297 which restores the ability
of fsck to automatically find alternate superblocks when the
standard one is trashed or unavailable.
Kirk McKusick [Wed, 9 Aug 2017 05:17:21 +0000 (05:17 +0000)]
Since the switch to GPT disk labels, fsck for UFS/FFS has been
unable to automatically find alternate superblocks. This checkin
places the information needed to find alternate superblocks to the
end of the area reserved for the boot block.
Filesystems created with a newfs of this vintage or later will
create the recovery information. If you have a filesystem created
prior to this change and wish to have a recovery block created for
your filesystem, you can do so by running fsck in foreground mode
(i.e., do not use the -p or -y options). As it starts, fsck will
ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should
answer yes.
Alan Cox [Wed, 9 Aug 2017 04:23:04 +0000 (04:23 +0000)]
Introduce vm_page_grab_pages(), which is intended to replace loops calling
vm_page_grab() on consecutive page indices. Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().
Marcin Wojtas [Wed, 9 Aug 2017 01:25:47 +0000 (01:25 +0000)]
Enable pl310 coherent operation in platform init for Armada 38x
Updating PL310 software context sc_io_coherent field in
platform_pl310_init() routine for Armada 38x helps to avoid
using 'arm,io-coherent' property, which is by default not present
in the device tree node in Linux.
This way another step for DT unification between two operating
systems is done. The improvement will also work after enabling
PLATFORM for Marvell ARMv7 SoCs.
Marcin Wojtas [Wed, 9 Aug 2017 01:20:53 +0000 (01:20 +0000)]
Remove clock-frequency properties from Armada 38x timer nodes
Since the timers' base frequency setting is added to the platform code,
this patch removes clock-frequency properties from global
and twd timers, aligning both to the Linux device tree.
Marcin Wojtas [Wed, 9 Aug 2017 01:14:29 +0000 (01:14 +0000)]
Dynamically configure timers' base frequency for Armada 38x
Instead of using 'clock-frequency' device tree property for global/twd
mpcore timers of Armada 38x SoCs, set it in platform_late_init stage
with arm_tmr_change_frequency() function.
Marcin Wojtas [Wed, 9 Aug 2017 01:06:40 +0000 (01:06 +0000)]
Enable using ofw_bus_find_compatible in early platform code
Before this patch function ofw_bus_find_compatible was using
memory allocations in order to find compatible node and the property's
length. This way there was always a suited buffer for property,
however this approach had also disadvantages - ofw_bus_find_compatible
couldn't be used when malloc is not available, e.g. during fdt fixup stage.
In order to remove the usage limitation of ofw_bus_find_compatible(),
this patch modifies the function to use ofw_bus_node_is_compatible()
(instead of the one without _int suffix), which uses a fixed
buffer on stack instead of dynamic allocations.
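The fixed-buffer approach can be illustrated with a small standalone matcher. This is a sketch under assumptions, not the FreeBSD implementation: node_is_compatible(), COMPAT_BUF_LEN and the way the property bytes are passed in are hypothetical. It relies only on the device tree convention that a "compatible" property is a sequence of NUL-terminated strings.

```c
#include <stddef.h>
#include <string.h>

#define COMPAT_BUF_LEN	255	/* illustrative fixed capacity */

/*
 * Return 1 if the string list in prop (proplen bytes, NUL-separated)
 * contains compat, 0 otherwise. The property is copied into a buffer on
 * the stack, so no allocator is needed.
 */
int
node_is_compatible(const char *prop, size_t proplen, const char *compat)
{
	char buf[COMPAT_BUF_LEN + 1];
	size_t off, n;

	if (proplen == 0 || proplen > COMPAT_BUF_LEN)
		return (0);
	memcpy(buf, prop, proplen);
	buf[proplen] = '\0';	/* guarantee the last entry is terminated */

	for (off = 0; off < proplen; off += n + 1) {
		n = strlen(buf + off);
		if (strcmp(buf + off, compat) == 0)
			return (1);
	}
	return (0);
}
```

The trade-off is that a property longer than the fixed buffer is treated as not matching, in exchange for being usable before malloc is available.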
Kyle Evans [Wed, 9 Aug 2017 01:04:36 +0000 (01:04 +0000)]
regex(3): Refactor fast/slow stepping bits in the matching engine
Adding features for matching is fairly straightforward, but this requires
some duplication because of this fast/slow setup. They can be fairly
trivially combined into a single walk(), so do it to make future additions
less error prone.
Marcin Wojtas [Wed, 9 Aug 2017 00:56:29 +0000 (00:56 +0000)]
Add support for "compatible" parameter in ofw_fdt_fixup
Sometimes it's convenient to provide fixup to many boards
that use the same SoC family (eg. Marvell Armada 38x).
Instead of putting multiple entries in fdt_fixup_table,
use one entry which refers to all boards with given SoC.
Marcin Wojtas [Wed, 9 Aug 2017 00:45:25 +0000 (00:45 +0000)]
Enable parsing simple-bus 'ranges' with multiple entries
This patch makes it possible to boot with up to 8 ranges in soc.
Dynamic allocation cannot be used, because the fdt_get_ranges
function is called early, when malloc is not available.
Change is required for the alignment of Marvell Armada 38x
device trees present in sys/gnu/dts/arm - originally
the platform has 6 entries in simple-bus 'ranges'.
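A minimal sketch of parsing 'ranges' into a fixed-size array, with no dynamic allocation. It assumes one 32-bit cell each for the child address, parent address and size, which is a simplification and not necessarily the Armada 38x layout. struct range, cell() and parse_ranges() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES	8	/* fixed capacity, mirroring the limit above */

struct range {
	uint32_t bus;	/* child bus address */
	uint32_t host;	/* parent address */
	uint32_t size;
};

/* Decode one big-endian 32-bit device tree cell. */
static uint32_t
cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/*
 * Fill out[] from the raw property bytes. Returns the number of entries,
 * or -1 if the length is not a whole number of entries or there are more
 * entries than the fixed array can hold.
 */
int
parse_ranges(const uint8_t *prop, size_t len, struct range out[MAX_RANGES])
{
	size_t i, n;

	if (len % 12 != 0)
		return (-1);
	n = len / 12;
	if (n > MAX_RANGES)
		return (-1);
	for (i = 0; i < n; i++) {
		out[i].bus = cell(prop + 12 * i);
		out[i].host = cell(prop + 12 * i + 4);
		out[i].size = cell(prop + 12 * i + 8);
	}
	return ((int)n);
}
```

Because the caller owns the array, the function can run at any point in boot. The cost is a hard upper bound on the number of entries.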
Ian Lepore [Tue, 8 Aug 2017 22:58:34 +0000 (22:58 +0000)]
Remove the ds133x and s35390a i2c RTC drivers for now. They both do i2c
transfers in their probe() or attach() routines, and that doesn't work
when the low-level controller requires interrupts to be functional.
The DS133x family of chips is nearly identical to the DS1307 and support
for them should be added to that driver, then the ds133x driver can be
deleted. The s35390a driver just needs a non-trivial workover. In both
cases that work will be done and committed separately.
Kristof Provost [Tue, 8 Aug 2017 21:09:26 +0000 (21:09 +0000)]
pf_get_sport(): Prevent possible endless loop when searching for an unused nat port
This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.
Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2
OpenBSD commit message:
Use a 32 bit variable to detect integer overflow when searching for
an unused nat port. Prevents a possible endless loop if high port
is 65535 or low port is 0.
report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c
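The overflow described in the OpenBSD message can be shown with a simplified upward search. This is a sketch under assumptions, not the pf_get_sport() code: find_free_port() and the two predicates are hypothetical.

```c
#include <stdint.h>

/* Illustrative predicates: report whether a port is already taken. */
int
all_busy(uint16_t port)
{
	(void)port;
	return (1);
}

int
busy_below_1024(uint16_t port)
{
	return (port < 1024);
}

/*
 * Search [low, high] upward for a port that in_use() rejects. Returns 0
 * and stores the port in *out, or -1 if every port in the range is taken.
 */
int
find_free_port(uint16_t low, uint16_t high, int (*in_use)(uint16_t),
    uint16_t *out)
{
	/*
	 * A 16-bit counter would wrap from 65535 to 0, so "tmp <= high"
	 * could never become false when high is 65535. A 32-bit counter
	 * can reach 65536 and terminate the loop.
	 */
	uint32_t tmp;

	for (tmp = low; tmp <= high; tmp++) {
		if (!in_use((uint16_t)tmp)) {
			*out = (uint16_t)tmp;
			return (0);
		}
	}
	return (-1);
}
```

The same reasoning applies to a downward search with a low port of 0, where a 16-bit counter would wrap from 0 to 65535.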
Warner Losh [Tue, 8 Aug 2017 20:44:16 +0000 (20:44 +0000)]
Fail to open efirt device when no EFI on system.
libefivar expects opening /dev/efi to indicate whether we can make efi
runtime calls. With a null routine, it was always succeeding, leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm here, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
Alexander Motin [Tue, 8 Aug 2017 19:36:34 +0000 (19:36 +0000)]
Fix a few issues in the LinuxKPI workqueue.
LinuxKPI workqueue wrappers reported "successful" cancellation for works
already completed in the normal way. This change brings the reported status
and the real cancellation fact into sync. This is required for drm-next
operation.
John Baldwin [Tue, 8 Aug 2017 17:49:57 +0000 (17:49 +0000)]
Fix a NULL pointer dereference in mly_user_command().
If mly_user_command fails to allocate a command slot it jumps to an 'out'
label used for error handling. The error handling code checks for a data
buffer in 'mc->mc_data' to free before checking if 'mc' is NULL. Fix by
just returning directly if we fail to allocate a command and only using
the 'out' label for subsequent errors when there is actual cleanup to
perform.
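The corrected control flow can be sketched as follows. This is an illustrative pattern only, not the mly(4) code: struct cmd, cmd_alloc() and user_command() are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

struct cmd {
	void *data;
};

/* Hypothetical allocator: returns NULL when no command slot is free. */
struct cmd *
cmd_alloc(int slots_free)
{
	struct cmd *c;

	if (slots_free <= 0)
		return (NULL);
	c = malloc(sizeof(*c));
	if (c != NULL)
		c->data = NULL;
	return (c);
}

int
user_command(int slots_free, size_t datalen)
{
	struct cmd *c;
	int error = 0;

	c = cmd_alloc(slots_free);
	if (c == NULL)
		return (ENOMEM);	/* nothing to clean up: return directly */

	if (datalen > 0) {
		c->data = malloc(datalen);
		if (c->data == NULL) {
			error = ENOMEM;
			goto out;
		}
	}
	/* ... issue the command here ... */
out:
	/* c is non-NULL on every path that reaches this label. */
	free(c->data);
	free(c);
	return (error);
}
```

Keeping the 'out' label for paths that own resources means the cleanup code never has to ask whether its own pointer is valid.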
Alan Somers [Tue, 8 Aug 2017 16:14:31 +0000 (16:14 +0000)]
Make p1003_1b.aio_listio_max a tunable
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.
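The bounding rule can be written as a small clamp. This is a sketch of the rule as stated above, not the kernel code: clamp_listio_max() and its parameter names are hypothetical, and the order in which the bounds are applied is an assumption.

```c
/*
 * Clamp a requested aio_listio_max value: at least lower (the role of
 * AIO_LISTIO_MAX), at most the smaller of upper_per_proc (the role of
 * MAX_AIO_QUEUE_PER_PROC) and upper_queue (the role of
 * vfs.aio.max_aio_queue).
 */
int
clamp_listio_max(int requested, int lower, int upper_per_proc,
    int upper_queue)
{
	int upper, value;

	upper = upper_per_proc < upper_queue ? upper_per_proc : upper_queue;
	value = requested;
	if (value < lower)
		value = lower;
	if (value > upper)
		value = upper;
	return (value);
}
```

For example, with a lower bound of 16 and upper bounds of 256 and 1024, a request of 8 becomes 16 and a request of 5000 becomes 256. These numbers are chosen only for illustration.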
Fix a logic error in the assert that caused the condition to always be true.
Also improve the formatting of the corresponding KASSERT message.
Based on the submission by: Svyatoslav <razmyslov@viva64.com>
Found by: PVS-Studio
PR: 217741
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Count drop events due to lack of PCI bandwidth as queue drops and not as
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectively disabled the mbuf defrag mechanism and such packets
simply got dropped.
This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.
Add a per-ring counter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.
The counters are per-ring to avoid excessive cache misses in the
TX path.
https://www.illumos.org/issues/8373
The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign a
transaction to a transaction group. That seems to be logically incorrect
as writing of the ZIL block does not introduce any new dirty data. Also,
when there is a lot of dirty data, the call can introduce significant
delays into the ZIL commit path, thus affecting all synchronous writes.
Additionally, ARC throttling may affect the ZIL writing.
We probably need a new mechanism similar to dmu_tx_create_assigned to assign
ZIL transactions. (Ab)using TXG_WAITED does not seem to be sufficient.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/8491
The zpool checkpoint feature in DxOS added a new field in the uberblock.
The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the
uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
Because these two changes come from different sources and, once upstreamed
and deployed, would be incompatible with each other, we want to upstream a
change that reserves the padding for both of them, so that integration goes
smoothly and everyone gets both features.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
https://www.illumos.org/issues/7915
l2arc_evict() is strictly serialized with respect to
l2arc_write_buffers() and l2arc_write_done(). Normally, l2arc_evict()
and l2arc_write_buffers() are called from the same thread, so they can
not be concurrent. Also, l2arc_write_buffers() uses zio_wait() on the
parent zio of all cache zio-s. That ensures that l2arc_write_done()
is completed before l2arc_write_buffers() returns. Finally, if a
cache device is removed, then l2arc_evict() is called under SCL_ALL in
the exclusive mode. That ensures that it can not be concurrent with
the normal L2ARC accesses to the device (including writing and
evicting buffers). Given the above, some checks and actions in
l2arc_evict() do not make sense. For instance, it must never
encounter the write head header let alone remove it from the buffer
list.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/8126
The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/8067
Add an option to zdb to print a literal embedded block pointer supplied on the
command line:
zdb -E [-A] word0:word1:...:word15
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/8426
abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so
qualify them with const.
abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the
corresponding arguments.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/7600
At present, the kernel side code seems to blindly roll back to whatever happens
to be the latest snapshot at the time when the rollback task is processed.
The expected target's name should be passed to the kernel driver and the sync
task should validate that the target exists and that it is the latest snapshot
indeed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/8377
The problem is that when dsl_bookmark_destroy_check() is executed from open
context (the pre-check), it fills in dbda_success based on the existence of
the bookmark. But the bookmark (or containing filesystem as in this case) can
be destroyed before we get to syncing context. When we re-run
dsl_bookmark_destroy_check() in syncing context, it will not add the deleted
bookmark to dbda_success, intending for dsl_bookmark_destroy_sync() to not
process it. But because the bookmark is still in dbda_success from the
open-context call, we do try to destroy it.
The fix is that dsl_bookmark_destroy_check() should not modify dbda_success
when called from open context.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
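The fix for issue 8377 can be modeled with a toy check function that records results only when it runs in syncing context. This is a sketch under assumptions and not the ZFS code: struct destroy_arg, destroy_check() and the exists[] array are hypothetical stand-ins for dbda_success and the bookmark lookup.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BM	8	/* illustrative capacity */

/* Hypothetical stand-in for the argument shared by check and sync. */
struct destroy_arg {
	int success[MAX_BM];	/* indices of bookmarks queued for destroy */
	size_t nsuccess;
};

/*
 * Check step. exists[i] says whether bookmark i currently exists.
 * Returns how many bookmarks were found. Only a syncing-context run
 * queues them, so a bookmark that vanished after the open-context
 * pre-check is never handed to the sync step.
 */
int
destroy_check(struct destroy_arg *arg, const bool *exists, size_t n,
    bool syncing)
{
	size_t i;
	int found = 0;

	for (i = 0; i < n; i++) {
		if (!exists[i])
			continue;
		found++;
		if (syncing && arg->nsuccess < MAX_BM)
			arg->success[arg->nsuccess++] = (int)i;
	}
	return (found);
}
```

In this model the open-context call still validates the request, but the list consumed by the sync step is built from the state seen in syncing context only.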
https://www.illumos.org/issues/8378
The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(),
which we then nopwrite against. zfs_get_data() doesn't hold any DMU-related
locks, so after it copies db_blkptr to zgd_bp, dbuf_write_ready() could
change db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale BP and that the dbuf is not dirty, so it is
eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
db_mtx. We could still see a stale db_blkptr, but if it is stale then the
dirty record will still exist and thus we won't attempt to nopwrite.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
Andriy Gapon [Tue, 8 Aug 2017 10:43:41 +0000 (10:43 +0000)]
MFV r322221: 7910 l2arc_write_buffers() may write beyond target_sz
FreeBSD note: the essence of this change was committed to FreeBSD in
r314274. This commit catches up with differences between what was
committed to FreeBSD and what was committed to OpenZFS, mainly more
logical variable names.
https://www.illumos.org/issues/7910
It seems that the change in issue #6950 resurrected the problem that was
earlier fixed by the change in issue #5219.
Please also see the following FreeBSD bug report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>