Based on the discovery that every unmap waits for the commit of the txn to the ZIL,
introducing a very high latency to unmap commands, this behavior was made into a
tunable zvol_unmap_sync_enabled and set to false. The net impact of this change is
that by default SCSI unmap commands will result in space being freed within the zvol
(today they are ignored and returned with good status). However, unlike the code
today, instead of 18+ms per unmap, they take about 30us.
With the testing done on NTFS against a Win2k12 target, the new behavior should work
seamlessly. Files on the zvol that have already been set with the zfree application
will continue to write 0's when deleted, and any new files created since zvol
creation will send unmap commands when deleted. This behavior exists today, but with
this change the unmap commands will be processed and result in reclaim of space.
Author: Stephen Blinick <stephen.blinick@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Based on the discovery that every unmap waits for the commit of the txn to the ZIL,
introducing a very high latency to unmap commands, this behavior was made into a
tunable zvol_unmap_sync_enabled and set to false. The net impact of this change is
that by default SCSI unmap commands will result in space being freed within the zvol
(today they are ignored and returned with good status). However, unlike the code
today, instead of 18+ms per unmap, they take about 30us.
With the testing done on NTFS against a Win2k12 target, the new behavior should work
seamlessly. Files on the zvol that have already been set with the zfree application
will continue to write 0's when deleted, and any new files created since zvol
creation will send unmap commands when deleted. This behavior exists today, but with
this change the unmap commands will be processed and result in reclaim of space.
Author: Stephen Blinick <stephen.blinick@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Mariusz Zaborski [Sat, 25 Feb 2017 18:14:32 +0000 (18:14 +0000)]
Remove unused macro from common/drv.c.
When we was compering it to code from boot2 it also looks like
this code is buggy and boot2 was never updated to use this code.
USE_XREAD flag is unused in boot2, and common/drv.c was never
build with that flag.
Andriy Gapon [Sat, 25 Feb 2017 16:45:53 +0000 (16:45 +0000)]
zfs: call spa_deadman on a taskqueue thread
callout(9) prohibits callout functions from sleeping.
illumos mutexes are emulated using sx(9).
spa_deadman() calls vdev_deadman() and the latter acquires vq_lock.
As a result we can get a more confusing panic instead of a specific
panic or no panic:
sleepq_add: td 0xfffff80019669960 to sleep on wchan 0xfffff8001cff4d88 with sleeping prohibited
This change adds another level of indirection where the deadman
callout schedules spa_deadman() to be executed on taskqueue_thread.
While there, use callout_schedule(0 instead of callout_reset()
in spa_sync().
Andriy Gapon [Sat, 25 Feb 2017 16:39:21 +0000 (16:39 +0000)]
call vm_lowmem hook in uma_reclaim_worker
A comment near kmem_reclaim() implies that we already did that.
Calling the hook is useful, because some handlers, e.g. ARC,
might be able to release significant amounts of KVA.
Now that we have more than one place where vm_lowmem hook is called,
use this change as an opportunity to introduce flags that describe
a reason for calling the hook. No handler makes use of the flags yet.
The fsid of zfs filesystems might change after reboot or remount. The problem seems to
be caused by a race between unique_insert() and unique_remove(). The unique_remove()
is called from dsl_dataset_evict() which is now an asynchronous thread. In a case the
dsl_dataset_evict() thread is very slow and calls unique_remove() too late we will end
up with changed fsid on zfs mount.
This problem is very likely caused by #5056.
Steps to Reproduce
Note: I'm able to reproduce this always on a single core (virtual) machine. On multicore
machines it is not so easy to reproduce.
Impact
The persistent fsid (filesystem id) is essential for proper NFS functionality.
If the fsid of a filesystem changes on remount (or after reboot) the NFS
clients might not be able to automatically recover from such event and the
manual remount of the NFS filesystems on every NFS client might be needed.
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Alexander Motin [Sat, 25 Feb 2017 14:20:30 +0000 (14:20 +0000)]
Reenable CTL_WITH_CA, optimizing it for lower memory usage.
This code was disabled due to its high memory usage. But now we need this
functionality for cfumass(4) frontend, since USB MS BBB transport does not
support autosense.
Thread might create a condition for delayed SU cleanup, which creates
a reference to the mount point in td_su, but exit without returning
through userret(), e.g. when terminating due to single-threading or
process exit. In this case, td_su reference is not dropped and mount
point cannot be freed.
Handle the situation by clearing td_su also in the thread destructor
and in exit1(). softdep_ast_cleanup() has to receive the thread as
argument, since e.g. thread destructor is executed in different
context.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Warner Losh [Sat, 25 Feb 2017 06:11:59 +0000 (06:11 +0000)]
Convert PCIe Hot Plug to using pci_request_feature
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host pci connections to allowing all valid feature
requests.
Warner Losh [Sat, 25 Feb 2017 06:11:36 +0000 (06:11 +0000)]
Create pcib_request_feature.
pcib_request_feature allows drivers to request the firmware (ACPI)
release certain features it may be using. ACPI normally manages things
like hot plug, advanced error reporting and other features until the
OS requests ACPI to relenquish control since it is taking over.
Enji Cooper [Sat, 25 Feb 2017 03:44:51 +0000 (03:44 +0000)]
Fill MK_LIBTHR as far as lib/libthr is concerned
There are other areas of the tree that will need to be evaluated for sanity
if they're supposed to be conditionally compiled out of the build/install,
like libzpool
MFC after: 1 month
Relnotes: yes (this might break someone's system if have the knob set)
Sponsored by: Dell EMC Isilon
The fsid of zfs filesystems might change after reboot or remount. The problem seems to
be caused by a race between unique_insert() and unique_remove(). The unique_remove()
is called from dsl_dataset_evict() which is now an asynchronous thread. In a case the
dsl_dataset_evict() thread is very slow and calls unique_remove() too late we will end
up with changed fsid on zfs mount.
This problem is very likely caused by #5056.
Steps to Reproduce
Note: I'm able to reproduce this always on a single core (virtual) machine. On multicore
machines it is not so easy to reproduce.
Impact
The persistent fsid (filesystem id) is essential for proper NFS functionality.
If the fsid of a filesystem changes on remount (or after reboot) the NFS
clients might not be able to automatically recover from such event and the
manual remount of the NFS filesystems on every NFS client might be needed.
Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Enji Cooper [Sat, 25 Feb 2017 03:11:08 +0000 (03:11 +0000)]
Add shutdown/poweroff support to rescue(8)
shutdown is a safer way to power off than reboot (in general), because of
the added shutdown process that it executes via /etc/rc.shutdown . It was
odd that it was missing from rescue(8) since reboot and friends were
added in past commits.
While here, alias poweroff to shutdown for parity with sbin/shutdown/Makefile
iwn: some initialization / RF switch state change fixes.
- Check return code from initialization path; otherwise, vap state
may be wrong after an error.
- Do not try to run iwn_stop() / iwn_init() multiple times.
- Merge iwn_radio_on/off() and move RFKILL bit check into the task.
- Try to handle possible RF switch state change in S3 state (PR 181694).
PR: 181694
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9797
Warner Losh [Sat, 25 Feb 2017 00:09:26 +0000 (00:09 +0000)]
Exit when we can't print a variable.
Exit after printing a message on stderr when we can't get a
message. This is slightly different than linux, but keeps shell
scripts from thinking the value of the variable is the error message
and so is a net win.
Warner Losh [Sat, 25 Feb 2017 00:09:21 +0000 (00:09 +0000)]
Don't convert ENOENT to nothing for individual lookup, just for the
iterative get_next interface. This prevents efivar(3) from printing 4k
of 0's when a variable isn't set.
Warner Losh [Sat, 25 Feb 2017 00:09:12 +0000 (00:09 +0000)]
Exit with usage if argv[1] is NULL in dispatch. This fixes core dumps
when a command has subcommands, but the user doesn't give the
parameters on the command line.
We have seen several cases recently where we appear to get a double-fault:
We have an original panic. Then, instead of writing the core to the dump
device, the kernel has a second panic: "smp_targeted_tlb_shootdown:
interrupts disabled". This change is an attempt to fix that second panic.
When the other CPUs are stopped, we can't notify them of the TLB shootdown,
so we skip that operation. However, when the CPUs come back up, we
invalidate the TLB to ensure they correctly observe any changes to the
page mappings.
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC. This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set. Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.
There is no reason to require that ACPI reported C2 and deeper states
to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from
init_TSC_tc() condition. And since this is the only use of the
variable, remove it at all.
Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com>
Suggested by: jhb
MFC after: 2 weeks
Properly handle possible underflow in vm_fault_prefault().
In vm_fault_prefault(), if backward count causes underflow in
calculation of
starta = addra - backward * PAGE_SIZE;
then starta must be clipped to entry->start, instead of zero.
Clipping to zero allowed mapping outside of the map entries address
ranges, in particular, map at zero.
Adrian Chadd [Thu, 23 Feb 2017 20:49:17 +0000 (20:49 +0000)]
[ifconfig] handle illegal WPS frames
Some APs broadcast WPS IE frames with totally broken data. Ifconfig's printwpsie()
loops through WPS frames printing the attributes out; if the frame's data is bad,
printwpsie() can end up looking at out-of-bounds addresses causing ifconfig to
bus error.
Thanks to Takashi Inoue at Nihon U for his efforts in debugging this.
Brooks Davis [Thu, 23 Feb 2017 20:41:55 +0000 (20:41 +0000)]
Fix and shorten BERI kernel builds during universe.
Stop building BERI_DE4_BASE and BERI_SIM_BASE, they aren't particularly
valid as they don't have a root dev. Do build BERI_DE4_SDROOT which
does so devices get coverage.
Remove ident from BERI_DE4_BASE for the reasons above which will cause
it to fail to build. BERI_SIM_BASE was already this way and broke
universe.[0]
Eric van Gyzen [Thu, 23 Feb 2017 19:36:38 +0000 (19:36 +0000)]
Add sem_clockwait_np()
This function allows the caller to specify the reference clock
and choose between absolute and relative mode. In relative mode,
the remaining time can be returned.
The API is similar to clock_nanosleep(3). Thanks to Ed Schouten
for that suggestion.
While I'm here, reduce the sleep time in the semaphore "child"
test to greatly reduce its runtime. Also add a reasonable timeout.
Michael Tuexen [Thu, 23 Feb 2017 18:14:36 +0000 (18:14 +0000)]
TCP window updates are only sent if the window can be increased by at
least 2 * MSS. However, if the receive buffer size is small, this might
be impossible. Add back a criterion to send a TCP window update if
the window can be increased by at least half of the receive buffer size.
This condition was removed in r242252. This patch simply brings it back.
PR: 211003
Reviewed by: gnn
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D9475
Enji Cooper [Thu, 23 Feb 2017 17:44:06 +0000 (17:44 +0000)]
Unbreak if_iwm.ko after r314076
Add if_iwm_7000.c/if_iwm_8000.c to SRCS to match similar additions made
to sys/conf/files after refactoring done in the commit noted.
PR: 217308
Pointyhat to: adrian
Submitted by: Andreas Nilsson <andrnils@gmail.com>
Reported by: Jakob Alvermark <jakob@alvermark.net>, Juan Ramómon Molina Menor <listjm@club.fr>
Sponsored by: Dell EMC Isilon
Alan Somers [Thu, 23 Feb 2017 16:31:04 +0000 (16:31 +0000)]
Misc Coverity fixes in xnb(4)
Most of these are null pointer dereferences or missing error checks in the
unit tests. One is a missing error check in xnb_attach_failed. None can
cause real problems in running systems.
Ed Maste [Thu, 23 Feb 2017 14:39:51 +0000 (14:39 +0000)]
make vi message catalogues build independent of locale
r275234 addressed sort automatically converting 8-bit locales to UTF-8
by using "LANG=C sort", but LC_ALL overrides LANG if set, so the issue
may still be present depending on the user's environment. Use LC_ALL=C
instead.
Reported by: tests.reproducible-builds.org
Reviewed by: bapt
MFC after: 1 week
Sponsored by: The Linux Foundation / Core Infrastructure Initiative
Differential Revision: https://reviews.freebsd.org/D9765
Kurt Lidl [Thu, 23 Feb 2017 05:40:59 +0000 (05:40 +0000)]
Reset failed login count to zero when removing a blocked address
The blacklistd daemon keeps records of failed login attempts for
each address:port that is flagged as a failed login. When a
successful login occurs for that address:port combination,
the record's last update time is set to zero, to indicate no current
failed login attempts.
Reset the failed login count to zero, so that at the next failed
login attempt, the counting will restart properly at zero. Without
this reset to zero, the first failed login after a successful login
will cause the address to be blocked immediately.
When debugging is turned on, output more information about database
state before and after the database updates have occured.
A similar patch has already been upstreamed to NetBSD.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Fix a panic during boot caused by inadequate locking of some vt(4) driver
data structures.
vt_change_font() calls vtbuf_grow() to change some vt driver data
structures. It uses TF_MUTE to prevent the console from trying to use those
data structures while it changes them.
During the early stage of the boot process, the vt driver's tc_done routine
uses those data structures; however, it is currently called outside the
TF_MUTE check.
Move the tc_done routine inside the locked TF_MUTE check.
John Baldwin [Thu, 23 Feb 2017 00:02:49 +0000 (00:02 +0000)]
Fully handle the special encoding of GOT[1] on mips64.
The MIPS ABI does not require the second GOT entry to be reserved for use
by the runtime linker as on other architectures. Instead, static linkers
use a special value in the second GOT entry to indicate if the entry is
reserved. This value is supposed to consist of an address with the MSB
set and the rest of the bits all zero which is an invalid user address.
However, the old binutils currently in the tree uses the 32-bit mask value
(2^31) on 64-bit MIPS instead of 2^63. This was fixed in upstream
binutils in 2008 to use 2^63 on 64-bit MIPS.
The first part of this change changes the runtime check in init_pltgot()
to check for both values (2^31 and 2^63) when deciding whether to store
the current object pointer in GOT[1] which fixes dynamic N64 binaries
compiled with modern binutils.
However, the initial version of this fix exposed another related bug in
that _rtld_relocate_nonplt_self() was only checking for the new value
(2^63) in GOT[1] and incorrectly treated GOT[1] as a local GOT entry
(and did not relocate the final local GOT entry). To handle this, fix
all of the places that check for GOT[1]'s status to use the same macro
that checks for both values on N64.
Toomas Soome [Wed, 22 Feb 2017 22:00:50 +0000 (22:00 +0000)]
loader: update symlink support in zfs reader
As the current zfs file system is providing symlink via system attributes, need
to update the code accordingly.
Note, as the zfsboot code does not free the memory at this time, the
object list will put some stress on the boot2 heap, eventually we should
address the issue.
Kurt Lidl [Wed, 22 Feb 2017 21:50:37 +0000 (21:50 +0000)]
Improve ipfw rule creation for blacklist-helper script
When blocking an address, the blacklist-helper script
needs to do the following things for the ipfw packet
filter:
- create a table to hold the addresses to be blocked,
so lookups can be done quickly, and place the address
to be blocked in that table
- create rule that does the lookup in the table and
blocks the packet
The ipfw system allows multiple rules to be inserted for
a given rule number. There only needs to be one rule
to do the lookup per port. Modify the script to probe
for the existence of the rule before attempting to create
it, so only one rule is inserted, rather than one rule per
blocked address.
PR: 214980
Reported by: azhegalov (at) gmail.com
Reviewed by: emaste
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D9681