Alan Cox [Thu, 10 Jul 2014 20:55:38 +0000 (20:55 +0000)]
Correct the accounting code for wired mappings. The wrong field of the PVO
entry was being tested. We were incrementing and decrementing the pmap's
wired mapping count based on whether the physical page being mapped or
unmapped was cache coherent, not whether it was a wired mapping.
tty.c 1.31
handle EINTR in the termios operations
Allow a single process to control multiple ttys (for pthreads using _REENTRANT)
using multiple EditLine objects.
pass lint on _LP64.
Don't depend on side effects inside an assert
Ed Schouten [Thu, 10 Jul 2014 16:10:39 +0000 (16:10 +0000)]
Fix a couple of style nits.
- Use set instead of std::set, to be consistent with the rest of the file.
- Remove return (0); it's not required.
- Add a dash at the beginning of the copyright, per style(9).
Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.
This was a very big performance improvement for an experimental
Netmap bhyve backend.
Adrian Chadd [Thu, 10 Jul 2014 03:10:56 +0000 (03:10 +0000)]
Implement the first stage of multi-bind listen sockets and RSS socket
awareness.
* Introduce IP_BINDMULTI - indicating that it's okay to bind multiple
sockets on the same bind details.
Although the PCB code has been taught about this (see below) this patch
doesn't introduce the rest of the PCB changes necessary to distribute
lookups among multiple PCB entries in the global wildcard table.
* Introduce IP_RSS_LISTEN_BUCKET - placing an listen socket into the
given RSS bucket (and thus a single PCBGROUP hash.)
* Modify the PCB add path to be aware of IP_BINDMULTI:
+ Only allow further PCB entries to be added if the owner credentials
and IP_BINDMULTI has been specified. Ie, only allow further
IP_BINDMULTI sockets to appear if the first bind() was IP_BINDMULTI.
* Teach the PCBGROUP code about IP_RSS_LISTE_BUCKET marked PCB entries.
Instead of using the wildcard logic and hashing, these sockets are
simply placed into the PCBGROUP and _not_ in the wildcard hash.
* When doing a PCBGROUP lookup, also do a wildcard match as well.
This allows for an RSS bucket PCB entry to appear in a PCBGROUP
rather than having to exist in the wildcard list.
Tested:
* TCP IPv4 server testing with igb(4)
* TCP IPv4 server testing with ix(4)
TODO:
* The pcbgroup lookup code duplicated the wildcard and wildcard-PCB
logic. This could be refactored into a single function.
* This doesn't yet work for IPv6 (The PCBGROUP code in netinet6/ doesn't
yet know about this); nor does it yet fully work for UDP.
Warner Losh [Thu, 10 Jul 2014 00:15:42 +0000 (00:15 +0000)]
Make SERIAL support optional again. Enable it for i386 because a huge
percentage of machines has a 16550. Disable it for pc98 since only a
tiny fraction of them have one. These changes save 293 bytes when
building with clang, but preserves the ability to build with serial if
you really want. We now have 92 bytes free (412 with the in-tree gcc).
Xin LI [Wed, 9 Jul 2014 23:14:59 +0000 (23:14 +0000)]
MFV r268455:
Use reserved space for ZFS administrative commands.
We reserve 1/2^spa_slop_shift = 1/32 or 3.125% of pool space (or 32MB at
least) for system use. Most ZPL operations, e.g. write(2), creat(2), will
fail with ENOSPC if we fall below this.
Certain operations, e.g. file removal and most administrative actions,
still permitted until half of the slop space is used. This would allow
users to use these operations to free up space in the pool when pool is
close to full but half of slop space is still free.
A very restricted set of operations that frees up space or change quota
are always permitted, regardless of the amount of free space.
For safety, ensure that any consumer of the set_regs() and
ptrace_set_pc() use the correct return to userspace using iret.
The signal return, PT_CONTINUE (which in fact uses signal return path)
set the pcb flag already. The setcontext(2) enforces iret return when
%rip is incorrect. Due to this, the change is redundand, but is made
to ensure that no path which modifies context, forgets to set
PCB_FULL_IRET.
Inspired by: CVE-2014-4699
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Current code in sysctl proc.vmmap, which intent is to calculate the
amount of resident pages, in fact calculates the amount of installed
pte entries in the region. Resident pages which were not soft-faulted
yet are not counted.
Calculate the amount of resident pages by looking in the objects chain
backing the region.
Add a knob to disable the residency calculation at all. For large
sparce regions, either previous or updated algorithm runs for too long
time, while several introspection tools do not need the (advisory) RSS
value at all.
PR: kern/188911
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The GNU readline library is now an INTERNALLIB - that is, it is
statically linked into consumers (GDB and variants) in the base
system, and the shared library is no longer installed.
That also allows ports to be able to use a modern version of readline
Xin LI [Wed, 9 Jul 2014 08:23:22 +0000 (08:23 +0000)]
4951 ZFS administrative commands should use reserved space, not fail with ENOSPC
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Xin LI [Wed, 9 Jul 2014 08:20:08 +0000 (08:20 +0000)]
4966 zpool list iterator does not update output
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Xin LI [Wed, 9 Jul 2014 08:17:09 +0000 (08:17 +0000)]
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Xin LI [Wed, 9 Jul 2014 08:14:13 +0000 (08:14 +0000)]
4950 files sometimes can't be removed from a full filesystem
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
vm_phys: remove limitation on number of fictitious regions
The number of vm fictitious regions was limited to 8 by default, but
Xen will make heavy usage of those kind of regions in order to map
memory from foreign domains, so instead of increasing the default
number, change the implementation to use a red-black tree to track vm
fictitious ranges.
vm/vm_phys.c:
- Replace the vm fictitious static array with a red-black tree.
- Use a rwlock instead of a mutex, since now we also need to take the
lock in vm_phys_fictitious_to_vm_page, and it can be shared.
Julio Merino [Wed, 9 Jul 2014 00:55:50 +0000 (00:55 +0000)]
Fix atf-sh's integration_test
With the move of atf-sh into /usr/libexec in r267181, some of the
tests in the integration_test program broke because they could not
execute atf-sh from the path any longer.
This slipped through because I do have a local atf installation in
my home directory that appears in my path, hence the tests could
still execute my own version.
Fix this by forcing /usr/libexec to appear at the beginning of the
path when attempting to execute atf-sh.
To make upgrading easy (and to avoid an unnecessary entry in UPDATING),
make integration_test depend on the Makefile so that a rebuild of the
shell script is triggered. This requires a hack in the *.test.mk files
to ensure the Makefile is not treated as a source to the generated
program. Ugly, I know, but I don't have a better way of doing this at
the moment. Will think of one once I address the TODO in the *.test.mk
files that suggests generalizing the file generation functionality.
Alexander Motin [Tue, 8 Jul 2014 17:26:08 +0000 (17:26 +0000)]
Remove IO_SYNC flag when writing extended file attributes on ZFS.
While it is possible to create and write file, modify its permissions, etc.
without ever doing sync, it looks odd that it is required for setting
extended file attributes on ZFS. UFS does not do sync there too.
Samba uses those extended attributes to store some its data, and doing it
synchronously by many times reduces file creation performance for systems
without SLOG device.
Reviewed by: delphij, jpaetzel, silence on fs@
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Alexander Motin [Tue, 8 Jul 2014 16:38:05 +0000 (16:38 +0000)]
Enable TAS feature: notify initiator if its command was aborted by other.
That should make operation more kind to multi-initiator environment.
Without this, other initiators may find out that something bad happened
to their commands only via command timeout.
Alexander Motin [Tue, 8 Jul 2014 12:15:15 +0000 (12:15 +0000)]
Return task management requests to queued execution, but differently.
Testing shown that both original queued design with separate task queue,
and recent direct execution design had significant flaw: If abort request
arrives just after the victim, the last one may not be in the ooa_queue
yet, and so invisible for the task management function.
Unlike original queued implementation, use same queue for all SCSI and
TASK requests from the same initiator. That avoids races between them:
task functions are always executed in proper time, relatively to other
requests.
Correct the problem reported by test16 from
tools/regression/file/flock/flock.c, which completes the fix in
r192685. When the lock was stolen from us, retry the whole lock
sequence in kernel, instead of returning EINTR to usermode and hoping
that application would handle it correctly by restarting the lock
acquire.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Warner Losh [Mon, 7 Jul 2014 23:21:25 +0000 (23:21 +0000)]
xdev builds libsupc++ and libstdc++ in a slightly strange way. This
cause a race to be exposed between the two. Compensate for this race
by serializing the build/install of libstdc++ before libsupc++.
Warner Losh [Mon, 7 Jul 2014 23:21:20 +0000 (23:21 +0000)]
rm -rf can fail sometimes with an error from fts_read. Make it honor
fflag to ignore fts_read errors, but stop deleting from that directory
because no further progress can be made.
When building a kernel with a high -j value on a high core count
machine, during the cleanobj phase we can wind up doing multiple rm
-rf at the same time for modules that have subdirectories. This
exposed this race (sometimes) as fts_read can return an error if the
directory is removed by another rm -rf. Since the intent of the -f
flag was to ignore errors, even if this was a bug in fts_read, we
should ignore the error like we've been instructed to do.
Warner Losh [Mon, 7 Jul 2014 23:21:07 +0000 (23:21 +0000)]
Naughty NANDFS was using hidden unused flag, hiding the fact that the
flag was used and wasn't really available. Change the name without
fixing any laying issues that might be present in NANDFS' use of this
flag.
Alexander Motin [Mon, 7 Jul 2014 09:37:22 +0000 (09:37 +0000)]
Teach ctl_add_initiator() to dynamically allocate IIDs from pool.
If port passed negative IID value, the function will try to allocate IID
from the pool of unused, based on passed wwpn or name arguments. It does
all its best to make IID unique and persistent across reconnects.
This makes persistent reservation properly work for iSCSI. Previously,
in case of reconnects, reservation could be unexpectedly lost, or even
migrate between intiators.
Fabien Thomas [Mon, 7 Jul 2014 08:22:39 +0000 (08:22 +0000)]
Optim and Fix for mge driver:
- add missing rcvif in mbuf
- add missing ipacket stat
- remove uncessary mbuf copy on output path
- fix deadlock of the TX engine in case of error