hselasky [Sat, 28 Jun 2014 17:36:18 +0000 (17:36 +0000)]
Compile fixes:
Remove duplicate "debug_ktr.mask" sysctl definition.
Remove now unused variable from "kern_ktr.c".
This fixes build of "ktr" which was broken by r267961.
Let the default value for "vm_kmem_size_scale" be zero. It is setup
after that the sysctl has been initialized from "getenv()" in the
"kmeminit()" function to equal the "VM_KMEM_SIZE_MAX" value, if
zero. On Sparc64 the "VM_KMEM_SIZE_MAX" macro is not a constant. This
fixes build of Sparc64 which was broken by r267961.
Add a special macro to dynamically create SYSCTL root nodes, because
root nodes have a special parent. This fixes build of existing OFED
module and CANBUS module for pc98 which was broken by r267961.
Add missing "sysctl.h" includes to get the needed sysctl header file
declarations. This is needed after r267961.
mjg [Sat, 28 Jun 2014 05:18:03 +0000 (05:18 +0000)]
Make sure to always clear p_fd for process getting rid of its filetable.
Filetable can be shared with other processes. Previous code failed to
clear the pointer for all but the last process getting rid of the table.
This is mostly cosmetics.
Get rid of 'This should happen earlier' comment. Clearing the pointer in
this place is fine as consumers can reliably check for files availability
by inspecting fd_refcnt and vnodes availabity by NULL-checking them.
dim [Fri, 27 Jun 2014 20:41:12 +0000 (20:41 +0000)]
Pull in r211627 from upstream llvm trunk (by Bill Schmidt):
[PPC64] Fix PR20071 (fctiduz generated for targets lacking that
instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation
for floating-point conversion to integer. The fctiduz instruction
was added in Power ISA 2.06 (i.e., Power7 and later). However, this
instruction is being generated regardless of which 64-bit PowerPC
target is selected.
The intent is for fast-isel to punt to DAG selection when this
instruction is not available. This patch implements that change.
For testing purposes, the existing fast-isel-conversion.ll test adds
a RUN line for -mcpu=970 and tests for the expected code generation.
Additionally, the existing test fast-isel-conversion-p5.ll was found
to be incorrectly expecting the unavailable instruction to be
generated. I've removed these test variants since we have adequate
coverage in fast-isel-conversion.ll.
This is needed to compile clang with debug+asserts on older powerpc64
and ppc970 targets.
marius [Fri, 27 Jun 2014 19:57:57 +0000 (19:57 +0000)]
In order to get vt(4) a bit closer to the feature set provided by sc(4),
implement options TERMINAL_{KERN,NORM}_ATTR. These are aliased to
SC_{KERNEL_CONS,NORM}_ATTR and like these latter, allow to change the
default colors of normal and kernel text respectively.
Note on the naming: Although affecting the output of vt(4), technically
kern/subr_terminal.c is primarily concerned with changing default colors
so it would be inconsistent to term these options VT_{KERN,NORM}_ATTR.
Actually, if the architecture and abstraction of terminal+teken+vt would
be perfect, dev/vt/* wouldn't be touched by this commit at all.
Reviewed by: emaste
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
marius [Fri, 27 Jun 2014 18:24:20 +0000 (18:24 +0000)]
- SC_NO_SYSMOUSE isn't currently supported by vt(4), so nuke it from vt.4.
- vt_vga(4) is a driver rather than a function so reference it accordingly.
- Uncomment HISTORY section given that vt(4) will first appear in 9.3.
Reviewed by: emaste (modulo last part)
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
emaste [Fri, 27 Jun 2014 17:50:33 +0000 (17:50 +0000)]
Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.
(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
hselasky [Fri, 27 Jun 2014 16:33:43 +0000 (16:33 +0000)]
Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
grehan [Fri, 27 Jun 2014 05:27:37 +0000 (05:27 +0000)]
Set the version and date to fixed fields rather than using
preprocessor macros that don't allow reproducible builds.
As a side-effect, the date string is now spec-compliant.
mjg [Fri, 27 Jun 2014 05:04:36 +0000 (05:04 +0000)]
Check lower bound of cmsg_len.
If passed cm->cmsg_len was below cmsghdr size the experssion:
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;
would give negative result. However, in practice it would not
result in a crash because the kernel would try to obtain garbage fds
for given process and would error out with EBADF.
jhb [Thu, 26 Jun 2014 20:12:38 +0000 (20:12 +0000)]
- Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
option).
mav [Thu, 26 Jun 2014 20:06:37 +0000 (20:06 +0000)]
Simplify statistics calculation.
Instead of trying to guess size of disk I/O operations (it just won't work
that way for newly added commands, and is equal to data move size for old
ones), account data move traffic. If disk I/Os are that interesting, then
backends have to account and provide that information.
Block backend already exports the information about disk I/Os via devstat,
so having it here too is excessive.
rpaulo [Thu, 26 Jun 2014 19:38:16 +0000 (19:38 +0000)]
MFV illumos r266986:
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
bz [Thu, 26 Jun 2014 17:10:07 +0000 (17:10 +0000)]
Allow switching between 32bit and 64bit bus width data access at compile
time by setting NF10BMAC_64BIT and using a REGWTYPE #define to set correct
variable and return value widths.
Adjust comments to indicate the 32 or 64bit register widths.
alc [Thu, 26 Jun 2014 16:04:03 +0000 (16:04 +0000)]
Delay the call to crhold() in vm_map_insert() until we know that we won't
have to undo it by calling crfree(). This reduces the total number of calls
by vm_map_insert() to crhold() and crfree() by 45% in my tests.
Eliminate an unnecessary variable from vm_map_insert().
ume [Thu, 26 Jun 2014 12:12:18 +0000 (12:12 +0000)]
- Exclude loopback address rather than loopback interface.
- style(9)
TODO: When AI_ADDRCONFIG is specified, getaddrinfo() can
be quite slow for system with many interfaces. We should
have some kernel sysctls to report IPv4/IPv6 status.
mav [Thu, 26 Jun 2014 08:56:36 +0000 (08:56 +0000)]
Add READ BUFFER and improve WRITE BUFFER SCSI commands support.
This gives some use to 512KB per-LUN buffers, allocated for Copan-specific
processor code and not used. It allows, for example, to test transport
performance and/or correctness without accessing the media, as supported
by Linux version of sg3_utils.
davide [Thu, 26 Jun 2014 05:23:48 +0000 (05:23 +0000)]
Improve r264388 removing namespace pollution previously introduced in
<sys/time.h>. INT64_MAX actually requires __INT64_C() hack to get the
type right on exotic architectures (e.g. on ones with 63-bit ints or long
0x7fffffffffffffff is unsigned int or long). The hardcoded LL suffix is
good enough to avoid these problems for SBT_MAX (it makes the type always
signed long long, without overflow since long long has at least 64 bits).
Many thanks to Bruce Evans for the time spent me to explain this.
adrian [Thu, 26 Jun 2014 04:12:41 +0000 (04:12 +0000)]
Retire IP_RSSCPUID ; the right thing to do is query the RSS bucket;
map the bucket to an RSS queue, then map the queue to a CPU ID.
This way the bucket->queue and queue->CPU mapping can change
over time.
Introduce IP_RSSBUCKETID - which instead looks up the RSS bucket.
User applications can then map the RSS bucket to a CPU.
adrian [Thu, 26 Jun 2014 02:49:51 +0000 (02:49 +0000)]
Add another RSS method to query the indirection table entries.
There's 128 indirection table entries which correspond to the
low 7 bits of the 32 bit RSS hash. Each value will correspond
to an RSS bucket. (Then each RSS bucket currently will map
to a CPU.)
This is a more explicit way of figuring out which RSS bucket
is in each RSS indirection slot. It can be inferred by the other
methods but I'd rather drivers use something more simplified and
explicit.
vmm_stat.[ch] -
Modify the counter code in bhyve to allow direct setting of a counter
as opposed to incrementing, and providing a callback to fetch a
counter's value.
jhb [Wed, 25 Jun 2014 20:30:47 +0000 (20:30 +0000)]
Expand r261243 even further and ignore any I/O port resources assigned to
PCI root bridges except for the one known-valid case on x86 where bridges
claim the I/O port registers used for PCI config space access.
mav [Wed, 25 Jun 2014 17:02:01 +0000 (17:02 +0000)]
Introduce fine-grained CTL locking to improve SMP scalability.
Split global ctl_lock, historically protecting most of CTL context:
- remaining ctl_lock now protects lists of fronends and backends;
- per-LUN lun_lock(s) protect LUN-specific information;
- per-thread queue_lock(s) protect request queues.
This allows to radically reduce congestion on ctl_lock.
Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them. This allows to spread load between multiple CPUs,
still avoiging congestion on queues and LUNs locks.
On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.
mav [Wed, 25 Jun 2014 16:12:14 +0000 (16:12 +0000)]
Allow to use iSCSI immediate data by several ctl_datamove() calls.
While for FreeBSD client that is only a minor optimization, VMWare client
doesn't support additional data requests after all data being sent once as
immediate.
marcel [Wed, 25 Jun 2014 15:22:14 +0000 (15:22 +0000)]
* Handle ++x as well as x++ while converting.
* Add special case handling where normal conversion would not work
(some APIs have special names)
* Fix conversion for function calls involving ifnet
Submitted by: Sreekanth Rupavatharam <rupavath@juniper.net>
Obtained from: Juniper Networks, Inc.
royger [Wed, 25 Jun 2014 09:51:08 +0000 (09:51 +0000)]
xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.
In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.
dev/virtio/balloon/virtio_balloon.c:
- Don't allocate pages with VM_ALLOC_WIRED.
dev/xen/balloon/balloon.c:
- Don't allocate pages with VM_ALLOC_WIRED.
- Make sure pages are zeroed before giving them back to the
hypervisor.
- Remove the balloon_entry struct and the balloon_{append/retrieve}
functions and use the page plinks.q entry to link the pages
directly into the ballooned_pages queue.
daichi [Wed, 25 Jun 2014 05:39:30 +0000 (05:39 +0000)]
Fixed an IIC timing issue between the glxiic master and a slave of
peripheral devices. When transmitting (rx) from slave to master,
sometimes nAKC delays. As a result, some slaves fails their
transmission.
davide [Wed, 25 Jun 2014 03:54:02 +0000 (03:54 +0000)]
Continue the crusade towards a dev_clone()-free kernel, removing its
usage from dtrace. The dtrace code already uses cdevpriv(9) since FreeBSD
8, so this change should be quite harmless.
Reviewed by: markj
Approved by: markj
MFC after: never
alc [Wed, 25 Jun 2014 03:30:03 +0000 (03:30 +0000)]
Now that vm_map_insert() sets MAP_ENTRY_GROWS_{DOWN,UP} on the stack entries
that it creates (r267645), we can place the check that blocks map entry
coalescing on stack entries in vm_map_simplify_entry() where it properly
belongs.
imp [Tue, 24 Jun 2014 22:15:27 +0000 (22:15 +0000)]
Make sure that the sub-makes for unwind.h start from the CURDIR
(/usr/src) tree rather than the OBJDIR (/usr/obj) tree. This fixes
broken incremental builds with the canonical MAKESYSPATH workaround
of .../share/mk. This is a gross kludge.
wollman [Tue, 24 Jun 2014 20:23:18 +0000 (20:23 +0000)]
Catch up with many years of changes:
o Document PF_LOCAL as being an alias for PF_UNIX
o Document POSIX standardization of this interface using AF_*
constants rather than PF_* constants, and note the three particular
families which POSIX standardizes.
o Note anticipated POSIX standardization of SOCK_CLOEXEC.
o Delete from listing protocol families that FreeBSD doesn't support
(in some cases, like PF_PUP, has never supported).
o Add to listing some current protocol families that have been
introduced in the last decade or so.
o Document the correspondence of PF_* and AF_* constants.
We should probably change the documentation to make the AF_* constants
primary, but this commit does not do so.