emaste [Fri, 27 Jun 2014 17:50:33 +0000 (17:50 +0000)]
Use a common tunable to choose between vt(4)/sc(4)
With this change and previous work from ray@ it will be possible to put
both in GENERIC, and have one enabled by default, but allow the other to
be selected via the loader.
(The previous implementation had separate kern.vt.disable and
hw.syscons.disable tunables, and would panic if both drivers were
compiled in and neither was explicitly disabled.)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
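A userland sketch of single-tunable selection follows. It assumes the tunable is the loader variable `kern.vty` with values `sc` or `vt`. The `choose_vty()` helper, the `VTY_*` constants and the fallback rule are hypothetical and are not the kernel's implementation. Because there is one value rather than two `*.disable` knobs, the "both enabled" state that used to panic cannot be expressed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the two console drivers. */
enum vty_driver { VTY_NONE = 0, VTY_SC = 1, VTY_VT = 2 };

/*
 * Pick exactly one console driver from a single tunable value
 * (assumed here to be the loader variable "kern.vty").
 * 'compiled' is a bitmask of drivers built into the kernel;
 * 'fallback' is used when the tunable is unset or names a driver
 * that is not compiled in.
 */
static enum vty_driver
choose_vty(const char *tunable, unsigned compiled, enum vty_driver fallback)
{
	/* The fallback itself must be one of the compiled-in drivers. */
	assert((compiled & (unsigned)fallback) != 0);
	if (tunable != NULL) {
		if (strcmp(tunable, "sc") == 0 && (compiled & VTY_SC))
			return VTY_SC;
		if (strcmp(tunable, "vt") == 0 && (compiled & VTY_VT))
			return VTY_VT;
	}
	return fallback;
}
```

With both drivers compiled in (as in GENERIC), the loader value decides. An unknown value falls back to the default rather than enabling both drivers.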
hselasky [Fri, 27 Jun 2014 16:33:43 +0000 (16:33 +0000)]
Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating the children of a static/extern SYSCTL
node. This operation should probably be factored out into a common
macro, since some device drivers use it. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, because malloc()/free()
are not yet available when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
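The following userland model shows the "fetch on registration" behaviour described above. The `F_TUN`/`F_NOFETCH` bits stand in for `CTLFLAG_TUN`/`CTLFLAG_NOFETCH`. The `kenv` array stands in for the loader environment. `struct int_oid` and `register_int_oid()` are hypothetical. The full dotted path used as the lookup key is what the parent OID pointer lets the kernel build quickly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits standing in for CTLFLAG_TUN / CTLFLAG_NOFETCH. */
#define F_TUN     0x1u
#define F_NOFETCH 0x2u

struct int_oid {
	const char *path;   /* full sysctl path, e.g. "hw.foo.bar" */
	int *value;
	unsigned flags;
};

/* Stand-in for the loader-provided kernel environment. */
static const char *kenv[] = { "hw.foo.bar=42", "hw.foo.baz=7", NULL };

/* Return the value part of "name=value", or NULL if name is not set. */
static const char *
kenv_lookup(const char *name)
{
	size_t len = strlen(name);

	for (const char **e = kenv; *e != NULL; e++)
		if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
			return *e + len + 1;
	return NULL;
}

/*
 * On registration, a tunable OID is initialized from the environment
 * unless its owner does that itself in a custom handler (F_NOFETCH).
 */
static void
register_int_oid(struct int_oid *oid)
{
	assert(oid != NULL && oid->value != NULL);
	if ((oid->flags & F_TUN) == 0 || (oid->flags & F_NOFETCH) != 0)
		return;
	const char *s = kenv_lookup(oid->path);
	if (s != NULL)
		*oid->value = (int)strtol(s, NULL, 0);
}
```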
grehan [Fri, 27 Jun 2014 05:27:37 +0000 (05:27 +0000)]
Set the version and date to fixed fields rather than using
preprocessor macros that don't allow reproducible builds.
As a side-effect, the date string is now spec-compliant.
mjg [Fri, 27 Jun 2014 05:04:36 +0000 (05:04 +0000)]
Check lower bound of cmsg_len.
If the passed cm->cmsg_len was below the cmsghdr size, the expression:
datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;
would give a negative result. However, in practice it would not
result in a crash because the kernel would try to obtain garbage fds
for the given process and would error out with EBADF.
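The same lower-bound check can be written in userland against the POSIX `CMSG_*` macros. This is a sketch, and `cmsg_payload_len()` is a hypothetical helper rather than the kernel code. The header size is measured as the distance from the cmsghdr to `CMSG_DATA()`, so the subtraction can never go negative.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return the payload length of a control message, or -1 when cmsg_len
 * is smaller than the header itself. Without the lower-bound check,
 * (caddr_t)cm + cm->cmsg_len - (caddr_t)data would go negative.
 */
static long
cmsg_payload_len(struct cmsghdr *cm)
{
	assert(cm != NULL);
	unsigned char *base = (unsigned char *)cm;
	unsigned char *data = CMSG_DATA(cm);
	size_t hdrlen = (size_t)(data - base);

	if ((size_t)cm->cmsg_len < hdrlen)
		return -1;
	return (long)((size_t)cm->cmsg_len - hdrlen);
}
```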
jhb [Thu, 26 Jun 2014 20:12:38 +0000 (20:12 +0000)]
- Document -b to enable the bvmcons console (but mark it as deprecated
similar to -g.)
- Document -U to set the SMBIOS UUID.
- Add missing options to the usage output and to the manpage Synopsis.
- Don't claim that bvmdebug is amd64-only (it is also a device, not an
option).
mav [Thu, 26 Jun 2014 20:06:37 +0000 (20:06 +0000)]
Simplify statistics calculation.
Instead of trying to guess the size of disk I/O operations (it just won't work
that way for newly added commands, and is equal to data move size for old
ones), account data move traffic. If disk I/Os are that interesting, then
backends have to account and provide that information.
Block backend already exports the information about disk I/Os via devstat,
so having it here too is excessive.
rpaulo [Thu, 26 Jun 2014 19:38:16 +0000 (19:38 +0000)]
MFV illumos r266986:
2915 DTrace in a zone should see "cpu", "curpsinfo", et al
2916 DTrace in a zone should be able to access fds[]
2917 DTrace in a zone should have limited provider access
bz [Thu, 26 Jun 2014 17:10:07 +0000 (17:10 +0000)]
Allow switching between 32bit and 64bit bus width data access at compile
time by setting NF10BMAC_64BIT and using a REGWTYPE #define to set correct
variable and return value widths.
Adjust comments to indicate the 32 or 64bit register widths.
alc [Thu, 26 Jun 2014 16:04:03 +0000 (16:04 +0000)]
Delay the call to crhold() in vm_map_insert() until we know that we won't
have to undo it by calling crfree(). This reduces the total number of calls
by vm_map_insert() to crhold() and crfree() by 45% in my tests.
Eliminate an unnecessary variable from vm_map_insert().
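Here is a sketch of the "take the reference last" pattern. The `cr_hold()`/`cr_free()` helpers, the counters and `map_insert()` are hypothetical stand-ins, not crhold(9)/crfree(9) or the real vm_map_insert(). The set of early-exit paths (a failed range check and coalescing into an existing entry) is likewise an assumption. Because the hold happens only once the new entry is certain to need it, no path has to undo it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a reference-counted credential. */
struct cred {
	int refcount;
	int holds;   /* number of cr_hold() calls, to show the saving */
	int frees;   /* number of cr_free() calls */
};

static void cr_hold(struct cred *cr) { cr->refcount++; cr->holds++; }
static void cr_free(struct cred *cr) { assert(cr->refcount > 0); cr->refcount--; cr->frees++; }

/* The reference is taken only after every check that could bail out. */
static bool
map_insert(struct cred *cr, bool range_ok, bool can_coalesce,
    struct cred **entry_cred)
{
	if (!range_ok)
		return false;      /* no reference taken, nothing to undo */
	if (can_coalesce)
		return true;       /* existing entry already holds a reference */
	cr_hold(cr);
	*entry_cred = cr;
	return true;
}
```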
ume [Thu, 26 Jun 2014 12:12:18 +0000 (12:12 +0000)]
- Exclude loopback address rather than loopback interface.
- style(9)
TODO: When AI_ADDRCONFIG is specified, getaddrinfo() can
be quite slow for systems with many interfaces. We should
have some kernel sysctls to report IPv4/IPv6 status.
mav [Thu, 26 Jun 2014 08:56:36 +0000 (08:56 +0000)]
Add READ BUFFER and improve WRITE BUFFER SCSI commands support.
This puts to use the 512KB per-LUN buffers that were allocated for
Copan-specific processor code but never used. It allows, for example,
testing transport performance and/or correctness without accessing the
media, as supported by the Linux version of sg3_utils.
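For reference, here is a sketch of building the 10-byte READ BUFFER (0x3C) or WRITE BUFFER (0x3B) CDB in data mode (mode 02h). It assumes the SPC layout: buffer ID in byte 2, a 24-bit big-endian buffer offset in bytes 3-5 and a 24-bit length in bytes 6-8. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_BUFFER_OPCODE  0x3C
#define WRITE_BUFFER_OPCODE 0x3B
#define BUFFER_MODE_DATA    0x02   /* "data" mode: plain buffer access */

/* Store a 24-bit big-endian value; SCSI CDB fields are big-endian. */
static void
put_be24(uint8_t *p, uint32_t v)
{
	assert(v <= 0xFFFFFFu);
	p[0] = (uint8_t)(v >> 16);
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)v;
}

/* Build a READ BUFFER or WRITE BUFFER CDB in data mode. */
static void
build_buffer_cdb(uint8_t cdb[10], uint8_t opcode, uint8_t buffer_id,
    uint32_t offset, uint32_t length)
{
	memset(cdb, 0, 10);
	cdb[0] = opcode;
	cdb[1] = BUFFER_MODE_DATA;
	cdb[2] = buffer_id;
	put_be24(&cdb[3], offset);   /* BUFFER OFFSET */
	put_be24(&cdb[6], length);   /* ALLOCATION / PARAMETER LIST LENGTH */
	/* cdb[9] is the CONTROL byte, left zero. */
}
```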
davide [Thu, 26 Jun 2014 05:23:48 +0000 (05:23 +0000)]
Improve r264388 removing namespace pollution previously introduced in
<sys/time.h>. INT64_MAX actually requires __INT64_C() hack to get the
type right on exotic architectures (e.g. on ones with 63-bit ints or longs,
0x7fffffffffffffff is an unsigned int or long). The hardcoded LL suffix is
good enough to avoid these problems for SBT_MAX (it makes the type always
signed long long, without overflow since long long has at least 64 bits).
Many thanks to Bruce Evans for the time he spent explaining this to me.
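The LL-suffix definition can be checked with C11 `_Generic`. The spelling of `SBT_MAX` below follows the description above and is an assumption about the exact header text. `IS_LONG_LONG` is a hypothetical helper. <stdint.h> is included here only so the value can be compared with INT64_MAX; avoiding that include in <sys/time.h> is the point of the change.

```c
#include <assert.h>
#include <stdint.h>   /* only for the INT64_MAX comparison */

/*
 * Hardcoded LL suffix: the constant is always signed long long, which
 * has at least 64 bits, so it neither overflows nor becomes unsigned.
 */
#define SBT_MAX 0x7fffffffffffffffLL

/* Evaluates to 1 when the expression has type long long. */
#define IS_LONG_LONG(x) _Generic((x), long long: 1, default: 0)
```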
adrian [Thu, 26 Jun 2014 04:12:41 +0000 (04:12 +0000)]
Retire IP_RSSCPUID; the right thing to do is query the RSS bucket,
map the bucket to an RSS queue, then map the queue to a CPU ID.
This way the bucket->queue and queue->CPU mapping can change
over time.
Introduce IP_RSSBUCKETID - which instead looks up the RSS bucket.
User applications can then map the RSS bucket to a CPU.
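The two-level lookup can be sketched as two tables, bucket to queue and queue to CPU. The table sizes and contents are hypothetical. The sketch shows why an application should keep the bucket ID: either table can be rewritten later, and the bucket still resolves to the current CPU.

```c
#include <assert.h>

#define NBUCKETS 8   /* hypothetical number of RSS buckets */
#define NQUEUES  4   /* hypothetical number of NIC receive queues */

/* Independent tables; either may change at runtime. */
static unsigned bucket_to_queue[NBUCKETS] = { 0, 1, 2, 3, 0, 1, 2, 3 };
static unsigned queue_to_cpu[NQUEUES]     = { 0, 2, 4, 6 };

/* Resolve a bucket ID (as reported by IP_RSSBUCKETID) to a CPU. */
static unsigned
rss_bucket_to_cpu(unsigned bucket)
{
	assert(bucket < NBUCKETS);
	unsigned queue = bucket_to_queue[bucket];
	assert(queue < NQUEUES);
	return queue_to_cpu[queue];
}
```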
adrian [Thu, 26 Jun 2014 02:49:51 +0000 (02:49 +0000)]
Add another RSS method to query the indirection table entries.
There are 128 indirection table entries, which correspond to the
low 7 bits of the 32 bit RSS hash. Each value will correspond
to an RSS bucket. (Then each RSS bucket currently will map
to a CPU.)
This is a more explicit way of figuring out which RSS bucket
is in each RSS indirection slot. It can be inferred by the other
methods but I'd rather drivers use something more simplified and
explicit.
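The hash-to-bucket step described above masks the low 7 bits of the 32-bit hash to select one of 128 slots, and the slot names a bucket. In this sketch the fill policy (slot i to bucket i % nbuckets) and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RSS_INDIR_ENTRIES 128                     /* 2^7 slots */
#define RSS_INDIR_MASK    (RSS_INDIR_ENTRIES - 1) /* low 7 bits of the hash */

/* Hypothetical fill policy: slot i points at bucket i % nbuckets. */
static void
rss_fill_indirection(unsigned table[RSS_INDIR_ENTRIES], unsigned nbuckets)
{
	assert(nbuckets > 0);
	for (unsigned i = 0; i < RSS_INDIR_ENTRIES; i++)
		table[i] = i % nbuckets;
}

/* Map a 32-bit RSS hash to its bucket through the indirection slot. */
static unsigned
rss_hash_to_bucket(const unsigned table[RSS_INDIR_ENTRIES], uint32_t hash)
{
	return table[hash & RSS_INDIR_MASK];
}
```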
vmm_stat.[ch] -
Modify the counter code in bhyve to allow direct setting of a counter
as opposed to incrementing, and providing a callback to fetch a
counter's value.
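A counter that can be incremented, set directly, or computed by a fetch callback might look like the sketch below. `struct stat_counter` and its functions are hypothetical and are not the vmm_stat API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: stored value, or computed on demand by 'fetch'. */
struct stat_counter {
	uint64_t value;
	uint64_t (*fetch)(void *arg);  /* NULL for stored counters */
	void *arg;
};

static void stat_incr(struct stat_counter *c, uint64_t n) { c->value += n; }
static void stat_set(struct stat_counter *c, uint64_t v)  { c->value = v; }

/* Report the counter, refreshing it first if it has a fetch callback. */
static uint64_t
stat_read(struct stat_counter *c)
{
	assert(c != NULL);
	if (c->fetch != NULL)
		c->value = c->fetch(c->arg);
	return c->value;
}

/* Example callback: reports a 64-bit value owned elsewhere. */
static uint64_t
read_u64(void *arg)
{
	return *(const uint64_t *)arg;
}
```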
jhb [Wed, 25 Jun 2014 20:30:47 +0000 (20:30 +0000)]
Expand r261243 even further and ignore any I/O port resources assigned to
PCI root bridges except for the one known-valid case on x86 where bridges
claim the I/O port registers used for PCI config space access.
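On x86, PCI configuration mechanism #1 uses CONFIG_ADDRESS at 0xCF8 and CONFIG_DATA at 0xCFC-0xCFF. The filter below assumes the known-valid claim is exactly the range 0xCF8-0xCFF. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* I/O ports used by PCI configuration mechanism #1 on x86. */
#define PCI_CONF_PORT_START 0xCF8u
#define PCI_CONF_PORT_END   0xCFFu

/* Keep a root bridge's I/O port range only if it is the config window. */
static bool
keep_root_bridge_ioport(uint32_t start, uint32_t end)
{
	assert(start <= end);
	return start == PCI_CONF_PORT_START && end == PCI_CONF_PORT_END;
}
```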
mav [Wed, 25 Jun 2014 17:02:01 +0000 (17:02 +0000)]
Introduce fine-grained CTL locking to improve SMP scalability.
Split global ctl_lock, historically protecting most of CTL context:
- remaining ctl_lock now protects lists of frontends and backends;
- per-LUN lun_lock(s) protect LUN-specific information;
- per-thread queue_lock(s) protect request queues.
This radically reduces congestion on ctl_lock.
Create multiple worker threads, depending on number of CPUs, and assign
each LUN to one of them. This spreads the load across multiple CPUs,
while still avoiding congestion on the queue and LUN locks.
On 40-core server, exporting 5 LUNs, each backed by gstripe of SATA SSDs,
accessed via 6 iSCSI connections, this change improves peak request rate
from 250K to 680K IOPS.
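Here is a userland pthreads analogue of the lock split and LUN-to-worker pinning. It is a sketch only: CTL uses mtx(9), and the structures, the `lun_id % nworkers` assignment and the worker-count rule are hypothetical. A submission touches only its LUN lock and its worker's queue lock, never a global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16   /* hypothetical cap on worker threads */

struct worker {
	pthread_mutex_t queue_lock;   /* protects this worker's queue */
	unsigned queued;              /* stand-in for the queue itself */
};

struct lun {
	pthread_mutex_t lun_lock;     /* protects LUN-specific state */
	unsigned id;
	unsigned long reqs;
	struct worker *owner;         /* fixed worker for this LUN */
};

/* Hypothetical sizing rule: one worker per CPU, capped. */
static unsigned
worker_count(unsigned ncpus)
{
	assert(ncpus > 0);
	return ncpus < MAX_WORKERS ? ncpus : MAX_WORKERS;
}

/* Pin each LUN to one worker so different LUNs spread over CPUs. */
static void
lun_assign(struct lun *lun, struct worker *workers, unsigned nworkers)
{
	lun->owner = &workers[lun->id % nworkers];
}

/* Take the LUN lock, then the worker's queue lock; never both at once. */
static void
lun_submit(struct lun *lun)
{
	pthread_mutex_lock(&lun->lun_lock);
	lun->reqs++;
	pthread_mutex_unlock(&lun->lun_lock);

	pthread_mutex_lock(&lun->owner->queue_lock);
	lun->owner->queued++;
	pthread_mutex_unlock(&lun->owner->queue_lock);
}
```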
mav [Wed, 25 Jun 2014 16:12:14 +0000 (16:12 +0000)]
Allow iSCSI immediate data to be consumed by several ctl_datamove() calls.
While for the FreeBSD client this is only a minor optimization, the VMware
client doesn't support additional data requests after all data has been
sent once as immediate data.
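To illustrate the idea, the sketch below keeps an offset into the immediate data, so each datamove step hands out the next unused part. Only what is still missing after that would need to be requested from the initiator. `struct cmd_state` and `datamove_step()` are hypothetical and are not the CTL/iSCSI code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-command state for immediate data. */
struct cmd_state {
	const unsigned char *imm;   /* data that arrived with the command */
	size_t imm_len;
	size_t imm_off;             /* how much has been handed out so far */
};

/*
 * One datamove step: copy from the unused part of the immediate data.
 * Returns how many requested bytes are still missing.
 */
static size_t
datamove_step(struct cmd_state *cs, unsigned char *dst, size_t want)
{
	assert(cs->imm_off <= cs->imm_len);
	size_t avail = cs->imm_len - cs->imm_off;
	size_t n = want < avail ? want : avail;

	memcpy(dst, cs->imm + cs->imm_off, n);
	cs->imm_off += n;
	return want - n;
}
```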
marcel [Wed, 25 Jun 2014 15:22:14 +0000 (15:22 +0000)]
* Handle ++x as well as x++ while converting.
* Add special case handling where normal conversion would not work
(some APIs have special names)
* Fix conversion for function calls involving ifnet
Submitted by: Sreekanth Rupavatharam <rupavath@juniper.net>
Obtained from: Juniper Networks, Inc.
royger [Wed, 25 Jun 2014 09:51:08 +0000 (09:51 +0000)]
xen/virtio: fix balloon drivers to not mark pages as WIRED
Prevent the Xen and VirtIO balloon drivers from marking pages as
wired. This prevents them from increasing the system wired page count,
which can lead to mlock failing because of hitting the limit in
vm.max_wired.
In the Xen case make sure pages are zeroed before giving them back to
the hypervisor, or else we might be leaking data. Also remove the
balloon_{append/retrieve} and link pages directly into the
ballooned_pages queue using the plinks.q field in the page struct.
dev/virtio/balloon/virtio_balloon.c:
- Don't allocate pages with VM_ALLOC_WIRED.
dev/xen/balloon/balloon.c:
- Don't allocate pages with VM_ALLOC_WIRED.
- Make sure pages are zeroed before giving them back to the
hypervisor.
- Remove the balloon_entry struct and the balloon_{append/retrieve}
functions and use the page plinks.q entry to link the pages
directly into the ballooned_pages queue.
daichi [Wed, 25 Jun 2014 05:39:30 +0000 (05:39 +0000)]
Fixed an IIC timing issue between the glxiic master and a slave of
peripheral devices. When transmitting (rx) from slave to master, the
nAKC is sometimes delayed. As a result, some slaves fail their
transmission.
davide [Wed, 25 Jun 2014 03:54:02 +0000 (03:54 +0000)]
Continue the crusade towards a dev_clone()-free kernel, removing its
usage from dtrace. The dtrace code already uses cdevpriv(9) since FreeBSD
8, so this change should be quite harmless.
Reviewed by: markj
Approved by: markj
MFC after: never
alc [Wed, 25 Jun 2014 03:30:03 +0000 (03:30 +0000)]
Now that vm_map_insert() sets MAP_ENTRY_GROWS_{DOWN,UP} on the stack entries
that it creates (r267645), we can place the check that blocks map entry
coalescing on stack entries in vm_map_simplify_entry() where it properly
belongs.
imp [Tue, 24 Jun 2014 22:15:27 +0000 (22:15 +0000)]
Make sure that the sub-makes for unwind.h start from the CURDIR
(/usr/src) tree rather than the OBJDIR (/usr/obj) tree. This fixes
broken incremental builds with the canonical MAKESYSPATH workaround
of .../share/mk. This is a gross kludge.
wollman [Tue, 24 Jun 2014 20:23:18 +0000 (20:23 +0000)]
Catch up with many years of changes:
o Document PF_LOCAL as being an alias for PF_UNIX
o Document POSIX standardization of this interface using AF_*
constants rather than PF_* constants, and note the three particular
families which POSIX standardizes.
o Note anticipated POSIX standardization of SOCK_CLOEXEC.
o Delete from the listing protocol families that FreeBSD doesn't support
(and in some cases, like PF_PUP, has never supported).
o Add to the listing some current protocol families that have been
introduced in the last decade or so.
o Document the correspondence of PF_* and AF_* constants.
We should probably change the documentation to make the AF_* constants
primary, but this commit does not do so.
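The correspondences described above can be checked directly. This sketch assumes a system that defines `AF_LOCAL` and `PF_LOCAL` (both FreeBSD and glibc do). It opens the socket with the POSIX `AF_*` spelling. `local_socket_works()` is a hypothetical helper.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * PF_LOCAL is an alias for PF_UNIX, and each PF_* constant equals the
 * matching AF_* constant. POSIX specifies the AF_* names, so this
 * opens the socket with AF_UNIX.
 */
static int
local_socket_works(void)
{
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```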
kib [Tue, 24 Jun 2014 06:55:49 +0000 (06:55 +0000)]
Put the aesni_cipher_setup() and aesni_cipher_process() functions into
the file which is compiled with SSE disabled. The functions set up
the FPU context for the kernel, and compiler optimizations that could
lead to use of XMM registers before fpu_kern_enter(9) is called or
after fpu_kern_leave(9) panic the machine.
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
neel [Tue, 24 Jun 2014 02:02:51 +0000 (02:02 +0000)]
Provide APIs to get the 'lowmem' and 'highmem' sizes directly.
Previously the sizes were inferred indirectly based on the size of the mappings
at 0 and 4GB respectively. This works fine as long as the size of the allocation is
identical to the size of the mapping in the guest's address space. However, if
the mapping is disjoint then this assumption falls apart (e.g., due to the
legacy BIOS hole between 640KB and 1MB).
kib [Mon, 23 Jun 2014 07:37:54 +0000 (07:37 +0000)]
Add FPU_KERN_KTHR flag to fpu_kern_enter(9), which avoids saving FPU
context into memory for the kernel threads which called
fpu_kern_thread(9). This allows the fpu_kern_enter() callers to not
check for is_fpu_kern_thread() to get the optimization.
Apply the flag to padlock(4) and aesni(4). In aesni_cipher_process(),
do not leak FPU context state on error.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kib [Mon, 23 Jun 2014 07:03:47 +0000 (07:03 +0000)]
Use correct names for the flags. MAP_ENTRY_GROWS_* have the same
numerical values as MAP_STACK_GROWS_*, but the former is for entries'
eflags, while the latter are for the cow argument of vm_map_insert().
kan [Mon, 23 Jun 2014 03:45:39 +0000 (03:45 +0000)]
Restore the check for non-NULL dmatag in sndbuf_free.
The sound drivers that use their own buffer management can use sndbuf_setup
and not do any busdma allocation, so the driver will end up with the
managed buffer but no valid dma map and tag for it. Avoid calling
bus_dmamem_free in such cases.
markj [Mon, 23 Jun 2014 02:00:14 +0000 (02:00 +0000)]
Fix some bugs when fetching probe arguments on i386. Firstly, ensure that
the 4 byte-aligned dtrace_invop_callsite can be found and that it
immediately follows the call to dtrace_invop(). Secondly, fix some pointer
arithmetic to account for differences between struct i386_frame and illumos'
struct frame. Finally, ensure that dtrace_getarg() isn't inlined. It works
by following a fixed number of frame pointers to the probe site, so inlining
breaks it.
mjg [Mon, 23 Jun 2014 01:28:18 +0000 (01:28 +0000)]
Tidy up fd-related functions called by do_execve
o assert in each one that fdp is not shared
o remove unnecessary NULL checks - all userspace processes have fdtables
and kernel processes cannot execve
o remove comments about the danger of fd_ofiles getting reallocated - fdtable
is not shared and fd_ofiles could only be reallocated if a new fd was about to be
added, but if that was possible the code would already be buggy as setugidsafety
work could be undone
markj [Mon, 23 Jun 2014 01:10:56 +0000 (01:10 +0000)]
Fix a couple of bugs on amd64 when fetching probe arguments beyond the
first five for probes entered through a UD fault (i.e. FBT probes).
Specifically, handle the fact that dtrace_invop_callsite must be
16 byte-aligned and thus may not immediately follow the call to
dtrace_invop() in dtrace_invop_start(). Also fetch register arguments and
the stack pointer through a struct trapframe instead of a struct reg.
ache [Mon, 23 Jun 2014 00:54:56 +0000 (00:54 +0000)]
Change the suggested way to set MAKESYSPATH as a workaround for broken
incremental builds. The magic ".../share/mk" (search directories up to /)
does not work for e.g. /usr/src/gnu/lib/libgcc because the path
inside is starting from /usr/obj hierarchy and ends up in
/usr/share/mk, not in the /usr/src/share/mk where src.opts.mk is.
IMHO a proper fix for incremental builds is needed urgently.