markj [Sat, 28 Feb 2015 23:35:29 +0000 (23:35 +0000)]
Remove the old DTrace test suite makefile - it was somewhat primitive and
mostly unmaintained, and it has been superseded by the infrastructure added
in r279418.
markj [Sat, 28 Feb 2015 23:30:06 +0000 (23:30 +0000)]
Add infrastructure to integrate the DTrace test suite with Kyua.
For each test category, we generate a script containing ATF test cases for
the tests under that category. Each test case simply runs dtest.pl (the
upstream test harness) with the corresponding test files. The exclude.sh
script is used to record info about tests which should be skipped or are
expected to fail; it is used to generate atf_skip and atf_expect_fail calls.
The genmakefiles.sh script can be used to regenerate the test makefiles when
new tests are brought in from upstream.
The test suite is currently not connected to the build as there is a small
number of lingering test issues which still need to be worked out. In the
meantime however, the test suite can be easily built and installed
manually from cddl/usr.sbin/dtrace/tests.
loos [Sat, 28 Feb 2015 21:01:01 +0000 (21:01 +0000)]
Add ofw_gpiobus_parse_gpios(), a new public function, to parse the gpios
property for devices that don't descend directly from gpiobus.
The parser supports multiple pins, different GPIO controllers and can use
arbitrary names for the property (to match the many linux variants:
cd-gpios, power-gpios, wp-gpios, etc.).
Pass the driver name on ofw_gpiobus_add_fdt_child(). Update gpioled to
match.
A usage example of ofw_gpiobus_parse_gpios() will follow soon.
kib [Sat, 28 Feb 2015 20:37:38 +0000 (20:37 +0000)]
Supposed fix for the hang some SandyBridge mobile CPUs show on AP startup
when x2APIC mode is detected and enabled. Current theory is that switching
the APIC mode while an IPI is in flight might be the issue.
Postpone switching to x2APIC mode until we are guaranteed that all
starting IPIs are already sent and acknowledged. Use aps_ready signal
as an indication that the BSP is done with us.
Tested by: adrian
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
kib [Sat, 28 Feb 2015 19:57:22 +0000 (19:57 +0000)]
Some fixes for fdescfs lookup code.
Do not ever return doomed vnode from lookup. This could happen, if
not checked, since dvp is relocked in the 'looking up ourselves' case.
In the other case, since dvp is relocked, mount point might go away
while fdesc_allocvp() is called. Prevent the situation by doing
vfs_busy() before unlocking dvp. Reuse the vn_vget_ino_gen() helper.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
loos [Sat, 28 Feb 2015 19:02:44 +0000 (19:02 +0000)]
Add a driver for the Maxim DS3231, a low-cost, extremely accurate (+-2PPM)
I2C real-time clock (RTC).
The DS3231 has an integrated temperature-compensated crystal oscillator
(TCXO) and crystal.
DS3231 has a temperature sensor, an independent 32kHz output (which can be
turned on and off by the driver) and another output that can be used as
interrupt for alarms or as a second square-wave output, whose frequency and
operation mode can be set by driver sysctl(8) knobs.
Differential Revision: https://reviews.freebsd.org/D1016
Reviewed by: ian, rpaulo
Tested on: Raspberry pi model B
royger [Sat, 28 Feb 2015 15:21:06 +0000 (15:21 +0000)]
netback: disable GSO
The current GSO implementation in netback is broken and causes errors on the
guest tx path. Until this is fixed, disable GSO in order to have a working
netback.
Sponsored by: Citrix Systems R&D
Discussed with: gibbs
ngie [Sat, 28 Feb 2015 14:57:57 +0000 (14:57 +0000)]
Pad RX copy alignment calculation to avoid illegal memory accesses
The optimization made in r239940 is valid for struct mbuf's current structure
and size in FreeBSD, but hardcodes assumptions about sizes of struct mbuf,
which are unfortunately broken if additional data is added to the beginning of
struct mbuf.
X-MFC note (discussed with rwatson):
This change requires the MPKTHSIZE definition, which is only available after
head@r277203 and will not be MFCed as it breaks mbuf(9) KPI.
A direct commit to stable/10 and merges to other branches to add the necessary
definitions to work with the code as-is will be done to facilitate this MFC.
kib [Sat, 28 Feb 2015 04:19:02 +0000 (04:19 +0000)]
The umtx_lock mutex is used by top-half of the kernel, but is
currently a spin lock. Apparently, the only reason for this is that
umtx_thread_exit() is called under the process spinlock, which puts the
requirement on the umtx_lock. Note that the witness static order list
is wrong for the umtx_lock, umtx_lock is explicitly before any thread
lock, so it is also before sleepq locks.
Change umtx_lock to be a sleepable mutex. For the reason above, the
calls to umtx_thread_exit() are moved from thread_exit() earlier in
each caller, when the process spin lock is not yet taken.
Discussed with: jhb
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
imp [Sat, 28 Feb 2015 00:06:04 +0000 (00:06 +0000)]
Merge latest (commit c8c1b3a77934768c7f7a4a9c10140c8bec529059) files
from the git tree. This merges a lot that we're not using, but there are
too many files to be selective and have a hope of catching everything.
If there are conflicts with the rest of the tree, we'll resolve them
on a case by case basis.
jchandra [Fri, 27 Feb 2015 23:33:53 +0000 (23:33 +0000)]
Add subclass of simplebus for Broadcom XLP
This will override the resource allocation of simplebus, and also
merge the resource allocation code which was in xlp_pci.c.
With this change the SoC devices that do not have proper PCI
resources will be on the FDT simplebus. We can remove
sys/mips/nlm/dev/cfi_pci_xlp.c and sys/mips/nlm/dev/uart_pci_xlp.c.
ken [Fri, 27 Feb 2015 21:35:36 +0000 (21:35 +0000)]
Fix I/O size calculation for pass(4) driver requests and add latency
tracking.
It is important to subtract the residual from the requested
transfer size to see how much data was actually transferred. With
tape drives in particular, it is common to request more data than is
returned.
Also, add I/O latency tracking for CAM requests issued by
cam_periph_runccb().
If the caller supplies a struct devstat, and the I/O is a SCSI or
ATA I/O, we will track the elapsed time to provide I/O latency
statistics for the request.
sys/cam/scsi/cam_periph.c:
In cam_periph_runccb(), subtract the residual when reporting I/O
totals to devstat(9) for SCSI and ATA passthrough requests.
In cam_periph_runccb(), grab the I/O start time and supply
the start time to devstat_end_transaction() so that it can
calculate the elapsed I/O time.
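The residual arithmetic described in this commit can be sketched in plain C.
This is a minimal illustration and not the cam_periph.c code: the function
name bytes_transferred() and its parameters are hypothetical, and the sketch
assumes the residual never exceeds the requested size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes actually moved by a request: the requested transfer size minus
 * the residual (the part the device did not transfer).
 * Assumption for this sketch: resid is never larger than requested.
 */
static uint32_t
bytes_transferred(uint32_t requested, uint32_t resid)
{
	assert(resid <= requested);
	return (requested - resid);
}
```

For a tape read that asks for 65536 bytes and gets a 4096-byte record, the
residual is 61440 and the figure reported to the statistics code would be
4096 rather than 65536.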
imp [Fri, 27 Feb 2015 21:15:12 +0000 (21:15 +0000)]
Make sched_random() return an unsigned number, and use uint32_t
consistently. This also matches the per-cpu pointer declaration
anyway.
This changes the tweak we give to the load from -32..31 to be 0..31
which seems more in line with the rest of the code (- rnd and the -=
64). It should also provide the randomness we need, and may fix a
signedness bug in the old code (it isn't clear that the effect was
intentional as opposed to sloppy, and the right shift of a signed
value is undefined to boot).
This restores sched_balance() behavior when it used random().
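A minimal sketch of an unsigned LCG of the kind this commit describes. It is
not the scheduler source: the names sched_random_sketch() and load_tweak(),
the single global state (the kernel keeps the value per CPU), and the
multiplier and increment are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the per-CPU state; one global value is enough for a sketch. */
static uint32_t rand_state = 1;

/*
 * One linear congruential step. The constants are illustrative assumptions.
 * uint32_t arithmetic wraps modulo 2^32 and the right shift of an unsigned
 * value is fully defined, so no sign bit can leak into the result.
 */
static uint32_t
sched_random_sketch(void)
{
	rand_state = rand_state * 69069u + 5u;
	return (rand_state >> 16);
}

/* Load tweak in the range 0..31, the range named in the commit message. */
static uint32_t
load_tweak(void)
{
	uint32_t tweak = sched_random_sketch() % 32;

	assert(tweak <= 31);
	return (tweak);
}
```

With a signed return type the same modulo could yield negative values, which
is where the old -32..31 style range came from.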
jkim [Fri, 27 Feb 2015 19:05:23 +0000 (19:05 +0000)]
When a file is executed and the path starts with `/', AT_EXECPATH is set
without any translation. If the file is a symbolic link, $ORIGIN may not be
expanded to the actual origin. Use realpath(3) to properly expand $ORIGIN
to its absolute path.
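The idea can be sketched with realpath(3) and dirname(3). This is an
illustration under assumptions, not the committed code: resolve_origin() is a
hypothetical helper, and the caller is assumed to pass a buffer of at least
PATH_MAX bytes.

```c
#define _XOPEN_SOURCE 700	/* expose realpath(3) and PATH_MAX in strict modes */

#include <assert.h>
#include <libgen.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Resolve execpath through any symbolic links, then store the directory
 * that contains the resolved file in origin (at least PATH_MAX bytes).
 * Returns 0 on success and -1 if realpath(3) fails (errno is set by it).
 */
static int
resolve_origin(const char *execpath, char *origin)
{
	char resolved[PATH_MAX];
	const char *dir;

	assert(execpath != NULL && origin != NULL);
	if (realpath(execpath, resolved) == NULL)
		return (-1);
	dir = dirname(resolved);	/* may modify resolved in place */
	memmove(origin, dir, strlen(dir) + 1);
	return (0);
}
```

If /usr/local/bin/tool is a symbolic link to /opt/tool/bin/tool, the helper
yields /opt/tool/bin, which is the directory $ORIGIN is meant to name.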
kib [Fri, 27 Feb 2015 16:43:50 +0000 (16:43 +0000)]
The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs; special nodes on other kinds of
filesystems do not have special properties.
ian [Fri, 27 Feb 2015 16:28:55 +0000 (16:28 +0000)]
Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
jail's creation parameters. This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.
The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.
The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).
There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design. The system
administrator is trusted to set sane values. Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.
kib [Fri, 27 Feb 2015 11:13:46 +0000 (11:13 +0000)]
Since all generations of Intel CPUs have errata which cause a hang on
the cache line flush in the LAPIC page, keep direct map page covering
LAPIC mapped uncached.
To have the (incomplete) check for the LAPIC range in
pmap_invalidate_cache_range() working, lapic_paddr must be initialized
in x2APIC mode too.
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
arybchik [Fri, 27 Feb 2015 07:39:09 +0000 (07:39 +0000)]
sfxge: expect required init_state on data path and in periodic calls
With the patch applied the number of instruction events is 1% less and
number of mispredicted branch events is 5% less under multistream TCP
traffic load close to line rate.
Sponsored by: Solarflare Communications, Inc.
Approved by: gnn (mentor)
adrian [Fri, 27 Feb 2015 04:45:47 +0000 (04:45 +0000)]
Fix kern/196290 - don't announce 11n HTINFO rates if the channel is
configured as 11b.
This came up when debugging other issues surrounding scanning and
channel modes.
What's going on:
* The VAP comes up as an 11b VAP, but on an 11n capable NIC;
* .. it announces HTINFO and MCS rates;
* The AP thinks it's an 11n capable device and transmits 11n frames
to the STA;
* But the STA is in 11b mode, and thus doesn't receive/ACK the frames.
It didn't happen for the ath(4) devices as the AR5416/AR9300 HALs
unconditionally enable MCS frame reception, even if the channel
mode is not 11n. But the Intel NICs are configured in 11b/11a/11g
modes when doing those, even if 11n is enabled and available.
So, don't announce 11n capabilities if the VAP isn't on an 11n
channel when sending management association request / reassociation
request frames.
TODO:
* Lots more testing - 11n should be "upgraded" after association,
and I just want to make sure I haven't broken 11n upgrade.
I shouldn't have - this is only happening for /sending/ association
requests, which APs aren't doing.
Tested:
* ath(4) APs (AR9331, AR7161+AR9280, AR934x)
* AR5416, STA mode
* Intel 5100, STA mode
imp [Fri, 27 Feb 2015 02:56:58 +0000 (02:56 +0000)]
Create sched_rand() and move the LCG code into that. Call this when
we need randomness in ULE. This removes random() call from the
rebalance interval code.
Submitted by: Harrison Grundy
Differential Revision: https://reviews.freebsd.org/D1968
ken [Fri, 27 Feb 2015 02:44:12 +0000 (02:44 +0000)]
Remove an obsolete comment in devstat(3) about the accuracy of the
milliseconds per transaction (DSM_MS_PER_TRANSACTION) calculation.
The comment was accurate many years ago when the kernel didn't
record I/O times on a per-I/O basis, but now that we do collect
that information in most areas, it isn't correct.
The milliseconds per transaction values are correct, assuming the
I/O duration has been recorded.
pfg [Fri, 27 Feb 2015 01:59:29 +0000 (01:59 +0000)]
Hint out check for unsigned negative values.
On FreeBSD socklen_t is unsigned so the check for a negative len
in inet6_opt_append() is redundant and likely to be optimized
away by the compiler.
On other operating systems this is not necessarily so, and
in the future we may want to make it signed, so leave the check in
but place it in a secondary position as a subtle indication
that the bogus check is intentional.
jchandra [Fri, 27 Feb 2015 00:57:09 +0000 (00:57 +0000)]
Improve additional interrupt ACK for Broadcom XLP
Handling some interrupts in XLP (like PCIe and SATA) involves writing to
vendor specific registers as part of interrupt acknowledgement.
This was earlier done with xlp_establish_intr(), but a better solution
is to provide a function xlp_set_bus_ack() that can be used with
cpu_establish_hardintr(). This will allow platform initialization code to
set up these ACKs without changing the standard drivers.
royger [Thu, 26 Feb 2015 16:05:09 +0000 (16:05 +0000)]
xen/intr: fix fallout from r278854
r278854 introduced a race in the event channel handling code. We must make
sure that the pending bit is cleared before executing the filter, or else we
might miss other events that would be injected after the filter has run but
before the pending bit is cleared.
While there, also mask event channels while FreeBSD executes the ithread
bound to that event channel. This prevents Xen from injecting more
interrupts while the ithread has not finished its work.
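The ordering requirement can be modelled in a few lines of C11. This is a
hypothetical single-threaded model, not the Xen interrupt code: the struct,
the function names and the test filter are all invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one event channel. */
struct evtchn_sketch {
	atomic_bool pending;			/* set when an event is injected */
	void (*filter)(struct evtchn_sketch *);	/* handler for the event */
	int handled;				/* filter invocations so far */
};

/*
 * Clear the pending bit first, then run the filter. An event injected at
 * any point after the clear sets the bit again and is seen on the next
 * pass. Clearing after the filter would instead wipe out an event that
 * arrived between the end of the filter and the clear.
 */
static void
dispatch_clear_first(struct evtchn_sketch *ch)
{
	assert(ch != NULL && ch->filter != NULL);
	atomic_store(&ch->pending, false);
	ch->filter(ch);
}

/* Test filter that simulates a new event arriving while it runs. */
static void
filter_with_injection(struct evtchn_sketch *ch)
{
	ch->handled++;
	atomic_store(&ch->pending, true);
}
```

After one dispatch with filter_with_injection() the pending flag is still
set, so the second event is not lost.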
ae [Thu, 26 Feb 2015 15:59:45 +0000 (15:59 +0000)]
When gpart(8) is trying to automatically determine the first available
block of free space after existing partition, take into account
provider's stripeoffset, since the result will be adjusted to this
value.
kib [Thu, 26 Feb 2015 11:02:40 +0000 (11:02 +0000)]
Implement EOI suppression mode, where LAPIC on EOI command for
level-triggered interrupt does not broadcast the EOI message to all
APICs in the system. Instead, interrupt handler must follow LAPIC EOI
with IOAPIC EOI. For modern IOAPICs, the latter is done by writing to
EOIR register. Otherwise, Intel provided Linux with a trick of
temporarily switching the pin config to edge and then back to level.
Detect presence of EOIR register by reading IO-APIC version. The
summary table in the comments was taken from the Linux kernel. For
Intel, newer IO-APICs are only briefly documented as part of the
ICH/PCH datasheet. According to the BKDG and chipset documentation,
AMD LAPICs do not provide EOI suppression, although IO-APICs do
declare version 0x21 and implement EOIR.
The trick to temporarily switch pin to edge mode to clear IRR was tested
on modern chipset, by pretending that EOIR is not present, i.e. by
forcing io_haseoi to zero.
Tunable hw.lapic_eoi_suppression disables the optimization.
Reviewed by: neel
Tested by: pho
Review: https://reviews.freebsd.org/D1943
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
dim [Thu, 26 Feb 2015 07:42:16 +0000 (07:42 +0000)]
Since newer versions of compiler-rt require unwind.h, and we want to use
the copy in libcxxrt for it, fix the arm-specific header to define the
_Unwind_Action type.
dim [Thu, 26 Feb 2015 07:20:05 +0000 (07:20 +0000)]
Make libcxxrt's parsing of DWARF exception handling tables work on
architectures with strict alignment, by using memcpy() instead of
directly reading fields.
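The technique is the standard one for strict-alignment targets. A minimal
sketch follows; read_u32_unaligned() is a hypothetical helper written for
illustration and is not taken from libcxxrt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value in host byte order from a possibly unaligned address.
 * Casting p to uint32_t * and dereferencing it may fault on CPUs with strict
 * alignment; memcpy() has no alignment requirement on either pointer.
 */
static uint32_t
read_u32_unaligned(const unsigned char *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return (v);
}
```

Compilers typically turn the fixed-size memcpy() into a plain load where the
target allows unaligned access, so the portable form usually costs nothing
there.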
jchandra [Thu, 26 Feb 2015 01:53:24 +0000 (01:53 +0000)]
Fix up interrupt definitions for Broadcom XLP
Gather all the IRQ definitions to interrupt.h. Earlier these were in xlp.h
and pic.h. Update the definition of XLP_IRQ_IS_PICINTR to check for last
irq as well.
delphij [Wed, 25 Feb 2015 20:47:25 +0000 (20:47 +0000)]
Explicitly crypt_set_format("des") and bail out if we
can't. This prevents problems if we change the
default crypt(3) algorithm or remove it in the future.
kib [Wed, 25 Feb 2015 16:44:07 +0000 (16:44 +0000)]
For now, disable x2APIC mode when Xen is detected, even if CPU
declares support for it. Newer versions of Xen work fine with x2APIC
code, but e.g. Xen 4.2 delivers GPF on the LAPIC MSR write, despite
x2APIC mode being known to hypervisor.
Discussed with: royger
Sponsored by: The FreeBSD Foundation
kib [Wed, 25 Feb 2015 16:18:26 +0000 (16:18 +0000)]
Propagate errors from _thr_umutex_unlock2 through mutex_unlock_common.
Errors from _thr_umutex_unlock2 should "never happen" in normal
circumstances. If they do, however, return them to the application
so it can fail early and loudly. Hiding the errors will only delay
the inevitable failure, making it harder to find and diagnose.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from: Dell Inc.
PR: 198914
MFC after: 1 week
kib [Wed, 25 Feb 2015 16:17:16 +0000 (16:17 +0000)]
When failing to claim ownership of a umtx_pi, restore the umutex owner
to its previous, unowned state. This avoids compounding an existing
problem of inconsistent ownership.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
Obtained from: Dell Inc.
PR: 198914
MFC after: 1 week
kib [Wed, 25 Feb 2015 16:12:56 +0000 (16:12 +0000)]
When unlocking a contested PI pthread mutex, if the queue of waiters
is empty, look up the umtx_pi and disown it if the current thread owns it.
This can happen if a signal or timeout removed the last waiter from
the queue, but there is still a thread in do_lock_pi() holding a reference
on the umtx_pi. The unlocking thread might not own the umtx_pi in this case,
but if it does, it must disown it to keep the ownership consistent between
the umtx_pi and the umutex.
Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com>
with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc.
Obtained from: Dell Inc.
PR: 198914
hselasky [Wed, 25 Feb 2015 13:58:43 +0000 (13:58 +0000)]
Fix a special case in ip_fragment() to produce a more sensible chain
of packets. When the data payload length excluding any headers, of an
outgoing IPv4 packet exceeds PAGE_SIZE bytes, a special case in
ip_fragment() can kick in to optimise the outgoing payload(s). The
code which was added in r98849 as part of zero copy socket support
assumes that the beginning of any MTU sized payload is aligned to
where a MBUF's "m_data" pointer points. This is not always the case
and can sometimes cause large IPv4 packets, as part of ping replies,
to be split more than needed.
Instead of iterating the MBUFs to figure out how much data is in the
current chain, use the value already in the "m_pkthdr.len" field of
the first MBUF in the chain.
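The difference between the two ways of obtaining the chain length can be
sketched with simplified stand-ins for the mbuf fields. The struct and
function names below are hypothetical and the real struct mbuf is laid out
differently; only the field names m_next, m_len and m_pkthdr.len mirror the
real ones.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the mbuf fields involved. */
struct pkthdr_sketch {
	int len;			/* total length of the whole chain */
};

struct mbuf_sketch {
	struct mbuf_sketch *m_next;	/* next buffer in the chain */
	int m_len;			/* bytes held in this buffer only */
	struct pkthdr_sketch m_pkthdr;	/* meaningful in the first mbuf only */
};

/* Walk the chain and add up the per-buffer lengths. */
static int
chain_len_by_walking(const struct mbuf_sketch *m)
{
	int total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return (total);
}

/* Use the total that the first mbuf of the chain already records. */
static int
chain_len_from_header(const struct mbuf_sketch *m)
{
	assert(m != NULL);
	return (m->m_pkthdr.len);
}
```

Both functions return the same number for a consistent chain; the second one
does so without touching any buffer after the first.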