yongari [Tue, 13 May 2014 05:19:29 +0000 (05:19 +0000)]
Disable TX IP/TCP/UDP checksum offloading for RTL8168C/RTL8168CP.
Previously only TX IP checksum offloading was disabled but it's
reported that TX checksum offloading for UDP datagrams with IP
options also generates corrupted frames. Reporter's controller is
RTL8168CP but I guess RTL8168C also has the same issue since it
should share the same core.
neel [Mon, 12 May 2014 23:35:10 +0000 (23:35 +0000)]
abort(3) the process in response to a VMEXIT_ABORT. This usually happens in
response to an unhandled VM exit or an unexpected error so a core is useful.
ian [Mon, 12 May 2014 13:05:03 +0000 (13:05 +0000)]
Interrupts need to be disabled on entry to cpu_sleep() for ARM. Given
that and the need to be in a critical section when switching to idleclock
mode for event timers, use spinlock_enter()/exit() to achieve both needs.
The ARM WFI (wait for interrupt) instruction blocks until an interrupt is
asserted, and it will unblock even if interrupts are masked, and it will
unblock immediately if an interrupt is already pending. It is necessary
to execute it with interrupts disabled, otherwise the interrupt that
should unblock it may occur and be serviced just prior to executing the
instruction. At that point the system is inappropriately asleep until
the next timer tick or some other random interrupt happens.
In general, interrupts need to be disabled continuously from the time the
decision is made that there is no work to be done and sleeping is needed
until actually going to sleep, to avoid a race where handling a new
interrupt changes the basis for deciding there is no work to be done.
tuexen [Mon, 12 May 2014 09:46:48 +0000 (09:46 +0000)]
Disable TX checksum offload for UDP-Lite completely. It wasn't used for
partial checksum coverage, but even for full checksum coverage it doesn't
work.
This was discussed with Kevin Lo (kevlo@).
nwhitehorn [Mon, 12 May 2014 02:56:27 +0000 (02:56 +0000)]
Repair some races in IPI handling:
1. Make sure IPI mask is set before sending the IPI
2. Operate atomically on PS3 PIC outstanding interrupt list
3. Make sure IPIs are EOI'ed before, not after, processing. Without this,
a second IPI could be sent partway through processing the first one,
get erroneously acknowledged by the EOI to the first, and be lost. In
particular in the case of smp_rendezvous(), this can be fatal.
In combination, this makes the PS3 boot SMP again. It probably also fixes
some latent bugs elsewhere.
imp [Sun, 11 May 2014 23:22:32 +0000 (23:22 +0000)]
Attempt to walk a fine line between current usage (/usr/ports which
does an out-of-tree build without setting MAKESYSPATH) and recently
added requirements (JIRA's building the modules in a non-standard
layout). So, when MAKESYSPATH is defined, trust that it will do the
right thing (to catch the JIRA use case). When it isn't defined,
assume a standard FreeBSD tree and reach over to grab bsd.mkopt.mk (to
fix the /usr/ports use case). Both camps cannot be appeased otherwise,
so we have this kludge until it can be sorted out.
jilles [Sun, 11 May 2014 21:21:14 +0000 (21:21 +0000)]
accept(),accept4(): Don't set *addrlen = 0 on [ECONNABORTED].
If the underlying protocol reported an error (e.g. because a connection was
closed while waiting in the queue), this error was also indicated by
returning a zero-length address. For all other kinds of errors (e.g.
[EAGAIN], [ENFILE], [EMFILE]), *addrlen is unmodified and there are
successful cases where a zero-length address is returned (e.g. a connection
from an unbound Unix-domain socket), so this error indication is not
reliable.
As reported in Austin Group bug #836, modifying *addrlen on error may cause
subtle bugs if applications retry the call without resetting *addrlen.
dim [Sun, 11 May 2014 21:07:00 +0000 (21:07 +0000)]
Allow libstdc++ and libsupc++ to compile with clang again, after the
bsd.*.mk infrastructure changes. Apparently, you must now modify
CXXFLAGS *before* including bsd.lib.mk, or your changes will be lost.
alc [Sun, 11 May 2014 17:41:29 +0000 (17:41 +0000)]
With the new-and-improved vm_fault_copy_entry() (r265843), we can always
avoid soft page faults when adding write access to user wired entries in
vm_map_protect(). Previously, we only avoided the soft page fault when
the underlying pages were copy-on-write. In other words, we avoided the
page faults that might sleep on page allocation, but not the trivial
page faults to update the physical map.
nwhitehorn [Sun, 11 May 2014 16:49:31 +0000 (16:49 +0000)]
Fix interrupt allocation after changes to nexus. This makes PS3 boot
multiuser again (this commit comes from the PS3 itself). Some problems
still exist with SMP, apparently, as I had to boot a non-SMP kernel to
get here.
cperciva [Sun, 11 May 2014 10:32:58 +0000 (10:32 +0000)]
In cf_get_method, when we don't already know what clock speed the CPU is
running at, guess the nearest value instead of looking for a value within
25 MHz of the observed frequency.
Prior to this change, if a system booted with Intel Turbo Boost enabled,
the dev.cpu.0.freq sysctl was nonfunctional, since the ACPI-reported
frequency for Turbo Boost states does not match the actual clock frequency
(and thus no levels were within 25 MHz of the observed frequency) and the
current performance level is read before a new level is set.
MFC after: 3 days
Relnotes: Bug fix in power management on CPUs with Intel Turbo Boost
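The nearest-value guess described above can be sketched as follows. This
is an illustration only, not the cf_get_method code: the function name
nearest_level() and the array-of-MHz representation are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return the index of the level whose nominal
 * frequency (in MHz) is closest to the observed frequency. Unlike a
 * fixed +/-25 MHz window, this always yields some level.
 */
size_t
nearest_level(const int *levels_mhz, size_t n, int observed_mhz)
{
	size_t best = 0;

	assert(levels_mhz != NULL && n > 0);
	for (size_t i = 1; i < n; i++) {
		if (abs(levels_mhz[i] - observed_mhz) <
		    abs(levels_mhz[best] - observed_mhz))
			best = i;
	}
	return (best);
}
```

With a fixed window, an observed frequency far above every reported level
matches nothing; picking the minimum distance always selects a level.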
hselasky [Sun, 11 May 2014 08:17:46 +0000 (08:17 +0000)]
Optimise host mode data roundtrip time. When BULK data is submitted to
the main processing queue, clear the NAK counter for any associated
BULK or CONTROL transfers and poll the endpoint(s) for 1 millisecond
at 125us rate interval, before going into slow, 10ms, NAK polling mode
again. This has the effect that typical ping-pong protocols respond
quicker when initiated from the USB host.
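A rough sketch of the two-speed polling policy, assuming it can be modelled
by time elapsed since the last BULK submission. The real driver tracks a
NAK counter; the function and macro names below are hypothetical.

```c
#include <stdint.h>

#define	FAST_POLL_US	125u	/* fast polling interval */
#define	FAST_WINDOW_US	1000u	/* poll fast for 1 millisecond */
#define	SLOW_POLL_US	10000u	/* slow NAK polling interval */

/*
 * Hypothetical model: given the microseconds elapsed since BULK data
 * was last submitted, return the delay until the next endpoint poll.
 */
uint32_t
next_poll_interval_us(uint32_t elapsed_us)
{
	return (elapsed_us < FAST_WINDOW_US ? FAST_POLL_US : SLOW_POLL_US);
}
```

Under this model a reply arriving shortly after a request is picked up
within about 125us instead of waiting for the next 10ms slow poll.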
nwhitehorn [Sun, 11 May 2014 05:49:35 +0000 (05:49 +0000)]
Move the PS3 framebuffer console to use vt instead of syscons and adjust
GENERIC64 for PowerPC to use vt with it.
Much to my chagrin, PS3 support seems to have bitrotted somewhat since the
last time I tried it. ehci panics on attach and interrupt handling seems
to be faulty. This should be fixed soon...
ian [Sun, 11 May 2014 04:24:57 +0000 (04:24 +0000)]
Add cpu_l2cache_drain_writebuf(), use it to implement generic_bs_barrier().
On modern ARM SoCs the L2 cache controller sits between the CPU and the
AXI bus, and most on-chip memory-mapped devices are on the AXI bus. We
map the device registers using the 'Device' memory attribute, which means
the memory is not cached, but writes to it are buffered. Ensuring that a
write has made it all the way to a device may require that the L2
controller take some action.
There is currently only one implementation of the new function, for the
PL310 cache controller. It invokes a function that the controller
manual calls "cache sync" but it actually has nothing to do with cache at
all, it triggers a drain of all pending store buffer writes and it blocks
until they complete.
The sheeva and xscale L2 controllers (which predate the concept of Device
memory) don't seem to have a corresponding function. It appears that the
standard armv5 drain_writebuf function includes draining all the way
through the L2 controller.
nwhitehorn [Sun, 11 May 2014 02:18:17 +0000 (02:18 +0000)]
Use vt(4) by default on 32-bit PowerPC now that it is fully functional and
fast. 64-bit PowerPC will follow along once the PS3 framebuffer driver is
adapted.
nwhitehorn [Sun, 11 May 2014 02:16:08 +0000 (02:16 +0000)]
Port over mmap routine from syscons. This lets X11 work on PowerPC with vt.
The last obstacle to switching PowerPC entirely to vt is that the Playstation 3
framebuffer driver needs to be ported over. This only applies for powerpc64,
however.
nwhitehorn [Sun, 11 May 2014 01:58:56 +0000 (01:58 +0000)]
Make ofwfb not be painfully slow. This reduces the time for a verbose boot
on my G4 iBook by more than half. Still 10% slower than syscons, but that's
much better than a factor of 2.
The slowness had to do with pathological write performance on 8-bit
framebuffers, which are almost universally used on Open Firmware systems.
Writing 1 byte at a time, potentially nonconsecutively, resulted in many
extra PCI write cycles. This patch, in the common case where it's writing
one or several characters in an 8x8 font, gangs the writes together into
a set of 32-bit writes. This is a port of r143830 to vt(4).
The EFI framebuffer is also extremely slow, probably for the same reason,
and the same patch will likely help there.
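The ganging idea can be sketched like this for one row of an 8-pixel-wide
glyph on an 8-bit framebuffer. This is not the ofwfb code; the function
name, the bit order of the glyph row (leftmost pixel in bit 7), and the
4-byte-aligned destination are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: expand one glyph row into eight 8-bit pixels
 * and store them with two 32-bit writes instead of eight byte writes.
 */
void
draw_glyph_row(volatile uint32_t *dst, uint8_t row, uint8_t fg, uint8_t bg)
{
	uint8_t px[8];
	uint32_t w[2];

	assert(dst != NULL);
	for (int i = 0; i < 8; i++)
		px[i] = (row & (0x80 >> i)) ? fg : bg;
	/* Pack through memcpy so the in-memory byte order is preserved. */
	memcpy(w, px, sizeof(w));
	dst[0] = w[0];		/* pixels 0..3 */
	dst[1] = w[1];		/* pixels 4..7 */
}
```

An 8x8 glyph then costs 16 word writes rather than 64 byte writes.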
nwhitehorn [Sun, 11 May 2014 01:19:55 +0000 (01:19 +0000)]
Make ofwfb actually work again. Apparently the API it was written against
still exists but is now silently ignored by the VT core. At least xboxfb
needs similar changes.
ian [Sat, 10 May 2014 20:03:03 +0000 (20:03 +0000)]
When mapping device memory, use PTE_DEVICE rather than PTE_NOCACHE.
On armv4 these are defined as synonyms right now, but it's a bit ambiguous
what NOCACHE means (is buffering/write-combining also enabled or not?); this
is a first step towards replacing PTE_NOCACHE with a less ambiguous name.
kib [Sat, 10 May 2014 19:08:07 +0000 (19:08 +0000)]
Invalidate the cache for a named POSIX semaphore when, on open, the
actual file storing the semaphore object differs from the file
created on the first open. Store the file st_dev and st_ino members
of the struct stat in the semaphore structure on open, and compare
them with the attributes of the opened file to detect unlink and
re-creation.
This fixes an issue of sem_unlink(3) failing to flush the named entry
in the semaphore list for the current or remote process, making
sem_unlink(3) operate incorrectly if the unlinked semaphore is
still open.
jilles [Sat, 10 May 2014 17:42:21 +0000 (17:42 +0000)]
sh: Don't discard getopts state on unknown option or missing argument.
When getopts finds an invalid option or a missing option-argument, it should
not reset its state and should set OPTIND as normal. This is an old ash bug
that was fixed long ago in dash. Our behaviour now matches most other
shells.
kib [Sat, 10 May 2014 17:03:33 +0000 (17:03 +0000)]
For the upgrade case in vm_fault_copy_entry(), when the entry does not
need COW and is writeable (i.e. becoming writeable due to the
mprotect(2) operation), do not create a new backing object for the
entry. The caller of the function is vm_map_protect(), the call is
made to ensure that the wired entry has all pages resident and wired in
the top level object and to enable the write. We might need to copy a
read-only page from some backing object into the top object or remap
the page with the write allowed.
This fixes the issue with mishandling of the swap accounting when a
read-only wired mapping is upgraded to write-enabled after fork. The
previous code path did not account for the new object, but its creation
is redundant anyway and the change provides an optimization for the
uncommon situation.
Reported by: markj
Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
imp [Sat, 10 May 2014 16:39:15 +0000 (16:39 +0000)]
bitrotted compat cruft removal:
o KMODDEPS warning is 15 years stale. Remove it.
o MK_CTF will always be defined now, so no need to test to see if it
is defined.
o no need to define MK_FORMAT_EXTENTIONS if undefined anymore.
imp [Sat, 10 May 2014 16:39:08 +0000 (16:39 +0000)]
grep -L returns non-zero status if none of the files had the pattern
in them. This is often the case, so just ignore the return
code. Actual errors that are found will also be detected downstream in
the rare cases where the return code is 2 instead of 1.
imp [Sat, 10 May 2014 16:39:00 +0000 (16:39 +0000)]
Sprinkle a few more .WAITs into the mix after csu, libc, msun and the
early built libraries. This should be sufficient for most cases and
has eliminated the issues I've seen with high -j builds. Races likely
still remain, but this knocks the problem down a notch.
imp [Sat, 10 May 2014 16:38:45 +0000 (16:38 +0000)]
We haven't done anything with _UPGRADING in ~forever (was present, but
not needed, in FreeBSD 6.x, and has been absent in newer versions).
This was needed to upgrade from 3.x -> 4.x, once upon a time.
imp [Sat, 10 May 2014 16:38:09 +0000 (16:38 +0000)]
Simplify clang ifdefs in the kernel a bit. Introduce
CFLAGS.${COMPILER_TYPE} to mirror userland. Be explicit about which
compiler needs something (not clang isn't necessarily gcc in the
future).
imp [Sat, 10 May 2014 16:38:03 +0000 (16:38 +0000)]
Eliminate EARLY_BUILD flag. It is redundant and means MK_CLANG_FULL=no
and MK_LLDB=no, so set those explicitly (now that we can do
that). Simplify tests for these variables as well, since we know they
will always be defined regardless of the phase of the build.
imp [Sat, 10 May 2014 16:37:53 +0000 (16:37 +0000)]
Migrate NO_WARN to MK_WARN. Support legacy NO_WARN usage. Remove a
check for EARLY_BUILD because it isn't necessary (MK_WARN=no will
always be defined for that).
imp [Sat, 10 May 2014 16:37:44 +0000 (16:37 +0000)]
Support, to the extent we generate proper command lines, compiling
with clang 3.3. Useful for test building -current on a -stable system
in individual directories. Potentially useful if we ever want to
support, say, gcc 4.8 or 4.9's new warnings when building with an
external toolchain (but such support not yet committed). Document
the bsd.compiler.mk interface.
imp [Sat, 10 May 2014 16:37:39 +0000 (16:37 +0000)]
Optionally allow building the historical FreeBSD make program and
install it as fmake. This defaults to no. This should be viewed as the
first step towards eventual migration of this historic code to ports
and removal from the tree.
kib [Sat, 10 May 2014 16:36:13 +0000 (16:36 +0000)]
When printing the map with the ddb 'show procvm' command, do not dump
page queues for the backing objects. The queues are huge and clutter
the display, when mostly the map entries and their backing storage are
of interest.
The page queues can be seen with ddb 'show object' command.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
mav [Sat, 10 May 2014 15:21:37 +0000 (15:21 +0000)]
Comment out some pointless device open/close around reading device IDs.
The FreeBSD ZFS port, unlike OpenSolaris, does not use device IDs and does
not implement the respective devid_*() functions. It is pointless to open
devices just to close them again immediately.
hselasky [Sat, 10 May 2014 07:37:32 +0000 (07:37 +0000)]
Optimise host channel disabling:
- For non-periodic traffic we only need to wait two SOFs before
disabling the channel.
- Make sure we release the TX FIFO tracking level after the host
channel is disabled.
- Make sure the host channel state gets reset/disabled initially.
- Two minor code style changes.
adrian [Sat, 10 May 2014 00:53:36 +0000 (00:53 +0000)]
Add in support to optionally pin the swi threads.
Under enough load, the swi's can actually be preempted and migrated
to other currently free cores. When doing RSS experiments, this led
to the per-CPU TCP timers not lining up any more with the RX CPU said
flows were ending up on, leading to increased lock contention.
Since there was a little pushback on flipping them on by default,
I've left the default at "don't pin."
The other less obvious problem here is that the default swi
is also the same as the destination swi for CPU #0. So if one
pins the swi on CPU #0, there's no default floating swi.
A nice future project would be to create a separate swi for
the "default" floating swi, as well as per-CPU swis that are
(optionally) pinned.
Tested:
* parallel TCP tests (2 x 1g unfortunately for now);
CPU: Intel(R) Xeon(R) CPU E5-2650
Note:
This is based on some initial investigation into RSS/TCP stack lock
contention on FreeBSD-HEAD whilst at Netflix in January 2014.
imp [Fri, 9 May 2014 21:11:27 +0000 (21:11 +0000)]
Introduce kern.opts.mk to hold all the options for kernel module
builds. Include this in the right places. Make src.opts.mk optional so
that modules can be built outside of the tree in the ports system.
ian [Fri, 9 May 2014 19:14:34 +0000 (19:14 +0000)]
Call idcache_inv_all from the AP core entry code before turning on the MMU.
Also, enable instruction and branch caches, which should be safe now that
they're properly initialized/invalidated first.
hselasky [Fri, 9 May 2014 14:23:06 +0000 (14:23 +0000)]
Multiple DWC OTG host mode related fixes and improvements:
- Rework how we allocate and free USB host channels, so that we only
allocate a channel if there is a real packet going out on the USB
cable.
- Use BULK type for control data and status, apparently due to
instabilities in the HW.
- Split FIFO TX levels into one for the periodic FIFO and one for the
non-periodic FIFO.
- Use correct HFNUM mask when scheduling host transactions. The HFNUM
register does not count the full 16-bit range.
- Correct START/COMPLETION slot for TT transactions. For INTERRUPT and
ISOCHRONOUS type transactions the hardware always respects the ODDFRM
bit, which means we need to allocate multiple host channels when
processing such endpoints, to not miss any so-called complete split
opportunities.
- When doing ISOCHRONOUS OUT transfers through a TT, send all data
payload in a single ALL-burst. This decreases the likelihood of
isochronous data underruns.
- Fixed unbalanced unlock in case of "dwc_otg_init_fifo()" failure.