Alexander Motin [Fri, 1 May 2009 20:53:37 +0000 (20:53 +0000)]
Small addition to r191720.
Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.
Alexander Motin [Fri, 1 May 2009 17:05:49 +0000 (17:05 +0000)]
Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.
Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.
On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.
By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.
This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.
Dmitry Chagin [Fri, 1 May 2009 15:36:02 +0000 (15:36 +0000)]
Reimplement futexes.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.
New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.
Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.
Bruce M Simpson [Fri, 1 May 2009 11:05:24 +0000 (11:05 +0000)]
Limit scope of acquisition of INP_RLOCK for multicast input filter
to the scope of its use, even though this may thrash the lock if
the INP is referenced for other purposes.
Alexander Motin [Fri, 1 May 2009 08:03:46 +0000 (08:03 +0000)]
Improve kernel dumping reliability for busy ATA channels:
- Generate fake channel interrupts even if channel busy with previous
request to let it finish. Without this, dumping requests were just queued
and never processed.
- Drop pre-dump requests queue on dumping. ATA code, working in dumping
(interruptless) mode, unable to handle long request queue. Actually, to get
coherent dump we anyway should do as few unrelated actions as possible.
Pyun YongHyeon [Fri, 1 May 2009 03:24:03 +0000 (03:24 +0000)]
Separate multicast filtering of SysKonnect GENESIS and Marvell
Yukon from common multicast handling code. Yukon uses hash-based
multicast filtering(big endian form) but GENESIS uses perfect
multicast filtering as well as hash-based one(little endian form).
Due to the differences of multicast filtering there is no much
sense to have a common code.
o Remove sk_setmulti() and introduce sk_rxfilter_yukon(),
sk_rxfilter_yukon() that handles multicast filtering setup.
o Have sk_rxfilter_{yukon, genesis} handle promiscuous mode and
nuke sk_setpromisc(). This simplifies ioctl handler as well as
giving a chance to check validity of Rx control register of
Yukon.
o Don't reinitialize controller when IFF_ALLMULTI flags is changed.
o Nuke sk_gmchash(), it's not needed anymore.
o Always reconfigure Rx control register whenever a new multicast
filtering condition is changed. This fixes multicast filtering
setup on Yukon.
Jung-uk Kim [Thu, 30 Apr 2009 22:10:04 +0000 (22:10 +0000)]
- Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.
Submitted by: pluknet (pluknet at gmail dot com)[1]
Jung-uk Kim [Thu, 30 Apr 2009 17:35:44 +0000 (17:35 +0000)]
General sleep state change clean up.
- Probe supported sleep states from acpi_attach() just once and do not
call AcpiGetSleepTypeData() again. It is redundant because
AcpiEnterSleepStatePrep() does it any way.
- Treat UNKNOWN sleep state as NONE, i.e., "do nothing", and remove obscure
NONE state (ACPI_S_STATES_MAX + 1) to avoid confusions.
- Do not set unsupported sleep states as default button/switch events.
If the default sleep state is not supported, just set it as UNKNOWN/NONE.
- Do not allow sleep state change if the system is not fully up and running.
This should prevent entering S5 state multiple times, which causes strange
behaviours later.
- Make sleep states case-insensitive when they are used with sysctl(8).
For example,
Marko Zec [Thu, 30 Apr 2009 13:36:26 +0000 (13:36 +0000)]
Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:
1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:
3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.
4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.
5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.
6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.
Warner Losh [Thu, 30 Apr 2009 01:24:53 +0000 (01:24 +0000)]
Report the next directory being scanned when a ^T is pressed (or any
SIGINFO). Provides some progress report for the impatient. This
won't report that we're blocking in our walk due to disk/network
problems, however. There's no really good way to report that
condition that I'm aware of...
Alexander Motin [Wed, 29 Apr 2009 21:17:18 +0000 (21:17 +0000)]
Add experimental support for SATA interface power management.
Feature is controlled by hint.ata.X.pm_level tunable:
0 - PM disabled, old behaviour, default.
1 - device is allowed to initiate PM state change, host is passive.
2 - host initiates PARTIAL state transition every time port is idle.
3 - host initiates SLUMBER state transition every time port is idle.
PARTIAL state has up to 100us (50us for me) wakeup latency, but for my
ICH8M saves 0.5W of power per drive. SLUMBER state has up to 10ms (3.5ms
for me) wakeup latency, but saves 0.8W of power.
Modes 2 and 3 are implemented only for AHCI driver now.
Interface power management is incompatible with device presence detection
(host receives no signal from drive, so unable to monitor it), so later is
disabled when PM is used.
Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).
Bruce M Simpson [Wed, 29 Apr 2009 19:19:13 +0000 (19:19 +0000)]
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
Warner Losh [Wed, 29 Apr 2009 18:08:18 +0000 (18:08 +0000)]
Implement ^T support for rm: now it will report the next file it
removes when you hit ^T. This is similar to what's done for cp. The
signal handler and type definitions for "info" were borrowed directly
from cp.
Bruce M Simpson [Wed, 29 Apr 2009 11:15:58 +0000 (11:15 +0000)]
Stub out IN6_LOOKUP_MULTI() for GETSPI requests, for now.
This has the effect that IPv6 multicast traffic won't trigger
an SPI allocation when IPSEC is in use, however, this obviously
needs to stomp on locks, and IN6_LOOKUP_MULTI() is about to go away.
This definitely needs to be revisited before 8.x is branched as
a release branch.
Bruce M Simpson [Wed, 29 Apr 2009 10:22:44 +0000 (10:22 +0000)]
Add IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT, in6addr_linklocal_allv2routers
for use by MLDv2.
Add IPv6 SSM socket layer membership vector size constants and
tree bounds.
Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes
straight to the RFC 3678 socket options.
Bruce M Simpson [Wed, 29 Apr 2009 10:12:01 +0000 (10:12 +0000)]
Fix a problem whereby enqueued IGMPv3 filter list changes would be
incorrectly output, if the RB-tree enumeration happened to reuse the
same chain for a mode switch: that is, both ALLOW and BLOCK records
were appended for the same group, in the same mbuf packet chain.
This was introduced during an mbuf chain layout bug fix involving
m_getptr(), which obviously cannot count from offset 0 on the
second pass through the RB-tree when serializing the IGMPv3
group records into the pending mbuf chain.
Bruce M Simpson [Wed, 29 Apr 2009 09:58:31 +0000 (09:58 +0000)]
Fix an obvious bug in getsourcefilter()'s use of struct __msfilterreq;
the kernel will return in msfr_nsrcs the number of source filters
in-mode for a given multicast group.
However, the filters themselves were never copied out, as the libc
function clobbers this field with zero, causing the kernel to assume
the provided vector of struct sockaddr_storage has zero length.
This bug would only affect users of SSM multicast, which is shimmed
in 7.x.
Picked up during mtest(8) refactoring.
Bruce M Simpson [Wed, 29 Apr 2009 09:54:33 +0000 (09:54 +0000)]
Grab KTR_SPARE1 and KTR_SPARE5 for KTR_INET and KTR_INET6
respectively, as placeholder for future use of CTR() by
the networking code (MLDv2 will be going in shortly).
Mark KTR_SPARE* as being used in fact by cxgb (picked up
on a 'make universe' run).
Jeff Roberson [Wed, 29 Apr 2009 06:54:40 +0000 (06:54 +0000)]
- Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
- Remove the cpu_cores/cpu_logical detection from identcpu.
- Describe the layout of the system in cpu_mp_announce().
Jeff Roberson [Wed, 29 Apr 2009 03:15:43 +0000 (03:15 +0000)]
- Remove the bogus idle thread state code. This may have a race in it
and it only optimized out an ipi or mwait in very few cases.
- Skip the adaptive idle code when running on SMT or HTT cores. This
just wastes cpu time that could be used on a busy thread on the same
core.
- Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use
CG_FLAG_THREAD to mean SMT or HTT.
Prevent a superuser inside a jail from modifying the dedicated
root cpuset of that jail.
Processes inside the jail will still be able to change child sets.
A superuser outside of a jail will still be able to change the jail cpuset
and thus limit the number of cpus available to the jail.
Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman)
PR: kern/134050
Reviewed by: jeff
MFC after: 3 weeks
X-MFC: backout r191596
Marius Strobl [Tue, 28 Apr 2009 20:49:47 +0000 (20:49 +0000)]
- Change some softc members to be unsigned where more appropriate.
- Add some missing const.
- Move the size of the window spun by the registers to the softc
as neither using va_mem_size for this nor va_mem_base for the
start of the bus addresses is appropriate.
Change the test at the beginning of strncmp(), from being if (len - 1) < 0
to if (len == 0).
The length is supposed to be unsigned, so len - 1 < 0 won't happen except
if len == 0 anyway, and it would return 0 when it shouldn't, if len was
> INT_MAX.
acpi: do not run resume/backout code when entering S0/S5 states
This change adds (possibly redundant) early check for invalid
state input parameter (including S0). Handling of S5 request
is reduced to simply calling shutdown_nice(). As a result
control flow of acpi_EnterSleepState is somewhat simplified
and resume/backout half of the function is not executed
for S5 (soft poweroff) request and invalid state requests.
Note: it seems that shutdown_nice may act as nop when initproc
is already initialized (to grab pid of 1), but init process is in
"pre-natal" state.
Don't require packet to match a route (any route; this information wasn't
used anyway, so a typical workaround was to add a dummy route) if it's going
to be sent through IPSec tunnel.
Ruslan Ermilov [Tue, 28 Apr 2009 09:45:32 +0000 (09:45 +0000)]
Added (pre|post)(start|stop) jail hooks. These can be used to run
arbitrary commands (outside the jail) associated with said events,
e.g. to bring up/down CARP interfaces representing services run in
jails.
Tim Kientzle [Mon, 27 Apr 2009 22:39:43 +0000 (22:39 +0000)]
Document the liblzma support.
Unfortunately, liblzma itself is GPLed, so unlikely to become part of
the FreeBSD base system.
However, the core lzma compression/decompression code is public
domain, so it should be feasible for someone to create a compatible
library without the GPL strings.
key_gettunnel() has been unsued with FAST_IPSEC (now IPSEC).
KAME had explicit checks at one point using it, so just hide it behind
#if 0 for now until we are sure if we can completely dump it or not.
Tim Kientzle [Mon, 27 Apr 2009 20:09:05 +0000 (20:09 +0000)]
Merge r990,r1044 from libarchive.googlecode.com:
read_support_format_raw() allows people to exploit libarchive's
automatic decompression support by simply stubbing out the
archive format handler.
The raw handler is not enabled by support_format_all(), of course.
It bids 1 on any non-empty input and always returns a single
entry named "data" with no properties set.
Tim Kientzle [Mon, 27 Apr 2009 19:30:09 +0000 (19:30 +0000)]
Merge r1061,r1062,r1063 from libarchive.googlecode.com:
Fix reading big-endian binary cpio archives, and add a test.
While I'm here, add a note about Solaris ACL extension for cpio,
which should be relatively straightforward to support.
Thanks to: Edward Napierala, who sent me a big-endian cpio archive
from a Solaris system he's been playing with.
Pointy hat: me
Tim Kientzle [Mon, 27 Apr 2009 18:55:22 +0000 (18:55 +0000)]
Merge r1032 from libarchive.googlecode.com:
Make test_fuzz a bit more sensitive by actually reading the body
of each entry instead of skipping it.
While I'm here, move the "UnsupportedCompress" macro into the
only file that still uses it.
Tim Kientzle [Mon, 27 Apr 2009 18:39:55 +0000 (18:39 +0000)]
Merge r1054,r1060 from libarchive.googlecode.com:
* assertEqualMem() now takes void * arguments
* Be a little smarter about what we hexdump when assertEqualMem() fails