attilio [Tue, 31 May 2011 20:48:58 +0000 (20:48 +0000)]
Fix KTR_CPUMASK in order to accept a string representing a cpuset_t.
This introduces all the underlying support for making this possible (via
the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64
implements its own assembly primitives for tracing events and needs to
properly check it. Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.
mav [Tue, 31 May 2011 09:22:52 +0000 (09:22 +0000)]
Add quirks to hint 4K physical sector (Advanced Format) for ATA disks not
reporting it properly (none? of known disks now).
Hitachi and WDC AF disks seem to be identifiable more or less formally.
For Seagate and Samsung enumerate some found models/series.
For other disks it can be forced with kern.cam.ada.X.quirks=1 tunable.
pjd [Tue, 31 May 2011 07:02:49 +0000 (07:02 +0000)]
Imagine a situation where a security problem is found in a setuid binary.
The user upgrades his system to fix the problem, but if he has any ZFS snapshots
of the file system which contains the problematic binary, any user can mount the
snapshot and execute the vulnerable binary.
Prevent this from happening by always mounting snapshots with setuid turned off.
bz [Tue, 31 May 2011 00:25:52 +0000 (00:25 +0000)]
No longer set an IPv4 loopback address by default in defaults/rc.conf.
If not specified, network.subr will add it automatically if we have
INET support (1).
In network.subr only call the address family up/down functions
if the respective AF is available.
Switch to new kern.features variables for inet and inet6 as the
inet sysctl tree is also available for IPv6-only kernels leading
to unexpected results.
Suggested by: hrs (1)
Reviewed by: hrs
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 20 days
jilles [Mon, 30 May 2011 21:41:06 +0000 (21:41 +0000)]
posix_spawn(): Do not fail when trying to close an fd that is not open.
As noted in Austin Group issue #370 (an interpretation has been issued),
failing posix_spawn() because an fd specified with
posix_spawn_file_actions_addclose() is not open is unnecessarily harsh, and
there are existing implementations that do not fail posix_spawn() for this
reason.
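The case described above can be reproduced from userland with a sketch like the following. It is only an illustration: the helper name spawn_with_stale_close() is hypothetical, and "/bin/sh" and "/dev/null" are assumed to exist. The helper registers a close action for a descriptor number that is deliberately not open and returns whatever posix_spawn() reports.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: ask posix_spawn() to close an fd that is not open.
 * Returns the posix_spawn() error number (0 on success), or -1 if the
 * setup itself failed.
 */
static int
spawn_with_stale_close(void)
{
    posix_spawn_file_actions_t fa;
    char arg0[] = "sh", arg1[] = "-c", arg2[] = "exit 0";
    char *argv[] = { arg0, arg1, arg2, NULL };
    pid_t pid;
    int fd, rc, status;

    /* Obtain a descriptor number, then close it so it refers to nothing. */
    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return (-1);
    close(fd);

    rc = posix_spawn_file_actions_init(&fa);
    if (rc != 0)
        return (rc);

    /* Register a close action for the already-closed descriptor. */
    rc = posix_spawn_file_actions_addclose(&fa, fd);
    if (rc == 0) {
        rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
        if (rc == 0)
            (void)waitpid(pid, &status, 0);   /* reap the child */
    }
    posix_spawn_file_actions_destroy(&fa);
    return (rc);
}
```

With the behaviour this commit describes, the call is expected to return 0. An implementation that treats the failed close as fatal would instead report an error or a failed child.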
np [Mon, 30 May 2011 21:34:44 +0000 (21:34 +0000)]
- Specialized ingress queues that take interrupts for other ingress
queues. Try to have a set of these per port when possible, fall back
to sharing a common pool between all ports otherwise.
- One control queue per port (used to be one per hardware channel).
np [Mon, 30 May 2011 21:07:26 +0000 (21:07 +0000)]
L2 table code. This is enough to get the T4's switch + L2 rewrite
filters working. (All other filters - switch without L2 info rewrite,
steer, and drop - were already fully-functional).
Some contrived examples of "switch" filters with L2 rewriting:
# cxgbetool t4nex0 iport 0 dport 80 action switch vlan +9 eport 3
Intercept all packets received on physical port 0 with TCP port 80 as
destination, insert a vlan tag with VID 9, and send them out of port 3.
# cxgbetool t4nex0 sip 192.168.1.1/32 ivlan 5 action switch \
vlan =9 smac aa:bb:cc:dd:ee:ff eport 0
Intercept all packets (received on any port) with source IP address
192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac
to aa:bb:cc:dd:ee:ff, and send it out of port 0.
adrian [Mon, 30 May 2011 15:06:57 +0000 (15:06 +0000)]
Enable setting the short-GI bit when TX'ing HT rates but only if the
hardware supports it.
Since ni->ni_htcap in hostap mode is what the remote end has advertised,
not what has been negotiated/decided, we need to check ourselves what
the current channel width is and what the hardware supports before
enabling short-GI.
It's important that short-GI isn't enabled when it isn't negotiated
and when the hardware doesn't support it (ie, short-gi for 20mhz channels
on any chip < AR9287.)
I've quickly verified this on the AR9285 in 11n mode.
rwatson [Mon, 30 May 2011 09:43:55 +0000 (09:43 +0000)]
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need to acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags,
supplementing the existing INPLOOKUP_WILDCARD flag, are:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
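The "exactly one of these flags" rule could be checked roughly as in the sketch below. The numeric flag values and the helper inplookup_flags_ok() are assumptions made for illustration only; the log does not give the real values, nor does it say how the kernel enforces the rule.

```c
#include <stdbool.h>

/* Illustrative values only (assumed, not taken from the commit). */
#define INPLOOKUP_WILDCARD 0x01
#define INPLOOKUP_RLOCKPCB 0x02
#define INPLOOKUP_WLOCKPCB 0x04

/*
 * Hypothetical helper: true when exactly one of the two lock flags is
 * set.  INPLOOKUP_WILDCARD may be combined with either of them.
 */
static bool
inplookup_flags_ok(int flags)
{
    int lockbits = flags & (INPLOOKUP_RLOCKPCB | INPLOOKUP_WLOCKPCB);

    return (lockbits == INPLOOKUP_RLOCKPCB ||
        lockbits == INPLOOKUP_WLOCKPCB);
}
```

A lookup routine could assert this on entry, so that a caller passing neither flag or both flags is caught immediately.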
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
rwatson [Mon, 30 May 2011 09:34:15 +0000 (09:34 +0000)]
Rework TIMEWAIT regression test so that kernel-allocated port numbers are
used rather than a fixed userspace one, avoiding conflicts between the two
test runs.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
rwatson [Mon, 30 May 2011 09:04:35 +0000 (09:04 +0000)]
In the tcpdrop regression test, allow the kernel to allocate us a port
rather than using a fixed port number. This means that the regression test
can be run many times in a row without waiting on TIMEWAIT to release a
hard-coded port number.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
jchandra [Mon, 30 May 2011 06:23:51 +0000 (06:23 +0000)]
Fix read_ivar implementation for MMC and SD.
1. Both mmc_read_ivar() and sdhci_read_ivar() use the expression
'*(int *)result = val' to assign to result which is uintptr_t *.
This does not work on big-endian 64-bit systems.
2. The media_size ivar is declared as 'off_t' which does not fit
into uintptr_t on 32-bit systems; change this to long.
Submitted by: kanthms at netlogicmicro com (initial version)
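Item 1 can be illustrated in userland with a minimal sketch. The helper names are hypothetical, and memcpy() stands in for the '*(int *)result = val' store so that the example stays well-defined C: only the first sizeof(int) bytes of the uintptr_t are written, and the remaining bytes keep whatever they held before.

```c
#include <stdint.h>
#include <string.h>

/*
 * Emulates '*(int *)result = val': only the first sizeof(int) bytes of
 * the uintptr_t are overwritten.  On a big-endian 64-bit machine those
 * are the most significant bytes, so the value ends up in the wrong half.
 */
static uintptr_t
store_via_int(uintptr_t initial, int val)
{
    uintptr_t result = initial;

    memcpy(&result, &val, sizeof(val));
    return (result);
}

/* Full-width store: assigns the whole uintptr_t, on any byte order. */
static uintptr_t
store_full_width(uintptr_t initial, int val)
{
    uintptr_t result = initial;

    result = (uintptr_t)val;
    return (result);
}
```

On an LP64 system, store_via_int(UINTPTR_MAX, 0x1234) leaves four stale bytes in place, while store_full_width() returns 0x1234 regardless of byte order or prior contents.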
bz [Sun, 29 May 2011 21:24:20 +0000 (21:24 +0000)]
Split netconfig into three parts:
- netconfig - what auto will call which in turn will check for
IPv4 and IPv6 to be available and ask the user to configure it
by calling
- netconfig_ipv4 doing DHCP and static IPv4 addresses, and
- netconfig_ipv6 doing rtsol and static IPv6 addresses,
and then checking, querying and updating resolv.conf upon return.
Both DHCP and rtsol (in the future) might update resolv.conf already so
we seed ourselves from that file if available.
Reviewed by: nwhitehorn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
rmacklem [Sun, 29 May 2011 21:13:53 +0000 (21:13 +0000)]
Modify the umount(8) command so that it doesn't do
a sync(2) syscall before unmount(2) for the "-f" case.
This avoids a forced dismount from getting stuck for
an NFS mountpoint in sync() when the server is not
responsive. With this commit, forced dismounts should
normally work for the NFS clients, but can take up to
about 1 minute to complete.
bz [Sun, 29 May 2011 21:03:40 +0000 (21:03 +0000)]
Check that IPv4 or IPv6 is available in the kernel so as not to
provoke errors when querying options that are not available.
Make it possible to compile out INET or INET6 only parts.
Reviewed by: jamie
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days
rmacklem [Sun, 29 May 2011 20:55:23 +0000 (20:55 +0000)]
Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync()
in the old NFS client so that a forced dismount doesn't
get stuck in the VFS_SYNC() call that happens before
VFS_UNMOUNT() in dounmount(). Analogous to r222329 for the new NFS client.
An additional change is needed before forced dismounts will work.
nwhitehorn [Sun, 29 May 2011 20:46:53 +0000 (20:46 +0000)]
Add some error handling here: if a sensor returns an error code (a negative
Kelvin temperature, which is impossible except for some contrived magnetic
spin systems), use the previous measurement from that sensor instead of
corrupting everything and randomly changing the fans or shutting off the
machine.
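The fallback described above amounts to something like the following sketch. The struct, field, and function names are hypothetical, and the unit (tenths of a kelvin) is an assumption; only the rule "a negative reading is an error, so reuse the previous measurement" comes from the commit message.

```c
/* Hypothetical per-sensor state; unit assumed to be tenths of a kelvin. */
struct sensor_state {
    int last_good;
};

/*
 * Returns the value the fan logic should use: the new reading if it is
 * valid, otherwise the previous good measurement from the same sensor.
 */
static int
sensor_filter_reading(struct sensor_state *s, int reading)
{
    if (reading < 0)            /* negative kelvin means an error code */
        return (s->last_good);
    s->last_good = reading;
    return (reading);
}
```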
nwhitehorn [Sun, 29 May 2011 18:35:57 +0000 (18:35 +0000)]
Update the I2C-based temperature/fan drivers to connect to the Powermac
thermal control module. This provides automatic fan management on all G5
PowerMacs and Xserves.
trociny [Sun, 29 May 2011 18:00:50 +0000 (18:00 +0000)]
In soreceive_generic(), if MSG_WAITALL is set but the request is
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained the
lock is released for a moment. On return, we block waiting for the
rest of the data. There is a race: data could arrive while the
lock was released, and then the connection stalls in sbwait.
Fix this by checking for data before blocking and skipping the block
if there is some.
bcr [Sun, 29 May 2011 11:10:56 +0000 (11:10 +0000)]
Mention that jumbo frame support is disabled on PCIe VT6130/VT6132
controllers because of TX MAC hangs when trying to send a frame
that is larger than 4K (see r200759).
PR: docs/156742
Submitted by: Michael Moll (kvedulv at kvedulv dot de)
Reviewed by: yongari@
MFC after: 6 days
bz [Sun, 29 May 2011 07:40:48 +0000 (07:40 +0000)]
The argument to setsockopt for IP_MULTICAST_LOOP depends on operating
system and is decided upon by configure and could be a u_int or a
u_char. For FreeBSD it is a u_char.
For IPv6 however RFC 3493, 5.2 defines the argument to
IPV6_MULTICAST_LOOP to be an unsigned integer so make sure we always
use that using a second variable for the IPV6 case.
This is to get rid of these error messages every 5 minutes on some
systems:
ntpd[1530]: setsockopt IPV6_MULTICAST_LOOP failure: Invalid argument
on socket 22, addr fe80::... for multicast address ff02::101
While here also fix the copy&paste error in the log message for
IPV6_MULTICAST_LOOP.
Reviewed by: roberto
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days
Filed as: Bug 1936 on ntp.org
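The type distinction in this commit can be sketched as two small helpers, one per address family. The function names are hypothetical and this is not the ntpd code. The IPv4 helper uses an unsigned char (the u_char case the message describes for FreeBSD), and the IPv6 helper always uses an unsigned int as RFC 3493 specifies.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* IPv4: argument type varies by system; an unsigned char is used here. */
static int
set_mcast_loop4(int sock, int on)
{
    unsigned char loop = on ? 1 : 0;

    return (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP,
        &loop, sizeof(loop)));
}

/* IPv6: RFC 3493, section 5.2 defines the argument as unsigned int. */
static int
set_mcast_loop6(int sock, int on)
{
    unsigned int loop6 = on ? 1 : 0;

    return (setsockopt(sock, IPPROTO_IPV6, IPV6_MULTICAST_LOOP,
        &loop6, sizeof(loop6)));
}
```

Keeping a separate variable for each family prevents the IPv4 argument size from being passed to the IPv6 option, which is the sort of mismatch behind the "Invalid argument" errors quoted above.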
qingli [Sun, 29 May 2011 02:21:35 +0000 (02:21 +0000)]
Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing interface
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().
marcel [Sun, 29 May 2011 00:27:42 +0000 (00:27 +0000)]
o Add system versions for the P4040(E) and P4080(E).
o In bare_probe(), change the logic that determines the maximum
number of processors/cores into a switch statement and take
advantage of the fact that bit 3 of the SVR value indicates
whether we're running on a security enabled version. Since we
don't care about that here, mask the bit. All -E versions
are taken care of automatically.
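The masking idea can be sketched as below. The version codes and core counts are made-up placeholders and max_cores() is a hypothetical name; only the technique is taken from the commit: clear bit 3 before the switch so that a part and its -E variant share one case label.

```c
#include <stdint.h>

#define SVR_SECURITY_BIT  0x0008u   /* bit 3: security-enabled (-E) part */

/* Placeholder system-version codes, not real SVR values. */
#define SVR_EXAMPLE_DUAL  0x1000u
#define SVR_EXAMPLE_OCTO  0x2000u

/* Hypothetical helper: maximum core count for a system version. */
static int
max_cores(uint16_t sysver)
{
    /* Masking bit 3 folds every -E version onto its base version. */
    switch (sysver & ~SVR_SECURITY_BIT) {
    case SVR_EXAMPLE_DUAL:
        return (2);
    case SVR_EXAMPLE_OCTO:
        return (8);
    default:
        return (1);
    }
}
```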
adrian [Sun, 29 May 2011 00:17:13 +0000 (00:17 +0000)]
Teach if_ath about devices which have short-GI in 20MHz channel modes.
This has been disabled until now because there hasn't been any supported
device which has this feature. Since the AR9287 is the first device to
support it, and since now the HAL has functional AR9287+11n support,
flip this on.
nwhitehorn [Sat, 28 May 2011 21:54:18 +0000 (21:54 +0000)]
Require an error instead of a timeout to decide the new-style fan
commands won't work. This prevents a busy system from making smu(4)
suddenly decide its fans use the old-style command set.
nwhitehorn [Sat, 28 May 2011 21:13:22 +0000 (21:13 +0000)]
Factor out the SMU fan management code into a new module (powermac_thermal)
that will connect all of the various sensors and fan control modules on
Apple hardware with software-controlled fans (e.g. all G5 systems).
marcel [Sat, 28 May 2011 19:14:16 +0000 (19:14 +0000)]
o Determine the number of LAWs in a way that is future proof. Only the
MPC8555(E) has 8 LAWs, so don't make that the default case. Current
processors have 12 LAWs so use that as the default instead.
o Determine the target ID of the PCI/PCI-X and PCI-E controllers in
a way that's more future proof. There's almost a perfect mapping
from HC register offset to target ID, so use that as the default.
Handle the MPC8548(E) specially, since it has a non-standard target
ID for the PCI-E controller. Don't worry about whether the processor
implements the target ID here, because we should not get called for
PCI/PCI-X or PCI-E host controllers that don't exist.
jilles [Sat, 28 May 2011 11:37:47 +0000 (11:37 +0000)]
printf: Allow multibyte characters for '<char> form, avoid negative codes.
Examples:
LC_ALL=en_US.UTF-8 printf '%d\n' $(printf \'\\303\\244)
LC_ALL=en_US.ISO8859-1 printf '%d\n' $(printf \'\\344)
Both of these should print 228.
Like some other shells, incomplete or invalid multibyte characters yield the
value of the first byte without a warning.
Note that there is no general way to go back from the character code to the
character.
julian [Sat, 28 May 2011 08:50:38 +0000 (08:50 +0000)]
New boot loader menus from Devin Teske.
Discussed on hackers and recommended for inclusion into 9.0 at the devsummit.
All support email to devin dteske at vicor dot ignoreme dot com .
Submitted by: dteske at vicor dot ignoreme dot com
Reviewed by: me and many others
marcel [Sat, 28 May 2011 04:10:44 +0000 (04:10 +0000)]
Better support different kernel hand-offs. When loaded directly
from U-Boot, the kernel is passed a standard argc/argv pair.
The Juniper loader passes the metadata pointer as the second
argument and passes 0 in the first. The FreeBSD loader passes
the metadata pointer in the first argument.
As such, have locore preserve the first 2 arguments in registers
r30 & r31. Change e500_init() to accept these arguments. Don't
pass global offsets (i.e. kernel_text and _end) as arguments to
e500_init(). We can reference those directly.
Rename e500_init() to booke_init() now that we're changing the
prototype.
In booke_init(), "decode" arg1 and arg2 to obtain the metadata
pointer correctly. For the U-Boot case, clear SBSS and BSS and
bank on having a static FDT for now. This allows loading the
ELF kernel and jumping to the entry point without trampoline.
dougb [Sat, 28 May 2011 00:21:28 +0000 (00:21 +0000)]
Upgrade to 9.6-ESV-R4-P1, which addresses the following issues:
1. Very large RRSIG RRsets included in a negative cache can trigger
an assertion failure that will crash named (BIND 9 DNS) due to an
off-by-one error in a buffer size check.
This bug affects all resolving name servers, whether DNSSEC validation
is enabled or not, on all BIND versions prior to today. There is a
possibility of malicious exploitation of this bug by remote users.
2. Named could fail to validate zones listed in a DLV that validated
insecure without using DLV and had DS records in the parent zone.
Add a patch provided by ru@ and confirmed by ISC to fix a crash at
shutdown time when a SIG(0) key is being used.
marcel [Fri, 27 May 2011 23:18:41 +0000 (23:18 +0000)]
o The P1020(E) & P2020(E) also have two cores. This conditional has
a tendency to grow unwieldy so we may want to revisit this in due
time.
o Simplify the CPU reset function by writing to the reset control
register irrespective of whether the CPU has one and automatically
falling back to the debug control register if we didn't reset the
CPU. The side-effect is that we now properly reset future processors
without first having to add the system version to the list.
marcel [Fri, 27 May 2011 23:09:12 +0000 (23:09 +0000)]
Wire the kernel using TLB1 entry 0 rather than entry 1. A more recent
U-Boot as found on the P1020RDB doesn't like it when we use entry 1
(for some reason) whereas an older U-Boot doesn't mind if we use entry
0. If nothing else, this simplifies the code a bit.
rmacklem [Fri, 27 May 2011 22:05:10 +0000 (22:05 +0000)]
Fix the new NFS client so that it handles NFSv4 state
correctly during a forced dismount. This required that
the exclusive and shared (refcnt) sleep lock functions check
for MNTK_UNMOUNTF before sleeping, so that they won't block
while nfscl_umount() is getting rid of the state. As
such, a "struct mount *" argument was added to the locking
functions. I believe the only remaining case where a forced
dismount can get hung in the kernel is when a thread is
already attempting to do a TCP connect to a dead server
when the krpc client structure called nr_client is NULL.
This will only happen just after a "mount -u" with options
that force a new TCP connection is done, so it shouldn't
be a problem in practice.
jilles [Fri, 27 May 2011 20:53:07 +0000 (20:53 +0000)]
sh: Remove the "exp" builtin.
The "exp" builtin is undocumented, non-standard and not very useful.
If exp's return value is not used, something like
VAR=$(exp EXPRESSION)
is equivalent to
VAR=$((EXPRESSION))
except that errors in the expression are fatal and quoting special
characters is not needed in the latter case.
If exp's return value is used, something like
if exp EXPRESSION >/dev/null
can be replaced by
if [ $((EXPRESSION)) -ne 0 ]
with similar differences.
The exp-run showed that "let" is close enough to bash's and ksh's builtin
that removing it would break a few ports. Therefore, "let" remains in 9.x.
PR: bin/104432
Exp-run done by: pav (with some other sh(1) changes)
trasz [Fri, 27 May 2011 19:57:58 +0000 (19:57 +0000)]
Remove definitions for RACCT_FSIZE and RACCT_SBSIZE - these two are rather
performance-sensitive and not that useful, so I won't be merging them
before 9.0.
jilles [Fri, 27 May 2011 15:56:13 +0000 (15:56 +0000)]
sh: Fix unquoted $@/$* if IFS=''.
If IFS is null, unquoted $@/$* should still expand to separate words.
This differs from quoted $@ (which does not depend on IFS) in that pathname
generation is performed and empty words are removed.
attilio [Fri, 27 May 2011 15:50:14 +0000 (15:50 +0000)]
In the near future cpuset_t objects in struct pcpu will be axed out, but
as long as this does not happen, we need to fix interfaces to userland
in order to not break run-time accesses to the structure.
ae [Fri, 27 May 2011 06:37:42 +0000 (06:37 +0000)]
Some partitioning tools may have a different opinion about disk
geometry and partitions may start from within the first track.
If we find such partitions, then do not reserve the space of the
first track, only the first sector.
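The rule above can be sketched as follows. The function name and the representation of partitions as an array of starting LBAs are assumptions for illustration; the commit message only states the policy.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of sectors to reserve at the start of the
 * disk.  Normally the whole first track is reserved; if any existing
 * partition starts inside the first track, only the first sector is.
 */
static uint64_t
reserved_sectors(const uint64_t *starts, size_t nparts,
    uint32_t sectors_per_track)
{
    size_t i;

    for (i = 0; i < nparts; i++) {
        if (starts[i] < sectors_per_track)
            return (1);         /* partition begins within track 0 */
    }
    return (sectors_per_track);
}
```

With 63 sectors per track, a table whose first partition starts at LBA 63 keeps the whole track reserved, while one starting at LBA 1 reduces the reservation to a single sector.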