Robert Watson [Mon, 30 May 2011 09:43:55 +0000 (09:43 +0000)]
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
Robert Watson [Mon, 30 May 2011 09:34:15 +0000 (09:34 +0000)]
Rework TIMEWAIT regression test so that kernel-allocated port numbers are
used rather than a fixed userspace one, avoiding conflicts between the two
test runs.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Robert Watson [Mon, 30 May 2011 09:04:35 +0000 (09:04 +0000)]
In the tcpdrop regression test, allow the kernel to allocate us a port
rather than using a fixed port number. This means that the regression test
can be run many times in a row without waiting on TIMEWAIT to release a
hard-coded port number.
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Jayachandran C. [Mon, 30 May 2011 06:23:51 +0000 (06:23 +0000)]
Fix read_ivar implementation for MMC and SD.
1. Both mmc_read_ivar() and sdhci_read_ivar() use the expression
'*(int *)result = val' to assign to result which is uintptr_t *.
This does not work on big-endian 64 bit systems.
2. The media_size ivar is declared as 'off_t' which does not fit
into uintptr_t in 32bit systems, change this to long.
Submitted by: kanthms at netlogicmicro com (initial version)
Bjoern A. Zeeb [Sun, 29 May 2011 21:24:20 +0000 (21:24 +0000)]
Split netconfig into three parts:
- netconfig - what auto will call which in turn will check for
IPv4 and IPv6 to be available and ask the user to configure it
by calling
- netconfig_ipv4 doing DHCP and static IPv4 addresses, and
- netconfig_ipv6 doing rtsol and static IPv6 addresses,
and then checking, querying and updating resolv.conf upon return.
Both DHCP and rtsol (in the future) might update resolv.conf already so
we seed ourselves from that file if available.
Reviewed by: nwhitehorn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Rick Macklem [Sun, 29 May 2011 21:13:53 +0000 (21:13 +0000)]
Modify the umount(8) command so that it doesn't do
a sync(2) syscall before unmount(2) for the "-f" case.
This avoids a forced dismount from getting stuck for
an NFS mountpoint in sync() when the server is not
responsive. With this commit, forced dismounts should
normally work for the NFS clients, but can take up to
about 1minute to complete.
Bjoern A. Zeeb [Sun, 29 May 2011 21:03:40 +0000 (21:03 +0000)]
Check for IPv4 or IPv6 to be available by the kernel to not
provoke errors trying to query options not available.
Make it possible to compile out INET or INET6 only parts.
Reviewed by: jamie
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days
Rick Macklem [Sun, 29 May 2011 20:55:23 +0000 (20:55 +0000)]
Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync()
in the old NFS client so that a forced dismount doesn't
get stuck in the VFS_SYNC() call that happens before
VFS_UNMOUNT() in dounmount(). Analagous to r222329 for the new NFS client.
An additional change is needed before forced dismounts will work.
Nathan Whitehorn [Sun, 29 May 2011 20:46:53 +0000 (20:46 +0000)]
Add some error handling here: if a sensor returns an error code (a negative
Kelvin temperature, which is impossible except for some contrived magnetic
spin systems), use the previous measurement from that sensor instead of
corrupting everything and randomly changing the fans or shutting off the
machine.
Nathan Whitehorn [Sun, 29 May 2011 18:35:57 +0000 (18:35 +0000)]
Update the I2C-based temperature/fan drivers to connect to the Powermac
thermal control module. This provides automatic fan management on all G5
PowerMacs and Xserves.
Mikolaj Golub [Sun, 29 May 2011 18:00:50 +0000 (18:00 +0000)]
In soreceive_generic(), if MSG_WAITALL is set but the request is
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained the
lock is released for a moment. Returning we block waiting for the
rest of data. There is a race, when data could arrive while the
lock was released and then the connection stalls in sbwait.
Fix this by checking for data before blocking and skip blocking
if there are some.
Mention that jumbo frame support is disabled on PCIe VT6130/VT6132
controllers because of TX MAC hangs when trying to send a frame
that is larger than 4K (see r200759).
PR: docs/156742
Submitted by: Michael Moll (kvedulv at kvedulv dot de)
Reviewed by: yongari@
MFC after: 6 days
Bjoern A. Zeeb [Sun, 29 May 2011 07:40:48 +0000 (07:40 +0000)]
The argument to setsockopt for IP_MULTICAST_LOOP depends on operating
system and is decided upon by configure and could be an u_int or a
u_char. For FreeBSD it is a u_char.
For IPv6 however RFC 3493, 5.2 defines the argument to
IPV6_MULTICAST_LOOP to be an unsigned integer so make sure we always
use that using a second variable for the IPV6 case.
This is to get rid of these error messages every 5 minutes on some
systems:
ntpd[1530]: setsockopt IPV6_MULTICAST_LOOP failure: Invalid argument
on socket 22, addr fe80::... for multicast address ff02::101
While here also fix the copy&paste error in the log message for
IPV6_MULTICAST_LOOP.
Reviewed by: roberto
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 10 days
Filed as: Bug 1936 on ntp.org
Qing Li [Sun, 29 May 2011 02:21:35 +0000 (02:21 +0000)]
Supply the LLE_STATIC flag bit to in_ifscurb() when scrubbing interface
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().
Marcel Moolenaar [Sun, 29 May 2011 00:27:42 +0000 (00:27 +0000)]
o Add system versions for the P4040(E) and P4080(E).
o In bare_probe(), change the logic that determines the maximum
number of processors/cores into a switch statement and take
advantage of the fact that bit 3 of the SVR value indicates
whether we're running on a security enabled version. Since we
don't care about that here, mask the bit. All -E versions
are taken care of automatically.
Adrian Chadd [Sun, 29 May 2011 00:17:13 +0000 (00:17 +0000)]
Teach if_ath about devices which have short-GI in 20MHz channel modes.
This has been disabled until now because there hasn't been any supported
device which has this feature. Since the AR9287 is the first device to
support it, and since now the HAL has functional AR9287+11n support,
flip this on.
Nathan Whitehorn [Sat, 28 May 2011 21:54:18 +0000 (21:54 +0000)]
Require an error instead of a timeout to decide the new-style fan
commands won't work. This prevents a busy system from making smu(4)
suddenly decide its fans use the old-style command set.
Nathan Whitehorn [Sat, 28 May 2011 21:13:22 +0000 (21:13 +0000)]
Factor out the SMU fan management code into a new module (powermac_thermal)
that will connect all of the various sensors and fan control modules on
Apple hardware with software-controlled fans (e.g. all G5 systems).
Marcel Moolenaar [Sat, 28 May 2011 19:14:16 +0000 (19:14 +0000)]
o Determine the number of LAWs in a way the is future proof. Only the
MPC8555(E) has 8 LAWs, so don't make that the default case. Current
processors have 12 LAWs so use that as the default instead.
o Determine the target ID of the PCI/PCI-X and PCI-E controllers in
a way that's more future proof. There's almost a perfect mapping
from HC register offset to target ID, so use that as the default.
Handle the MPC8548(E) specially, since it has a non-standard target
ID for the PCI-E controller. Don't worry about whether the processor
implements the target ID here, because we should not get called for
PCI/PCI-X or PCI-E host controllers that don't exist.
Jilles Tjoelker [Sat, 28 May 2011 11:37:47 +0000 (11:37 +0000)]
printf: Allow multibyte characters for '<char> form, avoid negative codes.
Examples:
LC_ALL=en_US.UTF-8 printf '%d\n' $(printf \'\\303\\244)
LC_ALL=en_US.ISO8859-1 printf '%d\n' $(printf \'\\344)
Both of these should print 228.
Like some other shells, incomplete or invalid multibyte characters yield the
value of the first byte without a warning.
Note that there is no general way to go back from the character code to the
character.
Julian Elischer [Sat, 28 May 2011 08:50:38 +0000 (08:50 +0000)]
New boot loader menus from Devin Teske.
Discussed on hackers and recommended for inclusion into 9.0 at the devsummit.
All support email to devin dteske at vicor dot ignoreme dot com .
Submitted by: dteske at vicor dot ignoreme dot com
Reviewed by: me and many others
Marcel Moolenaar [Sat, 28 May 2011 04:10:44 +0000 (04:10 +0000)]
Better support different kernel hand-offs. When loaded directly
from U-Boot, the kernel is passed a standard argc/argv pair.
The Juniper loader passes the metadata pointer as the second
argument and passes 0 in the first. The FreeBSD loader passes
the metadata pointer in the first argument.
As such, have locore preserve the first 2 arguments in registers
r30 & r31. Change e500_init() to accept these arguments. Don't
pass global offsets (i.e. kernel_text and _end) as arguments to
e500_init(). We can reference those directly.
Rename e500_init() to booke_init() now that we're changing the
prototype.
In booke_init(), "decode" arg1 and arg2 to obtain the metadata
pointer correctly. For the U-Boot case, clear SBSS and BSS and
bank on having a static FDT for now. This allows loading the
ELF kernel and jumping to the entry point without trampoline.
Doug Barton [Sat, 28 May 2011 00:21:28 +0000 (00:21 +0000)]
Upgrade to 9.6-ESV-R4-P1, which address the following issues:
1. Very large RRSIG RRsets included in a negative cache can trigger
an assertion failure that will crash named (BIND 9 DNS) due to an
off-by-one error in a buffer size check.
This bug affects all resolving name servers, whether DNSSEC validation
is enabled or not, on all BIND versions prior to today. There is a
possibility of malicious exploitation of this bug by remote users.
2. Named could fail to validate zones listed in a DLV that validated
insecure without using DLV and had DS records in the parent zone.
Add a patch provided by ru@ and confirmed by ISC to fix a crash at
shutdown time when a SIG(0) key is being used.
Marcel Moolenaar [Fri, 27 May 2011 23:18:41 +0000 (23:18 +0000)]
o The P1020(E) & P2020(E) also have two cores. This conditional has
a tendency to grow unwieldy so we may want to revisit this in due
time.
o Simplify the CPU reset function by writing to the reset control
register irrespective of whether the CPU has one and automatically
falling back to the debug control register if we didn't reset the
CPU. The side-effect is that we now properly reset future processors
without first having to add the system version to the list.
Marcel Moolenaar [Fri, 27 May 2011 23:09:12 +0000 (23:09 +0000)]
Wire the kernel using TLB1 entry 0 rather than entry 1. A more recent
U-Boot as found on the P1020RDB doesn't like it when we use entry 1
(for some reason) whereas an older U-Boot doesn't mind if we use entry
0. If anything else, this simplifies the code a bit.
Rick Macklem [Fri, 27 May 2011 22:05:10 +0000 (22:05 +0000)]
Fix the new NFS client so that it handles NFSv4 state
correctly during a forced dismount. This required that
the exclusive and shared (refcnt) sleep lock functions check
for MNTK_UMOUNTF before sleeping, so that they won't block
while nfscl_umount() is getting rid of the state. As
such, a "struct mount *" argument was added to the locking
functions. I believe the only remaining case where a forced
dismount can get hung in the kernel is when a thread is
already attempting to do a TCP connect to a dead server
when the krpc client structure called nr_client is NULL.
This will only happen just after a "mount -u" with options
that force a new TCP connection is done, so it shouldn't
be a problem in practice.
Jilles Tjoelker [Fri, 27 May 2011 20:53:07 +0000 (20:53 +0000)]
sh: Remove the "exp" builtin.
The "exp" builtin is undocumented, non-standard and not very useful.
If exp's return value is not used, something like
VAR=$(exp EXPRESSION)
is equivalent to
VAR=$((EXPRESSION))
except that errors in the expression are fatal and quoting special
characters is not needed in the latter case.
If exp's return value is used, something like
if exp EXPRESSION >/dev/null
can be replaced by
if [ $((EXPRESSION)) -ne 0 ]
with similar differences.
The exp-run showed that "let" is close enough to bash's and ksh's builtin
that removing it would break a few ports. Therefore, "let" remains in 9.x.
PR: bin/104432
Exp-run done by: pav (with some other sh(1) changes)
Remove definitions for RACCT_FSIZE and RACCT_SBSIZE - these two are rather
performance-sensitive and not that useful, so I won't be merging them
before 9.0.
Jilles Tjoelker [Fri, 27 May 2011 15:56:13 +0000 (15:56 +0000)]
sh: Fix unquoted $@/$* if IFS=''.
If IFS is null, unquoted $@/$* should still expand to separate words.
This differs from quoted $@ (which does not depend on IFS) in that pathname
generation is performed and empty words are removed.
Some partitioning tools may have a different opinion about disk
geometry and partitions may start from withing the first track.
If we found such partitions, then do not reserve space of the
first track, only first sector.
Rick Macklem [Thu, 26 May 2011 22:05:35 +0000 (22:05 +0000)]
Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync()
in the new NFS client so that a forced dismount doesn't
get stuck in the VFS_SYNC() call that happens before
VFS_UNMOUNT() in dounmount().
Additional changes are needed before forced dismounts will work.
John Baldwin [Thu, 26 May 2011 20:54:45 +0000 (20:54 +0000)]
For Timedia multiport serial adapters, the first two ports use a SUN1889
which uses a non-standard clock (* 8) while any additional ports use
SUN1699 chips which use a standard clock.
Tested by: N.J. Mann njm of njm me uk
MFC after: 1 week
Kirk McKusick [Thu, 26 May 2011 18:22:49 +0000 (18:22 +0000)]
Raise the default blocksize for UFS/FFS filesystems from
16K to 32K and the default fragment size from 2K to 4K.
The rational is that most disks are now running with 4K
sectors. While they can (slowly) simulate 512-byte sectors
by doing a read-modify-write, it is desirable to avoid this
functionality. By raising the minimum filesystem allocation
to 4K, the filesystem will never trigger the small sector
emulation.
Also, the growth of disk sizes has lead us to double the
default block size about every ten years. The rise from 8K
to 16K blocks was done in 2001. So, by the 10-year metric,
the time has come for 32K blocks.
Discussed at: May 2011 BSDCan Developer Summit
Reference: http://wiki.freebsd.org/201105DevSummit/FileSystems
Marcel Moolenaar [Thu, 26 May 2011 17:02:56 +0000 (17:02 +0000)]
Ignore MCR[6] during the probe to fix a false negative. Bit 6 of the
MCR register on the Sunix Sun1699 chip tends to be set but doesn't
seem to have a function. That is, FreeBSD just works (provided the
correct RCLK is used) regardless.
PR: kern/129663
Diagnostics: Eygene Ryabinkin <rea-fbsd at codelabs.ru>
MFC after: 3 days
Will Andrews [Thu, 26 May 2011 16:27:00 +0000 (16:27 +0000)]
Close a race between libzfs and mountd when updating NFS exports.
- Flush the file descriptor for the new ZFS exports file before
sending a SIGHUP to mountd.
Reviewed by: pjd
Approved by: ken
MFC after: 3 days
Adrian Chadd [Thu, 26 May 2011 14:29:05 +0000 (14:29 +0000)]
Flesh out the TX power calibration for the AR9287.
I'm assuming for now that the AR9287 is only open-loop TX power control
(as mine is) so I've hard-coded the attach path to fail if the NIC is
not open-loop.
This greatly simplifies the TX calibration path and the amount of code
which needs to be ported over.
This still isn't complete - the rate calculation code still needs to be
ported and it all needs to be glued together.
Alexander Motin [Thu, 26 May 2011 09:23:01 +0000 (09:23 +0000)]
Marvell 88SE91xx controllers are known to report soft-reset completion
without waiting for device readiness (or at least not updating FIS receive
area in time). To workaround that, special quirk was added earlier to wait
for the FIS receive area update. But it was found that under same PCI ID
0x91231b4b and revision 0x11 there are two completely different chip
versions (firmware?): HBA and RAID. The problem is that RAID version in
some cases, such as hot-plug, does not update FIS receive area at all!
To workaround that, differentiate the chip versions by their capabilities,
and, if RAID version found, skip FIS receive area update waiting and read
device signature from the PxSIG register instead. This method doesn't work
for HBA version when PMP attached, so keep using previous workaround there.