jilles [Sun, 29 May 2011 15:07:53 +0000 (15:07 +0000)]
MFC r222173: sh: Fix bss-based buffer overflow in . builtin.
If the length of a directory in PATH together with the given filename
exceeded FILENAME_MAX (which may happen even for pathnames that work), a
static buffer was overflowed.
The static buffer is unnecessary; we can use the stalloc() stack.
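The idea behind the fix can be sketched outside of sh(1): size the buffer from the actual lengths of the directory and file name rather than copying into a fixed FILENAME_MAX-sized array. This is only an illustration; join_path() is a hypothetical helper, and malloc() stands in for sh's stalloc() stack allocator.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the sh(1) code: join a PATH directory and a
 * file name into a buffer sized from the actual string lengths, so no
 * fixed-size static buffer is involved.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
char *
join_path(const char *dir, const char *name)
{
	size_t dlen = strlen(dir);
	size_t nlen = strlen(name);
	/* Room for dir, the '/' separator, name and the trailing NUL. */
	char *full = malloc(dlen + 1 + nlen + 1);

	if (full == NULL)
		return (NULL);
	memcpy(full, dir, dlen);
	full[dlen] = '/';
	memcpy(full + dlen + 1, name, nlen + 1);	/* copies the NUL too */
	return (full);
}
```

Because the allocation follows the input, a long PATH component plus a long file name simply produces a longer buffer.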
lstewart [Sat, 28 May 2011 13:48:49 +0000 (13:48 +0000)]
MFC r218912,218945,220237:
- Add new man pages for the modular congestion control, Khelp and Hhook
frameworks (cc.4, cc.9, khelp.9 and hhook.9).
- Add new man pages for each available congestion control algorithm (cc_chd.4,
cc_cubic.4, cc_hd.4, cc_htcp.4, cc_newreno.4 and cc_vegas.4).
- Add a new man page for the Enhanced Round Trip Time (ERTT) Khelp module
(h_ertt.4).
- Update the TCP (tcp.4) man page to mention the TCP_CONGESTION socket option,
cross reference to cc.4 and remove references to the retired
"net.inet.tcp.newreno" sysctl MIB variable.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
lstewart [Sat, 28 May 2011 08:34:30 +0000 (08:34 +0000)]
MFC 218155:
Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control
algorithm described in the paper "Improved coexistence and loss tolerance for
delay based TCP congestion control" by Hayes and Armitage. It is implemented as
a kernel module compatible with the recently committed modular congestion
control framework.
CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide
tolerance to non-congestion related packet loss and improvements to coexistence
with loss-based congestion control algorithms. A key idea in improving
coexistence with loss-based congestion control algorithms is the use of a shadow
window, which attempts to track how NewReno's congestion window (cwnd) would
evolve. At the next packet loss congestion event, CHD uses the shadow window to
correct cwnd in a way that reduces the amount of unfairness CHD experiences when
competing with loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
lstewart [Sat, 28 May 2011 08:32:17 +0000 (08:32 +0000)]
MFC 218153:
Import a clean-room implementation of the Hamilton-Delay (HD) congestion control
algorithm based on the paper "A strategy for fair coexistence of loss and
delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and
Baker. It is implemented as a kernel module compatible with the recently
committed modular congestion control framework.
HD uses a probabilistic approach to reacting to delay-based congestion. The
probability of reducing cwnd is zero when the queuing delay is very small,
increasing to a maximum at a set threshold, then back down to zero again when
the queuing delay is high. Normal operation keeps the queuing delay below the
set threshold. However, since loss-based congestion control algorithms push the
queuing delay high when probing for bandwidth, having the probability of
reducing cwnd drop back to zero for high delays allows HD to compete with
loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
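The shape of HD's backoff probability can be illustrated with a small sketch. The commit message only fixes three features (zero at very small delay, a maximum at the threshold, zero again at high delay); the straight-line segments between them, the function name and all parameter names below are assumptions made for illustration and may not match the curve used in the paper or in cc_hd.

```c
/*
 * Illustrative only: triangular backoff probability as a function of
 * queuing delay q. Assumes qmin < qthresh < qmax. The linear segments
 * are an assumption, not taken from the HD paper or cc_hd.
 */
double
hd_backoff_prob(double q, double qmin, double qthresh, double qmax,
    double pmax)
{
	/* Very small or very large queuing delay: never back off. */
	if (q <= qmin || q >= qmax)
		return (0.0);
	/* Rising edge: probability grows until the threshold is reached. */
	if (q <= qthresh)
		return (pmax * (q - qmin) / (qthresh - qmin));
	/* Falling edge: probability decays so loss-based flows can be matched. */
	return (pmax * (qmax - q) / (qmax - qthresh));
}
```

With qmin = 5, qthresh = 20, qmax = 100 and pmax = 0.5, the probability peaks at a delay of 20 and has dropped to half of its peak at a delay of 60.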
lstewart [Sat, 28 May 2011 08:28:37 +0000 (08:28 +0000)]
MFC r218152,218156:
Import a clean-room implementation of the VEGAS congestion control algorithm
based on the paper "TCP Vegas: end to end congestion avoidance on a global
internet" by Brakmo and Peterson. It is implemented as a kernel module
compatible with the recently committed modular congestion control framework.
VEGAS uses network delay as a congestion indicator and unlike regular loss-based
algorithms, attempts to keep the network operating with stable queuing delays
and no congestion losses. By keeping network buffers used along the path within
a set range, queuing delays are kept low while maintaining high throughput.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
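As a rough sketch of the delay-based idea, the classic Vegas rule compares the window against the amount of data the path could carry at the minimum observed RTT and keeps the difference (an estimate of segments sitting in queues) between two bounds. The function and parameter names are hypothetical, the window is counted in segments for simplicity, and the module's actual arithmetic may differ.

```c
#include <stdint.h>

/*
 * Illustrative Vegas-style congestion avoidance step (not the cc_vegas
 * code). cwnd is in segments, RTTs are in microseconds, alpha and beta
 * bound the estimated number of queued segments.
 */
uint32_t
vegas_adjust(uint32_t cwnd, uint32_t base_rtt_us, uint32_t rtt_us,
    uint32_t alpha, uint32_t beta)
{
	uint64_t unqueued;
	uint32_t queued;

	/* Without a usable RTT sample, leave the window alone. */
	if (rtt_us == 0 || rtt_us < base_rtt_us)
		return (cwnd);

	/* Segments that fit in the path itself at the current rate. */
	unqueued = (uint64_t)cwnd * base_rtt_us / rtt_us;
	/* The rest is assumed to be waiting in network buffers. */
	queued = cwnd - (uint32_t)unqueued;

	if (queued < alpha)
		return (cwnd + 1);	/* path underused: grow */
	if (queued > beta && cwnd > 1)
		return (cwnd - 1);	/* queue building: shrink */
	return (cwnd);			/* within the target range */
}
```

For example, with cwnd = 20, a base RTT of 100 ms and a measured RTT of 200 ms, about 10 segments are estimated to be queued, so the window shrinks when beta is 3.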
lstewart [Sat, 28 May 2011 08:24:22 +0000 (08:24 +0000)]
MFC 217806:
Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the
Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low
noise estimate of the instantaneous RTT. ERTT's implementation is robust even in
the face of delayed acknowledgements and/or TSO being in use for a connection.
A high quality, low noise RTT estimate is a requirement for applications such as
delay-based congestion control, for which we will be importing some algorithm
implementations shortly.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
lstewart [Sat, 28 May 2011 08:13:39 +0000 (08:13 +0000)]
MFC r216758,217252:
- Add some helper hook points to the TCP stack. The hooks allow Khelp modules to
access inbound/outbound events and associated data for established TCP
connections. The hooks only run if at least one hook function is registered
for the hook point, ensuring the impact on the stack is effectively nil when
no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual
data to any registered Khelp module hook functions.
- Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp
modules to associate per-connection data with the TCP control block.
- Tweak the MFCed code to preserve the ABI of the 8-STABLE branch with respect
to "struct tcpcb" by consuming some of the padding within the struct.
- Bump __FreeBSD_version to 802506.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz, others along the way
lstewart [Sat, 28 May 2011 06:56:09 +0000 (06:56 +0000)]
MFC r216615,217248,217250:
- Introduce the Hhook (Helper Hook) KPI. The KPI is closely modelled on pfil(9),
and in many respects can be thought of as a more generic superset of pfil.
Hhook provides a way for kernel subsystems to export hook points that Khelp
modules can hook to provide enhanced or new functionality to the kernel. The
KPI has been designed to ensure hook points pose no noticeable overhead when
no hook functions are registered.
- Introduce the Khelp (Kernel Helpers) KPI. Khelp provides a framework for
managing Khelp modules, which indirectly use the Hhook KPI to register their
hook functions with hook points of interest within the kernel. Khelp modules
aim to provide a structured way to dynamically extend the kernel at runtime in
an ABI preserving manner. Depending on the subsystem providing hook points, a
Khelp module may be able to associate per-object data for maintaining relevant
state between hook calls.
- pjd's Object Specific Data (OSD) KPI is used to manage the per-object data
allocated to Khelp modules. Create a new "OSD_KHELP" OSD type for use by the
Khelp framework.
- Bump __FreeBSD_version to 802505 to mark the introduction of the new KPIs.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz, others along the way
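The "no noticeable overhead when no hook functions are registered" property can be illustrated with a user-space model. This is a sketch only: the struct and function names are hypothetical and are not the hhook(9) or khelp(9) API, and the real KPI also handles locking and per-object data, which are omitted here.

```c
#include <stddef.h>

#define	HOOKS_MAX	4

/* A hook function receives opaque contextual data from the hook point. */
typedef void (*hook_fn)(void *ctx_data);

/* Hypothetical hook point: a small table of registered functions. */
struct hook_point {
	hook_fn	fns[HOOKS_MAX];
	int	nhooks;
};

/* Register a hook function; returns 0 on success, -1 if the table is full. */
int
hook_register(struct hook_point *hp, hook_fn fn)
{
	if (hp->nhooks >= HOOKS_MAX)
		return (-1);
	hp->fns[hp->nhooks++] = fn;
	return (0);
}

/* Called by the subsystem at the hook point. */
void
hook_run(struct hook_point *hp, void *ctx_data)
{
	int i;

	/* Fast path: a single comparison when nothing is registered. */
	if (hp->nhooks == 0)
		return;
	for (i = 0; i < hp->nhooks; i++)
		hp->fns[i](ctx_data);
}

/* Example hook: count how many times the hook point fired. */
void
count_event(void *ctx_data)
{
	(*(int *)ctx_data)++;
}
```

A subsystem would call hook_run() unconditionally at each hook point; the cost is one integer comparison until a module registers a function.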
lstewart [Sat, 28 May 2011 05:28:00 +0000 (05:28 +0000)]
MFC r216115:
Import a clean-room implementation of the experimental H-TCP congestion control
algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.
H-TCP was designed to provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "H-TCP: A framework for congestion control in high-speed and long-distance
networks" provides additional detail.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: rpaulo
lstewart [Sat, 28 May 2011 05:17:52 +0000 (05:17 +0000)]
MFC r216114,217683:
Import a clean-room implementation of the experimental CUBIC congestion control
algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.
CUBIC was designed to provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional
detail.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: rpaulo
- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.
- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.
- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.
- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.
- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the foreseeable future. Implementations of
additional different algorithms will become available in the near future.
- Tweak the MFCed code to preserve the ABI of the 8-STABLE branch with respect
to "struct tcpcb" by consuming some of the padding within the struct.
- Bump __FreeBSD_version to 802504.
In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo (r215166), bz (r215391,215395,216749,217748)
Tested by: David Hayes (r215166), trociny (r215377,215391,215392,215395)
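A minimal sketch of how an application might use the TCP_CONGESTION socket option described above. The wrapper names cc_set() and cc_get() are hypothetical; the option takes the algorithm name as a string, and the names an application can pass (for example "newreno") depend on which algorithm modules are present, which is an assumption the caller has to check.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

/*
 * Select the congestion control algorithm for one TCP socket.
 * Returns 0 on success, -1 with errno set on failure (for example if
 * the named algorithm is not available).
 */
int
cc_set(int fd, const char *algo)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo,
	    (socklen_t)strlen(algo)));
}

/*
 * Query the algorithm currently used by a TCP socket. The buffer is
 * zeroed first and one byte is held back so the result is always
 * NUL-terminated.
 */
int
cc_get(int fd, char *buf, socklen_t buflen)
{
	socklen_t len;

	if (buflen == 0)
		return (-1);
	memset(buf, 0, buflen);
	len = buflen - 1;
	return (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len));
}
```

Since each connection keeps its own algorithm pointer, calling cc_set() on one socket does not affect other connections in the same process.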
dougb [Sat, 28 May 2011 00:33:06 +0000 (00:33 +0000)]
Upgrade to 9.6-ESV-R4-P1, which addresses the following issues:
1. Very large RRSIG RRsets included in a negative cache can trigger
an assertion failure that will crash named (BIND 9 DNS) due to an
off-by-one error in a buffer size check.
This bug affects all resolving name servers, whether DNSSEC validation
is enabled or not, on all BIND versions prior to today. There is a
possibility of malicious exploitation of this bug by remote users.
2. Named could fail to validate zones listed in a DLV that validated
insecure without using DLV and had DS records in the parent zone.
Add a patch provided by ru@ and confirmed by ISC to fix a crash at
shutdown time when a SIG(0) key is being used.
yongari [Fri, 27 May 2011 21:43:35 +0000 (21:43 +0000)]
MFC r221565-221568,221579:
r221565:
Reuse the TX descriptor(DPD) if xl_encap() failed instead of just
picking the next available one. This may explain why xl(4) sees TX
underrun error with no queued frame. I hope this addresses a long
standing xl(4) watchdog timeout issue as well.
Obtained from: OpenBSD
r221566,221579:
Rename xl_stats_update() callout handler to xl_tick() and move MII
tick driving logic to xl_tick(). Now xl_tick() handles MII tick as
well as periodic updating of statistics.
This change removes a hack used in interrupt handler where it
wanted to update statistics without driving MII tick.
r221567:
Rearm the watchdog timer if the driver kicks the controller to recover
from a TX underrun error.
While here, prepend 0x to the status code to show that the TX status is
a hex number.
r221568:
XL_DMACTL is 32bit register, use 32bit write macro.
While I'm here add more bits for the register.
yongari [Fri, 27 May 2011 20:33:26 +0000 (20:33 +0000)]
MFC r221563-221564:
r221563:
Terminate the interrupt handler if the driver detects it's not running.
Also add check for driver running state before trying to send
frames. While I'm here, use for loop.
r221564:
Change xl_rxeof() a bit to return the number of processed frames in
RX descriptor ring. Previously it returned the number of frames
that were successfully passed to upper stack which in turn means it
ignored frames that were discarded due to errors. The number of
processed frames in RX descriptor ring is used to detect whether
driver is out of sync with controller's current descriptor pointer.
Returning number of processed frames reduces unnecessary (probably
wrong) re-synchronization.
While here, remove unnecessary local variable initialization.
r221558:
Set status word once instead of twice. For 3C90xB/3C90xC, frame
length of status word is ignored. While here move bus_dmamap_sync()
up where DMA map is loaded.
r221560:
Call bus_dmamap_sync() only after TX DPD update.
r221561:
Updating status word should be the last operation of UPD structure
renewal. Disable instruction reordering by adding volatile to
xl_list_onefrag structure.
yongari [Fri, 27 May 2011 19:26:12 +0000 (19:26 +0000)]
MFC r221555:
Rewrite RX filter logic and provide controller specific filter
handler for 3C90x and 3C90xB/C respectively. This simplifies ioctl
handler as well as enhancing readability.
While I'm here don't reprogram multicast filter when driver is not
running.
yongari [Fri, 27 May 2011 18:58:08 +0000 (18:58 +0000)]
MFC r222135:
Remove unnecessary controller reinitialization by checking
IFF_DRV_RUNNING flag. Previously running dhclient or adding alias
addresses reinitialized controller and it resulted in unnecessary
link flips.
yongari [Fri, 27 May 2011 18:46:24 +0000 (18:46 +0000)]
MFC r221712:
Since r117657, bge(4) does not enable the buffer manager for BCM5705 or
newer controllers. However, none of the data sheets I have access to
indicate that the buffer manager should not be touched on these
controllers. It seems the buffer manager always runs on BCM5705 or
newer controllers. Some controllers (e.g. BCM5719) need other buffer
manager configuration, so the driver should enable the buffer manager
for all controllers. Both Linux and OpenBSD/NetBSD use the same
approach.
This change polls enable bit of block to know whether specified
block was really stopped as well as enabling buffer manager for all
controllers in driver initialization.
yongari [Fri, 27 May 2011 18:39:18 +0000 (18:39 +0000)]
MFC r222142:
Datasheet says vge(4) controllers support DAC but it seems that's
not true on old PCI based controllers. DAC configuration is read
from EEPROM in device reset phase and driver can override DAC
configuration. However I guess there is an undocumented reason why
EEPROM configuration does not enable DAC so do not blindly override
DAC configuration. Recent PCIe based controllers are supposed to
support 64bit DMA so allow 64bit DMA only on PCIe based controllers.
zec [Fri, 27 May 2011 08:43:59 +0000 (08:43 +0000)]
MFC r222257:
Assume the link to be dead if bit error rate (BER) parameter is set to 1.
When a transition from link alive to link dead configuration or vice
versa occurs, notify any upstream and / or downstream peers using
NGM_FLOW messages.
Link state notification using NGM_FLOW messages is modelled on
already existing code in ng_ether.c.
zec [Fri, 27 May 2011 08:43:03 +0000 (08:43 +0000)]
MFC r222255:
Provide fake link status information in an attempt to let ng_eiface(4)
virtual ifnets more realistically mimic physical ethernet interfaces.
The main motivation behind this change is to allow for ng_eiface(4)
interfaces to participate in STP if_bridge(4) configurations.
When announcing link status changes, switch to the vnet to which the
ifnet belongs, since it is possible for ng_eiface ifnets to be assigned
to a vnet different from the one in which its netgraph node resides.
zec [Fri, 27 May 2011 08:41:57 +0000 (08:41 +0000)]
MFC r222247:
Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured
on top of epair(4) virtual interfaces, since there's no physical
hardware associated with epair interfaces which would imply any
constraints on MTU sizes.
zec [Fri, 27 May 2011 08:40:26 +0000 (08:40 +0000)]
MFC r222246:
Let epair(4) virtual interfaces report fake link / media status,
by borrowing the skeleton of if_media manipulation and reporting
code from if_lagg(4). The main motivation behind this change is
to allow for epair(4) interfaces to participate in STP if_bridge(4)
configurations.
mav [Thu, 26 May 2011 00:37:44 +0000 (00:37 +0000)]
MFC r220911:
Make PATA-like soft-reset in ata(4) more strict in checking disk signature.
This avoids false positive device detection under Xen, which caused
long probe delays due to subsequent IDENTIFY command timeouts.
kib [Wed, 25 May 2011 03:25:14 +0000 (03:25 +0000)]
MFC r222086:
The protection against the race with dev_rel(), introduced in r163328,
should be extended to cover destroy_devl() calls for the children of the
destroyed dev.
mdf [Tue, 24 May 2011 16:04:35 +0000 (16:04 +0000)]
MFC r221843:
Note that the _SWAP operation is supported for all list/queue types.
Also place STAILQ_REMOVE_HEAD in alphabetical order. Lastly, document
the _SWAP macros.
jilles [Sun, 22 May 2011 22:28:07 +0000 (22:28 +0000)]
MFC r208489,r216208: sh: Reap any zombies before forking for a background
command.
This prevents accumulating huge amounts of zombies if a script executes
many background commands but no external commands or subshells.
Note that zombies will not be reaped during long calculations (within
the shell process) or read builtins, but those actions do not create
more zombies.
The terminated background commands will also still be remembered by the
shell.
r216208 fixes a bug in r208489 that could cause a multi-command pipeline to
be marked as done before all processes had been created.
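The reaping step can be sketched with a non-blocking waitpid() loop run before each fork. This is an illustration, not the sh(1) code: reap_zombies() is a hypothetical name, and the real shell also records each child's exit status in its job table so that the terminated commands are still remembered.

```c
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Collect every child that has already exited, without blocking.
 * Returns the number of zombies reaped. The loop stops when
 * waitpid() reports that no exited child is pending (0) or that
 * there are no children at all (-1).
 */
int
reap_zombies(void)
{
	int status;
	int reaped = 0;

	while (waitpid(-1, &status, WNOHANG) > 0)
		reaped++;
	return (reaped);
}
```

Calling this before forking a background command bounds the number of zombies by the number of children that exit between two forks.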
nwhitehorn [Sun, 22 May 2011 15:01:02 +0000 (15:01 +0000)]
MFC r221519,221813:
Do not use Open Firmware to open the device and instead program its start
on our own. This prevents hangs at boot when using a bm(4) NIC where the
cable is not plugged in at boot time.
yongari [Sat, 21 May 2011 00:38:43 +0000 (00:38 +0000)]
MFC r221548,221552:
r221548:
Do not increment the collision counter if transmit has failed.
Transmission error in tun(4) is a queueing error (i.e. ENOBUFS) and it
has nothing to do with collision.
r221552:
Fix white space nits and style
yongari [Fri, 20 May 2011 20:29:50 +0000 (20:29 +0000)]
MFC r221468:
Enable Ethernet@WireSpeed for BCM5718/BCM57765 family. While I'm
here, invert the meaning of the PHY flag as Ethernet@WireSpeed is enabled
for most PHYs.
yongari [Fri, 20 May 2011 20:26:16 +0000 (20:26 +0000)]
MFC r221445:
Add initial BCM57765 family support. The BCM57765 family seems to
have hardware features similar to the BCM5718 family, except that the
number of receive return rings is 4. The BCM57765 family is known to
support IEEE 802.3az EEE (Energy Efficient Ethernet) but this change
does not include EEE support code. I hope EEE is implemented in the
near future.
This change will support BCM57761, BCM57765, BCM57781, BCM57785,
BCM57791 and BCM57795. All hardware offloading features are
supported and suspend/resume also should work.
Many thanks to Broadcom for continuing support of FreeBSD.
trociny [Fri, 20 May 2011 17:29:03 +0000 (17:29 +0000)]
MFC r221632, r221643:
r221632:
Fix isitme(), which is used to check if node-specific configuration
belongs to our node, and was returning false positive if the first
part of a node name matches short hostname.
r221643 (pjd):
Allow remote to be specified as 'none' again, which was broken by r219351,
where 'none' was defined as a value for checksum.
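The kind of false positive described for isitme() in r221632 can be avoided by requiring an exact match on the short hostname rather than a prefix match. The sketch below is an assumption about the general technique, not the actual code; node_is_me() is a hypothetical name and it only covers hostname comparison.

```c
#include <string.h>

/*
 * Illustrative check: does a configured node name refer to this host?
 * Accepts either the full hostname or the short hostname (the part
 * before the first dot), and in both cases requires the whole string
 * to match, not just a leading part of it.
 */
int
node_is_me(const char *node, const char *hostname)
{
	size_t shortlen = strcspn(hostname, ".");

	/* Full hostname match. */
	if (strcmp(node, hostname) == 0)
		return (1);
	/* Short hostname match: same length and same bytes. */
	return (strlen(node) == shortlen &&
	    strncmp(node, hostname, shortlen) == 0);
}
```

With the hostname "hasta.example.org", the node names "hasta" and "hasta.example.org" match, while "hasta2" and "hasta.other.org" do not.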
rmacklem [Fri, 20 May 2011 01:04:33 +0000 (01:04 +0000)]
MFC: r221537
Set the initial value of maxfilesize to OFF_MAX in the
new NFS client. It will then be reduced to whatever the
server says it can support. There might be an argument
that this could be one block larger, but since NFS is
a byte granular system, I chose not to do that.
rmacklem [Fri, 20 May 2011 00:51:52 +0000 (00:51 +0000)]
MFC: r221517
Change the new NFS server so that it returns 0 when the f_bavail
or f_ffree fields of "struct statfs" are negative, since the
values that go on the wire are unsigned and will appear to be
very large positive values otherwise. This makes the handling
of a negative f_bavail compatible with the old/regular NFS server.
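The server-side rule amounts to clamping at zero before the value is converted to the unsigned wire type. A minimal sketch, with nfs_wire_count() as a hypothetical name:

```c
#include <stdint.h>

/*
 * Convert a possibly negative statfs counter (such as f_bavail) to the
 * unsigned value sent on the wire. Negative values become 0 instead of
 * wrapping around to a huge positive number.
 */
uint64_t
nfs_wire_count(int64_t v)
{
	return (v < 0 ? 0 : (uint64_t)v);
}
```

Without the clamp, a value of -1 would be cast to the largest 64-bit unsigned value and a client would report an enormous amount of free space.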
yongari [Thu, 19 May 2011 17:18:13 +0000 (17:18 +0000)]
MFC r221817:
Explicitly clear 1000baseT control register for F1 PHY used in
AR8132 FastEthernet controller. The PHY has no ability to
establish a gigabit link. Previously only link partners which
support down-shifting were able to establish a link.
This change should fix a long standing link establishment issue of
AR8132.
rmacklem [Thu, 19 May 2011 01:35:52 +0000 (01:35 +0000)]
MFC: r221467
Fix the new NFS client so that it handles the 64bit fields
that are now in "struct statfs" for NFSv3 and NFSv4. Since
the ffiles value is uint64_t on the wire, I clip the value
to INT64_MAX to avoid setting f_ffree negative.
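The client-side clipping goes in the opposite direction: an unsigned 64-bit wire value is limited to INT64_MAX before being stored in a signed field. A minimal sketch, with nfs_clip_files() as a hypothetical name:

```c
#include <stdint.h>

/*
 * Convert an unsigned 64-bit count received from the server into a
 * signed 64-bit field such as f_ffree. Values above INT64_MAX are
 * clipped so the stored result can never be negative.
 */
int64_t
nfs_clip_files(uint64_t wire)
{
	if (wire > (uint64_t)INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)wire);
}
```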
rmacklem [Wed, 18 May 2011 02:14:26 +0000 (02:14 +0000)]
MFC: r221462
Add a comment noting that the NFS code assumes that the
values of error numbers in sys/errno.h will be the same
as the ones specified by the NFS RFCs and that the code
needs to be fixed if error numbers are changed in sys/errno.h.
rmacklem [Wed, 18 May 2011 01:14:27 +0000 (01:14 +0000)]
MFC: r221439
Add kernel support for NFSSVC_ZEROCLTSTATS and NFSSVC_ZEROSRVSTATS
so that they can be used by nfsstat(1) to implement the "-z" option
for the new NFS subsystem.
delphij [Mon, 16 May 2011 18:12:32 +0000 (18:12 +0000)]
MFC r217588+218386 (trasz):
Add MNT_NFS4ACLS to ZFS mount flags and make it impossible to clear
the flag by using 'mount -uw'. It's not conditional, since there
is no way to disable NFSv4 ACLs in ZFS. This should make it easier
for the NFS server to figure out whether the exported filesystem supports
ACLs or not.
kib [Sun, 15 May 2011 06:42:32 +0000 (06:42 +0000)]
MFC r220985:
Move some parts of ufs_reclaim() into helper function ufs_prepare_reclaim(),
and call the helper from VOP_RECLAIM and ffs_valloc() to properly prepare
the ufs vnode for reuse.
attilio [Sun, 15 May 2011 01:08:51 +0000 (01:08 +0000)]
MFC r221121,221173:
- Add the possibility to reuse the last used timeout when patting
the watchdog, via the watchdog(9) interface.
- Add the possibility to pat the watchdogs installed via the watchdog(9)
interface from the kernel.
- Avoid passing WD_ACTIVE down to the watchdog handlers. All the control
bit processing should be left to the upper layer functions and not passed
down to the handlers at all.
- Add watchdog patting during the (shutdown time) disk syncing and
disk dumping.
rmacklem [Sun, 15 May 2011 00:43:51 +0000 (00:43 +0000)]
MFC: r221190,r221205
Fix the new NFS client so that it handles the "nfs_args" value
in mnt_optnew. This is needed so that the old mount(2) syscall
works and that is needed so that amd(8) works. The code was
basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor
changes. This patch is mainly to fix the new NFS client so that
amd(8) works with it. Thanks go to Craig Rodrigues for helping with
this.
rmacklem [Sun, 15 May 2011 00:25:19 +0000 (00:25 +0000)]
MFC: r221127
This patch is believed to fix a problem in the kernel rpc for
non-interruptible NFS mounts, where a kernel thread will seem
to be stuck sleeping on "rpccon". The msleep() in clnt_vc_create()
that was waiting for a TCP connect to complete would return ERESTART,
since PCATCH was specified. Then the tsleep() in clnt_reconnect_call()
would sleep for 1 second and then try again and again and...
The patch changes the msleep() in clnt_vc_create() so it only sets
the PCATCH flag for interruptible cases.
rmacklem [Sun, 15 May 2011 00:11:00 +0000 (00:11 +0000)]
MFC: r221032,r221040,r221066
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.
Also, make the changes to the experimental NFS client so
that it uses the moved diskless NFS root files, and fix
it so that it links for cases where "options NFS_ROOT" is
not specified for the kernel config.
dougb [Sat, 14 May 2011 21:42:08 +0000 (21:42 +0000)]
MFC r221475:
1. If PKG_DBDIR cannot be determined from make, set the default
2. Add the -H flag to tar in case /var/db/pkg itself is a symlink
3. Direct stderr to /dev/null to suppress the leading slash warning
marius [Sat, 14 May 2011 21:15:49 +0000 (21:15 +0000)]
MFC: r220039, 220147
- A closer inspection of the OpenSolaris code indicates that the DMA
syncing for Hummingbird and Sabre bridges should be applied with every
BUS_DMASYNC_POSTREAD instead of in a wrapper around interrupt handlers
for devices behind PCI-PCI bridges only as suggested by the documentation
(code for the latter actually exists in OpenSolaris but is disabled by
default), which also makes more sense.
- Take advantage of the ofw_pci_setup_device method introduced in r220038
(MFC'ed to stable/8 in r221923) for disabling bus parking for certain
EBus bridges in order to work around hardware bugs.
- Mark some unused parameters as such.
marius [Sat, 14 May 2011 21:12:00 +0000 (21:12 +0000)]
MFC: r220038
- Merge the *_SET macros from fire(4) which generally print out the
register changes when compiled with SCHIZO_DEBUG and take advantage
of them.
- Add support for the XMITS Fireplane/Safari to PCI-X bridges. I thought
I'd need this for a Sun Fire 3800, which then turned out not to be
equipped with such a bridge though. The support for these should be
complete but given that it hasn't actually been tested probing is
disabled for now.
This required a way to alter the XMITS configuration in case a PCI-X
device is found further down the device tree so the sparc64 specific
ofw_pci kobj was revived with a ofw_pci_setup_device method, which is
called by the ofw_pcibus code for every device added.
- A closer inspection of the OpenSolaris code indicates that consistent
DMA flushing/syncing as well as the block store workaround should be
applied with every BUS_DMASYNC_POSTREAD instead of in a wrapper around
interrupt handlers for devices behind PCI-PCI bridges only as suggested
by the documentation (code for the latter actually exists in OpenSolaris
but is disabled by default), which also makes more sense.
- Add a workaround for Cassini/Skyhawk combinations. Chances are that
this solves the crashes seen when using the on-board Cassini NICs
of Sun Fire V480 equipped with centerplanes other than 501-6780 or
501-6790. This also takes advantage of the ofw_pci_setup_device method.
- Mark some unused parameters as such.
marius [Sat, 14 May 2011 21:07:51 +0000 (21:07 +0000)]
MFC: r219785
- Make a panic message better reflect the actual problem.
- A closer inspection of the OpenSolaris code indicates the block store
workaround is only necessary in case of BUS_DMASYNC_POSTREAD.
- Mark some unused parameters as such.
marius [Sat, 14 May 2011 21:03:44 +0000 (21:03 +0000)]
MFC: r216803, r217058, r217514, r218457
On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS,
which takes a physical address instead of a virtual one, for loading TTEs
of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB,
which only has a very limited number of lockable dTLB slots. The net result
is that we now basically can handle a kernel TSB of any size and no longer
need to limit the kernel address space based on the number of dTLB slots
available for locked entries. Consequently, other parts of the trap handlers
now also only access the kernel TSB via its physical address in order
to avoid nested traps, as does the PMAP bootstrap code as we haven't taken
over the trap table at that point, yet. Apart from that the kernel TSB now
is accessed via a direct mapping when we are otherwise taking advantage of
ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this
is implemented by extending the patching of the TSB addresses and mask as
well as the ASIs used to load it into the trap table so the runtime overhead
of this change is rather low.
Theoretically it should be possible to use the same approach also for the
user TSB, which already is not locked into the dTLB, avoiding nested traps.
However, for reasons I don't understand yet OpenSolaris only does that with
SPARC64 CPUs. On the other hand I think that also addressing the user TSB
physically and thus avoiding nested traps would get us closer to sharing
this code with sun4v, which only supports trap level 0 and 1, so eventually
we could have a single kernel which runs on both sun4u and sun4v (as do
Linux and OpenBSD).
rmacklem [Sat, 14 May 2011 02:28:21 +0000 (02:28 +0000)]
MFC: r221014,r221018
Modify the experimental NFS client so that it uses the same
"struct nfs_args" as the regular NFS client. This is needed
so that the old mount(2) syscall will work and it makes
sharing of the diskless NFS root code easier. Early in the
porting exercise I introduced a new revision of nfs_args, but
didn't actually need it, thanks to nmount(2). I re-introduced the
NFSMNT_KERB flag, since it does essentially the same thing and
the old one would not have been used because it never worked.
I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h
that are used by the experimental NFS client.
Also fix the NFS client so that it doesn't bogusly set the
f_flags argument of "struct statfs",