John Baldwin [Mon, 2 May 2011 21:05:52 +0000 (21:05 +0000)]
Handle a rare edge case with nearly full TCP receive buffers. If a TCP
buffer fills up causing the remote sender to enter into persist mode, but
there is still room available in the receive buffer when a window probe
arrives (either due to window scaling, or due to the local application
very slowly draining data from the receive buffer), then the single byte
of data in the window probe is accepted. However, this can cause rcv_nxt
to be greater than rcv_adv. This condition will only last until the next
ACK packet is pushed out via tcp_output(), and since the previous ACK
advertised a zero window, the ACK should be pushed out while the TCP
pcb is write-locked.
During the window while rcv_nxt is greater than rcv_adv, a few places
would compute the remaining receive window via rcv_adv - rcv_nxt.
However, this value was then (uint32_t)-1. On a 64 bit machine this
could expand to a positive 2^32 - 1 when cast to a long. In particular,
when calculating the receive window in tcp_output(), the result would be
that the receive window was computed as 2^32 - 1 resulting in advertising
a far larger window to the remote peer than actually existed.
Fix various places that compute the remaining receive window to either
assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
window as full if rcv_nxt is greater than rcv_adv.
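The failure is plain unsigned arithmetic, so it can be reproduced in userland. This is a sketch, not the kernel code: tcp_seq is modeled as uint32_t, the function names are hypothetical, and the comparison mirrors the signed-difference idiom of the kernel's SEQ_GT() macro. It puts the buggy computation next to the two fixes described above: treating the window as full, and asserting the invariant.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* 32-bit sequence numbers, as in the stack */

/* Buggy pattern: the unsigned difference wraps, then widens to a positive long. */
static long
recwin_unclamped(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return ((long)(rcv_adv - rcv_nxt));
}

/* Fix 1: treat the window as full once rcv_nxt has passed rcv_adv. */
static long
recwin_clamped(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	if ((int32_t)(rcv_nxt - rcv_adv) > 0)	/* SEQ_GT(rcv_nxt, rcv_adv) */
		return (0);
	return ((long)(rcv_adv - rcv_nxt));
}

/* Fix 2: where rcv_nxt <= rcv_adv must hold, assert it (KASSERT in the kernel). */
static long
recwin_asserted(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	assert((int32_t)(rcv_adv - rcv_nxt) >= 0);
	return ((long)(rcv_adv - rcv_nxt));
}
```

With rcv_nxt one byte past rcv_adv, the unclamped version yields 2^32 - 1 on an LP64 machine, while the clamped version yields 0.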
John Baldwin [Mon, 2 May 2011 19:02:30 +0000 (19:02 +0000)]
The ACPI Host-PCI bridge driver actually supports multiple domains via
the optional _SEG function. Return that value (ap->segment) rather than
0 for the pcib domain ivar.
Don't use the whole region 5 for KVA, because the CPU may not implement all
of the 61 bits available within the region for virtual addressing. Since
there's no good way for us to map out the gap in the virtual address space,
limit KVA to the architectural minimum implemented address bits. This still
gives us 1 petabyte of KVA, so no worries.
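The arithmetic behind the "1 petabyte" figure can be checked in a few lines. This sketch assumes the usable KVA span is 2^50 bytes (which is what 1 PiB corresponds to) and uses the IA-64 convention that bits 63..61 of a virtual address select the region; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* IA-64: bits 63..61 of a virtual address select one of eight regions. */
#define REGION_SHIFT	61
/* Assumption: limited KVA span of 2^50 bytes, matching "1 petabyte". */
#define KVA_SPAN_SHIFT	50

/* Base address of a region (region 5 is used for KVA here). */
static uint64_t
region_base(unsigned region)
{
	assert(region < 8);
	return ((uint64_t)region << REGION_SHIFT);
}

/* First address past the limited KVA window in region 5. */
static uint64_t
kva_end(void)
{
	return (region_base(5) + ((uint64_t)1 << KVA_SPAN_SHIFT));
}
```

Region 5 starts at 0xA000000000000000, and a 2^50-byte window ends at 0xA004000000000000, a small fraction of the 2^61 bytes the region nominally spans.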
Dimitry Andric [Mon, 2 May 2011 17:46:59 +0000 (17:46 +0000)]
Remove usr/include/nfs/krpc.h and usr/include/nfs/nfsdiskless.h from
ObsoleteFiles.inc, since these files have been reincarnated in the new
NFS implementation.
John Baldwin [Mon, 2 May 2011 14:13:12 +0000 (14:13 +0000)]
Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
Rebecca Cran [Mon, 2 May 2011 10:35:27 +0000 (10:35 +0000)]
Add -Wmissing-include-dirs to CWARNFLAGS, so tinderbox will punish those
developers committing new code with broken include directories.
Fix a few whitespace issues.
Improve a couple of comments.
-W is now deprecated and is referred to as -Wextra (see gcc(1)).
Adrian Chadd [Mon, 2 May 2011 05:39:43 +0000 (05:39 +0000)]
Add documentation to sys/conf/options pointing out that AH_SUPPORT_AR9130
shouldn't be enabled by default unless you're truly building for the
AR913x platform.
Rick Macklem [Sun, 1 May 2011 22:19:52 +0000 (22:19 +0000)]
Add the kernel support needed to zero out the nfsstats
structure for the new NFS subsystem. This will be used
by nfsstats.c to implement the "-z" option.
Ulrich Spörlein [Sun, 1 May 2011 20:14:10 +0000 (20:14 +0000)]
recoverdisk(8): treat output file consistently and abort on EINVAL
This improves usability a little, as the output file no longer has to be
created with touch first. Also reword the manpage with respect to
parameters and fix usage() [1]
With no media in a cd(4) drive, the reads will loop producing EINVAL,
abort in that case [2].
Document the shortcoming of sectorsize and MAXPHYS (a quick solution
might be to use MAXPHYS as the "bigsize", but in brief testing it made
no difference to throughput).
Ulrich Spörlein [Sun, 1 May 2011 19:47:34 +0000 (19:47 +0000)]
Let users' PATH decide which groff suite to pick up.
Let groff pass the -c flag to grotty, which will turn off ANSI
sequences. While these are not a problem for our more/less, they get
mangled by col(1) and this will result in garbage output.
This makes man(1) work together with textproc/groff, in case the
user decided to delete the old groff from base (-DWITHOUT_GROFF).
- Remove the following sysctl:
kern.sched.ipiwakeup.onecpu
kern.sched.ipiwakeup.htt2
They are completely obsolete. Probably the whole wakeup-forwarding
mechanism should be revisited to better fit modern hardware.
- As the map2 variable is no longer used, rename map3 to map2
- Make a message string more informative and stop passing arguments
to it
Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.
While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.
Introduce two new options, MK_INET and MK_INET_SUPPORT, analogous to
their INET6 equivalents. Patch src.conf rather than regenerating it
(given the current problem with the script that does the regeneration).
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
Allow MKMODULESENV to be preset from other sources, such as makeoptions
in kernel configurations, so that the WITH_* and WITHOUT_* knobs used for
module building can also restrict or control opt_*.h flags.
Reviewed by: imp, +
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
Add some more missing optional dependencies on inet6, not only inet,
so that the files are also built for an IPv6-only kernel, and remove
extra inet6 options where they are not needed.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Adrian Chadd [Sat, 30 Apr 2011 11:36:16 +0000 (11:36 +0000)]
Add some initial PCIe bridge support for the AR724x chipsets.
This is reported to work on the AR7240 based Ubiquiti Rocket M5
but I haven't tested it on that hardware. I also don't yet have
it fully working on the AR7242 based development board here;
probe/attach works, but the endianness of the register space resource
looks wrong (0x10000000 instead of 0x00001000).
Make the TCP code compile without INET. Sort #includes and add #ifdef INETs.
Add some comments at #endifs given the deeper nesting. To keep the compiler
happy, some default initializations were added, following the style of
the files.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
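The pattern described above, commented #endifs plus a default initialization so that a build with every conditional branch compiled out still has a defined return value, looks roughly like this. The function is a hypothetical illustration, not code from the TCP files.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical protocol entry point. Without the default initialization,
 * a kernel built with neither INET nor INET6 would return an
 * uninitialized value, and the compiler would warn about it.
 */
static int
example_proto_attach(int af)
{
	int error = EAFNOSUPPORT;	/* default init: every case may be compiled out */

	switch (af) {
#ifdef INET
	case AF_INET:
		error = 0;
		break;
#endif /* INET */
#ifdef INET6
	case AF_INET6:
		error = 0;
		break;
#endif /* INET6 */
	default:
		break;
	}
	return (error);
}
```

Built without -DINET or -DINET6, every family falls through to EAFNOSUPPORT; each option re-enables only its own case.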
Michael Tuexen [Sat, 30 Apr 2011 11:18:16 +0000 (11:18 +0000)]
Improve compilation of SCTP code without INET support.
Some bugs were fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
IPv6.
* Checking for additional addresses takes the correct address
into account and also does not do more comparisons than
necessary.
This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.
Make the UDP code compile without INET. Build udp_usrreq.c for IPv6-only
kernels as well, compiling out most functions by adding or extending
#ifdef INET coverage.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Steve Kargl [Fri, 29 Apr 2011 23:13:43 +0000 (23:13 +0000)]
Improve the accuracy from a max ULP of ~2000 to max ULP < 0.79
on i386-class hardware for sinl and cosl. The hand-rolled argument
reduction has been replaced by e_rem_pio2l() implementations. To
preserve history the following commands have been executed:
The ld80 version has been tested by bde, das, and kargl over the
last few years (bde, das) and a few months (kargl). An older ld128
version was tested by das. The committed version has only been
compile-tested via 'make universe'.
Add an -E option to fsck_ffs, mirroring newfs's -E. The idea is that if
you have a system
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
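fsck_ffs -E has to tell the device which blocks are free. One plausible way to keep the number of requests down, assumed here rather than taken from the fsck_ffs code, is to coalesce adjacent free blocks into extents so that each TRIM covers a whole run. The helper and struct below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* A contiguous run of free blocks that could be sent as one TRIM request. */
struct extent {
	size_t start;
	size_t len;
};

/*
 * Scan a free-block map (true = free) and store up to maxout maximal
 * runs of free blocks in out[]. Returns the number of runs stored.
 */
static size_t
collect_free_runs(const bool *isfree, size_t nblocks, struct extent *out,
    size_t maxout)
{
	size_t n = 0, i = 0;

	while (i < nblocks && n < maxout) {
		if (!isfree[i]) {
			i++;
			continue;
		}
		size_t start = i;
		while (i < nblocks && isfree[i])
			i++;
		out[n].start = start;
		out[n].len = i - start;
		n++;
	}
	return (n);
}
```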
Somewhere around the 473rd time I mistyped "mdconfig file" instead of
"mdconfig -f file", I decided that it would be easier to make mdconfig
DWIM than to teach my fingers to type the correct command line.
John Baldwin [Fri, 29 Apr 2011 21:36:45 +0000 (21:36 +0000)]
Add a new bus method, BUS_ADJUST_RESOURCE() that is intended to be a
wrapper around rman_adjust_resource(). Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus. There is currently no default implementation. A
bus_adjust_resource() wrapper is provided for use in drivers.
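The delegation described above can be modeled with a toy device tree: a driver calls the wrapper on its parent bus, and a bus using the generic implementation simply forwards the request one level up. Everything here is hypothetical and much simpler than newbus; returning EINVAL when no level implements the method is an arbitrary choice of this sketch.

```c
#include <errno.h>
#include <stddef.h>

struct dev;
typedef int (*adjust_fn)(struct dev *bus, struct dev *child,
    unsigned long start, unsigned long end);

/* Hypothetical device node: a parent pointer and an optional method. */
struct dev {
	struct dev *parent;
	adjust_fn adjust;	/* NULL: this level has no implementation */
};

/* Like bus_generic_adjust_resource(): pass the request up to the parent. */
static int
generic_adjust(struct dev *bus, struct dev *child, unsigned long start,
    unsigned long end)
{
	if (bus->parent == NULL || bus->parent->adjust == NULL)
		return (EINVAL);
	return (bus->parent->adjust(bus->parent, child, start, end));
}

/* Stand-in for a bus that owns the resource and performs the adjustment. */
static int
root_adjust(struct dev *bus, struct dev *child, unsigned long start,
    unsigned long end)
{
	(void)bus;
	(void)child;
	return (start <= end ? 0 : EINVAL);
}

/* Like the bus_adjust_resource() wrapper: a driver asks its parent bus. */
static int
adjust_resource(struct dev *child, unsigned long start, unsigned long end)
{
	struct dev *bus = child->parent;

	if (bus == NULL || bus->adjust == NULL)
		return (EINVAL);
	return (bus->adjust(bus, child, start, end));
}
```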
Implement BIO_DELETE for vnode devices by simply overwriting the deleted
sectors with all-zeroes.
The zeroes come from a static buffer; null(4) uses a dynamic buffer for
the same purpose (for /dev/zero). It might be a good idea to have a
static, shared, read-only all-zeroes page somewhere in the kernel that
md(4), null(4) and any other code that needs zeroes could use.
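In userland terms, "overwrite the deleted sectors with zeroes from a static buffer" amounts to a chunked pwrite() loop. This is a sketch against a plain file descriptor, not md(4)'s vnode code; the function name and the 4096-byte buffer size are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Static all-zeroes buffer; the size is arbitrary for this sketch. */
static const char zerobuf[4096];

/*
 * Overwrite [off, off + len) of a file-backed device with zeroes,
 * writing at most sizeof(zerobuf) bytes per call.
 * Returns 0 on success or an errno value.
 */
static int
zero_range(int fd, off_t off, size_t len)
{
	while (len > 0) {
		size_t chunk = len < sizeof(zerobuf) ? len : sizeof(zerobuf);
		ssize_t n = pwrite(fd, zerobuf, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (errno);
		}
		if (n == 0)
			return (EIO);	/* no progress; avoid looping forever */
		off += n;
		len -= (size_t)n;
	}
	return (0);
}
```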
Rather than trusting that nothing is going to sneak in before the
early_late_divider in the second run (and thus be skipped altogether),
keep a list of the scripts run early, and use that list to skip things
in the second run.
This has the primary benefit of not skipping a local script that gets
ordered too early in the second run. It also gives an opportunity to
clean up/simplify the code a bit.
Use a space-separated list rather than the more traditional colon for
maximum insurance against creativity in local naming conventions.
John Baldwin [Fri, 29 Apr 2011 20:05:19 +0000 (20:05 +0000)]
Extend the rman(9) API to support altering an existing resource.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
existing resource. It only succeeds if the newly requested address
space is already free. It also supports shrinking a resource in
which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
start and end addresses for the first or last unallocated region in
an rman, respectively. This can be used to determine by how much
the resource backing an rman must be adjusted to accommodate an
allocation request that does not fit into the existing rman.
While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
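To make the two query functions concrete, here is a toy model in which an rman is a sorted array of regions with adjacent free regions already merged. The types and function names are hypothetical and do not match struct rman; ENOENT as the "no free region" result is this sketch's choice.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened rman: regions sorted by address, inclusive bounds. */
struct region {
	unsigned long start, end;
	bool allocated;
};

struct toy_rman {
	struct region *regions;
	size_t nregions;
};

/* Modeled on rman_first_free_region(): bounds of the lowest free region. */
static int
toy_first_free(const struct toy_rman *rm, unsigned long *start,
    unsigned long *end)
{
	for (size_t i = 0; i < rm->nregions; i++) {
		if (!rm->regions[i].allocated) {
			*start = rm->regions[i].start;
			*end = rm->regions[i].end;
			return (0);
		}
	}
	return (ENOENT);
}

/* Modeled on rman_last_free_region(): bounds of the highest free region. */
static int
toy_last_free(const struct toy_rman *rm, unsigned long *start,
    unsigned long *end)
{
	for (size_t i = rm->nregions; i > 0; i--) {
		if (!rm->regions[i - 1].allocated) {
			*start = rm->regions[i - 1].start;
			*end = rm->regions[i - 1].end;
			return (0);
		}
	}
	return (ENOENT);
}
```

For example, ignoring alignment, if the last free region ends at rm_end and spans 0x100 bytes, a 0x400-byte request would need the window grown by 0x300 bytes.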
John Baldwin [Fri, 29 Apr 2011 18:41:21 +0000 (18:41 +0000)]
Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
rm_end to allow managing the full range (0 to ~0ul) if they are not set by
the caller when rman_init() is called.
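The new check and the compatibility default can be sketched as follows. The struct and function names are hypothetical, and treating "both bounds zero" as "not set by the caller" is an assumption of this sketch. The ~0ul detail matters once bounds are enforced: on LP64, ~0u is only 0xffffffff, so an rm_end of ~0u would silently cap the rman at 4 GiB.

```c
#include <errno.h>

/* Hypothetical subset of struct rman: just the managed bounds. */
struct toy_rman_bounds {
	unsigned long rm_start, rm_end;
};

/* Like the rman_init() change: default to the full 0..~0ul range if unset. */
static void
toy_rman_init(struct toy_rman_bounds *rm)
{
	/* Assumption: both bounds zero means the caller left them unset. */
	if (rm->rm_start == 0 && rm->rm_end == 0)
		rm->rm_end = ~0ul;
}

/* Like the rman_manage_region() change: reject out-of-range regions. */
static int
toy_manage_region(const struct toy_rman_bounds *rm, unsigned long start,
    unsigned long end)
{
	if (start > end || start < rm->rm_start || end > rm->rm_end)
		return (EINVAL);
	return (0);
}
```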
Jung-uk Kim [Fri, 29 Apr 2011 18:20:12 +0000 (18:20 +0000)]
Detect VMware guest and set the TSC frequency as reported by the hypervisor.
VMware products virtualize the TSC, which runs at a fixed frequency in
so-called "apparent time". Although the virtualized i8254 also runs in
apparent time, TSC calibration always yields a slightly inaccurate frequency
because of the complicated timer emulation and lost-tick correction mechanism.
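One way a guest can obtain that frequency is through VMware's hypervisor CPUID leaves: leaf 0x40000000 returns the vendor signature "VMwareVMware" in EBX, ECX and EDX, and leaf 0x40000010 returns the virtual TSC frequency in kHz in EAX. This sketch assumes that interface and may differ from the mechanism this commit actually uses. It takes the register values as arguments instead of executing CPUID, so it runs on any host; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypervisor CPUID leaves assumed by this sketch (see lead-in). */
#define HV_LEAF_VENDOR	0x40000000u	/* signature in EBX, ECX, EDX */
#define HV_LEAF_TIMING	0x40000010u	/* EAX: virtual TSC frequency in kHz */

/* Return true if the signature registers spell "VMwareVMware". */
static bool
is_vmware_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	memcpy(sig + 0, &ebx, 4);
	memcpy(sig + 4, &ecx, 4);
	memcpy(sig + 8, &edx, 4);
	sig[12] = '\0';
	return (strcmp(sig, "VMwareVMware") == 0);
}

/* Convert the kHz value from the timing leaf into a frequency in Hz. */
static uint64_t
tsc_freq_from_khz(uint32_t eax)
{
	return ((uint64_t)eax * 1000);
}
```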
John Baldwin [Fri, 29 Apr 2011 15:40:12 +0000 (15:40 +0000)]
TCP reuses t_rxtshift to determine the backoff timer used for both the
persist state and the retransmit timer. However, the code that implements
"bad retransmit recovery" only checks t_rxtshift to see if an ACK has been
received during the first retransmit timeout window. As a result, if
ticks has wrapped over to a negative value and a socket is in the persist
state, it can incorrectly treat an ACK from the remote peer as a
"bad retransmit recovery" and restore saved values such as snd_ssthresh and
snd_cwnd. However, if the socket has never had a retransmit timeout, then
these saved values will be zero, so snd_ssthresh and snd_cwnd will be set
to 0.
If the socket is in fast recovery (this can be caused by excessive
duplicate ACKs such as those fixed by 220794), then each ACK that arrives
triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd
to be no larger than snd_ssthresh. In effect, the socket's send window
is permanently stuck at 0 even though the remote peer is advertising a
much larger window and pending data is only sent via TCP window probes
(so one byte every few seconds).
Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that
the various snd_*_prev fields in the pcb are valid and only perform
"bad retransmit recovery" if this flag is set in the pcb. The flag is set
on the first retransmit timeout that occurs and is cleared on subsequent
retransmit timeouts or when entering the persist state.
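The flag's life cycle can be modeled outside the kernel. This is a simplified sketch: the struct and function names are hypothetical, the TF_PREVVALID value is arbitrary here, and the real recovery check also compares ticks against t_badrxtwin, which is omitted.

```c
#include <stdint.h>

#define TF_PREVVALID	0x1	/* value is arbitrary in this sketch */

/* Hypothetical subset of the TCP control block. */
struct toy_tcpcb {
	unsigned t_flags;
	int t_rxtshift;
	uint32_t snd_cwnd, snd_ssthresh;
	uint32_t snd_cwnd_prev, snd_ssthresh_prev;
};

/* Retransmit timeout: save state on the first one and mark it valid. */
static void
toy_rexmt_timeout(struct toy_tcpcb *tp)
{
	if (++tp->t_rxtshift == 1) {
		tp->snd_cwnd_prev = tp->snd_cwnd;
		tp->snd_ssthresh_prev = tp->snd_ssthresh;
		tp->t_flags |= TF_PREVVALID;
	} else
		tp->t_flags &= ~TF_PREVVALID;
}

/* Persist state reuses t_rxtshift, so the saved values are not valid. */
static void
toy_enter_persist(struct toy_tcpcb *tp)
{
	tp->t_flags &= ~TF_PREVVALID;
}

/* "Bad retransmit recovery": restore only if the saved values are valid. */
static void
toy_bad_rexmt_recovery(struct toy_tcpcb *tp)
{
	if (tp->t_rxtshift == 1 && (tp->t_flags & TF_PREVVALID)) {
		tp->snd_cwnd = tp->snd_cwnd_prev;
		tp->snd_ssthresh = tp->snd_ssthresh_prev;
	}
}
```

In the scenario described above (persist state with t_rxtshift == 1 and no prior retransmit timeout), the flag is clear, so snd_cwnd and snd_ssthresh are no longer overwritten with zeroes.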
John Baldwin [Fri, 29 Apr 2011 14:06:37 +0000 (14:06 +0000)]
Add a 'show progress' command that shows a summary of all in-progress
commands for a given adapter. Specifically, it shows the status of any
drive or volume activities currently in progress similar to the
'drive progress' and 'volume progress' commands.
Rick Macklem [Thu, 28 Apr 2011 23:21:50 +0000 (23:21 +0000)]
Fix the new NFS client so that it handles the "nfs_args" value
in mnt_optnew. This is needed so that the old mount(2) syscall
works and that is needed so that amd(8) works. The code was
basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor
changes. This patch is mainly to fix the new NFS client so that
amd(8) works with it. Thanks go to Craig Rodrigues for helping with
this.