John Baldwin [Mon, 2 May 2011 14:13:12 +0000 (14:13 +0000)]
Add implementations of BUS_ADJUST_RESOURCE() to the PCI bus driver,
generic PCI-PCI bridge driver, x86 nexus driver, and x86 Host to PCI bridge
drivers.
Rebecca Cran [Mon, 2 May 2011 10:35:27 +0000 (10:35 +0000)]
Add -Wmissing-include-dirs to CWARNFLAGS, so tinderbox will punish those
developers committing new code with broken include directories.
Fix a few whitespace issues.
Improve a couple of comments.
-W is now deprecated and is referred to as -Wextra (see gcc(1)).
Adrian Chadd [Mon, 2 May 2011 05:39:43 +0000 (05:39 +0000)]
Add documentation to sys/conf/options pointing out that AH_SUPPORT_AR9130
shouldn't be enabled by default unless you're truely building for the
AR913x platform.
Rick Macklem [Sun, 1 May 2011 22:19:52 +0000 (22:19 +0000)]
Add the kernel support needed to zero out the nfsstats
structure for the new NFS subsystem. This will be used
by nfsstats.c to implement the "-z" option.
Ulrich Spörlein [Sun, 1 May 2011 20:14:10 +0000 (20:14 +0000)]
recoverdisk(8): treat output file consistently and abort on EINVAL
This improves usability a little as we no longer require using touch.
Also reword the manpage wrt. parameters and fix usage() [1]
With no media in a cd(4) drive, the reads will loop producing EINVAL,
abort in that case [2].
Document the shortcoming of sectorsize and MAXPHYS (a quick solution
to this might be having MAXPHYS as the "bigsize", in short testing it
didn't make a difference on throughput).
Ulrich Spörlein [Sun, 1 May 2011 19:47:34 +0000 (19:47 +0000)]
Let users' PATH decide which groff suite to pick up.
Let groff pass the -c flag to grotty, which will turn off ANSI
sequences. While these are not a problem for our more/less, they get
mangled by col(1) and this will result in garbage output.
This makes man(1) work together with textproc/groff, in case the
user decided to delete the old groff from base (-DWITHOUT_GROFF).
Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.
While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.
Introduce two new options MK_INET and MK_INET_SUPPORT analogically
with INET6 equivalents. Patch reather than re-genenerating src.conf
(given the current problem with the script that does the re-gen).
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
Allow MKMODULESENV being preset from other sources like makeoptions
kernel configurations to apply WITH_* WITHOUT_* knobs we use for
module building as well to restrict or control opt_*.h flags.
Reviewed by: imp, +
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 2 weeks
Add some more missing optional dependencies on inet6, not only inet,
to get the files for an IPv6 only kernel as well, remove extra inet6
option where not needed.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Adrian Chadd [Sat, 30 Apr 2011 11:36:16 +0000 (11:36 +0000)]
Add some initial PCIe bridge support for the AR724x chipsets.
This is reported to work on the AR7240 based Ubiquiti Rocket M5
but I haven't tested it on that hardware. I also don't yet have
it fully working on the AR7242 based development board here;
probe/attach functions but the register space resource looks like
the endian-ness is wrong (0x10000000 instead of 0x00001000).o
Make the TCP code compile without INET. Sort #includes and add #ifdef INETs.
Add some comments at #endifs given more nestedness. To make the compiler
happy, some default initializations were added in accordance with the style
on the files.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Michael Tuexen [Sat, 30 Apr 2011 11:18:16 +0000 (11:18 +0000)]
Improve compilation of SCTP code without INET support.
Some bugs where fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using
IPv6.
* Checking for additional addresses takes the correct address
into account and also does not do more comparisons than
necessary.
This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.
Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only
as well compiling out most functions adding or extending #ifdef INET
coverage.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Steve Kargl [Fri, 29 Apr 2011 23:13:43 +0000 (23:13 +0000)]
Improve the accuracy from a max ULP of ~2000 to max ULP < 0.79
on i386-class hardware for sinl and cosl. The hand-rolled argument
reduction have been replaced by e_rem_pio2l() implementations. To
preserve history the following commands have been executed:
The ld80 version has been tested by bde, das, and kargl over the
last few years (bde, das) and few months (kargl). An older ld128
version was tested by das. The committed version has only been
compiled tested via 'make universe'.
Add an -E option to mirror newfs's. The idea is that if you have a system
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
Somewhere around the 473rd time I mistyped "mdconfig file" instead of
"mdconfig -f file", I decided that it would be easier to make mdconfig
DWIM than to teach my fingers to type the correct command line.
John Baldwin [Fri, 29 Apr 2011 21:36:45 +0000 (21:36 +0000)]
Add a new bus method, BUS_ADJUST_RESOURCE() that is intended to be a
wrapper around rman_adjust_resource(). Include a generic implementation,
bus_generic_adjust_resource() which passes the request up to the parent
bus. There is currently no default implementation. A
bus_adjust_resource() wrapper is provided for use in drivers.
Implement BIO_DELETE for vnode devices by simply overwriting the deleted
sectors with all-zeroes.
The zeroes come from a static buffer; null(4) uses a dynamic buffer for
the same purpose (for /dev/zero). It might be a good idea to have a
static, shared, read-only all-zeroes page somewhere in the kernel that
md(4), null(4) and any other code that needs zeroes could use.
Rather than trusting that nothing is going to sneak in before the
early_late_divider in the second run (and thus be skipped altogether),
keep a list of the scripts run early, and use that list to skip things
in the second run.
This has the primary benefit of not skipping a local script that gets
ordered too early in the second run. It also gives an opportunity to
clean up/simplify the code a bit.
Use a space-separated list rather than the more traditional colon for
maximum insurance against creativity in local naming conventions.
John Baldwin [Fri, 29 Apr 2011 20:05:19 +0000 (20:05 +0000)]
Extend the rman(9) API to support altering an existing resource.
Specifically, these changes allow a resource to back a relocatable and
resizable resource such as the I/O window decoders in PCI-PCI bridges.
- rman_adjust_resource() can adjust the start and end address of an
existing resource. It only succeeds if the newly requested address
space is already free. It also supports shrinking a resource in
which case the freed space will be marked unallocated in the rman.
- rman_first_free_region() and rman_last_free_region() return the
start and end addresses for the first or last unallocated region in
an rman, respectively. This can be used to determine by how much
the resource backing an rman must be adjusted to accomodate an
allocation request that does not fit into the existing rman.
While here, document the rm_start and rm_end fields in struct rman,
rman_is_region_manager(), the bound argument to
rman_reserve_resource_bound(), and rman_init_from_resource().
John Baldwin [Fri, 29 Apr 2011 18:41:21 +0000 (18:41 +0000)]
Change rman_manage_region() to actually honor the rm_start and rm_end
constraints on the rman and reject attempts to manage a region that is out
of range.
- Fix various places that set rm_end incorrectly (to ~0 or ~0u instead of
~0ul).
- To preserve existing behavior, change rman_init() to set rm_start and
rm_end to allow managing the full range (0 to ~0ul) if they are not set by
the caller when rman_init() is called.
Jung-uk Kim [Fri, 29 Apr 2011 18:20:12 +0000 (18:20 +0000)]
Detect VMware guest and set the TSC frequency as reported by the hypervisor.
VMware products virtualize TSC and it run at fixed frequency in so-called
"apparent time". Although virtualized i8254 also runs in apparent time, TSC
calibration always gives slightly off frequency because of the complicated
timer emulation and lost-tick correction mechanism.
John Baldwin [Fri, 29 Apr 2011 15:40:12 +0000 (15:40 +0000)]
TCP reuses t_rxtshift to determine the backoff timer used for both the
persist state and the retransmit timer. However, the code that implements
"bad retransmit recovery" only checks t_rxtshift to see if an ACK has been
received in during the first retransmit timeout window. As a result, if
ticks has wrapped over to a negative value and a socket is in the persist
state, it can incorrectly treat an ACK from the remote peer as a
"bad retransmit recovery" and restore saved values such as snd_ssthresh and
snd_cwnd. However, if the socket has never had a retransmit timeout, then
these saved values will be zero, so snd_ssthresh and snd_cwnd will be set
to 0.
If the socket is in fast recovery (this can be caused by excessive
duplicate ACKs such as those fixed by 220794), then each ACK that arrives
triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd
to be no larger than snd_ssthresh. In effect, the socket's send window
is permamently stuck at 0 even though the remote peer is advertising a
much larger window and pending data is only sent via TCP window probes
(so one byte every few seconds).
Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that
the various snd_*_prev fields in the pcb are valid and only perform
"bad retransmit recovery" if this flag is set in the pcb. The flag is set
on the first retransmit timeout that occurs and is cleared on subsequent
retransmit timeouts or when entering the persist state.
John Baldwin [Fri, 29 Apr 2011 14:06:37 +0000 (14:06 +0000)]
Add a 'show progress' command that shows a summary of all in-progress
commands for a given adapter. Specifically, it shows the status of any
drive or volume activities currently in progress similar to the
'drive process' and 'volume progress' commands.
Rick Macklem [Thu, 28 Apr 2011 23:21:50 +0000 (23:21 +0000)]
Fix the new NFS client so that it handles the "nfs_args" value
in mnt_optnew. This is needed so that the old mount(2) syscall
works and that is needed so that amd(8) works. The code was
basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor
changes. This patch is mainly to fix the new NFS client so that
amd(8) works with it. Thanks go to Craig Rodrigues for helping with
this.
Jung-uk Kim [Thu, 28 Apr 2011 22:23:39 +0000 (22:23 +0000)]
Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.
John Baldwin [Thu, 28 Apr 2011 17:44:24 +0000 (17:44 +0000)]
Due to space constraints, the UFS boot2 and boot1 use an evil hack where
boot2 calls back into boot1 to perform disk reads. The ZFS MBR boot blocks
do not have the same space constraints, so remove this hack for ZFS.
While here, remove commented out code to support C/H/S addressing from
zfsldr. The ZFS and GPT bootstraps always just use EDD LBA addressing.
A diagnostic tool to go along with the vxge(4) 10GbE driver.
This tool can be used to print statistics, registers, and
other device specific information once the driver is loaded
into the kernel.
Submitted by: Sriram Rapuru from Exar
MFC after: 2 weeks
Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.
I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).
John Baldwin [Thu, 28 Apr 2011 14:27:17 +0000 (14:27 +0000)]
Sync with several changes in UFS/FFS:
- 77115: Implement support for O_DIRECT.
- 98425: Fix a performance issue introduced in 70131 that was causing
reads before writes even when writing full blocks.
- 98658: Rename the BALLOC flags from B_* to BA_* to avoid confusion with
the struct buf B_ flags.
- 100344: Merge the BA_ and IO_ flags so so that they may both be used in
the same flags word. This merger is possible by assigning the IO_ flags
to the low sixteen bits and the BA_ flags the high sixteen bits.
- 105422: Fix a file-rewrite performance case.
- 129545: Implement IO_INVAL in VOP_WRITE() by marking the buffer as
"no cache".
- Readd the DOINGASYNC() macro and use it to control asynchronous writes.
Change i-node updates to honor DOINGASYNC() instead of always being
synchronous.
- Use a PRIV_VFS_RETAINSUGID check instead of checking cr_uid against 0
directly when deciding whether or not to clear suid and sgid bits.
Adrian Chadd [Thu, 28 Apr 2011 12:52:01 +0000 (12:52 +0000)]
Re-enable the wireless build parameters for the AR9130 WMAC.
* enable 11n
* add ath_ahb so the AHB<->ath glue is linked in
* disable descriptor order swapping, it isn't needed here
* disable interrupt mitigation, it isn't supported here
Adrian Chadd [Thu, 28 Apr 2011 12:47:40 +0000 (12:47 +0000)]
Introduce AR9130 (HOWL) WMAC support to the FreeBSD HAL.
The AR9130 is an AR9160/AR5416 family WMAC which is glued directly
to the AR913x SoC peripheral bus (APB) rather than via a PCI/PCIe
bridge.
The specifics:
* A new build option is required to use the AR9130 - AH_SUPPORT_AR9130.
This is needed due to the different location the RTC registers live
with this chip; hopefully this will be undone in the future.
This does currently mean that enabling this option will break non-AR9130
builds, so don't enable it unless you're specifically building an image
for the AR913x SoC.
* Add the new probe, attach, EEPROM and PLL methods specific to Howl.
* Add a work-around to ah_eeprom_v14.c which disables some of the checks
for endian-ness and magic in the EEPROM image if an eepromdata block
is provided. This'll be fixed at a later stage by porting the ath9k
probe code and making sure it doesn't break in other setups (which
my previous attempt at this did.)
* Sprinkle Howl modifications throughput the interrupt path - it doesn't
implement the SYNC interrupt registers, so ignore those.
* Sprinkle Howl chip powerup/down throughout the reset path; the RTC methods
were
* Sprinkle some other Howl workarounds in the reset path.
* Hard-code an alternative setup for the AR_CFG register for Howl, that
sets up things suitable for Big-Endian MIPS (which is the only platform
this chip is glued to.)
This has been tested on the AR913x based TP-Link WR-1043nd mode, in
legacy, HT/20 and HT/40 modes.
Caveats:
* 2ghz has only been tested. I've not seen any 5ghz radios glued to this
chipset so I can't test it.
* AR5416_INTERRUPT_MITIGATION is not supported on the AR9130. At least,
it isn't implemented in ath9k. Please don't enable this.
* This hasn't been tested in MBSS mode or in RX/TX block-aggregation mode.
Rick Macklem [Thu, 28 Apr 2011 00:20:35 +0000 (00:20 +0000)]
Update man pages related to the change in default NFS client
applied by r221124. I also deleted references to idmapd, since that
daemon no longer exists.
This is a content change.
John Baldwin [Wed, 27 Apr 2011 20:08:44 +0000 (20:08 +0000)]
Only align MSI message groups based on the number of messages being
allocated, not the maximum number of messages the device supports. The
spec only requires the former, and I believe I implemented the latter due
to misunderstanding an e-mail. In particular, this fixes an issue where
having several devices that all support 16 messages can run out of
IDT vectors on x86 even though the driver only uses a single message.
Expose ip_icmp.c to INET6 as well and only export badport_bandlim()
along with the two sysctls in the non-INET case.
The bandlim types work for all cases I reviewed in IPv6 as well and
the sysctls are available as we export net.inet.* from in_proto.c.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Move ip_defttl to raw_ip.c where it is actually used. In an IPv6
only world we do not want to compile ip_input.c in for that and
it is a shared default with INET6.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs. For module builds the framework needs
adjustments for at least carp.
Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days
Rick Macklem [Wed, 27 Apr 2011 18:19:26 +0000 (18:19 +0000)]
This patch is believed to fix a problem in the kernel rpc for
non-interruptible NFS mounts, where a kernel thread will seem
to be stuck sleeping on "rpccon". The msleep() in clnt_vc_create()
that was waiting to a TCP connect to complete would return ERESTART,
since PCATCH was specified. Then the tsleep() in clnt_reconnect_call()
would sleep for 1 second and then try again and again and...
The patch changes the msleep() in clnt_vc_create() so it only sets
the PCATCH flag for interruptible cases.