kib [Fri, 25 Mar 2011 22:31:28 +0000 (22:31 +0000)]
Report EBUSY instead of EROFS for attempt of deleting or renaming the
root directory of msdosfs mount. The VFS code would handle deletion
case itself too, assuming VV_ROOT flag is not lost. The msdosfs_rename()
should also note attempt to rename root via doscheckpath() or different
mount point check leading to EXDEV. Nonetheless, keep the checks for now.
The change is inspired by NetBSD change referenced in PR, but return
EBUSY like kern_unlinkat() does.
np [Fri, 25 Mar 2011 20:53:02 +0000 (20:53 +0000)]
Update T3 firmware to 7.11.0
Changes since 7.8.0 (from the official changelog):
- Fixed sporadic interrupt generation for associated CQ when processing
a local invalidate work request
- Changes to core scheduling to avoid starving requests from the host
under heavy RDMA Read Request load (e.g. packets to the wire)
- Programmed the tp tx resource limiter in function of the traffic (only
affects iWarp)
- Increased the egress NIC gather list length from 36 to 46 entries
pjd [Fri, 25 Mar 2011 20:19:15 +0000 (20:19 +0000)]
Add mapsize to the header just before sending the packet.
Before it could change later and we were sending invalid mapsize.
Some time ago I added optimization where when nodes are connected for the
first time and there were no writes to them yet, there is no initial full
synchronization. This bug prevented it from working.
avg [Fri, 25 Mar 2011 18:23:10 +0000 (18:23 +0000)]
rtld: eliminate double call to close(2) that may occur in load_object
The second close(2) call resulted in heisenbugs in some multi-threaded
applications where e.g. dlopen(3) call in one thread could close a file
descriptor for a file having been opened in other thread concurrently.
My litmus test for this issue was an openoffice.org build.
kib [Fri, 25 Mar 2011 16:38:10 +0000 (16:38 +0000)]
Handle the corner case in vm_fault_quick_hold_pages().
If supplied length is zero, and user address is invalid, function
might return -1, due to the truncation and rounding of the address.
The callers interpret the situation as EFAULT. Instead of handling
the zero length in caller, filter it in vm_fault_quick_hold_pages().
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
kib [Fri, 25 Mar 2011 14:00:36 +0000 (14:00 +0000)]
Add O_CLOEXEC flag to open(2) and fhopen(2).
The new function fallocf(9), that is renamed falloc(9) with added
flag argument, is provided to facilitate the merge to stable branch.
kib [Fri, 25 Mar 2011 10:57:57 +0000 (10:57 +0000)]
Fix file leakage in the freebsd32_ioctl routines.
Code inspection shows freebsd32_ioctl calls fget for a fd and calls
a subroutine to handle each specific ioctl. It is expected that the
subroutine will call fdrop when done. However many of the subroutines
will exit out early if copyin encounters an error resulting in fdrop
never being called.
Submitted by: John Wehle <john feith com>
MFC after: 3 days
adrian [Fri, 25 Mar 2011 10:55:25 +0000 (10:55 +0000)]
After discussing with Bernhard, the "right" way in net80211 to check
the channel width is ni->ni_chw, which is set to the negotiated channel
width. ni->ni_htflags is the capability, rather than the negotiated
value.
Teach both the TX path and the sample rate module about this.
adrian [Fri, 25 Mar 2011 04:15:30 +0000 (04:15 +0000)]
Re-disable the setting of 2040/shortgi bits for now.
This seems to work fine for STA but not HT/20 AP mode.
Further discussion with net80211 people will need to take place
to ensure that the right flags are set based on the negotiated
capabilities of the remote peer, rather than whatever the local
parameters are.
Sending short-gi frames in 20mhz may work on some chips but
it certainly isn't supported on anything currently supported
by the HAL; and sending HT40 frames in HT20 mode just plain
won't work.
adrian [Fri, 25 Mar 2011 00:45:24 +0000 (00:45 +0000)]
After discussion with Felix Fietkau (nbd) about the ath9k Merlin LNA bit
settings, it seems that our defines are backwards and don't match what
is in the EEPROM documentation or internal driver.
The ath9k code used to have a bitfield here, rather than a uint8_t, and
there were #defines used to swap the order based on the endian of the
platform - this wasn't because of nybble or bit ordering of the
underlying host but because of what the compiler was doing.
This may be the reason for the backwards field numbers, as ath9k had
similar issues.
adrian [Fri, 25 Mar 2011 00:03:21 +0000 (00:03 +0000)]
Bring over interrupt mitigation changes from ath9k.
* The existing interrupt mitigation code didn't mitigate anything - the
per-packet TX/RX interrupts are still occuring. It's possible this
worked for the AR5416 but not any later chipsets; I'll investigate and
update as needed.
* Set both the RX and TX threshold registers whilst I'm at it.
This is verified to work on the AR9220 and AR9160. I'm leaving it off
by default in case it's truely broken, but I need to have it enabled
when doing 11n testing or interrupt loads exceed 10,000 interrupts/sec.
mav [Thu, 24 Mar 2011 21:31:32 +0000 (21:31 +0000)]
MFgraid/head:
Add new RAID GEOM class, that is going to replace ataraid(4) in supporting
various BIOS-based software RAIDs. Unlike ataraid(4) this implementation
does not depend on legacy ata(4) subsystem and can be used with any disk
drivers, including new CAM-based ones (ahci(4), siis(4), mvs(4), ata(4)
with `options ATA_CAM`). To make code more readable and extensible, this
implementation follows modular design, including core part and two sets
of modules, implementing support for different metadata formats and RAID
levels.
Support for such popular metadata formats is now implemented:
Intel, JMicron, NVIDIA, Promise (also used by AMD/ATI) and SiliconImage.
Such RAID levels are now supported:
RAID0, RAID1, RAID1E, RAID10, SINGLE, CONCAT.
For any all of these RAID levels and metadata formats this class supports
full cycle of volume operations: reading, writing, creation, deletion,
disk removal and insertion, rebuilding, dirty shutdown detection
and resynchronization, bad sector recovery, faulty disks tracking,
hot-spare disks. For Intel and Promise formats there is support multiple
volumes per disk set.
Look graid(8) manual page for additional details.
Co-authored by: imp
Sponsored by: Cisco Systems, Inc. and iXsystems, Inc.
pjd [Thu, 24 Mar 2011 20:28:09 +0000 (20:28 +0000)]
Checking file access on size change is bogus. The checks are done earlier by
VFS where we know if this is truncate(2) or ftruncate(2). If this is the
latter we should depend on the mode the file was opened and not on the current
permission.
PR: standards/154873
Reported by: Mark Martinec <Mark.Martinec@ijs.si>
Discussed with: Eric Schrock <eric.schrock@delphix.com>
Discussed with: Mark Maybee <Mark.Maybee@Oracle.COM>
MFC after: 1 month
mav [Thu, 24 Mar 2011 19:23:42 +0000 (19:23 +0000)]
MFgraid/head r218212, r218257:
Introduce new type of BIO_GETATTR -- GEOM::setstate, used to inform lower
GEOM about state of it's providers from the point of upper layers.
Make geom_disk use led(4) subsystem to illuminate states in such fashion:
FAILED - "1" (on), REBUILD - "f5" (slow blink), RESYNC - "f1" (fast blink),
ACTIVE - "0" (off).
LED name should be set for each disk via kern.geom.disk.%s.led sysctl.
Later disk API could be extended to allow disk driver to report this info
in custom way via it's own facilities.
mav [Thu, 24 Mar 2011 19:11:05 +0000 (19:11 +0000)]
MFgraid/head r217014:
Make `geom XXX list` and `geom XXX status` outputs more consistent:
Add -a options to print all geoms, not only ones with providers.
Add -g option for `status` to report geom's names, not provider's.
Make `status` by default report provider's status (if present), not geom's.
Make `status` report consumer's statuses, not only "synchronized" field.
jhb [Thu, 24 Mar 2011 18:40:11 +0000 (18:40 +0000)]
Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.
adrian [Thu, 24 Mar 2011 15:27:15 +0000 (15:27 +0000)]
Fix a WME corner case found by the FreeBSD 802.11n testing crew.
The symptom: sometimes 11n (and non-11n) throughput is great.
Sometimes it isn't. Much teeth gnashing occured, and much kernel
bisecting happened, until someone figured out it was the order
of which things were rebooted, not the kernel versions.
(Which was great news to me, it meant that I hadn't broken if_ath.)
What we found was that sometimes the WME parameters for the best-effort
queue had a burst window ("txop") in which the station would be allowed
to TX as many packets as it could fit inside that particular burst
window. This improved throughput.
After initially thinking it was a bug - the WME parameters for the
best-effort queue -should- have a txop of 0, Bernard and I discovered
"aggressive mode" in net80211 - where the WME BE queue parameters
are changed if there's not a lot of high priority traffic going on.
The WME parameters announced in the association response and beacon
frames just "change" based on what the current traffic levels are.
So in fact yes, the STA was acutally supposed to be doing this higher
throughput stuff as it's just meant to be configuring things based on
the WME parameters - but it wasn't.
What was eventually happening was this:
* at startup, the wme qosinfo count field would be 0;
* it'd be parsed in ieee80211_parse_wmeparams();
* and it would be bumped (to say 10);
* .. and the WME queue parameters would be correctly parsed and set.
But then, when you restarted the assocation (eg hostap goes away and
comes back with the same qosinfo count field of 10, or if you
destroy the sta VIF and re-create it), the WME qosinfo count field -
which is associated not to the VIF, but to the main interface -
wouldn't be cleared, so the queue default parameters would be used
(which include no burst setting for the BE queue) and would remain
that way until the hostap qosinfo count field changed, or the STA
was actually rebooted.
This fix simply cleares the wme capability field (which has the count
field) to 0, forcing it to be reset by the next received beacon.
Thanks go to Milu for finding it and helping me track down what was
going on, and Bernard Schmidt for working through the net80211 and
WME specific magic.
mav [Thu, 24 Mar 2011 08:37:48 +0000 (08:37 +0000)]
MFgraid/head r217827:
Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by
consumer (geom_dev) instead of provider (geom_disk). This allows any geom
insert it's code into the dump call chain, implementing more sophisticated
functionality then just disk partitioning.
np [Thu, 24 Mar 2011 01:03:01 +0000 (01:03 +0000)]
Do not over-allocate MSI interrupts for the case where each ingress
queue has its own interrupt. If the exact number that we need is not a
power of 2 and we're using MSI, then switch to interrupt multiplexing.
While here, replace the magic numbers with something more readable.
adrian [Wed, 23 Mar 2011 23:48:44 +0000 (23:48 +0000)]
Make the ar2133ForceBias() call controllable at runtime.
At least one AR5416 user has reported measurable throughput drops
with this option. For now, disable it and make it a run-time
twiddle. It won't take affect until the next radio programming
trip though (eg channel scan, channel change.)
delphij [Wed, 23 Mar 2011 22:08:01 +0000 (22:08 +0000)]
humanize_number(3) multiply the input number by 100, which could cause an
integer overflow when the input is very large (for example, 100 Pi would
become about 10 Ei which exceeded signed int64_t).
Solve this issue by splitting the division into two parts and avoid the
multiplication.
jh [Wed, 23 Mar 2011 17:56:38 +0000 (17:56 +0000)]
Recognize "ro", "rdonly", "norw", "rw" and "noro" as equal options in
vfs_equalopts(). This allows vfs_sanitizeopts() to filter redundant
occurrences of these options. It was possible that for example both "ro"
and "rw" options became active concurrently.
PR: kern/133614
Discussed on: freebsd-hackers
MFC after: 1 month
alc [Wed, 23 Mar 2011 16:38:29 +0000 (16:38 +0000)]
Modestly increase the maximum allowed size of the kmem map on i386.
Also, express this new maximum as a fraction of the kernel's address
space size rather than a constant so that increasing KVA_PAGES will
automatically increase this maximum. As a side-effect of this change,
kern.maxvnodes will automatically increase by a proportional amount.
While I'm here ensure that this change doesn't result in an unintended
increase in maxpipekva on i386. Calculate maxpipekva based upon the
size of the kernel address space and the amount of physical memory
instead of the size of the kmem map. The memory backing pipes is not
allocated from the kmem map. It is allocated from its own submap of
the kernel map. In short, it has no real connection to the kmem map.
(In fact, the commit messages for the maxpipekva auto-sizing talk
about using the kernel map size, cf. r117325 and r117391, even though
the implementation actually used the kmem map size.) Although the
calculation is now done differently, the resulting value for
maxpipekva should remain almost the same on i386. However, on amd64,
the value will be reduced by 2/3. This is intentional. The recent
change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had
the unnecessary side-effect of increasing maxpipekva. This change is
effectively restoring maxpipekva on amd64 to its prior value.
Eliminate init_param3() since it is no longer used.
nwhitehorn [Wed, 23 Mar 2011 16:22:08 +0000 (16:22 +0000)]
Add support for memstick generation on PowerPC. This is a little suboptimal
since glabel doesn't know about APM partitioning yet, but works well enough.
adrian [Wed, 23 Mar 2011 03:58:55 +0000 (03:58 +0000)]
The AR5416+ chips all have MIB counters (which the AR5416 ANI code assumes)
so there's no need to enable the RX of invalid frames just to do ANI.
The if_ath code and AR5212 ANI code setup the RX filter bits to enable
receiving OFDM/CCK errors if the device doesn't have the hardware
MIB counters. It isn't initialising it for the AR5416+ because all of
those chips have hardware MIB counters.
This fixes the odd (and performance affecting!) situation where if ani
is enabled (via sysctl dev.ath.X.intmit) then suddenly there's be a very
large volume of phy errors - which is good to track, but not what was
intended. Since each PHY error is a received (0 length) frame, it can
significantly tie up the RX side of things.
jeff [Wed, 23 Mar 2011 02:47:04 +0000 (02:47 +0000)]
- Correct the vlan filter programming. The device filter is built in
reverse order.
- Name the cq taskqueues according to whether they handle rx or tx.
- Default LRO to on.
nwhitehorn [Wed, 23 Mar 2011 01:26:21 +0000 (01:26 +0000)]
Allow setting of parameters for file systems (e.g. softupdates), turn on
SUJ by default, and allow creation and mounting of FAT filesystems from
the installer.
marcel [Tue, 22 Mar 2011 17:19:35 +0000 (17:19 +0000)]
Change the load address from offset 0 in region 1 to offset 4G in region 0.
This (almost) gives us the address space back (at the bottom) that we lost
at the top.
Region 0 has traditionally been reserved for IA-32 emulation, which has not
been of great interest. By starting 64-bit processes at the 4G boundary we
at least preserve some of the advantages:
1. Any invalid pointer cast (from int to pointer and back) will still
always fail and not only when more than 4GB of memory is in use.
2. Memory sharing between 64-bit and 32-bit processes is still possibly
by using addresses < 4G.
adrian [Tue, 22 Mar 2011 13:35:56 +0000 (13:35 +0000)]
Flip this over to be a configurable option for people who wish to play with it.
It's still not ready for prime-time - there's some TX niggles with these 11n
cards that I'm still trying to wrap my head around, and AMPDU-TX is just not
implemented so things will come to a crashing halt if you're not careful.
jhb [Tue, 22 Mar 2011 12:05:49 +0000 (12:05 +0000)]
Rename pci_find_extcap() to pci_find_cap(). PCI now uses the term
"extended capabilities" to refer to the new set of capability structures
starting at offset 0x100 in config space for PCI-express devices. For now
both function names will still work. I will merge this to older branches
to ease driver portability, but 9.0 will ship with a new pci_find_extcap()
function that locates extended capabilities instead.
adrian [Tue, 22 Mar 2011 10:29:36 +0000 (10:29 +0000)]
Bring over an XPA (external power amplifer) bias fix for the AR9160.
This fix modifies the const addac initval array, rather than modifying
a local copy. It means that running >1 AR9160 on a board may prove to
be unpredictable.
The AR5416 init path also does something similar, so supporting
>1 AR5416 of different revisions could cause problems.
The later fix will be to create a private copy of the Addac data
for the AR5416, AR9160 (and AR9100 when it's merged in) and then
modify that as needed.
adrian [Tue, 22 Mar 2011 07:19:49 +0000 (07:19 +0000)]
Fix OFDM ANI statistics gathering for the AR5416 and later chips.
I found this when trying to figure out why the RX PHY error count
didn't match the OFDM error count ANI was using. It turns out
there was two problems:
* What this commit addresses - using the wrong mask for OFDM errors,
and
* The RX filter is set incorrectly after a channel scan (at least)
even if interference mitigation is enabled by default.
ANI is still disabled by default for the AR5416 and later chips.
jeff [Tue, 22 Mar 2011 04:50:47 +0000 (04:50 +0000)]
- Don't use a separate set of rx queues for UDP, hash them into the same
set as TCP.
- Eliminate the fully linear non-scatter/gather rx path, there is no
harm in using arrays of clusters for both TCP and UDP.
- Implement support for enabling/disabling per-vlan priority pause and
queues via sysctl.
nwhitehorn [Tue, 22 Mar 2011 01:14:53 +0000 (01:14 +0000)]
Use labels to find release media instead of hard-coded device paths. This
makes booting more reliable (and working at all on USB sticks). While here,
move responsibility for setting up fstab into the various platform mk-*.sh
scripts.
adrian [Tue, 22 Mar 2011 00:14:17 +0000 (00:14 +0000)]
Bring over a few queue changes from ath9k:
* add pspoll/uapsd queue setup defaults;
* enable the exponential backoff window rather than the random
backoff window when doing TX contention management.
adrian [Tue, 22 Mar 2011 00:12:26 +0000 (00:12 +0000)]
Even though it's very unlikely the misc mode register setting at -attach-
would be a problem, make sure it isn't overwritten by whatever is in
there at cold reset.
This brings the > ar5416 init path treatment of AR_MISC_MODE.