FreeBSD/FreeBSD.git
11 years agoOnly four specific ATA PIO commands transfer several sectors per DRQ block
mav [Thu, 1 Nov 2012 00:09:01 +0000 (00:09 +0000)]
Only four specific ATA PIO commands transfer several sectors per DRQ block
(interrupt).  All other ATA PIO commands transfer one sector or 512 bytes
at one time.  Hardcode these exceptions in ata(4) with ATA_CAM option.
This fixes timeout of READ LOG EXT command used by `smartctl -x /dev/adaX`.
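
The rule above can be sketched in userland C. This is an illustration only, not the ata(4) code: the commit message does not name the four commands, so the sketch assumes they are the READ/WRITE MULTIPLE family. The helper `pio_drq_bytes` and the `ATA_CMD_*` names are hypothetical; the opcode values are the ones from the ATA command set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ATA opcodes; the enumerator names are made up for this sketch. */
enum {
	ATA_CMD_READ_MULTIPLE      = 0xC4,
	ATA_CMD_WRITE_MULTIPLE     = 0xC5,
	ATA_CMD_READ_MULTIPLE_EXT  = 0x29,
	ATA_CMD_WRITE_MULTIPLE_EXT = 0x39,
	ATA_CMD_READ_LOG_EXT       = 0x2F
};

#define SECTOR_SIZE 512u

/*
 * Hypothetical helper: number of bytes moved per DRQ block (interrupt).
 * multi_bytes is the negotiated multi-sector block size in bytes.
 */
static size_t
pio_drq_bytes(uint8_t command, size_t bytecount, size_t multi_bytes)
{
	size_t block = SECTOR_SIZE;	/* default: one sector per DRQ */

	assert(multi_bytes >= SECTOR_SIZE);
	switch (command) {
	case ATA_CMD_READ_MULTIPLE:
	case ATA_CMD_WRITE_MULTIPLE:
	case ATA_CMD_READ_MULTIPLE_EXT:
	case ATA_CMD_WRITE_MULTIPLE_EXT:
		block = multi_bytes;	/* assumed exceptions */
		break;
	default:
		break;
	}
	return (bytecount < block ? bytecount : block);
}
```

With this rule a READ LOG EXT request is drained one sector per interrupt regardless of the multi-sector setting, which is the behaviour the fix relies on.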

11 years agoA few important fixes:
jfv [Wed, 31 Oct 2012 23:50:36 +0000 (23:50 +0000)]
A few important fixes:
  - Testing TSO6 has led me to discover that HW RSC is
    a problematic feature. It is ONLY designed to work
    with IPv4 in the first place, and if IP forwarding
    is done it can't be disabled the way LRO is in the stack.
    Also, initial testing we've done at Intel shows
    equal performance using TSO[46] on TX and LRO
    on RX. If you ran older code on 82599 or later hardware
    you could actually have seen degraded performance for
    this reason. So I am disabling the feature by default
    and all our adapters will now use LRO instead.

  - If you have flow control off and multiple queues, it
    was possible for all RX movement to stall when the
    buffer of one queue became full. To eliminate this
    problem, a feature bit is now set that allows packets
    to be dropped when full rather than stalling.
    Note, the default is to have flow control on, and this
    keeps this from happening.

  - Because of the recent fixes in the stack, LRO is now
    auto-disabled when problematic, so I have decided to
    enable it by default in the capabilities in the driver.

  - Some customers use 1G modules; a couple of small tweaks
    in the media code now support those properly.

  - A note: we have now done some testing of TSO6 and using
    LRO with IPv6 and it all works great!! Seeing line rate
    in both directions in best cases. Thanks bz for your
    excellent work!!

11 years agoUse callout_reset_curcpu to allow the callout to be handled by the
jimharris [Wed, 31 Oct 2012 23:44:19 +0000 (23:44 +0000)]
Use callout_reset_curcpu to allow the callout to be handled by the
current CPU and not always CPU 0.

This has the added benefit of reducing a huge amount of spinlock
contention on the callout_cpu spinlock for CPU 0.

Sponsored by: Intel

11 years agoASUS EeePC 1001px has strange variant of ALC269 CODEC, that mutes speaker
mav [Wed, 31 Oct 2012 22:11:51 +0000 (22:11 +0000)]
ASUS EeePC 1001px has strange variant of ALC269 CODEC, that mutes speaker
if the mixer at NID 15, unused in that configuration, is muted.  Probably CODEC
incorrectly reports its internal connections.  Hide that muter from the
driver to avoid muting and make built-in speaker work.

There are several different CODECs sharing this ID and I have not enough
information about them and the bug to implement more universal solution.

Tested by: Big Yuuta <init.py@gmail.com>
MFC after: 2 weeks

11 years agoSince the PLL changes aren't in here yet for the AR9130 half/quarter
adrian [Wed, 31 Oct 2012 21:14:25 +0000 (21:14 +0000)]
Since the PLL changes aren't in here yet for the AR9130 half/quarter
rate support, disable it.

11 years agoOops - this was incorrectly removed in a previous commit.
adrian [Wed, 31 Oct 2012 21:06:55 +0000 (21:06 +0000)]
Oops - this was incorrectly removed in a previous commit.

11 years agoOops - missing from the last commit - add ANI immunity levels for AR9160.
adrian [Wed, 31 Oct 2012 21:04:23 +0000 (21:04 +0000)]
Oops - missing from the last commit - add ANI immunity levels for AR9160.

Obtained from: Qualcomm Atheros

11 years agoHAL updates!
adrian [Wed, 31 Oct 2012 21:03:55 +0000 (21:03 +0000)]
HAL updates!

* Add some more ANI spur immunity levels.
* For AR5111 radios attached to an AR5212, limit the 5GHz channels
  that are available. A later revision of the AR5111 supports the 4.9GHz
  PSB channels but right now there's no check in place for the radio
  revision.

  If someone wants PSB support on AR5212+AR5111 radios then please let
  me know and I'll add the relevant version check.

Obtained from: Qualcomm Atheros

11 years agoAdd in the last random assortment of missing bits for the AR9380 HAL.
adrian [Wed, 31 Oct 2012 21:00:01 +0000 (21:00 +0000)]
Add in the last random assortment of missing bits for the AR9380 HAL.

Obtained from: Qualcomm Atheros

11 years agoAdd the emulation PCI device id - these days, 0xabcd shows up all over
adrian [Wed, 31 Oct 2012 20:58:24 +0000 (20:58 +0000)]
Add the emulation PCI device id - these days, 0xabcd shows up all over
the internet as "AR9380 and later which didn't get its PCI ID written
in at power-on", so it's hardly an unknown constant.

Obtained from: Qualcomm Atheros

11 years agoCorrect code that was lost somewhere in the past,
jfv [Wed, 31 Oct 2012 18:16:42 +0000 (18:16 +0000)]
Correct code that was lost somewhere in the past,
this was designed to keep duplicate null vlan tags from
being added. When doing vlans purely via the switch
this problem will occur. Reported by external customer.

11 years agoRework the known mutexes that benefit from staying on their own
attilio [Wed, 31 Oct 2012 18:07:18 +0000 (18:07 +0000)]
Rework the known mutexes that benefit from staying on their own
cache line to use struct mtx_padalign instead of manual
frobbing.

The sole exceptions are the nvme and sfxge drivers, where the author
redefined CACHE_LINE_SIZE manually, so they need to be analyzed and
dealt with separately.

Reviewed by: jimharris, alc

11 years agoPad and align the callout_cpu mtx to its own cacheline to reduce false
jimharris [Wed, 31 Oct 2012 17:12:12 +0000 (17:12 +0000)]
Pad and align the callout_cpu mtx to its own cacheline to reduce false
sharing especially on the default CPU 0 callout_cpu structure.

This will be followed up by attilio@ with a conversion to the new struct
mtx_padalign but doing this manual conversion first gives an easy MFC
candidate since mtx_padalign is a more extensive system change.

Sponsored by: Intel
Reviewed by: jeff, attilio
MFC after: 1 week

11 years agoCorrect attribution.
des [Wed, 31 Oct 2012 15:04:27 +0000 (15:04 +0000)]
Correct attribution.

11 years agoGenericise the (out of date) instructions for moving from stable to
gavin [Wed, 31 Oct 2012 13:52:03 +0000 (13:52 +0000)]
Genericise the (out of date) instructions for moving from stable to
current.

MFC after: 3 days

11 years agoGive mtx(9) the ability to crunch different types of structures, with the
attilio [Wed, 31 Oct 2012 13:38:56 +0000 (13:38 +0000)]
Give mtx(9) the ability to crunch different types of structures, with the
only constraint that they have a lock cookie named mtx_lock.
This name then becomes reserved for structs that want to use the
mtx(9) KPI, and other locking primitives cannot reuse it for their
members.

Namely such structs are the current struct mtx and the new
struct mtx_padalign.  The new structure will define an object which
has the same layout as a struct mtx but will be allocated in
areas aligned to the cache line size and will be as big as a cache line.

This is supposed to give higher performance for highly contended mutexes
both spin or sleep (because of the adaptive spinning), where the cache
line contention results in too much traffic on the system bus.

The struct mtx_padalign can be used in a completely transparent way
with the mtx(9) KPI.
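
The idea of one KPI working on any struct that carries a cookie named mtx_lock can be sketched in C11 userland code. Everything here is an assumption made for illustration: the `demo_` names, the 64-byte cache line, and the trivial try-lock are not the kernel's definitions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumption; machine dependent */

/* Plain variant: just the lock cookie. */
struct demo_mtx {
	_Atomic uintptr_t mtx_lock;
};

/* Padded variant: same first member, aligned and sized to a cache line. */
struct demo_mtx_padalign {
	alignas(DEMO_CACHE_LINE_SIZE) _Atomic uintptr_t mtx_lock;
};

static_assert(sizeof(struct demo_mtx_padalign) == DEMO_CACHE_LINE_SIZE,
    "padded mutex should fill exactly one cache line");
static_assert(alignof(struct demo_mtx_padalign) == DEMO_CACHE_LINE_SIZE,
    "padded mutex should start on a cache line boundary");

/* Operates on the cookie only, so it does not care about the container. */
static inline int
demo_trylock_cookie(_Atomic uintptr_t *cookie, uintptr_t tid)
{
	uintptr_t unowned = 0;

	return (atomic_compare_exchange_strong(cookie, &unowned, tid));
}

/* The macro only requires that the argument has a member named mtx_lock. */
#define DEMO_MTX_TRYLOCK(m, tid) \
	demo_trylock_cookie(&(m)->mtx_lock, (tid))
```

Because the macro reaches the lock through the member name, both struct types can be passed to it unchanged, which is the sense in which the name becomes reserved.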

At the moment, a possibility to MFC the patch should be carefully
evaluated because this patch breaks the low level KPI
(not its representation though).

Discussed with: jhb
Reviewed by: jeff, andre
Reviewed by: mdf (earlier version)
Tested by: jimharris

11 years agoMerge r242125 into the other ARMv6 copies of initarm.
andrew [Wed, 31 Oct 2012 08:25:45 +0000 (08:25 +0000)]
Merge r242125 into the other ARMv6 copies of initarm.

11 years agoI've had some feedback that CCK rates are more reliable than MCS 0
adrian [Wed, 31 Oct 2012 06:35:50 +0000 (06:35 +0000)]
I've had some feedback that CCK rates are more reliable than MCS 0
in some very degenerate conditions.

However, until ath_rate_form_aggr() is taught to not form aggregates
if ANY selected rate is non-MCS, this can't yet be enabled.

So, just add a comment.

11 years agoI give up - introduce a TX lock to serialise TX operations.
adrian [Wed, 31 Oct 2012 06:27:58 +0000 (06:27 +0000)]
I give up - introduce a TX lock to serialise TX operations.

I've tried serialising TX using queues and such but unfortunately
due to how this interacts with the locking going on elsewhere in the
networking stack, the TX task gets delayed, resulting in quite a
noticeable throughput loss:

* baseline TCP for 2x2 11n HT40 is ~ 170mbit/sec;
* TCP for TX task in the ath taskq, with the RX also going on - 80mbit/sec;
* TCP for TX task in a separate, second taskq - 100mbit/sec.

So for now I'm going with the Linux wireless stack approach - lock tx
early.  The Linux code does it in the wireless stack, before the 802.11
state stuff happens and before it's punted to the driver.
But TX locking needs to also occur at the driver layer as the TX
completion code _also_ begins to drain the ifnet TX queue.

Whilst I'm here, add some KTR traces for the TX path.

Note:

* This really should be done at the net80211 layer (as well, at least.)
  But that'll have to wait for a little more thought to happen.

11 years agoFix longstanding misprint.
jmallett [Wed, 31 Oct 2012 04:44:32 +0000 (04:44 +0000)]
Fix longstanding misprint.

11 years agoIf the CF physical base is 0, attach no CF devices. This fixes a warning
jmallett [Wed, 31 Oct 2012 04:23:36 +0000 (04:23 +0000)]
If the CF physical base is 0, attach no CF devices.  This fixes a warning
about a 0 passed to cvmx_phys_to_ptr on systems without a CF interface,
such as the RSYS4GBE.

11 years ago- Do not put in the mntqueue half-constructed vnodes.
davide [Wed, 31 Oct 2012 03:55:33 +0000 (03:55 +0000)]
- Do not put in the mntqueue half-constructed vnodes.
- Change the code so that it relies on vfs_hash rather than on a
  home-made hashtable.
- There's no need to inline fnv_32_buf().

Reviewed by: delphij
Tested by: pho
Sponsored by: iXsystems inc.

11 years agoFix panic due to page faults while in kernel mode, under conditions of
davide [Wed, 31 Oct 2012 03:34:07 +0000 (03:34 +0000)]
Fix panic due to page faults while in kernel mode, under conditions of
VM pressure. The reason is that in some codepaths pointers to stack
variables were passed from one thread to another.

In collaboration with: pho
Reported by: pho's stress2 suite
Sponsored by: iXsystems inc.

11 years agoChange the code to use %jd as printf() placeholder for uio_offset and
davide [Wed, 31 Oct 2012 02:54:44 +0000 (02:54 +0000)]
Change the code to use %jd as printf() placeholder for uio_offset and
cast to intmax_t.
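
A small sketch of the idiom, assuming a userland off_t stands in for uio_offset. The function name `format_offset` is hypothetical. The cast makes the argument match %jd regardless of how wide off_t is on a given platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an offset portably: cast to intmax_t and print with %jd. */
static int
format_offset(char *buf, size_t len, off_t offset)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "uio_offset=%jd", (intmax_t)offset));
}
```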

Suggested by: pjd
Sponsored by: iXsystems inc.

11 years agoMinor mdoc and language fixes.
joel [Tue, 30 Oct 2012 22:30:30 +0000 (22:30 +0000)]
Minor mdoc and language fixes.

11 years agoRemoved unnecessary bits in the header that shows where I stole the template
bapt [Tue, 30 Oct 2012 22:26:19 +0000 (22:26 +0000)]
Removed unnecessary bits in the header that shows where I stole the template

11 years agoDocument the pw_util(3) functions
bapt [Tue, 30 Oct 2012 22:18:08 +0000 (22:18 +0000)]
Document the pw_util(3) functions

Reviewed by: des, gjb

11 years agoPull in r165377 from upstream llvm trunk:
dim [Tue, 30 Oct 2012 22:09:53 +0000 (22:09 +0000)]
Pull in r165377 from upstream llvm trunk:

  X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch
  for the others.

  Otherwise it will try to use SSE patterns and fail horribly if sse is
  disabled.

  Fixes PR14035.

This should fix the following assertion failure:

  Assertion failed: (Reg >= X86::FP0 && Reg <= X86::FP6 && "Expected FP
  register!"), function getFPReg, file
  contrib/llvm/lib/Target/X86/X86FloatingPoint.cpp, line 330.

which can show up when compiling contrib/compiler-rt, using -march=i686
through -march=pentium3 (CPU's which do support fcmov, but don't support
SSE2).

MFC after: 1 week

11 years agoFix problem with geom_label(4) not recognizing UFS labels on filesystems
trasz [Tue, 30 Oct 2012 21:32:10 +0000 (21:32 +0000)]
Fix problem with geom_label(4) not recognizing UFS labels on filesystems
extended using growfs(8).  The problem here is that geom_label checks if
the filesystem size recorded in UFS superblock is equal to the provider
(i.e. device) size.  This check cannot be removed due to backward
compatibility.  On the other hand, in most cases growfs(8) cannot set
fs_size in the superblock to match the provider size, because, differently
from newfs(8), it cannot recompute cylinder group sizes.

To fix this problem, add another superblock field, fs_providersize, used
only for this purpose.  The geom_label(4) will attach if either fs_size
(filesystem created with newfs(8)) or fs_providersize (filesystem expanded
using growfs(8)) matches the device size.
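
The attach condition described above can be sketched as follows. The struct and function names are hypothetical, and the sketch assumes both recorded sizes are counted in fragments of fs_fsize bytes; it is not the geom_label source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the superblock fields this check needs (illustrative layout). */
struct demo_superblock {
	int64_t fs_size;		/* size recorded by newfs(8) */
	int64_t fs_providersize;	/* size recorded by growfs(8) */
	int32_t fs_fsize;		/* fragment size in bytes */
};

/* Attach if either recorded size matches the provider (device) size. */
static bool
ufs_label_matches(const struct demo_superblock *fs, int64_t provider_bytes)
{
	int64_t frags;

	assert(fs->fs_fsize > 0);
	frags = provider_bytes / fs->fs_fsize;
	return (frags == fs->fs_size || frags == fs->fs_providersize);
}
```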

PR: kern/165962
Reviewed by: mckusick
Sponsored by: FreeBSD Foundation

11 years agoCatch up with r238925. ktr_entries may not be a power of 2.
np [Tue, 30 Oct 2012 21:10:06 +0000 (21:10 +0000)]
Catch up with r238925.  ktr_entries may not be a power of 2.
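
Why the power-of-2 assumption matters can be shown with a generic ring index. This is a sketch with hypothetical function names, not the code touched by the commit: masking with n - 1 only wraps correctly when n is a power of 2, while a modulo works for any n.

```c
#include <assert.h>

/* Wrap by masking: only correct when n is a power of 2. */
static unsigned
ring_next_mask(unsigned idx, unsigned n)
{
	return ((idx + 1) & (n - 1));
}

/* Wrap by modulo: correct for any n > 0. */
static unsigned
ring_next_mod(unsigned idx, unsigned n)
{
	assert(n > 0);
	return ((idx + 1) % n);
}
```

With six entries, the successor of index 5 should be 0; the masked version yields 4 instead.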

11 years agoatrun(8): scale default load average limit with the number of CPUs
mjg [Tue, 30 Oct 2012 19:46:00 +0000 (19:46 +0000)]
atrun(8): scale default load average limit with the number of CPUs

Previously atrun refused to run jobs if load average was not below fixed limit of 1.5.
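
A sketch of what scaling the limit could look like, assuming the default is simply the old 1.5 multiplied by the CPU count. The commit message does not give the exact formula, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_BASE_LOADAVG_LIMIT 1.5	/* the old fixed limit */

/* Assumed scaling rule: base limit times the number of CPUs. */
static double
scaled_load_limit(double base, long ncpu)
{
	assert(base > 0.0);
	if (ncpu < 1)
		ncpu = 1;	/* fall back to the uniprocessor limit */
	return (base * (double)ncpu);
}

/* Jobs run only while the load average is below the limit. */
static bool
may_run_jobs(double loadavg, double limit)
{
	return (loadavg < limit);
}
```

In a real program the CPU count could come from sysconf(_SC_NPROCESSORS_ONLN) and the load average from getloadavg(3). A load of 2.0 is refused on one CPU under this rule but accepted on four.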

PR: 173175
Reviewed by: peterj
Approved by: trasz (mentor)
MFC after: 2 weeks

11 years agoIf a USB mass storage device doesn't respond properly
hselasky [Tue, 30 Oct 2012 16:56:16 +0000 (16:56 +0000)]
If a USB mass storage device doesn't respond properly
to the initial SCSI INQUIRY command, enable all quirks.
This fixes detection of some Transcend TS2GUFM devices.

MFC after: 1 week
Reported by: Michael Dexter

11 years agoFix SMP build for omap4
cognet [Tue, 30 Oct 2012 15:25:01 +0000 (15:25 +0000)]
Fix SMP build for omap4

Submitted by: Giovanni Trematerra <gianni at freebsd DOT org>

11 years agoFixup r240246: hwpmc needs to retain the pinning until ASTs are
attilio [Tue, 30 Oct 2012 15:10:50 +0000 (15:10 +0000)]
Fixup r240246: hwpmc needs to retain the pinning until ASTs are
executed. This means past the point where userret() is generally
executed.

Skip the td_pinned check if a callchain tracing is currently happening
and add a more robust check to pmc_capture_user_callchain() in order to
catch td_pinned leak past ast() in hwpmc case.

Reported and tested by: fabient
MFC after: 1 week
X-MFC: r240246

11 years ago- Remove BCE_JUMBO_HDRSPLIT kernel option which was forgotten in r218423.
zont [Tue, 30 Oct 2012 13:22:39 +0000 (13:22 +0000)]
- Remove BCE_JUMBO_HDRSPLIT kernel option which was forgotten in r218423.

Approved by: davidch
Approved by: kib (mentor)

11 years agoDocument disk_resize(9).
trasz [Tue, 30 Oct 2012 13:05:50 +0000 (13:05 +0000)]
Document disk_resize(9).

11 years agoUse M_ZERO instead of explicit memsets and bzeros.
trasz [Tue, 30 Oct 2012 12:52:41 +0000 (12:52 +0000)]
Use M_ZERO instead of explicit memsets and bzeros.

11 years agoSet all pins initial connection status to unknown (2) and then update it
mav [Tue, 30 Oct 2012 12:44:30 +0000 (12:44 +0000)]
Set all pins initial connection status to unknown (2) and then update it
with the real value in regular way if sensing is supported.  This fixes
minor inconsistency when playback redirection appeared in undefined state
on boot if headphones were not connected.

11 years agotdq_lock_pair() already does spinlock_enter() so migration is not
attilio [Tue, 30 Oct 2012 12:25:52 +0000 (12:25 +0000)]
tdq_lock_pair() already does spinlock_enter() so migration is not
possible in sched_balance_pair(). Remove redundant sched_pin().

Reviewed by: marius, jeff

11 years agoPrint card and subsystem IDs in verbose logs to help to identify system.
mav [Tue, 30 Oct 2012 10:59:42 +0000 (10:59 +0000)]
Print card and subsystem IDs in verbose logs to help to identify system.
Hide some less useful messages under debug.

11 years agoThe argument len of m_pullup(9) could be less than or equal to MHLEN.
kevlo [Tue, 30 Oct 2012 10:13:26 +0000 (10:13 +0000)]
The argument len of m_pullup(9) could be less than or equal to MHLEN.

Reviewed by: glebius

11 years agoTeach pw(8) about how to use pw/gr API to reduce code duplication
bapt [Tue, 30 Oct 2012 08:00:53 +0000 (08:00 +0000)]
Teach pw(8) about how to use pw/gr API to reduce code duplication

MFC after: 2 months

11 years agoTSO engine of L1 requires a separate DMA descriptor for TCP
yongari [Tue, 30 Oct 2012 07:55:03 +0000 (07:55 +0000)]
TSO engine of L1 requires a separate DMA descriptor for TCP
payload.  This means driver has to split a TX buffer into two
pieces of TX buffers when the TX buffer contains both
ethernet/IP/TCP header and partial TCP payload.  The controller
does not require the whole header to be in one TX buffer, but the driver
forced it in order to compute the IP/TCP header size/offset, which are
required parameters for configuring the DMA descriptor for TSO.
While here, slightly reorder DMA descriptor setup to enhance
readability and remove unnecessary code for TSO (upper stack never
requests TSO when the frame length is less than or equal to MTU).

Reported by: Yamagi Burmeister <lists <> yamagi dot org>
Tested by: Yamagi Burmeister <lists <> yamagi dot org>
MFC After: 1 week

11 years agoActually check board type rather than using a specialized octeon_is_simulation
jmallett [Tue, 30 Oct 2012 06:36:14 +0000 (06:36 +0000)]
Actually check board type rather than using a specialized octeon_is_simulation
function.

11 years agoRemove oct_read64 and oct_write64 and use their equivalents from the Simple
jmallett [Tue, 30 Oct 2012 06:29:17 +0000 (06:29 +0000)]
Remove oct_read64 and oct_write64 and use their equivalents from the Simple
Executive, which are used everywhere else in the Octeon port.  While here,
remove other unused things from octeon_pcmap_regs.h.

11 years agoRemove stale declarations.
jmallett [Tue, 30 Oct 2012 06:19:46 +0000 (06:19 +0000)]
Remove stale declarations.

11 years agoMove the call to platform_gpio_init() into initarm_gpio_init() to reduce
andrew [Tue, 30 Oct 2012 06:11:09 +0000 (06:11 +0000)]
Move the call to platform_gpio_init() into initarm_gpio_init() to reduce
the diff to the other FDT versions of initarm.

11 years agoSpeed feature tests and initialize helper configuration that some CPUs require.
jmallett [Tue, 30 Oct 2012 06:07:30 +0000 (06:07 +0000)]
Speed feature tests and initialize helper configuration that some CPUs require.

11 years agoSeparate interrupts enable/disable logic from setting port parameters.
gonzo [Tue, 30 Oct 2012 01:52:49 +0000 (01:52 +0000)]
Separate interrupts enable/disable logic from setting port parameters.
Otherwise setting baud rate in TTY mode effectively disables TX/RX
interrupts and renders port unusable.

11 years agos/dettach/detach/g
delphij [Tue, 30 Oct 2012 01:29:45 +0000 (01:29 +0000)]
s/dettach/detach/g

Approved by: pjd
MFC after: 1 month

11 years agoMinor addition to r242323:
mav [Mon, 29 Oct 2012 21:08:06 +0000 (21:08 +0000)]
Minor addition to r242323:
As with BIO_WRITE, report success if at least one subdisk succeeded with
BIO_DELETE.  But unlike BIO_WRITE don't fail disk on BIO_DELETE error.
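
The two rules can be sketched in plain C. The names and the error-array representation are hypothetical and chosen only for illustration; this is not the GEOM RAID completion code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_bio_cmd { DEMO_BIO_WRITE, DEMO_BIO_DELETE };

/*
 * Volume-level status: success (0) if at least one subdisk succeeded,
 * otherwise the first error seen.
 */
static int
volume_status(const int *errors, size_t n)
{
	int first_error = 0;
	size_t i;

	assert(errors != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (errors[i] == 0)
			return (0);
		if (first_error == 0)
			first_error = errors[i];
	}
	return (first_error);
}

/* A failed write marks the subdisk failed; a failed delete does not. */
static bool
should_fail_subdisk(enum demo_bio_cmd cmd, int error)
{
	return (error != 0 && cmd == DEMO_BIO_WRITE);
}
```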

Sponsored by: iXsystems, Inc.
MFC after: 1 month

11 years agoWhitespace changes due to upstream integration of SCTP changes in the
tuexen [Mon, 29 Oct 2012 20:47:32 +0000 (20:47 +0000)]
Whitespace changes due to upstream integration of SCTP changes in the
FreeBSD code base.

11 years agoAdd braces (as used elsewhere in the SCTP code).
tuexen [Mon, 29 Oct 2012 20:44:29 +0000 (20:44 +0000)]
Add braces (as used elsewhere in the SCTP code).

11 years agoUse ntohs() and htons() in correct order. However, this doesn't change
tuexen [Mon, 29 Oct 2012 20:42:48 +0000 (20:42 +0000)]
Use ntohs() and htons() in correct order. However, this doesn't change
functionality.
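
The reason swapping the two calls changes nothing functionally can be sketched briefly: both perform the same 16-bit conversion, so choosing the right one only documents the direction. The wrapper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Host order to network order: the direction used when building a packet. */
static uint16_t
port_to_wire(uint16_t host_port)
{
	return (htons(host_port));
}

/* Network order to host order: the direction used when parsing a packet. */
static uint16_t
port_from_wire(uint16_t wire_port)
{
	/* Both calls compute the same value on any given machine. */
	assert(ntohs(wire_port) == htons(wire_port));
	return (ntohs(wire_port));
}
```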

11 years agobackout r242319, racy and not done in the right place
bapt [Mon, 29 Oct 2012 18:06:09 +0000 (18:06 +0000)]
backout r242319, racy and not done in the right place

Reported by: Garrett Cooper  <yanegomi@gmail.com>

11 years agoAdd basic BIO_DELETE support to GEOM RAID class for all RAID levels.
mav [Mon, 29 Oct 2012 18:04:38 +0000 (18:04 +0000)]
Add basic BIO_DELETE support to GEOM RAID class for all RAID levels.

If at least one subdisk in the volume supports it, BIO_DELETE requests
will be propagated down.  Unfortunately, for RAID levels with redundancy
unmapped blocks will be mapped back during first rebuild/resync process.

Sponsored by: iXsystems, Inc.
MFC after: 1 month

11 years agoFix locking problem in disk_resize(); previously it would run without
trasz [Mon, 29 Oct 2012 17:52:43 +0000 (17:52 +0000)]
Fix locking problem in disk_resize(); previously it would run without
topology lock, resulting in assertion when running with DIAGNOSTIC.

Reviewed by: mav (earlier version)

11 years agoAdd BCM2835 SDHCI driver and enable it in Raspberry Pi config
gonzo [Mon, 29 Oct 2012 17:23:45 +0000 (17:23 +0000)]
Add BCM2835 SDHCI driver and enable it in Raspberry Pi config

11 years agoAdd new quirks:
gonzo [Mon, 29 Oct 2012 17:21:58 +0000 (17:21 +0000)]
Add new quirks:
  - Data timeout is broken
  - Data timeout uses SD clock
  - Capabilities register is unavailable

Add calculations for clock divisor for SDHCI 3.0

11 years agomake pw_init and gr_init fail if the specified master password or group file is
bapt [Mon, 29 Oct 2012 17:19:43 +0000 (17:19 +0000)]
make pw_init and gr_init fail if the specified master password or group file is
a directory.

MFC after: 1 month

11 years agoWork around broken device tree on last-generation PowerPC iMacs
nwhitehorn [Mon, 29 Oct 2012 14:27:28 +0000 (14:27 +0000)]
Work around broken device tree on last-generation PowerPC iMacs
(PowerMac12,1), which have a mac-io MPIC cell that identifies itself
as the root PIC despite the actual root PIC being on the northbridge.
No CPC945 systems have a mac-io PIC that does anything so just don't
attach on CPC945 (U4) systems.

MFC after: 3 days

11 years agoMake GEOM RAID more aggressive in marking volumes as clean on shutdown
mav [Mon, 29 Oct 2012 14:18:54 +0000 (14:18 +0000)]
Make GEOM RAID more aggressive in marking volumes as clean on shutdown
and move that action from shutdown_pre_sync to shutdown_post_sync stage
to avoid extra flapping.

ZFS tends to not close devices on shutdown, which doesn't allow GEOM RAID
to shut down gracefully.  To handle that, mark volume as clean just when
shutdown time comes and there are no active writes.

MFC after: 2 weeks

11 years agoForced commit to provide the correct commit message to r242251:
andre [Mon, 29 Oct 2012 13:16:33 +0000 (13:16 +0000)]
Forced commit to provide the correct commit message to r242251:

  Defer sending an independent window update if a delayed ACK is pending
  saving a packet.  The window update then gets piggy-backed on the next
  already scheduled ACK.

Added grammar fixes as well.

MFC after: 2 weeks

11 years agoIn soreceive_stream() don't drop an already dequeued mbuf chain by
andre [Mon, 29 Oct 2012 12:31:12 +0000 (12:31 +0000)]
In soreceive_stream() don't drop an already dequeued mbuf chain by
overwriting the return mbuf pointer with newly received data after
a loop.  Instead append the new mbuf chain to the existing one.

Fix up sb_lastrecord when dequeuing mbuf's so that sbappend_stream()
doesn't get confused.

For the remainder copy case in the mbuf delivery part deduct the
copied length len instead of the whole mbuf length.  Additionally
don't depend on 'n' being available which isn't true in the
case of MSG_PEEK.

Fix the MSG_WAITALL case by comparing against sb_hiwat.  Before
it was looping for every receive as sb_lowat normally is zero.
Add comment about issue with (MSG_WAITALL | MSG_PEEK) which isn't
properly handled.

Submitted by: trociny (except for the change in last paragraph)

11 years agoDefine the delayed ACK timeout value directly as hz/10 instead of
andre [Mon, 29 Oct 2012 12:17:02 +0000 (12:17 +0000)]
Define the delayed ACK timeout value directly as hz/10 instead of
obfuscating it by going through PR_FASTHZ.  No functional change.

MFC after: 2 weeks

11 years agoAdd logging for socket attach failures in sonewconn() during accept(2).
andre [Mon, 29 Oct 2012 12:14:57 +0000 (12:14 +0000)]
Add logging for socket attach failures in sonewconn() during accept(2).
Include the pointer to the PCB so it can be attributed to a particular
application by corresponding it to "netstat -A" output.

MFC after: 2 weeks

11 years agoadd support for newer Lenovo ThinkPads to acpi_ibm
bapt [Mon, 29 Oct 2012 10:22:00 +0000 (10:22 +0000)]
add support for newer Lenovo ThinkPads to acpi_ibm

PR: kern/164538
Submitted by: Pierre Imai <pierre@imai.at>
MFC after: 2 weeks

11 years agoSince the macro dtom() has been removed, fix comments about the dtom.
kevlo [Mon, 29 Oct 2012 10:04:28 +0000 (10:04 +0000)]
Since the macro dtom() has been removed, fix comments about the dtom.

Reviewed by: glebius

11 years agoAdd a sysctl to change the LED display.
jmallett [Mon, 29 Oct 2012 07:06:23 +0000 (07:06 +0000)]
Add a sysctl to change the LED display.

11 years agoLoad ipdivert.ko when natd_enable=YES.
hrs [Mon, 29 Oct 2012 06:31:51 +0000 (06:31 +0000)]
Load ipdivert.ko when natd_enable=YES.

PR: conf/167566

11 years agoReplace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE,
alc [Mon, 29 Oct 2012 06:15:04 +0000 (06:15 +0000)]
Replace the page hold queue, PQ_HOLD, by a new page flag, PG_UNHOLDFREE,
because the queue itself serves no purpose.  When a held page is freed,
inserting the page into the hold queue has the side effect of setting the
page's "queue" field to PQ_HOLD.  Later, when the page is unheld, it will
be freed because the "queue" field is PQ_HOLD.  In other words, PQ_HOLD is
used as a flag, not a queue.  So, this change replaces it with a flag.
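
The flag-based behaviour described above can be modelled in a few lines. The struct, the flag value, and the function names are hypothetical stand-ins, not the VM system's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PG_UNHOLDFREE 0x01u	/* hypothetical flag value */

struct demo_page {
	unsigned hold_count;
	unsigned flags;
	bool freed;
};

/* Freeing a held page is deferred: only remember that it must be freed. */
static void
demo_page_free(struct demo_page *p)
{
	if (p->hold_count != 0) {
		p->flags |= DEMO_PG_UNHOLDFREE;
		return;
	}
	p->freed = true;
}

/* Dropping the last hold completes a deferred free. */
static void
demo_page_unhold(struct demo_page *p)
{
	assert(p->hold_count > 0);
	if (--p->hold_count == 0 && (p->flags & DEMO_PG_UNHOLDFREE) != 0) {
		p->flags &= ~DEMO_PG_UNHOLDFREE;
		p->freed = true;
	}
}
```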

To accommodate the new page flag, make the page's "flags" field wider and
"oflags" field narrower.

Reviewed by: kib

11 years agoClarify a warning message.
kientzle [Mon, 29 Oct 2012 03:31:22 +0000 (03:31 +0000)]
Clarify a warning message.

11 years agoWrap some long lines and display board serial numbers at boot.
jmallett [Mon, 29 Oct 2012 02:10:20 +0000 (02:10 +0000)]
Wrap some long lines and display board serial numbers at boot.

11 years agoThe compiler has precise knowledge of the content of sched_pin() and
attilio [Mon, 29 Oct 2012 01:35:17 +0000 (01:35 +0000)]
The compiler has precise knowledge of the content of sched_pin() and
sched_unpin() as they are static inline functions.  This way it
can do two dangerous things:
- Reorder instructions around both of them, moving operations that are
  supposed to be on the safe path (i.e. per-cpu accesses) out of it.
- Cache the value of td_pinned in CPU registers, not making it visible
  in kernel context to the scheduler once it is scanning the runqueue,
  as td_pinned is not marked volatile.

In order to avoid both possible bugs explicitly, protect the safe path
with compiler memory barriers. This will prevent reordering and caching
by the compiler about td_pinned operations.
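
A userland sketch of the technique, assuming GCC or Clang inline assembly syntax. The `demo_` names are hypothetical and the barrier placement is an assumption about how such a safe path would be bracketed, not a copy of the kernel source.

```c
#include <assert.h>

/*
 * Compiler-only barrier: emits no instruction, but the "memory" clobber
 * forbids the compiler from moving memory accesses across it or keeping
 * memory values cached in registers over it.
 */
#define demo_compiler_membar() __asm__ __volatile__("" ::: "memory")

struct demo_thread {
	int td_pinned;
};

static inline void
demo_sched_pin(struct demo_thread *td)
{
	assert(td != 0);
	td->td_pinned++;
	demo_compiler_membar();	/* safe path starts after the increment */
}

static inline void
demo_sched_unpin(struct demo_thread *td)
{
	demo_compiler_membar();	/* safe path ends before the decrement */
	td->td_pinned--;
}
```

Per-CPU accesses placed between the two calls then stay between the increment and the decrement in the generated code.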

Generally this could lead to suboptimal code traversing the pinnings
but this is not the case as can be easily verified:
http://lists.freebsd.org/pipermail/svn-src-projects/2012-October/005797.html

Discussed with: jeff, jhb
MFC after: 2 weeks

11 years agoUse Simple Executive LED display routines, which correctly use the LED base
jmallett [Mon, 29 Oct 2012 00:51:53 +0000 (00:51 +0000)]
Use Simple Executive LED display routines, which correctly use the LED base
address passed from the bootloader, rather than using a hard-coded value.

Make FreeBSD announce itself on the LED display similar to other kernels.

Remove uses of the previous LED routines, which were under-used and only used
in drivers for what seem like debugging purposes, despite those drivers being
widely-tested.

Remove several inlines for accessing memory that duplicate other functions
which are now used instead, as they are now entirely unused.

11 years agoRecognize the Marvell 88E1145 Quad Gigabit PHY.
jmallett [Mon, 29 Oct 2012 00:17:12 +0000 (00:17 +0000)]
Recognize the Marvell 88E1145 Quad Gigabit PHY.

11 years agoBegin fleshing out some software queue awareness for TIM handling with
adrian [Sun, 28 Oct 2012 21:13:12 +0000 (21:13 +0000)]
Begin fleshing out some software queue awareness for TIM handling with
the power save queue.

* introduce some new ATH_NODE lock protected fields, tracking the
  net80211 psq and TIM state;
* when doing buffer transitions - ie, when sending and completing
  buffers - check the state of the SWQ and update the TIM appropriately.
* when clearing the TIM bit, if the SWQ is not empty then delay clearing
  it.

This is racy, but it's no less racy than the current net80211 power
save queue management code.  Specifically, with multiple TX threads,
it's quite plausible that parallel state updates will race and the
TIM will be left in an inconsistent state.  I'll address that in
a follow-up commit.

11 years agoMake it clear that NULL can only be returned when M_NOWAIT was used.
trasz [Sun, 28 Oct 2012 21:01:32 +0000 (21:01 +0000)]
Make it clear that NULL can only be returned when M_NOWAIT was used.

11 years agoRemove useless check; vm_pindex_t is unsigned on all architectures.
trasz [Sun, 28 Oct 2012 20:03:57 +0000 (20:03 +0000)]
Remove useless check; vm_pindex_t is unsigned on all architectures.

CID: 3701
Found with: Coverity Prevent

11 years agoIf the user has closed the socket then drop a persisting connection
andre [Sun, 28 Oct 2012 19:58:20 +0000 (19:58 +0000)]
If the user has closed the socket then drop a persisting connection
after a much reduced timeout.

Typically web servers close their sockets quickly under the assumption
that the TCP connections goes away as well.  That is not entirely true
however.  If the peer closed the window we're going to wait for a long
time with lots of data in the send buffer.

MFC after: 2 weeks

11 years agoIncrease the initial CWND to 10 segments as defined in IETF TCPM
andre [Sun, 28 Oct 2012 19:47:46 +0000 (19:47 +0000)]
Increase the initial CWND to 10 segments as defined in IETF TCPM
draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
window improves the overall performance of many web services without
risking congestion collapse.

As long as it remains a draft it is placed under a sysctl marking it
as experimental:
 net.inet.tcp.experimental.initcwnd10 = 1
When it becomes an official RFC soon the sysctl will be changed to
the RFC number and moved to net.inet.tcp.

This implementation differs from the RFC draft in that it is a bit
more conservative in the case of packet loss on SYN or SYN|ACK because
we haven't reduced the default RTO to 1 second yet.  Also the restart
window isn't yet increased as allowed.  Both will be adjusted with
upcoming changes.
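
For reference, the upper bound from the draft, IW = min(10*MSS, max(2*MSS, 14600)), can be sketched as below. This shows the draft's formula only; as noted above the committed implementation is more conservative in some cases, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Initial window in bytes per the draft: min(10*MSS, max(2*MSS, 14600)). */
static uint32_t
draft_initial_cwnd(uint32_t mss)
{
	uint32_t ten_segments, floor_bytes;

	assert(mss > 0 && mss <= 65535);
	ten_segments = 10 * mss;
	floor_bytes = (2 * mss > 14600) ? 2 * mss : 14600;
	return (ten_segments < floor_bytes ? ten_segments : floor_bytes);
}
```

For a typical Ethernet MSS of 1460 bytes this yields 14600 bytes, i.e. ten full segments; for jumbo-sized segments the 2*MSS term takes over.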

It is enabled by default.  In Linux it is enabled since kernel 3.0.

MFC after: 2 weeks

11 years agoDeclare functions as static and move global variables to the top;
trasz [Sun, 28 Oct 2012 19:38:42 +0000 (19:38 +0000)]
Declare functions as static and move global variables to the top;
no functional changes.

11 years agoUpdate comment to reflect the change made in r242263.
andre [Sun, 28 Oct 2012 19:22:18 +0000 (19:22 +0000)]
Update comment to reflect the change made in r242263.

MFC after: 2 weeks

11 years agoAdd SACK_PERMIT to the list of TCP options that are switched off after
andre [Sun, 28 Oct 2012 19:20:23 +0000 (19:20 +0000)]
Add SACK_PERMIT to the list of TCP options that are switched off after
retransmitting a SYN three times.

MFC after: 2 weeks

11 years agoSimplify and enhance the window change/update acceptance logic,
andre [Sun, 28 Oct 2012 19:16:22 +0000 (19:16 +0000)]
Simplify and enhance the window change/update acceptance logic,
especially in the presence of bi-directional data transfers.

snd_wl1 tracks the right edge, including data in the reassembly
queue, of valid incoming data.  This makes it like rcv_nxt plus
reassembly.  It never goes backwards to prevent older, possibly
reordered segments from updating the window.

snd_wl2 tracks the left edge of sent data.  This makes it a duplicate
of snd_una.  However joining them right now is difficult due to
separate update dependencies in different places in the code flow.

snd_wnd tracks the current send window advertised by the peer.  In
tcp_output() the effective window is calculated by subtracting the
already in-flight data, snd_nxt less snd_una, from it.
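
The effective window calculation described above can be sketched as
follows.  The struct is a hypothetical stand-in for the corresponding
tcpcb fields and the function name is made up for illustration; this is
not the tcp_output() code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant struct tcpcb fields. */
struct sendstate {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to be sent */
	uint32_t snd_wnd;	/* window advertised by the peer, in bytes */
};

/*
 * Bytes that may still be sent: the advertised window less the data
 * already in flight.  Never returns a negative (wrapped) value.
 */
static uint32_t
effective_window(const struct sendstate *s)
{
	/* Unsigned subtraction stays correct across sequence wraparound. */
	uint32_t inflight = s->snd_nxt - s->snd_una;

	return (inflight >= s->snd_wnd ? 0 : s->snd_wnd - inflight);
}
```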

ACK's become the main clock of window updates and will always update
the window when the left edge of what we sent is advanced.  The ACK
clock is the primary signaling mechanism in ongoing data transfers.
This works reliably even in the presence of reordering, reassembly
and retransmitted segments.  The ACK clock is most important because
it determines how much data we are allowed to inject into the network.

Zero window updates that get us out of persist mode are crucial.  Here
a segment that neither moves ACK nor SEQ but enlarges WND is accepted.

When the ACK clock is not active (that is we're not or no longer
sending any data) any segment that moves the extended right SEQ edge,
including out-of-order segments, updates the window.  This gives us
updates especially during ping-pong transfers where the peer isn't
done consuming the already acknowledged data from the receive buffer
while responding with data.

The SSH protocol is a prime candidate to benefit from the improved
bi-directional window update logic as it has its own windowing
mechanism on top of TCP and is frequently sending back protocol ACK's.

Tcpdump provided by: darrenr
Tested by: darrenr
MFC after: 2 weeks

11 years agoFor retransmits of SYN|ACK from the syncache use the slightly more
andre [Sun, 28 Oct 2012 19:02:07 +0000 (19:02 +0000)]
For retransmits of SYN|ACK from the syncache use the slightly more
aggressive special tcp_syn_backoff[] retransmit schedule instead of
the normal tcp_backoff[] schedule for established connections.

MFC after: 2 weeks

11 years agoWhen retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE,
andre [Sun, 28 Oct 2012 18:56:57 +0000 (18:56 +0000)]
When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE,
the default retransmit timeout, as base to calculate the backoff
time until next try instead of the TCP_REXMTVAL() macro which only
works correctly when we already have measured an actual RTT+RTTVAR.

Before it would cause the first retransmit at RTOBASE, the next
four at the same time (!) about 200ms later, and then another one
again RTOBASE later.
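
A minimal sketch of the idea, assuming the timeout is a fixed base
multiplied by a per-attempt factor and then clamped.  The multiplier
table and the function name are illustrative assumptions; the log entry
does not give the contents of the kernel's backoff table.

```c
#include <stddef.h>

/* Illustrative multipliers only, not taken from the log entry. */
static const int syn_backoff[] =
    { 1, 1, 1, 1, 1, 2, 4, 8, 16, 32, 64, 64, 64 };

/*
 * Hypothetical helper: ticks until the next SYN retransmit.  The base
 * is a fixed default timeout rather than a value derived from a
 * measured RTT, which does not exist yet in SYN_SENT state.
 */
static int
syn_rexmt_timeout(int rtobase, size_t shift, int tmin, int tmax)
{
	size_t last = sizeof(syn_backoff) / sizeof(syn_backoff[0]) - 1;
	int t;

	if (shift > last)
		shift = last;
	t = rtobase * syn_backoff[shift];
	if (t < tmin)
		t = tmin;
	if (t > tmax)
		t = tmax;
	return (t);
}
```

Because the base is constant, each retry is spaced by the table alone
and no longer collapses to a minimal interval when no RTT sample exists.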

MFC after: 2 weeks

11 years agoFix two problems that caused instant panic when the device mounted
trasz [Sun, 28 Oct 2012 18:53:28 +0000 (18:53 +0000)]
Fix two problems that caused instant panic when the device mounted
with softupdates went away.  Note that this does not fix the problem
entirely; I'm committing it now to make it easier for someone to pick
up the work.

Reviewed by: mckusick

11 years agoAdd a temporary (for values of "temporary") work around for hotplug
adrian [Sun, 28 Oct 2012 18:46:06 +0000 (18:46 +0000)]
Add a temporary (for values of "temporary") work around for hotplug
support with ath(4) and VIMAGE.

Right now the VIMAGE code doesn't supply a default vnet context during:

* hotplug attach;
* any device detach.

It special cases kldload/boot time probing (by setting the context to
vnet0) but that doesn't occur when probing devices during a bus rescan -
eg, adding a cardbus card.

These will eventually go away when the VIMAGE support extends to providing
default contexts to hotplug attach/detach.

11 years agoRemove bogus 'else' in #ifdef that prevented the rttvar from being reset
andre [Sun, 28 Oct 2012 18:45:04 +0000 (18:45 +0000)]
Remove bogus 'else' in #ifdef that prevented the rttvar from being reset
in tcp_timer_rexmt() on retransmit for IPv6 sessions.

MFC after: 2 weeks

11 years agoImprove m_cat() by being able to also merge contents from M_EXT
andre [Sun, 28 Oct 2012 18:38:51 +0000 (18:38 +0000)]
Improve m_cat() by being able to also merge contents from M_EXT
mbuf's by doing proper testing with M_WRITABLE().

In m_collapse() replace an incomplete manual check for M_RDONLY
with the M_WRITABLE() macro that also tests for shared buffers
and other cases that make a particular mbuf immutable.

MFC after: 2 weeks

11 years agoAllow arbitrary MSS sizes and don't mind about the cluster size anymore.
andre [Sun, 28 Oct 2012 18:33:52 +0000 (18:33 +0000)]
Allow arbitrary MSS sizes and don't mind about the cluster size anymore.
We've got more cluster sizes for quite some time now and the originally
imposed limits and the previously codified thoughts on efficiency gains
are no longer true.

MFC after: 2 weeks

11 years agoChange the syncache count reporting the current number of entries
andre [Sun, 28 Oct 2012 18:07:34 +0000 (18:07 +0000)]
Change the syncache count reporting the current number of entries
from an unprotected u_int that reports garbage on SMP to a function
based sysctl obtaining the current value from UMA.

Also read back the actual cache_limit after page size rounding by UMA.

PR: kern/165879
MFC after: 2 weeks

11 years agoSimplify implementation of net.inet.tcp.reass.maxsegments and
andre [Sun, 28 Oct 2012 17:59:46 +0000 (17:59 +0000)]
Simplify implementation of net.inet.tcp.reass.maxsegments and
net.inet.tcp.reass.cursegments.

MFC after: 2 weeks

11 years agoPrevent a flurry of forced window updates when an application is
andre [Sun, 28 Oct 2012 17:40:35 +0000 (17:40 +0000)]
Prevent a flurry of forced window updates when an application is
doing small reads on a (partially) filled receive socket buffer.

Normally one would send a window update every time the available
space in the socket buffer increases by two times MSS.  This leads
to a flurry of window updates that do not provide any meaningful
new information to the sender.  There still is available space in
the window and the sender can continue sending data.  All window
updates then get carried by the regular ACKs.  Only when the socket
buffer was (almost) full and the window closed accordingly does a
window update deliver new information and allow the sender to start
sending more data again.

Send window updates only every two MSS when the socket buffer
has less than 1/8 space available, or the available space in the
socket buffer increased by 1/4 its full capacity, or the socket
buffer is very small.  The next regular data ACK will carry and
report the exact window size again.
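
The rule above can be sketched as a predicate.  The function and
parameter names are hypothetical, and the cutoff used for a "very
small" socket buffer (at most eight segments) is an assumption, since
the log entry does not state it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate, not the tcp_output() code.
 *   adv    - bytes by which the window grew since it was last advertised
 *   recwin - space currently available in the receive socket buffer
 *   hiwat  - full capacity of the receive socket buffer
 *   mss    - maximum segment size
 */
static bool
want_window_update(uint32_t adv, uint32_t recwin, uint32_t hiwat,
    uint32_t mss)
{
	/* Never more often than every two segments. */
	if (adv < 2 * mss)
		return (false);

	return (adv >= hiwat / 4 ||	/* grew by 1/4 of capacity */
	    recwin < hiwat / 8 ||	/* less than 1/8 space available */
	    hiwat <= 8 * mss);		/* assumed "very small" cutoff */
}
```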

Reported by: sbruno
Tested by: darrenr
Tested by: Darren Baginski
PR: kern/116335
MFC after: 2 weeks

11 years agoWhen SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
andre [Sun, 28 Oct 2012 17:30:28 +0000 (17:30 +0000)]
When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
reduce the initial CWND to one segment.  This reduction got lost
some time ago due to a change in initialization ordering.

Additionally in tcp_timer_rexmt() avoid entering fast recovery when
we're still in TCPS_SYN_SENT state.

MFC after: 2 weeks

11 years agoWhen SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
andre [Sun, 28 Oct 2012 17:25:08 +0000 (17:25 +0000)]
When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
reduce the initial CWND to one segment.  This reduction got lost
some time ago due to a change in initialization ordering.

Additionally in tcp_timer_rexmt() avoid entering fast recovery when
we're still in TCPS_SYN_SENT state.

MFC after: 2 weeks

11 years agoAdjust the initial default CWND upon connection establishment to the
andre [Sun, 28 Oct 2012 17:16:09 +0000 (17:16 +0000)]
Adjust the initial default CWND upon connection establishment to the
new and increased values specified by RFC5681 Section 3.1.

The even larger initial CWND per RFC3390, if enabled, is not affected.
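
For illustration, the sizing rules can be sketched as below, assuming
the thresholds of RFC5681 Section 3.1, the RFC3390 formula
min(4*MSS, max(2*MSS, 4380)), and the one-segment rule after a lost SYN
or SYN/ACK from the same RFC5681 section.  The function name and its
flags are hypothetical, and the precedence of the checks is an
assumption rather than a copy of the kernel logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: initial congestion window in bytes. */
static uint32_t
initial_cwnd(uint32_t smss, bool rfc3390, bool syn_retransmitted)
{
	/* A lost SYN or SYN/ACK limits the window to one segment. */
	if (syn_retransmitted)
		return (smss);

	if (rfc3390) {
		uint32_t lower = 2 * smss > 4380 ? 2 * smss : 4380;

		return (4 * smss < lower ? 4 * smss : lower);
	}

	/* RFC5681 Section 3.1: two to four segments depending on SMSS. */
	if (smss > 2190)
		return (2 * smss);
	if (smss > 1095)
		return (3 * smss);
	return (4 * smss);
}
```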

MFC after: 2 weeks

11 years agoImplement support for the so-called USB feedback endpoint for USB
hselasky [Sun, 28 Oct 2012 14:37:17 +0000 (14:37 +0000)]
Implement support for the so-called USB feedback endpoint for USB
audio devices. This endpoint gives clues to the USB host about the
actual data rate on asynchronous endpoints and makes the more
expensive USB audio devices usable under FreeBSD.
The Linux USB audio driver was used as reference for the
automagic shift of the received value.

MFC after: 1 week

11 years agoFix compilation on ia64 when page size is configured for 16KB.
kib [Sun, 28 Oct 2012 11:53:54 +0000 (11:53 +0000)]
Fix compilation on ia64 when page size is configured for 16KB.

Reviewed by: alc, marcel