arybchik [Fri, 17 Jun 2016 09:04:06 +0000 (09:04 +0000)]
MFC r301427
sfxge(4): allow firmware to auto-configure event queues on Medford
On Medford, licenses are required to enable RX and event cut through and to
disable RX batching. To avoid the need for the driver to make decisions based on
the licensing state, the MC_CMD_INIT_EVQ has been extended to allow us to leave
the decision to the firmware. If the adapter is licensed for low-latency use,
the firmware will choose the optimal settings for latency, otherwise it will use
the best settings for throughput.
For Huntington we still need to choose the settings ourselves.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6717
arybchik [Fri, 17 Jun 2016 09:02:51 +0000 (09:02 +0000)]
MFC r301309
sfxge(4): always be ready to receive batched events
When the low-latency firmware variant is running, it is reported as not
being capable of batching RX events, but it can still do so if the
FORCE_EV_MERGING flag is set on an RXQ. Therefore we need to handle
batched RX events even if the capability isn't set.
If this bug is fixed in the firmware such that the capability is set
even when running the low-latency firmware variant, it will almost
always be reported so I don't think we lose much by removing the check.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6705
arybchik [Fri, 17 Jun 2016 09:01:11 +0000 (09:01 +0000)]
MFC r301308
sfxge(4): add helper to compute timer quantum
This also adjusts the timer values used to match the Linux net
driver implementation:
a) non-zero time intervals should result in at least one quantum
b) timer load/reload values are only zero biased for Falcon/Siena
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6704
arybchik [Fri, 17 Jun 2016 08:59:08 +0000 (08:59 +0000)]
MFC r301237
sfxge(4): support EVQ timer workaround via MCDI
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/6675
arybchik [Fri, 17 Jun 2016 08:56:01 +0000 (08:56 +0000)]
MFC r301122
sfxge(4): set moderation in efx_ev_qcreate
This simplifies setting an initial interrupt moderation value, and
avoids most calls to evx_ev_qmoderate from contexts where MCDI is
not allowed (MCDI is need for an EVQ timer workaround in a later patch).
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6673
sephe [Thu, 16 Jun 2016 03:04:16 +0000 (03:04 +0000)]
MFC 297142,297143,297176,297177,297178,297221
297142
hyperv: Factor out snprinf_hv_guid()
Submitted by: Ju Sun <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5651
Submitted by: Jun Su <junsu microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5669
297176
hyperv/evttimer: Use an independent message slot so that it can work
Using the same message slot as the other types of the messages has
the side effect that the event timer message could be deferred to
the swi threads to run (lacking of trapframe and the original code
didn't even handle that, so the event timer was actually broken).
As of this commit we use an independent message slot for event timer,
so that we could handle all of event timer messages in the interrupt
handler directly. Note, the message slot for event timer is still
bind to the same interrupt vector as the other types of messages.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5696
297177
hyperv/vmbus: Use taskqueue_fast for non-performance critical messages
This gets rid of the per-cpu SWIs.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
297178
hyperv/vmbus: Remove NULL check for taskqueue_create_fast(M_WAITOK)
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
297221
hyperv/vmbus: Create per-cpu fast taskqueue for msg handling
Using one taskqueue does not work, since the EOM MSR must be written
on the msg's owner CPU.
Noticed by: Jun Su <junsu microsoft com>
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
sephe [Thu, 16 Jun 2016 01:57:16 +0000 (01:57 +0000)]
MFC 297802,297803(297481),297804
297802
hyperv: Identify Hyper-V features and recommends properly
Features bits will be used to detect devices, e.g. timers, which
do not have corresponding event channels.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
Rearranged by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
279803(297481)
hyperv: Register Hyper-V timer early enough for TSC freq calibration
The i8254 simulation in Hyper-V is kinda broken and is not available
in Generation 2 Hyper-V VMs, so Hyper-V timer must be registered early
enough so that it can be used to do the TSC freq calibration.
This fixes the notorious warning like this:
calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe
Tested by: kib, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5778
=================
hyperv: Resurrect r297481
This time we make sure that the TIME_REF_COUNT MSR exists.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
vangyzen [Wed, 15 Jun 2016 14:11:49 +0000 (14:11 +0000)]
MFC r301532
newsyslog: Eliminate unnecessary sleep(10) when -R and -s are specified
After going through the signal work list, during which do_sigwork()
is called and essentially does nothing because -s and -R were
specified on the command line, newsyslog will sleep for 10 seconds
as the (verbose) code says: "Pause 10 seconds to allow daemon(s)
to close log file(s)".
However, the man page verbiage for -R (and -s) seems quite clear
that this sleep() is unnecessary because the daemon was expected
to have already closed the log file before calling newsyslog.
PR: 210020
Submitted by: David A. Bright <david_a_bright@dell.com>
Sponsored by: Dell Inc.
sephe [Wed, 15 Jun 2016 09:52:01 +0000 (09:52 +0000)]
MFC 297635
hyperv/vmbus: Use default mtx for channel message queue
First of all sema_post() can't be called w/ spinlock, and the channel
message queue processing is not on hot code path, i.e. spinlock is not
necessary.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5812
sephe [Wed, 15 Jun 2016 09:39:41 +0000 (09:39 +0000)]
MFC 297219
hyperv/vmbus: use a better retry method in hv_vmbus_post_message()
Most often, hv_vmbus_post_message() doesn't fail. However, it fails
intermittently when GPADLs of large shared memory is to be established
with the host, e.g. on the hn(4) attach path: a GPADL of 15MB sendbuf
is created, for which lots of messages will be flooded to the host.
The host side tries to throttle the message rate by returning
HV_STATUS_INSUFFICIENT_BUFFERS.
Before this commit, we do several retries for failed messages, but the
delay between each retry is pretty/too low, which will cause sporadic
message posting failure. We now use large delay (>=1ms) between each
retry to fix the message posting failure.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5715
truckman [Wed, 15 Jun 2016 06:40:30 +0000 (06:40 +0000)]
MFC r301592
Don't leak addrinfo if ai->ai_addrlen <= minsiz test fails.
If the ai->ai_addrlen <= minsiz test fails, then freeaddrinfo()
does not get called to free the memory just allocated by getaddrinfo().
Fix by moving ai->ai_addrlen <= minsiz to a separate nested if
block, and keep freeaddrinfo() in the outer block so that freeaddrinfo()
will be called whenever getaddrinfo() succeeds.
truckman [Wed, 15 Jun 2016 06:33:40 +0000 (06:33 +0000)]
MFC r301582
Explicitly NUL terminate the buffer filled by fread().
The fix in r300649 was not sufficient to convince Coverity that the
buffer was NUL terminated, even with the buffer pre-zeroed. Swap
the size and nmemb arguments to fread() so that a valid lenght is
returned, which we can use to terminate the string in the buffer
at the correct location. This should also quiet the complaint about
the return value of fread() not being checked.
sephe [Wed, 15 Jun 2016 06:32:00 +0000 (06:32 +0000)]
MFC 296293,296296,296297,296305
296293
hyperv/hn: Pass channel to hv_nv_on_receive_completion()
While I'm here, staticize this function.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Modified by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
296296
hyperv/hn: Make read buffer per-channel
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reorganized by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
296297
hyperv/hn: Fix typo in comment
MFC after: 1 week
Sponsored by: Microsoft OSTC
296305
hyperv/hn: Make # of rings configurable
And since the host may not being able to allocate the # of rings
requested by us, save the # of rings allocated by the host in the
ring_inuse counters; use ring_inuse counters for run time operation.
truckman [Wed, 15 Jun 2016 06:27:43 +0000 (06:27 +0000)]
MFC r299484, r301574
r299484 | cem | 2016-05-11 15:04:28 -0700 (Wed, 11 May 2016) | 13 lines
random(6): Fix double-close
In the case where a file lacks a trailing newline, there is some "evil" code to
reverse goto the tokenizing code ("make_token") for the final token in the
file. In this case, 'fd' is closed more than once. Use a negative sentinel
value to guard close(2), preventing the double close.
Ideally, this code would be restructured to avoid this ugly construction.
Fix a (false positive?) Argument cannot be negative coverity defect.
Rather than guarding close(fd) with an fd >= 0 test and setting fd
to -1 when it is closed to avoid a potential double-close, just
move the close() call after the conditional "goto make_token". This
moves the close() call totally outside the loop to avoid the
possibility of calling it twice. This should also prevent a Coverity
warning about checking fd for validity after it was previously passed
to read().
sephe [Wed, 15 Jun 2016 06:08:05 +0000 (06:08 +0000)]
MFC 296291,301109
296291
hyperv/chan: Factor out the vcpu setting
And use it for cpu0 assignment; it does not sound right to assume that
cpu0 maps to vcpu0. And this factored out function will be exposed to
drivers, if driver specific CPU binding is needed, e.g. hn(4).
Move default cpu select after saving channel offer message. This makes
sure that all useful information of the channel has been setup.
sephe [Wed, 15 Jun 2016 05:31:35 +0000 (05:31 +0000)]
MFC 296180,297634
296180
hyperv: Use proper fence function to keep store-load order for msgs
sfence only makes sure about the store-store order, which is not
sufficient here. Use atomic_thread_fence_seq_cst() as suggested
jhb and kib (a locked op in the nutshell, which should have the
Reviewed by: jhb, kib, Jun Su <junsu microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5436
297634
hyperv: Use mb() instead of atomic_thread_fence_seq_cst()
Since atomic_thread_fence_seq_cst() will become compiler fence on UP kernel.
Reviewed by: kib, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5852
sephe [Wed, 15 Jun 2016 05:16:37 +0000 (05:16 +0000)]
MFC 296178
buf_ring/drbr: Add buf_ring_peek_clear_sc and use it in drbr_peek
Unlike buf_ring_peek, it only supports single consumer mode, and it
clears the cons_head if DEBUG_BUFRING/INVARIANTS is defined.
The normal use case of drbr_peek for network drivers is:
m = drbr_peek(br);
err = hw_spec_encap(&m); /* could m_defrag/m_collapse */
(*)
if (err) {
if (m == NULL)
drbr_advance(br);
else
drbr_putback(br, m);
/* break the loop */
}
drbr_advance(br);
The race is:
If hw_spec_encap() m_defrag or m_collapse the mbuf, i.e. the old mbuf
was freed, or like the Hyper-V's network driver, that transmission-
done does not even require the TX lock; then on the other CPU at the
(*) time, the freed mbuf could be recycled and being drbr_enqueue even
before the current CPU had the chance to call drbr_{advance,putback}.
This triggers a panic in drbr_enqueue duplicated element check, if
DEBUG_BUFRING/INVARIANTS is defined.
Use buf_ring_peek_clear_sc() in drbr_peek() to fix the above race.
This change is a NO-OP, if neither DEBUG_BUFRING nor INVARIANTS are
defined.
jamie [Wed, 15 Jun 2016 01:56:20 +0000 (01:56 +0000)]
MFC r301745:
Make sure the OSD methods for jail set and remove can't run concurrently,
by holding allprison_lock exclusively (even if only for a moment before
downgrading) on all paths that call PR_METHOD_REMOVE. Since they may run
on a downgraded lock, it's still possible for them to run concurrently
with PR_METHOD_GET, which will need to use the prison lock.
mav [Wed, 15 Jun 2016 01:42:53 +0000 (01:42 +0000)]
MFC r301293:
When negotiating NTB_SB01BASE_LOCKUP workaround, don't try to limit the
BAR size to 1MB. According to Xeon v3 specifications and my tests, that
size register is write-once and so not writeable after BIOS written it.
Instead of that, make the code work with BAR of any sufficient size,
properly calculating offset within its base. It also simplifies the code.
295918
hyperv/hn: Use IFQ_DRV_PREPEND instead of IF_PREPEND
IF_PREPEND promises out-of-order packet sending when the TX desc list
is depleted. It was overlooked and copied blindly when the transmission
path was partially rewritten.
sephe [Mon, 13 Jun 2016 08:03:53 +0000 (08:03 +0000)]
MFC 295748,295792,295793,295794
295748
hyperv/hn: Use buf_ring for txdesc list
So one spinlock is avoided, which would be potentially dangerous for
virtual machine, if the spinlock holder was scheduled out by the host,
as noted by royger.
Old spinlock based txdesc list is still kept around, so we could have
a safe fallback.
No performance regression nor improvement is observed.
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5290
295792
hyperv/hn: Add option to bind TX taskqueues to the specified CPU
It will be used to help tracking host side transmission ring selection
issue; and it will be turned on by default, once we have concrete result.
Reviewed by: adrian, Jun Su <junsu microsoft com>
Approved by: adrian (mento)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5316
295793
hyperv/hn: Enable IP header checksum offloading for WIN8 (WinServ2012)
Tested on Windows Server 2012.
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5317
295794
hyperv/hn: Free the txdesc buf_ring when the TX ring is destroyed
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5318
sephe [Mon, 13 Jun 2016 07:30:54 +0000 (07:30 +0000)]
MFC 295743,295744,295745,295746,295747
295743
hyperv/hn: Change global tunable prefix to hw.hn
And use SYSCTL+CTLFLAG_RDTUN for them.
Suggested by: adrian
Reviewed by: adrian, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5274
295744
hyperv/hn: Split RX ring data structure out of softc
This paves the way for upcoming vRSS stuffs and eases more code cleanup.
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5275
295745
hyperv/hn: Use taskqueue_enqueue()
This also eases experiment on the non-fast taskqueue.
Reviewed by: adrian, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5276
295746
hyperv/hn: Use non-fast taskqueue for transmission
Performance stays same; so no need to use fast taskqueue here.
295747
hyperv/hn: Split TX ring data structure out of softc
This paves the way for upcoming vRSS stuffs and eases more code cleanup.
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5283
sephe [Mon, 13 Jun 2016 07:03:00 +0000 (07:03 +0000)]
MFC 295740,295741,295742
295740
hyperv/hn: Set the TCP ACK/data segment aggregation limit
Set TCP ACK append limit to 1, i.e. aggregate 2 ACKs at most. Aggregating
anything more than 2 hurts TCP sending performance in hyperv. This
significantly improves the TCP sending performance when the number of
concurrent connetion is low (2~8). And it greatly stabilizes the TCP
sending performance in other cases.
Set TCP data segments aggregation length limit to 37500. Without this
limitation, hn(4) could aggregate ~45 TCP data segments for each
connection (even at 64 or more connections) before dispatching them to
socket code; large aggregation slows down ACK sending and eventually
hurts/destabilizes TCP reception performance. This setting stabilizes
and improves TCP reception performance for >4 concurrent connections
significantly.
sephe [Mon, 13 Jun 2016 06:38:46 +0000 (06:38 +0000)]
MFC 295307,295308,295309,295606
295307
hyperv: Use standard taskqueue instead of hv_work_queue
HyperV code was ported from Linux. There is an implementation of
work queue called hv_work_queue. In FreeBSD, taskqueue could be
used for the same purpose. Convert all the consumer of hv_work_queue
to use taskqueue, and remove work queue implementation.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4963
295308
hyperv: Use WAITOK in the places where we can wait
And convert rndis non-hot path spinlock to mutex.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5081
295309
hyperv: Use malloc for page allocation.
We will eventually convert them to use busdma.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe, Dexuan Cui <decui microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5087
295606
hyperv/hn: Fix typo in comment
Noticed by: avos
Reviewed by: adrian, avos, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5199
sephe [Mon, 13 Jun 2016 05:43:42 +0000 (05:43 +0000)]
MFC 295296,295297,295298,295299,295300,295301
295296
hyperv/hn: Avoid duplicate csum features settings
- Record csum features in softc, so we don't need to duplicate the
logic from attach path to ioctl path.
- Protect if_capenable and if_hwassist changes by main lock.
- Prefer turn on/off bits in if_hwassist explicitly instead of using
XOR.
Reviewed by: adrian, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5085
295297
hyperv/hn: Reorganize TX csum offloading
- For non-TSO offloading, we don't need to access mbuf to know
which csum offloading is requested, we can just use the
CSUM_{IP,TCP,UDP} in the csum_flags.
- For TSO offloading, we still can depend on CSUM_{TSO4,TSO6}
in the csum_flags to tell whether the TSO packet is an IPv4
TSO packet or an IPv6 TSO packet.
This streamlines csum offloading handling (remove the two goto)
and allows us the nuke the unnecessary get_transport_proto_type().
Reviewed by: adrian, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5098
295298
hyperv/hn: Enable IP header checksum offloading
So that:
- TCP/IP stack will not do unnecessary IP header checksum for TSO
packets.
- Reduce guest load for non-TSO IP packets.
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5099
295299
hyperv/hn: Enable UDP RXCSUM
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5102
295300
hyperv/hn: Add sysctls to trust host side UDP and IP csum verification
Reviewed by: adrian, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5103
295301
hyperv/hn: Obey IFCAP_RXCSUM configure
Reviewed by: adrian
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5104
sephe [Mon, 13 Jun 2016 05:06:07 +0000 (05:06 +0000)]
MFC 294886
hyperv/vmbus: Event handling code refactor.
- Use taskqueue instead of swi for event handling.
- Scan the interrupt flags in filter
- Disable ringbuffer interrupt mask in filter to ensure no unnecessary
interrupts.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: adrian, sephe, Dexuan <decui microsoft com>
Approved by: adrian (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4920
sephe [Mon, 13 Jun 2016 03:28:37 +0000 (03:28 +0000)]
MFC 294701,294702,294703,294705,294788
294701
hyperv/hn: Use m_copydata for chimney sending.
While I'm here, move stack variables near their usage.
Reviewed by: adrian, delphij, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4977
294702
hyperv/hn: Remove unnecessary zeroing out the netvsc_packet
All used fields are setup one by one, so there is no need to zero
out this large struct.
While I'm here, move the stack variable near its usage.
Reviewed by: adrian, delphij, Jun Su <junsu microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4978
294703
hyperv/hn: Trust host TCP segment checksum verification by default.
According to all available information, VMSWITCH always does the
TCP segment checksum verification before sending the segment to
guest.
Reviewed by: adrian, delphij, Hongjiang Zhang <honzhan microsoft com>
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4991
294705
hyperv/vmbus: Avoid extra copy of page information.
The page information array could contain up to 32 elements (i.e. 512B).
And on network side w/ TSO, 11+ (176B+) elements, i.e. ~44K TSO packet,
in the page information array is quite common.
This saves us some cpu cycles.
Reviewed by: adrian, delphij
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4992
294788
hyperv/hn: Improve sending performance
- Avoid main lock contention by trylock for if_start, if that fails,
schedule TX taskqueue for if_start
- Don't do direct sending if the packet to be sent is large, e.g.
TSO packet.
This change gives me stable 9.1Gbps TCP sending performance w/ TSO
over a 10Gbe directly connected network (the performance fluctuated
between 4Gbps and 9Gbps before this commit). It also improves non-
TSO TCP sending performance a lot.
Reviewed by: adrian, royger
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5074
sephe [Mon, 13 Jun 2016 02:54:58 +0000 (02:54 +0000)]
MFC 293653
hyperv/kvp_daemon: Make poll(2) block indefinitely
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, me, adrain
Approved by: adrian
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4762
dim [Sun, 12 Jun 2016 11:45:45 +0000 (11:45 +0000)]
MFC r300967:
Stop exposing the C11 _Atomic() macro in <sys/cdefs.h>, when compiling
for C++. It clashes with the one in libc++'s <atomic> header.
(Previously, the _Atomic() macro was defined in <stdatomic.h>, which is
only for use with C11, but for various reasons it was moved to its
current location in r251804.)
ngie [Sun, 12 Jun 2016 05:57:42 +0000 (05:57 +0000)]
MFC r300395:
Silence top(1) compiler warnings
The contrib/top code is no longer maintained upstream (last pulled 16 years
ago). The K&R-style followed by the code spews -Wimplicit-int and -Wreturn-type
warnings, amongst others. This silences 131 warnings with as little modification
as possible by adding necessary return types, definitions, headers, and header
guards, and missing header includes.
The 5 warnings that remain are due to undeclared ncurses references. I didn't
include curses.h and term.h because there are several local functions and macros
that conflict with those definitions.
sys/vmmeter.h: Fix trivial '-Wsign-compare' warning in common header
Frankly, it doesn't make sense for vm_pageout_wakeup_thresh to have a negative
value (it is only ever set to a fraction of v_free_min, which is unsigned and
also obviously non-negative). But I'm not going to try and convert every
non-negative scalar in the VM to unsigned today, so just cast it for the
comparison.
sys/vmmeter.h: Fix trivial '-Wsign-compare' warning in common header
Frankly, it doesn't make sense for vm_pageout_wakeup_thresh to have a negative
value (it is only ever set to a fraction of v_free_min, which is unsigned and
also obviously non-negative). But I'm not going to try and convert every
non-negative scalar in the VM to unsigned today, so just cast it for the
comparison.
pfg [Fri, 10 Jun 2016 22:07:17 +0000 (22:07 +0000)]
MFC r300301, r300319:
GCC: Add support for named initializers for anonymous structs/unions.
This is a C11 feature that is starting to get used in places such as Mesa.
This implementation takes a different approach to upstream and is
therefore not covered by GPLv3.
ngie [Fri, 10 Jun 2016 18:34:31 +0000 (18:34 +0000)]
MFC r295618,r300100,r300531:
r295618 (by cem):
NTB: workaround for high traffic hardware hang
This patch comes from Dave Jiang's Linux tree, davejiang/ntb. It hasn't
been accepted into Linus' tree, so I do not have an authoritative SHA1
to point at. Original commit log:
=====================================================================
A hardware errata causes the NTB to hang when heavy bi-directional
traffic in addition to the usage of BAR0/1 (where the registers reside,
including the doorbell registers to trigger interrupts).
This workaround is only available on Haswell and Broadwell platform.
The workaround is to enable split BAR in the BIOS to allow the 64bit
BAR4 to be split into two 32bit BAR4 and BAR5. The BAR4 shall be pointed
to LAPIC region of the remote host. We will bypass the db mechanism and
directly trigger the MSIX interrupts. The offsets and vectors are
exchanged during transport scratch pad negotiation. The scratch pads are
now overloaded in order to allow the exchange of the information. This
gets around using the doorbell and prevents the lockup with additional
pcode changes in BIOS.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
=====================================================================
Notable changes in the FreeBSD version of this patch:
* The MSIX BAR is configurable, like hw.ntb.b2b_mw_idx (msix_mw_idx).
The Linux version of the patch only uses BAR4.
* MSIX negotiation aborts if the link goes down.
Obtained from: Linux (Dual BSD/GPL driver)
r300100 (by cem):
ntb_hw(4): Add sysctls for administrative/test link config, state
dev.ntb_hw.0.admin_up=0/1: Like ifconfig UP/DOWN.
dev.ntb_hw.0.active=0/1: Like ifconfig 'status'
r300531 (by cem):
ntb_hw(4): Only record the first three MSIX vectors
Don't overrun the msix_data array by reading the (unused) link state
interrupt information.
ngie [Fri, 10 Jun 2016 18:21:05 +0000 (18:21 +0000)]
MFC r299513,r299515:
r299513 (by cem):
rtadvd(8): Don't use-after-free
This whole block of code as committed fully formed in r224144. I'm not really
sure what the intent was, but it seems plausible that !persist ifis could need
other member cleanup. Don't free the object until after we've finished
cleaning its members.
ngie [Fri, 10 Jun 2016 18:10:32 +0000 (18:10 +0000)]
MFC r299510:
r299510 (by cem):
libmp: Fix trivial buffer overrun
fgetln yields a non-NUL-terminated buffer and its length. This routine
attempted to NUL-terminate it, but did not allocate space for the NUL. So,
allocate space for the NUL.
ngie [Fri, 10 Jun 2016 18:02:51 +0000 (18:02 +0000)]
MFC r299507:
r299507 (by cem):
rtadvd(8): Fix a typo in full msg receive logic
Check against the size of the struct, not the pointer. Previously, a message
with a cm_len between 9 and 23 (inclusive) could cause int msglen to underflow
and read(2) to be invoked with msglen size (implicitly cast to signed),
overrunning the caller-provided buffer.
All users of cm_recv() supply a stack buffer.
On the other hand, the rtadvd control socket appears to only be writable by the
owner, who is probably root.
While here, correct some types to be size_t or ssize_t.
CID: 1008477
Security: unix socket remotes may overflow stack in rtadvd
ngie [Fri, 10 Jun 2016 17:57:50 +0000 (17:57 +0000)]
MFC r300836:
Quell false positives in svc_vc_create and svc_vc_create_conn with cd and xprt
Both cd and xprt will be non-NULL after their respective malloc(9) wrappers are
called (mem_alloc and svc_xprt_alloc, which calls mem_alloc) as mem_alloc
always gets called with M_WAITOK|M_ZERO today. Thus, testing for them being
non-NULL is incorrect -- it misleads Coverity and it misleads the reader.
Remove some unnecessary NULL initializations as a follow up to help solidify
the fact that these pointers will be initialized properly in sys/rpc/.. with
the interfaces the way they are currently.
ngie [Fri, 10 Jun 2016 15:47:20 +0000 (15:47 +0000)]
MFC r299503,r299504:
r299503 (by cem):
snd_hda(4): Don't pass bogus sizeof()s to unused sysctl arg2 parameter
None of the sysctl handlers in hdaa use the arg2 parameter, so just pass zero
instead. Additionally, the sizes being passed in were suspect (size of the
pointer rather than the value).
ngie [Fri, 10 Jun 2016 14:48:10 +0000 (14:48 +0000)]
MFC r299495:
r299495 (by cem):
libkrb5: Fix potential double-free
If krb5_make_principal fails, tmp_creds.server may remain a pointer to freed
memory and then be double-freed. After freeing it the first time, initialize
it to NULL, which causes subsequent krb5_free_principal calls to do the right
thing.
ngie [Fri, 10 Jun 2016 14:40:41 +0000 (14:40 +0000)]
MFC r299491:
r299491 (by cem):
route6d(8): Fix potential double-free
In the case that the subsequent sysctl(3) call failed, 'buf' could be free(3)ed
repeatedly. It isn't clear to me that that case is possible, but be clear and
do the right thing in case it is.
ngie [Fri, 10 Jun 2016 14:08:41 +0000 (14:08 +0000)]
MFC r299460:
r299460 (by cem):
fsck_ffs: Don't overrun mount device buffer
Maybe this case is impossible. Either way, when attempting to "/dev/"-prefix a
non-global device name, check that we do not overrun the f_mntfromname buffer.
In this case, truncating (with strlcpy or similar) would not be useful, since
the f_mntfromname result of getmntpt() is passed directly to open(2) later.
* Articles, Papers and Presentations
<http://caia.swin.edu.au/freebsd/aqm/papers.html>
* Patches and Tools <http://caia.swin.edu.au/freebsd/aqm/downloads.html>
Overview
Recent years have seen a resurgence of interest in better managing
the depth of bottleneck queues in routers, switches and other places
that get congested. Solutions include transport protocol enhancements
at the end-hosts (such as delay-based or hybrid congestion control
schemes) and active queue management (AQM) schemes applied within
bottleneck queues.
The notion of AQM has been around since at least the late 1990s
(e.g. RFC 2309). In recent years the proliferation of oversized
buffers in all sorts of network devices (aka bufferbloat) has
stimulated keen community interest in four new AQM schemes -- CoDel,
FQ-CoDel, PIE and FQ-PIE.
The IETF AQM working group is looking to document these schemes,
and independent implementations are a corner-stone of the IETF's
process for confirming the clarity of publicly available protocol
descriptions. While significant development work on all three schemes
has occured in the Linux kernel, there is very little in FreeBSD.
Project Goals
This project began in late 2015, and aims to design and implement
functionally-correct versions of CoDel, FQ-CoDel, PIE and FQ_PIE
in FreeBSD (with code BSD-licensed as much as practical). We have
chosen to do this as extensions to FreeBSD's ipfw/dummynet firewall
and traffic shaper. Implementation of these AQM schemes in FreeBSD
will:
* Demonstrate whether the publicly available documentation is
sufficient to enable independent, functionally equivalent implementations
* Provide a broader suite of AQM options for sections the networking
community that rely on FreeBSD platforms
Program Members:
* Rasool Al Saadi (developer)
* Grenville Armitage (project lead)
Acknowledgements:
This project has been made possible in part by a gift from the
Comcast Innovation Fund.
[Remove some code that was added to the mq_append() inline function in
HEAD by r258457, which was not merged to stable/10. The AQM patch
moved mq_append() from ip_dn_io.c to the new file ip_dn_private.h, so
we need to remove that copy of the r258457 changes.]
------------------------------------------------------------------------
r300781 | truckman | 2016-05-26 14:44:52 -0700 (Thu, 26 May 2016) | 7 lines
Modify BOUND_VAR() macro to wrap all of its arguments in () and tweak
its expression to work on powerpc and sparc64 (gcc compatibility).
Cast some expressions that multiply a long long constant by a
floating point constant to int64_t. This avoids the runtime
conversion of the the other operand in a set of comparisons from
int64_t to floating point and doing the comparisions in floating
point.
Replace constant expressions that contain multiplications by
fractional floating point values with integer divides. This will
eliminate any chance that the compiler will generate code to evaluate
the expression using floating point at runtime.
Suggested by: bde
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 8 days (with r300779 and r300949)