- Make sure that STIMER0 is disabled before writting to it, since
writing to an enabled STIMER will result in undefined behaviour.
- It is unnecessary to reconfigure STIMER0 upon each et_start().
- Make sure that MSR_HV_REF_TIME_COUNT will not return 0, since
writing 0 to STIMER_COUNT will disable the target STIMER.
sephe [Thu, 23 Jun 2016 08:38:01 +0000 (08:38 +0000)]
MFC 300981
tcp: Don't prematurely drop receiving-only connections
If the connection was persistent and receiving-only, several (12)
sporadic device insufficient buffers would cause the connection be
dropped prematurely:
Upon ENOBUFS in tcp_output() for an ACK, retransmission timer is
started. No one will stop this retransmission timer for receiving-
only connection, so the retransmission timer promises to expire and
t_rxtshift is promised to be increased. And t_rxtshift will not be
reset to 0, since no RTT measurement will be done for receiving-only
connection. If this receiving-only connection lived long enough
(e.g. >350sec, given the RTO starts from 200ms), and it suffered 12
sporadic device insufficient buffers, i.e. t_rxtshift >= 12, this
receiving-only connection would be dropped prematurely by the
retransmission timer.
We now assert that for data segments, SYNs or FINs either rexmit or
persist timer was wired upon ENOBUFS. And don't set rexmit timer
for other cases, i.e. ENOBUFS upon ACKs.
Discussed with: lstewart, hiren, jtl, Mike Karels
MFC after: 3 weeks
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5872
For channel0, it will never be processed on event handling path,
so there is no need to install it. After skipping in the channel0
installation, we could discard the channel0 check on event
handling hot code path.
r300892:
Rename function to be less generic.
r300893:
Don't truncate existing error when writing the log.
r301130:
Enable filemon on all architectures.
r301404:
Support all architectures by just using sysent.
r301414:
Fix build after r301404.
r301460:
Cleanup COMPAT_FREEBSD32 support.
r297156:
Track filemon usage via a proc.p_filemon pointer rather than its own lists.
r297157:
Stop tracking stat(2).
r297158:
Consolidate open(2) and openat(2) code.
r297159:
Use curthread for vn_fullpath.
r297161:
Attempt to use the namecache for openat(2) path resolution.
r297172:
Consolidate common link(2) logic.
r297200:
Follow-up r297156: Close the log in filemon_dtr rather than in the last
reference.
r297201:
Return any log write failure encountered when closing the filemon fd.
r297202:
Remove unused done argument to copyinstr(9).
r297203:
Handle copyin failures.
r297256:
Remove unneeded return left from refactoring.
Support for compression has been available from July 2007 but it
was never imported due to concerns with patents once held by
STAC/HiFn. The issues have clearly been resolved so bring it
in now.
Special thanks to Brett Glass for preserving the code and
pointing documentation for the expiration case.
sephe [Tue, 21 Jun 2016 05:33:26 +0000 (05:33 +0000)]
MFC 298449,298568
298449
hyperv/et: Make Hyper-V event timer a device.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5957
298568
hyperv/et: Strip extra white space in function name
sephe [Tue, 21 Jun 2016 04:51:55 +0000 (04:51 +0000)]
MFC 297931,298022
297931
Expose doreti as a global symbol on amd64 and i386.
doreti provides the common code path for returning from interrupt
andlers on x86. Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: kib
298022
hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910
truckman [Mon, 20 Jun 2016 19:00:47 +0000 (19:00 +0000)]
MFC r300240
Change net.inet.tcp.ecn.enable sysctl mib from a binary off/on
control to a three way setting.
0 - Totally disable ECN. (no change)
1 - Enable ECN if incoming connections request it. Outgoing
connections will request ECN. (no change from present != 0 setting)
2 - Enable ECN if incoming connections request it. Outgoing
conections will not request ECN.
Change the default value of net.inet.tcp.ecn.enable from 0 to 2.
Linux version 2.4.20 and newer, Solaris, and Mac OS X 10.5 and newer have
similar capabilities. The actual values above match Linux, and the default
matches the current Linux default.
cy [Sun, 19 Jun 2016 00:39:23 +0000 (00:39 +0000)]
MFC r300259:
Enable the two ip_frag tuneables. The code is there but the two
ip_frag tuneables aren't registered in the ipf_tuners linked list.
This commmit enables the two existing ip_frag tuneables by registering
them.
ed [Sat, 18 Jun 2016 12:44:14 +0000 (12:44 +0000)]
MFC r301406:
Don't test for INKERNEL to check whether we're in kernel space.
It turns out that <machine/param.h> actually defines a macro under this
name, even when we're not in kernelspace. This causes us to suppress
some macro definitions that are used by userspace apps.
emaste [Sat, 18 Jun 2016 01:23:38 +0000 (01:23 +0000)]
MFC r300231: elf_common.h: add section header flag and dynamic types
SHF_COMPRESSED section contains compressed data
DT_TLSDESC_PLT Location of PLT entry for TLS descriptor resolver calls
DT_TLSDESC_GOT Location of GOT entry used by resolver PLT entry
mm [Fri, 17 Jun 2016 22:40:10 +0000 (22:40 +0000)]
MFC r299529,r299540,r299576,r299896:
r299529,r299540:
Update libarchive to 3.2.0
New features:
- new bsdcat command-line utility
- LZ4 compression (in src only via external utility from ports)
- Warc format support
- 'Raw' format writer
- Zip: Support archives >4GB, entries >4GB
- Zip: Support encrypting and decrypting entries
- Zip: Support experimental streaming extension
- Identify encrypted entries in several formats
- New --clear-nochange-flags option to bsdtar tries to remove noschg and
similar flags before deleting files
- New --ignore-zeros option to bsdtar to handle concatenated tar archives
- Use multi-threaded LZMA decompression if liblzma supports it
- Expose version info for libraries used by libarchive
arybchik [Fri, 17 Jun 2016 09:04:06 +0000 (09:04 +0000)]
MFC r301427
sfxge(4): allow firmware to auto-configure event queues on Medford
On Medford, licenses are required to enable RX and event cut through and to
disable RX batching. To avoid the need for the driver to make decisions based on
the licensing state, the MC_CMD_INIT_EVQ has been extended to allow us to leave
the decision to the firmware. If the adapter is licensed for low-latency use,
the firmware will choose the optimal settings for latency, otherwise it will use
the best settings for throughput.
For Huntington we still need to choose the settings ourselves.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6717
arybchik [Fri, 17 Jun 2016 09:02:51 +0000 (09:02 +0000)]
MFC r301309
sfxge(4): always be ready to receive batched events
When the low-latency firmware variant is running, it is reported as not
being capable of batching RX events, but it can still do so if the
FORCE_EV_MERGING flag is set on an RXQ. Therefore we need to handle
batched RX events even if the capability isn't set.
If this bug is fixed in the firmware such that the capability is set
even when running the low-latency firmware variant, it will almost
always be reported so I don't think we lose much by removing the check.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6705
arybchik [Fri, 17 Jun 2016 09:01:11 +0000 (09:01 +0000)]
MFC r301308
sfxge(4): add helper to compute timer quantum
This also adjusts the timer values used to match the Linux net
driver implementation:
a) non-zero time intervals should result in at least one quantum
b) timer load/reload values are only zero biased for Falcon/Siena
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6704
arybchik [Fri, 17 Jun 2016 08:59:08 +0000 (08:59 +0000)]
MFC r301237
sfxge(4): support EVQ timer workaround via MCDI
Submitted by: Andy Moreton <amoreton at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/6675
arybchik [Fri, 17 Jun 2016 08:56:01 +0000 (08:56 +0000)]
MFC r301122
sfxge(4): set moderation in efx_ev_qcreate
This simplifies setting an initial interrupt moderation value, and
avoids most calls to evx_ev_qmoderate from contexts where MCDI is
not allowed (MCDI is need for an EVQ timer workaround in a later patch).
Submitted by: Andy Moreton <amoreton at solarflare.com>
Reviewed by: gnn
Sponsored by: Solarflare Communications, Inc.
Differential Revision: https://reviews.freebsd.org/D6673
sephe [Thu, 16 Jun 2016 03:04:16 +0000 (03:04 +0000)]
MFC 297142,297143,297176,297177,297178,297221
297142
hyperv: Factor out snprinf_hv_guid()
Submitted by: Ju Sun <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5651
Submitted by: Jun Su <junsu microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5669
297176
hyperv/evttimer: Use an independent message slot so that it can work
Using the same message slot as the other types of the messages has
the side effect that the event timer message could be deferred to
the swi threads to run (lacking of trapframe and the original code
didn't even handle that, so the event timer was actually broken).
As of this commit we use an independent message slot for event timer,
so that we could handle all of event timer messages in the interrupt
handler directly. Note, the message slot for event timer is still
bind to the same interrupt vector as the other types of messages.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5696
297177
hyperv/vmbus: Use taskqueue_fast for non-performance critical messages
This gets rid of the per-cpu SWIs.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
297178
hyperv/vmbus: Remove NULL check for taskqueue_create_fast(M_WAITOK)
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5215
297221
hyperv/vmbus: Create per-cpu fast taskqueue for msg handling
Using one taskqueue does not work, since the EOM MSR must be written
on the msg's owner CPU.
Noticed by: Jun Su <junsu microsoft com>
Discussed with: Jun Su <junsu microsoft com>, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
sephe [Thu, 16 Jun 2016 01:57:16 +0000 (01:57 +0000)]
MFC 297802,297803(297481),297804
297802
hyperv: Identify Hyper-V features and recommends properly
Features bits will be used to detect devices, e.g. timers, which
do not have corresponding event channels.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
Rearranged by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
279803(297481)
hyperv: Register Hyper-V timer early enough for TSC freq calibration
The i8254 simulation in Hyper-V is kinda broken and is not available
in Generation 2 Hyper-V VMs, so Hyper-V timer must be registered early
enough so that it can be used to do the TSC freq calibration.
This fixes the notorious warning like this:
calcru: runtime went backwards from 50 usec to 25 usec for pid 0 (kernel)
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: kib, sephe
Tested by: kib, sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5778
=================
hyperv: Resurrect r297481
This time we make sure that the TIME_REF_COUNT MSR exists.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
vangyzen [Wed, 15 Jun 2016 14:11:49 +0000 (14:11 +0000)]
MFC r301532
newsyslog: Eliminate unnecessary sleep(10) when -R and -s are specified
After going through the signal work list, during which do_sigwork()
is called and essentially does nothing because -s and -R were
specified on the command line, newsyslog will sleep for 10 seconds
as the (verbose) code says: "Pause 10 seconds to allow daemon(s)
to close log file(s)".
However, the man page verbiage for -R (and -s) seems quite clear
that this sleep() is unnecessary because the daemon was expected
to have already closed the log file before calling newsyslog.
PR: 210020
Submitted by: David A. Bright <david_a_bright@dell.com>
Sponsored by: Dell Inc.
sephe [Wed, 15 Jun 2016 09:52:01 +0000 (09:52 +0000)]
MFC 297635
hyperv/vmbus: Use default mtx for channel message queue
First of all sema_post() can't be called w/ spinlock, and the channel
message queue processing is not on hot code path, i.e. spinlock is not
necessary.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5812
sephe [Wed, 15 Jun 2016 09:39:41 +0000 (09:39 +0000)]
MFC 297219
hyperv/vmbus: use a better retry method in hv_vmbus_post_message()
Most often, hv_vmbus_post_message() doesn't fail. However, it fails
intermittently when GPADLs of large shared memory is to be established
with the host, e.g. on the hn(4) attach path: a GPADL of 15MB sendbuf
is created, for which lots of messages will be flooded to the host.
The host side tries to throttle the message rate by returning
HV_STATUS_INSUFFICIENT_BUFFERS.
Before this commit, we do several retries for failed messages, but the
delay between each retry is pretty/too low, which will cause sporadic
message posting failure. We now use large delay (>=1ms) between each
retry to fix the message posting failure.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5715
truckman [Wed, 15 Jun 2016 06:40:30 +0000 (06:40 +0000)]
MFC r301592
Don't leak addrinfo if ai->ai_addrlen <= minsiz test fails.
If the ai->ai_addrlen <= minsiz test fails, then freeaddrinfo()
does not get called to free the memory just allocated by getaddrinfo().
Fix by moving ai->ai_addrlen <= minsiz to a separate nested if
block, and keep freeaddrinfo() in the outer block so that freeaddrinfo()
will be called whenever getaddrinfo() succeeds.
truckman [Wed, 15 Jun 2016 06:33:40 +0000 (06:33 +0000)]
MFC r301582
Explicitly NUL terminate the buffer filled by fread().
The fix in r300649 was not sufficient to convince Coverity that the
buffer was NUL terminated, even with the buffer pre-zeroed. Swap
the size and nmemb arguments to fread() so that a valid lenght is
returned, which we can use to terminate the string in the buffer
at the correct location. This should also quiet the complaint about
the return value of fread() not being checked.
sephe [Wed, 15 Jun 2016 06:32:00 +0000 (06:32 +0000)]
MFC 296293,296296,296297,296305
296293
hyperv/hn: Pass channel to hv_nv_on_receive_completion()
While I'm here, staticize this function.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Modified by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
296296
hyperv/hn: Make read buffer per-channel
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reorganized by: sephe
MFC after: 1 week
Sponsored by: Microsoft OSTC
296297
hyperv/hn: Fix typo in comment
MFC after: 1 week
Sponsored by: Microsoft OSTC
296305
hyperv/hn: Make # of rings configurable
And since the host may not being able to allocate the # of rings
requested by us, save the # of rings allocated by the host in the
ring_inuse counters; use ring_inuse counters for run time operation.
truckman [Wed, 15 Jun 2016 06:27:43 +0000 (06:27 +0000)]
MFC r299484, r301574
r299484 | cem | 2016-05-11 15:04:28 -0700 (Wed, 11 May 2016) | 13 lines
random(6): Fix double-close
In the case where a file lacks a trailing newline, there is some "evil" code to
reverse goto the tokenizing code ("make_token") for the final token in the
file. In this case, 'fd' is closed more than once. Use a negative sentinel
value to guard close(2), preventing the double close.
Ideally, this code would be restructured to avoid this ugly construction.
Fix a (false positive?) Argument cannot be negative coverity defect.
Rather than guarding close(fd) with an fd >= 0 test and setting fd
to -1 when it is closed to avoid a potential double-close, just
move the close() call after the conditional "goto make_token". This
moves the close() call totally outside the loop to avoid the
possibility of calling it twice. This should also prevent a Coverity
warning about checking fd for validity after it was previously passed
to read().
sephe [Wed, 15 Jun 2016 06:08:05 +0000 (06:08 +0000)]
MFC 296291,301109
296291
hyperv/chan: Factor out the vcpu setting
And use it for cpu0 assignment; it does not sound right to assume that
cpu0 maps to vcpu0. And this factored out function will be exposed to
drivers, if driver specific CPU binding is needed, e.g. hn(4).
Move default cpu select after saving channel offer message. This makes
sure that all useful information of the channel has been setup.
sephe [Wed, 15 Jun 2016 05:31:35 +0000 (05:31 +0000)]
MFC 296180,297634
296180
hyperv: Use proper fence function to keep store-load order for msgs
sfence only makes sure about the store-store order, which is not
sufficient here. Use atomic_thread_fence_seq_cst() as suggested
jhb and kib (a locked op in the nutshell, which should have the
Reviewed by: jhb, kib, Jun Su <junsu microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5436
297634
hyperv: Use mb() instead of atomic_thread_fence_seq_cst()
Since atomic_thread_fence_seq_cst() will become compiler fence on UP kernel.
Reviewed by: kib, Dexuan Cui <decui microsoft com>
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5852
sephe [Wed, 15 Jun 2016 05:16:37 +0000 (05:16 +0000)]
MFC 296178
buf_ring/drbr: Add buf_ring_peek_clear_sc and use it in drbr_peek
Unlike buf_ring_peek, it only supports single consumer mode, and it
clears the cons_head if DEBUG_BUFRING/INVARIANTS is defined.
The normal use case of drbr_peek for network drivers is:
m = drbr_peek(br);
err = hw_spec_encap(&m); /* could m_defrag/m_collapse */
(*)
if (err) {
if (m == NULL)
drbr_advance(br);
else
drbr_putback(br, m);
/* break the loop */
}
drbr_advance(br);
The race is:
If hw_spec_encap() m_defrag or m_collapse the mbuf, i.e. the old mbuf
was freed, or like the Hyper-V's network driver, that transmission-
done does not even require the TX lock; then on the other CPU at the
(*) time, the freed mbuf could be recycled and being drbr_enqueue even
before the current CPU had the chance to call drbr_{advance,putback}.
This triggers a panic in drbr_enqueue duplicated element check, if
DEBUG_BUFRING/INVARIANTS is defined.
Use buf_ring_peek_clear_sc() in drbr_peek() to fix the above race.
This change is a NO-OP, if neither DEBUG_BUFRING nor INVARIANTS are
defined.
jamie [Wed, 15 Jun 2016 01:56:20 +0000 (01:56 +0000)]
MFC r301745:
Make sure the OSD methods for jail set and remove can't run concurrently,
by holding allprison_lock exclusively (even if only for a moment before
downgrading) on all paths that call PR_METHOD_REMOVE. Since they may run
on a downgraded lock, it's still possible for them to run concurrently
with PR_METHOD_GET, which will need to use the prison lock.