Creating some files together to do the build system changes in one go.
Submitted by: Mark Spender <mspender at solarflare.com>
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D4859
Sepherosa Ziehau [Tue, 12 Jan 2016 01:50:56 +0000 (01:50 +0000)]
hyperv/hn: Avoid mbuf cluster allocation, if the packet is small.
This one mainly avoids mbuf cluster allocation for TCP ACKs during
TCP sending tests. And it gives me ~200Mbps improvement (4.7Gbps
-> 4.9Gbps), when running iperf3 TCP sending test w/ 16 connections.
While I'm here, nuke the unnecessary zeroing out pkthdr.csum_flags.
Reviewed by: adrain
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4853
Sepherosa Ziehau [Tue, 12 Jan 2016 01:41:34 +0000 (01:41 +0000)]
hyperv/hn: Implement SIOC[SG]IFMEDIA support
Many applications and kernel modules (e.g. bridge) rely on the ifmedia
status report; give them what they want.
Submitted by: Dexuan Cui <decui microsoft com>
Reviewed by: Jun Su <junsu microsoftc com>, me, adrian
Modified by: me (minor)
Original differential: https://reviews.freebsd.org/D4611
Differential Revision: https://reviews.freebsd.org/D4852
Approved by: adrian (mentor)
Sponsored by: Microsoft OSTC
Sepherosa Ziehau [Tue, 12 Jan 2016 01:30:51 +0000 (01:30 +0000)]
hyperv/hn: Implement LRO
- Implement the LRO using tcp_lro APIs, and LRO is enabled by default.
- Add several stats sysctl nodes.
- Check IP/TCP length before sending the packet to tcp_lro_rx(), if host
does not provide RX csum information (*); and add an option through
sysctl to always trust host TCP segment csum checks (default is off).
- Add sysctl to control the LRO entry depth; it is disabled by default.
It is used to avoid holding too much TCP segments in driver. Limiting
the LRO entry depth helps a lot in a one/two streams RX test.
This one 3x the RX performance on my local test (3Gbps -> 10Gbps), and
~2x the RX performance over a directly connected 40Ge network (5Gbps ->
9Gbps).
(*) It seems the host stops supplying csum information, once the network
load is high. This still needs investigation...
Reviewed by: Hongjiang Zhang <honzhan microsoft com>,
Dexuan Cui <decui microsoft com>,
Jun Su <junsu microsoft com>,
delphij
Tested by: me (local),
Hongjiang Zhang <honzhan microsoft com>
(directly connected 40Ge)
Approved by: delphij (mentor), adrian (mentor, no objection)
With feedback from: delphij, Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4824
Alan Somers [Mon, 11 Jan 2016 22:15:46 +0000 (22:15 +0000)]
Fix importing l2arc device by guid
After r292066, vdev_geom verifies both the vdev and pool guids of device
labels during open. However, spare and l2arc devices don't have pool guids,
so opening them by guid will fail (opening by path, when the pathname is
known, still succeeds). This change allows a vdev to be opened by guid if
the label contains no pool_guid, which is the case for inactive spares and
l2arc devices.
Enji Cooper [Mon, 11 Jan 2016 22:01:33 +0000 (22:01 +0000)]
Similar to r293704, fix theoretical leak of netconfig(3) resources in
__rpcbind_is_up(..) if getnetconfig(3) is partly successful in allocating
resources, but not completely successful by moving the endnetconfig(3) call
up before we return from the function if nconf == NULL.
Enji Cooper [Mon, 11 Jan 2016 21:56:53 +0000 (21:56 +0000)]
Fix theoretical leak of netconfig(3) resources in svcunix_create(..)
In the event that the getconfig(3) call in svcunix_create is partly successful,
some of the netconfig(3) resources allocated might be leaked if the call returns
NULL as endnetconfig(3) wasn't called explicitly in that case. Ensure that the
resources are fully cleaned up by going to the `done` label, which will call
endnetconfig(3) for us.
Colin Percival [Mon, 11 Jan 2016 21:02:30 +0000 (21:02 +0000)]
Add two more assertions to catch busdma problems. Each segment provided
by busdma to the blkfront driver must be an integer number of sectors,
and must be aligned in memory on a "sector" boundary.
Having these assertions yesterday would have made finding the bug fixed
in r293698 somewhat easier.
Colin Percival [Mon, 11 Jan 2016 20:38:39 +0000 (20:38 +0000)]
Fix a bug introduced in r291716:
"The problem with the approach taken both in _bus_dmamap_load_pages and
bus_dmamap_load_ma_triv is that they split the request buffer into
arbitrary chunks based on page boundaries, creating segments that no
longer have a size that's a multiple of the sector size. This breaks
drivers like blkfront (and probably other stuff)." [1]
This was most easily triggered by running `fsck /` on a system running
in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably
many other userland tools which access the disk directly.
Patch by: royger [1]
"Thinks this should be fine" by: ken
Change the type of newsize argument in the smbfs_smb_setfsize() function
from int to int64.
MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses signed 64-bit integer
to specify offset, but since smbfs_smb_setfsize() has used plain int,
a value was truncated in case when offset was larger than 2G.
https://msdn.microsoft.com/en-us/library/ff469975.aspx
In particular, now `truncate -s 10G` will work correctly on the mounted
SMB share.
Reported and tested by: Eugene Grosbein <eugen at grosbein dot net>
MFC after: 1 week
Alan Somers [Mon, 11 Jan 2016 17:57:26 +0000 (17:57 +0000)]
Record physical path information in ZFS Vdevs
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
If available, record the physical path of a vdev in ZFS meta-data.
Do this both when opening the vdev, and when receiving an attribute
change notification from GEOM.
Make vdev_geom_close() synchronous instead of deferring its work to
a GEOM event handler. There is no benefit to deferring the work and
this prevents a future open call from referencing a consumer that is
scheduled for destruction. The close followed by an immediate open
will occur during a vdev reprobe triggered by any type of I/O error.
Consolidate vdev_geom_close() and vdev_geom_detach() into
vdev_geom_close() and vdev_geom_close_locked(). This also moves the
cross linking operations between vdev and GEOM consumer into a
single place (linking in vdev_geom_attach() and unlinking in
vdev_geom_close_locked()).
Allan Jude [Mon, 11 Jan 2016 15:35:29 +0000 (15:35 +0000)]
DIOCGSECTORSIZE expects to write to a u_int, but struct zfs_probe_args
member secsz was a uint16_t
sys/boot/zfs/zfs.c has a probe args structure member, secsz, that is a
uint16_t for media sector size; it is used as an argument for ioctl()
at line 484. however, this ioctl writes 32 bits of data (u_int *) and
therefore this ioctl will overwrite and corrupt 16 bits of memory.
other use cases seem to use correct u_int type for secsz.
PR: 204358
Submitted by: Toomas Soome <tsoome at me.com>
Reviewed by: asomers, delphij, smh
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D4811
Steven Hartland [Mon, 11 Jan 2016 10:24:30 +0000 (10:24 +0000)]
Close iSCSI sessions on shutdown
Ensure that all iSCSI sessions are correctly terminated during shutdown.
* Enhances the changes done by r286226 (D3052).
* Add shutdown post sync event to run after filesystem shutdown
(SHUTDOWN_PRI_FIRST) but before CAM shutdown (SHUTDOWN_PRI_DEFAULT).
* Changes iscsi_maintenance_thread to processes terminate in preference to
reconnect.
Bring RADIX_MPATH support to new routing KPI to ease migration.
Move actual rte selection process from rtalloc_mpath_fib()
to the rt_path_selectrte() function. Add public
rt_mpath_select() to use in fibX_lookup_ functions.
Sepherosa Ziehau [Mon, 11 Jan 2016 03:30:16 +0000 (03:30 +0000)]
hyperv/kvp_daemon: Make poll(2) block indefinitely
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: Dexuan Cui <decui microsoft com>, me, adrain
Approved by: adrian
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D4762
Andrew Turner [Sun, 10 Jan 2016 23:41:31 +0000 (23:41 +0000)]
Use -mlong-calls to build crt1.o and gcrt1.o. This tells the compiler to
generate code to branch based on an address in a register. This allows us
to have binaries larger than the 32MiB limit of a branch instruction.
The main use of this is with clang. Clang 3.8.0 has been shown to be larger
than the above limit.
Marius Strobl [Sun, 10 Jan 2016 18:11:23 +0000 (18:11 +0000)]
- Add support for Advantech PCI-1602 Rev. B1 and PCI-1603 cards. [1]
- Add a description of Advantech PCI-1602 Rev. A boards. [1]
- Properly set up REG_ACR also for PCI-1602 Rev. A based on what the
Advantech-supplied Linux driver does.
- Additionally use the macros of <dev/ic/ns16550.h> to replace existing
magic values and get rid of trivial comments.
- Fix the style of some comments.
PR: 205359 [1]
Submitted by: Jan Mikkelsen (original patch) [1]
Nathan Whitehorn [Sun, 10 Jan 2016 18:00:01 +0000 (18:00 +0000)]
Remove dead code and dead comments, most notably the implemenation of the
now-obsolete setfault(). No NetBSD code exists in the AIM locore files, so
update the copyrights there.
Nathan Whitehorn [Sun, 10 Jan 2016 16:42:14 +0000 (16:42 +0000)]
Use setjmp() instead of the identical-except-for-having-a-wrong-prototype
setfault() when testing for faults. This should also help the compiler
do the right thing with this complicated-to-optimize function.
Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
socket-less (ND code). The main reason for that change is inability to
specify non-default FIB for callers w/o socket since (internally) inpcb
is used to determine fib.
As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
pass IPv6 address w/ explicitly specified scope as separate argument.
Bjoern A. Zeeb [Sun, 10 Jan 2016 08:14:25 +0000 (08:14 +0000)]
Initialize error after r293626 in case neither INET nor INET6 is
compiled into the kernel. Ideally lots more code would just not
be called (or compiled in) in that case but that requires a lot
more surgery. For now try to make IP-less kernels compile again.
Enji Cooper [Sat, 9 Jan 2016 23:45:25 +0000 (23:45 +0000)]
- Delete non-TAP testcases
- Add a conf.sh file for executing common functions with geom_gate
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
- Add/increase sleeps to try and improve synchronization
- Add debug output for when checksums fail
Make graphical consoles work under PowerKVM. Without using hypercalls, it is
not possible to write the framebuffer before pmap is up. Solve this by
deferring initialization until that happens, like on PS3.
Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.
In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
to check if this particular mbuf needs to be dropped (and what
error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
regardless of RTF_HOST flag existence. This is due to upcoming routing
changes where RTF_HOST value won't be available as lookup result.
All other functions require RTF_GATEWAY flag to check if they need
to return EHOSTUNREACH instead of EHOSTDOWN error.
There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
error value. In fact, the only place where this result is propagated
is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).
Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
rte flags to ro_flags. Call this function in ip_output() after looking up/
verifying rte.
Kevin Lo [Sat, 9 Jan 2016 14:53:23 +0000 (14:53 +0000)]
- Add the definition of CHARCLASS_NAME_MAX, as per POSIX.1-2001.
- Avoid namespace pollution and move definitions of _POSIX2_CHARCLASS_NAME_MAX
and _POSIX2_COLL_WEIGHTS_MAX into the .2001 section.
With input from bde.
This check was added in initial? netinet6/ import
back in 1999 (r53541).
It effectively became unnecessary after 'address/prefix clean-ups'
KAME commit 90ff8792e676132096a440dd787f99a5a5860ee8 (github) in 2001
(merged to FreeBSD in r78064) where prefix check was added to
nd6_prefix_onlink(). Similar IPv4 check (in_addroute() was added
in r137628).
Additionally, the right plance for this (or similar) check is the prefix
addition code (nd6_prefix_onlink(), nd6_prefix_onlink_rtrequest(),
in_addprefix() or rtinit()), but not the generic radix insert routine.
Such handler should pass different set of variables, instead
of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
current code.
Will re-add it after finishing most major routing-related changes.
Mark Johnston [Sat, 9 Jan 2016 01:56:46 +0000 (01:56 +0000)]
Prevent cv_waiters wraparound.
r282971 attempted to fix this problem by decrementing cv_waiters after
waking up from sleeping on a condition variable, but this can result in
a use-after-free if the CV is freed before all woken threads have had a
chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and
have cv_signal() explicitly check for sleeping threads once cv_waiters has
reached this bound.
Glen Barber [Sat, 9 Jan 2016 00:45:38 +0000 (00:45 +0000)]
Set FORCE_PKG_REGISTER=1 when installing packages to avoid failures
when re-using build chroot(8) environments.
This is based on the patch in the PR referenced below, but instead
of using 'reinstall' in two locations (one of which already uses
FORCE_PKG_REGISTER=1), changes the non-embedded behavior.
PR: 205998
Submitted by: ngie
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Ed Maste [Sat, 9 Jan 2016 00:42:07 +0000 (00:42 +0000)]
Support use of LLVM's libunwind for exception unwinding
It is built in libgcc_s.so and libgcc_eh.a to simplify transition.
It is enabled by default on arm64 (where we previously had no other
unwinder) and may be enabled for testing on other platforms by setting
WITH_LLVM_LIBUNWIND in src.conf(5).
Also add compiler-rt's __gcc_personality_v0 implementation for use with
the LLVM unwinder.
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D4787
Enji Cooper [Fri, 8 Jan 2016 21:47:41 +0000 (21:47 +0000)]
- Move functions that might be used in class-specific cleanup functions
(geom_test_cleanup, etc) down so the testcases don't emit noise when
bailing
- Conform to the TAP protocol better when dealing with classes that can't
be loaded and with temporary files that can't be allocated for tracking
md(4) devices.
Enji Cooper [Fri, 8 Jan 2016 21:38:26 +0000 (21:38 +0000)]
- Make test-1.sh into a TAP testable testcase
- Delete test-2.sh as it was an incomplete testcase, and the contents were
basically a subset of test-1.sh
- Add a conf.sh file for executing common functions with geom_uzip
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Enji Cooper [Fri, 8 Jan 2016 21:28:09 +0000 (21:28 +0000)]
- Add a geom_stripe specific cleanup function and trap on that function at
exit so things are cleaned up properly
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Enji Cooper [Fri, 8 Jan 2016 21:25:27 +0000 (21:25 +0000)]
- Add a geom_shsec specific cleanup function and trap on that function at
exit so things are cleaned up properly
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Gleb Smirnoff [Fri, 8 Jan 2016 20:34:57 +0000 (20:34 +0000)]
New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.
The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.
Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.
The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.
Celebrates: Netflix global launch!
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Relnotes: yes
Enji Cooper [Fri, 8 Jan 2016 19:47:49 +0000 (19:47 +0000)]
- Add a geom_raid3 specific cleanup function and trap on that function at
exit so things are cleaned up properly
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Enji Cooper [Fri, 8 Jan 2016 19:43:18 +0000 (19:43 +0000)]
- Add a conf.sh file for executing common functions with gnop
- Use attach_md for attaching md(4) devices
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Enji Cooper [Fri, 8 Jan 2016 19:38:59 +0000 (19:38 +0000)]
- Add a conf.sh file for executing common functions with geli
-- Use linear probing to find the first unique md(4) device, unlike the other
code which uses attach_md, as geli(8) allocates the md(4) devices itself
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Ed Maste [Fri, 8 Jan 2016 19:12:26 +0000 (19:12 +0000)]
Reduce libstand Makefile duplication
Userboot's copy of the libstand Makefile had more extensive changes
compared to the one in sys/boot/libstand32, but it turns out these are
not intentional and we can just include lib/libstand/Makefile as done
for libstand32 in r293040.
Reviewed by: imp, jhb
Tested by: allanjude
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D4793
Enji Cooper [Fri, 8 Jan 2016 19:10:52 +0000 (19:10 +0000)]
- Use attach_md for memory disks so they can be tracked.
- Add a geom_concat specific cleanup function and trap on that function at
exit so things are cleaned up properly
- Don't hardcode /tmp for temporary files, which violates the kyua sandbox
Gleb Smirnoff [Fri, 8 Jan 2016 19:03:20 +0000 (19:03 +0000)]
Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like
sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM
sockets, due to sendfile(2) supporting only the latter, there is a corner
case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake
of control data, albeit being stream socket.
Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY,
similar to m_demote().
Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643.
Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better
locking for RTM_ADD / RTM_DELETE cases.