Robert Watson [Sat, 15 Mar 2014 00:57:50 +0000 (00:57 +0000)]
Several years after initial development, merge prototype support for
linking NIC Receive Side Scaling (RSS) to the network stack's
connection-group implementation. This prototype (and derived patches)
are in use at Juniper and several other FreeBSD-using companies, so
despite some reservations about its maturity, merge the patch to the
base tree so that it can be iteratively refined in collaboration rather
than maintained as a set of gradually diverging patch sets.
(1) Merge a software implementation of the Toeplitz hash specified in
RSS implemented by David Malone. This is used to allow suitable
pcbgroup placement of connections before the first packet is
received from the NIC. Software hashing is generally avoided,
however, due to high cost of the hash on general-purpose CPUs.
(2) In in_rss.c, maintain authoritative versions of RSS state intended
to be pushed to each NIC, including keying material, hash
algorithm/ configuration, and buckets. Provide software-facing
interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
the RSS standardised Toeplitz and a 'naive' variation with a hash
efficient in software but with poor distribution properties.
Implement rss_m2cpuid()to be used by netisr and other load
balancing code to look up the CPU on which an mbuf should be
processed.
(3) In the Ethernet link layer, allow netisr distribution using RSS as
a source of policy as an alternative to source ordering; continue
to default to direct dispatch (i.e., don't try and requeue packets
for processing on the 'right' CPU if they arrive in a directly
dispatchable context).
(4) Allow RSS to control tuning of connection groups in order to align
groups with RSS buckets. If a packet arrives on a protocol using
connection groups, and contains a suitable hardware-generated
hash, use that hash value to select the connection group for pcb
lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz
hash is available, we fall back on regular PCB lookup risking
contention rather than pay the cost of Toeplitz in software --
this is a less scalable but, at my last measurement, faster
approach. As core counts go up, we may want to revise this
strategy despite CPU overhead.
Where device drivers suitably configure NICs, and connection groups /
RSS are enabled, this should avoid both lock and line contention during
connection lookup for TCP. This commit does not modify any device
drivers to tune device RSS configuration to the global RSS
configuration; patches are in circulation to do this for at least
Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device
drivers is not particularly robust, nor aware of more advanced features
such as runtime reconfiguration/rebalancing. This will hopefully prove
a useful starting point for refinement.
No MFC is scheduled as we will first want to nail down a more mature
and maintainable KPI/KBI for device drivers.
Neel Natu [Fri, 14 Mar 2014 22:07:08 +0000 (22:07 +0000)]
Don't dump entries that were modified during the time the KTR buffer was being
copied to userspace. Failing to do this would result in entries at the bottom
of the ktrdump output to be more recent than entries at the top.
With this change the timestamps are monotonically decreasing going from the
top to the bottom of the ktrdump output.
Neel Natu [Fri, 14 Mar 2014 21:35:16 +0000 (21:35 +0000)]
Fix an issue with ktrdump(8) where it would not print all entries in the
KTR buffer.
This happens when 'i' tries to wrap around from 0 to 'entries - 1'. Since 'i'
is a signed integer the modulo operation actually returns a negative number.
Fix this by computing the next index to use "by hand" instead of relying
on the modulo operator.
Warner Losh [Fri, 14 Mar 2014 20:20:32 +0000 (20:20 +0000)]
NanoBSD has a utility shell script called save_cfg which helps keep
/cfg updated with the modified configuration files in /etc. I have
written an improved version with the following features:
* Recurses directories.
* Only requires file arguments the first time the file/directory is
* added to /cfg.
* Handles file deletions.
PR: 145962, 157533
Submitted by: Aragon Gouveia and Alex Bakhtin
Julio Merino [Fri, 14 Mar 2014 12:55:06 +0000 (12:55 +0000)]
Remove unnecessary svn:executable property from source file.
The atf cp_test.sh sample file should have never been marked executable in
the first place because this file needs to be "built" first before being
usable.
Julio Merino [Fri, 14 Mar 2014 12:52:55 +0000 (12:52 +0000)]
Move FreeBSD Test Suite-specific code to a suite.test.mk file.
The new suite.test.mk file contains all the logic needed to install test
programs under /usr/tests/ and to support Kyua as the run-time engine.
This file is included by default by bsd.test.mk so Makefiles do not need
to care about its existence.
Specific Makefiles can define NOT_FOR_TEST_SUITE to indicate that whatever
test programs they are building are not supposed to be installed under
/usr/tests/ nor run by Kyua. (The effect of passing this setting is that
suite.test.mk is simply not included.)
NOT_FOR_TEST_SUITE should never be used by Makefiles in the base system.
This functionality is provided so that third-parties can hook in their
own test code, with different semantics, if they wish. This was asked
for by sjg@.
Julio Merino [Fri, 14 Mar 2014 08:56:19 +0000 (08:56 +0000)]
Make bsd.test.mk the only public mk fragment for the building of tests.
Change {atf,plain,tap}.test.mk to be internal implementation details of
bsd.test.mk. Makefiles that build tests should now only include bsd.test.mk
and declaratively specify what they want to build, without worrying about
the internal implementation of the mk files.
The reason for this change is to permit building test programs of different
interfaces from a single directory, which is something I had a need for
while porting tests over from src/tools/regression/.
Additionally, this change makes it possible to perform some other requested
changes to bsd.test.mk in an easier manner. Coming soon.
Kevin Lo [Fri, 14 Mar 2014 06:38:22 +0000 (06:38 +0000)]
Reset the bit of the R92C_MCUFWDL associated with checksum report
before loading firmware page. It may fix this problem:
"urtwn0: timeout waiting for checksum report"
Gleb Smirnoff [Fri, 14 Mar 2014 06:29:43 +0000 (06:29 +0000)]
Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
Devin Teske [Fri, 14 Mar 2014 03:42:05 +0000 (03:42 +0000)]
Rewrite usermgmt -- hooking it into the scripting system with dispatch
commands addUser, deleteUser, and editUser. Getting rid of the awkward-
to-use `userinput' bolt-on which Ron and I talked about rewriting.
Gleb Smirnoff [Fri, 14 Mar 2014 02:58:48 +0000 (02:58 +0000)]
Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.
Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
Devin Teske [Fri, 14 Mar 2014 02:50:32 +0000 (02:50 +0000)]
Fix future namespace issues for functions taking $var_to_set -- functions
taking a variable to set need to make sure they protect their locals; if
$var_to_set positional argument coincides with a local the expected call
to `setvar' will fail to reach outside of the function's namespace. When
such collisions are experienced (as I did in the rewrite of usermgmt) the
solution is to append a full or abbreviated version of the function name
to the local (ultimately eliminating collisions). This is rarely needed
and only occurs when you have a lot of like-named functions that pass
very similar $var_to_set positional arguments to each other (such as-is
the case with an expansive library such as `dialog.subr').
Devin Teske [Fri, 14 Mar 2014 02:37:39 +0000 (02:37 +0000)]
Remove indexfile from debug statement as it is already logged by
f_index_menusel_command() used just-prior to this debug statement.
Also, log the arguments being passed to the resword.
Ian Lepore [Fri, 14 Mar 2014 00:49:02 +0000 (00:49 +0000)]
Fix an uninitialized variable error I perpetrated when splitting some
code into a separate function. Pass the missing value from main() to
the probe_disks() function.
Re-format the license to conform to our BSD license template as much
as possible. This does not change the wording in any way.
Remove the 3rd clause ("advertising clause") of the BSD license as
permitted by the University of Berkeley on July 22, 1999. While the
clause itself mentions Lawrence Berkeley Laboratory, UCB is the sole
copyright holder of this file.
Alan Somers [Thu, 13 Mar 2014 18:42:12 +0000 (18:42 +0000)]
Replace 4.4BSD Lite's unix domain socket backpressure hack with a cleaner
mechanism, based on the new SB_STOP sockbuf flag. The old hack dynamically
changed the sending sockbuf's high water mark whenever adding or removing
data from the receiving sockbuf. It worked for stream sockets, but it never
worked for SOCK_SEQPACKET sockets because of their atomic nature. If the
sockbuf was partially full, it might return EMSGSIZE instead of blocking.
The new solution is based on DragonFlyBSD's fix from commit 3a6117bbe0ed6a87605c1e43e12a1438d8844380 on 2008-05-27. It adds an SB_STOP
flag to sockbufs. Whenever uipc_send surpasses the socket's size limit, it
sets SB_STOP on the sending sockbuf. sbspace() will then return 0 for that
sockbuf, causing sosend_generic and friends to block. uipc_rcvd will
likewise clear SB_STOP. There are two fringe benefits: uipc_{send,rcvd} no
longer need to call chgsbsize() on every send and receive because they don't
change the sockbuf's high water mark. Also, uipc_sense no longer needs to
acquire the UIPC linkage lock, because it's simpler to compute the
st_blksizes.
There is one drawback: since sbspace() will only ever return 0 or the
maximum, sosend_generic will allow the sockbuf to exceed its nominal maximum
size by at most one packet of size less than the max. I don't think that's
a serious problem. In fact, I'm not even positive that FreeBSD guarantees a
socket will always stay within its nominal size limit.
sys/sys/sockbuf.h
Add the SB_STOP flag and adjust sbspace()
sys/sys/unpcb.h
Delete the obsolete unp_cc and unp_mbcnt fields from struct unpcb.
sys/kern/uipc_usrreq.c
Adjust uipc_rcvd, uipc_send, and uipc_sense to use the SB_STOP
backpressure mechanism. Removing obsolete unpcb fields from
db_show_unpcb.
tests/sys/kern/unix_seqpacket_test.c
Clear expected failures from ATF.
Devin Teske [Thu, 13 Mar 2014 18:16:42 +0000 (18:16 +0000)]
Fix pw(8) deletion of group "username" on userdel even if group "username"
is not associated with user "username". E.g., user "foo" has primary group
"wheel" and is unassociated with group "foo", yet userdel would delete the
group "foo" when deleting user "foo" (despite the fact that user "foo" is
not associated with group "foo" in any way).
Patch committed with minor style(9) changes.
PR: bin/169471
Submitted by: Alexander Pyhalov <apyhalov@gmail.com>
Ryan Stone [Thu, 13 Mar 2014 15:57:25 +0000 (15:57 +0000)]
Add MSI support to puc(9)
Add support for MSI interrupts in the puc(9) driver. By default the driver
will prefer MSI interrupts to legacy interrupts. A tunable,
hw.puc.msi_disable, has been added to force the allocation of legacy
interrupts.
Gleb Smirnoff [Thu, 13 Mar 2014 03:42:24 +0000 (03:42 +0000)]
Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Michael Tuexen [Wed, 12 Mar 2014 17:18:15 +0000 (17:18 +0000)]
Put the offset of the CRC32C in csum_data instead of 0.
The virtio driver needs the offset to be stored in csum_data,
like in the case for UDP and TCP.
The virtio problem was reported by
Niu Zhixiong <kaiaixi@gmail.com>, who helped in debugging
and testing the patch.
Gleb Smirnoff [Wed, 12 Mar 2014 14:29:08 +0000 (14:29 +0000)]
Since both netinet/ and netinet6/ call into netipsec/ and netpfil/,
the protocol specific mbuf flags are shared between them.
- Move all M_FOO definitions into a single place: netinet/in6.h, to
avoid future clashes.
- Resolve clash between M_DECRYPTED and M_SKIP_FIREWALL which resulted
in a failure of operation of IPSEC and packet filters.
Thanks to Nicolas and Georgios for all the hard work on bisecting,
testing and finally finding the root of the problem.
PR: kern/186755
PR: kern/185876
In collaboration with: Georgios Amanakis <gamanakis gmail.com>
In collaboration with: Nicolas DEFFAYET <nicolas-ml deffayet.com>
Sponsored by: Nginx, Inc.
Julio Merino [Wed, 12 Mar 2014 12:27:13 +0000 (12:27 +0000)]
Make ether_line really report an error when all input is invalid.
The previous code failed to return an error condition when the whole input
was invalid due to improper handling of the sscanf return value. Actually,
this failure was properly being caught by a test in
tools/regression/lib/libc/net/test-ether.t but was not noticed because
these tests are never run. (On my way to fixing that ;-)
The fix applied here resembles the implementation of ether_line in NetBSD
modulo the setting of an errno value (which is not documented as an
expectation in the manpage anyway).
Julio Merino [Wed, 12 Mar 2014 10:59:51 +0000 (10:59 +0000)]
Remove broken tests for eui64_line.
This function is not public and brooks (initial committer adding the code)
suggests the deletion of the tests (which I don't know if they work)
instead of changing the visibility of the function.
Julio Merino [Wed, 12 Mar 2014 10:45:22 +0000 (10:45 +0000)]
Make the strerror tests work without libtap.
Just replace the simple calls to the library with ad-hoc code. We should
later rewrite these with the ATF libraries anyway, which are part of the
base system.
Julio Merino [Wed, 12 Mar 2014 10:42:58 +0000 (10:42 +0000)]
Turn a test precondition into a skip in the mdconfig tests.
Tests that cannot be run because a precondition is not met should be
marked as skipped, not failed. Do this for the tests in mdconfig that
first check if the caller user is root.
Julio Merino [Wed, 12 Mar 2014 10:41:14 +0000 (10:41 +0000)]
Fix sa tests.
Small divergences in the output padding made some sa tests fail. Just
trim all whitespace from the outputs and the golden files so comparisons
are less fragile and the tests pass again.
Julio Merino [Wed, 12 Mar 2014 10:38:32 +0000 (10:38 +0000)]
Only run the make tests when make is fmake.
Because bmake is the default make being built, many of the tests here
fail due to differences between the two. Just skip the tests for now
when using fmake.
Use correct types for sizeof() in the calculations for the malloc(9) sizes [1].
While there, remove unneeded checks for failed allocations with M_WAITOK flag.
Ian Lepore [Tue, 11 Mar 2014 22:41:34 +0000 (22:41 +0000)]
Remove the unreferenced DATA() macro. That leaves only GET_CURTHREAD_PTR()
which was added by cognet in 2012, so remove the no-longer-applicable
license stuff that referred to all the old contents, and put in a
standard 2-clause BSD license (to cover the 6 lines of useful code left
in here).
Ian Lepore [Tue, 11 Mar 2014 22:02:49 +0000 (22:02 +0000)]
Enhance the mechanism that lets you configure the ubldr boot device by
setting the u-boot environment variable loaderdev=. It used to accept only
'disk' or 'net'. Now it allows specification of unit, slice, and partition
as well. In addition to the generic 'disk' it also accepts specific
storage device types such as 'mmc' or 'sata'.
If there isn't a loaderdev env var, the historical behavior is maintained.
It will use the first storage device it finds, or a network device if
no working storage device exists.
99% of the work on this was done by Patrick Kelsey, but I made some
changes, so if anything goes wrong, blame me.
Dimitry Andric [Tue, 11 Mar 2014 21:43:10 +0000 (21:43 +0000)]
Garbage collect the old way of adding the libstdc++ include directories
in clang's InitHeaderSearch.cpp. This has been superseded by David
Chisnall's commit in r255321.
Moreover, if libc++ is used, the libstdc++ include directories should
not be in the search path at all. These directories are now only used
if you pass -stdlib=libstdc++.