adrian [Sun, 19 Apr 2015 17:15:55 +0000 (17:15 +0000)]
Refactor out the _PXM -> VM domain lookup done in ACPI, in preparation for
its use in upcoming code.
This is inspired by something in jhb's NUMA IRQ allocation patchset.
However, the tricky bit here is that the PXM lookup for a node may
fail, requiring a lookup on the parent node. So if it doesn't
exist, don't fail - just go up to the parent. Only error out of the
lookup if the ACPI lookup returns an error.
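The parent fallback described above can be sketched roughly as follows.
This is an illustration only, not the committed code: struct node, enum
pxm_status and lookup_pxm() are hypothetical names, and the simulated
"status" field stands in for the result of the real ACPI evaluation.
Returning an error when no ancestor has a _PXM is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the commit. */
enum pxm_status { PXM_OK, PXM_NOT_FOUND, PXM_ERROR };

struct node {
	struct node *parent;		/* NULL at the root */
	enum pxm_status status;		/* simulated result of evaluating _PXM */
	int pxm;			/* valid only when status == PXM_OK */
};

/*
 * Walk from 'n' towards the root until some node yields a _PXM value.
 * Returns 0 and stores the value in *pxm on success.  Returns -1 if a
 * lookup reported a hard error, or (assumption) if no ancestor has a _PXM.
 */
int
lookup_pxm(const struct node *n, int *pxm)
{
	assert(pxm != NULL);
	for (; n != NULL; n = n->parent) {
		switch (n->status) {
		case PXM_OK:
			*pxm = n->pxm;
			return (0);
		case PXM_NOT_FOUND:
			continue;	/* missing here: try the parent */
		case PXM_ERROR:
			return (-1);	/* a real error stops the walk */
		}
	}
	return (-1);
}
```

A device whose own node and parent bridge both lack _PXM would pick up
the value from the first ancestor that has one.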
adrian [Sun, 19 Apr 2015 17:07:51 +0000 (17:07 +0000)]
Update pkt-gen to optionally use randomised source/destination
IPv4 addresses/ports.
When doing traffic testing of actual code that /does/ things to the
packet (rather than say, 'bridge.c'), it's typically a good idea to
use a variety of cache-busting and flow-tracking-busting packet
spreads. The pkt-gen method of testing an IP range was to walk
it linearly - which is fine, but not useful enough.
This can be used to completely randomize the source/destination
addresses (eg to test out flow-tracking-busting) and to keep the
destination fixed whilst randomising the source (eg to test out
what a DDoS may look like.)
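As a rough illustration of the difference between walking a range
linearly and picking from it at random, consider the sketch below. It is
not taken from pkt-gen: struct ip_range, next_linear() and pick_random()
are hypothetical names, and the random word is supplied by the caller so
the example stays deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv4 range, host byte order, both ends inclusive. */
struct ip_range {
	uint32_t start;
	uint32_t end;
};

/* Linear walk: return the address after 'cur', wrapping at the end. */
uint32_t
next_linear(const struct ip_range *r, uint32_t cur)
{
	return (cur >= r->end ? r->start : cur + 1);
}

/* Random pick: map a caller-supplied random word into the range. */
uint32_t
pick_random(const struct ip_range *r, uint32_t rnd)
{
	uint64_t span;

	assert(r->start <= r->end);
	/* 64-bit span so that a full 0..0xffffffff range does not overflow. */
	span = (uint64_t)r->end - r->start + 1;
	return (r->start + (uint32_t)(rnd % span));
}
```

Keeping the destination fixed while randomising the source then amounts
to calling pick_random() for the source range only.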
identd: remove redundant zeroing
se_rpc_lowvers was set to 0 twice, so remove one of them
I cannot find any other variable which it may have been a typo of.
README: changes and fixups
Two orthogonal goals:
- try to make README look a little nicer on phabricator by using
Remarkup syntax for commands (using `` instead of using a closing ')
- try to make README look a little nicer on github.
- Don't encourage `make world` when the handbook specifies otherwise
- Change language around documentation to be a bit clearer
sh: Fix the trap builtin to be POSIX-compliant for 'trap exit SIG' and 'trap n n...'.
The parser considered 'trap exit INT' to reset the default for both EXIT and
INT. This behavior is not POSIX compliant. This was avoided if a value was
specified for 'exit', but then disallows exiting with the signal received. A
possible workaround is using ' exit'.
However POSIX does allow this type of behavior if the parameters are all
integers. Fix the handling for this and clarify its support in the manpage
since it is specifically allowed by POSIX.
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter. The
fcntl(2) wrapper contains compatibility code to handle kernels from before
struct flock was changed during 8.x CURRENT development. The
shims were reasonable to allow easier revert to the older kernel at
that time.
Now, two or three major releases later, shims do not serve any
purpose. Such old kernels cannot handle current libc, so revert the
compatibility code.
Make padded syscalls support conditional under the COMPAT6 config
option. For COMPAT32, the syscalls were under COMPAT6 already.
Remove the WITHOUT_SYSCALL_COMPAT build option, whose only purpose was to
(partially) disable the removed shims.
Reviewed by: jhb, imp (previous versions)
Discussed with: peter
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This supports e500v1, e500v2, and e500mc. Tested only on e500v2, but the
performance counters are identical across all, with e500mc having some
additional events.
Make wait6(2), waitid(3) and ppoll(2) cancellation points. The
waitid() function is required to be cancellable by the standard.
wait6() and ppoll() follow the other syscalls in their groups.
Reviewed by: jhb, jilles (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add manual pages for the io, ip, proc, sched, tcp and udp DTrace providers.
The format of these pages is somewhat experimental, so they may be subject
to further tweaking.
They were added for compatibility with the sched provider in Solaris and
illumos, but our sched provider is already incompatible since it uses native
types, so there isn't much point in keeping them around.
SDT(9): add a section on SDT providers, mentioning the "sdt" provider.
Add examples demonstrating how one can list available providers and the
DTrace probes provided by a provider.
Work around bhyve virtual disk operation on top of GEOM providers.
GEOM does not support scatter/gather lists in its I/Os. Such requests
are cut into pieces by physio(), which may be problematic if those pieces
are not multiples of the provider's sector size. If such a case is detected,
move the data through a temporary sequential buffer.
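The idea can be sketched in userland terms as below. This is only an
illustration under assumptions, not the bhyve code: iov_sector_aligned()
and iov_gather() are hypothetical helpers, and the caller is assumed to
issue a single sequential I/O on the gathered buffer afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Return 1 if every segment length is a multiple of the sector size. */
int
iov_sector_aligned(const struct iovec *iov, int iovcnt, size_t secsz)
{
	assert(secsz > 0);
	for (int i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len % secsz != 0)
			return (0);
	}
	return (1);
}

/*
 * Copy all segments into one sequential buffer.  Returns the number of
 * bytes gathered, or 0 if the buffer is too small to hold them all.
 */
size_t
iov_gather(const struct iovec *iov, int iovcnt, void *buf, size_t buflen)
{
	size_t off = 0;

	for (int i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len > buflen - off)
			return (0);
		memcpy((char *)buf + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	return (off);
}
```

A write path would call iov_sector_aligned() first and fall back to
iov_gather() only when it returns 0; a read path would do the reverse
copy after the I/O completes.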
Initialize td_sel in thread_init(). Struct thread is not zeroed
on the initial allocation, but seltdinit() assumes that td_sel is NULL
or a valid pointer. Note that thread_fini()/seltdfini() also rely
on this, but correctly reset td_sel to NULL.
Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.
ipsec_address() is used to convert a sockaddr structure to presentation
format. The IPv6 part of this function returns a pointer to an on-stack
buffer, which is already invalid by the time the caller uses it. The
IPv4 version uses 4 static buffers and returns a pointer to a new
buffer each time it is called, but it is still possible to get
corrupted data when several threads use this function.
ipsec_logsastr() is used to format a string describing an SA entry. It
also uses a static buffer and has the same problem with concurrent threads.
To fix these problems, add the buffer pointer and the size of this
buffer to the arguments. Now each caller passes a buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).
ipsec_address() now uses the inet_ntop() function from libkern.
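The caller-supplied-buffer pattern can be illustrated with the userland
inet_ntop(3). The sketch below is not the kernel function: addr_to_str()
is a hypothetical name, and it takes a plain struct sockaddr rather than
whatever argument type the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format the address in 'sa' into the caller's buffer.  Returns buf on
 * success, or NULL if the family is unsupported or the buffer is too
 * small.  No static or on-stack storage is handed back to the caller,
 * so concurrent callers cannot trample each other's results.
 */
const char *
addr_to_str(const struct sockaddr *sa, char *buf, size_t size)
{
	assert(sa != NULL && buf != NULL);
	switch (sa->sa_family) {
	case AF_INET:
		return (inet_ntop(AF_INET,
		    &((const struct sockaddr_in *)sa)->sin_addr,
		    buf, (socklen_t)size));
	case AF_INET6:
		return (inet_ntop(AF_INET6,
		    &((const struct sockaddr_in6 *)sa)->sin6_addr,
		    buf, (socklen_t)size));
	default:
		return (NULL);
	}
}
```

A caller would typically declare a char buf[INET6_ADDRSTRLEN] on its own
stack and pass sizeof(buf), so the string stays valid for as long as the
caller's frame does.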
Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6.
ipsec6_common_input_cb() uses a partial copy of ip6_input() to parse
headers. But this isn't correct when we use tunnel mode IPSec.
When we strip the outer IPv6 header from the decrypted packet, it
can become an IPv4 packet and should be handled by ip_input. Also, when
we use tunnel mode IPSec with IPv6 traffic, we should pass the decrypted
packet with the inner IPv6 header to ip6_input; it will correctly handle
it and can also decide to forward it.
The "skip" variable points to the offset where the payload starts. In tunnel
mode we reset it to zero after stripping the outer header. So, when
it is zero, we should requeue the mbuf via netisr.
Fix handling of scoped IPv6 addresses in IPSec code.
* in ipsec_encap() embed scope zone ids into link-local addresses
in the new IPv6 header, this helps ip6_output() disambiguate the
scope;
* teach key_ismyaddr6() to use in6_localip(). in6_localip() is less
strict than key_sockaddrcmp(). It doesn't compare all fields of
struct sockaddr_in6, but it is faster and it should be safe,
because all SA's data was checked for correctness. Also, since
IPv6 link-local addresses in the &V_in6_ifaddrhead are stored in
kernel-internal form, we need to embed scope zone id from SA into
the address before calling in6_localip.
* in ipsec_common_input() take scope zone id embedded in the address
and use it to initialize sin6_scope_id, then use this sockaddr
structure to lookup SA, because we keep addresses in the SADB without
embedded scope zone id.
The only thing used from this code is the ipip_output() function, which does
IPIP encapsulation. Other parts of the XF_IP4 code were removed in r275133.
Also, it isn't possible to configure the use of XF_IP4, either from userland
via setkey(8) or from the kernel.
Simplify the ipip_output() function and rename it to ipsec_encap().
* move IP_DF handling from ipsec4_process_packet() into ipsec_encap();
* since ipsec_encap() is called from ipsec[64]_process_packet(), it
is safe to assume that the mbuf is contiguous at least up to the IP header
for the IP version in use. Remove all unneeded m_pullup(), m_copydata
and related checks.
* use V_ip_defttl and V_ip6_defhlim for outer headers;
* use V_ip4_ipsec_ecn and V_ip6_ipsec_ecn for outer headers;
* move all diagnostic messages to the ipsec_encap() callers;
* simplify handling of ipsec_encap() results: if it returns non zero
value, print diagnostic message and free mbuf.
* some style(9) fixes.
More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.
Add the necessary support to use both TX queues available on if_emac.
Each TX queue can hold one packet (yes, if_emac can send only two(!)
packets at a time).
Even with this change the very limited FIFO buffer (3 KiB for TX and 13 KiB
for RX) fills up too quickly to sustain higher throughput.
For the TCP case it turns out that TX isn't the limiting factor, but the RX
side is (the FIFO fills up and starts to discard packets, so the sender has
to slow down).
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:
- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.
Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.
Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4, but thanks to the
wonders of version control we can always revert this change and bring it
back.
RELEASEDIR was removed in FreeBSD 9.x, at the same time /boot/loader
stopped using kgzip in the release process. We no longer need to build
kgzip as a cross tool, and tests for RELEASEDIR are obsolete, so
remove both.
andrew [Fri, 17 Apr 2015 09:14:58 +0000 (09:14 +0000)]
Use cp15_ifar_get to get the instruction fault address. When using Thumb-2
the instruction may be over two pages so the program counter could point
to the wrong page.
Buffers which can be memory mapped into userspace should never be
freed. Recycle the buffers instead. This patch also fixes a panic at
reboot when a UDL adapter is attached to the system.
Do not strip the ethernet CRC until we read all data from FIFO, otherwise
the CRC bytes would be left in FIFO causing the failure of next packet
(wrong packet header).
When this error happens the receiver has to be disabled and the RX FIFO
flushed, discarding valid packets.
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
arm64 relies on an external binutils port or package right now, because
the in-tree linker from binutils 2.17.50 does not support arm64. Add
arm64 to universe if the linker is available. If not, output a message
that arm64 is skipped.
buildworld and buildkernel use the external binutils automatically, so
it's sufficient to run 'pkg install aarch64-binutils' to build
FreeBSD/arm64.
Differential Revision: https://reviews.freebsd.org/D2302
Reviewed by: andrew, imp
Sponsored by: The FreeBSD Foundation
mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.
$M should be the kernel machine src directory, ${MACHINE}. In most cases
${MACHINE} and ${MACHINE_CPUARCH} are the same, but this is not true for
pc98 and arm64.
It appears we previously set M=${MACHINE_CPUARCH} as a workaround to
accommodate pc98, where MACHINE_CPUARCH is pc98 but it uses
sys/i386/i386/genassym.c.
arm64 relies on this being set correctly, so update $M and add explicit
workarounds for pc98.
Differential Revision: https://reviews.freebsd.org/D2307
Reviewed by: andrew, imp
Sponsored by: The FreeBSD Foundation
The libxo output for uptime returned multiple 'uptime' keys, one each for
the number of days, hours, and minutes of uptime.
This is invalid JSON.
This patch makes the output the raw number of seconds, and adds keys for
the individual unit values.
A string containing the original output of the plain-text uptime command
is also added.
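To illustrate the shape of the fix, the sketch below emits one key for
the raw seconds and separate keys for each unit instead of repeating
'uptime'. It does not use libxo, and both the function name
format_uptime_json() and the key names "days", "hours" and "minutes" are
assumptions made for this example; the real key names may differ. The
human-readable string mentioned above is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Render an uptime given in seconds as a JSON object with unique keys.
 * Returns what snprintf(3) returns: the length the full string needs.
 */
int
format_uptime_json(long secs, char *buf, size_t size)
{
	long days, hours, mins;

	assert(secs >= 0 && buf != NULL);
	days = secs / 86400;
	hours = (secs % 86400) / 3600;
	mins = (secs % 3600) / 60;
	return (snprintf(buf, size,
	    "{\"uptime\": %ld, \"days\": %ld, \"hours\": %ld, "
	    "\"minutes\": %ld}",
	    secs, days, hours, mins));
}
```

Because every key appears once, a JSON consumer can read the total and
the per-unit values without depending on key order.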
Defeat race with MK_KERBEROS == yes introduced with bootstrap-tools
parallelization work done in r279197
- kerberos5/lib/libroken requires kerberos5/tools/make-roken to build
- kerberos5/tools/asn1_compile, kerberos5/tools/slc, and usr.bin/compile_et
require kerberos5/lib/libroken and kerberos5/lib/libvers
This race is incredibly evident when cross-building sparc64 on
ref10-amd64.freebsd.org
Fix SIGINFO race causing final results to be lost to stderr.
If a SIGINFO comes in after the file is read then the 'siginfo' flag is set to
1 and the next call to show_cnt() (at exit) would print the data to stderr
rather than the expected stdout.
This was found by spamming Poudriere with SIGINFO, which caused a 'wc -l'
execution to return no data rather than the expected number.
A new loader.conf(5) option of geom_eli_passphrase_prompt="YES" will now
allow you to enter your geli(8) root-mount credentials prior to invoking
the kernel.
See check-password.4th(8) for details.
Differential Revision: https://reviews.freebsd.org/D2105
Reviewed by: imp, kmoore
Discussed on: -current
MFC after: 3 days
X-MFC-to: stable/10
Relnotes: yes
People are still getting burned by the byacc upgrade; switch to
always doing byacc until someone figures out the more nuanced version
to switch off of.
Move ALTQ from contrib to net/altq. The ALTQ code has been discontinued
by its initial authors for many years. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in projects/ifnet. Moving it out of contrib also allows removing
several hacks from the make glue.
Remove THRMISC_VERSION. The thrmisc structure doesn't include a version
number, so this wasn't used (and can't easily be added). If at some point
we want to extend thrmisc, we will probably need to just add a new note
type and ensure that the new type includes a version number.
Fix an old and well-documented use-after-free race condition in
TCP timers:
- Add a reference from tcpcb to its inpcb
- Defer tcpcb deletion until TCP timers have finished
Differential Revision: https://reviews.freebsd.org/D2079
Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: imp, rrs, adrian, jhb, bz
Approved by: jhb
Sponsored by: Verisign, Inc.
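The general technique of deferring deletion until pending timers are done
can be sketched with a plain reference count. This is a generic,
single-threaded illustration and not the actual tcpcb/inpcb code: struct
conn and all conn_*() functions are hypothetical, and real kernel code
would need locking or atomic operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection object; stands in for a control block. */
struct conn {
	int refs;	/* one for the owner, plus one per pending timer */
	bool closed;	/* set once the owner has closed the connection */
};

struct conn *
conn_new(void)
{
	struct conn *c = calloc(1, sizeof(*c));

	if (c != NULL)
		c->refs = 1;	/* the owner's reference */
	return (c);
}

/* Drop one reference; returns true if this freed the object. */
bool
conn_release(struct conn *c)
{
	assert(c != NULL && c->refs > 0);
	if (--c->refs == 0) {
		free(c);
		return (true);
	}
	return (false);
}

/* Arming a timer takes a reference so the object outlives the timer. */
void
conn_timer_arm(struct conn *c)
{
	c->refs++;
}

/* A finished timer gives its reference back. */
bool
conn_timer_done(struct conn *c)
{
	return (conn_release(c));
}

/* Closing marks the object dead but frees it only if no timer is pending. */
bool
conn_close(struct conn *c)
{
	c->closed = true;
	return (conn_release(c));
}
```

With one timer armed, conn_close() leaves the memory in place and the
later conn_timer_done() performs the actual free, so the timer callback
never touches freed memory.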
vidcontrol: make size argument optional again for syscons
r273544 changed the -f option to allow no arguments in vt mode (used to
reset the font back to the default), but broke the optionality of the
size argument for syscons. Drop the required argument from syscons'
optstring for -f so the optional argument handler works the same way
for both syscons and vt.
Reported by: bde
Sponsored by: The FreeBSD Foundation