Workaround bhyve virtual disks operation on top of GEOM providers.
GEOM does not support scatter/gather lists in its I/Os. Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size. If such case is detected,
move the data through temporary sequential buffer.
Initialize td_sel in the thread_init(). Struct thread is not zeroed
on the initial allocation, but seltdinit() assumes that td_sel is NULL
or a valid pointer. Note that thread_fini()/seltdfini() also relies
on this, but correctly resets td_sel to NULL.
Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.
ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.
ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.
To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).
And now ipsec_address() uses inet_ntop() function from libkern.
Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6.
ipsec6_common_input_cb() uses partial copy of ip6_input() to parse
headers. But this isn't correct, when we use tunnel mode IPSec.
When we stripped outer IPv6 header from the decrypted packet, it
can become IPv4 packet and should be handled by ip_input. Also when
we use tunnel mode IPSec with IPv6 traffic, we should pass decrypted
packet with inner IPv6 header to ip6_input, it will correctly handle
it and also can decide to forward it.
The "skip" variable points to offset where payload starts. In tunnel
mode we reset it to zero after stripping the outer header. So, when
it is zero, we should requeue mbuf via netisr.
Fix handling of scoped IPv6 addresses in IPSec code.
* in ipsec_encap() embed scope zone ids into link-local addresses
in the new IPv6 header, this helps ip6_output() disambiguate the
scope;
* teach key_ismyaddr6() use in6_localip(). in6_localip() is less
strict than key_sockaddrcmp(). It doesn't compare all fileds of
struct sockaddr_in6, but it is faster and it should be safe,
because all SA's data was checked for correctness. Also, since
IPv6 link-local addresses in the &V_in6_ifaddrhead are stored in
kernel-internal form, we need to embed scope zone id from SA into
the address before calling in6_localip.
* in ipsec_common_input() take scope zone id embedded in the address
and use it to initialize sin6_scope_id, then use this sockaddr
structure to lookup SA, because we keep addresses in the SADB without
embedded scope zone id.
The only thing is used from this code is ipip_output() function, that does
IPIP encapsulation. Other parts of XF_IP4 code were removed in r275133.
Also it isn't possible to configure the use of XF_IP4, nor from userland
via setkey(8), nor from the kernel.
Simplify the ipip_output() function and rename it to ipsec_encap().
* move IP_DF handling from ipsec4_process_packet() into ipsec_encap();
* since ipsec_encap() called from ipsec[64]_process_packet(), it
is safe to assume that mbuf is contiguous at least to IP header
for used IP version. Remove all unneeded m_pullup(), m_copydata
and related checks.
* use V_ip_defttl and V_ip6_defhlim for outer headers;
* use V_ip4_ipsec_ecn and V_ip6_ipsec_ecn for outer headers;
* move all diagnostic messages to the ipsec_encap() callers;
* simplify handling of ipsec_encap() results: if it returns non zero
value, print diagnostic message and free mbuf.
* some style(9) fixes.
More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.
Add the necessary support to use both TX queues available on if_emac.
Each TX queue can hold one packet (yes, if_emac can send only two(!)
packets at a time).
Even with this change the very limited FIFO buffer (3 KiB for TX and 13 KiB
for RX) fill up too quick to sustain higher throughput.
For the TCP case it turns out that TX isn't the limiting factor, but the RX
side is (the FIFO fill up and starts to discard packets, so the sender has
to slow down).
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:
- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.
Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.
Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.
RELEASEDIR was removed in FreeBSD 9.x, at the same time /boot/loader
stopped using kgzip in the release process. We no longer need to build
kgzip as a cross tool, and tests for RELEASEDIR are obsolete, so
remove both.
andrew [Fri, 17 Apr 2015 09:14:58 +0000 (09:14 +0000)]
Use cp15_ifar_get to get the instruction fault address. When using Thumb-2
the instruction may be over two pages so the program counter could point
to the wrong page.
Buffers which can be memory mapped into userspace should never be
freed. Recycle the buffers instead. This patch also fixes a panic at
reboot issue when an UDL adapter is attached to the system.
Do not strip the ethernet CRC until we read all data from FIFO, otherwise
the CRC bytes would be left in FIFO causing the failure of next packet
(wrong packet header).
When this error happens the receiver has to be disabled and the RX FIFO
flushed, discarding valid packets.
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
arm64 relies on an external binutils port or package right now, because
the in-tree linker from binutils 2.17.50 does not support arm64. Add
arm64 to universe if the linker is available. If not output a message
that arm64 is skipped.
buildworld and buildkernel use the external binutils automatically, so
it's sufficient to run 'pkg install aarch64-binutils' to build
FreeBSD/arm64.
Differential Revision: https://reviews.freebsd.org/D2302
Reviewed by: andrew, imp
Sponsored by: The FreeBSD Foundation
mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.
$M should be the kernel machine src directory, ${MACHINE}. In most cases
${MACHINE} and ${MACHINE_CPUARCH} are the same, but this is not true for
pc98 and arm64.
It appears we previously set M=${MACHINE_CPUARCH} as a workaround to
accommodate pc98, where MACHINE_CPUARCH is pc98 but it uses
sys/i386/i386/genassym.c.
arm64 relies on this being set correctly, so update $M and add explicit
workarounds for pc98.
Differential Revision: https://reviews.freebsd.org/D2307
Reviewed by: andrew, imp
Sponsored by: The FreeBSD Foundation
the libxo output for uptime returned multiple 'uptime' keys, one each for number of days, hours, and minutes of uptime.
This is invalid JSON.
This patch makes the output the raw number of seconds, as well as adding keys for the individual unit values
A string of the original output from the plain-text uptime command is also added
Defeat race with MK_KERBEROS == yes introduced with bootstrap-tools
parallelization work done in r279197
- kerberos5/lib/libroken requires kerberos5/tools/make-roken to build
- kerberos5/tools/asn1_compile, kerberos5/tools/slc, and usr.bin/compile_et
require kerberos5/lib/libroken and kerberos5/lib/libvers
This race is incredibly evident when cross-building sparc64 on
ref10-amd64.freebsd.org
Fix SIGINFO race causing final results to be lost to stderr.
If a SIGINFO comes in after the file is read then the 'siginfo' flag is set to
1 and the next call to show_cnt() (at exit) would print the data to stderr
rather than the expected stdout.
This was found with spamming Poudriere with SIGINFO which caused a 'wc -l'
execution to return no data rather than an expected number.
A new loader.conf(5) option of geom_eli_passphrase_prompt="YES" will now
allow you to enter your geli(8) root-mount credentials prior to invoking
the kernel.
See check-password.4th(8) for details.
Differential Revision: https://reviews.freebsd.org/D2105
Reviewed by: imp, kmoore
Discussed on: -current
MFC after: 3 days
X-MFC-to: stable/10
Relnotes: yes
People are still getting burned by the byacc upgraded, switch to
always doing byacc until someone figures out the more nuanced version
to switch off of.
Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.
Remove THRMISC_VERSION. The thrmisc structure doesn't include a version
number, so this wasn't used (and can't easily be added). If at some point
we want to extend thrmisc, we will probably need to just add a new note
type and ensure that the new type includes a version number.
Fix an old and well-documented use-after-free race condition in
TCP timers:
- Add a reference from tcpcb to its inpcb
- Defer tcpcb deletion until TCP timers have finished
Differential Revision: https://reviews.freebsd.org/D2079
Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: imp, rrs, adrian, jhb, bz
Approved by: jhb
Sponsored by: Verisign, Inc.
vidcontrol: make size argument optional again for syscons
r273544 changed the -f option allow no arguments in vt mode (used to
reset the font back to the default), but broke the optionality of the
size argument for syscons. Drop the required argument from syscons'
optstring for -f so the optional argument handler works the same way
for both syscons and vt.
Reported by: bde
Sponsored by: The FreeBSD Foundation
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
Device probe value of BUS_PROBE_NOWILDCARD should be treated specially only
if the device has a fixed devclass. Otherwise it should be interpreted just
as if the driver doesn't want to claim the device.
Prior to this change a device that was not claimed explicitly by its driver
would remain "attached" to the driver that returned BUS_PROBE_NOWILDCARD.
This would bump up the reference on 'driver->refs' and its 'dev->ops' would
point to the 'driver->ops'. When the driver is subsequently unloaded the
'dev->ops->cls' is left pointing to freed memory.
This fixes an easily reproducible #GP fault caused by loading and unloading
vmm.ko multiple times.
andrew [Wed, 15 Apr 2015 14:30:07 +0000 (14:30 +0000)]
Enter a critical section when storing the vfp registers, we don't want to
be preempted here as this will enter back into this function, but the
hardware could be in an inconsistant state, and the vfp unit will be off
when switced back to this function.
andrew [Wed, 15 Apr 2015 14:18:25 +0000 (14:18 +0000)]
Ensure the userland thread and floating-point state has been saved before
copying the pcb. These values may have been changed just before the call
to fork and without a call to cpu_switch, where they would have been saved.
Implement support -z global linker option. It marks the shared object
as always participating in the global symbols namespace, regardless of
the way the object was brought into the process address space.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Implement support for binary to requesting specific stack size for the
initial thread. It is read by the ELF image activator as the virtual
size of the PT_GNU_STACK program header entry, and can be specified by
the linker option -z stack-size in newer binutils.
The soft RLIMIT_STACK is auto-increased if possible, to satisfy the
binary' request.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
snd_hda: add support for the Lenovo X1 20BS model.
This requires a patch to redirect the output to a separate DAC when
the headphones are used. While there, add device strings for Intel
Broadwell HDA controllers and Realtek ALC292 codecs.
When reading in the original file name from gzip header, we read
in PATH_MAX + 1 bytes from the file. In r281500, strrchr() is
used to strip possible path portion of the file name to mitigate
a possible attack. Unfortunately, strrchr() expects a buffer
that is NUL-terminated, and since we are processing potentially
untrusted data, we can not assert that be always true.
Solve this by reading in one less byte (now PATH_MAX) and
explicitly terminate the buffer after the read size with NUL.
If the direction is not PF_OUT we can never be forwarding. Some input packets
have rcvif != ifp (looped back packets), which lead us to ip6_forward() inbound
packets, causing panics.
Equally, we need to ensure that packets were really received and not locally
generated before trying to ip6_forward() them.
Initialize async_arg_ptr in xpt_async when called with async_code
AC_ADVINFO_CHANGED.
Without this change, newly inserted hard disks won't always have their
physical path device nodes created. The problem reproduces most readily
when attaching a large number of disks at once.
I can find no reason to allow packets with both SYN and FIN bits
set past this point in the code. The packet should be dropped and
not massaged as it is here.
andrew [Tue, 14 Apr 2015 10:40:37 +0000 (10:40 +0000)]
Use MACHINE in the efi loader when it is what we mean, it may not be the
same as MACHINE_CPUARCH, it just happened to be the case the architectures
this code currently supports.