Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs is a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here. If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.
On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS. On my previous
24-core system this problem also existed, but was less serious.
The comment on BMCR data in if_media entry is wrong. The ifm_data stores
the index array, not a value for BMCR register. In case of IFM_10_T there
could be either MII_MEDIA_10_T or MII_MEDIA_10_T_FDX, which are 1 and 2,
accordingly. Neither matches a valid BMCR value. My guessing is that this
write is harmless, since later mii_phy_setmedia() would write a proper
value there.
The code is here since the initial checkin. Note that case IFM_100_TX has
the same comment, but a proper value of BMCR_ISO is written. So, collapse
two cases into one, always writing there BMCR_ISO.
Since brgphy doesn't call mii_phy_setmedia(), there is no reason to
set any value to ifm_data. If brgphy ever to call mii_phy_setmedia(),
then the value of BRGPHY_S1000 | BRGPHY_BMCR_FDX will trigger KASSERT.
While here, remove the obfuscating macro and wrap long lines.
- For executables search for matching (B) global uninitialized BSS symbols from
linked libraries. Only do this for BSS symbols that have a size which avoids
__bss_start. Without this some libraries would be considered unneeded even
though they were providing a B symbol.
- Add in the symbols from crt1.o to cover a handful of common unresolved symbols.
- Consider (C) common data symbols as provided by libraries/crt1.
- Move libkey() function to more appropriate place.
Merge the following from ^/projects/release-arm64 to allow
building FreeBSD/arm64 VM images and memstick.img installation
medium:
r281786, r281788, r281792:
r281786:
Add support for building arm64/aarch64 virtual machine images.
r281788:
Copy amd64/make-memstick.sh to arm64/make-memstick.sh for
aarch64 memory stick images.
Although arm64 does not yet have USB support, the memstick
image should be bootable with certain virtualization tools,
such as qemu.
r281792:
Add a buildenv_setup() prototype, intended to be overridden as
needed.
For example, the arm64/aarch64 build needs devel/aarch64-binutils,
so buildenv_setup() in the release.conf for this architecture
handles the installation of the port before buildworld/buildkernel.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.
- Speedup significantly by not using subshells for data already fetched.
Ran against /usr/local/sbin/pkg:
Before: 25.12 real 12.41 user 33.14 sys
After: 0.53 real 0.49 user 0.13 sys
- Exit with 1 if any missing or unresolved symbol is detected.
- Add option '-U' to skip looking up unresolved symbols.
- Don't consider provided weak objects as unresolved (nm V).
phabricator related changes:
- don't lint either contrib or crypto: these are both externally written
directories
- add additional linters for spelling (check common typos like teh ->
the)
- chmod linter checks for executible bit on bad files
- merge-conflict checks for merge conflict tokens then may have been
resolved incorrectly
- filename checks for back characters in filenames
- json for json syntax correctness
- remove history.immutable: it is meaningless on subversion, and causes
workflow problems when trying to use git. It it set to 'true' by
default with hg
Always send log(9) messages to the message buffer.
It is truer to the semantics of logging for messages to *always*
go to the message buffer, where they can eventually be collected
and, in fact, be put into a log file.
This restores the behavior prior to r70239, which seems to have
changed it inadvertently.
Submitted by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Approved by: kib (mentor)
Obtained from: Dell Inc.
MFC after: 1 week
When building VM disk images, vm_copy_base() uses tar(1) to
copy the userland from one md(4)-mounted filesystem to a clean
filesystem to prevent remnants of files that were added and
removed from resulting in an unclean filesystem. When newfs(8)
creates the first filesystem with journaled soft-updates enabled,
the /.sujournal file in the new filesystem cannot be overwritten
by the /.sujournal in the original filesystem.
To avoid this particular error case, do not enable journaled
soft-updates when creating the md(4)-backed filesystems, and
instead use tunefs(8) to enable journaled soft-updates after
the new filesystem is populated in vm_copy_base().
While here, fix a long standing bug where the build environment
/boot files were used by mkimg(1) when creating the VM disk
images by using the files in .OBJDIR.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
vidcontrol: skip invalid video modes returned by vt(4)
vt(4) has a stub CONS_MODEINFO ioctl that does not provide any data
but returns success. This needs to be fixed in the kernel, but address
it in vidcontrol(1) as well in case it's run on an older kernel.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.
Remove code to support the top of the stack layout for FreeBSD 1.x/2.x
kernel, but keep explanation of the old ps_strings structure to make
it clear what sanity check tries to accomplish.
Noted by: Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
ed(1): Fix [-Werror=logical-not-parentheses]
/usr/src/bin/ed/glbl.c:64:36: error: logical not is only applied to
theleft hand side of comparison [-Werror=logical-not-parentheses]
marius [Sun, 19 Apr 2015 20:15:57 +0000 (20:15 +0000)]
Refine the workaround for Intel HSD131 [1] added in r269052:
- Use the full mask described by the erratum as with a sufficiently high
number of these false-positives, the overflow bit (bit 62) additionally
gets set [7].
- HSD131 has been brought into several other Haswell-derived CPUs including
to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of
these benign errors by default on CPU models affected by HSM142, HSW131 and
BDM48 [2 - 5], describing the HSD131 silicon bug for additional models.
Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered
by this fault [6], with the specification update concerned with HSM142 [2]
only referring to 0x3c and 0x46.
Submitted by: David Froehlich [7]
MFC after: 3 days
adrian [Sun, 19 Apr 2015 17:15:55 +0000 (17:15 +0000)]
Refactor out the _PXM -> VM domain lookup done in ACPI, in preparation for
its use in upcoming code.
This is inspired by something in jhb's NUMA IRQ allocation patchset.
However, the tricky bit here is that the PXM lookup for a node may
fail, requiring a lookup on the parent node. So if it doesn't
exist, don't fail - just go up to the parent. Only error out of the
lookup is the ACPI lookup returns an error.
adrian [Sun, 19 Apr 2015 17:07:51 +0000 (17:07 +0000)]
Update pkt-gen to optionally use randomised source/destination
IPv4 addresses/ports.
When doing traffic testing of actual code that /does/ things to the
packet (rather than say, 'bridge.c'), it's typically a good idea to
use a variety of cache-busting and flow-tracking-busting packet
spreads. The pkt-gen method of testing an IP range was to walk
it linearly - which is fine, but not useful enough.
This can be used to completely randomize the source/destination
addresses (eg to test out flow-tracking-busting) and to keep the
destination fixed whilst randomising the source (eg to test out
what a DDoS may look like.)
identd: remove redundant zeroing
se_rpc_lowvers was set to 0 twice, so remove one of them
I can not find any other variable which they may have been a typo of.
README: changes and fixups
Two orthogonal goals:
- try to make README look a little nicer on phabricator by using
Remarkup syntax for commands (using `` instead of using a closing ')
- try to make README look a little nicer on github.
- Don't encourage `make world` when the handbook specifies otherwise
- Change language around documentation to be a bit clearer
sh: Fix the trap builtin to be POSIX-compliant for 'trap exit SIG' and 'trap n n...'.
The parser considered 'trap exit INT' to reset the default for both EXIT and
INT. This beahvior is not POSIX compliant. This was avoided if a value was
specified for 'exit', but then disallows exiting with the signal received. A
possible workaround is using ' exit'.
However POSIX does allow this type of behavior if the parameters are all
integers. Fix the handling for this and clarify its support in the manpage
since it is specifically allowed by POSIX.
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter. The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development. The
shims were reasonable to allow easier revert to the older kernel at
that time.
Now, two or three major releases later, shims do not serve any
purpose. Such old kernels cannot handle current libc, so revert the
compatibility code.
Make padded syscalls support conditional under the COMPAT6 config
option. For COMPAT32, the syscalls were under COMPAT6 already.
Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to
(partially) disable the removed shims.
Reviewed by: jhb, imp (previous versions)
Discussed with: peter
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This supports e500v1, e500v2, and e500mc. Tested only on e500v2, but the
performance counters are identical across all, with e500mc having some
additional events.
Make wait6(2), waitid(3) and ppoll(2) cancellation points. The
waitid() function is required to be cancellable by the standard. The
wait6() and ppoll() follow the other syscalls in their groups.
Reviewed by: jhb, jilles (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add manual pages for the io, ip, proc, sched, tcp and udp DTrace providers.
The format of these pages is somewhat experimental, so they may be subject
to further tweaking.
They were added for compatibility with the sched provider in Solaris and
illumos, but our sched provider is already incompatible since it uses native
types, so there isn't much point in keeping them around.
SDT(9): add a section on SDT providers, mentioning the "sdt" provider.
Add examples demonstrating how one can list available providers and the
DTrace probes provided by a provider.
Workaround bhyve virtual disks operation on top of GEOM providers.
GEOM does not support scatter/gather lists in its I/Os. Such requests
are cut in pieces by physio(), that may be problematic, if those pieces
are not multiple of provider's sector size. If such case is detected,
move the data through temporary sequential buffer.
Initialize td_sel in the thread_init(). Struct thread is not zeroed
on the initial allocation, but seltdinit() assumes that td_sel is NULL
or a valid pointer. Note that thread_fini()/seltdfini() also relies
on this, but correctly resets td_sel to NULL.
Change ipsec_address() and ipsec_logsastr() functions to take two
additional arguments - buffer and size of this buffer.
ipsec_address() is used to convert sockaddr structure to presentation
format. The IPv6 part of this function returns pointer to the on-stack
buffer and at the moment when it will be used by caller, it becames
invalid. IPv4 version uses 4 static buffers and returns pointer to
new buffer each time when it called. But anyway it is still possible
to get corrupted data when several threads will use this function.
ipsec_logsastr() is used to format string about SA entry. It also
uses static buffer and has the same problem with concurrent threads.
To fix these problems add the buffer pointer and size of this
buffer to arguments. Now each caller will pass buffer and its size
to these functions. Also convert all places where these functions
are used (except disabled code).
And now ipsec_address() uses inet_ntop() function from libkern.
Requeue mbuf via netisr when we use IPSec tunnel mode and IPv6.
ipsec6_common_input_cb() uses partial copy of ip6_input() to parse
headers. But this isn't correct, when we use tunnel mode IPSec.
When we stripped outer IPv6 header from the decrypted packet, it
can become IPv4 packet and should be handled by ip_input. Also when
we use tunnel mode IPSec with IPv6 traffic, we should pass decrypted
packet with inner IPv6 header to ip6_input, it will correctly handle
it and also can decide to forward it.
The "skip" variable points to offset where payload starts. In tunnel
mode we reset it to zero after stripping the outer header. So, when
it is zero, we should requeue mbuf via netisr.