Improve carp(4) locking:
- Use the carp_sx to serialize not only CARP ioctls, but also carp_attach()
and carp_detach().
- Use cif_mtx to lock only access to those the linked list.
- These locking changes allow us to do some memory allocations with M_WAITOK
and also properly call callout_drain() in carp_destroy().
- In carp_attach() assert that ifaddr isn't attached. We always come here
with a pristine address from in[6]_control().
hiren [Tue, 21 Apr 2015 20:24:15 +0000 (20:24 +0000)]
For igb(4), when we are doing multiqueue, we are all setup to have full 32bit
RSS hash from the card. We do not need to hide that under "ifdef RSS" and should
expose that by default so others like lagg(4) can use that and avoid hashing the
traffic by themselves.
While here, improve comments and get rid of hidden/unimplemented RSS support
code for UDP.
Modify kern___getcwd() to take max pathlen limit as an additional
argument. This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.
Differential Revision: https://reviews.freebsd.org/D2335
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Fix numerous issues in iic(4) and iicbus(4):
--Allow multiple open iic fds by storing addressing state in cdevpriv
--Fix, as much as possible, the baked-in race conditions in the iic
ioctl interface by requesting bus ownership on I2CSTART, releasing it on
I2CSTOP/I2CRSTCARD, and requiring bus ownership by the current cdevpriv
to use the I/O ioctls
--Reduce internal iic buffer size and remove 1K read/write limit by
iteratively calling iicbus_read/iicbus_write
--Eliminate dynamic allocation in I2CWRITE/I2CREAD
--Move handling of I2CRDWR to separate function and improve error handling
--Add new I2CSADDR ioctl to store address in current cdevpriv so that
I2CSTART is not needed for read(2)/write(2) to work
--Redesign iicbus_request_bus() and iicbus_release_bus():
--iicbus_request_bus() no longer falls through if the bus is already
owned by the requesting device. Multiple threads on the same device may
want exclusive access. Also, iicbus_release_bus() was never
device-recursive anyway.
--Previously, if IICBUS_CALLBACK failed in iicbus_release_bus(), but
the following iicbus_poll() call succeeded, IICBUS_CALLBACK would not be
issued again
--Do not hold iicbus mtx during IICBUS_CALLBACK call. There are
several drivers that may sleep in IICBUS_CALLBACK, if IIC_WAIT is passed.
--Do not loop in iicbus_request_bus if IICBUS_CALLBACK returns
EWOULDBLOCK; instead pass that to the caller so that it can retry if so
desired.
Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs is a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here. If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.
On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS. On my previous
24-core system this problem also existed, but was less serious.
The comment on BMCR data in if_media entry is wrong. The ifm_data stores
the index array, not a value for BMCR register. In case of IFM_10_T there
could be either MII_MEDIA_10_T or MII_MEDIA_10_T_FDX, which are 1 and 2,
accordingly. Neither matches a valid BMCR value. My guessing is that this
write is harmless, since later mii_phy_setmedia() would write a proper
value there.
The code is here since the initial checkin. Note that case IFM_100_TX has
the same comment, but a proper value of BMCR_ISO is written. So, collapse
two cases into one, always writing there BMCR_ISO.
Since brgphy doesn't call mii_phy_setmedia(), there is no reason to
set any value to ifm_data. If brgphy ever to call mii_phy_setmedia(),
then the value of BRGPHY_S1000 | BRGPHY_BMCR_FDX will trigger KASSERT.
While here, remove the obfuscating macro and wrap long lines.
- For executables search for matching (B) global uninitialized BSS symbols from
linked libraries. Only do this for BSS symbols that have a size which avoids
__bss_start. Without this some libraries would be considered unneeded even
though they were providing a B symbol.
- Add in the symbols from crt1.o to cover a handful of common unresolved symbols.
- Consider (C) common data symbols as provided by libraries/crt1.
- Move libkey() function to more appropriate place.
Merge the following from ^/projects/release-arm64 to allow
building FreeBSD/arm64 VM images and memstick.img installation
medium:
r281786, r281788, r281792:
r281786:
Add support for building arm64/aarch64 virtual machine images.
r281788:
Copy amd64/make-memstick.sh to arm64/make-memstick.sh for
aarch64 memory stick images.
Although arm64 does not yet have USB support, the memstick
image should be bootable with certain virtualization tools,
such as qemu.
r281792:
Add a buildenv_setup() prototype, intended to be overridden as
needed.
For example, the arm64/aarch64 build needs devel/aarch64-binutils,
so buildenv_setup() in the release.conf for this architecture
handles the installation of the port before buildworld/buildkernel.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.
- Speedup significantly by not using subshells for data already fetched.
Ran against /usr/local/sbin/pkg:
Before: 25.12 real 12.41 user 33.14 sys
After: 0.53 real 0.49 user 0.13 sys
- Exit with 1 if any missing or unresolved symbol is detected.
- Add option '-U' to skip looking up unresolved symbols.
- Don't consider provided weak objects as unresolved (nm V).
phabricator related changes:
- don't lint either contrib or crypto: these are both externally written
directories
- add additional linters for spelling (check common typos like teh ->
the)
- chmod linter checks for executible bit on bad files
- merge-conflict checks for merge conflict tokens then may have been
resolved incorrectly
- filename checks for back characters in filenames
- json for json syntax correctness
- remove history.immutable: it is meaningless on subversion, and causes
workflow problems when trying to use git. It it set to 'true' by
default with hg
Always send log(9) messages to the message buffer.
It is truer to the semantics of logging for messages to *always*
go to the message buffer, where they can eventually be collected
and, in fact, be put into a log file.
This restores the behavior prior to r70239, which seems to have
changed it inadvertently.
Submitted by: Eric Badger <eric@badgerio.us>
Reviewed by: jhb
Approved by: kib (mentor)
Obtained from: Dell Inc.
MFC after: 1 week
When building VM disk images, vm_copy_base() uses tar(1) to
copy the userland from one md(4)-mounted filesystem to a clean
filesystem to prevent remnants of files that were added and
removed from resulting in an unclean filesystem. When newfs(8)
creates the first filesystem with journaled soft-updates enabled,
the /.sujournal file in the new filesystem cannot be overwritten
by the /.sujournal in the original filesystem.
To avoid this particular error case, do not enable journaled
soft-updates when creating the md(4)-backed filesystems, and
instead use tunefs(8) to enable journaled soft-updates after
the new filesystem is populated in vm_copy_base().
While here, fix a long standing bug where the build environment
/boot files were used by mkimg(1) when creating the VM disk
images by using the files in .OBJDIR.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
vidcontrol: skip invalid video modes returned by vt(4)
vt(4) has a stub CONS_MODEINFO ioctl that does not provide any data
but returns success. This needs to be fixed in the kernel, but address
it in vidcontrol(1) as well in case it's run on an older kernel.
Reviewed by: bde
Sponsored by: The FreeBSD Foundation
Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.
Remove code to support the top of the stack layout for FreeBSD 1.x/2.x
kernel, but keep explanation of the old ps_strings structure to make
it clear what sanity check tries to accomplish.
Noted by: Oliver Pinter <oliver.pinter@hardenedbsd.org>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
ed(1): Fix [-Werror=logical-not-parentheses]
/usr/src/bin/ed/glbl.c:64:36: error: logical not is only applied to
theleft hand side of comparison [-Werror=logical-not-parentheses]
marius [Sun, 19 Apr 2015 20:15:57 +0000 (20:15 +0000)]
Refine the workaround for Intel HSD131 [1] added in r269052:
- Use the full mask described by the erratum as with a sufficiently high
number of these false-positives, the overflow bit (bit 62) additionally
gets set [7].
- HSD131 has been brought into several other Haswell-derived CPUs including
to the next generation, i. e. Intel Broadwell. Thus, also skip reporting of
these benign errors by default on CPU models affected by HSM142, HSW131 and
BDM48 [2 - 5], describing the HSD131 silicon bug for additional models.
Also, Celeron 2955U with a CPU ID of 0x45 have been reported to be covered
by this fault [6], with the specification update concerned with HSM142 [2]
only referring to 0x3c and 0x46.
Submitted by: David Froehlich [7]
MFC after: 3 days
adrian [Sun, 19 Apr 2015 17:15:55 +0000 (17:15 +0000)]
Refactor out the _PXM -> VM domain lookup done in ACPI, in preparation for
its use in upcoming code.
This is inspired by something in jhb's NUMA IRQ allocation patchset.
However, the tricky bit here is that the PXM lookup for a node may
fail, requiring a lookup on the parent node. So if it doesn't
exist, don't fail - just go up to the parent. Only error out of the
lookup is the ACPI lookup returns an error.
adrian [Sun, 19 Apr 2015 17:07:51 +0000 (17:07 +0000)]
Update pkt-gen to optionally use randomised source/destination
IPv4 addresses/ports.
When doing traffic testing of actual code that /does/ things to the
packet (rather than say, 'bridge.c'), it's typically a good idea to
use a variety of cache-busting and flow-tracking-busting packet
spreads. The pkt-gen method of testing an IP range was to walk
it linearly - which is fine, but not useful enough.
This can be used to completely randomize the source/destination
addresses (eg to test out flow-tracking-busting) and to keep the
destination fixed whilst randomising the source (eg to test out
what a DDoS may look like.)
identd: remove redundant zeroing
se_rpc_lowvers was set to 0 twice, so remove one of them
I can not find any other variable which they may have been a typo of.
README: changes and fixups
Two orthogonal goals:
- try to make README look a little nicer on phabricator by using
Remarkup syntax for commands (using `` instead of using a closing ')
- try to make README look a little nicer on github.
- Don't encourage `make world` when the handbook specifies otherwise
- Change language around documentation to be a bit clearer
sh: Fix the trap builtin to be POSIX-compliant for 'trap exit SIG' and 'trap n n...'.
The parser considered 'trap exit INT' to reset the default for both EXIT and
INT. This beahvior is not POSIX compliant. This was avoided if a value was
specified for 'exit', but then disallows exiting with the signal received. A
possible workaround is using ' exit'.
However POSIX does allow this type of behavior if the parameters are all
integers. Fix the handling for this and clarify its support in the manpage
since it is specifically allowed by POSIX.
The lseek(2), mmap(2), truncate(2), ftruncate(2), pread(2), and
pwrite(2) syscalls are wrapped to provide compatibility with pre-7.x
kernels which required padding before the off_t parameter. The
fcntl(2) contains compatibility code to handle kernels before the
struct flock was changed during the 8.x CURRENT development. The
shims were reasonable to allow easier revert to the older kernel at
that time.
Now, two or three major releases later, shims do not serve any
purpose. Such old kernels cannot handle current libc, so revert the
compatibility code.
Make padded syscalls support conditional under the COMPAT6 config
option. For COMPAT32, the syscalls were under COMPAT6 already.
Remove WITHOUT_SYSCALL_COMPAT build option, which only purpose was to
(partially) disable the removed shims.
Reviewed by: jhb, imp (previous versions)
Discussed with: peter
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This supports e500v1, e500v2, and e500mc. Tested only on e500v2, but the
performance counters are identical across all, with e500mc having some
additional events.
Make wait6(2), waitid(3) and ppoll(2) cancellation points. The
waitid() function is required to be cancellable by the standard. The
wait6() and ppoll() follow the other syscalls in their groups.
Reviewed by: jhb, jilles (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add manual pages for the io, ip, proc, sched, tcp and udp DTrace providers.
The format of these pages is somewhat experimental, so they may be subject
to further tweaking.
They were added for compatibility with the sched provider in Solaris and
illumos, but our sched provider is already incompatible since it uses native
types, so there isn't much point in keeping them around.
SDT(9): add a section on SDT providers, mentioning the "sdt" provider.
Add examples demonstrating how one can list available providers and the
DTrace probes provided by a provider.