Limit the number of cylinder groups that will be searched when
trying to build a cluster. The limit is tunable using the sysctl
vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups
per block allocation. It was previously limited to the number of
cylinder groups in the filesystem per block allocation. When there
were no clusters of the needed size left, it repeatedly searched
the whole filesystem for a non-existent cluster on every block
allocation. The result was very slow filesystem allocation with
100% CPU utilization. The old behavior can be had by setting
vfs.ffs.maxclustersearch to a huge number (1,000,000).
This change affects only the layout policy routines so is not able
to interfere with the integrity of the filesystem.
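The bounded search described above can be pictured with a small userland sketch. It is not the FFS code: find_cluster_cg(), the has_cluster array and the maxclustersearch variable here are hypothetical stand-ins for the allocator state and the sysctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vfs.ffs.maxclustersearch tunable. */
static int maxclustersearch = 10;

/*
 * Look for a cylinder group that still has a cluster of the needed size,
 * starting at 'start' and wrapping around, but give up after at most
 * maxclustersearch groups. has_cluster[i] is an assumed per-group flag.
 * Returns the group index, or -1 if nothing was found within the limit.
 */
static int
find_cluster_cg(const bool *has_cluster, int ncg, int start)
{
	int cg, i, limit;

	assert(ncg > 0 && start >= 0 && start < ncg);
	limit = maxclustersearch < ncg ? maxclustersearch : ncg;
	for (i = 0; i < limit; i++) {
		cg = (start + i) % ncg;
		if (has_cluster[cg])
			return (cg);
	}
	return (-1);
}
```

Raising the limit above the number of groups makes the loop scan every group once, which corresponds to the old behavior.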
Use correct length for sparse uiomove(). It must be clipped to
the page size; len is the total transfer length, which may be larger
than zero_region.
Reported and tested by: clusteradm (gjb)
Sponsored by: The FreeBSD Foundation
X-MFC-With: r281442
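As an illustration of the clipping issue, the following userland sketch copies zeros from a fixed-size zero buffer in chunks. copy_zeros() and ZERO_REGION_SIZE are assumptions made for the example; memcpy() stands in for uiomove().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ZERO_REGION_SIZE 4096	/* assumed size of the shared zero buffer */

static const char zero_region[ZERO_REGION_SIZE];

/*
 * Fill 'len' bytes at 'dst' with zeros taken from zero_region.
 * Each copy is clipped to the buffer size; using the total 'len'
 * for a single copy would read past the end of zero_region.
 * Returns the number of copies that were needed.
 */
static size_t
copy_zeros(char *dst, size_t len)
{
	size_t chunk, ncopies = 0;

	while (len > 0) {
		chunk = len < ZERO_REGION_SIZE ? len : ZERO_REGION_SIZE;
		memcpy(dst, zero_region, chunk);
		dst += chunk;
		len -= chunk;
		ncopies++;
	}
	return (ncopies);
}
```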
gcc 4.9 added support for a new alignment attribute, alloc_align:
The alloc_align attribute is used to tell the compiler that the function
return value points to memory, where the returned pointer's minimum
alignment is given by one of the function's parameters. GCC uses this
information to improve pointer alignment analysis.
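A minimal sketch of how such an attribute might be applied follows. MY_ALLOC_ALIGN and my_aligned_alloc() are hypothetical names, not the definitions added by this change, and compilers that lack __has_attribute simply get an empty macro here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macro; expands to nothing on older compilers. */
#if __has_attribute(alloc_align)
#define MY_ALLOC_ALIGN(n)	__attribute__((alloc_align(n)))
#else
#define MY_ALLOC_ALIGN(n)
#endif

/* Parameter 1 gives the minimum alignment of the returned pointer. */
void	*my_aligned_alloc(size_t align, size_t size) MY_ALLOC_ALIGN(1);

void *
my_aligned_alloc(size_t align, size_t size)
{
	/* C11 aligned_alloc() wants the size to be a multiple of align. */
	size = (size + align - 1) / align * align;
	return (aligned_alloc(align, size));
}
```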
vt_is_cursor_in_area needs to return true if any part of the mouse
cursor is visible in the rectangle area. Replace the existing test with
a simpler version of a test for overlapping rectangles.
Differential Revision: https://reviews.freebsd.org/D2356
Reviewed by: ray
Sponsored by: The FreeBSD Foundation
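The overlapping-rectangles test mentioned for vt_is_cursor_in_area can be sketched generically as below. The struct layout and the half-open coordinate convention are assumptions for the example, not the vt(4) data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed convention: covers [x1, x2) horizontally and [y1, y2) vertically. */
struct rect {
	int x1, y1, x2, y2;
};

/*
 * Two rectangles overlap when each one starts before the other ends,
 * on both axes. Partial overlap counts as overlap.
 */
static bool
rects_overlap(const struct rect *a, const struct rect *b)
{
	return (a->x1 < b->x2 && b->x1 < a->x2 &&
	    a->y1 < b->y2 && b->y1 < a->y2);
}
```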
Revert r281451. It causes a panic/hang early in boot for a number of
users, myself included. The original code is likely papering over a
larger bug that needs to be explored, but for now get things back to
a working state.
Obtained from: Netflix, Inc.
MFC after: immediately
Watchdog drivers need to support rearming the watchdog in contexts which
are not permitted to sleep. Only use the IPMI watchdog with backends
which poll driver-initiated requests to meet this requirement.
In practice this means that watchdogs will no longer be used on systems
that use the SSIF backend.
Rename the kld for oce(4) to if_oce.ko. ifconfig(8) has special knowledge
about kld filenames for network drivers that requires them to follow the
pattern of if_<foo>. This also fixes the existing documentation in the
manpage which says to use if_oce_load=YES in loader.conf.
Update this driver to not save copies of registers that are no longer used
after r281874. While here, also update it to always write the parent's
PCI bus number to the primary bus register.
Small changes to locale-related man pages.
Fix a missing .h and change the recommended include for the POSIX2008 functions from xlocale.h to locale.h. Including xlocale.h is for legacy / Darwin compatibility so should not be encouraged.
dtrace_panic() would previously call into some unfinished Solaris compatibility code and
return without actually calling panic(9). The compatibility code is
unneeded, however, so just remove it and have dtrace_panic() call vpanic(9)
directly.
Don't propagate SIOCSIFCAPS from a vlan(4) to its parent. This leads to
the quite unexpected result of toggling capabilities on the neighbouring
vlan(4) interfaces.
Create the arm64/aarch64 VM disk image as MBR instead of
GPT scheme. UEFI needs to know the unique partition GUID
with GPT, which changes each time. Specifically, the QEMU
EFI BIOS file has this hard-coded.[1]
Since the GPT labels are now unavailable, unconditionally
label the root filesystem as 'rootfs' with newfs(8), since
it does not hurt anything anywhere else. For the arm64 case,
'/' is mounted from /dev/ufs/rootfs; for all other VM images,
'/' is mounted from /dev/gpt/rootfs.
Unfortunately, since the /dev/gpt/swapfs label is also lost,
set NOSWAP=1 for the arm64/aarch64 images. This is temporary,
until I figure out a scalable solution to this. But, a certain
piece of software was written "very fast", and ended up living
for 15 years. We can deal with this for a week or so.
Information from: andrew, emaste [1]
Sponsored by: The FreeBSD Foundation
Update the pci_cfg_save/restore routines to operate on bridge devices
(type 1 and type 2) as well as leaf devices (type 0). In particular,
this allows the existing PCI bus logic to save and restore capability
registers such as MSI and PCI-express for bridge devices rather than
requiring that code to be duplicated in bridge drivers. It also means
that bridge drivers no longer need to save and restore basic registers
such as the PCI command register or BARs, nor manage power states for the
bridge device.
While here, pci_setup_secbus() has been changed to initialize the 'sec'
and 'sub' fields in the 'secbus' structure instead of requiring the pcib
and pccbb drivers to do this in the NEW_PCIB + PCI_RES_BUS case.
Don't explicitly manage power states for PCI-PCI bridge devices in the
driver's suspend and resume routines. These have been redundant no-ops
since r214065 changed the PCI bus driver to manage power states for
all devices (including type 1/2 bridge devices) during suspend and resume.
The minimum grant and maximum latency PCI config registers are only valid
for type 0 devices, not type 1 or 2 bridges. Don't read them for bridge
devices during bus scans and return an error when attempting to read them
as ivars for bridge devices.
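A rough sketch of the header-type check follows. The struct, the function name and the choice of EINVAL are assumptions made for illustration; only the rule that type 1 and type 2 headers have no minimum grant register comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HDRTYPE_MASK	0x7f	/* bit 7 is the multi-function flag */
#define HDRTYPE_NORMAL	0x00	/* type 0: leaf device */
#define HDRTYPE_BRIDGE	0x01	/* type 1: PCI-PCI bridge */
#define HDRTYPE_CARDBUS	0x02	/* type 2: CardBus bridge */

/* Hypothetical cached copy of a device's config header. */
struct fake_pci_cfg {
	uint8_t hdrtype;
	uint8_t mingnt;
	uint8_t maxlat;
};

/* Return the minimum grant value, or an error for bridge devices. */
static int
read_mingnt(const struct fake_pci_cfg *cfg, uint8_t *result)
{
	if ((cfg->hdrtype & HDRTYPE_MASK) != HDRTYPE_NORMAL)
		return (EINVAL);
	*result = cfg->mingnt;
	return (0);
}
```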
Add definition for the argument_with_type_tag attribute.
This attribute originates in clang and brings support for checking types
of variadic functions' arguments for functions like fcntl() and ioctl().
Unfortunately lint(1) will complain about them: in particular as one of
the parameters is the function being tagged. For now define this attribute
in the lint-sensitive section.
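For readers unfamiliar with the attribute, here is a small sketch modeled on the fcntl() example in the clang documentation. my_ctl(), MY_CMD_SET_INT and the wrapper macros are hypothetical; on compilers without the attribute the macros expand to nothing and no checking happens.

```c
#include <assert.h>
#include <stdarg.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(argument_with_type_tag) && \
    __has_attribute(type_tag_for_datatype)
#define MY_ARG_WITH_TYPE_TAG(kind, arg_idx, tag_idx) \
	__attribute__((argument_with_type_tag(kind, arg_idx, tag_idx)))
#define MY_TYPE_TAG(kind, type) \
	__attribute__((type_tag_for_datatype(kind, type)))
#else
#define MY_ARG_WITH_TYPE_TAG(kind, arg_idx, tag_idx)
#define MY_TYPE_TAG(kind, type)
#endif

/* Hypothetical command whose extra argument must be an int. */
static const int MY_CMD_SET_INT MY_TYPE_TAG(my_ctl, int) = 1;

/* Argument 3 is checked against the type tag passed as argument 2. */
int	my_ctl(int handle, int cmd, ...) MY_ARG_WITH_TYPE_TAG(my_ctl, 3, 2);

int
my_ctl(int handle, int cmd, ...)
{
	va_list ap;
	int value = -1;

	(void)handle;
	va_start(ap, cmd);
	if (cmd == MY_CMD_SET_INT)
		value = va_arg(ap, int);
	va_end(ap);
	return (value);
}
```

With clang, passing a mismatched type (for example a pointer) as the third argument alongside MY_CMD_SET_INT should be diagnosed at compile time.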
Make AIO not allocate pbufs for unmapped I/O, like r281825.
While there, make a few more performance optimizations.
On a 40-core system doing many 512-byte AIO reads from an array of raw SSDs
this change removes lock congestion inside the pbuf allocator and devfs,
and the bottleneck on a single AIO completion taskqueue thread. It improves
peak AIO performance from ~600K to ~1.3M IOPS.
zlib is not network-specific code and would
be better as part of libkern instead.
Move zlib.h and zutil.h from net/ to sys/
Update includes to use sys/zlib.h and sys/zutil.h instead of net/
Submitted by: Steve Kiernan <stevek@juniper.net>
Obtained from: Juniper Networks, Inc.
GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28
Relnotes: yes
Move some common code from sys/amd64/amd64/machdep.c and
sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most
of the code is related to the idle handling.
Discussed with: pluknet
Sponsored by: The FreeBSD Foundation
* Add VCREAT flag to indicate when a new file is being created
* Add VVERIFY to indicate verification is required
* Both VCREAT and VVERIFY are only passed on the MAC method vnode_check_open
and are removed from the accmode after
* Add O_VERIFY flag to rtld open of objects
* Add 'v' flag to __sflags to set O_VERIFY flag.
Submitted by: Steve Kiernan <stevek@juniper.net>
Obtained from: Juniper Networks, Inc.
GitHub Pull Request: https://github.com/freebsd/freebsd/pull/27
Relnotes: yes
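The handling of the two new access-mode flags can be sketched as follows. All names and bit values are hypothetical (prefixed with my_ or MY_); the sketch only illustrates passing the flags to the MAC open check and masking them off afterwards.

```c
#include <assert.h>

typedef unsigned int my_accmode_t;

/* Hypothetical bit values, chosen only for this example. */
#define MY_VREAD	0x01u
#define MY_VWRITE	0x02u
#define MY_VCREAT	0x40u	/* a new file is being created */
#define MY_VVERIFY	0x80u	/* verification is required */

static my_accmode_t last_mac_accmode;	/* records what the hook saw */

/* Stand-in for the MAC vnode_check_open method. */
static int
my_mac_vnode_check_open(my_accmode_t accmode)
{
	last_mac_accmode = accmode;
	return (0);
}

/*
 * Run the MAC check with the full accmode, then strip the flags that
 * only the MAC method is meant to see before any later access checks.
 */
static int
my_open_check(my_accmode_t accmode, my_accmode_t *remaining)
{
	int error;

	error = my_mac_vnode_check_open(accmode);
	if (error != 0)
		return (error);
	*remaining = accmode & ~(MY_VCREAT | MY_VVERIFY);
	return (0);
}
```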
Improve carp(4) locking:
- Use the carp_sx to serialize not only CARP ioctls, but also carp_attach()
and carp_detach().
- Use cif_mtx only to lock access to the linked list.
- These locking changes allow us to do some memory allocations with M_WAITOK
and also properly call callout_drain() in carp_destroy().
- In carp_attach() assert that ifaddr isn't attached. We always come here
with a pristine address from in[6]_control().
hiren [Tue, 21 Apr 2015 20:24:15 +0000 (20:24 +0000)]
For igb(4), when we are doing multiqueue, we are all set up to have a full 32-bit
RSS hash from the card. We do not need to hide that under "ifdef RSS" and should
expose that by default so others like lagg(4) can use that and avoid hashing the
traffic by themselves.
While here, improve comments and get rid of hidden/unimplemented RSS support
code for UDP.
Modify kern___getcwd() to take max pathlen limit as an additional
argument. This will be used for the Linux emulation layer - for Linux,
PATH_MAX is 4096 and not 1024.
Differential Revision: https://reviews.freebsd.org/D2335
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
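The effect of passing the path limit as an argument can be shown with a userland sketch. copy_cwd() is a hypothetical helper, not kern___getcwd(), and returning ERANGE for a too-small buffer is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the working directory path 'cwd' into 'buf'. The caller's buffer
 * length is clamped to 'path_max', so a native caller could pass 1024
 * while a Linux-emulation caller passes 4096.
 */
static int
copy_cwd(const char *cwd, char *buf, size_t buflen, size_t path_max)
{
	size_t need = strlen(cwd) + 1;	/* include the terminating NUL */

	if (buflen > path_max)
		buflen = path_max;
	if (need > buflen)
		return (ERANGE);
	memcpy(buf, cwd, need);
	return (0);
}
```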
Fix numerous issues in iic(4) and iicbus(4):
--Allow multiple open iic fds by storing addressing state in cdevpriv
--Fix, as much as possible, the baked-in race conditions in the iic
ioctl interface by requesting bus ownership on I2CSTART, releasing it on
I2CSTOP/I2CRSTCARD, and requiring bus ownership by the current cdevpriv
to use the I/O ioctls
--Reduce internal iic buffer size and remove 1K read/write limit by
iteratively calling iicbus_read/iicbus_write
--Eliminate dynamic allocation in I2CWRITE/I2CREAD
--Move handling of I2CRDWR to separate function and improve error handling
--Add new I2CSADDR ioctl to store address in current cdevpriv so that
I2CSTART is not needed for read(2)/write(2) to work
--Redesign iicbus_request_bus() and iicbus_release_bus():
    --iicbus_request_bus() no longer falls through if the bus is already
    owned by the requesting device. Multiple threads on the same device may
    want exclusive access. Also, iicbus_release_bus() was never
    device-recursive anyway.
    --Previously, if IICBUS_CALLBACK failed in iicbus_release_bus(), but
    the following iicbus_poll() call succeeded, IICBUS_CALLBACK would not be
    issued again
    --Do not hold iicbus mtx during IICBUS_CALLBACK call. There are
    several drivers that may sleep in IICBUS_CALLBACK, if IIC_WAIT is passed.
    --Do not loop in iicbus_request_bus if IICBUS_CALLBACK returns
    EWOULDBLOCK; instead pass that to the caller so that it can retry if so
    desired.
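The last point moves the retry decision to the caller. A caller-side retry might look like the sketch below; fake_request_bus() and busy_count are hypothetical stand-ins for iicbus_request_bus() and the bus state.

```c
#include <assert.h>
#include <errno.h>

/* Number of upcoming attempts that will find the bus busy (test knob). */
static int busy_count;

/* Stand-in for a bus request that reports EWOULDBLOCK instead of looping. */
static int
fake_request_bus(void)
{
	if (busy_count > 0) {
		busy_count--;
		return (EWOULDBLOCK);
	}
	return (0);
}

/*
 * Caller-side policy: retry a bounded number of times, and only when
 * the failure was EWOULDBLOCK. Any other error is returned at once.
 */
static int
request_bus_with_retry(int max_tries)
{
	int error = EWOULDBLOCK, i;

	for (i = 0; i < max_tries && error == EWOULDBLOCK; i++)
		error = fake_request_bus();
	return (error);
}
```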
Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs are a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating a pbuf only to immediately convert it to a bio,
allocate the bio directly. If the buffer needs kernel mapping, then a pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.
On a 40-core system doing many 512-byte reads from user level to an array of
raw SSDs this change removes huge lock congestion inside the pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS. On my previous
24-core system this problem also existed, but was less serious.
The comment on BMCR data in the if_media entry is wrong. The ifm_data stores
an array index, not a value for the BMCR register. In the case of IFM_10_T there
could be either MII_MEDIA_10_T or MII_MEDIA_10_T_FDX, which are 1 and 2,
respectively. Neither matches a valid BMCR value. My guess is that this
write is harmless, since later mii_phy_setmedia() would write a proper
value there.
The code has been here since the initial checkin. Note that case IFM_100_TX has
the same comment, but a proper value of BMCR_ISO is written. So, collapse
the two cases into one, always writing BMCR_ISO there.
Since brgphy doesn't call mii_phy_setmedia(), there is no reason to
set any value in ifm_data. If brgphy were ever to call mii_phy_setmedia(),
then the value of BRGPHY_S1000 | BRGPHY_BMCR_FDX would trigger a KASSERT.
While here, remove the obfuscating macro and wrap long lines.
- For executables search for matching (B) global uninitialized BSS symbols from
linked libraries. Only do this for BSS symbols that have a size which avoids
__bss_start. Without this some libraries would be considered unneeded even
though they were providing a B symbol.
- Add in the symbols from crt1.o to cover a handful of common unresolved symbols.
- Consider (C) common data symbols as provided by libraries/crt1.
- Move libkey() function to more appropriate place.
Merge the following from ^/projects/release-arm64 to allow
building FreeBSD/arm64 VM images and memstick.img installation
medium:
r281786, r281788, r281792:
r281786:
Add support for building arm64/aarch64 virtual machine images.
r281788:
Copy amd64/make-memstick.sh to arm64/make-memstick.sh for
aarch64 memory stick images.
Although arm64 does not yet have USB support, the memstick
image should be bootable with certain virtualization tools,
such as qemu.
r281792:
Add a buildenv_setup() prototype, intended to be overridden as
needed.
For example, the arm64/aarch64 build needs devel/aarch64-binutils,
so buildenv_setup() in the release.conf for this architecture
handles the installation of the port before buildworld/buildkernel.
A couple of fields are still exposed via struct bpf_if_ext so that
bpf_peers_present() can be inlined into its callers. However, this change
eliminates some type duplication in the resulting CTF container, since
otherwise ctfmerge(1) propagates the duplication through all types that
contain a struct bpf_if.