jhb [Fri, 14 Oct 2016 21:51:50 +0000 (21:51 +0000)]
Reprogram I/O APIC interrupt pins when registering an I/O APIC.
All I/O APIC pins are masked when an I/O APIC is first probed. The
APIC enumerator (MP Table or MADT) then parses its associated tables to
configure individual pins to set custom delivery modes or alternate
routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt
pins are left masked until the first interrupt is assigned. However,
pins with unusual settings (e.g. NMI or SMI) are never assigned an
interrupt and thus never re-programmed. The I/O APIC code used to
reprogram all interrupt pins during registration but this was lost in
r151979.
In theory, this is mostly a no-op as the ACPI APIC table does not
include a way to enumerate NMI or SMI pins for the I/O APIC, so only
systems using an MP Table would be affected.
jhb [Fri, 14 Oct 2016 20:01:07 +0000 (20:01 +0000)]
Drop support for using mmap() with /dev/kmem.
Using the device pager with /dev/kmem is not stable since KVA mappings
are transient, but the device pager caches the PA associated with a
given offset forever. Interestingly, mips' implementation of
memmap() already refused requests for /dev/kmem.
Note that kvm_read/kvm_write do not use mmap, but use read and write on
/dev/kmem, so this should not affect libkvm users.
ambrisko [Fri, 14 Oct 2016 17:10:53 +0000 (17:10 +0000)]
In UEFI mode expose the SMBIOS anchor base address via kenv so the kernel
etc. can find out where the SMBIOS entry point is located. In pure
UEFI mode the BIOS is not mapped into the standard address space so the
SMBIOS table might not appear between 0xf0000 and 0xfffff. The
UEFI environment can report this the location of the anchor. If it is
reported then expose it as hint.smbios.0.mem. This can then be used
by other tools. However, we should make smbios(4) useful and have it
take this value and provide accesor function so ipmi(4) etc. don't
have to parse and figure things about the SMBIOS table. I have some
simple patches to smbios(4) to expose this address as sysctl and
for ipmi(4) to get the base address. However, the real fix is to
have ipmi(4) ask smbios(4) for what it wants and have smbios(4)
parse it out and return it. This would make smbios(4) useful and reduce
duplicated code. If this address doesn't point to the anchor then
finding SMBIOS info. will fail as if this didn't exist. So there should
be no harm.
With this change and the following hack, dmidecode works on a bunch of
UEFI machines that I tested:
if kenv hint.smbios.0.mem > /dev/null
then
mkdir -p /sys/firmware/efi
mount -t tmpfs -o size=8k tmpfs /sys/firmware/efi
echo "SMBIOS=`kenv hint.smbios.0.mem`" > /sys/firmware/efi/systab
fi
Linux exposes this information via the /sys/firmware/efi/systab file which
dmidecode looks at. We should update dmidecode to do this the FreeBSD
way when we determine what that is!
imp [Fri, 14 Oct 2016 16:23:12 +0000 (16:23 +0000)]
Create a new linker set, Xficl_compile_set which contains a list of
functions to call at the appropriate time to register new forth
words. In the past we've done this with ifdef soup, but now if the
file is included in the build, we'll get the new forth words.
Use this new functionality to move the pci bios stuff out of loader.c
by moving it to biospci.c.
Move the pnp functionality to common/pnp.c.
Move the inb/outb forth words to the i386 sysdep.c file where their
implementation is defined.
Adjust the efi linker scripts and build machinery to cope.
his should be an invisible change to forth scripts and user
experience.
imp [Fri, 14 Oct 2016 16:23:05 +0000 (16:23 +0000)]
Remove fetching of pInterp. Currently, there's no actual effect other
than to store the location of a forth word that is subsequently never
used. It was last used before the 2.03 ficl upgrade in r51786. It was
only used from r43614 (so Feb-Sept 1999) on head and in the 3.x branch
(merged r43715 3.1 -> EOL). Remove it since nobody cared enough to
report the bug in the last 18 years rather than fix it. It's need
seems to have passed in the 2.03 ficl update.
imp [Fri, 14 Oct 2016 16:05:44 +0000 (16:05 +0000)]
The file /boot/boot.conf existed for the 3.0 release (r38764). It was
replaced by /boot/loader.rc for 3.1 (r42682). In May 2000, this was
documented as deprecated (r61942) (between FreeBSD 4.0 and
4.1). Remove it since it's not been the preferred method in 17 years
and has been deprecated for 16.
andrew [Fri, 14 Oct 2016 15:53:48 +0000 (15:53 +0000)]
Rework how we store the VFP registers in the pcb. This will be used when
creating a floating-point context within the kernel without having to move
the stored values in memory.
jtl [Fri, 14 Oct 2016 14:57:43 +0000 (14:57 +0000)]
The code currently resets the keepalive timer each time a packet is
received on a TCP session that has entered the ESTABLISHED state. This
results in a lot of calls to reset the keepalive timer.
This patch changes the behavior so we set the keepalive timer for the
keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will
first check to see if the session has been idle for TP_KEEPIDLE ticks. If
not, it will reschedule the keepalive timer for the time the session will
have been idle for TP_KEEPIDLE ticks.
For a session with regular communication, the keepalive timer should fire
approximately once every TP_KEEPIDLE ticks. For sessions with irregular
communication, the keepalive timer might fire more often. But, the
disruption from a periodic keepalive timer should be less than the regular
cost of resetting the keepalive timer on every packet.
(FWIW, this change saved approximately 1.73% of the busy CPU cycles on a
particular test system with a heavy TCP output load. Of course, the
actual impact is very specific to the particular hardware and workload.)
mav [Fri, 14 Oct 2016 12:03:04 +0000 (12:03 +0000)]
MFV r307314:
6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates
Using a benchmark which creates 2 million files in one TXG, I observe
that the thread running spa_sync() is on CPU almost the entire time we
are syncing, and therefore can be a performance bottleneck. About 50% of
the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls
zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was
modified (or created). In this benchmark, all the files are owned by the
same user/group, so all 2 million calls to zap_increment_int() are
modifying the same entry in the zap. The same issue exists for the
DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are
syncing, and when we finish, iterate over the in-memory map and modify
the ZAP once per entry. This reduces the number of calls to
zap_increment_int() from "number of objects modified" to "number of
owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark
by ~33%, from 11 seconds to 7 seconds.
Closes #107
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Fri, 14 Oct 2016 12:01:33 +0000 (12:01 +0000)]
MFV r307313:
5120 zfs should allow large block/gzip/raidz boot pool (loader project)
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Toomas Soome <tsoome@me.com>
sephe [Fri, 14 Oct 2016 05:32:47 +0000 (05:32 +0000)]
hyperv/stor: Fix off-by-one bug; this brings back TRIM support.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reported by: Lili Deng <v-lide microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8238
gonzo [Fri, 14 Oct 2016 03:37:35 +0000 (03:37 +0000)]
Add initial Raspberry Pi 3 support
RPI3 kernel config builds kernel compatible with latest upstream device
tree and firmware: https://github.com/raspberrypi/firmware/tree/master/boot
As of today it's 597c662a613df1144a6bc43e5f4505d83bd748ca
Default console is PL01x, so pi3-disable-bt dt overlay should be configured
in config.txt and stock U-Boot should be patched to use proper serial port.
Yet unsupported: SMP, VCHIQ, RNG driver. RNG requires some work due to
upstream device tree incompatibility.
Multiple people contributed to this work over time: db@, loos@, manu@
gonzo [Thu, 13 Oct 2016 23:29:24 +0000 (23:29 +0000)]
Fix BCM283x(Raspberry Pi) SDHCI driver for ARM64 build
- Revert BUS_SPACE_PHYSADDR back to rman_get_start. BUS_SPACE_PHYSADDR was
introduced in 2013 as temporary wrapper until proper solution appears.
It's ARM only and since we need this file for ARM64 build and no proper
API has been introduced - just revert the change and make sure it's
going to appear when people grep for BUS_SPACE_PHYSADDR in sources.
bapt [Thu, 13 Oct 2016 22:43:49 +0000 (22:43 +0000)]
Stop closing the network device when netbooting for loaders using the common
dev_net.c code.
The NETIF_OPEN_CLOSE_ONCE flag was added in r201932 to prevent that behaviour
on some architectures (sparc64 and powerpc64) the default was left to always
open and close the device for each open and close of a file by the loader
because it was necessary for u-boot on arm.
Since it has been added, the flag was turned on for every arches including the
u-boot loader for arm.
This also fixes netbooting on RPi3 (tested by gonzo@)
For the loader.efi it greatly speeds up netbooting
emaste [Thu, 13 Oct 2016 19:18:00 +0000 (19:18 +0000)]
libgcc_s: add libm dependencies from div{d,s,x}c3
compiler-rt's complex division support routines contain calls to
compiler builtins such as `__builtin_scalbnl`. Unfortunately Clang
turns these back into a call to `scalbnl`.
For now link libm's C version of the required support routines.
Reviewed by: ed
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8190
ed [Thu, 13 Oct 2016 18:25:40 +0000 (18:25 +0000)]
Improve typing of POSIX search tree functions.
Back in 2015 when I reimplemented these functions to use an AVL tree, I
was annoyed by the weakness of the typing of these functions. Both tree
nodes and keys are represented by 'void *', meaning that things like the
documentation for these functions are an absolute train wreck.
To make things worse, users of these functions need to cast the return
value of tfind()/tsearch() from 'void *' to 'type_of_key **' in order to
access the key. Technically speaking such casts violate aliasing rules.
I've observed actual breakages as a result of this by enabling features
like LTO.
I've filed a bug report at the Austin Group. Looking at the way the bug
got resolved, they made a pretty good step in the right direction. A new
type 'posix_tnode' has been added to correspond to tree nodes. It is
still defined as 'void' for source-level compatibility, but in the very
far future it could be replaced by a proper structure type containing a
key pointer.
kib [Thu, 13 Oct 2016 14:41:05 +0000 (14:41 +0000)]
Fix a race in vm_page_busy_sleep(9).
Suppose that we have an exclusively busy page, and a thread which can
accept shared-busy page. In this case, typical code waiting for the
page xbusy state to pass is
again:
VM_OBJECT_WLOCK(object);
...
if (vm_page_xbusied(m)) {
vm_page_lock(m);
VM_OBJECT_WUNLOCK(object); <---1
vm_page_busy_sleep(p, "vmopax");
goto again;
}
Suppose that the xbusy state owner locked the object, unbusied the
page and unlocked the object after we are at the line [1], but before we
executed the load of the busy_lock word in vm_page_busy_sleep(). If it
happens that there is still no waiters recorded for the busy state,
the xbusy owner did not acquired the page lock, so it proceeded.
More, suppose that some other thread happen to share-busy the page
after xbusy state was relinquished but before the m->busy_lock is read
in vm_page_busy_sleep(). Again, that thread only needs vm_object lock
to proceed. Then, vm_page_busy_sleep() reads busy_lock value equal to
the VPB_SHARERS_WORD(1).
In this case, all tests in vm_page_busy_sleep(9) pass and we are going
to sleep, despite the page being share-busied.
Update check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9)
to also accept shared-busy state if we only wait for the xbusy state to
pass.
Merge sequential if()s with the same 'then' clause in
vm_page_busy_sleep().
Note that the current code does not share-busy pages from parallel
threads, the only way to have more that one sbusy owner is right now
is to recurse.
Reported and tested by: pho (previous version)
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D8196
ngie [Thu, 13 Oct 2016 07:32:25 +0000 (07:32 +0000)]
Port contrib/netbsd-tests/fs/tmpfs/h_tools.c to FreeBSD
- Add inttypes.h #include for PRId64 macro
- Use FreeBSD's copy of getfh(2), which doesn't include a `fh_size` parameter.
Use sizeof(fhandle_t) instead as the size of fhp is always fixed as
fhandle_t, unlike NetBSD's copy of fhp, which is void*.
avg [Thu, 13 Oct 2016 07:25:18 +0000 (07:25 +0000)]
convert iicsmb to use iicbus_transfer for all operations
Previously the driver used more low level operations like iicbus_start
and iicbus_write. The problem is that those operations are not
implemented by iicbus(4) and the calls were effectively routed to
a driver to which the bus is attached.
But not all of the controllers implement such low level operations
while all of the drivers are expected to have iicbus_transfer.
While there fix incorrect implementation of iicsmb_bwrite and iicsmb_bread.
The former should send a byte count before the actual bytes, while the
latter should first receive the byte count and then receive the bytes.
I have tested only these commands:
- quick (r/w)
- send byte
- receive byte
- read byte
- write byte
ngie [Thu, 13 Oct 2016 07:02:54 +0000 (07:02 +0000)]
Skip :uchg on FreeBSD
Unfortunately removing files with uchg set always succeeds with root on
FreeBSD. Unfortunately running the test as an unprivileged user isn't doable
because mounting tmpfs requires root
imp [Thu, 13 Oct 2016 06:56:23 +0000 (06:56 +0000)]
Fix building on i386 and arm. But 'public domain' headers on the files
with no creative content. Include "lost" changes from git:
o Use /dev/efi instead of /dev/efidev
o Remove redundant NULL checks.
avg [Thu, 13 Oct 2016 06:19:54 +0000 (06:19 +0000)]
rc.d/zfsbe: a new script designed for boot environment support
Currently zfsbe ensures that subordinate filesystems are mounted at the
right mount points.
The script assumes that the subordinate filesystems of a boot environment
have their canmount property set to noauto, so that they are not
automatically mounted on boot. Whereas the root filesystem is mounted
by the kernel, there was nothing to mount its subordinates.
rc.d/zfsbe fills that gap.
cem [Thu, 13 Oct 2016 02:06:23 +0000 (02:06 +0000)]
kern_linker: Handle module-loading failures in preloaded .ko files
The runtime kernel loader, linker_load_file, unloads kernel files that
failed to load all of their modules. For consistency, treat preloaded
(loader.conf loaded) kernel files in the same way.
dteske [Wed, 12 Oct 2016 20:50:17 +0000 (20:50 +0000)]
Many shops still prefer rc.conf(5) based jail configuration(s). In-part
because they can use sysrc in conjunction with ssh and xargs to perform
en-masse changes in a large distribution with lots of jails spread over
many hosts on a LAN/WAN.
Provide a mechanism for disabling the warning eschewed by /etc/rc.d/jail
in said situation. If jail_confwarn="NO" is in rc.conf(5) (default "YES")
skip the warning that per-jail configurations are obsolete and that the
user should migrate to jail.conf(5).
Reviewed by: jelischer
MFC after: 3 days
Sponsored by: FIS Global, Inc.
Differential Revision: https://reviews.freebsd.org/D7465
gonzo [Wed, 12 Oct 2016 19:53:10 +0000 (19:53 +0000)]
[fdt] Add one more heuristic to determine MAC address of the SMSC device
- If check for net,ethernet/usb,device compatible node fails, try to find
.../usb/hub/ethernet, where ... is bus path that can depend on actual HW.
net,ethernet/usb,device compatibity strings are FreeBSD custom invention
that is used only in RPi DTBs and since there is no other way to tie USB
device to FDT node we just do our best effort here to work with upstream
device tree
- Use -1 value to indicate invalid phandle_t, 0 is valid phandle value and
shouldn't be used as error signal
jtl [Wed, 12 Oct 2016 19:06:50 +0000 (19:06 +0000)]
The TFO server-side code contains some changes that are not conditioned on
the TCP_RFC7413 kernel option. This change removes those few instructions
from the packet processing path.
While not strictly necessary, for the sake of consistency, I applied the
new IS_FASTOPEN macro to all places in the packet processing path that
used the (t_flags & TF_FASTOPEN) check.
Reviewed by: hiren
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D8219
emaste [Wed, 12 Oct 2016 18:49:30 +0000 (18:49 +0000)]
Add COMPAT_FREEBSD10 to the MIPS ERL kernel config
As of r302092, pipe is a wrapper around pipe2 and the pipe syscall is no
longer used. It is included only with the COMPAT_FREEBSD10 kernel option.
Add the compat option to support upgrades from systems with an earlier
userland.
emaste [Wed, 12 Oct 2016 15:49:20 +0000 (15:49 +0000)]
Avoid using 'head' in generating groff doc date
It may not be available in certain cross build cases.
Note that this is a slight change in functionality, in that now only the
first line of the source ChangeLog file is processed. This is acceptable
as groff will be retired and we won't encounter a possibly-different
ChangeLog format.
avg [Wed, 12 Oct 2016 07:08:32 +0000 (07:08 +0000)]
install header files required development with libzfs_core
libzfs_core provides a rather limited but committed (stable) interface
for working with ZFS. We install libzfs_core shared library but we do
not install header files required for developing programs that use
the library. This change is to install the required header files
libzfs_core.h, libnvpair.h and sys/nvpair.h.
The headers are installed into the same locations as on illumos.
avg [Wed, 12 Oct 2016 06:58:01 +0000 (06:58 +0000)]
smbus: allow child devices to be added via hints
This will allow to add slave drivers in the same fashion as for iicbus.
Also, allow other code to add a child device and set its 'addr' ivar.
The ivar can only be set if it's unset, it can not be changed.
That could be used, for example, by a platform driver that has
a precise description of the hardware and, thus, knows what drivers
can handle what slaves.
The slave auto-probing code is unsafe and broken because it uses
7-bit slave addresses. It's going to be removed.
Note: internally the driver uses address of zero as an unset address
while smbus_get_addr() returns it as -1 for compatibility reasons.
The address is expected to be unset only for children that do not
work with slaves like, for example, smb(4).
gonzo [Wed, 12 Oct 2016 03:36:46 +0000 (03:36 +0000)]
Make BCM28x USB driver compatible with upstream device tree
This should have been committed in r307093: resource allocation depends
on source of the device tree. upstream dts has extra interrupt that we can
ignore
gonzo [Wed, 12 Oct 2016 02:58:27 +0000 (02:58 +0000)]
Make sure intc is attached before interrupt consumers
If pass order is not specified devices are attached in the order they are
defined in dts. Some interrupt consumers may be defined before intc. Also
make sure intc interrupt-parent local_intc is attached before intc itself.
jtl [Wed, 12 Oct 2016 02:30:33 +0000 (02:30 +0000)]
Currently, when tcp_input() receives a packet on a session that matches a
TCPCB, it checks (so->so_options & SO_ACCEPTCONN) to determine whether or
not the socket is a listening socket. However, this causes the code to
access a different cacheline. If we first check if the socket is in the
LISTEN state, we can avoid accessing so->so_options when processing packets
received for ESTABLISHED sessions.
If INVARIANTS is defined, the code still needs to access both variables to
check that so->so_options is consistent with the state.
jtl [Wed, 12 Oct 2016 02:16:42 +0000 (02:16 +0000)]
In the TCP stack, the hhook(9) framework provides hooks for kernel modules
to add actions that run when a TCP frame is sent or received on a TCP
session in the ESTABLISHED state. In the base tree, this functionality is
only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd,
and cc_vegas congestion control modules.
Presently, we incur overhead to check for hooks each time a TCP frame is
sent or received on an ESTABLISHED TCP session.
This change adds a new compile-time option (TCP_HHOOK) to determine whether
to include the hhook(9) framework for TCP. To retain backwards
compatibility, I added the TCP_HHOOK option to every configuration file that
already defined "options INET". (Therefore, this patch introduces no
functional change. In order to see a functional difference, you need to
compile a custom kernel without the TCP_HHOOK option.) This change will
allow users to easily exclude this functionality from their kernel, should
they wish to do so.
Note that any users who use a custom kernel configuration and use one of the
congestion control modules listed above will need to add the TCP_HHOOK
option to their kernel configuration.
jonathan [Wed, 12 Oct 2016 00:42:46 +0000 (00:42 +0000)]
Extract suffix rules into bsd.suffixes[-posix].mk.
Refactor make suffix rules into separate files (one for POSIX and one not),
and rationalise the rules so that bsd.lib.mk can contain only those rules
that are library-specific (.c.po and .c.pico).
This can be accomplished by adding ${STATIC_CFLAGS} to the .c.o rule
unconditionally. STATIC_CFLAGS are only defined for use by sys.mk rules in
lib/libpam/Makefile.inc (see r227797), so it should be safe to include
them unconditionally in sys.mk's .c.o rule (tested by make universe and a
ports exp-run).
imp [Tue, 11 Oct 2016 22:32:12 +0000 (22:32 +0000)]
Properly include the 802.11n PHY support files when the BWM_GPL_PHY
option is included. Remove the comment suggesting that people
uncomment things because it is OBE.
imp [Tue, 11 Oct 2016 22:31:45 +0000 (22:31 +0000)]
Add efivar(1) to manipulate EFI variables. It uses a similar command
line interface to the Linux program, as well as adding a number of
useful features to make using it in shell scripts easier (since we
don't have a filesystem to fall back on interacting with).
imp [Tue, 11 Oct 2016 22:30:41 +0000 (22:30 +0000)]
Create libefivar library. This library aims to provide
the same API as the GPL'd version of this library. It implements the common
Linux API for programatically manipulating UEFI environment varibales using
the UEFI Runtime Services the kernel provides. It replaces the old efi
library since it is programmed to a different interface, but retails the
CHAR16 to UTF-8 and vice versa conversion routines. The new name is to match
Linux program's expectations.
imp [Tue, 11 Oct 2016 22:24:30 +0000 (22:24 +0000)]
Create /dev/efidev to provide an ioctl interface to
userland. It supports userland interfaces to UEFI Runtime Services. This is
indended to the the MI portion of EFI RuntimeServices support.
gonzo [Tue, 11 Oct 2016 21:37:34 +0000 (21:37 +0000)]
Make intc driver compatible with upstream DTS
- Fix compatibility strings
- Properly decode upstream's two-cell interrupt specs. Our home-made dts
does not have two-cell interrupts so no need to preserve backward
compatibility
kib [Tue, 11 Oct 2016 18:09:37 +0000 (18:09 +0000)]
When downgrading exclusively busied page to shared-busy state, wakeup
waiters. Otherwise, owners of the shared-busy state are left blocked
and might get into a deadlock.
Note that the vm_page_busy_downgrade() function is not used in the
tree right now.
Reported and tested by: pho (previous version)
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D8195