oshogbo [Sat, 23 Dec 2017 18:07:43 +0000 (18:07 +0000)]
Introduce the daemonfd function.
The daemonfd function is equivalent to the daemon(3) function expect that
arguments are descriptors. For example dhclient(8) which is sandboxed is
unable to open /dev/null to close stdio instead it's allows to fail
daemon(3) function to close the descriptors and then do it explicit in code.
Instead of such hacks we can use now daemonfd.
This API can be also helpful to migrate system to platforms like CheriBSD.
kan [Sat, 23 Dec 2017 16:45:26 +0000 (16:45 +0000)]
Silence clang analyzer false positive.
clang does not know that two lookup calls will return the same
pointer, so it assumes correctly that using the old pointer
after dropping the reference to it is a bit risky.
kan [Sat, 23 Dec 2017 16:45:24 +0000 (16:45 +0000)]
Do not pass NULL pointer to copyout in if_clone_list.
Sometimes caller is only interested in how many clones
are there and NULL pointer is passed for the destination
buffer. Do not pass it to copyout then.
kevans [Sat, 23 Dec 2017 14:30:44 +0000 (14:30 +0000)]
Move syscon into extres framework
This should help reduce confusion between syscon/syscons a little bit.
syscon is a resource generally modeled by FDT platforms, and not to be
confused with syscons.
kevans [Sat, 23 Dec 2017 14:27:42 +0000 (14:27 +0000)]
syscon: Introduce kobj and split out fdt bits
Allow more flexibility by kobj'ifying syscon and splitting out fdt specific
bits in preparation of a move to the extres framework.
The generic fdt driver has been moved to syscon_generic.c and the fdt
requirement has been removed from the syscon interface, as is common to the
extres framework.
imp [Sat, 23 Dec 2017 06:49:27 +0000 (06:49 +0000)]
Create a new ISA_PNP_INFO macro. Use this macro every where we have
ISA PNP card support (replace by hand version in if_ed). Move module
declarations to the end of some files. Fix PCCARD_PNP_INFO to use
nitems(). Remove some stale comments about pc98, turns out the comment
was simply wrong.
imp [Sat, 23 Dec 2017 06:11:19 +0000 (06:11 +0000)]
Expand cryptic comment with inforation I've learned in the mean time
about CIS3/CIS4, including studies I've done on my large collection of
PC Cards bought off e-bay over the years since the original entry as
well as conversations I've had at conferences.
kib [Fri, 22 Dec 2017 23:27:03 +0000 (23:27 +0000)]
Remove mips MD atomic_load_64 and atomic_store_64.
The only users of the functions were db_read_bytes() and
db_write_bytes() ddb(4) interfaces. Replace the calls with direct
reads and writes, which are automatically atomic on 64bits and n32.
Note that removed assembler implementation for mips32 is not atomic
anyway.
Reviewed by: jhb
Discussed with: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13586
np [Fri, 22 Dec 2017 19:10:19 +0000 (19:10 +0000)]
cxgbe(4): Do not forward interrupts to queues with freelists. This
leaves the firmware event queue (fwq) as the only queue that can take
interrupts for others.
This simplifies cfg_itype_and_nqueues and queue allocation in the driver
at the cost of a little (never?) used configuration. It also allows
service_iq to be split into two specialized variants in the future.
kib [Thu, 21 Dec 2017 23:39:00 +0000 (23:39 +0000)]
Fix mips build after introduction of MD definitions of atomic_load_64
and atomic_store_64.
The MD definitions are provided for LP64 only, while mips also uses
them for 32bit and n32. Only define mips variants for 32bit and n32
and change the syntax to match common definitions.
Note that this commit does not fix 32bit asm implementation to follow
new KBI, this will be fixed later. The functions are only used for 8
byte ddb accesses so the known bug does not prevent normal kernel
operations.
kib [Thu, 21 Dec 2017 23:08:10 +0000 (23:08 +0000)]
Fix build for LP64 arches with gcc.
gcc complaints that the comparision is always false due to the value
range, and the cast does not prevent the analysis. Split the LP64
vs. ILP32 clamping as a workaround.
imp [Thu, 21 Dec 2017 19:19:43 +0000 (19:19 +0000)]
When -v is specified with -p dev, print the same verbose output as
when listing the whole tree. The list, however, is from the requested
device to the root (so it backwards from the normal tree).
imp [Thu, 21 Dec 2017 18:51:47 +0000 (18:51 +0000)]
Implement "-p dev" to print the path to the given device back to the
nexus. With redirection, could also be used to test if the device
exists in the device tree.
mizhka [Thu, 21 Dec 2017 12:21:35 +0000 (12:21 +0000)]
[boot/efi] scan all display modes rather than sequential try-fail way
This patch allows to scan all display modes in boot1 as loader does.
Before system tried to select optimal display mode by sequential scan of
modes and if error then stop scanning. This way is not good, because
if mode N is not present, mode N+1 may exist.
In loader we use conout->Mode->MaxMode to identify maximum number of modes.
This commit is to use same way in boot1 as in loader.
ed [Thu, 21 Dec 2017 09:21:40 +0000 (09:21 +0000)]
Make truss work for CloudABI executables on i386.
The system call convention is different from i386 binaries running on
FreeBSD/amd64, but this is not noticeable by executables. On
FreeBSD/amd64, the vDSO already does padding of arguments and return
values to 64-bit values. On i386, it does not, meaning that system call
return values are simply stored in registers.
bde [Thu, 21 Dec 2017 09:17:48 +0000 (09:17 +0000)]
Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.
restart_cpus() worked well enough by accident. Before this set of fixes,
resume_cpus() used the same cpuset (started_cpus, meaning CPUs directed to
restart) as restart_cpus(). resume_cpus() waited for the wrong cpuset
(stopped_cpus) to become empty, but since mixtures of stopped and suspended
CPUs are not close to working, stopped_cpus must be empty when resuming so
the wait is null -- restart_cpus just allows the other CPUs to restart and
returns without waiting.
Fix resume_cpus() to wait on a non-wrong cpuset for the ACPI case, and
add further kludges to try to keep it working for the XEN case. It
was only used for XEN. It waited on suspended_cpus. This works for
XEN. However, for ACPI, resuming is a 2-step process. ACPI has already
woken up the other CPUs and removed them from suspended_cpus. This
fix records the move by putting them in a new cpuset resuming_cpus.
Waiting on suspended_cpus would give the same null wait as waiting on
stopped_cpus. Wait on resuming_cpus instead.
Add a cpuset toresume_cpus to map the CPUs being told to resume to keep
this separate from the cpuset started_cpus for mapping the CPUs being told
to restart. Mixtures of stopped and suspended/resuming CPUs are still far
from working. Describe new and some old cpusets in comments.
Add further kludges to cpususpend_handler() to try to avoid breaking it
for XEN. XEN doesn't use resumectx(), so it doesn't use the second
return path for savectx(), and it goes from the suspended state directly
to the restarted state, while ACPI resume goes through the resuming state.
Enter the resuming state early for all cases so that resume_cpus can test
for being in this state and not have to worry about the intermediate
!suspended state for ACPI only.
imp [Thu, 21 Dec 2017 04:21:59 +0000 (04:21 +0000)]
Bump number that's an insane number of devices from 1,000 to 10,000. I
have access to machines that are pushing 400 devices. When 1,000 was
selected, it was rare to get even 40 or 50 devices. Bump the limit by
10x to keep up with the times.
It seems that tcp_lro_rx() doesn't verify TCP checksums, so
if there are bad checksums in the packets caused by invalid data, the
invalid data will pass through without errors.
This was noticed with the igb driver and a specific internet host:
fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin
Would result in a different value sometimes.
This ends up making LRO require RXCSUM to be enabled, and RXCSUM to
support TCP and UDP checksums.
ian [Wed, 20 Dec 2017 22:19:11 +0000 (22:19 +0000)]
If a temporary mapping is made to support EARLY_PRINTF, undo that mapping
after cninit() runs, otherwise we leave a bogus device-memory mapping in
userspace VA in the kernel pmap forever.
ian [Wed, 20 Dec 2017 22:17:27 +0000 (22:17 +0000)]
Allow pmap_kremove() to remove 1MB section mappings as well as 4K pages.
This will allow it to undo temporary device mappings such as those made
with pmap_preboot_map_attr().
ian [Wed, 20 Dec 2017 20:46:12 +0000 (20:46 +0000)]
Restore the ability to use EARLY_PRINTF support during most of initarm().
The real kernel page tables are set up much earlier in initarm() now than
they were when early printf support was first added, and they end up undoing
the mapping made in locore.S for early printf support. This re-adds the
mapping after switching to the new/real kernel page tables, making early
printf work again right after switching to them.
pfg [Wed, 20 Dec 2017 20:25:28 +0000 (20:25 +0000)]
Revert r327005 - SPDX tags for license similar to BSD-2-Clause.
After consultation with SPDX experts and their matching guidelines[1],
the licensing doesn't exactly match the BSD-2-Clause. It yet remains to be
determined if they are equivalent or if there is a recognized license that
matches but it is safer to just revert the tags.
Let this also be a reminder that on FreeBSD, SPDX tags are only advisory
and have no legal value (but IANAL).
Pointyhat to: pfg
Thanks to: Rodney Grimes, Gary O'Neall
imp [Wed, 20 Dec 2017 19:14:05 +0000 (19:14 +0000)]
Add device location wiring to the pci bus.
This allows one to specify, for example, that if there's an igb card
in bus 12, slot 0, function 0, it should be assigned igb5. If there
isn't, or there's one in a different slot, normal numbering rules
apply (hinted units are skipped). Adding 'hint.igb.5.at="pci12:0:0"'
or 'hint.igb.5.at="pci0:12:0:0"' to /boot/device.hints will accomplish
this. The double quotes are important.
The kernel only accepts the strings (in shell notation):
pci$d:$b:$s:$f
and pci$b:$s:$f
where $d is the pci domain, $b is the pci bus number, $s is the slot
number and $f is the function number. A string compare is done with
the current device to avoid another string parser in the kernel. All
numbers are unsigned decimal without leading zeros.
ian [Wed, 20 Dec 2017 18:23:22 +0000 (18:23 +0000)]
Add a new kernel config option, MD_ROOT_READONLY, which forces on the
MD_READONLY flag for the md device automatically instantiated during
kernel init for an mdroot filesystem.
Note that there is specifically and by design no tunable or sysctl
control over this feature. Without this option, you already have control
over whether the mdroot fs is writeable using vfs.root.mountfrom.options
from loader(8), the root_rw_mount rcvar, and by using "mount -u[rw] /"
or equivelent on the fly. This option is being added to provide a way
to make the mdroot fs truly immutable before userland code begins running.
shurd [Wed, 20 Dec 2017 01:03:34 +0000 (01:03 +0000)]
Support attaching tx queues to cpus
This will attempt to use a different thread/core on the same L2
cache when possible, or use the same cpu as the rx thread when not.
If SMP isn't enabled, don't go looking for cores to use. This is mostly
useful when using shared TX/RX queues.
jhb [Tue, 19 Dec 2017 23:54:44 +0000 (23:54 +0000)]
Don't return early for non-failure for one of the EMLINK checks.
r326987 enabled two #if 0'd-out EMLINK checks in zfs_link_create() for
link overflow. However, one of the checks (when the vnode adding a link
is a directory such as for mkdir) always returned even if the link did not
overflow. Change this to only return early if it needs to report an
EMLINK error.
jhb [Tue, 19 Dec 2017 22:39:05 +0000 (22:39 +0000)]
Rework pathconf handling for FIFOs.
On the one hand, FIFOs should respect other variables not supported by
the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.).
These values are fs-specific and must come from a fs-specific method.
On the other hand, filesystems that support FIFOs are required to
support _PC_PIPE_BUF on directory vnodes that can contain FIFOs.
Given this latter requirement, once the fs-specific VOP_PATHCONF
method supports _PC_PIPE_BUF for directories, it is also suitable for
FIFOs permitting a single VOP_PATHCONF method to be used for both
FIFOs and non-FIFOs.
To that end, retire all of the FIFO-specific pathconf methods from
filesystems and change FIFO-specific vnode operation switches to use
the existing fs-specific VOP_PATHCONF method. For fifofs, set it's
VOP_PATHCONF to VOP_PANIC since it should no longer be used.
While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that
only filesystems supporting FIFOs will report a value. In addition,
only report a valid _PC_PIPE_BUF for directories and FIFOs.
shurd [Tue, 19 Dec 2017 21:07:30 +0000 (21:07 +0000)]
Support short HWRM commands
New Stratus bnxt devices require support for short HWRM commands for VFs
to function. Enable their use, but only use them if it's both supported
and required... prefer the long HWRM commands when possible.
jhb [Tue, 19 Dec 2017 20:19:07 +0000 (20:19 +0000)]
Update tmpfs link count handling for ino64.
Add a new TMPFS_LINK_MAX to use in place of LINK_MAX for link overflow
checks and pathconf() reporting. Rather than storing a full 64-bit
link count, just use a plain int and use INT_MAX as TMPFS_LINK_MAX.
Discussed with: bde
Reviewed by: kib (part of a larger patch)
Sponsored by: Chelsio Communications
jhb [Tue, 19 Dec 2017 19:51:36 +0000 (19:51 +0000)]
Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().
Having all filesystems fall through to default values isn't always correct
and these values can vary for different filesystem implementations. Most
of these changes just use the existing default values with a few exceptions:
- Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact
permissions check this claims for chown().
- Use NANDFS_NAME_LEN for NAME_MAX for nandfs.
- Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to
indicate hard links aren't supported.
emaste [Tue, 19 Dec 2017 19:44:06 +0000 (19:44 +0000)]
embed_mfs: support embedding mfs into loader
The script originally supported embedding an mfs into ELF files or any
other type of file, because it searched for magic strings to mark the
beginning and end of the embeddable section. It was later modified to
read the section offset and length via readelf, which made it work for
ELF only. Restore the ability to update arbitrary file types by using
the readelf technique for ELF, and the magic string technique for all
others (including PE/COFF files like loader.efi).
Submitted by: Zakary Nafziger <worldofzak@gmail.com>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12746
jhb [Tue, 19 Dec 2017 19:18:48 +0000 (19:18 +0000)]
Update NFS to handle larger link counts post ino64.
- Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol.
- Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported
for a PATHCONF RPC by the NFS server.
- Use NFS_LINK_MAX instead of LINK_MAX as the default value reported
by the NFS client pathconf() if not overridden by the NFS server.
- When reading the link count out of an RPC reply, read the full 32
bits instead of the lower 16 bits.
jhb [Tue, 19 Dec 2017 19:07:24 +0000 (19:07 +0000)]
Adjust ZFS' link count handling for ino64.
- Define a ZFS_LINK_MAX as the ZFS version of LINK_MAX which is set to
UINT64_MAX to match the on-disk format.
- Enable the currently #if 0'd code to check for link overflows and
return EMLINK.
- Don't clamp the link count reported in stat() to LINK_MAX as that is
still the 16-bit limit, but report the full link counts. Also,
avoid possibly overflowing the reported link count to 0 when adjusting
the link count to account for ".snapshot".
- Update the LINK_MAX reported by pathconf() to report ZFS_LINK_MAX
rather than LINK_MAX (but clamped to LONG_MAX for 32-bit systems).
jhb [Tue, 19 Dec 2017 18:20:38 +0000 (18:20 +0000)]
Add a custom VOP_PATHCONF method for fdescfs.
The method handles NAME_MAX and LINK_MAX explicitly. For all other
pathconf variables, the method passes the request down to the underlying
file descriptor. This requires splitting a kern_fpathconf() syscallsubr
routine out of sys_fpathconf(). Also, to avoid lock order reversals with
vnode locks, the fdescfs vnode is unlocked around the call to
kern_fpathconf(), but with the usecount of the vnode bumped.
markj [Tue, 19 Dec 2017 17:13:04 +0000 (17:13 +0000)]
Avoid using bioq_* in gmirror.
gmirror does not perform any sorting of I/O requests, so the bioq API
doesn't provide any advantages over plain TAILQs. The API also does not
provide operations needed by an upcoming change.
No functional change intended. The diff shrinks the geom_mirror.ko
text and the gmirror softc slightly.
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Dell EMC Isilon
nwhitehorn [Tue, 19 Dec 2017 16:45:40 +0000 (16:45 +0000)]
The highest-order bit of the bootloader cookie is 1, with the result that
the 32-bit cookie can be sign-extended on its way out of the loader and
through Open Firmware. If sign-extended, the in-kernel check of its value
would fail on 64-bit systems, resulting in a mountroot prompt. Solve this
by telling the kernel to ignore the high-order bits.
nwhitehorn [Tue, 19 Dec 2017 15:50:46 +0000 (15:50 +0000)]
Make __startkernel line up with KERNBASE, so that the math to compute the
applied relocation offset in link_elf.c works as intended. We may want to
revisit how that works in future, for example by having elf_reloc_self()
actually store the numbers it is using rather than computing them later,
but this fixes symbol lookup after r326203.
kib [Tue, 19 Dec 2017 14:11:41 +0000 (14:11 +0000)]
mlx5en: Avoid SFENCe on x86
The IA32 memory model guarantees that all writes are seen in the program
order. Also, any access to the uncacheable memory flushes the store
buffers. As the consequence, SFENCE instruction is (almost) never needed,
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.
Use atomic_thread_fence_rel() instead of wb() to only emit compiler barriers
on x86 there. Other architectures get the right barrier instruction as
well.
kib [Tue, 19 Dec 2017 09:59:20 +0000 (09:59 +0000)]
Add atomic_load(9) and atomic_store(9) operations.
They provide relaxed-ordered atomic access semantic. Due to the
FreeBSD memory model, the operations are syntaxical wrappers around
the volatile accesses. The volatile qualifier is used to ensure that
the access not optimized out and in turn depends on the volatile
semantic as implemented by supported compilers.
The motivation for adding the operation is to help people coming from
other systems or knowing the C11/C++ standards where atomics have
special type and require use of the special access operations. It is
still the case that FreeBSD requires plain load and stores of aligned
integer types to be atomic.
Suggested by: jhb
Reviewed by: alc, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13534
imp [Tue, 19 Dec 2017 04:13:22 +0000 (04:13 +0000)]
When doing a dump, the scheduler is normally not running, so this
changed worked to capture dumps for me. However, the test for
SCHEDULER_STOPPED() isn't right. We can also call the dump routine
from ddb, in which case the scheduler is still running. This leads to
an assertion panic that we're sleeping when we shouldn't. Instead, use
the proper test for dumping or not. This brings us in line with other
places that do special things while we're doing polled I/O like this.
imp [Tue, 19 Dec 2017 04:05:55 +0000 (04:05 +0000)]
Interact is always called with NULL. Simplify code a little by
removing this argument, and expanding when rc is NULL. This
effectively completes the back out of custom scripts for tftp booted
loaders from r269153 that was started in r292344 with the new path
tricks that obsoleted it.