Move setting of media parameters inside open routines.
This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.
Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.
Auto-detect the UGA frame buffer and stride on a MacBook. We're
striking a delicate balance between exhaustive searching and
banking on assumptions. The environment variables can be used
as a fall-back anyway. With this change, all known and tested
Macs with only UGA should have a working console out of the
box... for now...
Eliminate pointless requeueing of pages from terminated objects. These
pages will have left the inactive queue before the page daemon performs
its next scan. Also, ignore references to pages from terminated objects.
This allows the clean pages to be freed a little sooner.
Move some comments to their proper place, i.e., next to the code that
they describe, and update other nearby comments.
andrew [Sat, 5 Sep 2015 17:29:07 +0000 (17:29 +0000)]
Add ddb show commands to print the special registers and to ask the
hardware to perform address translation for us. These are useful to help
track down what caused us to enter the debugger.
Do not pass lle to nd6_ns_output(). Use newly-added
nd6_llinfo_get_holdsrc() to extract desired IPv6 source
from holdchain and pass it to the nd6_ns_output().
o Unlike xor, in Jenkins hash every bit of input affects virtually
every bit of output, thus salting the hash actually works. With
xor salting only provides a false sense of security, since if
hash(x) collides with hash(y), then of course, hash(x) ^ salt
would also collide with hash(y) ^ salt. [1]
o Jenkins provides much better distribution than xor, very close to
ideal.
TCP connection setup/teardown benchmark has shown a 10% increase
with default hash size, and with bigger hashes that still provide
possibility for collisions. With enormous hash size, when dataset is
by an order of magnitude smaller than hash size, the benchmark has
shown 4% decrease in performance decrease, which is expected and
acceptable.
Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1]
Benchmarks by: jch
Reviewed by: jch, pkelsey, delphij
Security: strengthens protection against hash collision DoS
Sponsored by: Nginx, Inc.
Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).
Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.
My MacBook has UGA only, but we fail to detect any changes
in the frame buffer when we flip pixels. Allow the detection
to be bypassed by setting the uga_framebuffer and uga_stride
variables. The kernel console works fine even when we can't
detect pixel changes in the frame buffer, which indicates
that the problem could be with reading from the frame buffer
and not writing to it.
nlge(4) is supposed to deprecate rge(4) for Broadcom XLR when it was
introduced 5 years ago.
rge doesn't build on -CURRENT due to MII changes. All the XLR kernel confs
use nlge. Let's get rid of the old driver for FreeBSD 11. We can use
10-STABLE or SVN to go back and look at the old driver if needed.
Gleaned from a public header file. 5402 and 5404 look like they may be
used on embedded devices. 5478 and 5488 are switch PHYs. 5754 change is just
to note a product alias.
Add more mmap tests related to character devices.
- Add cdev-related tests for bad args.
- Add two simple tests cases for mapping /dev/zero that test for
MAP_ANON-like behavior.
cem [Thu, 3 Sep 2015 20:32:10 +0000 (20:32 +0000)]
Detect badly behaved coredump note helpers
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.
When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.
NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.
So:
- Detect badly behaved notes in putnote() and pad underfilled notes.
- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().
- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.
- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.
- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.
Currently the Linux character device mmap handling only supports mmap
operations that map a single page that has an associated vm_page_t.
This does not permit mapping larger regions (such as a PCI memory
BAR) and it does not permit mapping addresses beyond the top of RAM
(such as a 64-bit BAR located above the top of RAM).
Instead of using a single OBJT_DEVICE object and passing the physaddr via
the offset as a hack, create a new sglist and OBJT_SG object for each
mmap request. The requested memory attribute is applied to the object
thus affecting all pages mapped by the request.
r249170 was just plain wrong. The effect of the change is to always
delete a logic volume on status change which is NOT what we want here.
The original code is correct in that when the volume changes status
the driver will only delete the volume if the status is one of the
fatal errors. A drive failure in a mirrored volume is NOT a situtation
where the volume should dissapear.
Reported on freebsd-scsi@:
https://lists.freebsd.org/pipermail/freebsd-scsi/2015-September/006800.html
Simplify the introductory example in ctl.conf(5) down to absolute
basics. The more complicated cases - like how to use physical
ports - are explained later, in the "EXAMPLES" section.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
For UGA, the frame buffer address obtained by scanning the
PCI BARs does not necessarily correspond to the upper-left
most pixel. Scan the frame buffer for which byte changed
when changing the pixel at (0,0).
Use the same technique to determine the stride. Except for
changing the pixel at (0,0), we change the pixel at (0,1).
Enable both i2c1 and i2c2. These devices are disabled in TI's DTS
so they were disabled during DTS transition. Though there are
no standard devices/drivers on them people might use iic(4) userland
interface to access these buses.
Simplify kvm symbol resolution and error handling. The symbol table
nl_symbols will eventually be organized into several modules depending
on MK_* variables.
After the introduction of direct dispatch, the pacing code in g_down()
broke in two ways. One, the pacing variable was accessed in multiple
threads in an unsafe way. Two, since large numbers of I/O could come
down from the buf layer at one time, large numbers of allocation
failures could happen all at once, resulting in a huge pace value that
would limit I/Os to 10 IOPS for minutes (or even hours) at a
time. While a real solution to these problems requires substantial
work (to go to a no-allocation after the first model, or to have some
way to wait for more memory with some kind of reserve for pager and
swapper requests), it is relatively easy to make this simplistic
pacing less pathological.
Move to using a volatile variable with loads and stores. While this is
a little racy, losing the race is safe: either you get memory and
proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever
is larger) instead of 100ms. This removes the artificial 10 IOPS limit
while still easing up on new I/Os during memory shortages. Remove
tying the amount of time we do this to the number of failed requests
and do it only as long as we keep failing requests.
Finally, to avoid needless recursion when memory is tight (start ->
g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the
stack), don't do direct dispatch while pacing. This should be a rare
event (not steady state) so the performance hit here is worth the
extra safety of not starving g_down() with directly dispatched I/O.
cem [Wed, 2 Sep 2015 16:48:03 +0000 (16:48 +0000)]
ioat: re-initialize interrupts after resetting hw on BDXDE
Resetting some generations of the I/OAT hardware (just BDXDE for now)
resets the corresponding MSI-X registers. So, teardown and
re-initialize interrupts after resetting the hardware.
The ${BUILDKERNELS:[2..-1]} appears to produce a non zero result for
a one word variable, which is quite unexpected from documentation.
So, to avoid double installation of a single kernel, protect the extra
kernels loop with ${BUILDKERNELS:[#]} > 1 conditional.
It's 2015, and some people are still trying to use fdisk and then
go asking what debug flags to set for GEOM to make it work. Advice
them to use gpart(8) instead.
Something similar should probably done with disklabel,
but I need to rewrite the disklabel examples first.
Reviewed by: wblock@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3315
Fix dynamic attach/detach of 802.11 devices after r287197:
o In pccard_ether add code to start children of a 802.11
device, that are configured in rc.conf.
o In devd.conf provide a regex matching all 802.11 devices,
and on match run pccard_ether to spawn children.
PR: 202784
Submitted by: <vidwer gmail.com>
In collaboration with: "Oleg V. Nauman" <oleg opentransfer.com>
Export current system call code and argument count for system call entry
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)
The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.