Add regression test scripts for multi-IP FIBs, exercising the send,
receive and forward paths, tagging packets either with the ifconfig fib
option or using ipfw, running ICMP6, TCP/v6 and UDP/v6 tests, and
testing both setfib(2) and the SO_SETFIB socket option.
At 16 FIBs a total of over 64k return codes/replies/statuses are checked,
sometimes multiple times (in different ways, e.g. the reflected request
as well as ipfw counter values).
The scripts need two or three machines to run and are thus not added
to the tools/regression framework but only to tools/test.
MFC r232114:
Update scripts to work around two sh(1) bugs found in stable/8:
1) _x=$((_x + 1)) does not work while x=$((x + 1)) does.
2) Parameter Expansion, esp. "${x%%bar}" does not work if quoted.
Correct typos and bring over to the initiator some details that
forwarding.sh already had, esp. related to ipfw accepting packets if
the default policy is deny.
Add an extra stat call to the "delay" function in addition to the
touch which together is still a lot faster than sleep 1 but seems
to help a lot more to mitigate the unrelated kernel race seen.
Add regression tests for the setsockopt(2) SO_SETFIB socket option.
Check that the expected domain(9) families all handle the socket option
correctly and do proper bounds checks. This would catch bugs as fixed
in (r230938,)r230981.
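The bounds checks the tests probe can be modeled in userland. The sketch below assumes valid FIB numbers run from 0 to the configured FIB count (the net.fibs tunable) minus one; model_setfib_check() is a hypothetical helper, and on FreeBSD the real request would be setsockopt(s, SOL_SOCKET, SO_SETFIB, &fib, sizeof(fib)).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical userland model of the bounds check a protocol's SO_SETFIB
 * handling is expected to perform.  "numfibs" stands in for the kernel's
 * configured FIB count; it is a parameter only so the check can be
 * exercised without a FreeBSD kernel.
 */
static int
model_setfib_check(int fib, int numfibs)
{
	assert(numfibs > 0);
	/* Valid FIBs are 0 .. numfibs - 1; anything else must be EINVAL. */
	if (fib < 0 || fib >= numfibs)
		return (EINVAL);
	return (0);
}
```

With 16 FIBs, the interesting probes are -1, 0, 15 and 16: the edges of the range.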
ken [Mon, 5 Mar 2012 18:54:28 +0000 (18:54 +0000)]
MFC 232411:
Fix a problem that was causing the mpt(4) driver to attach to MegaRAID
cards that should be handled by the mfi(4) driver.
The root of the problem is that the mpt(4) driver was masking off the
bottom bit of the PCI device ID when deciding which cards to attach to.
It appears that a number of the mpt(4) Fibre Channel cards had a LAN
variant whose PCI device ID was just one bit off from the FC card's device
ID. The FC cards were even and the LAN cards were odd.
The problem was that this pattern wasn't carried over on the SAS and
parallel SCSI mpt(4) cards. Luckily the SAS and parallel SCSI PCI device
IDs were either even numbers, or they would get masked to a supported
adjacent PCI device ID, and everything worked well.
Now LSI is using some of the odd-numbered PCI device IDs between the 3Gb
SAS device IDs for their new MegaRAID cards. This is causing the mpt(4)
driver to attach to the RAID cards instead of the mfi(4) driver.
The solution is to stop masking off the bottom bit of the device ID, and
explicitly list the PCI device IDs of all supported cards.
This change should be a no-op for mpt(4) hardware. The only intended
functional change is that for the 929X, the is_fc variable gets set. It
wasn't being set previously, but needs to be because the 929X is a Fibre
Channel card.
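The difference between the old and new matching can be shown with a small sketch. The device IDs below are made up for illustration and are not the real mpt(4) or mfi(4) IDs; match_masked() and match_explicit() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical supported device IDs, chosen only to illustrate the bug. */
static const uint16_t supported_ids[] = { 0x00a0, 0x00a2 };
#define NSUPPORTED (sizeof(supported_ids) / sizeof(supported_ids[0]))

/* Old style: ignore bit 0, so an odd ID matches its even neighbour. */
static bool
match_masked(uint16_t devid)
{
	for (size_t i = 0; i < NSUPPORTED; i++)
		if ((devid & ~1u) == supported_ids[i])
			return (true);
	return (false);
}

/* New style: only IDs that are explicitly listed match. */
static bool
match_explicit(uint16_t devid)
{
	for (size_t i = 0; i < NSUPPORTED; i++)
		if (devid == supported_ids[i])
			return (true);
	return (false);
}
```

An unrelated card with the odd ID 0x00a1 is claimed by match_masked() but not by match_explicit().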
kib [Mon, 5 Mar 2012 11:45:19 +0000 (11:45 +0000)]
MFC r232239:
Fix a race in top non-interactive mode. Use plain sleep(3) call instead
of arming timer and then pausing. If SIGALRM is delivered before pause(3)
is entered, top hangs.
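A minimal sketch of the fixed loop shape, assuming the racy version armed a timer (for example with alarm(3)) and then called pause(3). If SIGALRM is delivered before pause() is entered, nothing wakes the process again. sleep(3) has no such window. refresh_loop() and the display callback are hypothetical stand-ins for top's redraw logic.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Non-interactive refresh loop: redraw, then block in sleep(3) for the
 * delay.  "display" is a hypothetical redraw callback (may be NULL).
 */
static unsigned
refresh_loop(unsigned iterations, unsigned delay, void (*display)(void))
{
	unsigned done;

	for (done = 0; done < iterations; done++) {
		if (display != NULL)
			display();
		/* No sleep after the last refresh. */
		if (done + 1 < iterations)
			sleep(delay);
	}
	return (done);
}
```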
delphij [Mon, 5 Mar 2012 05:18:58 +0000 (05:18 +0000)]
MFC r232202:
Drop setuid status while doing file operations to prevent potential
information leak. This changeset is intended to be a minimal one
to make backports easier.
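The usual way to drop setuid status around file operations is to switch the effective UID to the real UID and switch back afterwards. The sketch below relies on POSIX saved set-user-ID semantics. drop_privs() and restore_privs() are hypothetical names, and the commit does not say this is its exact code.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

static uid_t saved_euid;

/* Run as the invoking user while touching files. */
static int
drop_privs(void)
{
	saved_euid = geteuid();
	return (seteuid(getuid()));
}

/* Regain the set-user-ID identity via the saved set-user-ID. */
static int
restore_privs(void)
{
	return (seteuid(saved_euid));
}
```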
MFC r231754:
Add an additional check to the EBR probe and create methods:
don't try to probe and create the EBR scheme when the parent partition
type is not "ebr". This fixes error messages about a corrupted EBR for
some partitions that actually contain another partition scheme.
NOTE: if you have an EBR on a partition whose type is not "ebr"
(0x05), you will lose access to its partitions until the type is
changed.
MFC r231928:
Add an alias for the partition type 0x0f. Now the "ebr" name is used for
both types 0x05 and 0x0f, but 0x05 is preferred and used when a partition
is created with "gpart add -t ebr ...".
This should keep EBR partitions accessible after r231754 for those
who have an EBR on a partition with type 0x0f.
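Taken together, the two changes can be modeled as a name mapping plus a probe gate. This is a hypothetical sketch that only models the "ebr" type; the function names are not gpart's real internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Both 0x05 and 0x0f are shown as "ebr"; other types are not modeled. */
static const char *
mbr_type_name(unsigned char type)
{
	switch (type) {
	case 0x05:
	case 0x0f:
		return ("ebr");
	default:
		return (NULL);
	}
}

/* Creating an "ebr" partition uses the preferred value 0x05. */
static int
mbr_type_from_name(const char *name)
{
	if (strcmp(name, "ebr") == 0)
		return (0x05);
	return (-1);
}

/* Only probe for an EBR scheme inside a parent whose type is "ebr". */
static bool
should_probe_ebr(unsigned char parent_type)
{
	const char *name = mbr_type_name(parent_type);

	return (name != NULL && strcmp(name, "ebr") == 0);
}
```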
raj [Sun, 4 Mar 2012 17:53:40 +0000 (17:53 +0000)]
MFC r230865:
Adjust mvs(4) to handle interrupt cause reg depending on the actual number of
channels available
- current code treats bits 4:7 in 'SATAHC interrupt mask' and 'SATAHC
interrupt cause' as flags for SATA channels 2 and 3
- for embedded SATA controllers (SoC) these bits have been marked as reserved
in datasheets so far, but for some new and upcoming chips they are used for
purposes other than SATA
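The idea of deriving the mask from the channel count can be sketched as follows. This assumes two cause bits per channel (bits 0:1 for channel 0 through bits 6:7 for channel 3), as implied by bits 4:7 belonging to channels 2 and 3; sata_cause_mask() is a hypothetical helper, not the mvs(4) code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build an interrupt-cause mask covering only the channels that exist.
 * Bits for absent channels stay clear, so a SoC that reuses them for
 * other purposes is not misinterpreted as having SATA events.
 */
static uint32_t
sata_cause_mask(unsigned nchannels)
{
	assert(nchannels <= 4);
	return ((UINT32_C(1) << (2 * nchannels)) - 1);
}
```

A two-channel SoC would then only look at bits 0:3, and a four-channel controller at bits 0:7.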
raj [Sun, 4 Mar 2012 17:00:46 +0000 (17:00 +0000)]
MFC r228504, r228530.
r228504:
Make *intr{cnt,names} on ARM reside in data section, similar to other arches.
sintrnames and sintrcnt are initialized with non-zero values, which were
discarded by the .bss directive, so consumers like "vmstat -i" were not
getting correct data.
remko [Sun, 4 Mar 2012 10:37:26 +0000 (10:37 +0000)]
Add an ifconfig carp option that enables users to set
the state of the carp cluster.
This is a direct commit to stable/9 because -HEAD's
code is very different. I discussed this with Gleb
and the reason for this is that since we do not
touch the kernel itself and are not adding very
weird or confusing things, we can commit this to the
stable branch directly.
The options 'master' and 'backup' are now available,
which enable the administrator to force a node into
the backup or master state on the cluster. Of course,
preemption has to be disabled, otherwise the master node
will become master again.
One can do that with:
sysctl net.inet.carp.preempt=0
After that one can schedule maintenance on the node
normally running as the master and such.
delphij [Sat, 3 Mar 2012 02:35:45 +0000 (02:35 +0000)]
MFC r231888:
Put the signal trap output to standard error instead of standard output.
Without this change, pressing ^T could result in rc.d script putting
junk strings like:
Script <filename> running
in configuration files when redirecting standard output to these files.
nwhitehorn [Sat, 3 Mar 2012 02:19:33 +0000 (02:19 +0000)]
MFC r230123,230139:
Rework SLB trap handling so that double-faults into an SLB trap handler are
possible, and double faults within an SLB trap handler are not. The result
is that it is possible to take an SLB fault at any time, on any address, for
any reason, at any point in the kernel.
This lets us do two important things. First, it removes the (soft) 16 GB RAM
ceiling on PPC64 as well as any architectural limitations on KVA space.
Second, it lets the kernel tolerate poorly designed hypervisors that
have a tendency to fail to restore the SLB properly after a hypervisor
context switch.
Now that we can tolerate LPAR context switches on the PS3 hypervisor, going
to hypervisor-idle on both threads will not hang the kernel.
Set read buffer size to multiple of sizeof(struct futx).
If the utmpx database gets updated while an application is reading it,
there is a chance the reading application processes partially
overwritten entries. To solve this, make sure we always read a multiple
of sizeof(struct futx) at a time.
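Sizing the read buffer to a whole number of records can be sketched like this. struct rec is a hypothetical stand-in for libc's private struct futx, and whole_record_bufsize() is an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the on-disk struct futx record. */
struct rec {
	char	data[197];
};

/*
 * Round a desired buffer size down to a multiple of the record size,
 * so a full buffer never ends in the middle of an entry.
 */
static size_t
whole_record_bufsize(size_t want)
{
	size_t n = want / sizeof(struct rec);

	if (n == 0)
		n = 1;		/* always read at least one full record */
	return (n * sizeof(struct rec));
}
```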
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.
kib [Fri, 2 Mar 2012 11:47:34 +0000 (11:47 +0000)]
MFC r231868:
Fetch the aux vector for the static libc, and use the entries to
initialize the cache of the system information as it was done for the
dynamic libc. This removes several sysctls from the static binary
startup.
Use the aux vector to fill the single struct dl_phdr_info describing
the static binary itself, to implement dl_iterate_phdr(3) for the
static binaries.
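A short dl_iterate_phdr(3) consumer shows what this enables. Per the commit text, a static binary on FreeBSD now reports a single entry describing itself, while a dynamic binary reports every loaded object. count_loaded_objects() is a hypothetical helper name.

```c
#define _GNU_SOURCE	/* glibc needs this for dl_iterate_phdr; harmless elsewhere */
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Callback: print each object's program-header count and tally it. */
static int
count_object(struct dl_phdr_info *info, size_t size, void *arg)
{
	int *count = arg;
	const char *name = info->dlpi_name;

	(void)size;
	printf("%s: %d program headers\n",
	    (name != NULL && name[0] != '\0') ? name : "(main program)",
	    (int)info->dlpi_phnum);
	(*count)++;
	return (0);	/* keep iterating */
}

static int
count_loaded_objects(void)
{
	int count = 0;

	dl_iterate_phdr(count_object, &count);
	return (count);
}
```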
kib [Fri, 2 Mar 2012 11:32:47 +0000 (11:32 +0000)]
MFC r231885:
Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.
emaste [Fri, 2 Mar 2012 00:21:07 +0000 (00:21 +0000)]
MFC r232267:
Workaround for PCIe 4GB boundary issue
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations. This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.
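In FreeBSD drivers this kind of limit is normally expressed through the boundary argument of bus_dma_tag_create(9). The arithmetic behind such a boundary is sketched below in userland; crosses_4g() and segments_needed() are hypothetical helpers, not the committed change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4G	(UINT64_C(1) << 32)

/* True if [addr, addr + len) straddles a 4GB boundary (len > 0). */
static bool
crosses_4g(uint64_t addr, uint64_t len)
{
	return ((addr / BOUNDARY_4G) != ((addr + len - 1) / BOUNDARY_4G));
}

/* Segments needed if no segment may cross a 4GB boundary. */
static unsigned
segments_needed(uint64_t addr, uint64_t len)
{
	unsigned nseg = 0;

	while (len > 0) {
		uint64_t room = BOUNDARY_4G - (addr % BOUNDARY_4G);
		uint64_t chunk = len < room ? len : room;

		addr += chunk;
		len -= chunk;
		nseg++;
	}
	return (nseg);
}
```

For example, an 8KB transfer starting 4KB below the 4GB line must be split into two segments.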
dim [Thu, 1 Mar 2012 17:51:15 +0000 (17:51 +0000)]
MFC r231982:
When building with clang, disable -Wformat-security for
sys/dev/hpt27xx/osm_bsd.c, since it gets the following warnings:
sys/dev/hpt27xx/osm_bsd.c:1180:25: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
S_IRUSR | S_IWUSR, driver_name);
^~~~~~~~~~~
@/dev/hpt27xx/hpt27xx_config.h:46:21: note: expanded from:
#define driver_name hpt27xx_driver_name
^~~~~~~~~~~~~~~~~~~
Since 'hpt27xx_driver_name' is a constant string symbol (coming from the
proprietary hpt27xx_lib.o file), there is no security problem.
Because this driver is provided by the vendor, and applying changes
requires re-certification and other bureaucratic exercises, just disable
the warning for now.
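For context, -Wformat-security fires when the format argument is not a string literal. A '%' inside such data would be parsed as a conversion. In code that can be changed, the usual fix is to pass the string as an argument to "%s"; format_name() below is a hypothetical example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy "name" into buf safely: snprintf(buf, len, name) would trigger
 * -Wformat-security, and a '%' in name would be treated as a conversion.
 */
static int
format_name(char *buf, size_t len, const char *name)
{
	return (snprintf(buf, len, "%s", name));
}
```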
gibbs [Wed, 29 Feb 2012 18:41:59 +0000 (18:41 +0000)]
MFC r231883
===========
Fix regression in the handling of blkback close events for
devices that are unplugged via QEMU.
sys/dev/xen/blkback/blkback.c:
Toolstack initiated closures change the frontend's state
to Closing. The backend must change to Closing as well,
even if we can't actually close yet, in order for the
frontend to notice and start the closing process.
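The backend's reaction can be modeled as a small state function. The enum below is a local mirror of XenbusStateConnected/Closing/Closed, and "busy" is a hypothetical flag for outstanding I/O. This is a simplified model, not the blkback code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the relevant XenBus states (hypothetical enum). */
enum xb_state { XB_CONNECTED, XB_CLOSING, XB_CLOSED };

/* Backend state to advertise after the frontend changes state. */
static enum xb_state
backend_next_state(enum xb_state frontend, bool busy)
{
	switch (frontend) {
	case XB_CLOSING:
		/* Always acknowledge, even if we cannot finish closing yet. */
		return (XB_CLOSING);
	case XB_CLOSED:
		return (busy ? XB_CLOSING : XB_CLOSED);
	default:
		return (XB_CONNECTED);
	}
}
```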
davidxu [Wed, 29 Feb 2012 06:19:00 +0000 (06:19 +0000)]
MFC 230857:
If multiple threads call kevent() to get AIO events on the same kqueue fd,
it is possible that a single AIO event will be reported to multiple
threads. This is not thread friendly, and the existing API cannot
control this behavior.
Add a kevent flags field, sigev_notify_kevent_flags, for AIO event
notification in sigevent, and allow the user to pass EV_CLEAR, EV_DISPATCH
or EV_ONESHOT to the AIO kernel code, so the user can control whether the
event should be cleared once it is retrieved by a thread. This change should
be compatible with existing applications, because the field should have
already been zero-filled, and no additional action will be taken by the
kernel.
PR: kern/156567
MFC 231006:
Add 32-bit compat code for AIO kevent flags introduced in revision 230857.
MFC 231724:
Add notes about sigev_notify_kevent_flags introduced in revision 230857
which enables thread-friendly polling on same fd for AIO events.
thompsa [Wed, 29 Feb 2012 00:52:56 +0000 (00:52 +0000)]
MFC r232008,232010,232080,232089
Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow; this may not be the case, resulting in poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.
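The choice the sysctl controls can be sketched as follows. "use_flowid" models the new knob and pick_port() is a hypothetical helper. FNV-1a is a swapped-in stand-in for the local flow hash, not necessarily the one the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the given header bytes (stand-in for the local hash). */
static uint32_t
fnv1a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Pick an output port from the NIC flow ID or from our own hash. */
static unsigned
pick_port(bool use_flowid, bool has_flowid, uint32_t flowid,
    const void *hdr, size_t hdrlen, unsigned nports)
{
	uint32_t h;

	assert(nports > 0);
	if (use_flowid && has_flowid)
		h = flowid;			/* trust the NIC's hash */
	else
		h = fnv1a(hdr, hdrlen);		/* fall back to our own */
	return (h % nports);
}
```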
delphij [Tue, 28 Feb 2012 23:30:19 +0000 (23:30 +0000)]
MFC r228924:
In POSIX.1-2008:
P_tmpdir [OB XSI] Default directory prefix for tempnam().
This macro is used in a lot of places in legacy applications,
and is why we see a lot of programs written for e.g. Linux
store volatile temporary files in /var/tmp and not /tmp.
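Because legacy code builds temporary paths from P_tmpdir, the macro's value decides where such files land. The sketch below shows that common pattern with mkstemp(3). Some systems define P_tmpdir with a trailing slash and some without, so the separator is added only when needed. legacy_tmpfile() is a hypothetical helper, and the "/tmp" fallback is an assumption for headers that lack the macro.

```c
#define _XOPEN_SOURCE 700	/* P_tmpdir is an XSI extension */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef P_tmpdir
#define P_tmpdir "/tmp"		/* assumption: fallback only */
#endif

/* Create "<P_tmpdir>/<prefix>.XXXXXX"; returns an open fd or -1. */
static int
legacy_tmpfile(const char *prefix, char *path, size_t pathlen)
{
	size_t dlen = strlen(P_tmpdir);
	const char *sep = (dlen > 0 && P_tmpdir[dlen - 1] == '/') ? "" : "/";
	int n;

	n = snprintf(path, pathlen, "%s%s%s.XXXXXX", P_tmpdir, sep, prefix);
	if (n < 0 || (size_t)n >= pathlen)
		return (-1);
	return (mkstemp(path));
}
```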
rmacklem [Tue, 28 Feb 2012 15:52:01 +0000 (15:52 +0000)]
MFC: r232050
hrs@ reported a panic to freebsd-stable@ under the subject line
"panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused
by use of a mix of tsleep() and msleep() calls on the same event
in the new NFS server DRC code. It did "mtx_unlock(); tsleep();"
in two places, which kib@ noted introduced a slight risk that the
wakeup() would occur before the tsleep(), resulting in a 10sec
delay before waking up. This patch fixes the problem by replacing
"mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also
changes a nfsmsleep() call to mtx_sleep() so that the code uses
mtx_sleep() consistently within the file.
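The same lost-wakeup hazard exists in userland. The sketch below uses pthreads as a swapped-in analogue, not the kernel API. pthread_cond_wait(3) releases the mutex and sleeps atomically, as mtx_sleep(9) does. The difference is that mtx_sleep() with PDROP does not reacquire the mutex on wakeup, while pthread_cond_wait() does. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct event {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		done;
};

/* Waker: set the condition and signal while holding the mutex. */
static void *
poster(void *arg)
{
	struct event *ev = arg;

	pthread_mutex_lock(&ev->mtx);
	ev->done = true;
	pthread_cond_signal(&ev->cv);
	pthread_mutex_unlock(&ev->mtx);
	return (NULL);
}

static bool
wait_for_event(void)
{
	struct event ev;
	pthread_t t;
	bool seen;

	pthread_mutex_init(&ev.mtx, NULL);
	pthread_cond_init(&ev.cv, NULL);
	ev.done = false;
	if (pthread_create(&t, NULL, poster, &ev) != 0)
		return (false);
	pthread_mutex_lock(&ev.mtx);
	while (!ev.done)
		pthread_cond_wait(&ev.cv, &ev.mtx);	/* unlock + sleep atomically */
	seen = ev.done;
	pthread_mutex_unlock(&ev.mtx);
	pthread_join(t, NULL);
	pthread_cond_destroy(&ev.cv);
	pthread_mutex_destroy(&ev.mtx);
	return (seen);
}
```

Unlocking first and then sleeping would leave a window in which the signal fires before the waiter is asleep.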
jkim [Mon, 27 Feb 2012 18:28:18 +0000 (18:28 +0000)]
MFC: r231843, r232061, r232063, r232065, r232069
- Set the initial mode for the adapter after executing VESA BIOS POST.
- Probe supported states for save/restore function.
- Defer to VGA methods if no state is supported.
sbruno [Mon, 27 Feb 2012 17:29:42 +0000 (17:29 +0000)]
MFC r231860
During work to port isci(4) to stable/7 I noted that the maxio portion of
struct ccb_pathinq from sys/cam/cam_ccb.h wasn't added to stable/7 at all
and didn't appear in stable/8 until svn R195534. Since __FreeBSD_version
did not get bumped until svn R195634, assume that maxio is valid at 800102
or higher.
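The version guard can be sketched as below. struct pathinq_sketch and FALLBACK_MAXIO are hypothetical stand-ins. In the real structure the field simply does not exist on older branches, which is why the check has to happen at compile time.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __FreeBSD__
#include <sys/param.h>		/* defines __FreeBSD_version */
#endif

/* Hypothetical stand-in for the relevant part of struct ccb_pathinq. */
struct pathinq_sketch {
	uint32_t	maxio;
};

#define FALLBACK_MAXIO	(64 * 1024)	/* hypothetical default */

/* Trust maxio only on versions known to have it (800102 or higher). */
static uint32_t
effective_maxio(const struct pathinq_sketch *cpi)
{
#if defined(__FreeBSD_version) && __FreeBSD_version >= 800102
	if (cpi->maxio != 0)
		return (cpi->maxio);
#else
	(void)cpi;
#endif
	return (FALLBACK_MAXIO);
}
```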
brueffer [Sat, 25 Feb 2012 10:10:43 +0000 (10:10 +0000)]
MFC: r231871
Switch the license boilerplates to our standard one.
Advantages:
- Reduces the number of different license versions in the tree
- Eliminates a typo
- Removes some incorrect author attributions due to c/p
- Removes c/p error potential for future pmc manpages
marius [Sat, 25 Feb 2012 00:35:19 +0000 (00:35 +0000)]
MFC: r231913
- Probe BCM57780.
- In case the parent is bge(4), don't set the Jumbo frame settings unless
the MAC actually is Jumbo capable as otherwise the PHY might not have the
corresponding registers implemented. This is also in line with what the
Linux tg3 driver does.
PR: 165032
Submitted by: Alexander Milanov
Obtained from: OpenBSD
marius [Fri, 24 Feb 2012 00:47:14 +0000 (00:47 +0000)]
MFC: r231621
- As it turns out, MSI-X is broken for at least the LSI SAS1068E when passed
through by VMware, so blacklist their PCI-PCI bridge for MSI/MSI-X here.
Note that there currently is no quirk type that disables only MSI-X, and
while there's no evidence that MSI doesn't work with the VMware
pass-through, it's really questionable whether MSI generally works in
that setup, as VMware only mentions three known working devices [1, p. 4].
Also note that this quirk entry currently doesn't affect the devices
emulated by VMware in any way, as these don't claim to support MSI/MSI-X
to begin with. [2]
While at it, make the PCI quirk table const and static.
- Remove some duplicated empty lines.
- Use DEVMETHOD_END.
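A const, static quirk table with a linear lookup might look like the sketch below. The types, names and the entry's ID are hypothetical. The (device << 16) | vendor encoding is an assumption modeled loosely on FreeBSD's PCI quirk list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum quirk_type { QUIRK_NONE, QUIRK_DISABLE_MSI };

struct quirk {
	uint32_t	devid;		/* (device << 16) | vendor */
	enum quirk_type	type;
};

/* const and static: read-only data, private to this file. */
static const struct quirk quirk_table[] = {
	{ 0x12345678, QUIRK_DISABLE_MSI },	/* hypothetical bridge entry */
	{ 0, QUIRK_NONE }			/* terminator */
};

static enum quirk_type
quirk_lookup(uint16_t vendor, uint16_t device)
{
	uint32_t devid = ((uint32_t)device << 16) | vendor;

	for (const struct quirk *q = quirk_table; q->devid != 0; q++)
		if (q->devid == devid)
			return (q->type);
	return (QUIRK_NONE);
}
```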
jkim [Thu, 23 Feb 2012 22:34:44 +0000 (22:34 +0000)]
MFC: r231781
Some BIOSes are known for corrupting the low 64KB of memory between suspend
and resume. Mask off the first 16 pages unless we appear to be running in a
VM. This address may be overridden by the 'hw.physmem.start' loader tunable.
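The arithmetic and precedence are simple: 16 pages of 4KB are 64KB (0x10000). In the sketch below, first_usable_phys() is a hypothetical helper, "in_vm" and "tunable" are hypothetical inputs, and 0 means no extra reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* x86 base page size */
#define LOW_PAGES_RESERVED	16u	/* 16 * 4KB = 64KB */

/* An explicit tunable wins; otherwise skip 64KB unless in a VM. */
static uint64_t
first_usable_phys(bool in_vm, const uint64_t *tunable)
{
	if (tunable != NULL)
		return (*tunable);
	return (in_vm ? 0 : (uint64_t)LOW_PAGES_RESERVED * SKETCH_PAGE_SIZE);
}
```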
jkim [Thu, 23 Feb 2012 22:26:14 +0000 (22:26 +0000)]
MFC: r231161
- Give all clocks and timers on acpi0 the same probing order.
- Increase probing order for ECDT table to match HID-based probing.
- Decrease probing order for HPET table to match HID-based probing.
- Decrease probing order for CPUs and system resources.
- Fix ACPI_DEV_BASE_ORDER to reflect reality.
jkim [Thu, 23 Feb 2012 22:20:52 +0000 (22:20 +0000)]
MFC: r231797
Clean up RFLAGS and CR3 register handling and nearby comments. For BSP, use
spinlock_enter()/spinlock_exit() to save/restore RFLAGS. We know interrupts
are disabled when returning from S3. For AP, we do not have to save/restore
it because IRET will do it for us anyway. Do not save CR3 locally because
savectx() does it and BSP does not have to switch to kernel map for amd64.
Change contigmalloc(9) flag while I am in the neighborhood.
jkim [Thu, 23 Feb 2012 22:03:20 +0000 (22:03 +0000)]
MFC: r231791, r231840
Set up an event handler to turn off speaker if user requested it. Speaker
will stop beeping after all device drivers are resumed. Use proper API to
"acquire" and "release" PIC timer2 for consistency and correctness.
kmacy [Thu, 23 Feb 2012 19:20:36 +0000 (19:20 +0000)]
MFC r230623
Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64.
Excluding other allocations, including UMA, now entails the addition of
a single flag to kmem_alloc or uma zone create.
kmacy [Thu, 23 Feb 2012 18:50:19 +0000 (18:50 +0000)]
MFC r230598
A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.
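The pattern is to invalidate on removal and check on use. The structures, flag and functions below are hypothetical stand-ins for llentry, LLE_VALID and the flowtable code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LLE_VALID	0x01	/* stand-in for LLE_VALID */

struct lle_sketch {
	int	flags;
};

/* Flow cache entry holding a cached neighbour reference. */
struct flow_sketch {
	struct lle_sketch *lle;
};

/* Removing an entry from the interface hash table invalidates it. */
static void
lle_remove(struct lle_sketch *lle)
{
	lle->flags &= ~SKETCH_LLE_VALID;
}

/* Use the cached entry only while it is still valid. */
static struct lle_sketch *
flow_get_lle(struct flow_sketch *fl)
{
	if (fl->lle == NULL || (fl->lle->flags & SKETCH_LLE_VALID) == 0)
		return (NULL);	/* stale: caller must do a fresh lookup */
	return (fl->lle);
}
```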
kib [Tue, 21 Feb 2012 21:21:31 +0000 (21:21 +0000)]
MFC r230430:
Use getcontextx(3) internal API instead of getcontext(2) to provide
the signal handlers with the context information in the deferred
case.
kib [Tue, 21 Feb 2012 21:18:59 +0000 (21:18 +0000)]
MFC r230429:
Add API for obtaining extended machine context states that cannot be
fit into existing mcontext_t.
On i386 and amd64, return the extended FPU states using
getcontextx(3). For other architectures, getcontextx(3) returns the
same information as getcontext(2).
MFC r230864:
Make the sys/ucontext.h self-contained by changing the return type
of __getcontextx_size(3) from size_t to int.
kib [Tue, 21 Feb 2012 01:43:31 +0000 (01:43 +0000)]
MFC r231313 (by mckusick):
First attempt the uiomove() to the newly allocated (and dirty) buffer
and only zeros it if the uiomove() fails. The effect is to eliminate the
gratuitous zeroing of the buffer in the usual case where the uiomove()
successfully fills it.
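The ordering change can be sketched in userland. copy_in() is a hypothetical stand-in for uiomove(9) that fails when given a NULL source, and fill_new_buffer() shows copy-first, zero-only-on-failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uiomove(9): fails on a NULL source. */
static int
copy_in(void *dst, const void *src, size_t len)
{
	if (src == NULL)
		return (EFAULT);
	memcpy(dst, src, len);
	return (0);
}

/*
 * Fill a freshly allocated (dirty) buffer: try the copy first and zero
 * the buffer only if the copy fails, instead of always zeroing it.
 */
static int
fill_new_buffer(void *buf, const void *src, size_t len)
{
	int error = copy_in(buf, src, len);

	if (error != 0)
		memset(buf, 0, len);	/* don't expose stale contents */
	return (error);
}
```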