uqs [Tue, 8 May 2012 08:19:07 +0000 (08:19 +0000)]
Sync traceroute(8) with head.
Merges r215937,216184 and r211062,215880,220968:
- Remove unused traceroute(8) contrib code from head
- make WARNS=3 clean
- fix an operator precedence bug for TCP tracerouting
- Remove unneeded struct timezone passed to gettimeofday().
- Remove clause 3 and 4 from TNF licenses.
- Check return code of setuid() in traceroute.
marius [Mon, 7 May 2012 07:04:41 +0000 (07:04 +0000)]
MFC: r225203 (partial)
Attempt to make break-to-debugger and alternative break-to-debugger more
accessible:
(1) Always compile in support for breaking into the debugger if options
KDB is present in the kernel.
(2) Disable both by default, but allow them to be enabled via tunables
and sysctls debug.kdb.break_to_debugger and
debug.kdb.alt_break_to_debugger.
(3) options BREAK_TO_DEBUGGER and options ALT_BREAK_TO_DEBUGGER continue
to behave as before -- only now instead of compiling in
break-to-debugger support, they change the default values of the
above sysctls to enable those features by default. Current kernel
configurations should, therefore, continue to behave as expected.
(4) Migrate alternative break-to-debugger state machine logic out of
individual device drivers into centralised KDB code. This has a
number of upsides, but also one downside: it's now tricky to release
sio spin locks when entering the debugger, so we don't. However,
similar logic does not exist in other device drivers, including uart.
(5) dcons requires some special handling; unlike other console types, it
allows overriding KDB's own debugger selection, so we need a new
interface to KDB to allow that to work.
GENERIC kernels will now support break-to-debugger as long as appropriate
boot/run-time options are set, which should improve the debuggability of
kernels significantly.
MFC: r225214 (partial)
Follow up to r225203 refining break-to-debugger run-time configuration
improvements:
(1) Implement new model in previously missed at91 UART driver
(2) Move BREAK_TO_DEBUGGER and ALT_BREAK_TO_DEBUGGER from opt_comconsole.h
to opt_kdb.h (spotted by np)
(3) Garbage collect now-unused opt_comconsole.h
Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file. With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.
This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore. This means that the multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.
melifaro [Sat, 5 May 2012 11:34:27 +0000 (11:34 +0000)]
MFC r234572
Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.
glebius [Sat, 5 May 2012 10:05:13 +0000 (10:05 +0000)]
Merge 234342 from head:
When we receive an ICMP unreach need fragmentation datagram, we take
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.
This sequence gets broken if the TCP host cache is exausted. In this
case allocation fails, and later called tcp_mss_update() finds nothing
in cache. The fast retransmit is done with not reduced MSS and is
immidiately replied by remote host with new ICMP datagrams and the
cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like
offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed
MTU value, that is passed to tcp_mss_update() as mtuoffer.
Reported by: az
Reported by: Andrey Zonov <andrey zonov.org>
Reviewed by: andre (previous version of patch)
hselasky [Fri, 4 May 2012 15:55:31 +0000 (15:55 +0000)]
MFC r234803 and r234961:
Add support for Multi-TT mode of modern USB HUBs.
This will give you more bandwidth for isochronous
FULL speed applications connected through a
High Speed HUB.
davidxu [Thu, 3 May 2012 03:05:18 +0000 (03:05 +0000)]
Merge 233103, 233912 from head:
233103:
Some software think a mutex can be destroyed after it owned it, for
example, it uses a serialization point like following:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_destroy(&muetx);
They think a previous lock holder should have already left the mutex and
is no longer referencing it, so they destroy it. To be maximum compatible
with such code, we use IA64 version to unlock the mutex in kernel, remove
the two steps unlocking code.
233912:
umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily. On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
Special code for stable/8:
And add code to detect if the UMTX_OP_MUTEX_WAKE2 is available.
mav [Wed, 2 May 2012 07:22:58 +0000 (07:22 +0000)]
MFC r234415:
Some improvements to GEOM MULTIPATH:
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
- Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.
mav [Wed, 2 May 2012 07:05:20 +0000 (07:05 +0000)]
MFC r222643:
When possible, join ranges of subsequest BIO_DELETE requests to handle more
(up to 2048 instead of 256 or even 64) of them with single TRIM request.
OCZ Vertex2/Vertex3 SSDs can handle no more then 64 ranges per TRIM request.
Due to lack of BIO_DELETE clustering now, it means that we could delete no
more then 2MB per request (on FS with 32K block) with limited request rate.
This change increases delete rate on Vertex2 from 250MB/s to 950MB/s.
MFC r222643:
Increase maximum supported number of ranges per TRIM command from 256 to 512
to use full potential of Intel X25-M SSDs. On synthetic test with 32K ranges
it gives about 20% speedup, which probably costs more then 2K of RAM.
mav [Wed, 2 May 2012 06:52:00 +0000 (06:52 +0000)]
Merge ATA_CAM compatibility shims. While 8-STABLE doesn't have ATA_CAM
enabled by default, this should make migration easier for users enabling
it manually.
r221071:
Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
r221384:
Do not report legacy unit numbers (do not create legacy aliases) for disks
on port multiplier ports above first two. They don't fit into ATA_STATIC_ID
scheme and so can't be mapped properly. No need to pollute dev.
delphij [Wed, 2 May 2012 00:31:09 +0000 (00:31 +0000)]
MFC r233770:
Eliminate two cases of unwanted strncpy(). The name is not required
by the current code, and the results would get overwritten anyway
by subsequent memset().
kib [Tue, 1 May 2012 11:45:16 +0000 (11:45 +0000)]
MFC r234657:
Take the spinlock around clearing of the fp->_flags in fclose(3), which
indicates the avaliability of FILE, to prevent possible reordering of
the writes as seen by other CPUs.
MFC r233507:
Use program exit status as pam_exec return code (optional)
pam_exec(8) now accepts a new option "return_prog_exit_status". When
set, the program exit status is used as the pam_exec return code. It
allows the program to tell why the step failed (eg. user unknown).
However, if it exits with a code not allowed by the calling PAM service
module function (see $PAM_SM_FUNC below), a warning is logged and
PAM_SERVICE_ERR is returned.
The following changes are related to this new feature but they apply no
matter if the "return_prog_exit_status" option is set or not.
The environment passed to the program is extended:
o $PAM_SM_FUNC contains the name of the PAM service module function
(eg. pam_sm_authenticate).
o All valid PAM return codes' numerical values are available
through variables named after the return code name. For instance,
$PAM_SUCCESS, $PAM_USER_UNKNOWN or $PAM_PERM_DENIED.
pam_exec return code better reflects what went on:
o If the program exits with !0, the return code is now
PAM_PERM_DENIED, not PAM_SYSTEM_ERR.
o If the program fails because of a signal (WIFSIGNALED) or doesn't
terminate normally (!WIFEXITED), the return code is now
PAM_SERVICE_ERR, not PAM_SYSTEM_ERR.
o If a syscall in pam_exec fails, the return code remains
PAM_SYSTEM_ERR.
waitpid(2) is called in a loop. If it returns because of EINTR, do it
again. Before, it would return PAM_SYSTEM_ERR without waiting for the
child to exit.
Several log messages now include the PAM service module function name.
MFC r234459:
Fix a bug where we copy out more data from a mbuf chain that are
ctually in it. This happens when SCTP receives an unknown chunk, which
requires the sending of an ERROR chunk, and there is no final padding but
the chunk is not 4-byte aligned.
Reported by yueting via rwatson@
MFC r234556:
When MAP_STACK mapping is created, the map entry is created only to
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to
VM_MAP_WIRE_NOHOLES flag.
Use the VM_MAP_WIRE_HOLESOK to only wire mapped part of the stack.
Export the udp_cksum sysctl for upcoming SCTP work. Rather than always,
SCTP will only do IPv4 UDP checksum calculation as defined by the host
policy. When tunneling SCTP always calculates the inner checksum already
so not doing the outer UDP can save cycles.
MFC r234038
If a page belonging a reservation is cached, then mark the reservation so
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.
MFC r234692:
Read backup GPT header from the last LBA only when primary GPT header and
table aren't valid. If they are ok, use hdr_lba_alt value to read backup
header. This will make gptboot happy when GPT used atop of some GEOM
provider, e.g. GEOM_MIRROR.
Free mbuf in case when protocol in unknown in ng_ipfw_rcvdata().
This change fixes (theoretically) possible mbuf leak introduced in
r225586. Reorder code a bit and change return codes to be more specific
- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Make number of tables to be changed in runtime in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds
No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.
IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;
New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST
ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add some-unqualified-host' now assumes
'some-unqualified-host' to be interface name instead of hostname.
New tunable:
net.inet.ip.fw.tables_max controls number of table supported by ipfw in given
VNET instance. 128 is still the default value.
Sysctl change:
net.inet.ip.fw.tables_max is now read-write.
New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out
This is a bit hackish, special interface name '\1' is used to signal interface
table number is passed in p.glob field.
MFC r234121:
Back out r228476.
r228476 fixed superfluous link UP/DOWN messages but broke IPMI
access during boot. It's not clear why r228476 breaks IPMI and
should be revisited.
Reported by: Paul Guyot <paulguyot <> ieee dot org >
MFC r233387:
Use suspend/resume methods provided by net80211. This ensures that the
appropriate state handling takes place, not doing so results in the
device doing nothing until manual intervention.
Improve the way to calculate available pages in tmpfs:
- Don't deduct wired pages from total usable counts because it does not
make any sense. To make things worse, on systems where swap size is
smaller than physical memory and use a lot of wired pages (e.g. ZFS),
tmpfs can suddenly have free space of 0 because of this;
- Count cached pages as available; [1]
- Don't count inactive pages as available, technically we could but that
might be too aggressive; [1]
Don astbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not in our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.
Change SIGUSR1 to SIGTHR to properly wake up a process that is being
traced. The use of SIGUSR1 caused traced processes (those attached to
with dtrace -p) to exit when dtrace exited.
marius [Fri, 20 Apr 2012 14:45:57 +0000 (14:45 +0000)]
MFC: r222475
Fix read_ivar implementation for MMC and SD.
1. Both mmc_read_ivar() and sdhci_read_ivar() use the expression
'*(int *)result = val' to assign to result which is uintptr_t *.
This does not work on big-endian 64 bit systems.
2. The media_size ivar is declared as 'off_t' which does not fit
into uintptr_t in 32bit systems, change this to long.
Submitted by: kanthms at netlogicmicro com (initial version)
Export vinactive() from kern/vfs_subr.c (e.g., make it no longer
static and declare its prototype in sys/vnode.h) so that it can be
called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c)
instead of the body of vinactive() being cut and pasted into
process_deferred_inactive().
MFC r234172:
Add thread-private flag to indicate that error value is already placed
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.
Use the flag to fix sigsuspend(2) error return ktrace records.
MFC r233176:
Add new GEOM_PART_LDM module that implements the Logical Disk Manager
scheme. The LDM is a logical volume manager for MS Windows NT and it
is also known as dynamic volumes. It supports about 2000 partitions
and also provides the capability for software RAID implementations.
This version implements only partitioning scheme capability and based
on the linux-ntfs project documentation and several publications across
the Web. NOTE: JBOD, RAID0 and RAID5 volumes aren't supported.
An access to the LDM metadata is read-only. When LDM is on the disk
partitioned with MBR we can also destroy metadata. For the GPT
partitioned disks destroy action is not supported.
MFC r233177:
Connect geom_part_ldm module to the build.
MFC r233178:
Connect geom_part_ldm to the kernel build.
MFC r233181:
Add CTLFLAG_TUN to sysctls.
MFC r233651:
Do proper cleanup for the GPT case when an error occurs.
MFC r233652:
VMDB offset should be greater than logical volume size only for MBR.
Some software think a mutex can be destroyed after it owned it, for
example, it uses a serialization point like following:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_destroy(&muetx);
They think a previous lock holder should have already left the mutex and
is no longer referencing it, so they destroy it. To be maximum compatible
with such code, we use IA64 version to unlock the mutex in kernel, remove
the two steps unlocking code.