glebius [Sat, 5 May 2012 07:55:50 +0000 (07:55 +0000)]
Merge 234342 from head:
When we receive an ICMP unreach need fragmentation datagram, we take
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.
This sequence breaks if the TCP host cache is exhausted. In this
case the allocation fails, and the subsequent tcp_mss_update() call finds
nothing in the cache. The fast retransmit is done with an unreduced MSS and
is immediately answered by the remote host with new ICMP datagrams, and the
cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets a new parameter, mtuoffer, which is like
offer but needs to have min_protoh subtracted.
- tcp_mtudisc() as a notification method is renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts the proposed MTU value instead of a useless
error argument, and passes it to tcp_mss_update() as mtuoffer.
Reported by: az
Reported by: Andrey Zonov <andrey zonov.org>
Reviewed by: andre (previous version of patch)
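The fixed flow can be sketched in a few lines of C. This is a minimal illustration, not the kernel code: sketch_mss_update(), SKETCH_MIN_PROTOH and the use of 0 and -1 as "no value" markers are assumptions made for the example.

```c
#include <assert.h>

/* Assumed stand-in for min_protoh: IPv4 header plus TCP header, no options. */
#define SKETCH_MIN_PROTOH (20 + 20)

/*
 * Pick the MSS for a connection after an ICMP "need fragmentation" message.
 * cached_mtu is the MTU found in the host cache, or 0 if there is no entry
 * (for example because the cache is exhausted and the allocation failed).
 * mtuoffer is the MTU proposed by the ICMP message, or -1 if there is none.
 */
static int
sketch_mss_update(int cur_mss, int cached_mtu, int mtuoffer)
{
    int mss = cur_mss;

    assert(cur_mss > 0);

    /* The MTU from the ICMP message is honored even without a cache entry. */
    if (mtuoffer != -1 && mtuoffer - SKETCH_MIN_PROTOH > 0 &&
        mtuoffer - SKETCH_MIN_PROTOH < mss)
        mss = mtuoffer - SKETCH_MIN_PROTOH;

    /* A cached MTU, when present, can lower the MSS further. */
    if (cached_mtu != 0 && cached_mtu - SKETCH_MIN_PROTOH > 0 &&
        cached_mtu - SKETCH_MIN_PROTOH < mss)
        mss = cached_mtu - SKETCH_MIN_PROTOH;

    return (mss);
}
```

With no cache entry and a proposed MTU of 1400, the sketch lowers an MSS of 1460 to 1360 instead of leaving it unchanged, which is the step that stops the retransmit loop.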
jilles [Fri, 4 May 2012 20:45:53 +0000 (20:45 +0000)]
MFC r234057: sem_open: Make sure to fail an O_CREAT|O_EXCL open, even if
that semaphore is already open in this process.
If the named semaphore is already open, sem_open() only incremented a
reference count and did not take the flags into account (which otherwise
happens by passing them to open()). Add an extra check for O_CREAT|O_EXCL.
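The extra check might look roughly like the following sketch. The function name sketch_reopen() and the bare reference counter are hypothetical; EEXIST is the error POSIX specifies for sem_open() when O_CREAT and O_EXCL are set and the semaphore already exists.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Called when the named semaphore is already open in this process.
 * Returns 0 and bumps the reference count, or EEXIST when the caller
 * asked for exclusive creation.
 */
static int
sketch_reopen(int *refcount, int oflag)
{
    assert(refcount != NULL && *refcount > 0);

    /* Exclusive creation must fail even though open() is never reached. */
    if ((oflag & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))
        return (EEXIST);

    (*refcount)++;
    return (0);
}
```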
hselasky [Fri, 4 May 2012 15:38:47 +0000 (15:38 +0000)]
MFC r233662, r233677 and r233678:
Writing zero to a BAR does not actually disable it, and
it is even harmful, as hselasky found out. Historically,
this code originated in the (OLDCARD) CardBus driver and later leaked into
the PCI driver when CardBus was newbus'ified and refactored with the PCI driver.
However, it is not really necessary even for CardBus.
hselasky [Fri, 4 May 2012 15:10:49 +0000 (15:10 +0000)]
MFC r234803 and r234961:
Add support for Multi-TT mode of modern USB HUBs.
This will give you more bandwidth for isochronous
FULL speed applications connected through a
High Speed HUB.
- Use more natural ip->i_flags instead of vap->va_flags in the final
flags check.
- Add a comment for the immutable/append check done after handling of
the flags.
- Style improvements.
No functional change intended.
MFC r234421:
The part about exec atime no longer applies in the comment.
mav [Wed, 2 May 2012 07:17:53 +0000 (07:17 +0000)]
MFC r234415:
Some improvements to GEOM MULTIPATH:
- Implement a "configure" command to allow switching the operation mode of
a running device on the fly, without destroying and recreating it.
- Implement Active/Read mode as a hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads at the same time,
but unlike Active/Active only one path handles write requests at any
point in time. This allows the original write request order to be followed
more closely if the layers above need it for data consistency (not waiting
for requisite write completion before sending a dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.
delphij [Wed, 2 May 2012 00:30:30 +0000 (00:30 +0000)]
MFC r233770:
Eliminate two cases of unwanted strncpy(). The name is not required
by the current code, and the results would get overwritten anyway
by subsequent memset().
kib [Tue, 1 May 2012 10:49:20 +0000 (10:49 +0000)]
MFC r234657:
Take the spinlock around clearing of the fp->_flags in fclose(3), which
indicates the availability of FILE, to prevent possible reordering of
the writes as seen by other CPUs.
r233998:
Add reserved memory limit sysctl to tmpfs. Clean up available and used
memory functions. Check if free pages are available before allocating a new
node.
r233999 (partial):
Add vfs_getopt_size. Support human readable file system options in tmpfs.
Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.
NOTE: To preserve KBI add tmpfs_getopt_size function instead of global
vfs_getopt_size.
r234000:
tmpfs supports only INT_MAX nodes due to limitations of the unit number
allocator. Replace UINT32_MAX checks with INT_MAX. Keeping more than 2^31
nodes in memory is not likely to become possible in the foreseeable future
and would require a new unit number allocator.
r234325:
Provide better description for vfs.tmpfs.memory_reserved sysctl.
MFC r233507:
Use program exit status as pam_exec return code (optional)
pam_exec(8) now accepts a new option "return_prog_exit_status". When
set, the program exit status is used as the pam_exec return code. It
allows the program to tell why the step failed (eg. user unknown).
However, if it exits with a code not allowed by the calling PAM service
module function (see $PAM_SM_FUNC below), a warning is logged and
PAM_SERVICE_ERR is returned.
The following changes are related to this new feature but they apply no
matter if the "return_prog_exit_status" option is set or not.
The environment passed to the program is extended:
o $PAM_SM_FUNC contains the name of the PAM service module function
(eg. pam_sm_authenticate).
o All valid PAM return codes' numerical values are available
through variables named after the return code name. For instance,
$PAM_SUCCESS, $PAM_USER_UNKNOWN or $PAM_PERM_DENIED.
pam_exec return code better reflects what went on:
o If the program exits with !0, the return code is now
PAM_PERM_DENIED, not PAM_SYSTEM_ERR.
o If the program fails because of a signal (WIFSIGNALED) or doesn't
terminate normally (!WIFEXITED), the return code is now
PAM_SERVICE_ERR, not PAM_SYSTEM_ERR.
o If a syscall in pam_exec fails, the return code remains
PAM_SYSTEM_ERR.
waitpid(2) is called in a loop. If it returns because of EINTR, do it
again. Before, it would return PAM_SYSTEM_ERR without waiting for the
child to exit.
Several log messages now include the PAM service module function name.
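The waitpid(2) retry described above can be sketched as follows. The names sketch_wait_child() and sketch_child_exit_status() are hypothetical, and the second function exists only to exercise the first; the mapping to PAM return codes is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Wait for the child pid, retrying when waitpid(2) is interrupted by a
 * signal. Returns 0 and fills *status on success, -1 on any other error.
 */
static int
sketch_wait_child(pid_t pid, int *status)
{
    pid_t r;

    assert(status != NULL);
    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    return (r == pid ? 0 : -1);
}

/* Demo helper: fork a child that exits with code, return the status seen. */
static int
sketch_child_exit_status(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1)
        return (-1);
    if (pid == 0)
        _exit(code);
    if (sketch_wait_child(pid, &status) != 0 || !WIFEXITED(status))
        return (-1);
    return (WEXITSTATUS(status));
}
```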
Fix a bug where we copy out more data from an mbuf chain than is
actually in it. This happens when SCTP receives an unknown chunk, which
requires the sending of an ERROR chunk, and there is no final padding but
the chunk is not 4-byte aligned.
Reported by yueting via rwatson@
Export the udp_cksum sysctl for upcoming SCTP work. Rather than always
calculating it, SCTP will do the IPv4 UDP checksum calculation only as
defined by the host policy. When tunneling, SCTP always calculates the inner
checksum already, so skipping the outer UDP checksum can save cycles.
MFC r234038
If a page belonging to a reservation is cached, then mark the reservation so
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.
MFC r234556:
When a MAP_STACK mapping is created, the map entry is created only to
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to the
VM_MAP_WIRE_NOHOLES flag.
Use VM_MAP_WIRE_HOLESOK to wire only the mapped part of the stack.
MFC r233097
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists.
dim [Sat, 28 Apr 2012 09:21:43 +0000 (09:21 +0000)]
MFC r234540:
Fix the following clang warning in dpt(4):
sys/dev/dpt/dpt_scsi.c:612:18: error: implicit truncation from 'int' to bitfield changes value from -2 to 2 [-Werror,-Wconstant-conversion]
dpt->cache_type = DPT_CACHE_WRITEBACK;
^ ~~~~~~~~~~~~~~~~~~~
by defining DPT_CACHE_WRITEBACK as 2, since dpt_softc::cache_type is an
unsigned bitfield. No binary change.
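The truncation clang complains about can be reproduced in isolation. In this sketch, struct sketch_softc and SKETCH_CACHE_WRITEBACK are stand-ins for the driver's definitions, and the two-bit width is an assumption inferred from the "-2 to 2" in the warning.

```c
#include <assert.h>

/* Assumed layout: an unsigned bitfield two bits wide. */
struct sketch_softc {
    unsigned int cache_type : 2;
};

#define SKETCH_CACHE_WRITEBACK 2    /* fits the field without conversion */

/* Store v in the bitfield and return what the field actually holds. */
static unsigned int
sketch_store_cache_type(int v)
{
    struct sketch_softc sc;

    sc.cache_type = (unsigned int)v;    /* reduced modulo 4 on store */
    assert(sc.cache_type <= 3);
    return (sc.cache_type);
}
```

Storing -2 and storing 2 leave the same bits in the field, which is consistent with the commit's note that the fix causes no binary change.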
dim [Sat, 28 Apr 2012 09:18:20 +0000 (09:18 +0000)]
MFC r228572:
Fix format string Z --> z, since the former is a deprecated and (in FreeBSD)
unsupported form of the latter. This change has been reviewed and accepted
in the -hackers list.
Submitted by: Alexander Best
Reviewed by: David Schulz
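As an illustration of the supported form, the sketch below prints a size_t with the standard "z" length modifier. The helper name sketch_format_size() and the choice of the "u" conversion are assumptions; the commit message does not say which conversion was involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format n into buf using the C99 "z" modifier; returns snprintf's result. */
static int
sketch_format_size(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL && buflen > 0);
    return (snprintf(buf, buflen, "%zu", n));
}
```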
dim [Fri, 27 Apr 2012 18:21:45 +0000 (18:21 +0000)]
MFC r234507:
Fix the following compilation warnings in sys/contrib/rdma/rdma_cma.c:
sys/contrib/rdma/rdma_cma.c:1259:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ECONNRESET:
^
@/sys/errno.h:118:20: note: expanded from macro 'ECONNRESET'
#define ECONNRESET 54 /* Connection reset by peer */
^
sys/contrib/rdma/rdma_cma.c:1263:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ETIMEDOUT:
^
@/sys/errno.h:124:19: note: expanded from macro 'ETIMEDOUT'
#define ETIMEDOUT 60 /* Operation timed out */
^
sys/contrib/rdma/rdma_cma.c:1260:8: error: case value not in enumerated type 'enum iw_cm_event_status' [-Werror,-Wswitch]
case ECONNREFUSED:
^
@/sys/errno.h:125:22: note: expanded from macro 'ECONNREFUSED'
#define ECONNREFUSED 61 /* Connection refused */
^
This is because the switch uses iw_cm_event::status, which is an enum
iw_cm_event_status, while ECONNRESET, ETIMEDOUT and ECONNREFUSED are
just plain defines from errno.h.
It looks like there is only one use of any of the enumeration values of
iw_cm_event_status, in:
sys/contrib/rdma/rdma_iwcm.c: if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) {
So messing around with the enum definitions to fix the warning seems too
disruptive; the simplest fix is to cast the argument of the switch to
int.
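A reduced version of the pattern, with hypothetical names (enum sketch_event_status, struct sketch_event, sketch_classify()) in place of the RDMA types, shows how the cast lets errno values appear as case labels without tripping -Wswitch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sketch_event_status {
    SKETCH_STATUS_ACCEPTED = 0,
    SKETCH_STATUS_REJECTED
};

struct sketch_event {
    enum sketch_event_status status;
};

/* Returns 0 for accepted, 1 for reset or refused, 2 for timeout, -1 otherwise. */
static int
sketch_classify(const struct sketch_event *ev)
{
    assert(ev != NULL);

    switch ((int)ev->status) {    /* cast: compare as plain integers */
    case SKETCH_STATUS_ACCEPTED:
        return (0);
    case ECONNRESET:
    case ECONNREFUSED:
        return (1);
    case ETIMEDOUT:
        return (2);
    default:
        return (-1);
    }
}
```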
dim [Fri, 27 Apr 2012 18:08:15 +0000 (18:08 +0000)]
MFC r234506:
Fix the following compilation warnings in nxge(4):
sys/dev/nxge/if_nxge.c:1276:11: error: case value not in enumerated type 'xge_hal_event_e' (aka 'enum xge_hal_event_e') [-Werror,-Wswitch]
case XGE_LL_EVENT_TRY_XMIT_AGAIN:
^
sys/dev/nxge/if_nxge.c:1289:11: error: case value not in enumerated type 'xge_hal_event_e' (aka 'enum xge_hal_event_e') [-Werror,-Wswitch]
case XGE_LL_EVENT_DEVICE_RESETTING:
^
This is because the switch uses xge_queue_item_t::event_type, which is
an enum xge_hal_event_e, while the XGE_LL_EVENT_xx values are of the
enum xge_event_e.
Since messing around with the enum definitions is too disruptive, the
simplest fix is to cast the argument of the switch to int.
dim [Fri, 27 Apr 2012 18:05:24 +0000 (18:05 +0000)]
MFC r234503:
Replace homegrown list implementation in sys/dev/asr/asr.c with
STAILQ(). While here, fix another clang warning about a switch which
tests an enum type for a regular integer value.
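For readers unfamiliar with the queue(3) macros, a minimal STAILQ() usage sketch follows. struct sketch_req and sketch_sum_ids() are invented for the example and have nothing to do with the actual asr(4) data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* A list element: payload plus the singly-linked tail queue linkage. */
struct sketch_req {
    int id;
    STAILQ_ENTRY(sketch_req) link;
};

/* Declares struct sketch_req_list, the head of the queue. */
STAILQ_HEAD(sketch_req_list, sketch_req);

/* Walk the queue from head to tail and add up the ids. */
static int
sketch_sum_ids(struct sketch_req_list *head)
{
    struct sketch_req *r;
    int sum = 0;

    assert(head != NULL);
    STAILQ_FOREACH(r, head, link)
        sum += r->id;
    return (sum);
}
```

Elements are appended with STAILQ_INSERT_TAIL() and taken off the front with STAILQ_REMOVE_HEAD(), so no hand-written pointer juggling is needed.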
dim [Fri, 27 Apr 2012 06:49:35 +0000 (06:49 +0000)]
MFC r234502:
After r217375, some startup objects under lib/csu are built in a special
way: first they are compiled to assembly, then some sed'ing is done on
the assembly, and lastly the assembly is compiled to an object file.
This last step is done using ${CC}, and not ${AS}, because when the
compiler is clang, it outputs directives that are too advanced for our
old gas. So we use clang's integrated assembler instead. (When the
compiler is gcc, it just calls gas, and nothing is different, except one
extra fork.)
However, in the .s to .o rules in lib/csu/$ARCH/Makefile, I still passed
CFLAGS to the compiler, instead of ACFLAGS, which are specifically for
compiling .s files.
In case you are using '-g' for debug info anywhere in your CFLAGS, it
causes the .s files to already contain debug information in the assembly
itself. In the next step, the .s files are also compiled using '-g',
and if the compiler is clang, it complains: "error: input can't have
.file dwarf directives when -g is used to generate dwarf debug info for
assembly code".
Fix this by using ${ACFLAGS} for compiling the .s files instead.
- Do not clobber softc when psm(4) is reinitialized.
- Make INITAFTERSUSPEND flag independent of HOOKRESUME flag.
- Automatically set INITAFTERSUSPEND flag when ALPS GlidePoint is detected.
- Always probe Synaptics Touchpad. Allow MOUSE_SYN_GETHWINFO ioctl and
automatically set INITAFTERSUSPEND flag when a supported device is detected,
regardless of "hw.psm.synaptics_support" tunable setting.
- Update psm(4) to reflect the above changes.
- Remove long-time defunct SYNCHACK flag while I am in the neighborhood.
MFC r234615:
Fix copy-and-paste error in r230400 that would cause ppc64 executables
built with -fvisibility=hidden to fail to link with a message about
hidden symbol main being referenced from a DSO.
MFC r234579:
Avoid a lock order reversal in pmap_extract_and_hold() from relocking
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.
MFC r234542:
Organize some members of ucontext_t in the same order they are in the
trap frame. These are usually not used, and so this changes very little.
MFC r234692:
Read the backup GPT header from the last LBA only when the primary GPT header
and table aren't valid. If they are ok, use the hdr_lba_alt value to read the
backup header. This will make gptboot happy when GPT is used on top of some
GEOM provider, e.g. GEOM_MIRROR.
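The selection rule amounts to a single decision, sketched here with a hypothetical helper; only hdr_lba_alt is a name taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Choose where to read the backup GPT header from: trust the primary
 * header's hdr_lba_alt when the primary header and table are valid,
 * and fall back to the last LBA of the device otherwise.
 */
static uint64_t
sketch_backup_hdr_lba(bool primary_valid, uint64_t hdr_lba_alt,
    uint64_t last_lba)
{
    assert(last_lba > 0);
    return (primary_valid ? hdr_lba_alt : last_lba);
}
```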
Merge r233167 from head:
Rotate auth.log and messages at the beginning of a year. Otherwise,
daily security checks 800.loginfail and 900.tcpwrap may produce
false positive alerts.
Merge r233257, r233258 from head:
Don't run through time checks when an entry is definitely oversized. This
leads to newsyslog rotating on (size OR time) if both are specified.
Fix a sentence in a paragraph that describes time and interval based
trimming. This sentence could be misread as describing the interaction of
time and size, which it did not.
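The resulting (size OR time) behavior can be sketched as below. This is not the newsyslog source: the function, its parameters, the negative "no limit" marker and the >= comparison are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * size_limit_kb < 0 means no size limit is configured; time_due is the
 * result of the separately computed time check.
 */
static bool
sketch_should_rotate(long size_kb, long size_limit_kb, bool time_due)
{
    assert(size_kb >= 0);

    /* Definitely oversized: rotate without consulting the time checks. */
    if (size_limit_kb >= 0 && size_kb >= size_limit_kb)
        return (true);
    return (time_due);
}
```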
Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
operations for setting and accessing vnode's v_socket field.
The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).
This change fixes the long-standing nullfs(5) issue that unix sockets
did not work between the lower and upper layers: if we bound to a socket
on the lower layer, we could connect only to the lower path; if we bound
to the upper layer, we could connect only to the upper path. With the new
behavior, one can connect to both the lower and the upper paths
regardless of which layer's path one binds to.
- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Allow the number of tables to be changed at runtime, in the range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds
No ABI changes are introduced. Old userland will see valid tables for
IPv4 tables and no entries otherwise. Flush works for any table.
IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
uint16_t opcode; /* Operation opcode */
uint16_t reserved[3]; /* Align to 64-bit boundary */
} ip_fw3_opheader;
New opcodes added:
IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST
ipfw(8) table argument parsing behavior is changed:
'ipfw table 999 add host' now assumes 'host' to be interface name instead of
hostname.
New tunable:
net.inet.ip.fw.tables_max controls the number of tables supported by ipfw in
a given VNET instance. 128 is still the default value.
Sysctl change:
net.inet.ip.fw.tables_max is now read-write.
New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out
This is a bit hackish: the special interface name '\1' is used to signal
that an interface table number is passed in the p.glob field.
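To make the layout of the IP_FW3 header concrete, the sketch below mirrors the structure quoted above under a sketch-local name and computes the length of a request that carries a payload behind it. sketch_request_len() is hypothetical and no real opcode values are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the header quoted above, under a sketch-local name. */
typedef struct sketch_ip_fw3_opheader {
    uint16_t opcode;        /* Operation opcode */
    uint16_t reserved[3];   /* Align to 64-bit boundary */
} sketch_ip_fw3_opheader;

/* Total request length: the fixed header followed by payload_len bytes. */
static size_t
sketch_request_len(size_t payload_len)
{
    /* Guard against size_t overflow in the addition below. */
    assert(payload_len <= SIZE_MAX - sizeof(sketch_ip_fw3_opheader));
    return (sizeof(sketch_ip_fw3_opheader) + payload_len);
}
```

Four 16-bit fields give an 8-byte header, so whatever follows it starts on a 64-bit boundary.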
MFC r234121:
Back out r228476.
r228476 fixed superfluous link UP/DOWN messages but broke IPMI
access during boot. It's not clear why r228476 breaks IPMI; this
should be revisited.
Reported by: Paul Guyot <paulguyot <> ieee dot org >
MFC r233387:
Use the suspend/resume methods provided by net80211. This ensures that the
appropriate state handling takes place; not doing so results in the
device doing nothing until manual intervention.
Improve device tree blob (DTB) handling in loader(8).
Enable using the statically embedded blob from the kernel, if present. The
KLD loaded DTB takes precedence, but they are both recognized and handled in
the same way.
Improve FDT handling in loader(8) and make it more robust.
o Fix buffer overflows when using a long property body in node paths.
o Fix loop end condition when iterating through the symbol table.
o Better error handling during node modification, better problem reporting.
o Eliminate build time warnings.
MFC r234115:
Do not restore the register holding the TLS pointer when doing various
usermode context switches (long jumps and ucontext operations). If these
are used across threads, multiple threads can end up with the same TLS base.
Madness will then result.
This makes behavior on PPC match that on x86 systems and on Linux.
Don asbestos garment and remove the warning about TMPFS being experimental
-- highly experimental even. So far the closest to a bug in TMPFS that people
have gotten to relates to how ZFS can take away from the memory that TMPFS
needs. One can argue that such is not a bug in TMPFS. Irrespective, even if
there is a bug here and there in TMPFS, it's not to our own advantage to
scare people away from using TMPFS. I for one have been using it, even with
ZFS, very successfully.