tuexen [Fri, 10 Feb 2012 20:27:58 +0000 (20:27 +0000)]
MFC r221249:
Improve compilation of SCTP code without INET support.
Some bugs were fixed while doing this:
* ASCONF-ACK messages might use the wrong port number when using
IPv6.
* Checking for additional addresses takes the correct address
into account and also does not do more comparisons than
necessary.
This patch is based on one received from bz@ who was
sponsored by The FreeBSD Foundation and iXsystems.
tuexen [Fri, 10 Feb 2012 20:22:45 +0000 (20:22 +0000)]
MFC r219397:
Tune and fix the new DC-CC so that it seems to hit the
right mix. It may still need some tweaks, but it
appears not to give away too much to an
RFC2581 flow while substantially reducing the amount of
buffering used in the network.
From rrs@.
tuexen [Fri, 10 Feb 2012 20:19:38 +0000 (20:19 +0000)]
MFC r219120:
Add a new congestion control that helps reduce
the RTT that a flow builds up in buffers in
transit. It is a slight modification to RFC2581
but is friendlier, i.e. less aggressive.
From rrs@.
tuexen [Fri, 10 Feb 2012 20:17:28 +0000 (20:17 +0000)]
MFC r219057:
Improvements to CC modules:
1) Add four new points that allow you to pass more information
to CC algorithms.
2) Fix the case where the user changes the module on an existing TCB; in
such a case, the new module's initialization needs to be called on all nets.
3) Move the htcp_cc structure to a union that other modules can use.
4) Add a fifth point to get/set cc_module-specific socket options.
From rrs@.
tuexen [Fri, 10 Feb 2012 20:09:23 +0000 (20:09 +0000)]
MFC r219014:
* Fix several bugs where the scaled versions of srtt and rttvar
were used incorrectly.
* Use appropriate variable names for RTO instead of RTT.
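The scaling bugs above come from mixing fixed-point and plain values. A minimal sketch, assuming SRTT is stored scaled by 8 and RTTVAR by 4 (the classic BSD scheme) and using the RFC 4960 RTO rules; the names `rto_state`, `rto_update`, `RTT_SHIFT` and `RTT_VAR_SHIFT` are hypothetical, not the SCTP stack's own:

```c
#include <stdint.h>

/* Hypothetical names; the shifts follow the classic BSD scaling scheme. */
#define RTT_SHIFT     3  /* srtt_scaled holds SRTT * 8 */
#define RTT_VAR_SHIFT 2  /* rttvar_scaled holds RTTVAR * 4 */

struct rto_state {
    int32_t srtt_scaled;
    int32_t rttvar_scaled;
    int     have_sample;
};

/*
 * Feed one RTT measurement r (in ms) and return RTO = SRTT + 4 * RTTVAR,
 * before any RTO.Min / RTO.Max clamping.
 */
int32_t rto_update(struct rto_state *s, int32_t r)
{
    if (!s->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R / 2 (RFC 4960, 6.3.1 C2). */
        s->srtt_scaled = r << RTT_SHIFT;
        s->rttvar_scaled = (r << RTT_VAR_SHIFT) / 2;
        s->have_sample = 1;
    } else {
        /* The error is taken against the unscaled old SRTT. */
        int32_t err = r - (s->srtt_scaled >> RTT_SHIFT);
        /* SRTT += err / 8 becomes srtt_scaled += err. */
        s->srtt_scaled += err;
        if (err < 0)
            err = -err;
        /* RTTVAR += (|err| - RTTVAR) / 4 becomes rttvar_scaled += |err| - RTTVAR. */
        s->rttvar_scaled += err - (s->rttvar_scaled >> RTT_VAR_SHIFT);
    }
    /* rttvar_scaled already equals 4 * RTTVAR, so it is added unshifted. */
    return (s->srtt_scaled >> RTT_SHIFT) + s->rttvar_scaled;
}
```

Shifting rttvar_scaled back down before multiplying by 4, or adding srtt_scaled without shifting it, are exactly the kinds of mix-ups the entry describes.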
tuexen [Fri, 10 Feb 2012 19:57:58 +0000 (19:57 +0000)]
MFC r218641:
Fix a bug reported by Jonathan Leighton in his web-sctp testing
at the Univ-of-Del. When a 1-to-1 socket did a
socket/bind/send(data)/close and the timing was right,
we would dereference a NULL socket pointer.
From rrs@.
tuexen [Fri, 10 Feb 2012 19:52:18 +0000 (19:52 +0000)]
MFC r218400:
Fix bugs related to M_FLOWID:
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initialize flowid correctly.
* Put test code under INVARIANTS.
tuexen [Fri, 10 Feb 2012 19:40:19 +0000 (19:40 +0000)]
MFC r218371:
1) Use the scheme Michael and I discussed for selecting a flowid.
2) If flowid is not set, arrange so it is stored.
3) If flowid is set by lower layer, use it.
From rrs@.
tuexen [Fri, 10 Feb 2012 19:15:05 +0000 (19:15 +0000)]
MFC r218232:
1) Move to mp_maxid, per John Baldwin.
2) Fix some signed/unsigned errors found by the Mac OS compiler (from Michael).
3) Update a couple of copyrights on the affected files.
From rrs@.
tuexen [Fri, 10 Feb 2012 19:12:23 +0000 (19:12 +0000)]
MFC r218219:
Fix the per CPU stats so that they:
1) Don't use the giant "MAX_CPU" define and instead
are allocated dynamically based on mp_ncpus.
2) Are zeroed by netstat -z -s -p sctp.
3) Are properly handled by both sctp_init and finish
(the multi-net code was bzero'ing the wrong size in sctp_init;
the bzero is now moved to the right places).
And of course the free is done at the very end.
From rrs@.
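A userland sketch of the same idea, assuming a tiny hypothetical counter struct in place of the real SCTP statistics: allocate one entry per CPU at init, zero and sum across all entries, free at teardown. The `ncpus` argument stands in for mp_ncpus (a later entry in this log, r218232, switches the sizing to mp_maxid); every name here is illustrative.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-CPU counter block; the real SCTP stats are much larger. */
struct pcpu_stats {
    uint64_t ipackets;
    uint64_t opackets;
};

struct stats_table {
    int ncpus;
    struct pcpu_stats *per_cpu;  /* ncpus entries, not a fixed MAX_CPU array */
};

/* Allocate one zeroed entry per CPU actually present. */
int stats_init(struct stats_table *t, int ncpus)
{
    t->per_cpu = calloc((size_t)ncpus, sizeof(*t->per_cpu));
    t->ncpus = (t->per_cpu != NULL) ? ncpus : 0;
    return (t->per_cpu != NULL) ? 0 : -1;
}

/* Zero every CPU's entry; sizeof(*per_cpu) alone would clear only CPU 0. */
void stats_zero(struct stats_table *t)
{
    memset(t->per_cpu, 0, (size_t)t->ncpus * sizeof(*t->per_cpu));
}

/* Report a counter by summing it over all CPUs. */
uint64_t stats_sum_ipackets(const struct stats_table *t)
{
    uint64_t sum = 0;

    for (int i = 0; i < t->ncpus; i++)
        sum += t->per_cpu[i].ipackets;
    return sum;
}

/* Release the array at teardown. */
void stats_fini(struct stats_table *t)
{
    free(t->per_cpu);
    t->per_cpu = NULL;
    t->ncpus = 0;
}
```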
tuexen [Fri, 10 Feb 2012 19:10:09 +0000 (19:10 +0000)]
MFC r218211:
Add an experimental option to create a pool of
threads. These serve as input threads, and packets are
queued to them based on the V-tag number. This is similar to
what a modern card can do with queues for TCP, but
alas, modern cards know nothing about SCTP.
From rrs@.
tuexen [Fri, 10 Feb 2012 19:07:32 +0000 (19:07 +0000)]
MFC r218186:
1) Allow a chunk to track the cwnd it was at when sent.
2) Add separate max-bursts for retransmit and HB. These
are set to sysctl-able values but are not settable via the
socket API. This makes sure we don't blast out HBs or
fast-retransmits.
3) Determine on the first data transmission on a net whether
it is on a local LAN (by whether the RTT is under or over a threshold). This
can later be used to consider different algorithms
for local LAN vs. the big Internet (experimental).
4) The cwnd should NOT be allowed to grow when an ECN-Echo
is seen (TCP has this same bug). We fix this in SCTP
so that seeing an ECN-Echo prevents cwnd from advancing.
5) CWRs should not be sent multiple times to the
same network; instead, just update the TSN being
transmitted if needed.
From rrs@.
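Point 4 can be sketched as a guard in the slow-start increase. This assumes the RFC 4960 rule that cwnd grows by at most one MTU per SACK; the struct and the `ecn_echo_seen` flag are hypothetical stand-ins for the real per-net state.

```c
#include <stdint.h>

/* Hypothetical per-destination congestion state; names are illustrative. */
struct net_cc {
    uint32_t cwnd;           /* congestion window in bytes */
    uint32_t mtu;            /* path MTU in bytes */
    int      ecn_echo_seen;  /* an ECN-Echo arrived for this destination */
};

/*
 * Slow-start increase when a SACK newly acknowledges bytes_acked bytes:
 * grow by at most one MTU (RFC 4960, 7.2.1), but not at all if an
 * ECN-Echo has been seen for this path.
 */
void cwnd_on_sack(struct net_cc *net, uint32_t bytes_acked)
{
    if (net->ecn_echo_seen)
        return;  /* congestion was signalled: do not advance cwnd */
    net->cwnd += (bytes_acked < net->mtu) ? bytes_acked : net->mtu;
}
```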
tuexen [Fri, 10 Feb 2012 19:04:26 +0000 (19:04 +0000)]
MFC r218129:
More ECN fixes:
1) Remove ECN-Nonce, since it will no longer continue as an I-D.
2) Eliminate last_tsn_echo; this tied us to an assoc, not the net,
and thus we were not doing multi-homing right on the ECN-Echo sender's side.
3) Increment the count going out even if the TSN is lower than the pending
ECN-Echo's; this way the receiver knows exactly how many packets were
marked, even with network re-ordering.
4) Fix it so we DO NOT stop doing delayed SACK if an ECN-Echo is queued.
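Point 3 amounts to keeping the packet count separate from the highest TSN. A minimal sketch, assuming a hypothetical per-destination `ecne_pending` record and RFC 1982-style serial comparison for TSNs:

```c
#include <stdint.h>

/* Hypothetical pending ECN-Echo state kept per destination, not per association. */
struct ecne_pending {
    int      queued;    /* an ECN-Echo is waiting to be sent */
    uint32_t tsn;       /* highest CE-marked TSN seen so far */
    uint32_t num_pkts;  /* CE-marked packets reported by this ECN-Echo */
};

/* Serial-number "a is after b", so TSN wrap-around is handled. */
static int tsn_after(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* Record one received packet that carried the CE mark. */
void ecne_note_ce(struct ecne_pending *e, uint32_t tsn)
{
    if (!e->queued) {
        e->queued = 1;
        e->tsn = tsn;
        e->num_pkts = 1;
        return;
    }
    /* Count every marked packet, even one arriving out of order with a
     * lower TSN, so the peer learns exactly how many were marked. */
    e->num_pkts++;
    if (tsn_after(tsn, e->tsn))
        e->tsn = tsn;
}
```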
tuexen [Fri, 10 Feb 2012 19:02:07 +0000 (19:02 +0000)]
MFC r218072:
Fixes to ECN in SCTP.
1) ECN was on an association basis; this is incorrect and
will not work with CMT, or for that matter if the user
is sending to multiple addresses. This commit makes
ECN work on a per-path basis.
2) Adopt the new format from the ECN Internet-Draft. This also
maintains compatibility with old-format chunks.
3) Keep track of the real time of an RTT down to microseconds.
This is good information to have for some future conditional
features (e.g. for a data center).
From rrs@.
tuexen [Fri, 10 Feb 2012 18:58:36 +0000 (18:58 +0000)]
MFC r218039:
Keep track of the real last RTT on each net.
This will be used for data center congestion
control; we don't want to engage it in the
ECN code unless we KNOW that the RTT is less
than 500us.
From rrs@.
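A small sketch of measuring the last RTT in microseconds and gating on the 500us figure from this entry. It assumes `struct timeval` timestamps taken at send and at acknowledgement; the gate function name is hypothetical.

```c
#include <stdint.h>
#include <sys/time.h>

/* Threshold from the log entry: only consider data center CC below 500us. */
#define DC_CC_MAX_RTT_US 500

/* Microseconds elapsed between two timestamps (assumes end >= start). */
uint64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    int64_t us = (int64_t)(end->tv_sec - start->tv_sec) * 1000000
               + (int64_t)(end->tv_usec - start->tv_usec);
    return (uint64_t)us;
}

/* Hypothetical gate: engage data center behaviour only on very short paths. */
int dc_cc_eligible(uint64_t last_rtt_us)
{
    return last_rtt_us < DC_CC_MAX_RTT_US;
}
```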
tuexen [Fri, 10 Feb 2012 18:55:50 +0000 (18:55 +0000)]
MFC r218037:
Fix a bug in the way ECN-Echo chunk
sends were being accounted for. We counted
only when we queued a chunk, not when we sent it.
Now keep one counter for queuing and
one for sending.
tuexen [Fri, 10 Feb 2012 18:49:28 +0000 (18:49 +0000)]
MFC r217894:
Change the infrastructure for SCTP_MAX_BURST to allow compliance
with the latest socket API I-D. In particular, it can now be disabled.
Full compliance requires changing the structure used in the
socket option. Since this breaks the API, it will be a
separate commit, which will not be MFCed to stable/8.
tuexen [Fri, 10 Feb 2012 17:49:14 +0000 (17:49 +0000)]
MFC r217554:
Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string. For SYSCTL_PROC instances where I
noticed a discrepancy between the CTLTYPE and the format specifier,
fix the CTLTYPE.
The commit is from mdf@ and the MFC has been discussed with him.
brooks [Fri, 10 Feb 2012 15:54:39 +0000 (15:54 +0000)]
MFC r230403.
When creating the jail's /dev/log symlink, do it by full path to avoid
creating stray "log" symlinks if the mount fails. That apparently
happens in some ezjail configs.
PR: conf/143084
Submitted by: Dirk Engling <erdgeist at erdgeist.org>
bz [Fri, 10 Feb 2012 13:15:11 +0000 (13:15 +0000)]
MFC r223741:
Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by: julian
dougb [Fri, 10 Feb 2012 10:18:30 +0000 (10:18 +0000)]
MFC r208307:
This change does the following for the scripts that run up through
FILESYSTEMS (the default early_late_divider):
1. Move sysctl to run first
2. Move as many BEFOREs to REQUIREs as possible.
3. Minor effect, move hostid_save from right before mdconfig to right
after.
A lot of the early scripts make use of sysctl one way or another so
running this first makes a lot of sense given that system-critical
values are often placed in sysctl.conf. (More details in original log.)
In addition to the changes from this revision, tweak a few other rcorder
elements in order to minimize the differences between the order before
and after this change, mostly related to scripts still in this branch
that are no longer in HEAD.
tuexen [Fri, 10 Feb 2012 07:28:37 +0000 (07:28 +0000)]
MFC r216825:
Define the SCTP_SSN_GE, SCTP_SSN_GT, SCTP_TSN_GE and SCTP_TSN_GT macros
and use them instead of the generic compare_with_wrap.
Retire compare_with_wrap.
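The macros replace a generic comparison with width-specific serial number arithmetic (RFC 1982 style): 32 bits for TSNs, 16 bits for SSNs. A sketch using inline functions with hypothetical lowercase names, not the actual macro definitions:

```c
#include <stdint.h>

/*
 * a is "greater" than b when it is ahead of b by less than half the
 * number space. Values exactly half the space apart compare as neither
 * greater nor less, which RFC 1982 leaves undefined.
 */
static inline int tsn_gt(uint32_t a, uint32_t b)
{
    return (a < b && (b - a) > (1U << 31)) ||
           (a > b && (a - b) < (1U << 31));
}

static inline int tsn_ge(uint32_t a, uint32_t b)
{
    return a == b || tsn_gt(a, b);
}

/* SSNs are 16 bits wide, so the half-space boundary is 2^15. */
static inline int ssn_gt(uint16_t a, uint16_t b)
{
    return (a < b && (uint16_t)(b - a) > (1U << 15)) ||
           (a > b && (uint16_t)(a - b) < (1U << 15));
}

static inline int ssn_ge(uint16_t a, uint16_t b)
{
    return a == b || ssn_gt(a, b);
}
```

For example, TSN 1 counts as newer than 0xffffffff because the sequence has just wrapped.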
ae [Fri, 10 Feb 2012 06:35:14 +0000 (06:35 +0000)]
MFC r228061:
The size of the APM could be bigger than the number of already allocated entries.
And the first usable sector should not start inside the APM area.
MFC r228076:
Add the ability to increase the number of allocated APM entries when we
have reserved free space in the APM area.
Also, instead of one write request per APM entry, use MAXPHYS-sized
writes when we are updating the APM.
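The saving from MAXPHYS-sized writes is simple arithmetic. This sketch assumes one sector per APM entry and a MAXPHYS of 128 KiB; both `SKETCH_MAXPHYS` and the function name are illustrative assumptions.

```c
#include <stddef.h>

#define SKETCH_MAXPHYS (128 * 1024)  /* assumed MAXPHYS value, in bytes */

/*
 * Number of write requests needed to rewrite nentries APM entries of
 * secsz bytes each when one request may carry up to SKETCH_MAXPHYS bytes
 * (assumes secsz <= SKETCH_MAXPHYS).
 */
size_t apm_write_requests(size_t nentries, size_t secsz)
{
    size_t per_req = SKETCH_MAXPHYS / secsz;  /* whole entries per request */

    return (nentries + per_req - 1) / per_req;
}
```

With 512-byte sectors, 300 entries need 2 requests instead of 300.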
bz [Fri, 10 Feb 2012 06:12:48 +0000 (06:12 +0000)]
MFC r223334:
Leave an extra comment about flowtable and IPv6 support, rectifying a
previous comment (r206024 in stable/8), so that things here are not
forgotten in the future and the changes merge without conflicts.
rmacklem [Fri, 10 Feb 2012 04:01:17 +0000 (04:01 +0000)]
MFC: r230605
A problem with respect to data read through the buffer cache for both
NFS clients was reported to freebsd-fs@ under the subject "NFS
corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when
a TCP mounted root fs was changed to using UDP. I believe that this
problem was caused by the change in mnt_stat.f_iosize that occurred
because rsize was decreased to the maximum supported by UDP. This
patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize,
since the former is set to f_iosize when the vnode is allocated, but
does not change for a given vnode when f_iosize changes.
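Why the block size must stay fixed per vnode: the buffer cache indexes data by logical block number, computed from the file offset. A sketch with a hypothetical `vnode_sketch` holding only the per-vnode block size:

```c
#include <stdint.h>

/* Hypothetical per-vnode snapshot of the block size used for its buffers. */
struct vnode_sketch {
    uint32_t bo_bsize;
};

/*
 * Map a file offset to a buffer cache block number and an offset within
 * that block, using the size fixed when the vnode was allocated rather
 * than the mount-wide value, which may change later.
 */
void nfs_buf_index(const struct vnode_sketch *vp, uint64_t off,
    uint64_t *lbn, uint32_t *on)
{
    *lbn = off / vp->bo_bsize;
    *on = (uint32_t)(off % vp->bo_bsize);
}
```

With a 65536-byte block size, offset 100000 maps to block 1; if the size silently became 32768 (as when rsize drops for UDP), the same offset would map to block 3, so buffers cached under the old size would be looked up at the wrong index.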
If ntp cannot resolve a hostname on startup, it will queue the entry
for resolving by a child process that, upon success, will add the entry
to the config of the running parent process.
Unfortunately there are a couple of bugs with this, fixed in various
later versions of upstream in potentially different ways due to other
code changes:
1) Upon "server [-46] <FQDN>", the [-46] was used as the FQDN for later
resolving, which does not work. Make sure we always pass the name (or IP) there.
2) The intermediate file used to carry the information to the child process
does not know about -4/-6 restrictions, so a dual-stacked host
could resolve to an IPv6 address that might be unreachable (see
r223626), leading to no working synchronization while ignoring an IPv4 record.
Thus alter the intermediate format to also pass the address family
(AF_UNSPEC (default), AF_INET or AF_INET6) to the child process
depending on -4 or -6.
3) Make the child process parse the new intermediate file format and
save the address family for the getaddrinfo() hints.
4) Change the child to always reload resolv.conf by calling res_init() before
trying to resolve names. This will pick up resolv.conf changes, or
new resolv.confs if they did not exist, were empty, or were
unusable on ntp startup. This fix is more conditional in upstream
versions, but given that FreeBSD has res_init(), there is no need for the
configure logic as well.
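Points 2 and 3 come down to carrying the family from the -4/-6 modifier into the child's getaddrinfo() hints. A sketch with hypothetical helper names; the res_init() call from point 4 is omitted here, and the real intermediate file format is not reproduced.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Map the -4 / -6 modifiers of a "server" line to the family that would be
 * written to the intermediate file; AF_UNSPEC when neither was given. */
int family_for_flags(int force_ipv4, int force_ipv6)
{
    if (force_ipv4)
        return AF_INET;
    if (force_ipv6)
        return AF_INET6;
    return AF_UNSPEC;
}

/*
 * Child-side lookup restricted to the saved family, so a "-4" server never
 * resolves to an IPv6 address. On success stores the family of the first
 * result and returns 0; otherwise returns the getaddrinfo() error code.
 */
int resolve_for_family(const char *host, int family, int ai_flags,
    int *got_family)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int error;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_DGRAM;  /* NTP uses UDP */
    hints.ai_flags = ai_flags;

    error = getaddrinfo(host, NULL, &hints, &res);
    if (error != 0)
        return error;
    *got_family = res->ai_family;
    freeaddrinfo(res);
    return 0;
}
```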
mav [Thu, 9 Feb 2012 07:45:02 +0000 (07:45 +0000)]
MFC r230921:
Insert ordered command every 1/4 of the current command timeout, not 1/4
of the default one.
Without this change setting kern.cam.ada.default_timeout to 1 instead of 30
allowed me to trigger several false positive command timeouts under heavy
ZFS load on a SiI3132 siis(4) controller with 5 HDDs on a port multiplier.
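The fix derives the ordered-tag period from the timeout in effect. A sketch assuming the period is expressed in clock ticks (`hz` per second); the function and macro names are hypothetical.

```c
/* Ordered tags are inserted every 1/4 of the command timeout. */
#define ORDERED_TAG_FRACTION 4

/*
 * Period, in clock ticks, between ordered commands. cur_timeout_s is the
 * timeout currently in effect (e.g. after kern.cam.ada.default_timeout was
 * changed), not the compiled-in default.
 */
unsigned int ordered_tag_period_ticks(unsigned int cur_timeout_s,
    unsigned int hz)
{
    return cur_timeout_s * hz / ORDERED_TAG_FRACTION;
}
```

With hz = 1000, a 30 s timeout gives 7500 ticks and a 1 s timeout gives 250. Basing the period on the 30 s default while commands time out after 1 s would leave 7.5 s between ordered tags, far longer than the timeout itself.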
Compare port numbers correctly. They are stored by SRCPORT()
in host byte order, so we need to compare them as such.
Properly compare IPv6 addresses as well.
This allows the badaddrs slots (8 per address family by default)
to work correctly, so that sendto() errors are printed only once.
The change is no longer applicable to the latest upstream versions.
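A sketch of the corrected matching, assuming a hypothetical `badaddr` record whose port is kept in host byte order (as SRCPORT() returns it), so the network-order port from the sockaddr is converted with ntohs() before comparing:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical bad-address slot; the port is stored in host byte order. */
struct badaddr {
    int family;
    unsigned short port;  /* host byte order */
    union {
        struct in_addr  v4;
        struct in6_addr v6;
    } addr;
};

/* Return 1 if the slot matches the destination in sa, 0 otherwise. */
int badaddr_match(const struct badaddr *e, const struct sockaddr *sa)
{
    if (e->family != sa->sa_family)
        return 0;
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

        return e->port == ntohs(sin->sin_port) &&
            e->addr.v4.s_addr == sin->sin_addr.s_addr;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

        /* Compare all 128 bits of the IPv6 address. */
        return e->port == ntohs(sin6->sin6_port) &&
            memcmp(&e->addr.v6, &sin6->sin6_addr,
                sizeof(struct in6_addr)) == 0;
    }
    return 0;
}
```

On a little-endian host, comparing the host-order port directly against sin_port would never match, so every send error would look like a new bad address.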
marius [Wed, 8 Feb 2012 23:57:03 +0000 (23:57 +0000)]
Given that sun4u uses sparc64/sparc64/ofw_machdep.c, use the sparc64
<machine/ofw_machdep.h> here also. This isn't exactly a clean approach
though as ofw_machdep.h also includes some prototypes for functions
that aren't actually shared between both architectures but just happen
to have the same interface. Given that the sun4v support actually is
dead and this is just to allow things to continue to compile, it should
be okay though.
Hide IPv6 next header parsing warnings under the verbose sysctl
so people can possibly disable it when their consoles are flooded,
or enable it for debugging.
If we detect an IPv6 fragment header and it is not the first fragment,
then terminate the loop as we will not find any further headers and
for short fragments this could otherwise lead to a pullup error
discarding the fragment.
Submitted by: Matthew Luckie (mjl luckie.org.nz)
PR: kern/145733
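A simplified sketch of an extension-header walk that stops at a non-first fragment. The chain layout follows the IPv6 header formats, but the function is hypothetical, handles only hop-by-hop, routing, destination options and fragment headers (no AH/ESP), and is not the ipfw code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values. */
#define NXT_HOPOPTS   0
#define NXT_ROUTING  43
#define NXT_FRAGMENT 44
#define NXT_DSTOPTS  60

/*
 * hdrs points just past the fixed IPv6 header and nxt is its Next Header
 * field. Returns the upper-layer protocol, or -1 when it cannot be found in
 * this packet: the chain is truncated or a non-first fragment was reached.
 */
int ipv6_upper_proto(const uint8_t *hdrs, size_t len, uint8_t nxt)
{
    size_t off = 0;

    for (;;) {
        switch (nxt) {
        case NXT_HOPOPTS:
        case NXT_ROUTING:
        case NXT_DSTOPTS: {
            if (len - off < 2)
                return -1;
            /* Hdr Ext Len counts 8-octet units beyond the first 8. */
            size_t hlen = ((size_t)hdrs[off + 1] + 1) * 8;
            if (len - off < hlen)
                return -1;
            nxt = hdrs[off];
            off += hlen;
            break;
        }
        case NXT_FRAGMENT: {
            if (len - off < 8)
                return -1;
            uint16_t offlg = (uint16_t)((hdrs[off + 2] << 8) | hdrs[off + 3]);
            /* Non-first fragment: no further headers follow, so stop
             * instead of trying to pull up data that is not there. */
            if ((offlg & 0xfff8) != 0)
                return -1;
            nxt = hdrs[off];
            off += 8;
            break;
        }
        default:
            return nxt;
        }
    }
}
```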
ipfw internally checks for offset == 0 to determine whether the
packet is a/the first fragment or not. For IPv6 we have added the
"more fragments" flag as well, to be able to determine whether more
fragments will follow, as we do not have the fragment header available
for logging, while for IPv4 this information can be derived directly
from the IPv4 header. This allowed fragmented packets to bypass
normal rules as proper masking was not done when checking offset.
Split variables to not need masking for IPv6 to avoid further errors.
After r225032, fix logging in a similar way, masking the IPv6
more fragments flag off so that offset == 0 checks work properly.
Submitted by: Matthew Luckie (mjl luckie.org.nz)
PR: kern/145733
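Keeping the M flag in the same variable as the offset makes a first fragment with M set look like offset 1, so an `offset == 0` test fails. A sketch of splitting the fragment header's offset/flags word (already in host byte order) into separate fields; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical split of the fragment header's offset/flags word. */
struct frag_fields {
    uint16_t offset;  /* byte offset of this fragment (multiple of 8) */
    int      more;    /* M flag: more fragments follow */
};

/* offlg is the 16-bit offset/flags word already converted to host order. */
struct frag_fields frag_fields_from(uint16_t offlg)
{
    struct frag_fields f;

    /* The 13-bit offset sits in the top bits in units of 8 octets, so
     * clearing the low 3 bits yields the offset in bytes directly. */
    f.offset = (uint16_t)(offlg & 0xfff8);
    f.more = (offlg & 0x0001) != 0;
    return f;
}

/* With the flag kept separately, offset == 0 identifies first fragments. */
int is_first_fragment(struct frag_fields f)
{
    return f.offset == 0;
}
```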
While not explicitly allowed by RFC 2460, in case there is no
translation technology involved (and that section is suggested to
be removed by Errata 2843), single packet fragments do no harm.
There is another erratum and further drafts under discussion to clarify
these kinds of packets.
Meanwhile, add a sysctl to allow disabling this behaviour again.
We will treat a single packet fragment (a fragment header added
when not needed) as if there was no fragment header.
Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version)
PR: kern/145733
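A single packet ("atomic") fragment has offset 0 and the M flag clear, so the whole datagram is present. A sketch of the check, with `allow_atomic` standing in for the new sysctl (its real name is not given here, so the parameter is hypothetical); the two reserved bits are ignored.

```c
#include <stdint.h>

/*
 * offlg is the fragment header's offset/flags word in host byte order.
 * Offset (0xfff8) and M flag (0x0001) must both be zero; the reserved
 * bits (0x0006) are ignored. allow_atomic models the sysctl toggle.
 */
int treat_as_unfragmented(uint16_t offlg, int allow_atomic)
{
    return allow_atomic && (offlg & 0xfff9) == 0;
}
```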