.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.Nd Internet Transmission Control Protocol
.Fn socket AF_INET SOCK_STREAM 0
protocol provides reliable, flow-controlled, two-way
It is a byte-stream protocol used to
Internet address format and, in addition, provides a per-host
.Dq "port addresses" .
Thus, each address is composed
of an Internet address specifying the host and network,
port on the host identifying the peer entity.
Active sockets initiate connections to passive
sockets are created active; to create a
system call must be used
after binding the socket with the
Only passive sockets may use the
call to accept incoming connections.
Only active sockets may use the
call to initiate connections.
their location to match
incoming connection requests from multiple networks.
This technique, termed
.Dq "wildcard addressing" ,
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.
Normally, this address corresponds to the peer entity's network.
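.Pp
The following C sketch illustrates the wildcard addressing described above, assuming a configured loopback interface.
It binds a passive socket to
.Dv INADDR_ANY
with port 0 so that the system assigns a port, connects an active socket to that port over the loopback address, and then reads back the local address of the accepted socket, which the system should have fixed to 127.0.0.1.
The helper names
.Fn make_wildcard_listener
and
.Fn accepted_local_addr
are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a passive TCP socket bound to the wildcard
 * address INADDR_ANY.  Port 0 asks the system to assign a port, which is
 * read back with getsockname(2) and stored in host byte order. */
static int
make_wildcard_listener(unsigned short *port_out)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = htons(0);			/* system picks the port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 8) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		close(s);
		return (-1);
	}
	*port_out = ntohs(sin.sin_port);
	return (s);
}

/* Hypothetical helper: connect an active socket to 127.0.0.1:port,
 * accept the connection on the listener, and return the local address
 * that the system fixed for the accepted socket. */
static int
accepted_local_addr(int listener, unsigned short port, struct in_addr *out)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, c, rc = -1;

	if ((c = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	if (connect(c, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
	    (a = accept(listener, NULL, NULL)) >= 0) {
		if (getsockname(a, (struct sockaddr *)&sin, &len) == 0) {
			*out = sin.sin_addr;
			rc = 0;
		}
		close(a);
	}
	close(c);
	return (rc);
}
```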
supports a number of socket options which can be set with
.Bl -tag -width ".Dv TCP_CONGESTION"
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
specific additions include
bandwidth-controlled window space.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
option accepts a per-socket timeout argument of
in seconds, for new, non-established
For the global default in milliseconds see
section further down.
option accepts an argument of
for the amount of time, in seconds, that the connection must be idle
before keepalive probes (if enabled) are sent for the connection of this
If set on a listening socket, the value is inherited by the newly created
For the global default in milliseconds see
section further down.
option accepts an argument of
to set the per-socket interval, in seconds, between keepalive probes sent
If set on a listening socket, the value is inherited by the newly created
For the global default in milliseconds see
section further down.
option accepts an argument of
and allows per-socket tuning of the number of probes sent, with no response,
before the connection is dropped.
If set on a listening socket, the value is inherited by the newly created
For the global default see the
section further down.
Under most circumstances,
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
defeats this algorithm.
By default, a sender- and
.No receiver- Ns Tn TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection.
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
usually sends a number of options in each packet, corresponding to
extensions which are provided in this implementation.
is provided to disable
option use on a per-connection basis.
.No sender- Ns Tn TCP
bit, and begin transmission immediately (if permitted) at the end of
When this option is set to a non-zero value,
will delay sending any data at all until either the socket is closed
or the internal send buffer is filled.
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
Outgoing traffic is digested;
digests on incoming traffic are verified if the
.Va net.inet.tcp.signature_verify_input
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
One common use for this in a
router deployment is to enable
based routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
sessions are supported.
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
The option level for the
call is the protocol number for
.Xr getprotobyname 3 ,
All options are declared in
transport level may be used with
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
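.Pp
Per-socket keepalive tuning with
.Dv TCP_KEEPIDLE ,
.Dv TCP_KEEPINTVL ,
and
.Dv TCP_KEEPCNT
can be sketched in C as follows; these values only matter once
.Dv SO_KEEPALIVE
is enabled.
The arguments are passed as
.Vt int ,
assumed to match the size of the documented
.Vt u_int ,
and the helper names are hypothetical.
The second helper estimates the worst-case time before an unresponsive peer is dropped, assuming the drop happens after the idle period plus the configured number of unanswered probe intervals; with the global defaults listed further down (7200 s idle, 75 s interval, 8 probes) this gives 7800 s, or 2 hours 10 minutes.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: turn on keepalives for socket s and override the
 * system-wide defaults with per-socket values (all in seconds). */
static int
set_keepalive(int s, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return (-1);
	/* The option level is the TCP protocol number, IPPROTO_TCP. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
	    setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
	    setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return (-1);
	return (0);
}

/* Hypothetical helper: worst-case seconds before a silent peer is dropped,
 * modeled as the idle time plus cnt unanswered probe intervals. */
static long
keepalive_detect_secs(long idle, long intvl, long cnt)
{
	return (idle + cnt * intvl);
}
```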
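.Pp
The boolean
.Dv TCP_NODELAY
option and the
.Dv TCP_MAXSEG
option both take integer arguments at the
.Dv IPPROTO_TCP
level, as in this minimal C sketch.
Because a system may report a set boolean option as an arbitrary non-zero value rather than 1, the query helper normalizes the result; the helper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: defeat the small-packet gathering algorithm
 * (TCP_NODELAY is a boolean option; non-zero turns it on). */
static int
disable_coalescing(int s)
{
	int on = 1;

	return (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/* Hypothetical helper: report whether TCP_NODELAY is set (1), clear (0),
 * or could not be queried (-1). */
static int
nodelay_enabled(int s)
{
	int v = 0;
	socklen_t len = sizeof(v);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &v, &len) < 0)
		return (-1);
	return (v != 0);
}

/* Hypothetical helper: read the maximum segment size currently in effect;
 * on a connected socket this reflects the negotiated value. */
static int
current_mss(int s)
{
	int mss = 0;
	socklen_t len = sizeof(mss);

	if (getsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
		return (-1);
	return (mss);
}
```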
The default congestion control algorithm for
Other congestion control algorithms can be made available using the
protocol implements a number of variables in the
.Bl -tag -width ".Va TCPCTL_DO_RFC1323"
.It Dv TCPCTL_DO_RFC1323
Implement the window scaling and timestamp options of RFC 1323
.It Dv TCPCTL_MSSDFLT
The default value used for the maximum segment size
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.It Dv TCPCTL_RECVSPACE
Log any connection attempts to ports where there is not a socket
accepting connections.
The value of 1 limits the logging to
(connection establishment) packets only.
A value of 2 results in any
packets to closed ports being logged.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
The Maximum Segment Lifetime, in milliseconds, for a packet.
Timeout, in milliseconds, for new, non-established
The default is 75000 msec.
Amount of time, in milliseconds, that the connection must be idle
before keepalive probes (if enabled) are sent.
The default is 7200000 msec (2 hours).
The interval, in milliseconds, between keepalive probes sent to remote
machines, when no response is received on a
The default is 75000 msec.
Number of probes sent, with no response, before a connection
The default is 8 packets.
.It Va always_keepalive
connections, the kernel will
periodically send a packet to the remote host to verify the connection
unreachable messages may abort connections in
reassembly queue if the system is low on mbufs.
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
Delay ACK to try to piggyback it onto a data packet.
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
control-block hash table
This may be tuned using the kernel option
.Va net.inet.tcp.tcbhashsize
Number of active process control blocks
Determines whether or not
cookies should be generated for outbound
cookies are a great help during
flood attacks, and are enabled by default.
.It Va isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
recycling for a few minutes.
.It Va rexmit_min , rexmit_slop
Adjust the retransmit timer calculation for
typically added to the raw calculation to take into account
occasional variances that the
(smoothed round-trip time)
is unable to accommodate, while the minimum specifies an
second minimum, these RFCs tend to focus on streaming behavior,
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
For this reason, we use 200 ms of slop and a near-0
minimum, which gives us an effective minimum of 200 ms (similar to
Enable the Limited Transmit algorithm as described in RFC 3042.
It helps avoid timeouts on lossy links and also when the congestion window
is small, as happens on short transfers.
Enable support for RFC 3390, which allows for a variable-sized
starting congestion window on new connections, depending on the
maximum segment size.
This helps throughput in general, but
particularly affects short transfers and high-bandwidth large
propagation-delay connections.
Enable support for RFC 2018, the TCP Selective Acknowledgment option,
which allows the receiver to inform the sender about all successfully
arrived segments, allowing the sender to retransmit the missing segments
Maximum number of SACK holes per connection.
.It Va sack.globalmaxholes
Maximum number of SACK holes per system, across all connections.
When a TCP connection enters the
state, its associated socket structure is freed, since it is of
negligible size and use, and a new structure is allocated to contain a
minimal amount of information necessary for sustaining a connection in
this state, called the compressed TCP TIME_WAIT state.
Since this structure is smaller than a socket structure, it can save
a significant amount of system memory.
.Va net.inet.tcp.maxtcptw
MIB variable controls the maximum number of these structures allocated.
By default, it is initialized to
.Va kern.ipc.maxsockets
.It Va nolocaltimewait
Suppress creation of compressed TCP TIME_WAIT states for connections in
which both endpoints are local.
.It Va fast_finwait2_recycle
connections faster when the socket is marked as
(no user process has the socket open, data received on
the socket cannot be read).
The timeout used here is
.Va finwait2_timeout .
.It Va finwait2_timeout
Timeout to use for fast recycling of
Defaults to 60 seconds.
Enable support for TCP Explicit Congestion Notification (ECN).
ECN allows a TCP sender to reduce the transmission rate in order to
.It Va ecn.maxretries
Number of retries (SYN or SYN/ACK retransmits) before disabling ECN on a
This is needed to help with connection establishment
when a broken firewall is in the network path.
.It Va pmtud_blackhole_detection
Turn on automatic path MTU blackhole detection.
When retransmissions occur, the OS will
lower the MSS to check whether the path MTU is the problem.
If the current MSS is greater than the
configured value to try, it is set to the configured value; otherwise,
the MSS is set to the default values
.Po Va net.inet.tcp.mssdflt
.Va net.inet.tcp.v6mssdflt
.It Va pmtud_blackhole_mss
MSS to try for IPv4 if PMTU blackhole detection is turned on.
.It Va v6pmtud_blackhole_mss
MSS to try for IPv6 if PMTU blackhole detection is turned on.
.It Va pmtud_blackhole_activated
Number of times configured values were used in an attempt to downshift.
.It Va pmtud_blackhole_activated_min_mss
Number of times default MSS was used in an attempt to downshift.
.It Va pmtud_blackhole_failed
Number of connections for which retransmits continued even after MSS
A socket operation may fail with one of the following errors returned:
when trying to establish a connection on a socket which
when the system runs out of memory for
an internal data structure;
when a connection was dropped
due to excessive retransmissions;
forces the connection to be closed;
.It Bq Er ECONNREFUSED
peer actively refuses connection establishment (usually because
no process is listening to the port);
is made to create a socket with a port which has already been
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
.%T "TCP Extensions for High Performance"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%A "K. Ramakrishnan"
.%T "The Addition of Explicit Congestion Notification (ECN) to IP"
The RFC 1323 extensions for window scaling and timestamps were added
option was introduced in
.Em subject to change .