.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
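.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/tcp.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0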
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data.
It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq "port addresses" .
Thus, each address is composed
of an Internet address specifying the host and network,
with a specific
.Tn TCP
port on the host identifying the peer entity.
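.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.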
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks.
This technique, termed
.Dq "wildcard addressing" ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.
Normally, this address corresponds to the peer entity's network.
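.Pp
As an illustration of wildcard addressing, the following minimal sketch
binds a passive socket to
.Dv INADDR_ANY
and lets the system assign the port when 0 is passed.
The function names
.Fn listen_wildcard
and
.Fn bound_port
are hypothetical helpers introduced for this example, and the
.Xr listen 2
backlog of 5 is an arbitrary choice.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive TCP socket bound to the wildcard address.
 * A port of 0 asks the system to assign one.
 * Returns the descriptor, or -1 on failure.
 */
int
listen_wildcard(unsigned short port)
{
	struct sockaddr_in sin;
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = htons(port);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Report the local port of a bound socket, or 0 on failure. */
unsigned short
bound_port(int s)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(s, (struct sockaddr *)&sin, &len) == -1)
		return (0);
	return (ntohs(sin.sin_port));
}
```
.Ed
A caller would typically pass the returned descriptor to
.Xr accept 2 ;
calling
.Xr getsockname 2
on the accepted socket then reports the interface address that was fixed
once the connection was established.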
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width ".Dv TCP_CONGESTION"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Fx
specific additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.It Dv TCP_CCALGOOPT
Set or query congestion control algorithm specific parameters.
See
.Xr mod_cc 4
for details.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
connection.
See
.Xr mod_cc 4
for details.
.It Dv TCP_KEEPINIT
This
.Xr setsockopt 2
option accepts a per-socket timeout argument of
.Vt "u_int"
in seconds, for new, non-established
.Tn TCP
connections.
For the global default in milliseconds see
.Va keepinit
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPIDLE
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
for the amount of time, in seconds, that the connection must be idle
before keepalive probes (if enabled) are sent for the connection of this
socket.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepidle
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPINTVL
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
to set the per-socket interval, in seconds, between keepalive probes sent
to a peer.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepintvl
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPCNT
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
and allows a per-socket tuning of the number of probes sent, with no response,
before the connection will be dropped.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default see
.Va keepcnt
in the
.Sx MIB Variables
section further down.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender- and
.No receiver- Ns Tn TCP
will negotiate between themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation.
The boolean option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the
.No sender- Ns Tn TCP
will set the
.Dq push
bit, and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When this option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
Outgoing traffic is digested;
digests on incoming traffic are verified if the
.Va net.inet.tcp.signature_verify_input
sysctl is nonzero.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this in a
.Fx
router deployment is to enable
such routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4
.Pq Dv AF_INET
sessions are supported.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
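.Pp
The socket options described above are set and read at the
.Dv IPPROTO_TCP
level.
The following is a minimal sketch, not a definitive implementation; the
function names
.Fn set_nodelay ,
.Fn get_nodelay
and
.Fn set_keepalive
are hypothetical helpers introduced for this example, and it assumes that
.Dv SO_KEEPALIVE
is used to enable keepalive probes on the socket.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Defeat the small-packet gathering described under TCP_NODELAY.
 * Returns 0 on success, -1 on failure.
 */
int
set_nodelay(int s)
{
	int on = 1;

	return (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/* Read back TCP_NODELAY: 1 if set, 0 if clear, -1 on failure. */
int
get_nodelay(int s)
{
	int val;
	socklen_t len = sizeof(val);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}

/*
 * Request keepalive probes after idle seconds without traffic, then
 * one probe every intvl seconds, dropping the connection after cnt
 * probes go unanswered.  All three values are per-socket, in seconds
 * (idle, intvl) or probes (cnt).
 */
int
set_keepalive(int s, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
	int on = 1;

	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof(idle)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof(intvl)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
	    sizeof(cnt)) == -1)
		return (-1);
	return (0);
}
```
.Ed
For instance,
.Fn set_keepalive s 60 10 5
would ask for the first probe after one minute of idle time.
Because the keepalive values set on a listening socket are inherited by
accepted sockets, a server may call it once before
.Xr listen 2 .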
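.Pp
The default congestion control algorithm for
.Tn TCP
is
.Xr cc_newreno 4 .
Other congestion control algorithms can be made available using the
.Xr mod_cc 4
framework.
.Ss MIB Variables
The
.Tn TCP
protocol implements a number of variables in the
.Va net.inet.tcp
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width ".Va TCPCTL_DO_RFC1323"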
.It Dv TCPCTL_DO_RFC1323
.Pq Va rfc1323
Implement the window scaling and timestamp options of RFC 1323
(default is true).
.It Dv TCPCTL_MSSDFLT
.Pq Va mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq Va sendspace
Maximum
.Tn TCP
send window.
.It Dv TCPCTL_RECVSPACE
.Pq Va recvspace
Maximum
.Tn TCP
receive window.
.It Va log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
The value of 1 limits the logging to
.Tn SYN
(connection establishment) packets only.
That of 2 results in any
.Tn TCP
packets to closed ports being logged.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
.It Va msl
The Maximum Segment Lifetime, in milliseconds, for a packet.
.It Va keepinit
Timeout, in milliseconds, for new, non-established
.Tn TCP
connections.
The default is 75000 msec.
.It Va keepidle
Amount of time, in milliseconds, that the connection must be idle
before keepalive probes (if enabled) are sent.
The default is 7200000 msec (2 hours).
.It Va keepintvl
The interval, in milliseconds, between keepalive probes sent to remote
machines, when no response is received on a
.Va keepidle
probe.
The default is 75000 msec.
.It Va keepcnt
Number of probes sent, with no response, before a connection
is dropped.
The default is 8 packets.
.It Va always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections, the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It Va icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It Va do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It Va blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It Va delayed_ack
Delay ACK to try and piggyback it onto a data packet.
.It Va delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
.It Va tcbhashsize
Size of the
.Tn TCP
control-block hash table
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It Va pcbcount
Number of active process control blocks
(read-only).
.It Va syncookies
Determines whether or not
.Tn SYN
cookies should be generated for outbound
.Tn SYN-ACK
packets.
.Tn SYN
cookies are a great help during
.Tn SYN
flood attacks, and are enabled by default.
See
.Xr syncookies 4 .
.It Va isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It Va rexmit_min , rexmit_slop
Adjust the retransmit timer calculation for
.Tn TCP .
The slop is
typically added to the raw calculation to take into account
occasional variances that the
.Tn SRTT
(smoothed round-trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.
While a number of
.Tn TCP
RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior,
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
algorithm.
For this reason, we use 200ms of slop and a near-0
minimum, which gives us an effective minimum of 200ms (similar to
.Tn Linux ) .
.It Va initcwnd_segments
Enable the ability to specify the initial congestion window in number of
segments.
The default value is 10 as suggested by RFC 6928.
Changing the value on the fly does not affect connections using a congestion
window from the hostcache.
Caution:
This regulates the burst of packets allowed to be sent in the first RTT.
The value should be relative to the link capacity.
Start with small values for lower-capacity links.
Large bursts can cause buffer overruns and packet drops if routers have small
buffers or the link is experiencing congestion.
.It Va rfc3042
Enable the Limited Transmit algorithm as described in RFC 3042.
It helps avoid timeouts on lossy links and also when the congestion window
is small, as happens on short transfers.
.It Va rfc3390
Enable support for RFC 3390, which allows for a variable-sized
starting congestion window on new connections, depending on the
maximum segment size.
This helps throughput in general, but
particularly affects short transfers and high-bandwidth large
propagation-delay connections.
.It Va sack.enable
Enable support for RFC 2018, TCP Selective Acknowledgment option,
which allows the receiver to inform the sender about all successfully
arrived segments, allowing the sender to retransmit the missing segments
more quickly.
.It Va sack.maxholes
Maximum number of SACK holes per connection.
Defaults to 128.
.It Va sack.globalmaxholes
Maximum number of SACK holes per system, across all connections.
Defaults to 65536.
.It Va maxtcptw
When a TCP connection enters the
.Dv TIME_WAIT
state, its associated socket structure is freed, since it is of
negligible size and use, and a new structure is allocated to contain a
minimal amount of information necessary for sustaining a connection in
this state, called the compressed TCP TIME_WAIT state.
Since this structure is smaller than a socket structure, it can save
a significant amount of system memory.
The
.Va net.inet.tcp.maxtcptw
MIB variable controls the maximum number of these structures allocated.
By default, it is initialized to
.Va kern.ipc.maxsockets
/ 5.
.It Va nolocaltimewait
Suppress creation of compressed TCP TIME_WAIT states for connections in
which both endpoints are local.
.It Va fast_finwait2_recycle
Recycle
.Tn TCP
.Dv FIN_WAIT_2
connections faster when the socket is marked as
.Dv SBS_CANTRCVMORE
(no user process has the socket open, data received on
the socket cannot be read).
The timeout used here is
.Va finwait2_timeout .
.It Va finwait2_timeout
Timeout to use for fast recycling of
.Tn TCP
.Dv FIN_WAIT_2
connections.
Defaults to 60 seconds.
.It Va ecn.enable
Enable support for TCP Explicit Congestion Notification (ECN).
ECN allows a TCP sender to reduce the transmission rate in order to
avoid packet drops.
.It Va ecn.maxretries
Number of retries (SYN or SYN/ACK retransmits) before disabling ECN on a
specific connection.
This is needed to help with connection establishment
when a broken firewall is in the network path.
.It Va pmtud_blackhole_detection
Turn on automatic path MTU blackhole detection.
When retransmits occur, the system
lowers the MSS to check whether the path MTU is the problem.
If the current MSS is greater than the
configured value to try, it is set to the configured value; otherwise,
the MSS is set to the default values
.Po Va net.inet.tcp.mssdflt
and
.Va net.inet.tcp.v6mssdflt
.Pc .
.It Va pmtud_blackhole_mss
MSS to try for IPv4 if PMTU blackhole detection is turned on.
.It Va v6pmtud_blackhole_mss
MSS to try for IPv6 if PMTU blackhole detection is turned on.
.It Va pmtud_blackhole_activated
Number of times configured values were used in an attempt to downshift.
.It Va pmtud_blackhole_activated_min_mss
Number of times default MSS was used in an attempt to downshift.
.It Va pmtud_blackhole_failed
Number of connections for which retransmits continued even after MSS
downshift.
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
.El
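.Pp
As a sketch of how a caller might observe these errors, the function below
attempts an active connection and returns the
.Va errno
value set by
.Xr connect 2 .
The name
.Fn try_connect
and the use of the loopback address are assumptions made for this example
only.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to connect to a TCP port on the loopback address.
 * Returns 0 on success, otherwise the errno value reported by
 * socket(2) or connect(2), for example ECONNREFUSED or ETIMEDOUT.
 */
int
try_connect(unsigned short port)
{
	struct sockaddr_in sin;
	int error, s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (errno);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);
	error = 0;
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		error = errno;	/* save before close(2) can change it */
	close(s);
	return (error);
}
```
.Ed
The returned value can be passed to
.Xr strerror 3
for display.
When no process is listening on the port the usual result is
.Er ECONNREFUSED ,
although local policy such as
.Xr blackhole 4
may turn this into a timeout.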
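.Sh SEE ALSO
.Rs
.%A "V. Jacobson"
.%A "R. Braden"
.%A "D. Borman"
.%T "TCP Extensions for High Performance"
.%O "RFC 1323"
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Rs
.%A "K. Ramakrishnan"
.%A "S. Floyd"
.%A "D. Black"
.%T "The Addition of Explicit Congestion Notification (ECN) to IP"
.%O "RFC 3168"
.Re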
.Sh HISTORY
The
.Tn TCP
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
The
.Dv TCP_INFO
option was introduced in
.Tn Linux 2.6
and is
.Em subject to change .