.\" Copyright (c) 2014 Adrian Chadd
.\" All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.Nd Distributed Protocol Control Block Groups
.Cd "options PCBGROUP"
.Fa "struct inpcbinfo *pcbinfo" "u_int hashfields" "int hash_nelements"
.Fn in_pcbgroup_destroy "struct inpcbinfo *pcbinfo"
.Ft struct inpcbgroup *
.Fo in_pcbgroup_byhash
.Fa "struct inpcbinfo *pcbinfo" "u_int hashtype" "uint32_t hash"
.Ft struct inpcbgroup *
.Fn in_pcbgroup_byinpcb "struct inpcb *inp"
.Fn in_pcbgroup_update "struct inpcb *inp"
.Fn in_pcbgroup_update_mbuf "struct inpcb *inp" "struct mbuf *m"
.Fn in_pcbgroup_remove "struct inpcb *inp"
.Fn in_pcbgroup_enabled "struct inpcbinfo *pcbinfo"
.In netinet6/in6_pcb.h
.Ft struct inpcbgroup *
.Fo in6_pcbgroup_byhash
.Fa "struct inpcbinfo *pcbinfo" "u_int hashtype" "uint32_t hash"
This implementation introduces notions of affinity
for connections and distributes work so as to reduce lock contention,
with hardware work distribution strategies
In this construction, connection groups supplement, rather than replace,
existing reservation tables for protocol 4-tuples, offering CPU-affine
lookup tables with minimal cache line migration and lock contention
during steady-state operation.
Internet protocols like UDP and TCP register to use connection groups
by providing an ipi_hashfields value other than IPI_HASHFIELDS_NONE.
This indicates to the connection group code whether a 2-tuple or
4-tuple is used as an argument to hashes that assign a connection to
This must be aligned with any hardware-offloaded distribution model,
such as RSS or similar approaches taken in embedded network boards.
Wildcard sockets require special handling, as in Willmann 2006, and
are shared between connection groups while being protected by
Connection establishment and teardown can be significantly more
expensive than without connection groups, but steady-state
processing can be significantly faster.
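The 2-tuple versus 4-tuple choice can be illustrated with a small user-space sketch.
It is not kernel code: the sketch_* names and the two enum values are hypothetical
stand-ins for the ipi_hashfields setting, and the hash (an XOR of the tuple fields
passed through the MurmurHash3 32-bit finalizer) is an arbitrary choice, not the
RSS Toeplitz hash or the kernel's own hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's IPI_HASHFIELDS_* values. */
enum sketch_hashfields { SKETCH_HASH_2TUPLE, SKETCH_HASH_4TUPLE };

/* Hypothetical IPv4 connection tuple, host byte order. */
struct sketch_tuple {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

/* MurmurHash3 32-bit finalizer, used here only to spread the bits. */
static uint32_t
sketch_mix(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return (h);
}

/* Hash the 2-tuple (addresses only) or the 4-tuple (addresses and ports). */
static uint32_t
sketch_tuple_hash(const struct sketch_tuple *t, enum sketch_hashfields hf)
{
	uint32_t h = t->laddr ^ t->faddr;

	if (hf == SKETCH_HASH_4TUPLE)
		h ^= ((uint32_t)t->lport << 16) | t->fport;
	return (sketch_mix(h));
}

/* Group index = hash modulo the number of groups. */
static unsigned
sketch_group_for(const struct sketch_tuple *t, enum sketch_hashfields hf,
    unsigned ngroups)
{
	assert(ngroups > 0);
	return (sketch_tuple_hash(t, hf) % ngroups);
}
```

With SKETCH_HASH_2TUPLE, every connection between the same two hosts lands in the
same group whatever its ports; with SKETCH_HASH_4TUPLE, connections that differ only
in port can be spread across groups.
Whichever is chosen presumably has to match what the NIC hashes; otherwise a packet
can be delivered on a CPU other than the one whose group holds the connection.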
Enabling PCBGROUP in the kernel only provides the infrastructure
required to create and manage multiple PCB groups.
An implementation needs to fill in a few functions to provide PCB
group hash information in order for PCBs to be placed in a PCB group.
By default, each PCB info block (struct inpcbinfo) has a single hash
table holding all PCB entries for the given protocol, protected by a
single lock.
This can be a significant source of lock contention on SMP hardware.
When a PCBGROUP is created, an array of separate hash tables is
created, each with its own lock.
A separate table for wildcard PCBs is provided.
By default, a PCBGROUP table is created for each available CPU.
The PCBGROUP code attempts to calculate a hash value from the given
PCB or mbuf when looking up a PCBGROUP.
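A minimal user-space sketch of this layout follows, assuming one group per online CPU
(queried with sysconf(_SC_NPROCESSORS_ONLN)) and a fixed number of hash chains per
group.
The sketch_* types are hypothetical, a C11 atomic_flag stands in for the per-group
kernel lock, and no lookup or insertion logic is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical PCB; only the hash-chain link matters here. */
struct sketch_pcb {
	struct sketch_pcb *next;
};

/* One group: its own lock and its own hash table. */
struct sketch_group {
	atomic_flag lock;		/* stands in for a per-group mutex */
	struct sketch_pcb **buckets;	/* hash chains for this group only */
	unsigned nbuckets;
};

struct sketch_pcbinfo {
	struct sketch_group *groups;	/* array of groups, one per CPU */
	unsigned ngroups;
	struct sketch_group wildcard;	/* separate table for wildcard PCBs */
};

static int
sketch_group_init(struct sketch_group *g, unsigned nbuckets)
{
	atomic_flag_clear(&g->lock);
	g->nbuckets = nbuckets;
	g->buckets = calloc(nbuckets, sizeof(*g->buckets));
	return (g->buckets != NULL ? 0 : -1);
}

static void
sketch_pcbinfo_destroy(struct sketch_pcbinfo *pi)
{
	for (unsigned i = 0; i < pi->ngroups; i++)
		free(pi->groups[i].buckets);
	free(pi->groups);
	free(pi->wildcard.buckets);
	pi->groups = NULL;
	pi->ngroups = 0;
	pi->wildcard.buckets = NULL;
}

/* ncpus == 0 means "one group per online CPU", as reported by sysconf(). */
static int
sketch_pcbinfo_init(struct sketch_pcbinfo *pi, unsigned ncpus,
    unsigned nbuckets)
{
	assert(nbuckets > 0);
	if (ncpus == 0) {
		long n = sysconf(_SC_NPROCESSORS_ONLN);

		ncpus = (n > 0) ? (unsigned)n : 1;
	}
	pi->wildcard.buckets = NULL;
	pi->ngroups = 0;
	pi->groups = calloc(ncpus, sizeof(*pi->groups));
	if (pi->groups == NULL)
		return (-1);
	pi->ngroups = ncpus;
	for (unsigned i = 0; i < ncpus; i++)
		if (sketch_group_init(&pi->groups[i], nbuckets) != 0)
			goto fail;
	if (sketch_group_init(&pi->wildcard, nbuckets) != 0)
		goto fail;
	return (0);
fail:
	/* calloc() zeroed the array, so unset bucket pointers free as NULL. */
	sketch_pcbinfo_destroy(pi);
	return (-1);
}
```

A kernel implementation would typically also align each group to a cache line so that
CPUs working on neighbouring groups do not falsely share lines; this sketch does not.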
While processing a received frame,
.Fn in_pcbgroup_byhash
can be used in conjunction with either a hardware-provided hash
calculated hash value provided by some NICs
or a software-provided hash value in order to choose a PCBGROUP
A single table lock is held while performing a wildcard match.
However, all of the table locks are acquired before modifying the
The PCBGROUP tables operate in conjunction with the normal single PCB list
Thus, inserting and removing a PCB will still incur the same costs
A protocol which uses PCBGROUP should fall back to the normal PCB list
lookup if a call to the PCBGROUP layer does not yield a lookup hit.
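The locking rules and the fallback described above can be sketched as follows.
This is a hypothetical user-space model, not the in_pcb.c code: the sketch_* names,
the fixed count of four groups, the single integer key standing in for the 4-tuple,
and the atomic_flag spinlocks are all assumptions, and locking of the global list is
omitted for brevity.
A lookup takes only its own group's lock, which is also enough to read the shared
wildcard list because writers to that list hold every group lock; if both miss, it
falls back to the normal global list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NGROUPS 4	/* hypothetical fixed group count */

/* Hypothetical PCB with separate links for the group and global lists. */
struct sketch_pcb {
	struct sketch_pcb *grp_next;	/* group list or wildcard list */
	struct sketch_pcb *glob_next;	/* normal single PCB list */
	uint32_t key;			/* stands in for the 4-tuple */
	uint16_t lport;			/* matched against wildcard PCBs */
};

struct sketch_group {
	atomic_flag lock;
	struct sketch_pcb *conns;	/* exact-match PCBs in this group */
};

struct sketch_pcbinfo {
	struct sketch_group groups[SKETCH_NGROUPS];
	struct sketch_pcb *wildcard;	/* written only under ALL group locks */
	struct sketch_pcb *global;	/* every PCB; its lock is omitted */
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void sketch_pcbinfo_init(struct sketch_pcbinfo *pi)
{
	for (unsigned i = 0; i < SKETCH_NGROUPS; i++) {
		atomic_flag_clear(&pi->groups[i].lock);
		pi->groups[i].conns = NULL;
	}
	pi->wildcard = pi->global = NULL;
}

/* Every PCB joins the global list; gidx < 0 means "not in a group". */
static void sketch_insert(struct sketch_pcbinfo *pi, struct sketch_pcb *p, int gidx)
{
	assert(gidx < SKETCH_NGROUPS);
	p->glob_next = pi->global;
	pi->global = p;
	if (gidx >= 0) {
		struct sketch_group *g = &pi->groups[gidx];

		sketch_lock(&g->lock);
		p->grp_next = g->conns;
		g->conns = p;
		sketch_unlock(&g->lock);
	}
}

/* Modifying the wildcard list takes every group lock, in index order. */
static void sketch_wildcard_insert(struct sketch_pcbinfo *pi, struct sketch_pcb *p)
{
	sketch_insert(pi, p, -1);
	for (unsigned i = 0; i < SKETCH_NGROUPS; i++)
		sketch_lock(&pi->groups[i].lock);
	p->grp_next = pi->wildcard;
	pi->wildcard = p;
	for (unsigned i = 0; i < SKETCH_NGROUPS; i++)
		sketch_unlock(&pi->groups[i].lock);
}

static struct sketch_pcb *sketch_lookup(struct sketch_pcbinfo *pi,
    unsigned gidx, uint32_t key, uint16_t lport)
{
	struct sketch_pcb *p;
	struct sketch_group *g;

	assert(gidx < SKETCH_NGROUPS);
	g = &pi->groups[gidx];
	sketch_lock(&g->lock);
	for (p = g->conns; p != NULL && p->key != key; p = p->grp_next)
		;
	if (p == NULL)	/* one group lock is enough to read wildcards */
		for (p = pi->wildcard; p != NULL && p->lport != lport;
		    p = p->grp_next)
			;
	sketch_unlock(&g->lock);
	if (p != NULL)
		return (p);
	/* Group layer missed: fall back to the normal global PCB list. */
	for (p = pi->global; p != NULL && p->key != key; p = p->glob_next)
		;
	return (p);
}
```

Taking the group locks in a fixed index order is what keeps two concurrent wildcard
writers from deadlocking against each other.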
Initialize a PCBGROUP in a PCB info block
.Pq Vt "struct inpcbinfo"
.Fn in_pcbgroup_init .
Add a connection to a PCBGROUP with
.Fn in_pcbgroup_update .
Connections are removed with
.Fn in_pcbgroup_remove .
These functions in turn determine which PCBGROUP bucket the given PCB
is placed into and calculate its hash value accordingly.
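The bookkeeping behind these calls can be sketched in user space.
The sketch_* names, the eight-group array, and the XOR tuple hash are assumptions,
not the kernel's in_pcbgroup_update() or in_pcbgroup_remove(), and locking is
omitted.
The update function computes the PCB's group from its tuple and moves the PCB there
if it is not already in it; removal unlinks it; sketch_group_byinpcb() plays the role
of returning a PCB's current group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NGROUPS 8	/* hypothetical group count */

/* Hypothetical PCB: a 4-tuple plus its group membership. */
struct sketch_pcb {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
	int group;			/* -1 while not in any group */
	struct sketch_pcb *next;	/* next PCB in the same group */
	struct sketch_pcb **prevp;	/* link that points at this PCB */
};

static struct sketch_pcb *sketch_groups[SKETCH_NGROUPS];

/* Assumed simple XOR hash over the 4-tuple, reduced modulo the group count. */
static unsigned
sketch_bucket(const struct sketch_pcb *p)
{
	uint32_t h = p->laddr ^ p->faddr ^
	    (((uint32_t)p->lport << 16) | p->fport);

	return (h % SKETCH_NGROUPS);
}

/* Counterpart of a "remove": unlink the PCB from its current group. */
static void
sketch_group_remove(struct sketch_pcb *p)
{
	if (p->group < 0)
		return;
	assert(p->group < SKETCH_NGROUPS && p->prevp != NULL);
	*p->prevp = p->next;
	if (p->next != NULL)
		p->next->prevp = p->prevp;
	p->next = NULL;
	p->prevp = NULL;
	p->group = -1;
}

/* Counterpart of an "update": (re)place the PCB in the group for its tuple. */
static void
sketch_group_update(struct sketch_pcb *p)
{
	unsigned g = sketch_bucket(p);

	if (p->group == (int)g)
		return;			/* already in the right group */
	sketch_group_remove(p);
	p->next = sketch_groups[g];
	if (p->next != NULL)
		p->next->prevp = &p->next;
	p->prevp = &sketch_groups[g];
	sketch_groups[g] = p;
	p->group = (int)g;
}

/* Counterpart of a "byinpcb" lookup: the PCB's current group, or -1. */
static int
sketch_group_byinpcb(const struct sketch_pcb *p)
{
	return (p->group);
}
```

In this sketch, calling the update again after a PCB's tuple changes (for example,
once its foreign port is set) moves the PCB to the group for its new tuple.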
Wildcard PCBs are hashed differently and placed in a single wildcard
is enabled and in use, RSS-aware wildcard PCBs are placed in a single
PCBGROUP based on RSS information.
Protocols may look up the PCB entry in a PCBGROUP by using the lookup
.Fn in_pcbgroup_byhash
.Fn in_pcbgroup_byinpcb .
.Sh IMPLEMENTATION NOTES
is aware of PCBGROUP and will call into the PCBGROUP code to do
PCBGROUP assignment and lookup, preferring a PCBGROUP lookup to the
default global PCB info table.
An implementor wishing to experiment with or modify the PCBGROUP assignment
should modify this set of functions:
.Bl -tag -width "12345678" -offset indent
.It Fn in_pcbgroup_getbucket No and Fn in6_pcbgroup_getbucket
Map a given 32-bit hash value to a PCBGROUP.
By default, this is hash % number_of_pcbgroups.
However, this distribution may not align with NIC receive queues or
.It Fn in_pcbgroup_byhash No and Fn in6_pcbgroup_byhash
Map a 32-bit hash value and a hash type identifier to a PCBGROUP.
By default, this simply returns NULL.
This function is used by the
.Pa sys/netinet/in_pcb.c
to map an mbuf to a PCBGROUP.
.It Fn in_pcbgroup_bytuple No and Fn in6_pcbgroup_bytuple
Map the source and destination addresses and ports to a PCBGROUP.
By default, this uses a very simple XOR hash.
This function is used both by the PCB lookup code and as a fallback in
.Pa sys/netinet/in_pcb.c .
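The three hooks above can be modelled in user space as follows.
This is a sketch, not the kernel code: the sketch_* names, the fixed four-group array,
and the have_hw_hash flag are hypothetical.
It reproduces only the defaults listed here: getbucket reduces a hash modulo the
group count, byhash returns NULL, and bytuple XORs the addresses and ports, with the
tuple hash used as the fallback when byhash yields no group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NGROUPS 4	/* hypothetical group count */

/* Hypothetical group object; only its index matters here. */
struct sketch_group {
	unsigned index;
};

static struct sketch_group sketch_groups[SKETCH_NGROUPS] = {
	{ 0 }, { 1 }, { 2 }, { 3 }
};

/* getbucket: 32-bit hash to group, by default hash % number of groups. */
static struct sketch_group *
sketch_getbucket(uint32_t hash)
{
	return (&sketch_groups[hash % SKETCH_NGROUPS]);
}

/* byhash: the documented default is to return NULL. */
static struct sketch_group *
sketch_byhash(unsigned hashtype, uint32_t hash)
{
	(void)hashtype;
	(void)hash;
	return (NULL);
}

/* bytuple: a very simple XOR hash of addresses and ports. */
static struct sketch_group *
sketch_bytuple(uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport)
{
	return (sketch_getbucket(laddr ^ faddr ^ lport ^ fport));
}

/* Prefer a NIC-provided hash; fall back to the software tuple hash. */
static struct sketch_group *
sketch_group_for_packet(int have_hw_hash, unsigned hashtype, uint32_t hw_hash,
    uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport)
{
	struct sketch_group *g = NULL;

	if (have_hw_hash)
		g = sketch_byhash(hashtype, hw_hash);
	if (g == NULL)
		g = sketch_bytuple(laddr, lport, faddr, fport);
	assert(g != NULL);
	return (g);
}
```

Because the default byhash returns NULL, a hardware hash is ignored in this sketch
and every packet is placed by the software tuple hash.
An implementor aligning groups with NIC receive queues would presumably replace the
byhash step so that it maps the NIC hash to the group serving that queue.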
.%T "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems"
.%J "2006 USENIX Annual Technical Conference"
.%U http://www.ece.rice.edu/~willmann/pubs/paranet_usenix.pdf
PCBGROUP first appeared in
The PCBGROUP implementation was written by
.An Robert N. M. Watson Aq Mt rwatson@FreeBSD.org
under contract to Juniper Networks, Inc.
This manual page was written by
.An Adrian Chadd Aq Mt adrian@FreeBSD.org .
implementation currently uses
blocks to tie into PCBGROUP.
This is a sign that a more abstract programming API is needed.
There is currently no support for re-balancing the PCBGROUP assignment,
nor is there any support for overriding which PCBGROUP a socket/PCB
No statistics are kept to indicate how often PCBGROUP lookups