sys/modules/netgraph/netgraph/netgraph.4

   1 .\" Copyright (c) 1996-1999 Whistle Communications, Inc.
   2 .\" All rights reserved.
   3 .\"
   4 .\" Subject to the following obligations and disclaimer of warranty, use and
   5 .\" redistribution of this software, in source or object code forms, with or
   6 .\" without modifications are expressly permitted by Whistle Communications;
   7 .\" provided, however, that:
   8 .\" 1. Any and all reproductions of the source or object code must include the
   9 .\"    copyright notice above and the following disclaimer of warranties; and
  10 .\" 2. No rights are granted, in any manner or form, to use Whistle
  11 .\"    Communications, Inc. trademarks, including the mark "WHISTLE
  12 .\"    COMMUNICATIONS" on advertising, endorsements, or otherwise except as
  13 .\"    such appears in the above copyright notice or in the software.
  14 .\"
  15 .\" THIS SOFTWARE IS BEING PROVIDED BY WHISTLE COMMUNICATIONS "AS IS", AND
  16 .\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, WHISTLE COMMUNICATIONS MAKES NO
  17 .\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING THIS SOFTWARE,
  18 .\" INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED WARRANTIES OF
  19 .\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
  20 .\" WHISTLE COMMUNICATIONS DOES NOT WARRANT, GUARANTEE, OR MAKE ANY
  21 .\" REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS OF THE USE OF THIS
  22 .\" SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY OR OTHERWISE.
  23 .\" IN NO EVENT SHALL WHISTLE COMMUNICATIONS BE LIABLE FOR ANY DAMAGES
  24 .\" RESULTING FROM OR ARISING OUT OF ANY USE OF THIS SOFTWARE, INCLUDING
  25 .\" WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
  26 .\" PUNITIVE, OR CONSEQUENTIAL DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR
  27 .\" SERVICES, LOSS OF USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY
  28 .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  29 .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  30 .\" THIS SOFTWARE, EVEN IF WHISTLE COMMUNICATIONS IS ADVISED OF THE POSSIBILITY
  31 .\" OF SUCH DAMAGE.
  32 .\"
  33 .\" Authors: Julian Elischer <julian@whistle.com>
  34 .\"          Archie Cobbs <archie@whistle.com>
  35 .\"
  36 .\" $FreeBSD$
  37 .\" $Whistle: netgraph.4,v 1.7 1999/01/28 23:54:52 julian Exp $
  38 .\"
  39 .Dd January 19, 1999
  40 .Dt NETGRAPH 4
  41 .Os FreeBSD
  42 .Sh NAME
  43 .Nm netgraph
  44 .Nd graph-based kernel networking subsystem
  45 .Sh DESCRIPTION
  46 The
  47 .Nm
  48 system provides a uniform and modular system for the implementation
  49 of kernel objects which perform various networking functions. The objects,
  50 known as
  51 .Em nodes ,
  52 can be arranged into arbitrarily complicated graphs. Nodes have
  53 .Em hooks
  54 which are used to connect two nodes together, forming the edges in the graph.
  55 Nodes communicate along the edges to process data, implement protocols, etc.
  56 .Pp
  57 The aim of
  58 .Nm
  59 is to supplement rather than replace the existing kernel networking
  60 infrastructure.  It provides:
  61 .Pp
  62 .Bl -bullet -compact -offset 2n
  63 .It
  64 A flexible way of combining protocol and link level drivers
  65 .It
  66 A modular way to implement new protocols
  67 .It
  68 A common framework for kernel entities to inter-communicate
  69 .It
  70 A reasonably fast, kernel-based implementation
  71 .El
  72 .Sh Nodes and Types
  73 The most fundamental concept in
  74 .Nm
  75 is that of a
  76 .Em node .
  77 All nodes implement a number of predefined methods which allow them
  78 to interact with other nodes in a well defined manner.
  79 .Pp
  80 Each node has a
  81 .Em type ,
  82 which is a static property of the node determined at node creation time.
  83 A node's type is described by a unique
  84 .Tn ASCII
  85 type name.
  86 The type implies what the node does and how it may be connected
  87 to other nodes.
  88 .Pp
  89 In object-oriented language, types are classes and nodes are instances
  90 of their respective class. All node types are subclasses of the generic node
  91 type, and hence inherit certain common functionality and capabilities
  92 (e.g., the ability to have an
  93 .Tn ASCII
  94 name).
  95 .Pp
  96 Nodes may be assigned a globally unique
  97 .Tn ASCII
  98 name which can be
  99 used to refer to the node.
 100 The name must not contain the characters
 101 .Dq \&.
 102 or
 103 .Dq \&:
 104 and is limited to
 105 .Dv "NG_NODELEN + 1"
 106 characters (including NUL byte).
 107 .Pp
 108 Each node instance has a unique
 109 .Em ID number
 110 which is expressed as a 32-bit hex value. This value may be used to
 111 refer to a node when there is no
 112 .Tn ASCII
 113 name assigned to it.
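.Pp
The naming rules above can be sketched as a small user-space check.
This is only an illustration: the value given to NG_NODELEN, the function
name ng_name_ok, and the rejection of empty names are assumptions made for
the example and are not taken from the netgraph headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; the real limit comes from the netgraph headers. */
#define NG_NODELEN 15

/*
 * Hypothetical helper: returns true if `name` could serve as a global
 * node name, i.e. it contains no '.' or ':' and fits in NG_NODELEN + 1
 * bytes including the terminating NUL. Empty names are rejected here,
 * which is an assumption of this sketch.
 */
bool
ng_name_ok(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len == 0 || len > NG_NODELEN)
		return false;
	return strpbrk(name, ".:") == NULL;
}
```

With these assumptions, "Frame1" is accepted, while "foo:bar" and "a.b" are
rejected because they contain reserved separator characters.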
 114 .Sh Hooks
 115 Nodes are connected to other nodes by connecting a pair of
 116 .Em hooks ,
 117 one from each node. Data flows bidirectionally between nodes along
 118 connected pairs of hooks.  A node may have as many hooks as it
 119 needs, and may assign whatever meaning it wants to a hook.
 120 .Pp
 121 Hooks have these properties:
 122 .Pp
 123 .Bl -bullet -compact -offset 2n
 124 .It
 125 A hook has an
 126 .Tn ASCII
 127 name which is unique among all hooks
 128 on that node (other hooks on other nodes may have the same name).
 129 The name must not contain a
 130 .Dq \&.
 131 or a
 132 .Dq \&:
 133 and is
 134 limited to
 135 .Dv "NG_HOOKLEN + 1"
 136 characters (including NUL byte).
 137 .It
 138 A hook is always connected to another hook. That is, hooks are
 139 created at the time they are connected, and breaking an edge by
 140 removing either hook destroys both hooks.
 141 .El
 142 .Pp
 143 A node may decide to assign special meaning to some hooks.
 144 For example, connecting to the hook named
 145 .Dq debug
 146 might trigger
 147 the node to start sending debugging information to that hook.
 148 .Sh Data Flow
 149 Two types of information flow between nodes: data messages and
 150 control messages. Data messages are passed in mbuf chains along the edges
 151 in the graph, one edge at a time. The first mbuf in a chain must have the
 152 .Dv M_PKTHDR
 153 flag set. Each node decides how to handle data coming in on its hooks.
 154 .Pp
 155 Control messages are type-specific C structures sent from one node
 156 directly to some arbitrary other node.  Control messages have a common
 157 header format, followed by type-specific data, and are binary structures
 158 for efficiency.  However, node types also may support conversion of the
 159 type specific data between binary and
 160 .Tn ASCII
 161 for debugging and human interface purposes (see the
 162 .Dv NGM_ASCII2BINARY
 163 and
 164 .Dv NGM_BINARY2ASCII
 165 generic control messages below).  Nodes are not required to support
 166 these conversions.
 167 .Pp
 168 There are two ways to address a control message. If
 169 there is a sequence of edges connecting the two nodes, the message
 170 may be
 171 .Dq source routed
 172 by specifying the corresponding sequence
 173 of hooks as the destination address for the message (relative
 174 addressing).  Otherwise, the recipient node's global
 175 .Tn ASCII
 176 name
 177 (or equivalent ID based name) is used as the destination address
 178 for the message (absolute addressing).  The two types of addressing
 179 may be combined, by specifying an absolute start node and a sequence
 180 of hooks.
 181 .Pp
 182 Messages often represent commands that are followed by a reply message
 183 in the reverse direction. To facilitate this, the recipient of a
 184 control message is supplied with a
 185 .Dq return address
 186 that is suitable
 187 for addressing a reply.
 188 .Pp
 189 Each control message contains a 32-bit value called a
 190 .Em typecookie
 191 indicating the type of the message, i.e., how to interpret it.
 192 Typically each type defines a unique typecookie for the messages
 193 that it understands.  However, a node may choose to recognize and
 194 implement more than one type of message.
 195 .Pp
 196 If a message is delivered to an address that implies that it arrived
 197 at that node through a particular hook, that hook is identified to the
 198 receiving node. This allows a message to be rerouted or passed on, should
 199 a node decide that this is required.
 200 .Sh Netgraph is Functional
 201 In order to minimize latency, most
 202 .Nm
 203 operations are functional.
 204 That is, data and control messages are delivered by making function
 205 calls rather than by using queues and mailboxes.  For example, if node
 206 A wishes to send a data mbuf to neighboring node B, it calls the
 207 generic
 208 .Nm
 209 data delivery function. This function in turn locates
 210 node B and calls B's
 211 .Dq receive data
 212 method.
 213 .Pp
 214 A node may reject a data packet, or pass it back to the
 215 caller in a modified or completely replaced form. The caller can tell the
 216 node being called that it does not wish to receive any such packets
 217 by using the
 218 .Fn NG_SEND_DATA
 219 macro, in which case the receiving node should simply discard rejected packets.
 220 If the sender knows how to handle returned packets, it must use the
 221 .Fn NG_SEND_DATA_RET
 222 macro, which adjusts its parameters to point to the returned data,
 223 or to NULL if no data was returned to the caller. No packet return is possible
 224 across a queueing link; a return can still be sent explicitly,
 225 but that does not mean quite the same thing.
 226 .Pp
 227 While this mode of operation
 228 results in good performance, it has a few implications for node
 229 developers:
 230 .Pp
 231 .Bl -bullet -compact -offset 2n
 232 .It
 233 Whenever a node delivers a data or control message, the node
 234 may need to allow for the possibility of receiving a returning
 235 message before the original delivery function call returns.
 236 .It
 237 Netgraph nodes and support routines generally run at
 238 .Fn splnet .
 239 However, some nodes may want to send data and control messages
 240 from a different priority level. Netgraph supplies queueing routines which
 241 utilize the NETISR system to move message delivery to
 242 .Fn splnet .
 243 Nodes that run at other priorities (e.g. interfaces) can be directly
 244 linked to other nodes so that the combination runs at the other priority;
 245 however, any interaction with nodes running at splnet MUST be achieved via the
 246 queueing functions (which use the
 247 .Fn netisr
 248 feature of the kernel).
 249 Note that messages are always received at
 250 .Fn splnet .
 251 .It
 252 It's possible for an infinite loop to occur if the graph contains cycles.
 253 .El
 254 .Pp
 255 So far, these issues have not proven problematical in practice.
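.Pp
The function-call style of delivery described in this section can be
sketched in user space as follows. This is not the kernel code: the fn_
prefixed types and functions are hypothetical, the payload is a plain string
instead of an mbuf chain, and returning ENOTCONN for an unconnected hook is
an assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fn_hook;
typedef int (*fn_rcvdata_t)(struct fn_hook *hook, const char *data);

struct fn_node {
	fn_rcvdata_t    rcvdata;   /* the node's "receive data" method */
	struct fn_hook *out;       /* hook a forwarding node sends on */
	int             received;  /* packets seen by this node */
};

struct fn_hook {
	struct fn_node *node;      /* node owning this hook */
	struct fn_hook *peer;      /* hook at the other end of the edge */
};

/* Deliver by calling the peer node's method directly; nothing is queued. */
int
fn_send_data(struct fn_hook *hook, const char *data)
{
	assert(hook != NULL);
	if (hook->peer == NULL)
		return ENOTCONN;
	return hook->peer->node->rcvdata(hook->peer, data);
}

/* A node that passes everything on: the nested call happens before we return. */
int
fn_forward(struct fn_hook *hook, const char *data)
{
	hook->node->received++;
	return fn_send_data(hook->node->out, data);
}

/* A node that consumes the data. */
int
fn_sink(struct fn_hook *hook, const char *data)
{
	(void)data;
	hook->node->received++;
	return 0;
}
```

When a sender calls fn_send_data() on a chain of forwarding nodes ending in
a sink, every node has already processed the packet by the time the call
returns. The same nesting is what makes a cycle of forwarding nodes recurse
without bound.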
 256 .Sh Interaction With Other Parts of the Kernel
 257 A node may have a hidden interaction with other components of the
 258 kernel outside of the
 259 .Nm
 260 subsystem, such as device hardware,
 261 kernel protocol stacks, etc.  In fact, one of the benefits of
 262 .Nm
 263 is the ability to join disparate kernel networking entities together in a
 264 consistent communication framework.
 265 .Pp
 266 An example is the node type
 267 .Em socket
 268 which is both a netgraph node and a
 269 .Xr socket 2
 270 BSD socket in the protocol family
 271 .Dv PF_NETGRAPH .
 272 Socket nodes allow user processes to participate in
 273 .Nm Ns .
 274 Other nodes communicate with socket nodes using the usual methods, and the
 275 node hides the fact that it is also passing information to and from a
 276 cooperating user process.
 277 .Pp
 278 Another example is a device driver that presents
 279 a node interface to the hardware.
 280 .Sh Node Methods
 281 Nodes are notified of the following actions via function calls
 282 to the following node methods (all at
 283 .Fn splnet )
 284 and may accept or reject that action (by returning the appropriate
 285 error code):
 286 .Bl -tag -width xxx
 287 .It Creation of a new node
 288 The constructor for the type is called. If creation of a new node is
 289 allowed, the constructor must call the generic node creation
 290 function (in object-oriented terms, the superclass constructor)
 291 and then allocate any special resources it needs. For nodes that
 292 correspond to hardware, this is typically done during the device
 293 attach routine. Often a global
 294 .Tn ASCII
 295 name corresponding to the
 296 device name is assigned here as well.
 297 .It Creation of a new hook
 298 The hook is created and tentatively
 299 linked to the node, and the node is told about the name that will be
 300 used to describe this hook. The node sets up any special data structures
 301 it needs, or may reject the connection, based on the name of the hook.
 302 .It Successful connection of two hooks
 303 After both ends have accepted their
 304 hooks, and the links have been made, the nodes get a chance to
 305 find out who their peer is across the link and can then decide to reject
 306 the connection. Tear-down is automatic.
 307 .It Destruction of a hook
 308 The node is notified of a broken connection. The node may consider some hooks
 309 to be critical to operation and others to be expendable: the disconnection
 310 of one hook may be an acceptable event while for another it
 311 may effect a total shutdown for the node.
 312 .It Shutdown of a node
 313 This method allows a node to clean up
 314 and to ensure that any actions that need to be performed
 315 at this time are taken. The method must call the generic (i.e., superclass)
 316 node destructor to get rid of the generic components of the node.
 317 Some nodes (usually associated with a piece of hardware) may be
 318 .Em persistent
 319 in that a shutdown breaks all edges and resets the node,
 320 but doesn't remove it, in which case the generic destructor is not called.
 321 .El
 322 .Sh Sending and Receiving Data
 323 Three other methods are also supported by all nodes:
 324 .Bl -tag -width xxx
 325 .It Receive data message
 326 An mbuf chain is passed to the node.
 327 The node is notified on which hook the data arrived,
 328 and can use this information in its processing decision.
 329 The receiving node must always
 330 .Fn m_freem
 331 the mbuf chain on completion or error, pass it back (reject it), or pass
 332 it on to another node
 333 (or kernel module) which will then be responsible for freeing it.
 334 If a node passes a packet back to the caller, it does not have to be the
 335 same mbuf, in which case the original must be freed. Passing a packet
 336 back allows a module to modify the original data (e.g. encrypt it),
 337 or in some other way filter it (e.g. packet filtering).
 338 .Pp
 339 In addition to the mbuf chain itself there is also a pointer to a
 340 structure describing meta-data about the message
 341 (e.g. priority information). This pointer may be
 342 .Dv NULL
 343 if there is no additional information. The format for this information is
 344 described in
 345 .Pa netgraph.h .
 346 The memory for meta-data must be allocated via
 347 .Fn malloc
 348 with type
 349 .Dv M_NETGRAPH .
 350 As with the data itself, it is the receiver's responsibility to
 351 .Fn free
 352 the meta-data. If the mbuf chain is freed the meta-data must
 353 be freed at the same time. If the meta-data is freed but the
 354 real data is passed on, then a
 355 .Dv NULL
 356 pointer must be substituted.
 357 Meta-data may be passed back in the same way that mbuf data may be passed back.
 358 As with mbuf data, the rejected or returned meta-data pointer may point to
 359 the same or different meta-data as that passed in,
 360 and if it is different, the original must be freed.
 361 .Pp
 362 The receiving node may decide to defer the data by queueing it in the
 363 .Nm
 364 NETISR system (see below).
 365 .Pp
 366 The structure and use of meta-data is still experimental, but is presently used in
 367 frame-relay to indicate that management packets should be queued for transmission
 368 at a higher priority than data packets. This is required for
 369 conformance with Frame Relay standards.
 370 .Pp
 371 .It Receive queued data message
 372 Usually this will be the same function as
 373 .Em Receive data message .
 374 This is the entry point called when a data message is being handed to
 375 the node after having been queued in the NETISR system.
 376 This allows a node to decide in the
 377 .Em Receive data message
 378 method that a message should be deferred and queued,
 379 and be sure that when it is processed from the queue,
 380 it will not be queued again.
 381 .It Receive control message
 382 This method is called when a control message is addressed to the node.
 383 A return address is always supplied, giving the address of the node
 384 that originated the message so a reply message can be sent anytime later.
 385 .Pp
 386 It is possible for a synchronous reply to be made, and in fact this
 387 is more common in practice.
 388 This is done by setting a pointer (supplied as an extra function parameter)
 389 to point to the reply.
 390 Then when the control message delivery function returns,
 391 the caller can check if this pointer has been made non-NULL,
 392 and if so then it points to the reply message allocated via
 393 .Fn malloc
 394 and containing the synchronous response. In both directions
 395 (request and response), it is up to the
 396 receiver of that message to
 397 .Fn free
 398 the control message buffer. All control messages and replies are
 399 allocated with
 400 .Fn malloc
 401 type
 402 .Dv M_NETGRAPH .
 403 .Pp
 404 If the message was delivered via a specific hook, that hook will
 405 also be made known, which allows the use of such things as flow-control
 406 messages, and status change messages, where the node may want to forward
 407 the message out a hook other than the one on which it arrived.
 408 .El
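.Pp
The ownership rules for data and meta-data can be sketched in user space.
This is only a model: sk_pkt stands in for an mbuf chain, sk_alloc() and
sk_free() stand in for the kernel allocator and merely count outstanding
buffers, and all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int live_allocs;          /* buffers allocated and not yet freed */

void *
sk_alloc(size_t n)
{
	void *p = malloc(n);

	if (p != NULL)
		live_allocs++;
	return p;
}

void
sk_free(void *p)
{
	if (p != NULL) {
		live_allocs--;
		free(p);
	}
}

int
sk_live(void)
{
	return live_allocs;
}

struct sk_pkt  { size_t len; };      /* stand-in for the mbuf chain */
struct sk_meta { int priority; };    /* stand-in for the meta-data */

typedef int (*sk_rcv_t)(struct sk_pkt *m, struct sk_meta *meta);

/* A receiver that consumes the packet must free data and meta-data together. */
int
sk_rcv_drop(struct sk_pkt *m, struct sk_meta *meta)
{
	sk_free(m);
	sk_free(meta);
	return 0;
}

/*
 * A receiver that frees only the meta-data and passes the data on must
 * substitute a NULL meta pointer; the next receiver then owns the data.
 */
int
sk_rcv_strip(struct sk_pkt *m, struct sk_meta *meta, sk_rcv_t next)
{
	assert(next != NULL);
	sk_free(meta);
	meta = NULL;
	return next(m, meta);
}
```

After a packet has gone through sk_rcv_strip() and then sk_rcv_drop(),
sk_live() returns to its earlier value, which shows that each buffer was
freed exactly once by whichever receiver last owned it.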
 409 .Pp
 410 Reference counts are used extensively, so that a node is
 411 freed automatically once its last reference is released. This behaviour
 412 has been tested and debugged to present a consistent and trustworthy
 413 framework for the
 414 .Dq type module
 415 writer to use.
 416 .Sh Addressing
 417 The
 418 .Nm
 419 framework provides an unambiguous and simple to use method of specifically
 420 addressing any single node in the graph. The naming of a node is
 421 independent of its type, in that another node, or external component
 422 need not know anything about the node's type in order to address it so as
 423 to send it a generic message type. Node and hook names should be
 424 chosen so as to make addresses meaningful.
 425 .Pp
 426 Addresses are either absolute or relative. An absolute address begins
 427 with a node name (or ID), followed by a colon, followed by a sequence of hook
 428 names separated by periods. This addresses the node reached by starting
 429 at the named node and following the specified sequence of hooks.
 430 A relative address includes only the sequence of hook names, implicitly
 431 starting hook traversal at the local node.
 432 .Pp
 433 There are a couple of special possibilities for the node name.
 434 The name
 435 .Dq \&.
 436 (referred to as
 437 .Dq \&.: )
 438 always refers to the local node.
 439 Also, nodes that have no global name may be addressed by their ID numbers,
 440 by enclosing the hex representation of the ID number within square brackets.
 441 Here are some examples of valid netgraph addresses:
 442 .Bd -literal -offset 4n -compact
 443
 444   .:
 445   foo:
 446   .:hook1
 447   foo:hook1.hook2
 448   [f057cd80]:hook1
 449 .Ed
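.Pp
The address syntax shown above can be split into its node part and hook
sequence with a few lines of C. This is a user-space sketch, not the parser
used by netgraph: the ng_addr structure, ng_addr_parse(), the ADDR_ limits,
and the rejection of an empty address are all assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_MAXHOOKS 8    /* hypothetical limit for this sketch */
#define ADDR_MAXNAME  32   /* hypothetical limit for this sketch */

struct ng_addr {
	char node[ADDR_MAXNAME];                 /* "" for a relative address */
	int  nhooks;                             /* number of hook names */
	char hooks[ADDR_MAXHOOKS][ADDR_MAXNAME]; /* hook names, in order */
};

/* Split "node:hook1.hook2" or "hook1.hook2"; returns 0, or -1 on bad syntax. */
int
ng_addr_parse(const char *addr, struct ng_addr *out)
{
	const char *colon, *path;
	size_t len;

	assert(addr != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (*addr == '\0')
		return -1;
	colon = strchr(addr, ':');
	path = addr;
	if (colon != NULL) {
		/* Absolute: everything before the colon names the start node. */
		len = (size_t)(colon - addr);
		if (len == 0 || len >= ADDR_MAXNAME)
			return -1;
		memcpy(out->node, addr, len);
		path = colon + 1;
	}
	while (*path != '\0') {
		const char *dot = strchr(path, '.');

		len = (dot != NULL) ? (size_t)(dot - path) : strlen(path);
		if (len == 0 || len >= ADDR_MAXNAME ||
		    out->nhooks == ADDR_MAXHOOKS)
			return -1;
		memcpy(out->hooks[out->nhooks++], path, len);
		path += len;
		if (*path == '.' && *++path == '\0')
			return -1;               /* trailing period */
	}
	return 0;
}
```

With this sketch, "foo:hook1.hook2" yields the node part "foo" and two hook
names, ".:" yields the node part "." with no hooks, and "uplink" is a
relative address with an empty node part.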
 450 .Pp
 451 Consider the following set of nodes, which might be created for a site with
 452 a single physical frame relay line having two active logical DLCI channels,
 453 with RFC-1490 frames on DLCI 16 and PPP frames over DLCI 20:
 454 .Pp
 455 .Bd -literal
 456 [type SYNC ]                  [type FRAME]                 [type RFC1490]
 457 [ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named>  ]
 458 [    A     ]                  [    B     ](dlci20)<---+    [     C      ]
 459                                                       |
 460                                                       |      [ type PPP ]
 461                                                       +>(mux)[<un-named>]
 462                                                              [    D     ]
 463 .Ed
 464 .Pp
 465 One could always send a control message to node C from anywhere
 466 by using the name
 467 .Em "Frame1:uplink.dlci16" .
 468 In this case, node C would also be notified that the message
 469 reached it via its hook
 470 .Dq mux .
 471 Similarly,
 472 .Em "Frame1:uplink.dlci20"
 473 could reliably be used to reach node D, and node A could refer
 474 to node B as
 475 .Em ".:uplink" ,
 476 or simply
 477 .Em "uplink" .
 478 Conversely, B can refer to A as
 479 .Em "data" .
 480 The address
 481 .Em "mux.data"
 482 could be used by both nodes C and D to address a message to node A.
 483 .Pp
 484 Note that this is only for
 485 .Em control messages .
 486 In each of these cases, where a relative addressing mode is
 487 used, the recipient is notified of the hook on which the
 488 message arrived, as well as
 489 the originating node.
 490 This allows the option of hop-by-hop distribution of messages and
 491 state information.
 492 Data messages are
 493 .Em only
 494 routed one hop at a time, by specifying the departing
 495 hook, with each node making
 496 the next routing decision. So when B receives a frame on hook
 497 .Dq data
 498 it decodes the frame relay header to determine the DLCI,
 499 and then forwards the unwrapped frame to either C or D.
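.Pp
The routing decision made by node B can be sketched as below. The example
assumes the common two-byte frame relay address field (six high DLCI bits in
the first octet, four low DLCI bits in the second, with the address
extension bit set only in the second octet); that layout, the sk_ function
names, and the use of hook names of the form dlciN are assumptions of this
sketch and do not describe the actual frame relay node type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Extract the 10-bit DLCI from an assumed two-byte address field. */
int
sk_fr_dlci(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);
	if (len < 2)
		return -1;
	/* EA bit must be 0 in the first octet and 1 in the second. */
	if ((frame[0] & 0x01) != 0 || (frame[1] & 0x01) != 1)
		return -1;
	return ((frame[0] >> 2) << 4) | (frame[1] >> 4);
}

/*
 * Write the name of the departing hook ("dlci16", "dlci20", ...) into
 * buf and return the DLCI, or -1 if the frame or buffer is unusable.
 */
int
sk_fr_hookname(const uint8_t *frame, size_t len, char *buf, size_t buflen)
{
	int dlci, n;

	assert(buf != NULL && buflen > 0);
	dlci = sk_fr_dlci(frame, len);
	if (dlci < 0)
		return -1;
	n = snprintf(buf, buflen, "dlci%d", dlci);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return dlci;
}
```

Under these assumptions a frame starting with the octets 0x04 0x01 carries
DLCI 16 and would leave B on hook dlci16 toward node C, while 0x04 0x41
carries DLCI 20 and would leave on dlci20 toward node D.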
 500 .Pp
 501 A similar graph might be used to represent multi-link PPP running
 502 over an ISDN line:
 503 .Pp
 504 .Bd -literal
 505 [ type BRI ](B1)<--->(link1)[ type MPP  ]
 506 [  "ISDN1" ](B2)<--->(link2)[ (no name) ]
 507 [          ](D) <-+
 508                   |
 509  +----------------+
 510  |
 511  +->(switch)[ type Q.921 ](term1)<---->(datalink)[ type Q.931 ]
 512             [ (no name)  ]                       [ (no name)  ]
 513 .Ed
 514 .Sh Netgraph Structures
 515 Interesting members of the node and hook structures are shown below:
 516 .Bd -literal
 517 struct  ng_node {
 518   char    *name;                /* Optional globally unique name */
 519   void    *private;             /* Node implementation private info */
 520   struct  ng_type *type;        /* The type of this node */
 521   int     refs;                 /* Number of references to this struct */
 522   int     numhooks;             /* Number of connected hooks */
 523   hook_p  hooks;                /* Linked list of (connected) hooks */
 524 };
 525 typedef struct ng_node *node_p;
 526
 527 struct  ng_hook {
 528   char           *name;         /* This node's name for this hook */
 529   void           *private;      /* Node implementation private info */
 530   int            refs;          /* Number of references to this struct */
 531   struct ng_node *node;         /* The node this hook is attached to */
 532   struct ng_hook *peer;         /* The other hook in this connected pair */
 533   struct ng_hook *next;         /* Next in list of hooks for this node */
 534 };
 535 typedef struct ng_hook *hook_p;
 536 .Ed
 537 .Pp
 538 The maintenance of the name pointers, reference counts, and linked list
 539 of hooks for each node is handled automatically by the
 540 .Nm
 541 subsystem.
 542 Typically a node's private info contains a back-pointer to the node or hook
 543 structure, which counts as a new reference that must be registered by
 544 incrementing
 545 .Dv "node->refs" .
 546 .Pp
 547 From a hook you can obtain the corresponding node, and from
 548 a node the list of all active hooks.
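.Pp
As an illustration of how these links support source routing, the following
user-space sketch follows a sequence of hook names from a starting node. The
sk_node and sk_hook structures are trimmed copies that keep only the fields
needed for traversal, and sk_connect(), sk_find_hook() and sk_walk() are
hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_hook;

struct sk_node {
	const char     *name;    /* optional global name, may be NULL */
	struct sk_hook *hooks;   /* linked list of connected hooks */
};

struct sk_hook {
	const char     *name;    /* this node's name for the hook */
	struct sk_node *node;    /* node the hook is attached to */
	struct sk_hook *peer;    /* other hook in the connected pair */
	struct sk_hook *next;    /* next hook on the same node */
};

/* Join hook `ha` (named an) on node a to hook `hb` (named bn) on node b. */
void
sk_connect(struct sk_node *a, struct sk_hook *ha, const char *an,
    struct sk_node *b, struct sk_hook *hb, const char *bn)
{
	ha->name = an; ha->node = a; ha->peer = hb;
	ha->next = a->hooks; a->hooks = ha;
	hb->name = bn; hb->node = b; hb->peer = ha;
	hb->next = b->hooks; b->hooks = hb;
}

/* From a node, search the list of hooks by name. */
struct sk_hook *
sk_find_hook(struct sk_node *node, const char *name)
{
	struct sk_hook *h;

	for (h = node->hooks; h != NULL; h = h->next)
		if (strcmp(h->name, name) == 0)
			return h;
	return NULL;
}

/*
 * Follow n hook names starting at `start`. Returns the node reached, or
 * NULL if a name does not match. *lasthook is set to the hook on the
 * destination node through which the path arrived.
 */
struct sk_node *
sk_walk(struct sk_node *start, const char *const *path, int n,
    struct sk_hook **lasthook)
{
	struct sk_node *node = start;
	int i;

	assert(start != NULL && lasthook != NULL);
	*lasthook = NULL;
	for (i = 0; i < n; i++) {
		struct sk_hook *h = sk_find_hook(node, path[i]);

		if (h == NULL)
			return NULL;
		*lasthook = h->peer;
		node = h->peer->node;
	}
	return node;
}
```

Built over the frame relay graph from the Addressing section, walking
"uplink" then "dlci16" from node A reaches node C and reports its hook
"mux" as the last hook traversed.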
 549 .Pp
 550 Node types are described by these structures:
 551 .Bd -literal
 552 /** How to convert a control message from binary <-> ASCII */
 553 struct ng_cmdlist {
 554   u_int32_t                  cookie;     /* typecookie */
 555   int                        cmd;        /* command number */
 556   const char                 *name;      /* command name */
 557   const struct ng_parse_type *mesgType;  /* args if !NGF_RESP */
 558   const struct ng_parse_type *respType;  /* args if NGF_RESP */
 559 };
 560
 561 struct ng_type {
 562   u_int32_t version;                    /* Must equal NG_VERSION */
 563   const  char *name;                    /* Unique type name */
 564
 565   /* Module event handler */
 566   modeventhand_t  mod_event;            /* Handle load/unload (optional) */
 567
 568   /* Constructor */
 569   int    (*constructor)(node_p *node);  /* Create a new node */
 570
 571   /** Methods using the node **/
 572   int    (*rcvmsg)(node_p node,         /* Receive control message */
 573             struct ng_mesg *msg,                /* The message */
 574             const char *retaddr,                /* Return address */
 575             struct ng_mesg **resp,              /* Synchronous response */
 576             hook_p lasthook);                   /* last hook traversed */
 577   int    (*shutdown)(node_p node);      /* Shutdown this node */
 578   int    (*newhook)(node_p node,        /* create a new hook */
 579             hook_p hook,                        /* Pre-allocated struct */
 580             const char *name);                  /* Name for new hook */
 581
 582   /** Methods using the hook **/
 583   int    (*connect)(hook_p hook);       /* Confirm new hook attachment */
 584   int    (*rcvdata)(hook_p hook,        /* Receive data on a hook */
 585             struct mbuf *m,                     /* The data in an mbuf */
 586             meta_p meta,                        /* Meta-data, if any */
 587             struct mbuf  **ret_m,               /* return data here */
 588             meta_p *ret_meta);                  /* return Meta-data here */
 589   int    (*disconnect)(hook_p hook);    /* Notify disconnection of hook */
 590
 591   /** How to convert control messages binary <-> ASCII */
 592   const struct ng_cmdlist *cmdlist;     /* Optional; may be NULL */
 593 };
 594 .Ed
 595 .Pp
 596 Control messages have the following structure:
 597 .Bd -literal
 598 #define NG_CMDSTRLEN    15      /* Max command string (16 with null) */
 599
 600 struct ng_mesg {
 601   struct ng_msghdr {
 602     u_char      version;        /* Must equal NG_VERSION */
 603     u_char      spare;          /* Pad to 2 bytes */
 604     u_short     arglen;         /* Length of cmd/resp data */
 605     u_long      flags;          /* Message status flags */
 606     u_long      token;          /* Reply should have the same token */
 607     u_long      typecookie;     /* Node type understanding this message */
 608     u_long      cmd;            /* Command identifier */
 609     u_char      cmdstr[NG_CMDSTRLEN+1]; /* Cmd string (for debug) */
 610   } header;
 611   char  data[0];                /* Start of cmd/resp data */
 612 };
 613
 614 #define NG_VERSION      1               /* Netgraph version */
 615 #define NGF_ORIG        0x0000          /* Command */
 616 #define NGF_RESP        0x0001          /* Response */
 617 .Ed
 618 .Pp
 619 Control messages have the fixed header shown above, followed by a
 620 variable length data section which depends on the type cookie
 621 and the command. Each field is explained below:
 622 .Bl -tag -width xxx
 623 .It Dv version
 624 Indicates the version of netgraph itself. The current version is
 625 .Dv NG_VERSION .
 626 .It Dv arglen
 627 This is the length of any extra arguments, which begin at
 628 .Dv data .
 629 .It Dv flags
 630 Indicates whether this is a command or a response control message.
 631 .It Dv token
 632 The
 633 .Dv token
 634 is a means by which a sender can match a reply message to the
 635 corresponding command message; the reply always has the same token.
 636 .Pp
 637 .It Dv typecookie
 638 The corresponding node type's unique 32-bit value.
 639 If a node doesn't recognize the type cookie it must reject the message
 640 by returning
 641 .Er EINVAL .
 642 .Pp
 643 Each type should have an include file that defines the commands,
 644 argument format, and cookie for its own messages.
 645 The typecookie
 646 ensures that the same header file was included by both sender and
 647 receiver; when an incompatible change in the header file is made,
 648 the typecookie
 649 .Em must
 650 be changed.
 651 The de facto method for generating unique type cookies is to take the
 652 seconds from the epoch at the time the header file is written
 653 (i.e., the output of
 654 .Dv "date -u +'%s'" ) .
 655 .Pp
 656 There is a predefined typecookie
 657 .Dv NGM_GENERIC_COOKIE
 658 for the
 659 .Dq generic
 660 node type, and
 661 a corresponding set of generic messages which all nodes understand.
 662 The handling of these messages is automatic.
 663 .It Dv cmd
 664 The identifier for the message command. This is type specific,
 665 and is defined in the same header file as the typecookie.
 666 .It Dv cmdstr
 667 Room for a short human readable version of
 668 .Dv cmd
 669 (for debugging purposes only).
 670 .El
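.Pp
The header fields described above can be exercised with a small user-space
sketch that builds a command, builds a matching reply, and checks a type
cookie. It is an approximation only: sk_mesg uses fixed-width integer types
and a C99 flexible array member instead of the declarations shown earlier,
the sk_ functions are hypothetical, and copying the command number into the
reply is an assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NG_VERSION   1
#define NG_CMDSTRLEN 15
#define NGF_ORIG     0x0000
#define NGF_RESP     0x0001

struct sk_mesg {
	struct {
		uint8_t  version;      /* must equal NG_VERSION */
		uint8_t  spare;
		uint16_t arglen;       /* length of data[] */
		uint32_t flags;        /* NGF_ORIG or NGF_RESP */
		uint32_t token;        /* echoed in the reply */
		uint32_t typecookie;
		uint32_t cmd;
		char     cmdstr[NG_CMDSTRLEN + 1];
	} header;
	char data[];                   /* start of cmd/resp data */
};

/* Allocate a command message; the caller (or receiver) frees it. */
struct sk_mesg *
sk_mesg_new(uint32_t cookie, uint32_t cmd, uint32_t token,
    const void *args, uint16_t arglen)
{
	struct sk_mesg *msg;

	assert(args != NULL || arglen == 0);
	msg = calloc(1, sizeof(*msg) + arglen);
	if (msg == NULL)
		return NULL;
	msg->header.version = NG_VERSION;
	msg->header.arglen = arglen;
	msg->header.flags = NGF_ORIG;
	msg->header.token = token;
	msg->header.typecookie = cookie;
	msg->header.cmd = cmd;
	if (arglen > 0)
		memcpy(msg->data, args, arglen);
	return msg;
}

/* Build a reply carrying the same token as the request. */
struct sk_mesg *
sk_mesg_reply(const struct sk_mesg *req, const void *args, uint16_t arglen)
{
	struct sk_mesg *resp;

	resp = sk_mesg_new(req->header.typecookie, req->header.cmd,
	    req->header.token, args, arglen);
	if (resp != NULL)
		resp->header.flags = NGF_RESP;
	return resp;
}

/* A node rejects messages whose type cookie it does not recognize. */
int
sk_mesg_accept(const struct sk_mesg *msg, uint32_t my_cookie)
{
	return (msg->header.typecookie == my_cookie) ? 0 : EINVAL;
}
```

A sender that remembers the token it placed in a command can match an
incoming message whose flags equal NGF_RESP against that command by
comparing the two token values.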
 671 .Pp
 672 Some modules may choose to implement messages from more than one
 673 of the header files and thus recognize more than one type cookie.
 674 .Sh Control Message ASCII Form
 675 Control messages are in binary format for efficiency.  However, for
 676 debugging and human interface purposes, and if the node type supports
 677 it, control messages may be converted to and from an equivalent
 678 .Tn ASCII
 679 form.  The
 680 .Tn ASCII
 681 form is similar to the binary form, with two exceptions:
 682 .Pp
 683 .Bl -tag -compact -width xxx
 684 .It o
 685 The
 686 .Dv cmdstr
 687 header field must contain the
 688 .Tn ASCII
 689 name of the command, corresponding to the
 690 .Dv cmd
 691 header field.
 692 .It o
 693 The
 694 .Dv data
 695 field contains a NUL-terminated
 696 .Tn ASCII
 697 string version of the message arguments.
 698 .El
 699 .Pp
 700 In general, the arguments field of a control message can be any
 701 arbitrary C data type.  Netgraph includes parsing routines to support
 702 some pre-defined datatypes in
 703 .Tn ASCII
 704 with this simple syntax:
 705 .Pp
 706 .Bl -tag -compact -width xxx
 707 .It o
 708 Integer types are represented by base 8, 10, or 16 numbers.
 709 .It o
 710 Strings are enclosed in double quotes and respect the normal
 711 C language backslash escapes.
 712 .It o
 713 IP addresses have the obvious form.
 714 .It o
 715 Arrays are enclosed in square brackets, with the elements listed
 716 consecutively starting at index zero.  An element may have an optional
 717 index and equals sign preceding it.  Whenever an element
 718 does not have an explicit index, the index is implicitly the previous
 719 element's index plus one.
 720 .It o
 721 Structures are enclosed in curly braces, and each field is specified
 722 in the form
 723 .Dq fieldname=value .
 724 .It o
 725 Any array element or structure field whose value is equal to its
 726 .Dq default value
 727 may be omitted. For integer types, the default value
 728 is usually zero; for string types, the empty string.
 729 .It o
 730 Array elements and structure fields may be specified in any order.
 731 .El
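.Pp
The array indexing rule above can be illustrated with a small user-space
parser for arrays of integers. It is not the netgraph parsing code:
sk_parse_int_array() is a hypothetical function, it handles only integer
elements, and it assumes that an explicit index is written without spaces
around the equals sign.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "[ v0 v1 idx=v ... ]" into out[0..max-1]. Elements that are not
 * mentioned keep the default value zero. Returns the array length
 * (highest index used plus one), or -1 on a syntax or range error.
 */
int
sk_parse_int_array(const char *s, long *out, int max)
{
	int next = 0, count = 0;
	char *end;
	long v;

	assert(s != NULL && out != NULL && max > 0);
	memset(out, 0, (size_t)max * sizeof(*out));
	while (isspace((unsigned char)*s))
		s++;
	if (*s++ != '[')
		return -1;
	for (;;) {
		while (isspace((unsigned char)*s))
			s++;
		if (*s == ']')
			return count;
		v = strtol(s, &end, 0);      /* base 0: accepts 8, 10 and 16 */
		if (end == s)
			return -1;
		s = end;
		if (*s == '=') {
			/* Explicit index: the number just read is the index. */
			if (v < 0 || v >= max)
				return -1;
			next = (int)v;
			s++;
			v = strtol(s, &end, 0);
			if (end == s)
				return -1;
			s = end;
		}
		if (next >= max)
			return -1;
		out[next++] = v;             /* implicit index: previous + 1 */
		if (next > count)
			count = next;
	}
}
```

For the input "[ 1 2 5=7 8 ]" this sketch stores 1 and 2 at indices 0 and 1,
leaves indices 2 through 4 at the default of zero, and stores 7 and 8 at
indices 5 and 6.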
 732 .Pp
 733 Each node type may define its own arbitrary types by providing
 734 the necessary routines to parse and unparse.
 735 .Tn ASCII
 736 forms defined
 737 for a specific node type are documented in the documentation for
 738 that node type.
 739 .Sh Generic Control Messages
 740 There are a number of standard predefined messages that will work
 741 for any node, as they are supported directly by the framework itself.
 742 These are defined in
 743 .Pa ng_message.h
 744 along with the basic layout of messages and other similar information.
 745 .Bl -tag -width xxx
 746 .It Dv NGM_CONNECT
 747 Connect to another node, using the supplied hook names on either end.
 748 .It Dv NGM_MKPEER
 749 Construct a node of the given type and then connect to it using the
 750 supplied hook names.
 751 .It Dv NGM_SHUTDOWN
 752 The target node should disconnect from all its neighbors and shut down.
 753 Persistent nodes such as those representing physical hardware
 754 might not disappear from the node namespace, but only reset themselves.
 755 The node must disconnect all of its hooks.
 756 This may result in neighbors shutting themselves down, and possibly a
 757 cascading shutdown of the entire connected graph.
 758 .It Dv NGM_NAME
 759 Assign a name to a node. Nodes can exist without having a name, and this
 760 is the default for nodes created using the
 761 .Dv NGM_MKPEER
 762 method. Such nodes can only be addressed relatively or by their ID number.
 763 .It Dv NGM_RMHOOK
 764 Ask the node to break a hook connection to one of its neighbors.
 765 Both nodes will have their
 766 .Dq disconnect
 767 method invoked.
 768 Either node may elect to totally shut down as a result.
 769 .It Dv NGM_NODEINFO
 770 Asks the target node to describe itself. The four returned fields
 771 are the node name (if named), the node type, the node ID and the
 772 number of hooks attached. The ID is an internal number unique to that node.
 773 .It Dv NGM_LISTHOOKS
 774 This returns the information given by
 775 .Dv NGM_NODEINFO ,
 776 but in addition
 777 includes an array of fields describing each link, and the description for
 778 the node at the far end of that link.
 779 .It Dv NGM_LISTNAMES
 780 This returns an array of node descriptions (as for
 781 .Dv NGM_NODEINFO ")"
 782 where each entry of the array describes a named node.
 783 All named nodes will be described.
 784 .It Dv NGM_LISTNODES
 785 This is the same as
 786 .Dv NGM_LISTNAMES
 787 except that all nodes are listed regardless of whether they have a name or not.
 788 .It Dv NGM_LISTTYPES
 789 This returns a list of all currently installed netgraph types.
 790 .It Dv NGM_TEXT_STATUS
 791 The node may return a text formatted status message.
 792 The status information is determined entirely by the node type.
 793 It is the only "generic" message
 794 that requires any support within the node itself and as such the node may
 795 elect to not support this message. The text response must be less than
 796 .Dv NG_TEXTRESPONSE
 797 bytes in length (presently 1024). This can be used to return general
 798 status information in human readable form.
 799 .It Dv NGM_BINARY2ASCII
 800 This message converts a binary control message to its
 801 .Tn ASCII
 802 form.
 803 The entire control message to be converted is contained within the
 804 arguments field of the
 805 .Dv NGM_BINARY2ASCII
 806 message itself.  If successful, the reply will contain the same control
 807 message in
 808 .Tn ASCII
 809 form.
 810 A node will typically only know how to translate messages that it
 811 itself understands, so the target node of the
 812 .Dv NGM_BINARY2ASCII
 813 message is often the same node that would actually receive that message.
 814 .It Dv NGM_ASCII2BINARY
 815 The opposite of
 816 .Dv NGM_BINARY2ASCII .
 817 The entire control message to be converted, in
 818 .Tn ASCII
 819 form, is contained
 820 in the arguments section of the
 821 .Dv NGM_ASCII2BINARY
 822 message and need only have the
 823 .Dv flags ,
 824 .Dv cmdstr ,
 825 and
 826 .Dv arglen
 827 header fields filled in, plus the NUL-terminated string version of
 828 the arguments in the arguments field.  If successful, the reply
 829 contains the binary version of the control message.
 830 .El
 831 .Sh Metadata
 832 Data moving through the
 833 .Nm
 834 system can be accompanied by meta-data that describes some
 835 aspect of that data. The form of the meta-data is a fixed header,
 836 which contains enough information for most uses, and can optionally
 837 be supplemented by trailing
 838 .Em option
 839 structures, which contain a
 840 .Em cookie
 841 (see the section on control messages), an identifier, a length and optional
 842 data. If a node does not recognize the cookie associated with an option,
 843 it should ignore that option.
 844 .Pp
 845 Meta-data might include such things as priority, discard eligibility,
 846 or special processing requirements. It might also mark a packet for
 847 debug status, etc. The use of meta-data is still experimental.
 848 .Sh INITIALIZATION
 849 The base
 850 .Nm
 851 code may either be statically compiled
 852 into the kernel or else loaded dynamically as a KLD via
 853 .Xr kldload 8 .
 854 In the former case, include
 855 .Bd -literal -offset 4n -compact
 856
 857    options NETGRAPH
 858
 859 .Ed
 860 in your kernel configuration file. You may also include selected
 861 node types in the kernel compilation, for example:
 862 .Bd -literal -offset 4n -compact
 863
 864    options NETGRAPH
 865    options NETGRAPH_SOCKET
 866    options NETGRAPH_ECHO
 867
 868 .Ed
 869 .Pp
 870 Once the
 871 .Nm
 872 subsystem is loaded, individual node types may be loaded at any time
 873 as KLD modules via
 874 .Xr kldload 8 .
 875 Moreover,
 876 .Nm
 877 knows how to automatically do this; when a request to create a new
 878 node of unknown type
 879 .Em type
 880 is made,
 881 .Nm
 882 will attempt to load the KLD module
 883 .Pa ng_type.ko .
 884 .Pp
 885 Types can also be installed at boot time, as certain device drivers
 886 may want to export each instance of the device as a netgraph node.
 887 .Pp
 888 In general, new types can be installed at any time from within the
 889 kernel by calling
 890 .Fn ng_newtype ,
 891 supplying a pointer to the type's
 892 .Dv struct ng_type
 893 structure.
 894 .Pp
 895 The
 896 .Fn NETGRAPH_INIT
 897 macro automates this process by using a linker set.
 898 .Sh EXISTING NODE TYPES
 899 Several node types currently exist. Each is fully documented
 900 in its own man page:
 901 .Bl -tag -width xxx
 902 .It SOCKET
 903 The socket type implements two new sockets in the new protocol domain
 904 .Dv PF_NETGRAPH .
 905 The new socket protocols are
 906 .Dv NG_DATA
 907 and
 908 .Dv NG_CONTROL ,
 909 both of type
 910 .Dv SOCK_DGRAM .
 911 Typically one of each is associated with a socket node.
 912 When both sockets have closed, the node will shut down. The
 913 .Dv NG_DATA
 914 socket is used for sending and receiving data, while the
 915 .Dv NG_CONTROL
 916 socket is used for sending and receiving control messages.
 917 Data and control messages are passed using the
 918 .Xr sendto 2
 919 and
 920 .Xr recvfrom 2
 921 calls, using a
 922 .Dv struct sockaddr_ng
 923 socket address.
 924 .Pp
 925 .It HOLE
 926 Responds only to generic messages and is a
 927 .Dq black hole
 928 for data. Useful for testing. Always accepts new hooks.
 929 .Pp
 930 .It ECHO
 931 Responds only to generic messages and always echoes data back through the
 932 hook from which it arrived. Returns any non-generic messages as their
 933 own response. Useful for testing.  Always accepts new hooks.
 934 .Pp
 935 .It TEE
 936 This node is useful for
 937 .Dq snooping .
 938 It has 4 hooks:
 939 .Dv left ,
 940 .Dv right ,
 941 .Dv left2right ,
 942 and
 943 .Dv right2left .
 944 Data entering from the right is passed to the left and duplicated on
 945 .Dv right2left ,
 946 and data entering from the left is passed to the right and
 947 duplicated on
 948 .Dv left2right .
 949 Data entering from
 950 .Dv left2right
 951 is sent to the right and data from
 952 .Dv right2left
 953 to the left.
 954 .Pp
 955 .It RFC1490 MUX
 956 Encapsulates/de-encapsulates frames encoded according to RFC 1490.
 957 Has a hook for the encapsulated packets
 958 .Pq Dq downstream
 959 and one hook
 960 for each protocol (e.g., IP or PPP).
 961 .Pp
 962 .It FRAME RELAY MUX
 963 Encapsulates/de-encapsulates Frame Relay frames.
 964 Has a hook for the encapsulated packets
 965 .Pq Dq downstream
 966 and one hook
 967 for each DLCI.
 968 .Pp
 969 .It FRAME RELAY LMI
 970 Automatically handles frame relay
 971 .Dq LMI
 972 (link management interface) operations and packets.
 973 Automatically probes and detects which of several LMI standards
 974 is in use at the exchange.
 975 .Pp
 976 .It TTY
 977 This node is also a line discipline. It simply converts between mbuf
 978 frames and sequential serial data, allowing a tty to appear as a netgraph
 979 node. It has a programmable
 980 .Dq hotkey
 981 character.
 982 .Pp
 983 .It ASYNC
 984 This node encapsulates and de-encapsulates asynchronous frames
 985 according to RFC 1662. This is used in conjunction with the TTY node
 986 type for supporting PPP links over asynchronous serial lines.
 987 .Pp
 988 .It INTERFACE
 989 This node is also a system networking interface. It has hooks representing
 990 each protocol family (IP, AppleTalk, IPX, etc.) and appears in the output of
 991 .Xr ifconfig 8 .
 992 The interfaces are named
 993 .Em ng0 ,
 994 .Em ng1 ,
 995 etc.
 996 .El
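.Pp
The forwarding rules given for the TEE node above can be modeled with
a small routing function.
This is only a sketch of the behavior as described in this page, not
code taken from the node itself; the enum values and
.Fn tee_route
are hypothetical names.
.Bd -literal -offset 4n
```c
/* Hypothetical identifiers for the four hooks of a tee node. */
enum tee_hook {
	TEE_LEFT,
	TEE_RIGHT,
	TEE_LEFT2RIGHT,
	TEE_RIGHT2LEFT
};

/*
 * Given the hook on which data arrived, store the hooks it should be
 * delivered to in out[] and return how many there are (1 or 2).
 */
static int
tee_route(enum tee_hook in, enum tee_hook out[2])
{
	switch (in) {
	case TEE_LEFT:		/* pass to right, copy to left2right */
		out[0] = TEE_RIGHT;
		out[1] = TEE_LEFT2RIGHT;
		return 2;
	case TEE_RIGHT:		/* pass to left, copy to right2left */
		out[0] = TEE_LEFT;
		out[1] = TEE_RIGHT2LEFT;
		return 2;
	case TEE_LEFT2RIGHT:	/* injected data goes to the right */
		out[0] = TEE_RIGHT;
		return 1;
	case TEE_RIGHT2LEFT:	/* injected data goes to the left */
		out[0] = TEE_LEFT;
		return 1;
	default:
		return 0;
	}
}
```
.Ed
Data arriving on the two snooping hooks is forwarded only once, so a
program attached there can inject traffic without seeing its own copy.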
 997 .Sh NOTES
 998 Whether a named node exists can be checked by trying to send a control message
 999 to it (e.g.,
1000 .Dv NGM_NODEINFO
1001 ).
1002 If it does not exist,
1003 .Er ENOENT
1004 will be returned.
1005 .Pp
1006 All data messages are mbuf chains with the M_PKTHDR flag set.
1007 .Pp
1008 Nodes are responsible for freeing what they allocate.
1009 There are three exceptions:
1010 .Bl -tag -width xxxx
1011 .It 1
1012 Mbufs sent across a data link are never to be freed by the sender,
1013 unless they are returned by the recipient.
1014 .It 2
1015 Any meta-data information traveling with the data has the same restriction.
1016 It might be freed by any node the data passes through, and a
1017 .Dv NULL
1018 passed onwards, but the caller will never free it.
1019 Two macros
1020 .Fn NG_FREE_META "meta"
1021 and
1022 .Fn NG_FREE_DATA "m" "meta"
1023 should be used if possible to free data and meta-data (see
1024 .Pa netgraph.h ) .
1025 .It 3
1026 Messages sent using
1027 .Fn ng_send_message
1028 are freed by the recipient. As in the case above, the addresses
1029 associated with the message are freed by whatever allocated them so the
1030 recipient should copy them if it wants to keep that information.
1031 .El
1032 .Sh FILES
1033 .Bl -tag -width xxxxx -compact
1034 .It Pa /sys/netgraph/netgraph.h
1035 Definitions for use solely within the kernel by
1036 .Nm
1037 nodes.
1038 .It Pa /sys/netgraph/ng_message.h
1039 Definitions needed by any file that needs to deal with
1040 .Nm
1041 messages.
1042 .It Pa /sys/netgraph/ng_socket.h
1043 Definitions needed to use
1044 .Nm
1045 socket type nodes.
1046 .It Pa /sys/netgraph/ng_{type}.h
1047 Definitions needed to use
1048 .Nm
1049 {type}
1050 nodes, including the type cookie definition.
1051 .It Pa /modules/netgraph.ko
1052 Netgraph subsystem loadable KLD module.
1053 .It Pa /modules/ng_{type}.ko
1054 Loadable KLD module for node type {type}.
1055 .El
1056 .Sh USER MODE SUPPORT
1057 There is a library for supporting user-mode programs that wish
1058 to interact with the netgraph system. See
1059 .Xr netgraph 3
1060 for details.
1061 .Pp
1062 Two user-mode support programs,
1063 .Xr ngctl 8
1064 and
1065 .Xr nghook 8 ,
1066 are available to assist manual configuration and debugging.
1067 .Pp
1068 There are a few useful techniques for debugging new node types.
1069 Implementing a new node type in user-mode first
1070 makes debugging easier.
1071 The
1072 .Em tee
1073 node type is also useful for debugging, especially in conjunction with
1074 .Xr ngctl 8
1075 and
1076 .Xr nghook 8 .
1077 .Sh SEE ALSO
1078 .Xr socket 2 ,
1079 .Xr netgraph 3 ,
1080 .Xr ng_async 4 ,
1081 .Xr ng_bpf 4 ,
1082 .Xr ng_cisco 4 ,
1083 .Xr ng_ether 4 ,
1084 .Xr ng_echo 4 ,
1085 .Xr ng_frame_relay 4 ,
1086 .Xr ng_hole 4 ,
1087 .Xr ng_iface 4 ,
1088 .Xr ng_ksocket 4 ,
1089 .Xr ng_lmi 4 ,
1090 .Xr ng_mppc 4 ,
1091 .Xr ng_ppp 4 ,
1092 .Xr ng_pppoe 4 ,
1093 .Xr ng_rfc1490 4 ,
1094 .Xr ng_socket 4 ,
1095 .Xr ng_tee 4 ,
1096 .Xr ng_tty 4 ,
1097 .Xr ng_UI 4 ,
1098 .Xr ng_vjc 4 ,
1099 .Xr ng_{type} 4 ,
1100 .Xr ngctl 8 ,
1101 .Xr nghook 8
1102 .Sh HISTORY
1103 The
1104 .Nm
1105 system was designed and first implemented at Whistle Communications, Inc.
1106 in a version of
1107 .Fx 2.2
1108 customized for the Whistle InterJet.
1109 It made its debut in the main tree in
1110 .Fx 3.4 .
1111 .Sh AUTHORS
1112 .An Julian Elischer Aq julian@whistle.com ,
1113 with contributions by
1114 .An Archie Cobbs Aq archie@whistle.com .