.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.Nd Berkeley Packet Filter
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
The packet filter appears as a character special device,
After opening the device, the file descriptor must be bound to a
specific network interface with the
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
A separate device file is required for each minor device.
If a file is in use, the open will fail and
Associated with each open instance of a
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
This size is returned by the
ioctl (see below), and
Note that an individual packet larger than this size is necessarily
The packet filter will support any link level protocol that has fixed length
Currently, only Ethernet,
drivers have been modified to interact with
Since packet data is in network byte order, applications should use the
macros to extract multi-byte values.
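.Pp
As a minimal sketch of the byte-order point above, the following function
reads the 16-bit type field of a captured Ethernet frame.
It assumes the standard
.Fn ntohs
conversion routine is an acceptable choice here; the function name
.Fn frame_ether_type
and the offset constant are hypothetical and exist only for illustration.

```c
#include <arpa/inet.h>	/* ntohs */
#include <stdint.h>
#include <string.h>	/* memcpy */

/*
 * Offset of the 16-bit type field in an Ethernet header
 * (6-byte destination address followed by 6-byte source address).
 * Illustrative constant, not taken from a system header.
 */
#define SKETCH_ETHER_TYPE_OFFSET 12

/*
 * Return the Ethernet type field of a captured frame in host byte order.
 * The caller must supply at least 14 bytes.
 * memcpy is used so that no unaligned 16-bit load is performed.
 */
uint16_t
frame_ether_type(const unsigned char *frame)
{
	uint16_t type_net;

	memcpy(&type_net, frame + SKETCH_ETHER_TYPE_OFFSET, sizeof(type_net));
	return ntohs(type_net);
}
```

Comparing the raw two bytes against a host-order constant without the
conversion would give wrong results on little-endian machines.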
A packet can be sent out on the network by writing to a
The writes are unbuffered, meaning only one packet can be processed per write.
Currently, only writes to Ethernets and
command codes below are defined in
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
the following commands may be applied to any open
The (third) argument to
should be a pointer to the type indicated.
.Bl -tag -width BIOCGRTIMEOUT
Returns the required buffer length for reads on
Sets the buffer length for reads on
The buffer must be set before the file is attached to an interface
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
if it is passed a buffer that is not this size.
Returns the type of the data link layer underlying the attached interface.
is returned if no interface has been specified.
The device types, prefixed with
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously.
This problem can be remedied with an appropriate filter.
Flushes the buffer of incoming packets,
and resets the statistics that are returned by BIOCGSTATS.
.Pq Li "struct ifreq"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the ifr_name field of
All other fields are undefined.
.Pq Li "struct ifreq"
Sets the hardware interface associated with the file.
command must be performed before any packets can be read.
The device is indicated by name using the
Additionally, performs the actions of
.Pq Li "struct timeval"
Set or get the read timeout parameter.
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
indicating no timeout.
.Pq Li "struct bpf_stat"
Returns the following structure of packet statistics:
u_int bs_recv;    /* number of packets received */
u_int bs_drop;    /* number of packets dropped */
.Bl -hang -offset indent
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads are not keeping up with the packet traffic).
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
which must respond to messages in real time.
The default for a new file is off.
.Pq Li "struct bpf_program"
Sets the read filter program used by the kernel to discard uninteresting
An array of instructions and its length is passed in using
the following structure:
struct bpf_insn *bf_insns;
The filter program is pointed to by the
field while its length in units of
.Sq Li struct bpf_insn
for an explanation of the filter language.
The only difference between
performs the actions of
.Pq Li "struct bpf_program"
Sets the write filter program used by the kernel to control what type of
packets can be written to the interface.
.Pq Li "struct bpf_version"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check
that the current version is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application minor
is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
.Dv BPF_MINOR_VERSION
An incompatible filter
may result in undefined behavior (most likely, an error returned by
or haphazard packet matching).
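.Pp
The compatibility rule stated above (major numbers equal, application minor
no greater than kernel minor) can be sketched as a small predicate.
The structure and function names below are hypothetical stand-ins, not the
definitions from the system header; a real program would compare the values
returned by the ioctl against
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .

```c
#include <stdbool.h>

/* Hypothetical stand-in for a (major, minor) filter language version. */
struct sketch_version {
	unsigned short ver_major;
	unsigned short ver_minor;
};

/*
 * Return true if a filter built against version "app" may be installed
 * in a kernel that reports version "kernel": the major numbers must
 * match and the application minor must not exceed the kernel minor.
 */
bool
sketch_version_compatible(struct sketch_version app,
    struct sketch_version kernel)
{
	return app.ver_major == kernel.ver_major &&
	    app.ver_minor <= kernel.ver_minor;
}
```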
Set or get the status of the
Set to zero if the link level source address should be filled in automatically
by the interface output routine.
Set to one if the link level source
address will be written, as provided, to the wire.
This flag is initialized to zero by default.
These commands are obsolete but left for compatibility.
Set or get the flag determining whether locally generated packets on the
interface should be returned by BPF.
Set to zero to see only incoming packets on the interface.
Set to one to see packets originating locally and remotely on the interface.
This flag is initialized to one by default.
.It Dv BIOCSDIRECTION
.It Dv BIOCGDIRECTION
Set or get the setting determining whether incoming, outgoing, or all packets
on the interface should be returned by BPF.
to see only incoming packets on the interface.
to see packets originating locally and remotely on the interface.
to see only outgoing packets on the interface.
This setting is initialized to
Set packet feedback mode.
This allows injected packets to be fed back as input to the interface when
output via the interface is successful.
direction is set, an injected outgoing packet is not returned by BPF, to avoid
duplication.
This flag is initialized to zero by default.
Set the locked flag on the
This prevents the execution of
ioctl commands which could change the underlying operating parameters of
The following structure is prepended to each packet returned by
struct timeval bh_tstamp;    /* time stamp */
u_long bh_caplen;            /* length of captured portion */
u_long bh_datalen;           /* original length of packet */
u_short bh_hdrlen;           /* length of bpf header (this struct
                                plus alignment padding) */
The fields, whose values are stored in host order, are:
.Bl -tag -compact -width bh_datalen
The time at which the packet was processed by the packet filter.
The length of the captured portion of the packet.
This is the minimum of
the truncation amount specified by the filter and the length of the packet.
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
header, which may not be equal to
.\" XXX - not really a function call
.Fn sizeof "struct bpf_hdr" .
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
and the network layer
header will be word aligned.
must be taken when accessing the link layer protocol fields on alignment
(This is not a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
Additionally, individual packets are padded so that each starts
This requires that an application
has some knowledge of how to get from packet to packet.
It rounds up its argument to the nearest word aligned value (where a word is
points to the start of a packet, this expression
will advance it to the next packet:
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
For the alignment mechanisms to work properly, the
must itself be word aligned.
will always return an aligned buffer.
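.Pp
The packet-to-packet stepping described above can be sketched as a loop over
one read buffer.
This is an illustration under stated assumptions, not the system's
definitions: the header structure, the 4-byte alignment constant, and the
names
.Fn sketch_wordalign
and
.Fn sketch_count_packets
are hypothetical, and a real program would use
.Vt "struct bpf_hdr"
and
.Fn BPF_WORDALIGN
from the system header instead.

```c
#include <stddef.h>
#include <string.h>

/* Assumed word size for this sketch only. */
#define SKETCH_ALIGNMENT ((size_t)4)

/* Hypothetical header carrying only the lengths needed for stepping. */
struct sketch_hdr {
	unsigned long caplen;	/* length of captured portion */
	unsigned long datalen;	/* original length of packet */
	unsigned short hdrlen;	/* header length including padding */
};

/* Round n up to the next multiple of SKETCH_ALIGNMENT. */
size_t
sketch_wordalign(size_t n)
{
	return (n + (SKETCH_ALIGNMENT - 1)) & ~(SKETCH_ALIGNMENT - 1);
}

/* Count the complete packets held in the first nread bytes of buf. */
size_t
sketch_count_packets(const unsigned char *buf, size_t nread)
{
	size_t off = 0, count = 0;

	while (off + sizeof(struct sketch_hdr) <= nread) {
		struct sketch_hdr h;

		/* Copy the header out so no unaligned access occurs. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.hdrlen == 0 || off + h.hdrlen + h.caplen > nread)
			break;		/* truncated or malformed record */
		count++;
		/* Packet data would start at buf + off + h.hdrlen. */
		off += sketch_wordalign(h.hdrlen + h.caplen);
	}
	return count;
}
```

The step uses the captured length, not the original length, because only the
captured bytes are present in the buffer.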
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
The following structure defines the instruction format:
field is used in different ways by different instructions,
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in
Below are the semantics for each defined
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units.
The memory store is indexed from 0 to
are the corresponding fields in the
instruction definition.
refers to the length of the packet.
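.Pp
To make the P[i:n] notation concrete, the following sketch loads n bytes at
byte offset i and interprets them in network byte order.
The function name
.Fn sketch_packet_load
is hypothetical, and reporting an out-of-range access by returning false is
an assumption of this sketch rather than a statement about the kernel's
behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute P[i:n] for a packet p of len bytes, where n is 1 (byte),
 * 2 (halfword), or 4 (word).  Bytes are combined most significant
 * first, i.e., in network byte order.  On success the value is stored
 * in *out and true is returned; false means the access was invalid.
 */
bool
sketch_packet_load(const unsigned char *p, size_t len, size_t i,
    unsigned n, uint32_t *out)
{
	uint32_t v = 0;
	unsigned j;

	if ((n != 1 && n != 2 && n != 4) || i > len || n > len - i)
		return false;
	for (j = 0; j < n; j++)
		v = (v << 8) | p[i + j];
	*out = v;
	return true;
}
```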
.Bl -tag -width BPF_STXx
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
and can be a constant
packet data at a fixed offset
packet data at a variable offset
or a word in the scratch memory store
the data size must be specified as a word
The semantics of all the recognized
BPF_LD+BPF_W+BPF_ABS    A <- P[k:4]
BPF_LD+BPF_H+BPF_ABS    A <- P[k:2]
BPF_LD+BPF_B+BPF_ABS    A <- P[k:1]
BPF_LD+BPF_W+BPF_IND    A <- P[X+k:4]
BPF_LD+BPF_H+BPF_IND    A <- P[X+k:2]
BPF_LD+BPF_B+BPF_IND    A <- P[X+k:1]
BPF_LD+BPF_W+BPF_LEN    A <- len
BPF_LD+BPF_IMM          A <- k
BPF_LD+BPF_MEM          A <- M[k]
These instructions load a value into the index register.
the addressing modes are more restrictive than those of the accumulator loads,
a hack for efficiently loading the IP header length.
BPF_LDX+BPF_W+BPF_IMM   X <- k
BPF_LDX+BPF_W+BPF_MEM   X <- M[k]
BPF_LDX+BPF_W+BPF_LEN   X <- len
BPF_LDX+BPF_B+BPF_MSH   X <- 4*(P[k:1]&0xf)
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
This instruction stores the index register in the scratch memory store.
The alu instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
BPF_ALU+BPF_ADD+BPF_K   A <- A + k
BPF_ALU+BPF_SUB+BPF_K   A <- A - k
BPF_ALU+BPF_MUL+BPF_K   A <- A * k
BPF_ALU+BPF_DIV+BPF_K   A <- A / k
BPF_ALU+BPF_AND+BPF_K   A <- A & k
BPF_ALU+BPF_OR+BPF_K    A <- A | k
BPF_ALU+BPF_LSH+BPF_K   A <- A << k
BPF_ALU+BPF_RSH+BPF_K   A <- A >> k
BPF_ALU+BPF_ADD+BPF_X   A <- A + X
BPF_ALU+BPF_SUB+BPF_X   A <- A - X
BPF_ALU+BPF_MUL+BPF_X   A <- A * X
BPF_ALU+BPF_DIV+BPF_X   A <- A / X
BPF_ALU+BPF_AND+BPF_X   A <- A & X
BPF_ALU+BPF_OR+BPF_X    A <- A | X
BPF_ALU+BPF_LSH+BPF_X   A <- A << X
BPF_ALU+BPF_RSH+BPF_X   A <- A >> X
BPF_ALU+BPF_NEG         A <- -A
The jump instructions alter flow of control.
compare the accumulator against a constant
or the index register
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
opcode uses the 32 bit
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
BPF_JMP+BPF_JA          pc += k
BPF_JMP+BPF_JGT+BPF_K   pc += (A > k) ? jt : jf
BPF_JMP+BPF_JGE+BPF_K   pc += (A >= k) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_K   pc += (A == k) ? jt : jf
BPF_JMP+BPF_JSET+BPF_K  pc += (A & k) ? jt : jf
BPF_JMP+BPF_JGT+BPF_X   pc += (A > X) ? jt : jf
BPF_JMP+BPF_JGE+BPF_X   pc += (A >= X) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_X   pc += (A == X) ? jt : jf
BPF_JMP+BPF_JSET+BPF_X  pc += (A & X) ? jt : jf
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount).
A return value of zero indicates that the packet should be ignored.
The return value is either a constant
BPF_RET+BPF_A           accept A bytes
BPF_RET+BPF_K           accept k bytes
The miscellaneous category was created for anything that does not
fit into the above classes, and for any new instructions that might need to
Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
BPF_MISC+BPF_TAX        X <- A
BPF_MISC+BPF_TXA        A <- X
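.Pp
The instruction semantics listed above can be illustrated with a toy
interpreter for a handful of operations.
This is a sketch only: the opcode names and their numbering are invented for
the example and are NOT the real encodings, the structure
.Vt "struct sketch_insn"
is hypothetical, and rejecting the packet on an out-of-range load is an
assumption.
Jump offsets are applied after the program counter has already advanced past
the jump, matching the
.Dq pc +=
notation used in the tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative opcodes; not the real BPF encodings. */
enum sketch_op {
	OP_LD_H_ABS,	/* A <- P[k:2] */
	OP_LD_H_IND,	/* A <- P[X+k:2] */
	OP_LDX_B_MSH,	/* X <- 4*(P[k:1]&0xf) */
	OP_JEQ_K,	/* pc += (A == k) ? jt : jf */
	OP_JSET_K,	/* pc += (A & k) ? jt : jf */
	OP_RET_K	/* accept k bytes */
};

struct sketch_insn {
	enum sketch_op op;
	uint8_t jt, jf;
	uint32_t k;
};

/* Run prog over packet p; return bytes to accept, or 0 to reject. */
uint32_t
sketch_run(const struct sketch_insn *prog, size_t ninsns,
    const unsigned char *p, size_t len)
{
	uint32_t A = 0, X = 0;
	size_t pc = 0, off;

	while (pc < ninsns) {
		const struct sketch_insn *in = &prog[pc++];

		switch (in->op) {
		case OP_LD_H_ABS:
		case OP_LD_H_IND:
			off = (size_t)in->k + (in->op == OP_LD_H_IND ? X : 0);
			if (off > len || len - off < 2)
				return 0;	/* load outside the packet */
			A = (uint32_t)p[off] << 8 | p[off + 1];
			break;
		case OP_LDX_B_MSH:
			if (in->k >= len)
				return 0;
			X = 4 * (uint32_t)(p[in->k] & 0xf);
			break;
		case OP_JEQ_K:
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case OP_JSET_K:
			pc += (A & in->k) ? in->jt : in->jf;
			break;
		case OP_RET_K:
			return in->k;
		}
	}
	return 0;	/* ran off the end without a return */
}
```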
interface provides the following macros to facilitate
.Fn BPF_STMT opcode operand
.Fn BPF_JUMP opcode operand true_offset false_offset .
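.Pp
As a rough illustration of what such initializer macros do, the sketch below
defines look-alike macros that expand to brace-enclosed initializers for an
instruction structure.
The structure layout (a 16-bit opcode, two 8-bit branch offsets, and a 32-bit
operand) and the
.Dv DEMO_
names are assumptions made for this example; consult the system header for
the actual definitions.

```c
#include <stdint.h>

/* Assumed instruction layout for this sketch only. */
struct demo_insn {
	uint16_t code;	/* opcode */
	uint8_t jt;	/* offset taken when the test is true */
	uint8_t jf;	/* offset taken when the test is false */
	uint32_t k;	/* generic operand */
};

/* A non-branching statement: both jump offsets are zero. */
#define DEMO_STMT(code, k) \
	{ (uint16_t)(code), 0, 0, (k) }

/* A conditional jump: note that the operand precedes the offsets. */
#define DEMO_JUMP(code, k, jt, jf) \
	{ (uint16_t)(code), (jt), (jf), (k) }
```

Because the macros expand to initializers, they can only appear inside an
array or structure initialization, as in the examples later in this page.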
.Bl -tag -compact -width /dev/bpfXXX
.It Pa /dev/bpf Ns Sy n
the packet filter device
The following filter is taken from the Reverse ARP Daemon.
It accepts only Reverse ARP requests.
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
		 sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
This filter accepts only IP packets between host 128.3.112.15 and
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
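.Pp
The finger filter relies on the
.Dv BPF_LDX+BPF_B+BPF_MSH
load at offset 14 to obtain the IP header length, then reads the TCP ports at
X+14 and X+16.
The same arithmetic can be sketched in C as follows, assuming an Ethernet
frame with a 14-byte link header carrying IPv4; the function name and the
constant are hypothetical.

```c
#include <stddef.h>

/* Assumed Ethernet header length; illustrative constant only. */
#define SKETCH_ETHER_HDR_LEN 14

/*
 * Return the byte offset of the TCP header within the frame, or 0 if
 * the frame is too short to contain the first byte of the IP header.
 * The low four bits of that byte give the IP header length in 32-bit
 * words, so multiplying by four yields bytes: 4*(P[14:1]&0xf).
 */
size_t
sketch_tcp_offset(const unsigned char *frame, size_t len)
{
	size_t ip_hlen;

	if (len <= SKETCH_ETHER_HDR_LEN)
		return 0;
	ip_hlen = 4 * (size_t)(frame[SKETCH_ETHER_HDR_LEN] & 0xf);
	return SKETCH_ETHER_HDR_LEN + ip_hlen;
}
```

With the common 20-byte IP header the source port lies at offset 34 and the
destination port at offset 36, which is what the two indexed halfword loads
in the filter compare against 79.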
.%T "An efficient, extensible, and portable network monitor"
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University.
Stanford, ported the code to
and continued its development from
Since then, it has evolved into the Ultrix Packet Filter at
of Lawrence Berkeley Laboratory, implemented BPF in
Much of the design is due to
The read buffer must be of a fixed size (returned by the
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must utilize a filter to reject foreign packets.
Data link protocols with variable length headers are not currently supported.
settings have been observed to work incorrectly on some interface
types, including those with hardware loopback rather than software loopback,
and point-to-point interfaces.
They appear to function correctly on a
broad range of Ethernet-style interfaces.