1 .\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa
2 .\" All rights reserved.
4 .\" Redistribution and use in source and binary forms, with or without
5 .\" modification, are permitted provided that the following conditions
7 .\" 1. Redistributions of source code must retain the above copyright
8 .\" notice, this list of conditions and the following disclaimer.
9 .\" 2. Redistributions in binary form must reproduce the above copyright
10 .\" notice, this list of conditions and the following disclaimer in the
11 .\" documentation and/or other materials provided with the distribution.
13 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
14 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
25 .\" This document is derived in part from the enet man page (enet.4)
26 .\" distributed with 4.3BSD Unix.
35 .Nd a framework for fast packet I/O
38 .Nd a fast VirtuAl Local Ethernet using the netmap API
41 .Nd a shared memory packet transport channel
46 is a framework for extremely fast and efficient packet I/O
47 for both userspace and kernel clients.
48 It runs on FreeBSD and Linux,
51 a very fast and modular in-kernel software switch/dataplane,
54 a shared memory packet transport channel.
55 All these are accessed interchangeably with the same API.
60 are at least one order of magnitude faster than
61 standard OS mechanisms
62 (sockets, bpf, tun/tap interfaces, native switches, pipes),
63 reaching 14.88 million packets per second (Mpps)
64 with much less than one core on a 10 Gbit NIC,
65 about 20 Mpps per core for VALE ports,
66 and over 100 Mpps for netmap pipes.
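.Pp
The 14.88 Mpps figure is the line rate of 10 Gbit Ethernet with minimum-size frames. As a rough check, the sketch below computes it, assuming 64-byte frames plus the standard 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap). The helper name line_rate_pps is hypothetical and is not part of the netmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame wire overhead in bytes: 7 preamble + 1 SFD + 12 inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20

/*
 * Hypothetical helper: maximum frames per second on a link running at
 * link_bps bits per second, for frames of frame_bytes bytes (CRC
 * included). Integer division truncates the result.
 */
static uint64_t
line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes > 0);
	return link_bps / (8ULL * (frame_bytes + ETH_WIRE_OVERHEAD));
}
```

Under these assumptions, line_rate_pps(10000000000ULL, 64) evaluates to 14880952, that is, about 14.88 Mpps.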
68 Userspace clients can dynamically switch NICs into
70 mode and send and receive raw packets through
71 memory mapped buffers.
74 switch instances and ports, and
76 can be created dynamically,
77 providing high speed packet I/O between processes,
78 virtual machines, NICs and the host stack.
supports both non-blocking I/O through
83 synchronization and blocking I/O through a file descriptor
84 and standard OS mechanisms such as
92 are implemented by a single kernel module, which also emulates the
94 API over standard drivers for devices without native
99 requires explicit support in device drivers.
101 In the rest of this (long) manual page we document
102 various aspects of the
106 architecture, features and usage.
110 supports raw packet I/O through a
112 which can be connected to a physical interface
118 Ports use preallocated circular queues of buffers
120 residing in an mmapped region.
121 There is one ring for each transmit/receive queue of a
123 An additional ring pair connects to the host stack.
125 After binding a file descriptor to a port, a
127 client can send or receive packets in batches through
128 the rings, and possibly implement zero-copy forwarding
131 All NICs operating in
133 mode use the same memory region,
134 accessible to all processes who own
136 file descriptors bound to NICs.
142 by default use separate memory regions,
143 but can be independently configured to share memory.
145 .Sh ENTERING AND EXITING NETMAP MODE
146 The following section describes the system calls to create
154 Simpler, higher level functions are described in section
157 Ports and rings are created and controlled through a file descriptor,
158 created by opening a special device
.Dl fd = open("/dev/netmap", O_RDWR);
160 and then bound to a specific port with an
161 .Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
164 has multiple modes of operation controlled by the
168 specifies the port name, as follows:
170 .It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
171 the data path of the NIC is disconnected from the host stack,
172 and the file descriptor is bound to the NIC (one or all queues),
173 or to the host stack;
174 .It Dv valeXXX:YYY (arbitrary XXX and YYY)
175 the file descriptor is bound to port YYY of a VALE switch called XXX,
176 both dynamically created if necessary.
177 The string cannot exceed IFNAMSIZ characters, and YYY cannot
178 be the name of any existing OS network interface.
183 indicates the size of the shared memory region,
184 and the number, size and location of all the
186 data structures, which can be accessed by mmapping the memory
.Dl char *mem = mmap(0, arg.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
189 Non blocking I/O is done with special
194 on the file descriptor permit blocking I/O.
204 mode, the OS will still believe the interface is up and running.
OS-generated packets for that NIC end up in a
207 ring, and another ring is used to send packets into the OS network stack.
210 on the file descriptor removes the binding,
211 and returns the NIC to normal mode (reconnecting the data path
212 to the host stack), or destroys the virtual port.
214 The data structures in the mmapped memory region are detailed in
215 .Xr sys/net/netmap.h ,
216 which is the ultimate reference for the
218 API. The main structures and fields are indicated below:
220 .It Dv struct netmap_if (one per interface)
224 const uint32_t ni_flags; /* properties */
226 const uint32_t ni_tx_rings; /* NIC tx rings */
227 const uint32_t ni_rx_rings; /* NIC rx rings */
228 uint32_t ni_bufs_head; /* head of extra bufs list */
233 Indicates the number of available rings
234 .Pa ( struct netmap_rings )
235 and their position in the mmapped region.
236 The number of tx and rx rings
237 .Pa ( ni_tx_rings , ni_rx_rings )
238 normally depends on the hardware.
239 NICs also have an extra tx/rx ring pair connected to the host stack.
241 can also request additional unbound buffers in the same memory space,
242 to be used as temporary storage for packets.
.Va ni_bufs_head
contains the index of the first of these extra buffers,
245 which are connected in a list (the first uint32_t of each
246 buffer being the index of the next buffer in the list).
247 A 0 indicates the end of the list.
249 .It Dv struct netmap_ring (one per ring)
253 const uint32_t num_slots; /* slots in each ring */
254 const uint32_t nr_buf_size; /* size of each buffer */
256 uint32_t head; /* (u) first buf owned by user */
257 uint32_t cur; /* (u) wakeup position */
258 const uint32_t tail; /* (k) first buf owned by kernel */
261 struct timeval ts; /* (k) time of last rxsync() */
263 struct netmap_slot slot[0]; /* array of slots */
267 Implements transmit and receive rings, with read/write
pointers, metadata and an array of
270 describing the buffers.
272 .It Dv struct netmap_slot (one per buffer)
275 uint32_t buf_idx; /* buffer index */
276 uint16_t len; /* packet length */
277 uint16_t flags; /* buf changed, etc. */
278 uint64_t ptr; /* address for indirect buffers */
282 Describes a packet buffer, which normally is identified by
283 an index and resides in the mmapped region.
284 .It Dv packet buffers
285 Fixed size (normally 2 KB) packet buffers allocated by the kernel.
290 in the mmapped region is indicated by the
292 field in the structure returned by
294 From there, all other objects are reachable through
295 relative references (offsets or indexes).
Macros and functions in <net/netmap_user.h>
help convert them into actual pointers:
299 .Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset);
300 .Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
301 .Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);
303 .Dl char *buf = NETMAP_BUF(ring, buffer_index);
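.Pp
To illustrate how relative references become pointers, here is a minimal stand-alone model. It does not use the real netmap headers: struct model_ring and the model_* functions are hypothetical stand-ins that only mimic the idea of locating a buffer from an offset and an index, and of walking an index-linked buffer list in which the first uint32_t of each buffer names the next buffer and 0 ends the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct netmap_ring: only the two
 * fields needed to locate buffers are modeled. */
struct model_ring {
	int64_t  buf_ofs;      /* byte offset from the ring to buffer 0 */
	uint32_t nr_buf_size;  /* size of each buffer in bytes */
};

/* Turn a region base plus a byte offset into a pointer. */
static void *
model_at(void *base, size_t ofs)
{
	return (char *)base + ofs;
}

/* Turn a buffer index into a pointer, relative to the ring. */
static char *
model_buf(struct model_ring *ring, uint32_t idx)
{
	assert(ring->nr_buf_size > 0);
	return (char *)ring + ring->buf_ofs + (size_t)idx * ring->nr_buf_size;
}

/*
 * Count the buffers in an index-linked list starting at head.
 * The list is assumed to be well formed (no cycles).
 */
static unsigned
model_list_len(struct model_ring *ring, uint32_t head)
{
	unsigned n = 0;
	uint32_t idx = head, next;

	while (idx != 0) {
		/* The first uint32_t of a buffer is the next index. */
		memcpy(&next, model_buf(ring, idx), sizeof(next));
		idx = next;
		n++;
	}
	return n;
}
```

Real programs should use NETMAP_IF, NETMAP_TXRING, NETMAP_RXRING and NETMAP_BUF instead of hand-written arithmetic, since the layout of the shared region belongs to the kernel.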
304 .Sh RINGS, BUFFERS AND DATA I/O
306 are circular queues of packets with three indexes/pointers
307 .Va ( head , cur , tail ) ;
308 one slot is always kept empty.
311 should not be assumed to be a power of two.
313 (NOTE: older versions of netmap used head/count format to indicate
314 the content of a ring).
317 is the first slot available to userspace;
321 select/poll will unblock when
327 is the first slot reserved to the kernel.
329 Slot indexes MUST only move forward;
330 for convenience, the function
331 .Dl nm_ring_next(ring, index)
332 returns the next index modulo the ring size.
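.Pp
The index arithmetic can be sketched without the netmap headers as follows. This is a stand-alone model: struct model_ring and the model_* names are hypothetical and only mirror the num_slots, head, cur and tail fields described here. The wrap is done with a comparison rather than a bit mask, so it works even when the ring size is not a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the index fields of struct netmap_ring. */
struct model_ring {
	uint32_t num_slots;
	uint32_t head;
	uint32_t cur;
	uint32_t tail;
};

/* Next slot index, wrapping to 0 at the end of the ring. */
static uint32_t
model_ring_next(const struct model_ring *r, uint32_t i)
{
	assert(i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots from cur up to (but not including) tail. */
static uint32_t
model_ring_space(const struct model_ring *r)
{
	int64_t n = (int64_t)r->tail - (int64_t)r->cur;

	if (n < 0)
		n += r->num_slots;	/* tail has wrapped around */
	return (uint32_t)n;
}
```

For a ring of 5 slots with cur at 3 and tail at 1, model_ring_space() reports 3 usable slots (3, 4 and 0).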
337 are only modified by the user program;
339 is only modified by the kernel.
340 The kernel only reads/writes the
341 .Vt struct netmap_ring
343 during the execution of a netmap-related system call.
344 The only exception are slots (and buffers) in the range
345 .Va tail\ . . . head-1 ,
346 that are explicitly assigned to the kernel.
349 On transmit rings, after a
351 system call, slots in the range
352 .Va head\ . . . tail-1
353 are available for transmission.
354 User code should fill the slots sequentially
359 past slots ready to transmit.
361 may be moved further ahead if the user code needs
362 more slots before further transmissions (see
363 .Sx SCATTER GATHER I/O ) .
365 At the next NIOCTXSYNC/select()/poll(),
368 are pushed to the port, and
370 may advance if further slots have become available.
371 Below is an example of the evolution of a TX ring:
374 after the syscall, slots between cur and tail are (a)vailable
378 TX [.....aaaaaaaaaaa.............]
380 user creates new packets to (T)ransmit
384 TX [.....TTTTTaaaaaa.............]
386 NIOCTXSYNC/poll()/select() sends packets and reports new slots
390 TX [..........aaaaaaaaaaa........]
select() and poll() will block if there is no space in the ring, i.e.
393 .Dl ring->cur == ring->tail
394 and return when new slots have become available.
396 High speed applications may want to amortize the cost of system calls
397 by preparing as many packets as possible before issuing them.
399 A transmit ring with pending transmissions has
400 .Dl ring->head != ring->tail + 1 (modulo the ring size).
402 .Va int nm_tx_pending(ring)
403 implements this test.
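.Pp
The life cycle of a transmit ring can be modeled in a few lines. The sketch below is a toy simulation, not the real API: struct model_ring and the model_* functions are hypothetical, and model_txsync_all_sent() simply assumes that the kernel completes every queued packet during the system call.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the index fields of struct netmap_ring. */
struct model_ring { uint32_t num_slots, head, cur, tail; };

static uint32_t
model_next(const struct model_ring *r, uint32_t i)
{
	assert(r->num_slots > 1 && i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* True while slots handed to the kernel have not been completed. */
static int
model_tx_pending(const struct model_ring *r)
{
	return model_next(r, r->tail) != r->head;
}

/* User side: queue up to n packets by advancing head and cur
 * together. Returns how many slots were actually used. */
static uint32_t
model_tx_queue(struct model_ring *r, uint32_t n)
{
	uint32_t done = 0;

	while (done < n && r->cur != r->tail) {
		/* ... a real program fills slot r->cur here ... */
		r->head = r->cur = model_next(r, r->cur);
		done++;
	}
	return done;
}

/* Toy kernel side: pretend everything up to head was transmitted,
 * which leaves tail one slot behind head. */
static void
model_txsync_all_sent(struct model_ring *r)
{
	r->tail = (r->head == 0) ? r->num_slots - 1 : r->head - 1;
}
```

Starting from an idle ring of 8 slots (head and cur at 0, tail at 7), queueing 3 packets makes model_tx_pending() true, and it becomes false again after model_txsync_all_sent().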
406 On receive rings, after a
408 system call, the slots in the range
409 .Va head\& . . . tail-1
410 contain received packets.
411 User code should process them and advance
415 past slots it wants to return to the kernel.
417 may be moved further ahead if the user code wants to
418 wait for more packets
419 without returning all the previous slots to the kernel.
421 At the next NIOCRXSYNC/select()/poll(),
424 are returned to the kernel for further receives, and
426 may advance to report new incoming packets.
428 Below is an example of the evolution of an RX ring:
430 after the syscall, there are some (h)eld and some (R)eceived slots
434 RX [..hhhhhhRRRRRRRR..........]
436 user advances head and cur, releasing some slots and holding others
440 RX [..*****hhhRRRRRR...........]
NIOCRXSYNC/poll()/select() recovers slots and reports new packets
446 RX [.......hhhRRRRRRRRRRRR....]
449 .Sh SLOTS AND PACKET BUFFERS
450 Normally, packets should be stored in the netmap-allocated buffers
451 assigned to slots when ports are bound to a file descriptor.
452 One packet is fully contained in a single buffer.
454 The following flags affect slot and buffer processing:
457 it MUST be used when the buf_idx in the slot is changed.
458 This can be used to implement
459 zero-copy forwarding, see
460 .Sx ZERO-COPY FORWARDING .
463 reports when this buffer has been transmitted.
466 notifies transmit completions in batches, hence signals
can be delayed indefinitely. This flag helps detect
when packets have been sent and a file descriptor can be closed.
470 When a ring is in 'transparent' mode (see
471 .Sx TRANSPARENT MODE ) ,
packets marked with this flag are forwarded to the other endpoint
473 at the next system call, thus restoring (in a selective way)
474 the connection between a NIC and the host stack.
476 tells the forwarding code that the SRC MAC address for this
477 packet must not be used in the learning bridge code.
479 indicates that the packet's payload is in a user-supplied buffer,
480 whose user virtual address is in the 'ptr' field of the slot.
481 The size can reach 65535 bytes.
483 This is only supported on the transmit ring of
ports, and it helps reduce data copies in the interconnection
488 indicates that the packet continues with subsequent buffers;
489 the last buffer in a packet must have the flag clear.
491 .Sh SCATTER GATHER I/O
492 Packets can span multiple slots if the
494 flag is set in all but the last slot.
495 The maximum length of a chain is 64 buffers.
496 This is normally used with
498 ports when connecting virtual machines, as they generate large
499 TSO segments that are not split unless they reach a physical device.
501 NOTE: The length field always refers to the individual
502 fragment; there is no place with the total length of a packet.
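.Pp
Because only per-fragment lengths are stored, a program that needs the total packet length has to add them up itself. The following stand-alone sketch shows one way to do that. struct model_slot, MODEL_NS_MOREFRAG (with an assumed value) and model_pkt_len() are hypothetical names for this example; real code should take NS_MOREFRAG and struct netmap_slot from the netmap headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag value for this sketch only. */
#define MODEL_NS_MOREFRAG 0x0020

/* Simplified stand-in for struct netmap_slot. */
struct model_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };

/*
 * Sum the fragment lengths of the packet that starts at slot first
 * in a ring of num_slots slots. On success the number of slots used
 * is stored in *nfrags. If the chain does not end within max_frags
 * slots, *nfrags is set to 0 and 0 is returned.
 */
static uint32_t
model_pkt_len(const struct model_slot *slots, uint32_t num_slots,
    uint32_t first, uint32_t max_frags, uint32_t *nfrags)
{
	uint32_t total = 0, i = first, n;

	assert(first < num_slots);
	for (n = 1; n <= max_frags; n++) {
		total += slots[i].len;
		if (!(slots[i].flags & MODEL_NS_MOREFRAG)) {
			*nfrags = n;	/* last fragment reached */
			return total;
		}
		i = (i + 1 == num_slots) ? 0 : i + 1;
	}
	*nfrags = 0;
	return 0;
}
```

Passing 64 as max_frags matches the maximum chain length stated above.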
504 On receive rings the macro
506 indicates the remaining number of slots for this packet,
507 including the current one.
508 Slots with a value greater than 1 also have NS_MOREFRAG set.
511 uses two ioctls (NIOCTXSYNC, NIOCRXSYNC)
512 for non-blocking I/O. They take no argument.
513 Two more ioctls (NIOCGINFO, NIOCREGIF) are used
514 to query and configure ports, with the following argument:
517 char nr_name[IFNAMSIZ]; /* (i) port name */
518 uint32_t nr_version; /* (i) API version */
519 uint32_t nr_offset; /* (o) nifp offset in mmap region */
520 uint32_t nr_memsize; /* (o) size of the mmap region */
521 uint32_t nr_tx_slots; /* (i/o) slots in tx rings */
522 uint32_t nr_rx_slots; /* (i/o) slots in rx rings */
523 uint16_t nr_tx_rings; /* (i/o) number of tx rings */
uint16_t nr_rx_rings; /* (i/o) number of rx rings */
525 uint16_t nr_ringid; /* (i/o) ring(s) we care about */
526 uint16_t nr_cmd; /* (i) special command */
527 uint16_t nr_arg1; /* (i/o) extra arguments */
528 uint16_t nr_arg2; /* (i/o) extra arguments */
529 uint32_t nr_arg3; /* (i/o) extra arguments */
uint32_t nr_flags; /* (i/o) open mode */
535 A file descriptor obtained through
also supports the ioctls supported by network devices, see
542 returns EINVAL if the named port does not support netmap.
543 Otherwise, it returns 0 and (advisory) information
545 Note that all the information below can change before the
546 interface is actually put in netmap mode.
550 indicates the size of the
552 memory region. NICs in
554 mode all share the same memory region,
557 ports have independent regions for each port.
558 .It Pa nr_tx_slots , nr_rx_slots
559 indicate the size of transmit and receive rings.
560 .It Pa nr_tx_rings , nr_rx_rings
561 indicate the number of transmit
563 Both ring number and sizes may be configured at runtime
564 using interface-specific functions (e.g.
569 binds the port named in
571 to the file descriptor. For a physical device this also switches it into
574 it from the host stack.
575 Multiple file descriptors can be bound to the same port,
576 with proper synchronization left to the user.
.Dv NIOCREGIF
can also bind a file descriptor to one endpoint of a
580 consisting of two netmap ports with a crossover connection.
A netmap pipe shares the same memory space as the parent port,
and is meant to enable configurations where a master process acts
583 as a dispatcher towards slave processes.
585 To enable this function, the
587 field of the structure can be used as a hint to the kernel to
588 indicate how many pipes we expect to use, and reserve extra space
589 in the memory region.
591 On return, it gives the same info as NIOCGINFO,
596 indicating the identity of the rings controlled through the file
601 selects which rings are controlled through this file descriptor.
604 are indicated below, together with the naming schemes
605 that application libraries (such as the
607 indicated below) can use to indicate the specific set of rings.
608 In the example below, "netmap:foo" is any valid netmap port name.
610 .Bl -tag -width XXXXX
611 .It NR_REG_ALL_NIC "netmap:foo"
612 (default) all hardware ring pairs
.It NR_REG_SW "netmap:foo^"
614 the ``host rings'', connecting to the host stack.
.It NR_REG_NIC_SW "netmap:foo+"
616 all hardware rings and the host rings
617 .It NR_REG_ONE_NIC "netmap:foo-i"
618 only the i-th hardware ring pair, where the number is in
620 .It NR_REG_PIPE_MASTER "netmap:foo{i"
621 the master side of the netmap pipe whose identifier (i) is in
623 .It NR_REG_PIPE_SLAVE "netmap:foo}i"
624 the slave side of the netmap pipe whose identifier (i) is in
The identifier of a pipe should be thought of as part of the pipe name,
628 and does not need to be sequential. On return the pipe
629 will only have a single ring pair with index 0,
630 irrespective of the value of i.
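.Pp
The naming scheme listed above can be illustrated with a small parser. This is only a sketch of the suffix convention, not the code used by any netmap library: the enum values and model_parse_port() are hypothetical, and the sketch assumes that interface names do not themselves contain the characters ^, +, -, { or }.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode tags, one per entry in the list above. */
enum model_mode {
	MODEL_ALL_NIC, MODEL_SW, MODEL_NIC_SW,
	MODEL_ONE_NIC, MODEL_PIPE_MASTER, MODEL_PIPE_SLAVE, MODEL_BAD
};

/*
 * Classify a "netmap:..." port name by its suffix. For the "-i",
 * "{i" and "}i" forms the number i is stored in *id.
 */
static enum model_mode
model_parse_port(const char *name, long *id)
{
	static const char prefix[] = "netmap:";
	const char *start, *p;
	char *end;
	enum model_mode m;

	assert(name != NULL && id != NULL);
	*id = 0;
	if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
		return MODEL_BAD;
	start = name + sizeof(prefix) - 1;
	p = start + strcspn(start, "^+-{}");	/* skip interface name */
	if (p == start)
		return MODEL_BAD;		/* empty interface name */
	switch (*p) {
	case 0:   return MODEL_ALL_NIC;
	case '^': return p[1] == 0 ? MODEL_SW : MODEL_BAD;
	case '+': return p[1] == 0 ? MODEL_NIC_SW : MODEL_BAD;
	case '-': m = MODEL_ONE_NIC; break;
	case '{': m = MODEL_PIPE_MASTER; break;
	default:  m = MODEL_PIPE_SLAVE; break;	/* must be '}' */
	}
	if (p[1] < '0' || p[1] > '9')
		return MODEL_BAD;		/* a number is required */
	*id = strtol(p + 1, &end, 10);
	return *end == 0 ? m : MODEL_BAD;
}
```

For example, "netmap:foo{3" is classified as the master side of pipe 3, and "netmap:foo-2" as hardware ring pair 2.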
637 call pushes out any pending packets on the transmit ring, even if
638 no write events are specified.
639 The feature can be disabled by or-ing
.Va NETMAP_NO_TX_POLL
641 to the value written to
643 When this feature is used,
644 packets are transmitted only on
645 .Va ioctl(NIOCTXSYNC)
or when select()/poll() are called with a write event (POLLOUT/wfdset) or a full ring.
648 When registering a virtual interface that is dynamically created to a
650 switch, we can specify the desired number of rings (1 by default,
651 and currently up to 16) on it using nr_tx_rings and nr_rx_rings fields.
653 tells the hardware of new packets to transmit, and updates the
654 number of slots available for transmission.
656 tells the hardware of consumed packets, and asks for newly available
659 .Sh SELECT, POLL, EPOLL, KQUEUE.
665 file descriptor process rings as indicated in
669 respectively when write (POLLOUT) and read (POLLIN) events are requested.
670 Both block if no slots are available in the ring
671 .Va ( ring->cur == ring->tail ) .
672 Depending on the platform,
678 Packets in transmit rings are normally pushed out
679 (and buffers reclaimed) even without
requesting write events. Passing the NETMAP_NO_TX_POLL flag to
682 disables this feature.
683 By default, receive rings are processed only if read
events are requested. Passing the NETMAP_DO_RX_POLL flag to
.Em NIOCREGIF
updates receive rings even without read events.
Note that on epoll and kqueue, NETMAP_NO_TX_POLL and NETMAP_DO_RX_POLL
687 only have an effect when some event is posted for the file descriptor.
691 API is supposed to be used directly, both because of its simplicity and
692 for efficient integration with applications.
695 .Va <net/netmap_user.h>
696 header provides a few macros and functions to ease creating
697 a file descriptor and doing I/O with a
699 port. These are loosely modeled after the
701 API, to ease porting of libpcap-based applications to
703 To use these extra functions, programs should
704 .Dl #define NETMAP_WITH_LIBS
706 .Dl #include <net/netmap_user.h>
708 The following functions are available:
709 .Bl -tag -width XXXXX
710 .It Va struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg)
713 binds a file descriptor to a port.
716 is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
720 provides the initial values for the argument to the NIOCREGIF ioctl.
The nr_flags and nr_ringid values are overwritten by parsing
722 ifname and flags, and other fields can be overridden through
723 the other two arguments.
725 points to a struct nm_desc containing arguments (e.g. from a previously
726 open file descriptor) that should override the defaults.
The fields are used as described below.
729 can be set to a combination of the following flags:
730 .Va NETMAP_NO_TX_POLL ,
731 .Va NETMAP_DO_RX_POLL
732 (copied into nr_ringid);
733 .Va NM_OPEN_NO_MMAP (if arg points to the same memory region,
734 avoids the mmap and uses the values from it);
735 .Va NM_OPEN_IFNAME (ignores ifname and uses the values in arg);
738 .Va NM_OPEN_ARG3 (uses the fields from arg);
739 .Va NM_OPEN_RING_CFG (uses the ring number and sizes from arg).
741 .It Va int nm_close(struct nm_desc *d)
742 closes the file descriptor, unmaps memory, frees resources.
743 .It Va int nm_inject(struct nm_desc *d, const void *buf, size_t size)
744 similar to pcap_inject(), pushes a packet to a ring, returns the size
of the packet if successful, or 0 on error;
746 .It Va int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
747 similar to pcap_dispatch(), applies a callback to incoming packets
748 .It Va u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr)
749 similar to pcap_next(), fetches the next packet
752 .Sh SUPPORTED DEVICES
754 natively supports the following devices:
772 NICs without native support can still be used in
774 mode through emulation. Performance is inferior to native netmap
775 mode but still significantly higher than sockets, and approaching
776 that of in-kernel solutions such as Linux's
779 Emulation is also available for devices with native netmap support,
780 which can be used for testing or performance comparison.
782 .Va dev.netmap.admode
783 globally controls how netmap mode is implemented.
784 .Sh SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of
787 are controlled through sysctl variables on FreeBSD
789 and module parameters on Linux
790 .Em ( /sys/module/netmap_lin/parameters/* ) :
792 .Bl -tag -width indent
793 .It Va dev.netmap.admode: 0
794 Controls the use of native or emulated adapter mode.
795 0 uses the best available option, 1 forces native and
796 fails if not available, 2 forces emulated hence never fails.
797 .It Va dev.netmap.generic_ringsize: 1024
798 Ring size used for emulated netmap mode
799 .It Va dev.netmap.generic_mit: 100000
800 Controls interrupt moderation for emulated mode
801 .It Va dev.netmap.mmap_unreg: 0
802 .It Va dev.netmap.fwd: 0
803 Forces NS_FORWARD mode
804 .It Va dev.netmap.flags: 0
805 .It Va dev.netmap.txsync_retry: 2
806 .It Va dev.netmap.no_pendintr: 1
807 Forces recovery of transmit buffers on system calls
808 .It Va dev.netmap.mitigate: 1
809 Propagates interrupt mitigation to user processes
810 .It Va dev.netmap.no_timestamp: 0
811 Disables the update of the timestamp in the netmap ring
812 .It Va dev.netmap.verbose: 0
813 Verbose kernel messages
814 .It Va dev.netmap.buf_num: 163840
815 .It Va dev.netmap.buf_size: 2048
816 .It Va dev.netmap.ring_num: 200
817 .It Va dev.netmap.ring_size: 36864
818 .It Va dev.netmap.if_num: 100
819 .It Va dev.netmap.if_size: 1024
820 Sizes and number of objects (netmap_if, netmap_ring, buffers)
821 for the global memory region. The only parameter worth modifying is
822 .Va dev.netmap.buf_num
823 as it impacts the total amount of memory used by netmap.
824 .It Va dev.netmap.buf_curr_num: 0
825 .It Va dev.netmap.buf_curr_size: 0
826 .It Va dev.netmap.ring_curr_num: 0
827 .It Va dev.netmap.ring_curr_size: 0
828 .It Va dev.netmap.if_curr_num: 0
829 .It Va dev.netmap.if_curr_size: 0
830 Actual values in use.
831 .It Va dev.netmap.bridge_batch: 1024
832 Batch size used when moving packets across a
834 switch. Values above 64 generally guarantee good
845 to wake up processes when significant events occur, and
849 is used to configure ports and
852 Applications may need to create threads and bind them to
853 specific cores to improve performance, using standard
857 .Xr pthread_setaffinity_np 3
860 No matter how fast the CPU and OS are,
861 achieving line rate on 10G and faster interfaces
862 requires hardware with sufficient performance.
863 Several NICs are unable to sustain line rate with
864 small packet sizes. Insufficient PCIe or memory bandwidth
865 can also cause reduced performance.
867 Another frequent reason for low performance is the use
868 of flow control on the link: a slow receiver can limit
870 Be sure to disable flow control when running high
873 .Ss SPECIAL NIC FEATURES
875 is orthogonal to some NIC features such as
876 multiqueue, schedulers, packet filters.
878 Multiple transmit and receive rings are supported natively
879 and can be configured with ordinary OS tools,
883 device-specific sysctl variables.
884 The same goes for Receive Packet Steering (RPS)
885 and filtering of incoming traffic.
890 .Em checksum offloading , TCP segmentation offloading ,
891 .Em encryption , VLAN encapsulation/decapsulation ,
893 When using netmap to exchange packets with the host stack,
894 make sure to disable these features.
898 comes with a few programs that can be used for testing or
905 .Va tools/tools/netmap/
906 directory in FreeBSD distributions.
909 is a general purpose traffic source/sink.
912 .Dl pkt-gen -i ix0 -f tx -l 60
913 can generate an infinite stream of minimum size packets, and
914 .Dl pkt-gen -i ix0 -f rx
916 Both print traffic statistics, to help monitor
917 how the system performs.
has many options that can be used to set packet sizes, addresses,
921 rates, and use multiple send/receive threads and cores.
924 is another test program which interconnects two
926 ports. It can be used for transparent forwarding between
928 .Dl bridge -i ix0 -i ix1
929 or even connect the NIC to the host stack using netmap
930 .Dl bridge -i ix0 -i ix0
931 .Ss USING THE NATIVE API
932 The following code implements a traffic generator
934 .Bd -literal -compact
#include <net/netmap_user.h>

void sender(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    char *p, *buf;
    int fd;
    uint32_t i;

    fd = open("/dev/netmap", O_RDWR);
    bzero(&nmr, sizeof(nmr));
    strcpy(nmr.nr_name, "ix0");
    nmr.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &nmr);
    p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
        poll(&fds, 1, -1);
        while (!nm_ring_empty(ring)) {
            i = ring->cur;
            buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            ... prepare packet in buf ...
            ring->slot[i].len = ... packet length ...
            ring->head = ring->cur = nm_ring_next(ring, i);
        }
    }
}
.Ed
967 A simple receiver can be implemented using the helper functions
968 .Bd -literal -compact
969 #define NETMAP_WITH_LIBS
970 #include <net/netmap_user.h>
979 d = nm_open("netmap:ix0", NULL, 0, 0);
980 fds.fd = NETMAP_FD(d);
984 while ( (buf = nm_nextpkt(d, &h)) )
            consume_pkt(buf, h.len);
990 .Ss ZERO-COPY FORWARDING
991 Since physical interfaces share the same memory region,
992 it is possible to do packet forwarding between ports
993 swapping buffers. The buffer from the transmit ring is used
994 to replenish the receive ring:
995 .Bd -literal -compact
    uint32_t tmp;
    struct netmap_slot *src, *dst;

    /* rxr is the receive ring, txr the transmit ring */
    src = &rxr->slot[rxr->cur];
    dst = &txr->slot[txr->cur];
    tmp = dst->buf_idx;              /* save the tx buffer */
    dst->buf_idx = src->buf_idx;     /* hand the rx buffer to tx */
    dst->len = src->len;
    dst->flags = NS_BUF_CHANGED;
    src->buf_idx = tmp;              /* replenish rx with the old tx buffer */
    src->flags = NS_BUF_CHANGED;
    rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
    txr->head = txr->cur = nm_ring_next(txr, txr->cur);
.Ed
1011 .Ss ACCESSING THE HOST STACK
1012 The host stack is for all practical purposes just a regular ring pair,
1013 which you can access with the netmap API (e.g. with
1014 .Dl nm_open("netmap:eth0^", ... ) ;
1015 All packets that the host would send to an interface in
mode end up in the RX ring, whereas all packets queued to the
TX ring are sent up to the host stack.
1020 A simple way to test the performance of a
1022 switch is to attach a sender and a receiver to it,
1023 e.g. running the following in two different terminals:
1024 .Dl pkt-gen -i vale1:a -f rx # receiver
1025 .Dl pkt-gen -i vale1:b -f tx # sender
1026 The same example can be used to test netmap pipes, by simply
1027 changing port names, e.g.
1028 .Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side
1029 .Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side
1031 The following command attaches an interface and the host stack
1033 .Dl vale-ctl -h vale2:em0
1036 clients attached to the same switch can now communicate
1037 with the network card or the host.
1041 http://info.iet.unipi.it/~luigi/netmap/
1043 Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
1044 Communications of the ACM, 55 (3), pp.45-51, March 2012
1046 Luigi Rizzo, netmap: a novel framework for fast packet I/O,
1047 Usenix ATC'12, June 2012, Boston
1049 Luigi Rizzo, Giuseppe Lettieri,
1050 VALE, a switched ethernet for virtual machines,
1051 ACM CoNEXT'12, December 2012, Nice
1053 Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
1054 Speeding up packet I/O in virtual machines,
1055 ACM/IEEE ANCS'13, October 2013, San Jose
framework was originally designed and implemented at the
1061 Universita` di Pisa in 2011 by
1063 and further extended with help from
1065 .An Gaetano Catalli ,
1066 .An Giuseppe Lettieri ,
1067 .An Vincenzo Maffione .
1072 have been funded by the European Commission within FP7 Projects
1073 CHANGE (257422) and OPENLAB (287581).