1 .\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa
2 .\" All rights reserved.
4 .\" Redistribution and use in source and binary forms, with or without
5 .\" modification, are permitted provided that the following conditions
7 .\" 1. Redistributions of source code must retain the above copyright
8 .\" notice, this list of conditions and the following disclaimer.
9 .\" 2. Redistributions in binary form must reproduce the above copyright
10 .\" notice, this list of conditions and the following disclaimer in the
11 .\" documentation and/or other materials provided with the distribution.
13 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
14 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
25 .\" This document is derived in part from the enet man page (enet.4)
26 .\" distributed with 4.3BSD Unix.
.Nd a framework for fast packet I/O
.Nd a fast VirtuAl Local Ethernet using the netmap API
.Nd a shared memory packet transport channel
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
It runs on FreeBSD and Linux,
a very fast and modular in-kernel software switch/dataplane,
a shared memory packet transport channel.
All these are accessed interchangeably with the same API.
are at least one order of magnitude faster than
standard OS mechanisms
(sockets, bpf, tun/tap interfaces, native switches, pipes),
reaching 14.88 million packets per second (Mpps)
with much less than one core on a 10 Gbit NIC,
about 20 Mpps per core for VALE ports,
and over 100 Mpps for netmap pipes.
Userspace clients can dynamically switch NICs into
mode and send and receive raw packets through
memory mapped buffers.
switch instances and ports, and
can be created dynamically,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
supports both non-blocking I/O through ioctls,
synchronization and blocking I/O through a file descriptor
and standard OS mechanisms such as
are implemented by a single kernel module, which also emulates the
API over standard drivers for devices without native
requires explicit support in device drivers.
In the rest of this (long) manual page we document
various aspects of the
architecture, features and usage.
supports raw packet I/O through a
which can be connected to a physical interface
Ports use preallocated circular queues of buffers
residing in an mmapped region.
There is one ring for each transmit/receive queue of a
An additional ring pair connects to the host stack.
After binding a file descriptor to a port, a
client can send or receive packets in batches through
the rings, and possibly implement zero-copy forwarding
All NICs operating in
mode use the same memory region,
accessible to all processes that own
file descriptors bound to NICs.
by default use separate memory regions,
but can be independently configured to share memory.
.Sh ENTERING AND EXITING NETMAP MODE
The following section describes the system calls to create
Simpler, higher level functions are described in section
Ports and rings are created and controlled through a file descriptor,
created by opening a special device
.Dl fd = open("/dev/netmap", O_RDWR);
and then bound to a specific port with an
.Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
has multiple modes of operation controlled by the
specifies the port name, as follows:
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
be the name of any existing OS network interface.
indicates the size of the shared memory region,
and the number, size and location of all the
data structures, which can be accessed by mmapping the memory
.Dl char *mem = mmap(0, arg.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
Non blocking I/O is done with special
on the file descriptor permit blocking I/O.
mode, the OS will still believe the interface is up and running.
OS-generated packets for that NIC end up into a
ring, and another ring is used to send packets into the OS network stack.
on the file descriptor removes the binding,
and returns the NIC to normal mode (reconnecting the data path
to the host stack), or destroys the virtual port.
The data structures in the mmapped memory region are detailed in
.Xr sys/net/netmap.h ,
which is the ultimate reference for the
API. The main structures and fields are indicated below:
.It Dv struct netmap_if (one per interface)
const uint32_t ni_flags; /* properties */
const uint32_t ni_tx_rings; /* NIC tx rings */
const uint32_t ni_rx_rings; /* NIC rx rings */
uint32_t ni_bufs_head; /* head of extra bufs list */
Indicates the number of available rings
.Pa ( struct netmap_rings )
and their position in the mmapped region.
The number of tx and rx rings
.Pa ( ni_tx_rings , ni_rx_rings )
normally depends on the hardware.
NICs also have an extra tx/rx ring pair connected to the host stack.
can also request additional unbound buffers in the same memory space,
to be used as temporary storage for packets.
contains the index of the first of these free buffers,
which are connected in a list (the first uint32_t of each
buffer being the index of the next buffer in the list).
A 0 indicates the end of the list.
.It Dv struct netmap_ring (one per ring)
const uint32_t num_slots; /* slots in each ring */
const uint32_t nr_buf_size; /* size of each buffer */
uint32_t head; /* (u) first buf owned by user */
uint32_t cur; /* (u) wakeup position */
const uint32_t tail; /* (k) first buf owned by kernel */
struct timeval ts; /* (k) time of last rxsync() */
struct netmap_slot slot[0]; /* array of slots */
Implements transmit and receive rings, with read/write
pointers, metadata and an array of
describing the buffers.
.It Dv struct netmap_slot (one per buffer)
uint32_t buf_idx; /* buffer index */
uint16_t len; /* packet length */
uint16_t flags; /* buf changed, etc. */
uint64_t ptr; /* address for indirect buffers */
Describes a packet buffer, which normally is identified by
an index and resides in the mmapped region.
.It Dv packet buffers
Fixed size (normally 2 KB) packet buffers allocated by the kernel.
in the mmapped region is indicated by the
field in the structure returned by
From there, all other objects are reachable through
relative references (offsets or indexes).
Macros and functions in <net/netmap_user.h>
help convert them into actual pointers:
.Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset);
.Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
.Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);
.Dl char *buf = NETMAP_BUF(ring, buffer_index);
.Sh RINGS, BUFFERS AND DATA I/O
are circular queues of packets with three indexes/pointers
.Va ( head , cur , tail ) ;
one slot is always kept empty.
should not be assumed to be a power of two.
(NOTE: older versions of netmap used head/count format to indicate
the content of a ring).
is the first slot available to userspace;
select/poll will unblock when
is the first slot reserved to the kernel.
Slot indexes MUST only move forward;
for convenience, the function
.Dl nm_ring_next(ring, index)
returns the next index modulo the ring size.
are only modified by the user program;
is only modified by the kernel.
The kernel only reads/writes the
.Vt struct netmap_ring
during the execution of a netmap-related system call.
The only exceptions are slots (and buffers) in the range
.Va tail\ . . . head-1 ,
that are explicitly assigned to the kernel.
On transmit rings, after a
system call, slots in the range
.Va head\ . . . tail-1
are available for transmission.
User code should fill the slots sequentially
past slots ready to transmit.
may be moved further ahead if the user code needs
more slots before further transmissions (see
.Sx SCATTER GATHER I/O ) .
At the next NIOCTXSYNC/select()/poll(),
are pushed to the port, and
may advance if further slots have become available.
Below is an example of the evolution of a TX ring:
after the syscall, slots between cur and tail are (a)vailable
TX [.....aaaaaaaaaaa.............]
user creates new packets to (T)ransmit
TX [.....TTTTTaaaaaa.............]
NIOCTXSYNC/poll()/select() sends packets and reports new slots
TX [..........aaaaaaaaaaa........]
select() and poll() will block if there is no space in the ring, i.e.
.Dl ring->cur == ring->tail
and return when new slots have become available.
High speed applications may want to amortize the cost of system calls
by preparing as many packets as possible before issuing them.
A transmit ring with pending transmissions has
.Dl ring->head != ring->tail + 1 (modulo the ring size).
.Va int nm_tx_pending(ring)
implements this test.
On receive rings, after a
system call, the slots in the range
.Va head\& . . . tail-1
contain received packets.
User code should process them and advance
past slots it wants to return to the kernel.
may be moved further ahead if the user code wants to
wait for more packets
without returning all the previous slots to the kernel.
At the next NIOCRXSYNC/select()/poll(),
are returned to the kernel for further receives, and
may advance to report new incoming packets.
Below is an example of the evolution of an RX ring:
after the syscall, there are some (h)eld and some (R)eceived slots
RX [..hhhhhhRRRRRRRR..........]
user advances head and cur, releasing some slots and holding others
RX [..*****hhhRRRRRR...........]
NIOCRXSYNC/poll()/select() recovers slots and reports new packets
RX [.......hhhRRRRRRRRRRRR....]
.Sh SLOTS AND PACKET BUFFERS
Normally, packets should be stored in the netmap-allocated buffers
assigned to slots when ports are bound to a file descriptor.
One packet is fully contained in a single buffer.
The following flags affect slot and buffer processing:
it MUST be used when the buf_idx in the slot is changed.
This can be used to implement
zero-copy forwarding, see
.Sx ZERO-COPY FORWARDING .
reports when this buffer has been transmitted.
notifies transmit completions in batches, hence signals
can be delayed indefinitely. This flag helps detect
when packets have been sent and a file descriptor can be closed.
When a ring is in 'transparent' mode (see
.Sx TRANSPARENT MODE ) ,
packets marked with this flag are forwarded to the other endpoint
at the next system call, thus restoring (in a selective way)
the connection between a NIC and the host stack.
tells the forwarding code that the SRC MAC address for this
packet must not be used in the learning bridge code.
indicates that the packet's payload is in a user-supplied buffer,
whose user virtual address is in the 'ptr' field of the slot.
The size can reach 65535 bytes.
This is only supported on the transmit ring of
ports, and it helps reduce data copies in the interconnection
indicates that the packet continues with subsequent buffers;
the last buffer in a packet must have the flag clear.
.Sh SCATTER GATHER I/O
Packets can span multiple slots if the
flag is set in all but the last slot.
The maximum length of a chain is 64 buffers.
This is normally used with
ports when connecting virtual machines, as they generate large
TSO segments that are not split unless they reach a physical device.
NOTE: The length field always refers to the individual
fragment; no field carries the total length of a packet.
On receive rings the macro
indicates the remaining number of slots for this packet,
including the current one.
Slots with a value greater than 1 also have NS_MOREFRAG set.
uses two ioctls (NIOCTXSYNC, NIOCRXSYNC)
for non-blocking I/O. They take no argument.
Two more ioctls (NIOCGINFO, NIOCREGIF) are used
to query and configure ports, with the following argument:
char nr_name[IFNAMSIZ]; /* (i) port name */
uint32_t nr_version; /* (i) API version */
uint32_t nr_offset; /* (o) nifp offset in mmap region */
uint32_t nr_memsize; /* (o) size of the mmap region */
uint32_t nr_tx_slots; /* (i/o) slots in tx rings */
uint32_t nr_rx_slots; /* (i/o) slots in rx rings */
uint16_t nr_tx_rings; /* (i/o) number of tx rings */
uint16_t nr_rx_rings; /* (i/o) number of rx rings */
uint16_t nr_ringid; /* (i/o) ring(s) we care about */
uint16_t nr_cmd; /* (i) special command */
uint16_t nr_arg1; /* (i/o) extra arguments */
uint16_t nr_arg2; /* (i/o) extra arguments */
uint32_t nr_arg3; /* (i/o) extra arguments */
uint32_t nr_flags; /* (i/o) open mode */
A file descriptor obtained through
also supports the ioctl supported by network devices, see
returns EINVAL if the named port does not support netmap.
Otherwise, it returns 0 and (advisory) information
Note that all the information below can change before the
interface is actually put in netmap mode.
indicates the size of the
memory region. NICs in
mode all share the same memory region,
ports have independent regions for each port.
.It Pa nr_tx_slots , nr_rx_slots
indicate the size of transmit and receive rings.
.It Pa nr_tx_rings , nr_rx_rings
indicate the number of transmit and receive rings.
Both ring number and sizes may be configured at runtime
using interface-specific functions (e.g.
binds the port named in
to the file descriptor. For a physical device this also switches it into
it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Dv NIOCREGIF can also bind a file descriptor to one endpoint of a
consisting of two netmap ports with a crossover connection.
A netmap pipe shares the same memory space as the parent port,
and is meant to enable configurations where a master process acts
as a dispatcher towards slave processes.
To enable this function, the
field of the structure can be used as a hint to the kernel to
indicate how many pipes we expect to use, and reserve extra space
in the memory region.
On return, it gives the same info as NIOCGINFO,
indicating the identity of the rings controlled through the file
selects which rings are controlled through this file descriptor.
are indicated below, together with the naming schemes
that application libraries (such as the
indicated below) can use to indicate the specific set of rings.
In the example below, "netmap:foo" is any valid netmap port name.
.Bl -tag -width XXXXX
.It NR_REG_ALL_NIC "netmap:foo"
(default) all hardware ring pairs
.It NR_REG_SW_NIC "netmap:foo^"
the ``host rings'', connecting to the host stack.
.It NR_REG_NIC_SW "netmap:foo+"
all hardware rings and the host rings
.It NR_REG_ONE_NIC "netmap:foo-i"
only the i-th hardware ring pair, where the number is in
.It NR_REG_PIPE_MASTER "netmap:foo{i"
the master side of the netmap pipe whose identifier (i) is in
.It NR_REG_PIPE_SLAVE "netmap:foo}i"
the slave side of the netmap pipe whose identifier (i) is in
The identifier of a pipe must be thought of as part of the pipe name,
and does not need to be sequential. On return the pipe
will only have a single ring pair with index 0,
irrespective of the value of i.
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Va NETMAP_NO_TX_SYNC
to the value written to
When this feature is used,
packets are transmitted only when
.Va ioctl(NIOCTXSYNC)
or select()/poll() are called with a write event (POLLOUT/wfdset) or a full ring.
When registering a virtual interface that is dynamically created to a
switch, we can specify the desired number of rings (1 by default,
and currently up to 16) using the nr_tx_rings and nr_rx_rings fields.
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
tells the hardware of consumed packets, and asks for newly available
.Sh SELECT, POLL, EPOLL, KQUEUE.
file descriptor process rings as indicated in
respectively when write (POLLOUT) and read (POLLIN) events are requested.
Both block if no slots are available in the ring
.Va ( ring->cur == ring->tail ) .
Depending on the platform,
Packets in transmit rings are normally pushed out
(and buffers reclaimed) even without
requesting write events. Passing the NETMAP_NO_TX_SYNC flag to
disables this feature.
By default, receive rings are processed only if read
events are requested. Passing the NETMAP_DO_RX_SYNC flag to
.Em NIOCREGIF
updates receive rings even without read events.
Note that on epoll and kqueue, NETMAP_NO_TX_SYNC and NETMAP_DO_RX_SYNC
only have an effect when some event is posted for the file descriptor.
API is supposed to be used directly, both because of its simplicity and
for efficient integration with applications.
.Va <net/netmap_user.h>
header provides a few macros and functions to ease creating
a file descriptor and doing I/O with a
port. These are loosely modeled after the
API, to ease porting of libpcap-based applications to
To use these extra functions, programs should
.Dl #define NETMAP_WITH_LIBS
.Dl #include <net/netmap_user.h>
The following functions are available:
.Bl -tag -width XXXXX
.It Va struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg)
binds a file descriptor to a port.
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
provides the initial values for the argument to the NIOCREGIF ioctl.
The nm_flags and nm_ringid values are overwritten by parsing
ifname and flags, and other fields can be overridden through
the other two arguments.
points to a struct nm_desc containing arguments (e.g. from a previously
opened file descriptor) that should override the defaults.
The fields are used as described below.
can be set to a combination of the following flags:
.Va NETMAP_NO_TX_POLL ,
.Va NETMAP_DO_RX_POLL
(copied into nr_ringid);
.Va NM_OPEN_NO_MMAP (if arg points to the same memory region,
avoids the mmap and uses the values from it);
.Va NM_OPEN_IFNAME (ignores ifname and uses the values in arg);
.Va NM_OPEN_ARG3 (uses the fields from arg);
.Va NM_OPEN_RING_CFG (uses the ring number and sizes from arg).
.It Va int nm_close(struct nm_desc *d)
closes the file descriptor, unmaps memory, frees resources.
.It Va int nm_inject(struct nm_desc *d, const void *buf, size_t size)
similar to pcap_inject(), pushes a packet to a ring, returns the size
of the packet if successful, or 0 on error;
.It Va int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
similar to pcap_dispatch(), applies a callback to incoming packets
.It Va u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr)
similar to pcap_next(), fetches the next packet
.Sh SUPPORTED DEVICES
natively supports the following devices:
NICs without native support can still be used in
mode through emulation. Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
that of in-kernel solutions such as Linux's
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
.Va dev.netmap.admode
globally controls how netmap mode is implemented.
.Sh SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of
are controlled through sysctl variables on FreeBSD
and module parameters on Linux
.Em ( /sys/module/netmap_lin/parameters/* ) :
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode.
.It Va dev.netmap.generic_mit: 100000
Controls interrupt moderation for emulated mode.
.It Va dev.netmap.mmap_unreg: 0
.It Va dev.netmap.fwd: 0
Forces NS_FORWARD mode.
.It Va dev.netmap.flags: 0
.It Va dev.netmap.txsync_retry: 2
.It Va dev.netmap.no_pendintr: 1
Forces recovery of transmit buffers on system calls.
.It Va dev.netmap.mitigate: 1
Propagates interrupt mitigation to user processes.
.It Va dev.netmap.no_timestamp: 0
Disables the update of the timestamp in the netmap ring.
.It Va dev.netmap.verbose: 0
Verbose kernel messages.
.It Va dev.netmap.buf_num: 163840
.It Va dev.netmap.buf_size: 2048
.It Va dev.netmap.ring_num: 200
.It Va dev.netmap.ring_size: 36864
.It Va dev.netmap.if_num: 100
.It Va dev.netmap.if_size: 1024
Sizes and numbers of objects (netmap_if, netmap_ring, buffers)
for the global memory region. The only parameter worth modifying is
.Va dev.netmap.buf_num
as it impacts the total amount of memory used by netmap.
.It Va dev.netmap.buf_curr_num: 0
.It Va dev.netmap.buf_curr_size: 0
.It Va dev.netmap.ring_curr_num: 0
.It Va dev.netmap.ring_curr_size: 0
.It Va dev.netmap.if_curr_num: 0
.It Va dev.netmap.if_curr_size: 0
Actual values in use.
.It Va dev.netmap.bridge_batch: 1024
Batch size used when moving packets across a
switch. Values above 64 generally guarantee good
performance.
to wake up processes when significant events occur, and
is used to configure ports and
Applications may need to create threads and bind them to
specific cores to improve performance, using standard
.Xr pthread_setaffinity_np 3
No matter how fast the CPU and OS are,
achieving line rate on 10G and faster interfaces
requires hardware with sufficient performance.
Several NICs are unable to sustain line rate with
small packet sizes. Insufficient PCIe or memory bandwidth
can also cause reduced performance.
Another frequent reason for low performance is the use
of flow control on the link: a slow receiver can limit
Be sure to disable flow control when running high
.Ss SPECIAL NIC FEATURES
is orthogonal to some NIC features such as
multiqueue, schedulers, packet filters.
Multiple transmit and receive rings are supported natively
and can be configured with ordinary OS tools,
device-specific sysctl variables.
The same goes for Receive Packet Steering (RPS)
and filtering of incoming traffic.
.Em checksum offloading , TCP segmentation offloading ,
.Em encryption , VLAN encapsulation/decapsulation ,
When using netmap to exchange packets with the host stack,
make sure to disable these features.
comes with a few programs that can be used for testing or
.Va tools/tools/netmap/
directory in FreeBSD distributions.
is a general purpose traffic source/sink.
.Dl pkt-gen -i ix0 -f tx -l 60
can generate an infinite stream of minimum size packets, and
.Dl pkt-gen -i ix0 -f rx
Both print traffic statistics, to help monitor
how the system performs.
has many options that can be used to set packet sizes, addresses,
rates, and use multiple send/receive threads and cores.
is another test program which interconnects two
ports. It can be used for transparent forwarding between
.Dl bridge -i ix0 -i ix1
or even connect the NIC to the host stack using netmap
.Dl bridge -i ix0 -i ix0
.Ss USING THE NATIVE API
The following code implements a traffic generator.
.Bd -literal -compact
#include <net/netmap_user.h>

struct netmap_if *nifp;
struct netmap_ring *ring;
struct nmreq nmr;
struct pollfd fds;

fd = open("/dev/netmap", O_RDWR);
bzero(&nmr, sizeof(nmr));
strcpy(nmr.nr_name, "ix0");
nmr.nr_version = NETMAP_API;
ioctl(fd, NIOCREGIF, &nmr);
p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
nifp = NETMAP_IF(p, nmr.nr_offset);
ring = NETMAP_TXRING(nifp, 0);
fds.fd = fd;
fds.events = POLLOUT;
for (;;) {
    poll(&fds, 1, -1);
    while (!nm_ring_empty(ring)) {
        i = ring->cur;
        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
        ... prepare packet in buf ...
        ring->slot[i].len = ... packet length ...
        ring->head = ring->cur = nm_ring_next(ring, i);
    }
}
.Ed
A simple receiver can be implemented using the helper functions:
.Bd -literal -compact
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

struct nm_desc *d;
struct pollfd fds;
u_char *buf;
struct nm_pkthdr h;

d = nm_open("netmap:ix0", NULL, 0, 0);
fds.fd = NETMAP_FD(d);
fds.events = POLLIN;
for (;;) {
    poll(&fds, 1, -1);
    while ( (buf = nm_nextpkt(d, &h)) )
        consume_pkt(buf, h.len);
}
nm_close(d);
.Ed
.Ss ZERO-COPY FORWARDING
Since physical interfaces share the same memory region,
it is possible to do packet forwarding between ports
by swapping buffers. The buffer from the transmit ring is used
to replenish the receive ring:
.Bd -literal -compact
uint32_t tmp;
struct netmap_slot *src, *dst;

src = &rxr->slot[rxr->cur];
dst = &txr->slot[txr->cur];
tmp = dst->buf_idx;
dst->buf_idx = src->buf_idx;
dst->len = src->len;
dst->flags = NS_BUF_CHANGED;
src->buf_idx = tmp;
src->flags = NS_BUF_CHANGED;
rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
txr->head = txr->cur = nm_ring_next(txr, txr->cur);
.Ed
.Ss ACCESSING THE HOST STACK
The host stack is for all practical purposes just a regular ring pair,
which you can access with the netmap API (e.g. with
.Dl nm_open("netmap:eth0^", ... ) ;
All packets that the host would send to an interface in
mode end up into the RX ring, whereas all packets queued to the
TX ring are sent up to the host stack.
A simple way to test the performance of a
switch is to attach a sender and a receiver to it,
e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:a -f rx # receiver
.Dl pkt-gen -i vale1:b -f tx # sender
The same example can be used to test netmap pipes, by simply
changing port names, e.g.
.Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side
The following command attaches an interface and the host stack
.Dl vale-ctl -h vale2:em0
clients attached to the same switch can now communicate
with the network card or the host.
http://info.iet.unipi.it/~luigi/netmap/
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp. 45-51, March 2012
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
Luigi Rizzo, Giuseppe Lettieri,
VALE, a switched ethernet for virtual machines,
ACM CoNEXT'12, December 2012, Nice
Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
Speeding up packet I/O in virtual machines,
ACM/IEEE ANCS'13, October 2013, San Jose
framework was originally designed and implemented at the
Universita` di Pisa in 2011 by
and further extended with help from
.An Gaetano Catalli ,
.An Giuseppe Lettieri ,
.An Vincenzo Maffione .
have been funded by the European Commission within FP7 Projects
CHANGE (257422) and OPENLAB (287581).