/*
 * Copyright (C) 2011-2014 Matteo Landi
 * Copyright (C) 2011-2016 Luigi Rizzo
 * Copyright (C) 2011-2016 Giuseppe Lettieri
 * Copyright (C) 2011-2016 Vincenzo Maffione
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 * This module supports memory mapped access to network devices.
 * The module uses a large memory pool allocated by the kernel
 * and accessible as mmapped memory by multiple userspace threads/processes.
 * The memory pool contains packet buffers and "netmap rings",
 * i.e. user-accessible copies of the interface's queues.
 *
 * Access to the network card works like this:
 * 1. a process/thread issues one or more open() on /dev/netmap, to create
 *    select()able file descriptors on which events are reported.
 * 2. on each descriptor, the process issues an ioctl() to identify
 *    the interface that should report events to the file descriptor.
 * 3. on each descriptor, the process issues an mmap() request to
 *    map the shared memory region within the process' address space.
 *    The list of interesting queues is indicated by a location in
 *    the shared memory region.
 * 4. using the functions in the netmap(4) userspace API, a process
 *    can look up the occupation state of a queue, access memory buffers,
 *    and retrieve received packets or enqueue packets to transmit.
 * 5. using some ioctl()s the process can synchronize the userspace view
 *    of the queue with the actual status in the kernel. This includes both
 *    receiving the notification of new packets, and transmitting new
 *    packets on the output interface.
 * 6. select() or poll() can be used to wait for events on individual
 *    transmit or receive queues (or all queues for a given interface).
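 *
 * As an illustration, a minimal single-ring receive loop built on the
 * steps above might look as follows (a sketch only: error handling is
 * omitted, "em0" is just an example interface name, and the macros
 * NETMAP_IF/NETMAP_RXRING/NETMAP_BUF and the nm_ring_* helpers come
 * from net/netmap_user.h):
 *
 *	int fd = open("/dev/netmap", O_RDWR);
 *	struct nmreq req;
 *	bzero(&req, sizeof(req));
 *	req.nr_version = NETMAP_API;
 *	strncpy(req.nr_name, "em0", sizeof(req.nr_name) - 1);
 *	ioctl(fd, NIOCREGIF, &req);		// step 2: bind fd to em0
 *	void *mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
 *	    MAP_SHARED, fd, 0);			// step 3: map shared region
 *	struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
 *	struct netmap_ring *rxr = NETMAP_RXRING(nifp, 0);
 *	for (;;) {
 *		struct pollfd pfd = { .fd = fd, .events = POLLIN };
 *		poll(&pfd, 1, -1);		// step 6: wait for events
 *		while (!nm_ring_empty(rxr)) {	// step 4: consume slots
 *			struct netmap_slot *slot = &rxr->slot[rxr->cur];
 *			char *buf = NETMAP_BUF(rxr, slot->buf_idx);
 *			// ... process slot->len bytes at buf ...
 *			rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
 *		}
 *	}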

		SYNCHRONIZATION (USER)

The netmap rings and data structures may be shared among multiple
user threads or even independent processes.
Any synchronization among those threads/processes is delegated
to the threads themselves. Only one thread at a time can be in
a system call on the same netmap ring. The OS does not enforce
this and only guarantees against system crashes in case of
invalid usage.

Within the kernel, access to the netmap rings is protected as follows:

- a spinlock on each ring, to handle producer/consumer races on
  RX rings attached to the host stack (against multiple host
  threads writing from the host stack to the same ring),
  and on 'destination' rings attached to a VALE switch
  (i.e. RX rings in VALE ports, and TX rings in NIC/host ports),
  protecting multiple active senders for the same destination.

- an atomic variable to guarantee that there is at most one
  instance of *_*xsync() on the ring at any time.
  For rings connected to user file
  descriptors, an atomic_test_and_set() protects this, and the
  lock on the ring is not actually used.
  For NIC RX rings connected to a VALE switch, an atomic_test_and_set()
  is also used to prevent multiple executions (the driver might indeed
  already guarantee this).
  For NIC TX rings connected to a VALE switch, the lock arbitrates
  access to the queue (both when allocating buffers and when pushing
  them out).

- *xsync() should be protected against initializations of the card.
  On FreeBSD most devices have the reset routine protected by
  a RING lock (ixgbe, igb, em) or core lock (re). lem is missing
  the RING protection on rx_reset(), this should be added.

  On linux there is an external lock on the tx path, which probably
  also arbitrates access to the reset routine. XXX to be revised

- a per-interface core_lock protecting access from the host stack
  while interfaces may be detached from netmap mode.
  XXX there should be no need for this lock if we detach the interfaces
  only while they are down.

NMG_LOCK() serializes all modifications to switches and ports.
A switch cannot be deleted until all ports are gone.

For each switch, an SX lock (RWlock on linux) protects
deletion of ports. When configuring or deleting a new port, the
lock is acquired in exclusive mode (after holding NMG_LOCK).
When forwarding, the lock is acquired in shared mode (without NMG_LOCK).
The lock is held throughout the entire forwarding cycle,
during which the thread may incur a page fault.
Hence it is important that sleepable shared locks are used.

On the rx ring, the per-port lock is grabbed initially to reserve
a number of slots in the ring, then the lock is released,
packets are copied from source to destination, and then
the lock is acquired again and the receive ring is updated.
(A similar thing is done on the tx ring for NIC and host stack
ports attached to the switch)
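
Sketched as pseudo-code (lock()/unlock() here stand for the per-port
lock described above, not real function names):

	lock(port);   reserve slots in the receive ring;   unlock(port);
	copy packets from the source into the reserved slots
	    (this part may sleep, e.g. on a page fault);
	lock(port);   update the receive ring pointers;    unlock(port);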
 */


/* --- internals ----
 *
 * Roadmap to the code that implements the above.
 *
 * > 1. a process/thread issues one or more open() on /dev/netmap, to create
 * >    select()able file descriptors on which events are reported.
 *
 * Internally, we allocate a netmap_priv_d structure, that will be
 * initialized on ioctl(NIOCREGIF). There is one netmap_priv_d
 * structure for each open().
 *
 * FreeBSD: see netmap_open() (netmap_freebsd.c)
 * linux:   see linux_netmap_open() (netmap_linux.c)
 *
 * > 2. on each descriptor, the process issues an ioctl() to identify
 * >    the interface that should report events to the file descriptor.
 *
 * Implemented by netmap_ioctl(), NIOCREGIF case, with nmr->nr_cmd==0.
 * Most important things happen in netmap_get_na() and
 * netmap_do_regif(), called from there. Additional details can be
 * found in the comments above those functions.
 *
 * In all cases, this action creates/takes-a-reference-to a
 * netmap_*_adapter describing the port, and allocates a netmap_if
 * and all necessary netmap rings, filling them with netmap buffers.
 *
 * In this phase, the sync callbacks for each ring are set (these are used
 * in steps 5 and 6 below). The callbacks depend on the type of adapter.
 * The adapter creation/initialization code puts them in the
 * netmap_adapter (fields na->nm_txsync and na->nm_rxsync). Then, they
 * are copied from there to the netmap_kring's during netmap_do_regif(), by
 * the nm_krings_create() callback. All the nm_krings_create callbacks
 * actually call netmap_krings_create() to perform this and the other
 * common stuff. netmap_krings_create() also takes care of the host rings,
 * if needed, by setting their sync callbacks appropriately.
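 *
 * In code form, the core of this copy (as performed below by
 * netmap_krings_create() in this file) is simply:
 *
 *	kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync);
 *
 * while host rings get netmap_txsync_to_host()/netmap_rxsync_from_host()
 * instead.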
 *
 * Additional actions depend on the kind of netmap_adapter that has been
 * registered:
 *
 * - netmap_hw_adapter:         [netmap.c]
 *      This is a system netdev/ifp with native netmap support.
 *      The ifp is detached from the host stack by redirecting:
 *      - transmissions (from the network stack) to netmap_transmit()
 *      - receive notifications to the nm_notify() callback for
 *        this adapter. The callback is normally netmap_notify(), unless
 *        the ifp is attached to a bridge using bwrap, in which case it
 *        is netmap_bwrap_intr_notify().
 *
 * - netmap_generic_adapter:    [netmap_generic.c]
 *      A system netdev/ifp without native netmap support.
 *      (the decision about native/non-native support is taken in
 *      netmap_get_hw_na(), called by netmap_get_na())
 *
 * - netmap_vp_adapter          [netmap_vale.c]
 *      Returned by netmap_get_bdg_na().
 *      This is a persistent or ephemeral VALE port. Ephemeral ports
 *      are created on the fly if they don't already exist, and are
 *      always attached to a bridge.
 *      Persistent VALE ports must be created separately, and are
 *      then attached like normal NICs. The NIOCREGIF we are examining
 *      will find them only if they had previously been created and
 *      attached (see VALE_CTL below).
 *
 * - netmap_pipe_adapter        [netmap_pipe.c]
 *      Returned by netmap_get_pipe_na().
 *      Both pipe ends are created, if they didn't already exist.
 *
 * - netmap_monitor_adapter     [netmap_monitor.c]
 *      Returned by netmap_get_monitor_na().
 *      If successful, the nm_sync callbacks of the monitored adapter
 *      will be intercepted by the returned monitor.
 *
 * - netmap_bwrap_adapter       [netmap_vale.c]
 *      Cannot be obtained in this way, see VALE_CTL below
 *
 * linux:   we first go through linux_netmap_ioctl() to
 *          adapt the FreeBSD interface to the linux one.
 *
 * > 3. on each descriptor, the process issues an mmap() request to
 * >    map the shared memory region within the process' address space.
 * >    The list of interesting queues is indicated by a location in
 * >    the shared memory region.
 *
 * FreeBSD: netmap_mmap_single (netmap_freebsd.c).
 * linux:   linux_netmap_mmap (netmap_linux.c).
 *
 * > 4. using the functions in the netmap(4) userspace API, a process
 * >    can look up the occupation state of a queue, access memory buffers,
 * >    and retrieve received packets or enqueue packets to transmit.
 *
 * These actions do not involve the kernel.
 *
 * > 5. using some ioctl()s the process can synchronize the userspace view
 * >    of the queue with the actual status in the kernel. This includes both
 * >    receiving the notification of new packets, and transmitting new
 * >    packets on the output interface.
 *
 * These are implemented in netmap_ioctl(), NIOCTXSYNC and NIOCRXSYNC
 * cases. They invoke the nm_sync callbacks on the netmap_kring
 * structures, as initialized in step 2 and maybe later modified
 * by a monitor. Monitors, however, will always call the original
 * callback before doing anything else.
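 *
 * For example, the transmit half of step 5 reduces to filling slots and
 * issuing the sync ioctl (a sketch; "pkt" and "len" are assumed to be
 * the caller's data, and nifp/fd come from the usage example above):
 *
 *	struct netmap_ring *txr = NETMAP_TXRING(nifp, 0);
 *	struct netmap_slot *slot = &txr->slot[txr->cur];
 *	memcpy(NETMAP_BUF(txr, slot->buf_idx), pkt, len);
 *	slot->len = len;
 *	txr->head = txr->cur = nm_ring_next(txr, txr->cur);
 *	ioctl(fd, NIOCTXSYNC, NULL);	// push the new slots to the NIC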
 *
 * > 6. select() or poll() can be used to wait for events on individual
 * >    transmit or receive queues (or all queues for a given interface).
 *
 * Implemented in netmap_poll(). This will call the same nm_sync()
 * callbacks as in step 5 above.
 *
 * linux:   we first go through linux_netmap_poll() to adapt
 *          the FreeBSD interface to the linux one.
 *
 * ---- VALE_CTL -----
 *
 * VALE switches are controlled by issuing a NIOCREGIF with a non-null
 * nr_cmd in the nmreq structure. These subcommands are handled by
 * netmap_bdg_ctl() in netmap_vale.c. Persistent VALE ports are created
 * and destroyed by issuing the NETMAP_BDG_NEWIF and NETMAP_BDG_DELIF
 * subcommands, respectively.
 *
 * Any network interface known to the system (including a persistent VALE
 * port) can be attached to a VALE switch by issuing the
 * NETMAP_BDG_ATTACH subcommand. After the attachment, persistent VALE ports
 * look exactly like ephemeral VALE ports (as created in step 2 above). The
 * attachment of other interfaces, instead, requires the creation of a
 * netmap_bwrap_adapter. Moreover, the attached interface must be put in
 * netmap mode. This may require the creation of a netmap_generic_adapter if
 * we have no native support for the interface, or if generic adapters have
 * been forced by sysctl.
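 *
 * For instance, creating a persistent VALE port and then attaching a
 * NIC to the same switch could look like this (a sketch; "vale0" and
 * "em0" are example names, and error handling is omitted):
 *
 *	struct nmreq req;
 *	bzero(&req, sizeof(req));
 *	req.nr_version = NETMAP_API;
 *	strncpy(req.nr_name, "vale0:port0", sizeof(req.nr_name) - 1);
 *	req.nr_cmd = NETMAP_BDG_NEWIF;
 *	ioctl(fd, NIOCREGIF, &req);	// create persistent port vale0:port0
 *
 *	strncpy(req.nr_name, "vale0:em0", sizeof(req.nr_name) - 1);
 *	req.nr_cmd = NETMAP_BDG_ATTACH;
 *	ioctl(fd, NIOCREGIF, &req);	// attach em0 through a bwrap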
 *
 * Both persistent VALE ports and bwraps are handled by netmap_get_bdg_na(),
 * called by nm_bdg_ctl_attach(), and discriminated by the nm_bdg_attach()
 * callback. In the case of the bwrap, the callback creates the
 * netmap_bwrap_adapter. The initialization of the bwrap is then
 * completed by calling netmap_do_regif() on it, in the nm_bdg_ctl()
 * callback (netmap_bwrap_bdg_ctl in netmap_vale.c).
 * A generic adapter for the wrapped ifp will be created if needed, when
 * netmap_get_bdg_na() calls netmap_get_hw_na().
 *
 * ---- DATAPATHS -----
 *
 * -= SYSTEM DEVICE WITH NATIVE SUPPORT =-
 *
 * na == NA(ifp) == netmap_hw_adapter created in DEVICE_netmap_attach()
 *
 * - tx from netmap userspace:
 *	concurrently:
 *	    1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == DEVICE_netmap_txsync()
 *	    2) device interrupt handler
 *		    na->nm_notify()  == netmap_notify()
 * - rx from netmap userspace:
 *	concurrently:
 *	    1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == DEVICE_netmap_rxsync()
 *	    2) device interrupt handler
 *		    na->nm_notify()  == netmap_notify()
 * - rx from host stack
 *	concurrently:
 *	    1) host stack
 *		    netmap_transmit()
 *			na->nm_notify  == netmap_notify()
 *	    2) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == netmap_rxsync_from_host
 *			netmap_rxsync_from_host(na, NULL, NULL)
 * - tx to host stack
 *	ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *	    kring->nm_sync() == netmap_txsync_to_host
 *		netmap_txsync_to_host(na)
 *		    nm_os_send_up()
 *			FreeBSD: na->if_input() == ether_input()
 *			linux: netif_rx() with NM_MAGIC_PRIORITY_RX
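 *
 * For reference, a native driver typically advertises these entry
 * points at attach time (a sketch, using the placeholder DEVICE_*
 * names from the call chains above; DEVICE_netmap_reg is an assumed
 * name for the driver's nm_register callback):
 *
 *	struct netmap_adapter na;
 *	bzero(&na, sizeof(na));
 *	na.ifp = ifp;
 *	na.num_tx_desc = na.num_rx_desc = 1024;	// ring sizes (example)
 *	na.num_tx_rings = na.num_rx_rings = 1;
 *	na.nm_txsync = DEVICE_netmap_txsync;
 *	na.nm_rxsync = DEVICE_netmap_rxsync;
 *	na.nm_register = DEVICE_netmap_reg;
 *	netmap_attach(&na);		// creates the netmap_hw_adapter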
 *
 *
 * -= SYSTEM DEVICE WITH GENERIC SUPPORT =-
 *
 * na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach()
 *
 * - tx from netmap userspace:
 *	concurrently:
 *	    1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == generic_netmap_txsync()
 *			nm_os_generic_xmit_frame()
 *			    linux: dev_queue_xmit() with NM_MAGIC_PRIORITY_TX
 *				ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *				gna->save_start_xmit == orig. dev. start_xmit
 *			    FreeBSD: na->if_transmit() == orig. dev if_transmit
 *	    2) generic_mbuf_destructor()
 *		    na->nm_notify() == netmap_notify()
 * - rx from netmap userspace:
 *	    1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == generic_netmap_rxsync()
 *	    2) device driver
 *		    generic_rx_handler()
 *			na->nm_notify() == netmap_notify()
 * - rx from host stack
 *	FreeBSD: same as native
 *	Linux: same as native except:
 *	    1) host stack
 *		    dev_queue_xmit() without NM_MAGIC_PRIORITY_TX
 *			ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *			    netmap_transmit()
 *				na->nm_notify() == netmap_notify()
 * - tx to host stack (same as native):
 *
 *
 * -= VALE =-
 *
 * INCOMING:
 *
 * - VALE ports:
 *	ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *	    kring->nm_sync() == netmap_vp_txsync()
 *
 * - system device with native support:
 *	from cable:
 *	    interrupt
 *		na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *		    kring->nm_sync() == DEVICE_netmap_rxsync()
 *		    netmap_vp_txsync()
 *			kring->nm_sync() == DEVICE_netmap_rxsync()
 *	from host stack:
 *	    netmap_transmit()
 *		na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *		    kring->nm_sync() == netmap_rxsync_from_host()
 *		    netmap_vp_txsync()
 *
 * - system device with generic support:
 *	from device driver:
 *	    generic_rx_handler()
 *		na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *		    kring->nm_sync() == generic_netmap_rxsync()
 *		    netmap_vp_txsync()
 *			kring->nm_sync() == generic_netmap_rxsync()
 *	from host stack:
 *	    netmap_transmit()
 *		na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *		    kring->nm_sync() == netmap_rxsync_from_host()
 *		    netmap_vp_txsync()
 *
 * (all cases) --> nm_bdg_flush()
 *		      dest_na->nm_notify() == (see below)
 *
 * OUTGOING:
 *
 * - VALE ports:
 *	concurrently:
 *	    1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *		    kring->nm_sync() == netmap_vp_rxsync()
 *	    2) from nm_bdg_flush()
 *		    na->nm_notify() == netmap_notify()
 *
 * - system device with native support:
 *	to cable:
 *	    na->nm_notify() == netmap_bwrap_notify()
 *		netmap_vp_rxsync()
 *		kring->nm_sync() == DEVICE_netmap_txsync()
 *		netmap_vp_rxsync()
 *	to host stack:
 *	    netmap_vp_rxsync()
 *	    kring->nm_sync() == netmap_txsync_to_host
 *	    netmap_vp_rxsync_locked()
 *
 * - system device with generic adapter:
 *	to device driver:
 *	    na->nm_notify() == netmap_bwrap_notify()
 *		netmap_vp_rxsync()
 *		kring->nm_sync() == generic_netmap_txsync()
 *		netmap_vp_rxsync()
 *	to host stack:
 *	    netmap_vp_rxsync()
 *	    kring->nm_sync() == netmap_txsync_to_host
 *	    netmap_vp_rxsync_locked()
 */

/*
 * OS-specific code that is used only within this file.
 * Other OS-specific code that must be accessed by drivers
 * is present in netmap_kern.h
 */

#if defined(__FreeBSD__)
#include <sys/cdefs.h> /* prerequisite */
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/param.h>	/* defines used in kernel.h */
#include <sys/kernel.h>	/* types used in module initialization */
#include <sys/conf.h>	/* cdevsw struct, UID, GID */
#include <sys/filio.h>	/* FIONBIO */
#include <sys/sockio.h>
#include <sys/socketvar.h>	/* struct socket */
#include <sys/malloc.h>
#include <sys/poll.h>
#include <sys/rwlock.h>
#include <sys/socket.h> /* sockaddrs */
#include <sys/selinfo.h>
#include <sys/sysctl.h>
#include <sys/jail.h>
#include <net/vnet.h>
#include <net/if_var.h>
#include <net/bpf.h>		/* BIOCIMMEDIATE */
#include <machine/bus.h>	/* bus_dmamap_* */
#include <sys/endian.h>
#include <sys/refcount.h>

#elif defined(linux)

#include "bsd_glue.h"

#elif defined(__APPLE__)

#warning OSX support is only partial
#include "osx_glue.h"

#elif defined (_WIN32)

#include "win_glue.h"

#else

#error	Unsupported platform

#endif /* unsupported */

#include <net/netmap.h>
#include <dev/netmap/netmap_kern.h>
#include <dev/netmap/netmap_mem2.h>

/* user-controlled variables */
int netmap_verbose;

static int netmap_no_timestamp; /* don't timestamp on rxsync */
int netmap_mitigate = 1;
int netmap_no_pendintr = 1;
int netmap_txsync_retry = 2;
int netmap_flags = 0;	/* debug flags */
static int netmap_fwd = 0;	/* force transparent forwarding */

/*
 * netmap_admode selects the netmap mode to use.
 * Invalid values are reset to NETMAP_ADMODE_BEST
 */
enum {	NETMAP_ADMODE_BEST = 0,	/* use native, fallback to generic */
	NETMAP_ADMODE_NATIVE,	/* either native or none */
	NETMAP_ADMODE_GENERIC,	/* force generic */
	NETMAP_ADMODE_LAST };
static int netmap_admode = NETMAP_ADMODE_BEST;

/* netmap_generic_mit controls mitigation of RX notifications for
 * the generic netmap adapter. The value is a time interval in
 * nanoseconds. */
int netmap_generic_mit = 100*1000;

/* We use by default netmap-aware qdiscs with generic netmap adapters,
 * even if there can be a little performance hit with hardware NICs.
 * However, using the qdisc is the safer approach, for two reasons:
 * 1) it prevents non-fifo qdiscs from breaking the TX notification
 *    scheme, which is based on mbuf destructors when txqdisc is
 *    not used.
 * 2) it makes it possible to transmit over software devices that
 *    change skb->dev, like bridge, veth, ...
 *
 * Anyway, users looking for the best performance should
 * use native adapters.
 */
int netmap_generic_txqdisc = 1;

/* Default number of slots and queues for generic adapters. */
int netmap_generic_ringsize = 1024;
int netmap_generic_rings = 1;

/* Non-zero if ptnet devices are allowed to use virtio-net headers. */
int ptnet_vnet_hdr = 1;

/* 0 if ptnetmap should not use worker threads for TX processing */
int ptnetmap_tx_workers = 1;

/*
 * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated
 * in some other operating systems
 */

SYSBEGIN(main_init);

SYSCTL_DECL(_dev_netmap);
SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args");
SYSCTL_INT(_dev_netmap, OID_AUTO, verbose,
    CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp,
    CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp");
SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr,
    CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets.");
SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW,
    &netmap_txsync_retry, 0, "Number of txsync loops in bridge's flush.");

SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, admode, CTLFLAG_RW, &netmap_admode, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, ptnetmap_tx_workers, CTLFLAG_RW, &ptnetmap_tx_workers, 0, "");

SYSEND;

NMG_LOCK_T	netmap_global_lock;

/*
 * mark the ring as stopped, and run through the locks
 * to make sure other users get to see it.
 * stopped must be either NR_KR_STOPPED (for unbounded stop)
 * or NR_KR_LOCKED (brief stop for mutual exclusion purposes)
 */
static void
netmap_disable_ring(struct netmap_kring *kr, int stopped)
{
	nm_kr_stop(kr, stopped);
	// XXX check if nm_kr_stop is sufficient
	mtx_lock(&kr->q_lock);
	mtx_unlock(&kr->q_lock);
	nm_kr_put(kr);
}

/* stop or enable a single ring */
void
netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped)
{
	if (stopped)
		netmap_disable_ring(NMR(na, t) + ring_id, stopped);
	else
		NMR(na, t)[ring_id].nkr_stopped = 0;
}

/* stop or enable all the rings of na */
void
netmap_set_all_rings(struct netmap_adapter *na, int stopped)
{
	int i;
	enum txrx t;

	if (!nm_netmap_on(na))
		return;

	for_rx_tx(t) {
		for (i = 0; i < netmap_real_rings(na, t); i++) {
			netmap_set_ring(na, i, t, stopped);
		}
	}
}

/*
 * Convenience function used in drivers. Waits for current txsync()s/rxsync()s
 * to finish and prevents any new one from starting. Call this before turning
 * netmap mode off, or before removing the hardware rings (e.g., on module
 * unload).
 */
void
netmap_disable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), NM_KR_STOPPED);
	}
}

/*
 * Convenience function used in drivers. Re-enables rxsync and txsync on the
 * adapter's rings. In linux drivers, this should be placed near each
 * napi_enable().
 */
void
netmap_enable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), 0 /* enabled */);
	}
}

void
netmap_make_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		netmap_set_all_rings(na, NM_KR_LOCKED);
		na->na_flags |= NAF_ZOMBIE;
		netmap_set_all_rings(na, 0);
	}
}

void
netmap_undo_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		if (na->na_flags & NAF_ZOMBIE) {
			netmap_set_all_rings(na, NM_KR_LOCKED);
			na->na_flags &= ~NAF_ZOMBIE;
			netmap_set_all_rings(na, 0);
		}
	}
}

/*
 * generic bound-checking function
 */
static u_int
nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg)
{
	u_int oldv = *v;
	const char *op = NULL;

	if (oldv < lo) {
		*v = dflt;
		op = "Bump";
	} else if (oldv > hi) {
		*v = dflt;
		op = "Clamp";
	}
	if (op && msg)
		nm_prinf("%s %s to %d (was %d)\n", op, msg, *v, oldv);
	return *v;
}


/*
 * packet-dump function, user-supplied or static buffer.
 * The destination buffer must be at least 30+4*len
 */
const char *
nm_dump_buf(char *p, int len, int lim, char *dst)
{
	static char _dst[8192];
	int i, j;
	static char hex[] = "0123456789abcdef";
	char *o;	/* output position */

#define P_HI(x)	hex[((x) & 0xf0)>>4]
#define P_LO(x)	hex[((x) & 0xf)]
#define P_C(x)	((x) >= 0x20 && (x) <= 0x7e ? (x) : '.')
	if (!dst)
		dst = _dst;
	if (lim <= 0 || lim > len)
		lim = len;
	o = dst;
	sprintf(o, "buf 0x%p len %d lim %d\n", p, len, lim);
	o += strlen(o);
	/* hexdump routine */
	for (i = 0; i < lim; ) {
		sprintf(o, "%5d: ", i);
		o += strlen(o);
		memset(o, ' ', 64);
		for (j=0; j < 16 && i < lim; i++, j++) {
			o[j*3] = P_HI(p[i]);
			o[j*3+1] = P_LO(p[i]);
		}
		i -= j;
		for (j=0; j < 16 && i < lim; i++, j++)
			o[j + 48] = P_C(p[i]);
		o[j+48] = '\n';
		o += j+49;
	}
	*o = '\0';
#undef P_HI
#undef P_LO
#undef P_C
	return dst;
}

/*
 * Fetch configuration from the device, to cope with dynamic
 * reconfigurations after loading the module.
 */
/* call with NMG_LOCK held */
int
netmap_update_config(struct netmap_adapter *na)
{
	u_int txr, txd, rxr, rxd;

	txr = txd = rxr = rxd = 0;
	if (na->nm_config == NULL ||
	    na->nm_config(na, &txr, &txd, &rxr, &rxd))
	{
		/* take whatever we had at init time */
		txr = na->num_tx_rings;
		txd = na->num_tx_desc;
		rxr = na->num_rx_rings;
		rxd = na->num_rx_desc;
	}

	if (na->num_tx_rings == txr && na->num_tx_desc == txd &&
	    na->num_rx_rings == rxr && na->num_rx_desc == rxd)
		return 0; /* nothing changed */
	if (netmap_verbose || na->active_fds > 0) {
		D("stored config %s: txring %d x %d, rxring %d x %d",
			na->name,
			na->num_tx_rings, na->num_tx_desc,
			na->num_rx_rings, na->num_rx_desc);
		D("new config %s: txring %d x %d, rxring %d x %d",
			na->name, txr, txd, rxr, rxd);
	}
	if (na->active_fds == 0) {
		D("configuration changed (but fine)");
		na->num_tx_rings = txr;
		na->num_tx_desc = txd;
		na->num_rx_rings = rxr;
		na->num_rx_desc = rxd;
		return 0;
	}
	D("configuration changed while active, this is bad...");
	return 1;
}

/* nm_sync callbacks for the host rings */
static int netmap_txsync_to_host(struct netmap_kring *kring, int flags);
static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags);

/* create the krings array and initialize the fields common to all adapters.
 * The array layout is this:
 *
 *                    +----------+
 * na->tx_rings ----->|          | \
 *                    |          |  } na->num_tx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host tx kring
 * na->rx_rings ----> +----------+
 *                    |          | \
 *                    |          |  } na->num_rx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host rx kring
 *                    +----------+
 * na->tailroom ----->|          | \
 *                    |          |  } tailroom bytes
 *                    |          | /
 *                    +----------+
 *
 * Note: for compatibility, host krings are created even when not needed.
 * The tailroom space is currently used by vale ports for allocating leases.
 */
/* call with NMG_LOCK held */
int
netmap_krings_create(struct netmap_adapter *na, u_int tailroom)
{
	u_int i, len, ndesc;
	struct netmap_kring *kring;
	u_int n[NR_TXRX];
	enum txrx t;

	if (na->tx_rings != NULL) {
		D("warning: krings were already created");
		return 0;
	}

	/* account for the (possibly fake) host rings */
	n[NR_TX] = na->num_tx_rings + 1;
	n[NR_RX] = na->num_rx_rings + 1;

	len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom;

	na->tx_rings = nm_os_malloc((size_t)len);
	if (na->tx_rings == NULL) {
		D("Cannot allocate krings");
		return ENOMEM;
	}
	na->rx_rings = na->tx_rings + n[NR_TX];

	/*
	 * All fields in krings are 0 except the ones initialized below;
	 * but better be explicit on important kring fields.
	 */
	for_rx_tx(t) {
		ndesc = nma_get_ndesc(na, t);
		for (i = 0; i < n[t]; i++) {
			kring = &NMR(na, t)[i];
			bzero(kring, sizeof(*kring));
			kring->na = na;
			kring->ring_id = i;
			kring->tx = t;
			kring->nkr_num_slots = ndesc;
			kring->nr_mode = NKR_NETMAP_OFF;
			kring->nr_pending_mode = NKR_NETMAP_OFF;
			if (i < nma_get_nrings(na, t)) {
				kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync);
			} else {
				kring->nm_sync = (t == NR_TX ?
						netmap_txsync_to_host:
						netmap_rxsync_from_host);
			}
			kring->nm_notify = na->nm_notify;
			kring->rhead = kring->rcur = kring->nr_hwcur = 0;
			/*
			 * IMPORTANT: Always keep one slot empty.
			 */
			kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0);
			snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name,
					nm_txrx2str(t), i);
			ND("ktx %s h %d c %d t %d",
				kring->name, kring->rhead, kring->rcur, kring->rtail);
			mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF);
			nm_os_selinfo_init(&kring->si);
		}
		nm_os_selinfo_init(&na->si[t]);
	}

	na->tailroom = na->rx_rings + n[NR_RX];

	return 0;
}


/* undo the actions performed by netmap_krings_create */
/* call with NMG_LOCK held */
void
netmap_krings_delete(struct netmap_adapter *na)
{
	struct netmap_kring *kring = na->tx_rings;
	enum txrx t;

	if (na->tx_rings == NULL) {
		D("warning: krings were already deleted");
		return;
	}

	for_rx_tx(t)
		nm_os_selinfo_uninit(&na->si[t]);

	/* we rely on the krings layout described above */
	for ( ; kring != na->tailroom; kring++) {
		mtx_destroy(&kring->q_lock);
		nm_os_selinfo_uninit(&kring->si);
	}
	nm_os_free(na->tx_rings);
	na->tx_rings = na->rx_rings = na->tailroom = NULL;
}


/*
 * Destructor for NIC ports. They also have an mbuf queue
 * on the rings connected to the host so we need to purge
 * them first.
 */
/* call with NMG_LOCK held */
static void
netmap_hw_krings_delete(struct netmap_adapter *na)
{
	struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue;

	ND("destroy sw mbq with len %d", mbq_len(q));
	mbq_purge(q);
	mbq_safe_fini(q);
	netmap_krings_delete(na);
}


/*
 * Undo everything that was done in netmap_do_regif(). In particular,
 * call nm_register(ifp,0) to stop netmap mode on the interface and
 * revert to normal operation.
 */
/* call with NMG_LOCK held */
static void netmap_unset_ringid(struct netmap_priv_d *);
static void netmap_krings_put(struct netmap_priv_d *);
void
netmap_do_unregif(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	NMG_LOCK_ASSERT();
	na->active_fds--;
	/* unset nr_pending_mode and possibly release exclusive mode */
	netmap_krings_put(priv);

#ifdef	WITH_MONITOR
	/* XXX check whether we have to do something with monitor
	 * when rings change nr_mode. */
	if (na->active_fds <= 0) {
		/* walk through all the rings and tell any monitor
		 * that the port is going to exit netmap mode
		 */
		netmap_monitor_stop(na);
	}
#endif

	if (na->active_fds <= 0 || nm_kring_pending(priv)) {
		na->nm_register(na, 0);
	}

	/* delete rings and buffers that are no longer needed */
	netmap_mem_rings_delete(na);

	if (na->active_fds <= 0) {	/* last instance */
		/*
		 * (TO CHECK) We enter here
		 * when the last reference to this file descriptor goes
		 * away. This means we cannot have any pending poll()
		 * or interrupt routine operating on the structure.
		 * XXX The file may be closed in a thread while
		 * another thread is using it.
		 * Linux keeps the file opened until the last reference
		 * by any outstanding ioctl/poll or mmap is gone.
		 * FreeBSD does not track mmap()s (but we do) and
		 * wakes up any sleeping poll(). Need to check what
		 * happens if the close() occurs while a concurrent
		 * syscall is running.
		 */
		if (netmap_verbose)
			D("deleting last instance for %s", na->name);

		if (nm_netmap_on(na)) {
			D("BUG: netmap on while going to delete the krings");
		}

		na->nm_krings_delete(na);
	}

	/* possibly decrement counter of tx_si/rx_si users */
	netmap_unset_ringid(priv);
	/* delete the nifp */
	netmap_mem_if_delete(na, priv->np_nifp);
	/* drop the allocator */
	netmap_mem_deref(na->nm_mem, na);
	/* mark the priv as unregistered */
	priv->np_na = NULL;
	priv->np_nifp = NULL;
}

/* call with NMG_LOCK held */
static int
nm_si_user(struct netmap_priv_d *priv, enum txrx t)
{
	return (priv->np_na != NULL &&
		(priv->np_qlast[t] - priv->np_qfirst[t] > 1));
}

struct netmap_priv_d*
netmap_priv_new(void)
{
	struct netmap_priv_d *priv;

	priv = nm_os_malloc(sizeof(struct netmap_priv_d));
	if (priv == NULL)
		return NULL;
	priv->np_refs = 1;
	nm_os_get_module();
	return priv;
}

/*
 * Destructor of the netmap_priv_d, called when the fd is closed
 * Action: undo all the things done by NIOCREGIF,
 * On FreeBSD we need to track whether there are active mmap()s,
 * and we use np_active_mmaps for that. On linux, the field is always 0.
 * Return: 1 if we can free priv, 0 otherwise.
 */
/* call with NMG_LOCK held */
int
netmap_priv_delete(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	/* number of active references to this fd */
	if (--priv->np_refs > 0) {
		return 0;
	}
	nm_os_put_module();
	if (na) {
		netmap_do_unregif(priv);
	}
	netmap_unget_na(na, priv->np_ifp);
	bzero(priv, sizeof(*priv));	/* for safety */
	nm_os_free(priv);
	return 1;
}

/* call with NMG_LOCK *not* held */
void
netmap_dtor(void *data)
{
	struct netmap_priv_d *priv = data;

	NMG_LOCK();
	netmap_priv_delete(priv);
	NMG_UNLOCK();
}


/*
 * Handlers for synchronization of the rings from/to the host stack.
 * These are associated to a network interface and are just another
 * ring pair managed by userspace.
 *
 * Netmap also supports transparent forwarding (NS_FORWARD and NR_FORWARD
 * flags):
 *
 * - Before releasing buffers on hw RX rings, the application can mark
 *   them with the NS_FORWARD flag. During the next RXSYNC or poll(), they
 *   will be forwarded to the host stack, similarly to what would happen if
 *   the application moved them to the host TX ring.
 *
 * - Before releasing buffers on the host RX ring, the application can
 *   mark them with the NS_FORWARD flag. During the next RXSYNC or poll(),
 *   they will be forwarded to the hw TX rings, saving the application
 *   from doing the same task in user-space.
 *
 * Transparent forwarding can be enabled per-ring, by setting the NR_FORWARD
 * flag, or globally with the netmap_fwd sysctl.
 *
 * The transfer NIC --> host is relatively easy, just encapsulate
 * into mbufs and we are done. The host --> NIC side is slightly
 * harder because there might not be room in the tx ring so it
 * might take a while before releasing the buffer.
 */
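
/*
 * From the application side, opting in to transparent forwarding is
 * just a matter of setting the flags before releasing the slots (a
 * sketch; assumes a ring obtained as in the usage example at the top
 * of this file):
 *
 *	rxr->flags |= NR_FORWARD;	// per-ring opt-in (or set the
 *					// netmap_fwd sysctl globally)
 *	slot->flags |= NS_FORWARD;	// forward this particular buffer
 *	rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
 *	ioctl(fd, NIOCRXSYNC, NULL);	// forwarding happens during the sync
 */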

/*
 * Pass a whole queue of mbufs to the host stack as coming from 'dst'.
 * We do not need to lock because the queue is private.
 * After this call the queue is empty.
 */
static void
netmap_send_up(struct ifnet *dst, struct mbq *q)
{
	struct mbuf *m;
	struct mbuf *head = NULL, *prev = NULL;

	/* Send packets up, outside the lock; head/prev machinery
	 * is only useful for Windows. */
	while ((m = mbq_dequeue(q)) != NULL) {
		if (netmap_verbose & NM_VERB_HOST)
			D("sending up pkt %p size %d", m, MBUF_LEN(m));
		prev = nm_os_send_up(dst, m, prev);
		if (head == NULL)
			head = prev;
	}
	if (head)
		nm_os_send_up(dst, NULL, head);
	mbq_fini(q);
}


/*
 * Scan the buffers from hwcur to ring->head, and put a copy of those
 * marked NS_FORWARD (or all of them if forced) into a queue of mbufs.
 * Drop remaining packets in the unlikely event
 * of an mbuf shortage.
 */
static void
netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force)
{
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	u_int n;
	struct netmap_adapter *na = kring->na;

	for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) {
		struct mbuf *m;
		struct netmap_slot *slot = &kring->ring->slot[n];

		if ((slot->flags & NS_FORWARD) == 0 && !force)
			continue;
		if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) {
			RD(5, "bad pkt at %d len %d", n, slot->len);
			continue;
		}
		slot->flags &= ~NS_FORWARD; // XXX needed ?
		/* XXX TODO: adapt to the case of a multisegment packet */
		m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL);

		if (m == NULL)
			break;
		mbq_enqueue(q, m);
	}
}

static inline int
_nm_may_forward(struct netmap_kring *kring)
{
	return	((netmap_fwd || kring->ring->flags & NR_FORWARD) &&
		 kring->na->na_flags & NAF_HOST_RINGS &&
		 kring->tx == NR_RX);
}

static inline int
nm_may_forward_up(struct netmap_kring *kring)
{
	return	_nm_may_forward(kring) &&
		 kring->ring_id != kring->na->num_rx_rings;
}

static inline int
nm_may_forward_down(struct netmap_kring *kring, int sync_flags)
{
	return	_nm_may_forward(kring) &&
		 (sync_flags & NAF_CAN_FORWARD_DOWN) &&
		 kring->ring_id == kring->na->num_rx_rings;
}

/*
 * Send to the NIC rings packets marked NS_FORWARD between
 * kring->nr_hwcur and kring->rhead.
 * Called under kring->rx_queue.lock on the sw rx ring.
 *
 * It can only be called if the user opened all the TX hw rings,
 * see NAF_CAN_FORWARD_DOWN flag.
 * We can touch the TX netmap rings (slots, head and cur) since
 * we are in poll/ioctl system call context, and the application
 * is not supposed to touch the ring (using a different thread)
 * during the execution of the system call.
 */
static u_int
netmap_sw_to_nic(struct netmap_adapter *na)
{
	struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings];
	struct netmap_slot *rxslot = kring->ring->slot;
	u_int i, rxcur = kring->nr_hwcur;
	u_int const head = kring->rhead;
	u_int const src_lim = kring->nkr_num_slots - 1;
	u_int sent = 0;

	/* scan rings to find space, then fill as much as possible */
	for (i = 0; i < na->num_tx_rings; i++) {
		struct netmap_kring *kdst = &na->tx_rings[i];
		struct netmap_ring *rdst = kdst->ring;
		u_int const dst_lim = kdst->nkr_num_slots - 1;

		/* XXX do we trust ring or kring->rcur,rtail ? */
		for (; rxcur != head && !nm_ring_empty(rdst);
		     rxcur = nm_next(rxcur, src_lim) ) {
			struct netmap_slot *src, *dst, tmp;
			u_int dst_head = rdst->head;

			src = &rxslot[rxcur];
			if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd)
				continue;

			sent++;

			dst = &rdst->slot[dst_head];
			tmp = *src;

			src->buf_idx = dst->buf_idx;
			src->flags = NS_BUF_CHANGED;

			dst->buf_idx = tmp.buf_idx;
			dst->len = tmp.len;
			dst->flags = NS_BUF_CHANGED;

			rdst->head = rdst->cur = nm_next(dst_head, dst_lim);
		}
		/* if (sent) XXX txsync ? it would be just an optimization */
	}
	return sent;
}


/*
 * netmap_txsync_to_host() passes packets up. We are called from a
 * system call in user process context, and the only contention
 * can be among multiple user threads erroneously calling
 * this routine concurrently.
 */
static int
netmap_txsync_to_host(struct netmap_kring *kring, int flags)
{
	struct netmap_adapter *na = kring->na;
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	struct mbq q;

	/* Take packets from hwcur to head and pass them up.
	 * Force hwcur = head since netmap_grab_packets() stops at head
	 */
	mbq_init(&q);
	netmap_grab_packets(kring, &q, 1 /* force */);
	ND("have %d pkts in queue", mbq_len(&q));
	kring->nr_hwcur = head;
	kring->nr_hwtail = head + lim;
	if (kring->nr_hwtail > lim)
		kring->nr_hwtail -= lim + 1;

	netmap_send_up(na->ifp, &q);
	return 0;
}


/*
 * rxsync backend for packets coming from the host stack.
 * They have been put in kring->rx_queue by netmap_transmit().
 * We protect access to the kring using kring->rx_queue.lock.
 *
 * This routine also moves to the nic hw rings any packet the user has marked
 * for transparent-mode forwarding, then sets the NR_FORWARD
 * flag in the kring to let the caller push them out.
 */
static int
netmap_rxsync_from_host(struct netmap_kring *kring, int flags)
{
	struct netmap_adapter *na = kring->na;
	struct netmap_ring *ring = kring->ring;
	u_int nm_i, n;
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	int ret = 0;
	struct mbq *q = &kring->rx_queue, fq;

	mbq_init(&fq); /* fq holds packets to be freed */

	mbq_lock(q);

	/* First part: import newly received packets */
	n = mbq_len(q);
	if (n) { /* grab packets from the queue */
		struct mbuf *m;
		uint32_t stop_i;

		nm_i = kring->nr_hwtail;
		stop_i = nm_prev(kring->nr_hwcur, lim);
		while ( nm_i != stop_i && (m = mbq_dequeue(q)) != NULL ) {
			int len = MBUF_LEN(m);
			struct netmap_slot *slot = &ring->slot[nm_i];

			m_copydata(m, 0, len, NMB(na, slot));
			ND("nm %d len %d", nm_i, len);
			if (netmap_verbose)
				D("%s", nm_dump_buf(NMB(na, slot), len, 128, NULL));

			slot->len = len;
			slot->flags = kring->nkr_slot_flags;
			nm_i = nm_next(nm_i, lim);
			mbq_enqueue(&fq, m);
		}
		kring->nr_hwtail = nm_i;
	}

	/*
	 * Second part: skip past packets that userspace has released.
	 */
	nm_i = kring->nr_hwcur;
	if (nm_i != head) { /* something was released */
		if (nm_may_forward_down(kring, flags)) {
			ret = netmap_sw_to_nic(na);
			if (ret > 0) {
				kring->nr_kflags |= NR_FORWARD;
				ret = 0;
			}
		}
		kring->nr_hwcur = head;
	}

	mbq_unlock(q);

	mbq_purge(&fq);
	mbq_fini(&fq);

	return ret;
}


/* Get a netmap adapter for the port.
 *
 * If it is possible to satisfy the request, return 0
 * with *na containing the netmap adapter found.
 * Otherwise return an error code, with *na containing NULL.
 *
 * When the port is attached to a bridge, we always return
 * EBUSY.
 * Otherwise, if the port is already bound to a file descriptor,
 * then we unconditionally return the existing adapter into *na.
 * In all the other cases, we return (into *na) either native,
 * generic or NULL, according to the following table:
 *
 *                                      native_support
 * active_fds   dev.netmap.admode         YES      NO
 * -------------------------------------------------------
 *    >0              *                 NA(ifp)  NA(ifp)
 *
 *     0        NETMAP_ADMODE_BEST      NATIVE   GENERIC
 *     0        NETMAP_ADMODE_NATIVE    NATIVE   NULL
 *     0        NETMAP_ADMODE_GENERIC   GENERIC  GENERIC
 *
 */
static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */
int
netmap_get_hw_na(struct ifnet *ifp, struct netmap_mem_d *nmd, struct netmap_adapter **na)
{
	/* generic support */
	int i = netmap_admode;	/* Take a snapshot. */
	struct netmap_adapter *prev_na;
	int error = 0;

	*na = NULL; /* default */

	/* reset in case of invalid value */
	if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST)
		i = netmap_admode = NETMAP_ADMODE_BEST;

	if (NM_NA_VALID(ifp)) {
		prev_na = NA(ifp);
		/* If an adapter already exists, return it if
		 * there are active file descriptors or if
		 * netmap is not forced to use generic
		 * adapters.
		 */
		if (NETMAP_OWNED_BY_ANY(prev_na)
			|| i != NETMAP_ADMODE_GENERIC
			|| prev_na->na_flags & NAF_FORCE_NATIVE
#ifdef WITH_PIPES
			/* ugly, but we cannot allow an adapter switch
			 * if some pipe is referring to this one
			 */
			|| prev_na->na_next_pipe > 0
#endif
		) {
			*na = prev_na;
			goto assign_mem;
		}
	}

	/* If there isn't native support and netmap is not allowed
	 * to use generic adapters, we cannot satisfy the request.
	 */
	if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE)
		return EOPNOTSUPP;

	/* Otherwise, create a generic adapter and return it,
	 * saving the previously used netmap adapter, if any.
	 *
	 * Note that here 'prev_na', if not NULL, MUST be a
	 * native adapter, and CANNOT be a generic one. This is
	 * true because generic adapters are created on demand, and
	 * destroyed when not used anymore. Therefore, if the adapter
	 * currently attached to an interface 'ifp' is generic, it
	 * must be that
	 * (NA(ifp)->active_fds > 0 || NETMAP_OWNED_BY_KERN(NA(ifp))).
	 * Consequently, if NA(ifp) is generic, we will enter one of
	 * the branches above. This ensures that we never override
	 * a generic adapter with another generic adapter.
	 */
	error = generic_netmap_attach(ifp);
	if (error)
		return error;

	*na = NA(ifp);

assign_mem:
	if (nmd != NULL && !((*na)->na_flags & NAF_MEM_OWNER) &&
	    (*na)->active_fds == 0 && ((*na)->nm_mem != nmd)) {
		netmap_mem_put((*na)->nm_mem);
		(*na)->nm_mem = netmap_mem_get(nmd);
	}

	return 0;
}

/*
 * MUST BE CALLED UNDER NMG_LOCK()
 *
 * Get a refcounted reference to a netmap adapter attached
 * to the interface specified by nmr.
 * This is always called in the execution of an ioctl().
 *
 * Return ENXIO if the interface specified by the request does
 * not exist, ENOTSUP if netmap is not supported by the interface,
 * EBUSY if the interface is already attached to a bridge,
 * EINVAL if parameters are invalid, ENOMEM if needed resources
 * could not be allocated.
 * If successful, hold a reference to the netmap adapter.
 *
 * If the interface specified by nmr is a system one, also keep
 * a reference to it and return a valid *ifp.
 */
int
netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na,
	      struct ifnet **ifp, struct netmap_mem_d *nmd, int create)
{
	int error = 0;
	struct netmap_adapter *ret = NULL;
	int nmd_ref = 0;

	*na = NULL;     /* default return value */
	*ifp = NULL;

	NMG_LOCK_ASSERT();

	/* if the request contains a memid, try to find the
	 * corresponding memory region
	 */
	if (nmd == NULL && nmr->nr_arg2) {
		nmd = netmap_mem_find(nmr->nr_arg2);
		if (nmd == NULL)
			return EINVAL;
		/* keep the reference */
		nmd_ref = 1;
	}

	/* We cascade through all possible types of netmap adapter.
	 * All netmap_get_*_na() functions return an error and an na,
	 * with the following combinations:
	 *
	 *   error    na
	 *     0	NULL	type doesn't match
	 *    !0	NULL	type matches, but na creation/lookup failed
	 *     0	!NULL	type matches and na created/found
	 *    !0	!NULL	impossible
	 */

	/* try to see if this is a ptnetmap port */
	error = netmap_get_pt_host_na(nmr, na, nmd, create);
	if (error || *na != NULL)
		goto out;

	/* try to see if this is a monitor port */
	error = netmap_get_monitor_na(nmr, na, nmd, create);
	if (error || *na != NULL)
		goto out;

	/* try to see if this is a pipe port */
	error = netmap_get_pipe_na(nmr, na, nmd, create);
	if (error || *na != NULL)
		goto out;

	/* try to see if this is a bridge port */
	error = netmap_get_bdg_na(nmr, na, nmd, create);
	if (error)
		goto out;

	if (*na != NULL) /* valid match in netmap_get_bdg_na() */
		goto out;

	/*
	 * This must be a hardware na, look up the name in the system.
	 * Note that by hardware we actually mean "it shows up in ifconfig".
	 * This may still be a tap, a veth/epair, or even a
	 * persistent VALE port.
	 */
	*ifp = ifunit_ref(nmr->nr_name);
	if (*ifp == NULL) {
		error = ENXIO;
		goto out;
	}

	error = netmap_get_hw_na(*ifp, nmd, &ret);
	if (error)
		goto out;

	*na = ret;
	netmap_adapter_get(ret);

out:
	if (error) {
		if (ret)
			netmap_adapter_put(ret);
		if (*ifp) {
			if_rele(*ifp);
			*ifp = NULL;
		}
	}
	if (nmd_ref)
		netmap_mem_put(nmd);

	return error;
}

/* undo netmap_get_na() */
void
netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp)
{
	if (ifp)
		if_rele(ifp);
	if (na)
		netmap_adapter_put(na);
}

#define NM_FAIL_ON(t) do {						\
	if (unlikely(t)) {						\
		RD(5, "%s: fail '" #t "' "				\
			"h %d c %d t %d "				\
			"rh %d rc %d rt %d "				\
			"hc %d ht %d",					\
			kring->name,					\
			head, cur, ring->tail,				\
			kring->rhead, kring->rcur, kring->rtail,	\
			kring->nr_hwcur, kring->nr_hwtail);		\
		return kring->nkr_num_slots;				\
	}								\
} while (0)

/*
 * validate parameters on entry for *_txsync()
 * Returns ring->head if ok, or something >= kring->nkr_num_slots
 * in case of error.
 *
 * rhead, rcur and rtail=hwtail are stored from previous round.
 * hwcur is the next packet to send to the ring.
 *
 * We want
 *    hwcur <= *rhead <= head <= cur <= tail = *rtail <= hwtail
 *
 * hwcur, rhead, rtail and hwtail are reliable
 */
u_int
nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
{
	u_int head = ring->head; /* read only once */
	u_int cur = ring->cur; /* read only once */
	u_int n = kring->nkr_num_slots;

	ND(5, "%s kcur %d ktail %d head %d cur %d tail %d",
		kring->name,
		kring->nr_hwcur, kring->nr_hwtail,
		ring->head, ring->cur, ring->tail);
#if 1 /* kernel sanity checks; but we can trust the kring. */
	NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n ||
	    kring->rtail >= n ||  kring->nr_hwtail >= n);
#endif /* kernel sanity checks */
	/*
	 * user sanity checks. We only use head,
	 * A, B, ... are possible positions for head:
	 *
	 *  0    A  rhead   B  rtail   C  n-1
	 *  0    D  rtail   E  rhead   F  n-1
	 *
	 * B, F, D are valid. A, C, E are wrong
	 */
	if (kring->rtail >= kring->rhead) {
		/* want rhead <= head <= rtail */
		NM_FAIL_ON(head < kring->rhead || head > kring->rtail);
		/* and also head <= cur <= rtail */
		NM_FAIL_ON(cur < head || cur > kring->rtail);
	} else { /* here rtail < rhead */
		/* we need head outside rtail .. rhead */
		NM_FAIL_ON(head > kring->rtail && head < kring->rhead);

		/* two cases now: head <= rtail or head >= rhead  */
		if (head <= kring->rtail) {
			/* want head <= cur <= rtail */
			NM_FAIL_ON(cur < head || cur > kring->rtail);
		} else { /* head >= rhead */
			/* cur must be outside rtail..head */
			NM_FAIL_ON(cur > kring->rtail && cur < head);
		}
	}
	if (ring->tail != kring->rtail) {
		RD(5, "%s tail overwritten was %d need %d", kring->name,
			ring->tail, kring->rtail);
		ring->tail = kring->rtail;
	}
	kring->rhead = head;
	kring->rcur = cur;
	return head;
}


/*
 * validate parameters on entry for *_rxsync()
 * Returns ring->head if ok, kring->nkr_num_slots on error.
 *
 * For a valid configuration,
 * hwcur <= head <= cur <= tail <= hwtail
 *
 * We only consider head and cur.
 * hwcur and hwtail are reliable.
 */
u_int
nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
{
	uint32_t const n = kring->nkr_num_slots;
	uint32_t head, cur;

	ND(5,"%s kc %d kt %d h %d c %d t %d",
		kring->name,
		kring->nr_hwcur, kring->nr_hwtail,
		ring->head, ring->cur, ring->tail);
	/*
	 * Before storing the new values, we should check they do not
	 * move backwards. However:
	 * - head is not an issue because the previous value is hwcur;
	 * - cur could in principle go back, however it does not matter
	 *   because we are processing a brand new rxsync()
	 */
	cur = kring->rcur = ring->cur;	/* read only once */
	head = kring->rhead = ring->head;	/* read only once */
#if 1 /* kernel sanity checks */
	NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n);
#endif /* kernel sanity checks */
	/* user sanity checks */
	if (kring->nr_hwtail >= kring->nr_hwcur) {
		/* want hwcur <= rhead <= hwtail */
		NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail);
		/* and also rhead <= rcur <= hwtail */
		NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
	} else {
		/* we need rhead outside hwtail..hwcur */
		NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail);
		/* two cases now: head <= hwtail or head >= hwcur  */
		if (head <= kring->nr_hwtail) {
			/* want head <= cur <= hwtail */
			NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
		} else {
			/* cur must be outside hwtail..head */
			NM_FAIL_ON(cur < head && cur > kring->nr_hwtail);
		}
	}
	if (ring->tail != kring->rtail) {
		RD(5, "%s tail overwritten was %d need %d",
			kring->name,
			ring->tail, kring->rtail);
		ring->tail = kring->rtail;
	}
	return head;
}

/*
 * Error routine called when txsync/rxsync detects an error.
 * Can't do much more than resetting head = cur = hwcur, tail = hwtail
 * Return 1 on reinit.
 *
 * This routine is only called by the upper half of the kernel.
 * It only reads hwcur (which is changed only by the upper half, too)
 * and hwtail (which may be changed by the lower half, but only on
 * a tx ring and only to increase it, so any error will be recovered
 * on the next call). For the above, we don't strictly need to call
 * it under lock.
 */
int
netmap_ring_reinit(struct netmap_kring *kring)
{
	struct netmap_ring *ring = kring->ring;
	u_int i, lim = kring->nkr_num_slots - 1;
	int errors = 0;

	// XXX KASSERT nm_kr_tryget
	RD(10, "called for %s", kring->name);
	// XXX probably wrong to trust userspace
	kring->rhead = ring->head;
	kring->rcur  = ring->cur;
	kring->rtail = ring->tail;

	if (ring->cur > lim)
		errors++;
	if (ring->head > lim)
		errors++;
	if (ring->tail > lim)
		errors++;
	for (i = 0; i <= lim; i++) {
		u_int idx = ring->slot[i].buf_idx;
		u_int len = ring->slot[i].len;
		if (idx < 2 || idx >= kring->na->na_lut.objtotal) {
			RD(5, "bad index at slot %d idx %d len %d ", i, idx, len);
			ring->slot[i].buf_idx = 0;
			ring->slot[i].len = 0;
		} else if (len > NETMAP_BUF_SIZE(kring->na)) {
			ring->slot[i].len = 0;
			RD(5, "bad len at slot %d idx %d len %d", i, idx, len);
		}
	}
	if (errors) {
		RD(10, "total %d errors", errors);
		RD(10, "%s reinit, cur %d -> %d tail %d -> %d",
			kring->name,
			ring->cur, kring->nr_hwcur,
			ring->tail, kring->nr_hwtail);
		ring->head = kring->rhead = kring->nr_hwcur;
		ring->cur  = kring->rcur  = kring->nr_hwcur;
		ring->tail = kring->rtail = kring->nr_hwtail;
	}
	return (errors ? 1 : 0);
}

/* interpret the ringid and flags fields of an nmreq, by translating them
 * into a pair of intervals of ring indices:
 *
 * [priv->np_txqfirst, priv->np_txqlast) and
 * [priv->np_rxqfirst, priv->np_rxqlast)
 *
 */
int
netmap_interp_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
{
	struct netmap_adapter *na = priv->np_na;
	u_int j, i = ringid & NETMAP_RING_MASK;
	u_int reg = flags & NR_REG_MASK;
	int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY };
	enum txrx t;

	if (reg == NR_REG_DEFAULT) {
		/* convert from old ringid to flags */
		if (ringid & NETMAP_SW_RING) {
			reg = NR_REG_SW;
		} else if (ringid & NETMAP_HW_RING) {
			reg = NR_REG_ONE_NIC;
		} else {
			reg = NR_REG_ALL_NIC;
		}
		D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg);
	}

	if ((flags & NR_PTNETMAP_HOST) && ((reg != NR_REG_ALL_NIC &&
	    reg != NR_REG_PIPE_MASTER && reg != NR_REG_PIPE_SLAVE) ||
	    flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) {
		D("Error: only NR_REG_ALL_NIC supported with netmap passthrough");
		return EINVAL;
	}

	for_rx_tx(t) {
		if (flags & excluded_direction[t]) {
			priv->np_qfirst[t] = priv->np_qlast[t] = 0;
			continue;
		}
		switch (reg) {
		case NR_REG_ALL_NIC:
		case NR_REG_PIPE_MASTER:
		case NR_REG_PIPE_SLAVE:
			priv->np_qfirst[t] = 0;
			priv->np_qlast[t] = nma_get_nrings(na, t);
			ND("ALL/PIPE: %s %d %d", nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		case NR_REG_SW:
		case NR_REG_NIC_SW:
			if (!(na->na_flags & NAF_HOST_RINGS)) {
				D("host rings not supported");
				return EINVAL;
			}
			priv->np_qfirst[t] = (reg == NR_REG_SW ?
				nma_get_nrings(na, t) : 0);
			priv->np_qlast[t] = nma_get_nrings(na, t) + 1;
			ND("%s: %s %d %d", reg == NR_REG_SW ? "SW" : "NIC+SW",
				nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		case NR_REG_ONE_NIC:
			if (i >= na->num_tx_rings && i >= na->num_rx_rings) {
				D("invalid ring id %d", i);
				return EINVAL;
			}
			/* if not enough rings, use the first one */
			j = i;
			if (j >= nma_get_nrings(na, t))
				j = 0;
			priv->np_qfirst[t] = j;
			priv->np_qlast[t] = j + 1;
			ND("ONE_NIC: %s %d %d", nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		default:
			D("invalid regif type %d", reg);
			return EINVAL;
		}
	}
	priv->np_flags = (flags & ~NR_REG_MASK) | reg;

	/* Allow transparent forwarding mode in the host --> nic
	 * direction only if all the TX hw rings have been opened. */
	if (priv->np_qfirst[NR_TX] == 0 &&
	    priv->np_qlast[NR_TX] >= na->num_tx_rings) {
		priv->np_sync_flags |= NAF_CAN_FORWARD_DOWN;
	}

	if (netmap_verbose) {
		D("%s: tx [%d,%d) rx [%d,%d) id %d",
			na->name,
			priv->np_qfirst[NR_TX],
			priv->np_qlast[NR_TX],
			priv->np_qfirst[NR_RX],
			priv->np_qlast[NR_RX],
			i);
	}
	return 0;
}

/*
 * Set the ring ID. For devices with a single queue, a request
 * for all rings is the same as a single ring.
 */
static int
netmap_set_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
{
	struct netmap_adapter *na = priv->np_na;
	int error;
	enum txrx t;

	error = netmap_interp_ringid(priv, ringid, flags);
	if (error) {
		return error;
	}

	priv->np_txpoll = (ringid & NETMAP_NO_TX_POLL) ? 0 : 1;

	/* optimization: count the users registered for more than
	 * one ring, which are the ones sleeping on the global queue.
	 * The default netmap_notify() callback will then
	 * avoid signaling the global queue if nobody is using it
	 */
	for_rx_tx(t) {
		if (nm_si_user(priv, t))
			na->si_users[t]++;
	}
	return 0;
}

static void
netmap_unset_ringid(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;
	enum txrx t;

	for_rx_tx(t) {
		if (nm_si_user(priv, t))
			na->si_users[t]--;
		priv->np_qfirst[t] = priv->np_qlast[t] = 0;
	}
	priv->np_txpoll = 0;
}

/* Set the nr_pending_mode for the requested rings.
 * If requested, also try to get exclusive access to the rings, provided
 * the rings we want to bind are not exclusively owned by a previous bind.
 */
static int
netmap_krings_get(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;
	u_int i;
	struct netmap_kring *kring;
	int excl = (priv->np_flags & NR_EXCLUSIVE);
	enum txrx t;

	ND("%s: grabbing tx [%d, %d) rx [%d, %d)",
			na->name,
			priv->np_qfirst[NR_TX],
			priv->np_qlast[NR_TX],
			priv->np_qfirst[NR_RX],
			priv->np_qlast[NR_RX]);

	/* first round: check that all the requested rings
	 * are neither already exclusively owned, nor we
	 * want exclusive ownership when they are already in use
	 */
	for_rx_tx(t) {
		for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
			kring = &NMR(na, t)[i];
			if ((kring->nr_kflags & NKR_EXCLUSIVE) ||
			    (kring->users && excl))
			{
				ND("ring %s busy", kring->name);
				return EBUSY;
			}
		}
	}

	/* second round: increment usage count (possibly marking them
	 * as exclusive) and set the nr_pending_mode
	 */
	for_rx_tx(t) {
		for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
			kring = &NMR(na, t)[i];
			kring->users++;
			if (excl)
				kring->nr_kflags |= NKR_EXCLUSIVE;
			kring->nr_pending_mode = NKR_NETMAP_ON;
		}
	}

	return 0;
}

/* Undo netmap_krings_get(). This is done by clearing the exclusive mode
 * if it was asked on regif, and unset the nr_pending_mode if we are the
 * last users of the involved rings. */
static void
netmap_krings_put(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;
	u_int i;
	struct netmap_kring *kring;
	int excl = (priv->np_flags & NR_EXCLUSIVE);
	enum txrx t;

	ND("%s: releasing tx [%d, %d) rx [%d, %d)",
			na->name,
			priv->np_qfirst[NR_TX],
			priv->np_qlast[NR_TX],
			priv->np_qfirst[NR_RX],
			priv->np_qlast[NR_RX]);

	for_rx_tx(t) {
		for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
			kring = &NMR(na, t)[i];
			if (excl)
				kring->nr_kflags &= ~NKR_EXCLUSIVE;
			kring->users--;
			if (kring->users == 0)
				kring->nr_pending_mode = NKR_NETMAP_OFF;
		}
	}
}
1970 * possibly move the interface to netmap-mode.
1971 * If success it returns a pointer to netmap_if, otherwise NULL.
1972 * This must be called with NMG_LOCK held.
1974 * The following na callbacks are called in the process:
1976 * na->nm_config() [by netmap_update_config]
1977 * (get current number and size of rings)
1979 * We have a generic one for linux (netmap_linux_config).
1980 * The bwrap has to override this, since it has to forward
1981 * the request to the wrapped adapter (netmap_bwrap_config).
1984 * na->nm_krings_create()
1985 * (create and init the krings array)
1987 * One of the following:
1989 * * netmap_hw_krings_create, (hw ports)
1990 * creates the standard layout for the krings
1991 * and adds the mbq (used for the host rings).
1993 * * netmap_vp_krings_create (VALE ports)
1994 * add leases and scratchpads
1996 * * netmap_pipe_krings_create (pipes)
1997 * create the krings and rings of both ends and
2000 * * netmap_monitor_krings_create (monitors)
2001 * avoid allocating the mbq
2003 * * netmap_bwrap_krings_create (bwraps)
2004 * create both the brap krings array,
2005 * the krings array of the wrapped adapter, and
2006 * (if needed) the fake array for the host adapter
2008 * na->nm_register(, 1)
2009 * (put the adapter in netmap mode)
2011 * This may be one of the following:
2013 * * netmap_hw_reg (hw ports)
2014 * checks that the ifp is still there, then calls
2015 * the hardware specific callback;
2017 * * netmap_vp_reg (VALE ports)
2018 * If the port is connected to a bridge,
2019 * set the NAF_NETMAP_ON flag under the
2020 * bridge write lock.
2022 * * netmap_pipe_reg (pipes)
2023 * inform the other pipe end that it is no
2024 * longer responsible for the lifetime of this
2027 * * netmap_monitor_reg (monitors)
2028 * intercept the sync callbacks of the monitored
2031 * * netmap_bwrap_reg (bwraps)
2032 * cross-link the bwrap and hwna rings,
2033 * forward the request to the hwna, override
2034 * the hwna notify callback (so that frames
2035 * coming from outside go through the bridge).
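/*
 * Example: a hardware-specific register callback, as invoked through
 * netmap_hw_reg(). This is a minimal sketch, not taken from a real
 * driver; the foo_* names are hypothetical:
 *
 *	static int
 *	foo_netmap_reg(struct netmap_adapter *na, int onoff)
 *	{
 *		struct foo_softc *sc = na->ifp->if_softc;
 *
 *		foo_stop(sc);			// quiesce the NIC
 *		if (onoff)
 *			nm_set_native_flags(na);
 *		else
 *			nm_clear_native_flags(na);
 *		foo_init(sc);			// reinit rings, reenable interrupts
 *		return (0);
 *	}
 */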
2040 netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na,
2041 uint16_t ringid, uint32_t flags)
2043 struct netmap_if *nifp = NULL;
2047 /* ring configuration may have changed, fetch from the card */
2048 netmap_update_config(na);
2049 priv->np_na = na; /* store the reference */
2050 error = netmap_set_ringid(priv, ringid, flags);
2053 error = netmap_mem_finalize(na->nm_mem, na);
2057 if (na->active_fds == 0) {
2059 * If this is the first registration of the adapter,
2060 * create the in-kernel view of the netmap rings,
2061 * the netmap krings.
2065 * Depending on the adapter, this may also create
2066 * the netmap rings themselves
2068 error = na->nm_krings_create(na);
2074 /* now the krings must exist and we can check whether some
2075 * previous bind has exclusive ownership on them, and set
2078 error = netmap_krings_get(priv);
2080 goto err_del_krings;
2082 /* create all needed missing netmap rings */
2083 error = netmap_mem_rings_create(na);
2087 /* in all cases, create a new netmap if */
2088 nifp = netmap_mem_if_new(na, priv);
2094 if (na->active_fds == 0) {
2095 /* cache the allocator info in the na */
2096 error = netmap_mem_get_lut(na->nm_mem, &na->na_lut);
2099 ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal,
2100 na->na_lut.objsize);
2103 if (nm_kring_pending(priv)) {
2104 /* Some kring is switching mode, tell the adapter to
2106 error = na->nm_register(na, 1);
2111 /* Commit the reference. */
2115 * advertise that the interface is ready by setting np_nifp.
2116 * The barrier is needed because readers (poll, *SYNC and mmap)
2117 * check for priv->np_nifp != NULL without locking
2119 mb(); /* make sure previous writes are visible to all CPUs */
2120 priv->np_nifp = nifp;
2125 if (na->active_fds == 0)
2126 memset(&na->na_lut, 0, sizeof(na->na_lut));
2128 netmap_mem_if_delete(na, nifp);
2130 netmap_krings_put(priv);
2132 netmap_mem_rings_delete(na);
2134 if (na->active_fds == 0)
2135 na->nm_krings_delete(na);
2137 netmap_mem_deref(na->nm_mem, na);
2145 * update kring and ring at the end of rxsync/txsync.
2148 nm_sync_finalize(struct netmap_kring *kring)
2151 * Update ring tail to what the kernel knows
2152 * After txsync: head/rhead/hwcur might be behind cur/rcur
2155 kring->ring->tail = kring->rtail = kring->nr_hwtail;
2157 ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d",
2158 kring->name, kring->nr_hwcur, kring->nr_hwtail,
2159 kring->rhead, kring->rcur, kring->rtail);
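/*
 * Worked example (RX ring with 8 slots): suppose userspace left
 * head = cur = 2 and the kernel has received frames up to
 * nr_hwtail = 6. After nm_sync_finalize(), ring->tail == 6, so
 * userspace owns slots 2..5 and, once done with them, advances
 * head = cur = 6 before the next sync.
 */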
2162 /* set ring timestamp */
2164 ring_timestamp_set(struct netmap_ring *ring)
2166 if (netmap_no_timestamp == 0 || ring->flags & NR_TIMESTAMP) {
2167 microtime(&ring->ts);
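/*
 * Usage note: when the netmap_no_timestamp sysctl is set, only rings
 * that explicitly ask for timestamps get them. Userspace sketch:
 *
 *	struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
 *
 *	ring->flags |= NR_TIMESTAMP;
 *	ioctl(fd, NIOCRXSYNC, NULL);	// ring->ts is refreshed here
 */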
2173 * ioctl(2) support for the "netmap" device.
2175 * The following commands are accepted:
2177 * - SIOCGIFADDR just for convenience
2182 * Return 0 on success, errno otherwise.
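/*
 * Example (userspace sketch): probing an interface with NIOCGINFO;
 * "em0" is a hypothetical NIC name:
 *
 *	struct nmreq req;
 *	int fd = open("/dev/netmap", O_RDWR);
 *
 *	bzero(&req, sizeof(req));
 *	req.nr_version = NETMAP_API;
 *	strncpy(req.nr_name, "em0", sizeof(req.nr_name));
 *	if (ioctl(fd, NIOCGINFO, &req) == 0)
 *		printf("%u tx rings, %u slots, memsize %u\n",
 *		    req.nr_tx_rings, req.nr_tx_slots, req.nr_memsize);
 */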
2185 netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td)
2187 struct mbq q; /* packets from RX hw queues to host stack */
2188 struct nmreq *nmr = (struct nmreq *) data;
2189 struct netmap_adapter *na = NULL;
2190 struct netmap_mem_d *nmd = NULL;
2191 struct ifnet *ifp = NULL;
2193 u_int i, qfirst, qlast;
2194 struct netmap_if *nifp;
2195 struct netmap_kring *krings;
2199 if (cmd == NIOCGINFO || cmd == NIOCREGIF) {
2201 nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0';
2202 if (nmr->nr_version != NETMAP_API) {
2203 D("API mismatch for %s got %d need %d",
2205 nmr->nr_version, NETMAP_API);
2206 nmr->nr_version = NETMAP_API;
2208 if (nmr->nr_version < NETMAP_MIN_API ||
2209 nmr->nr_version > NETMAP_MAX_API) {
2215 case NIOCGINFO: /* return capabilities etc */
2216 if (nmr->nr_cmd == NETMAP_BDG_LIST) {
2217 error = netmap_bdg_ctl(nmr, NULL);
2223 /* memsize is always valid */
2226 if (nmr->nr_name[0] != '\0') {
2228 /* get a refcount */
2229 error = netmap_get_na(nmr, &na, &ifp, NULL, 1 /* create */);
2235 nmd = na->nm_mem; /* get memory allocator */
2237 nmd = netmap_mem_find(nmr->nr_arg2 ? nmr->nr_arg2 : 1);
2244 error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags,
2248 if (na == NULL) /* only memory info */
2251 nmr->nr_rx_slots = nmr->nr_tx_slots = 0;
2252 netmap_update_config(na);
2253 nmr->nr_rx_rings = na->num_rx_rings;
2254 nmr->nr_tx_rings = na->num_tx_rings;
2255 nmr->nr_rx_slots = na->num_rx_desc;
2256 nmr->nr_tx_slots = na->num_tx_desc;
2258 netmap_unget_na(na, ifp);
2264 * If nmr->nr_cmd is not zero, this NIOCREGIF is not really
2265 * a regif operation, but a different one, specified by the
2266 * value of nmr->nr_cmd.
2269 if (i == NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH
2270 || i == NETMAP_BDG_VNET_HDR
2271 || i == NETMAP_BDG_NEWIF
2272 || i == NETMAP_BDG_DELIF
2273 || i == NETMAP_BDG_POLLING_ON
2274 || i == NETMAP_BDG_POLLING_OFF) {
2275 /* possibly attach/detach NIC and VALE switch */
2276 error = netmap_bdg_ctl(nmr, NULL);
2278 } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) {
2279 /* forward the command to the ptnetmap subsystem */
2280 error = ptnetmap_ctl(nmr, priv->np_na);
2282 } else if (i == NETMAP_VNET_HDR_GET) {
2283 /* get vnet-header length for this netmap port */
2287 error = netmap_get_na(nmr, &na, &ifp, NULL, 0);
2289 nmr->nr_arg1 = na->virt_hdr_len;
2291 netmap_unget_na(na, ifp);
2294 } else if (i == NETMAP_POOLS_INFO_GET) {
2295 /* get information from the memory allocator */
2297 if (priv->np_na && priv->np_na->nm_mem) {
2298 struct netmap_mem_d *nmd = priv->np_na->nm_mem;
2299 error = netmap_mem_pools_info_get(nmr, nmd);
2305 } else if (i != 0) {
2306 D("nr_cmd must be 0 not %d", i);
2311 /* protect access to priv from concurrent NIOCREGIF */
2317 if (priv->np_nifp != NULL) { /* thread already registered */
2323 /* find the allocator and get a reference */
2324 nmd = netmap_mem_find(nmr->nr_arg2);
2330 /* find the interface and a reference */
2331 error = netmap_get_na(nmr, &na, &ifp, nmd,
2332 1 /* create */); /* keep reference */
2335 if (NETMAP_OWNED_BY_KERN(na)) {
2340 if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) {
2345 error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags);
2346 if (error) { /* reg. failed, release priv and ref */
2349 nifp = priv->np_nifp;
2350 priv->np_td = td; // XXX kqueue, debugging only
2352 /* return the offset of the netmap_if object */
2353 nmr->nr_rx_rings = na->num_rx_rings;
2354 nmr->nr_tx_rings = na->num_tx_rings;
2355 nmr->nr_rx_slots = na->num_rx_desc;
2356 nmr->nr_tx_slots = na->num_tx_desc;
2357 error = netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, &memflags,
2360 netmap_do_unregif(priv);
2363 if (memflags & NETMAP_MEM_PRIVATE) {
2364 *(uint32_t *)(uintptr_t)&nifp->ni_flags |= NI_PRIV_MEM;
2367 priv->np_si[t] = nm_si_user(priv, t) ?
2368 &na->si[t] : &NMR(na, t)[priv->np_qfirst[t]].si;
2373 D("requested %d extra buffers", nmr->nr_arg3);
2374 nmr->nr_arg3 = netmap_extra_alloc(na,
2375 &nifp->ni_bufs_head, nmr->nr_arg3);
2377 D("got %d extra buffers", nmr->nr_arg3);
2379 nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp);
2381 /* store ifp reference so that priv destructor may release it */
2385 netmap_unget_na(na, ifp);
2387 /* release the reference from netmap_mem_find() or
2388 * netmap_mem_ext_create()
2391 netmap_mem_put(nmd);
2397 nifp = priv->np_nifp;
2403 mb(); /* make sure following reads are not from cache */
2405 na = priv->np_na; /* we have a reference */
2408 D("Internal error: nifp != NULL && na == NULL");
2414 t = (cmd == NIOCTXSYNC ? NR_TX : NR_RX);
2415 krings = NMR(na, t);
2416 qfirst = priv->np_qfirst[t];
2417 qlast = priv->np_qlast[t];
2418 sync_flags = priv->np_sync_flags;
2420 for (i = qfirst; i < qlast; i++) {
2421 struct netmap_kring *kring = krings + i;
2422 struct netmap_ring *ring = kring->ring;
2424 if (unlikely(nm_kr_tryget(kring, 1, &error))) {
2425 error = (error ? EIO : 0);
2429 if (cmd == NIOCTXSYNC) {
2430 if (netmap_verbose & NM_VERB_TXSYNC)
2431 D("pre txsync ring %d cur %d hwcur %d",
2434 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2435 netmap_ring_reinit(kring);
2436 } else if (kring->nm_sync(kring, sync_flags | NAF_FORCE_RECLAIM) == 0) {
2437 nm_sync_finalize(kring);
2439 if (netmap_verbose & NM_VERB_TXSYNC)
2440 D("post txsync ring %d cur %d hwcur %d",
2444 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2445 netmap_ring_reinit(kring);
2447 if (nm_may_forward_up(kring)) {
2448 /* transparent forwarding, see netmap_poll() */
2449 netmap_grab_packets(kring, &q, netmap_fwd);
2451 if (kring->nm_sync(kring, sync_flags | NAF_FORCE_READ) == 0) {
2452 nm_sync_finalize(kring);
2454 ring_timestamp_set(ring);
2460 netmap_send_up(na->ifp, &q);
2467 error = netmap_bdg_config(nmr);
2473 ND("FIONBIO/FIOASYNC are no-ops");
2480 D("ignore BIOCIMMEDIATE/BIOCSHDRCMPLT/BIOCSHDRCMPLT/BIOCSSEESENT");
2483 default: /* allow device-specific ioctls */
2485 struct ifnet *ifp = ifunit_ref(nmr->nr_name);
2491 bzero(&so, sizeof(so));
2492 so.so_vnet = ifp->if_vnet;
2493 // so->so_proto not null.
2494 error = ifioctl(&so, cmd, data, td);
2511 * select(2) and poll(2) handlers for the "netmap" device.
2513 * Can be called for one or more queues.
2514 * Return the event mask corresponding to ready events.
2515 * If there are no ready events, do a selrecord on either individual
2516 * selinfo or on the global one.
2517 * Device-dependent parts (locking and sync of tx/rx rings)
2518 * are done through callbacks.
2520 * On linux, arguments are really pwait, the poll table, and 'td' is struct file *
2521 * The first one is remapped to pwait, as selrecord() uses the name as an argument.
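/*
 * Example (userspace sketch): the canonical receive loop served by
 * this handler; fd and nifp come from NIOCREGIF + mmap(), and
 * consume() is a hypothetical application function:
 *
 *	struct pollfd pfd = { .fd = fd, .events = POLLIN };
 *
 *	for (;;) {
 *		struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
 *
 *		poll(&pfd, 1, -1);
 *		while (!nm_ring_empty(ring)) {
 *			struct netmap_slot *slot = &ring->slot[ring->cur];
 *
 *			consume(NETMAP_BUF(ring, slot->buf_idx), slot->len);
 *			ring->head = ring->cur = nm_ring_next(ring, ring->cur);
 *		}
 *	}
 */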
2525 netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr)
2527 struct netmap_adapter *na;
2528 struct netmap_kring *kring;
2529 struct netmap_ring *ring;
2530 u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0;
2531 #define want_tx want[NR_TX]
2532 #define want_rx want[NR_RX]
2533 struct mbq q; /* packets from RX hw queues to host stack */
2537 * In order to avoid nested locks, we need to "double check"
2538 * txsync and rxsync if we decide to do a selrecord().
2539 * retry_tx (and retry_rx, later) prevent looping forever.
2541 int retry_tx = 1, retry_rx = 1;
2543 /* Transparent mode: send_down is 1 if we have found some
2544 * packets to forward (host RX ring --> NIC) during the rx
2545 * scan and we have not sent them down to the NIC yet.
2546 * Transparent mode requires binding all rings to a single
2550 int sync_flags = priv->np_sync_flags;
2554 if (priv->np_nifp == NULL) {
2555 D("No if registered");
2558 mb(); /* make sure following reads are not from cache */
2562 if (!nm_netmap_on(na))
2565 if (netmap_verbose & 0x8000)
2566 D("device %s events 0x%x", na->name, events);
2567 want_tx = events & (POLLOUT | POLLWRNORM);
2568 want_rx = events & (POLLIN | POLLRDNORM);
2571 * check_all_{tx|rx} are set if the card has more than one queue AND
2572 * the file descriptor is bound to all of them. If so, we sleep on
2573 * the "global" selinfo, otherwise we sleep on individual selinfo
2574 * (FreeBSD only allows two selinfo's per file descriptor).
2575 * The interrupt routine in the driver wakes one or the other
2576 * (or both) depending on which clients are active.
2578 * rxsync() is only called if we run out of buffers on a POLLIN.
2579 * txsync() is called if we run out of buffers on POLLOUT, or
2580 * there are pending packets to send. The latter can be disabled
2581 * passing NETMAP_NO_TX_POLL in the NIOCREG call.
2583 check_all_tx = nm_si_user(priv, NR_TX);
2584 check_all_rx = nm_si_user(priv, NR_RX);
2587 * We start with a lock free round which is cheap if we have
2588 * slots available. If this fails, then lock and call the sync
2591 #if 1 /* new code: call rx if any of the rings needs to release or read buffers */
2594 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) {
2595 kring = &NMR(na, t)[i];
2596 /* XXX compare ring->cur and kring->tail */
2597 if (!nm_ring_empty(kring->ring)) {
2599 want[t] = 0; /* also breaks the loop */
2604 want_rx = 0; /* look for a reason to run the handlers */
2606 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
2607 kring = &NMR(na, t)[i];
2608 if (kring->ring->cur == kring->ring->tail /* try to fetch new buffers */
2609 || kring->rhead != kring->ring->head /* release buffers */) {
2614 revents |= events & (POLLIN | POLLRDNORM); /* we have data */
2616 #else /* old code */
2618 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) {
2619 kring = &NMR(na, t)[i];
2620 /* XXX compare ring->cur and kring->tail */
2621 if (!nm_ring_empty(kring->ring)) {
2623 want[t] = 0; /* also breaks the loop */
2627 #endif /* old code */
2630 * If we want to push packets out (priv->np_txpoll) or
2631 * want_tx is still set, we must issue txsync calls
2632 * (on all rings, to avoid that the tx rings stall).
2633 * XXX should also check cur != hwcur on the tx rings.
2634 * Fortunately, normal tx mode has np_txpoll set.
2636 if (priv->np_txpoll || want_tx) {
2638 * The first round checks if anyone is ready, if not
2639 * do a selrecord and another round to handle races.
2640 * want_tx goes to 0 if any space is found, and is
2641 * used to skip rings with no pending transmissions.
2644 for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) {
2647 kring = &na->tx_rings[i];
2650 if (!send_down && !want_tx && ring->cur == kring->nr_hwcur)
2653 if (nm_kr_tryget(kring, 1, &revents))
2656 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2657 netmap_ring_reinit(kring);
2660 if (kring->nm_sync(kring, sync_flags))
2663 nm_sync_finalize(kring);
2667 * If we found new slots, notify potential
2668 * listeners on the same ring.
2669 * Since we just did a txsync, look at the copies
2670 * of cur,tail in the kring.
2672 found = kring->rcur != kring->rtail;
2674 if (found) { /* notify other listeners */
2677 kring->nm_notify(kring, 0);
2680 /* if there were any packets to forward, we must have handled them by now */
2682 if (want_tx && retry_tx && sr) {
2683 nm_os_selrecord(sr, check_all_tx ?
2684 &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si);
2691 * If want_rx is still set scan receive rings.
2692 * Do it on all rings because otherwise we starve.
2695 /* two rounds here for race avoidance */
2697 for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) {
2700 kring = &na->rx_rings[i];
2703 if (unlikely(nm_kr_tryget(kring, 1, &revents)))
2706 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2707 netmap_ring_reinit(kring);
2710 /* now we can use kring->rcur, rtail */
2713 * transparent mode support: collect packets from
2714 * hw rxring(s) that have been released by the user
2716 if (nm_may_forward_up(kring)) {
2717 netmap_grab_packets(kring, &q, netmap_fwd);
2720 /* Clear the NR_FORWARD flag anyway, it may be set by
2721 * the nm_sync() below, but only for the host RX ring (see
2722 * netmap_rxsync_from_host()). */
2723 kring->nr_kflags &= ~NR_FORWARD;
2724 if (kring->nm_sync(kring, sync_flags))
2727 nm_sync_finalize(kring);
2728 send_down |= (kring->nr_kflags & NR_FORWARD);
2729 ring_timestamp_set(ring);
2730 found = kring->rcur != kring->rtail;
2735 kring->nm_notify(kring, 0);
2739 if (retry_rx && sr) {
2740 nm_os_selrecord(sr, check_all_rx ?
2741 &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si);
2743 if (send_down || retry_rx) {
2746 goto flush_tx; /* and retry_rx */
2753 * Transparent mode: released bufs (i.e. between kring->nr_hwcur and
2754 * ring->head) marked with NS_FORWARD on hw rx rings are passed up
2755 * to the host stack.
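/*
 * Usage note (sketch): userspace marks a received slot for
 * forwarding before releasing it, provided forwarding is enabled
 * (the ring's NR_FORWARD flag or the netmap_fwd sysctl):
 *
 *	slot->flags |= NS_FORWARD;
 *	ring->head = ring->cur = nm_ring_next(ring, ring->cur);
 */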
2759 netmap_send_up(na->ifp, &q);
2768 /*-------------------- driver support routines -------------------*/
2770 /* default notify callback */
2772 netmap_notify(struct netmap_kring *kring, int flags)
2774 struct netmap_adapter *na = kring->na;
2775 enum txrx t = kring->tx;
2777 nm_os_selwakeup(&kring->si);
2778 /* optimization: avoid a wake up on the global
2779 * queue if nobody has registered for more
2782 if (na->si_users[t] > 0)
2783 nm_os_selwakeup(&na->si[t]);
2785 return NM_IRQ_COMPLETED;
2788 /* called by all routines that create netmap_adapters.
2789 * provide some defaults and get a reference to the memory allocator.
2793 netmap_attach_common(struct netmap_adapter *na)
2795 if (na->num_tx_rings == 0 || na->num_rx_rings == 0) {
2796 D("%s: invalid rings tx %d rx %d",
2797 na->name, na->num_tx_rings, na->num_rx_rings);
2802 if (na->na_flags & NAF_HOST_RINGS && na->ifp) {
2803 na->if_input = na->ifp->if_input; /* for netmap_send_up */
2805 #endif /* __FreeBSD__ */
2806 if (na->nm_krings_create == NULL) {
2807 /* we assume that we have been called by a driver,
2808 * since other port types all provide their own
2811 na->nm_krings_create = netmap_hw_krings_create;
2812 na->nm_krings_delete = netmap_hw_krings_delete;
2814 if (na->nm_notify == NULL)
2815 na->nm_notify = netmap_notify;
2818 if (na->nm_mem == NULL) {
2819 /* use the global allocator */
2820 na->nm_mem = netmap_mem_get(&nm_mem);
2823 if (na->nm_bdg_attach == NULL)
2824 /* no special nm_bdg_attach callback. On VALE
2825 * attach, we need to interpose a bwrap
2827 na->nm_bdg_attach = netmap_bwrap_attach;
2834 /* standard cleanup, called by all destructors */
2836 netmap_detach_common(struct netmap_adapter *na)
2838 if (na->tx_rings) { /* XXX should not happen */
2839 D("freeing leftover tx_rings");
2840 na->nm_krings_delete(na);
2842 netmap_pipe_dealloc(na);
2844 netmap_mem_put(na->nm_mem);
2845 bzero(na, sizeof(*na));
2849 /* Wrapper for the register callback provided by netmap-enabled
2851 * nm_iszombie(na) means that the driver module has been
2852 * unloaded, so we cannot call into it.
2853 * nm_os_ifnet_lock() must guarantee mutual exclusion with
2857 netmap_hw_reg(struct netmap_adapter *na, int onoff)
2859 struct netmap_hw_adapter *hwna =
2860 (struct netmap_hw_adapter*)na;
2865 if (nm_iszombie(na)) {
2868 } else if (na != NULL) {
2869 na->na_flags &= ~NAF_NETMAP_ON;
2874 error = hwna->nm_hw_register(na, onoff);
2877 nm_os_ifnet_unlock();
2883 netmap_hw_dtor(struct netmap_adapter *na)
2885 if (nm_iszombie(na) || na->ifp == NULL)
2888 WNA(na->ifp) = NULL;
2893 * Allocate a netmap_adapter object, and initialize it from the
2894 * 'arg' passed by the driver on attach.
2895 * We allocate a block of memory of 'size' bytes, which has room
2896 * for struct netmap_adapter plus additional room private to the caller.
2898 * Return 0 on success, ENOMEM otherwise.
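/*
 * Example (driver attach, sketch; the foo_* names and softc layout
 * are hypothetical):
 *
 *	struct netmap_adapter na;
 *
 *	bzero(&na, sizeof(na));
 *	na.ifp = sc->ifp;
 *	na.num_tx_desc = sc->num_tx_desc;
 *	na.num_rx_desc = sc->num_rx_desc;
 *	na.num_tx_rings = na.num_rx_rings = sc->num_queues;
 *	na.nm_register = foo_netmap_reg;
 *	na.nm_txsync = foo_netmap_txsync;
 *	na.nm_rxsync = foo_netmap_rxsync;
 *	netmap_attach(&na);
 */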
2901 netmap_attach_ext(struct netmap_adapter *arg, size_t size)
2903 struct netmap_hw_adapter *hwna = NULL;
2904 struct ifnet *ifp = NULL;
2906 if (size < sizeof(struct netmap_hw_adapter)) {
2907 D("Invalid netmap adapter size %d", (int)size);
2911 if (arg == NULL || arg->ifp == NULL)
2914 hwna = nm_os_malloc(size);
2918 hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE;
2919 strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name));
2920 hwna->nm_hw_register = hwna->up.nm_register;
2921 hwna->up.nm_register = netmap_hw_reg;
2922 if (netmap_attach_common(&hwna->up)) {
2926 netmap_adapter_get(&hwna->up);
2928 NM_ATTACH_NA(ifp, &hwna->up);
2931 if (ifp->netdev_ops) {
2932 /* prepare a clone of the netdev ops */
2933 #ifndef NETMAP_LINUX_HAVE_NETDEV_OPS
2934 hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops;
2936 hwna->nm_ndo = *ifp->netdev_ops;
2937 #endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */
2939 hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit;
2940 if (ifp->ethtool_ops) {
2941 hwna->nm_eto = *ifp->ethtool_ops;
2943 hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam;
2944 #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS
2945 hwna->nm_eto.set_channels = linux_netmap_set_channels;
2946 #endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */
2947 if (arg->nm_config == NULL) {
2948 hwna->up.nm_config = netmap_linux_config;
2951 if (arg->nm_dtor == NULL) {
2952 hwna->up.nm_dtor = netmap_hw_dtor;
2955 if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n",
2956 hwna->up.num_tx_rings, hwna->up.num_tx_desc,
2957 hwna->up.num_rx_rings, hwna->up.num_rx_desc);
2961 D("fail, arg %p ifp %p na %p", arg, ifp, hwna);
2962 return (hwna ? EINVAL : ENOMEM);
2967 netmap_attach(struct netmap_adapter *arg)
2969 return netmap_attach_ext(arg, sizeof(struct netmap_hw_adapter));
2974 NM_DBG(netmap_adapter_get)(struct netmap_adapter *na)
2980 refcount_acquire(&na->na_refcount);
2984 /* returns 1 iff the netmap_adapter is destroyed */
2986 NM_DBG(netmap_adapter_put)(struct netmap_adapter *na)
2991 if (!refcount_release(&na->na_refcount))
2997 netmap_detach_common(na);
3002 /* nm_krings_create callback for all hardware native adapters */
3004 netmap_hw_krings_create(struct netmap_adapter *na)
3006 int ret = netmap_krings_create(na, 0);
3008 /* initialize the mbq for the sw rx ring */
3009 mbq_safe_init(&na->rx_rings[na->num_rx_rings].rx_queue);
3010 ND("initialized sw rx queue %d", na->num_rx_rings);
3018 * Called on module unload by the netmap-enabled drivers
3021 netmap_detach(struct ifnet *ifp)
3023 struct netmap_adapter *na = NA(ifp);
3029 netmap_set_all_rings(na, NM_KR_LOCKED);
3030 na->na_flags |= NAF_ZOMBIE;
3032 * if the netmap adapter is not native, somebody
3033 * changed it, so we can not release it here.
3034 * The NAF_ZOMBIE flag will notify the new owner that
3035 * the driver is gone.
3037 if (na->na_flags & NAF_NATIVE) {
3038 netmap_adapter_put(na);
3040 /* give active users a chance to notice that NAF_ZOMBIE has been
3041 * turned on, so that they can stop and return an error to userspace.
3042 * Note that this becomes a NOP if there are no active users and,
3043 * therefore, the put() above has deleted the na, since now NA(ifp) is
3046 netmap_enable_all_rings(ifp);
3052 * Intercept packets from the network stack and pass them
3053 * to netmap as incoming packets on the 'software' ring.
3055 * We only store packets in a bounded mbq and then copy them
3056 * in the relevant rxsync routine.
3058 * We rely on the OS to make sure that the ifp and na do not go
3059 * away (typically the caller checks for IFF_DRV_RUNNING or the like).
3060 * In nm_register() or whenever there is a reinitialization,
3061 * we make sure to make the mode change visible here.
3064 netmap_transmit(struct ifnet *ifp, struct mbuf *m)
3066 struct netmap_adapter *na = NA(ifp);
3067 struct netmap_kring *kring, *tx_kring;
3068 u_int len = MBUF_LEN(m);
3069 u_int error = ENOBUFS;
3074 kring = &na->rx_rings[na->num_rx_rings];
3075 // XXX [Linux] we do not need this lock
3076 // if we follow the down/configure/up protocol -gl
3077 // mtx_lock(&na->core_lock);
3079 if (!nm_netmap_on(na)) {
3080 D("%s not in netmap mode anymore", na->name);
3086 if (txr >= na->num_tx_rings) {
3087 txr %= na->num_tx_rings;
3089 tx_kring = &NMR(na, NR_TX)[txr];
3091 if (tx_kring->nr_mode == NKR_NETMAP_OFF) {
3092 return MBUF_TRANSMIT(na, ifp, m);
3095 q = &kring->rx_queue;
3097 // XXX reconsider long packets if we handle fragments
3098 if (len > NETMAP_BUF_SIZE(na)) { /* too long for us */
3099 D("%s from_host, drop packet size %d > %d", na->name,
3100 len, NETMAP_BUF_SIZE(na));
3104 if (nm_os_mbuf_has_offld(m)) {
3105 RD(1, "%s drop mbuf that needs offloadings", na->name);
3109 /* protect against netmap_rxsync_from_host(), netmap_sw_to_nic()
3110 * and maybe other instances of netmap_transmit (the latter
3111 * not possible on Linux).
3112 * We enqueue the mbuf only if we are sure there is going to be
3113 * enough room in the host RX ring, otherwise we drop it.
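/*
 * Worked example: with nkr_num_slots = 1024, nr_hwcur = 100 and
 * nr_hwtail = 90, busy = 90 - 100 = -10, corrected to 1014. If 10
 * mbufs are already queued, 1014 + 10 >= 1023, so the mbuf is
 * dropped.
 */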
3117 busy = kring->nr_hwtail - kring->nr_hwcur;
3119 busy += kring->nkr_num_slots;
3120 if (busy + mbq_len(q) >= kring->nkr_num_slots - 1) {
3121 RD(2, "%s full hwcur %d hwtail %d qlen %d", na->name,
3122 kring->nr_hwcur, kring->nr_hwtail, mbq_len(q));
3125 ND(2, "%s %d bufs in queue", na->name, mbq_len(q));
3126 /* notify outside the lock */
3135 /* unconditionally wake up listeners */
3136 kring->nm_notify(kring, 0);
3137 /* this is normally netmap_notify(), but for nics
3138 * connected to a bridge it is netmap_bwrap_intr_notify(),
3139 * that possibly forwards the frames through the switch
3147 * netmap_reset() is called by the driver routines when reinitializing
3148 * a ring. The driver is in charge of locking to protect the kring.
3149 * If native netmap mode is not set just return NULL.
3150 * If native netmap mode is set, in particular, we have to set nr_mode to NKR_NETMAP_ON.
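/*
 * Example (driver sketch): a typical call site in a hypothetical
 * driver's RX ring (re)initialization path:
 *
 *	slot = netmap_reset(na, NR_RX, ring_nr, 0);
 *	if (slot) {	// netmap mode: program netmap buffers into the NIC
 *		for (i = 0; i < na->num_rx_desc; i++) {
 *			int si = netmap_idx_n2k(na->rx_rings + ring_nr, i);
 *			uint64_t paddr;
 *
 *			PNMB(na, slot + si, &paddr);
 *			// write paddr into the i-th hw rx descriptor ...
 *		}
 *	}
 */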
3153 struct netmap_slot *
3154 netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n,
3157 struct netmap_kring *kring;
3160 if (!nm_native_on(na)) {
3161 ND("interface not in native netmap mode");
3162 return NULL; /* nothing to reinitialize */
3165 /* XXX note: in the new scheme, we are not guaranteed to be
3166 * under lock (e.g. when called on a device reset).
3167 * In this case, we should set a flag and not trust the
3168 * values too much. In practice: TODO
3169 * - set a RESET flag somewhere in the kring
3170 * - do the processing in a conservative way
3171 * - let the *sync() fixup at the end.
3174 if (n >= na->num_tx_rings)
3177 kring = na->tx_rings + n;
3179 if (kring->nr_pending_mode == NKR_NETMAP_OFF) {
3180 kring->nr_mode = NKR_NETMAP_OFF;
3184 // XXX check whether we should use hwcur or rcur
3185 new_hwofs = kring->nr_hwcur - new_cur;
3187 if (n >= na->num_rx_rings)
3189 kring = na->rx_rings + n;
3191 if (kring->nr_pending_mode == NKR_NETMAP_OFF) {
3192 kring->nr_mode = NKR_NETMAP_OFF;
3196 new_hwofs = kring->nr_hwtail - new_cur;
3198 lim = kring->nkr_num_slots - 1;
3199 if (new_hwofs > lim)
3200 new_hwofs -= lim + 1;
3202 /* Always set the new offset value and realign the ring. */
3204 D("%s %s%d hwofs %d -> %d, hwtail %d -> %d",
3206 tx == NR_TX ? "TX" : "RX", n,
3207 kring->nkr_hwofs, new_hwofs,
3209 tx == NR_TX ? lim : kring->nr_hwtail);
3210 kring->nkr_hwofs = new_hwofs;
3212 kring->nr_hwtail = kring->nr_hwcur + lim;
3213 if (kring->nr_hwtail > lim)
3214 kring->nr_hwtail -= lim + 1;
3218 #if 0 /* disabled, non-compiling sketch: XXX check that the mappings are correct */
3219 /* need ring_nr, adapter->pdev, direction */
3220 buffer_info->dma = dma_map_single(&pdev->dev, addr, adapter->rx_buffer_len, DMA_FROM_DEVICE);
3221 if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) {
3222 D("error mapping rx netmap buffer %d", i);
3223 // XXX fix error handling
}
#endif
3228 * Wakeup on the individual and global selwait
3229 * We do the wakeup here, but the ring is not yet reconfigured.
3230 * However, we are under lock so there are no races.
3232 kring->nr_mode = NKR_NETMAP_ON;
3233 kring->nm_notify(kring, 0);
3234 return kring->ring->slot;
3239 * Dispatch rx/tx interrupts to the netmap rings.
3241 * "work_done" is non-null on the RX path, NULL for the TX path.
3242 * We rely on the OS to make sure that there is only one active
3243 * instance per queue, and that there is appropriate locking.
3245 * The 'notify' routine depends on what the ring is attached to.
3246 * - for a netmap file descriptor, do a selwakeup on the individual
3247 * waitqueue, plus one on the global one if needed
3248 * (see netmap_notify)
3249 * - for a nic connected to a switch, call the proper forwarding routine
3250 * (see netmap_bwrap_intr_notify)
3253 netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done)
3255 struct netmap_kring *kring;
3256 enum txrx t = (work_done ? NR_RX : NR_TX);
3258 q &= NETMAP_RING_MASK;
3260 if (netmap_verbose) {
3261 RD(5, "received %s queue %d", work_done ? "RX" : "TX" , q);
3264 if (q >= nma_get_nrings(na, t))
3265 return NM_IRQ_PASS; // not a physical queue
3267 kring = NMR(na, t) + q;
3269 if (kring->nr_mode == NKR_NETMAP_OFF) {
3274 kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ?
3275 *work_done = 1; /* do not fire napi again */
3278 return kring->nm_notify(kring, 0);
3283 * Default functions to handle rx/tx interrupts from a physical device.
3284 * "work_done" is non-null on the RX path, NULL for the TX path.
3286 * If the card is not in netmap mode, simply return NM_IRQ_PASS,
3287 * so that the caller proceeds with regular processing.
3288 * Otherwise call netmap_common_irq().
3290 * If the card is connected to a netmap file descriptor,
3291 * do a selwakeup on the individual queue, plus one on the global one
3292 * if needed (multiqueue card _and_ there are multiqueue listeners),
3293 * and return NM_IRQ_COMPLETED.
3295 * Finally, if called on rx from an interface connected to a switch,
3296 * calls the proper forwarding routine.
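/*
 * Example (driver sketch): typical use in an RX interrupt handler;
 * on NM_IRQ_PASS the driver falls through to regular processing:
 *
 *	u_int work_done = 0;
 *
 *	if (netmap_rx_irq(adapter->ifp, ring_nr, &work_done) != NM_IRQ_PASS)
 *		return;		// handled by netmap
 *	// regular, non-netmap RX processing follows
 */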
3299 netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done)
3301 struct netmap_adapter *na = NA(ifp);
3304 * XXX emulated netmap mode sets NAF_SKIP_INTR so
3305 * we still use the regular driver even though the previous
3306 * check fails. It is unclear whether we should use
3307 * nm_native_on() here.
3309 if (!nm_netmap_on(na))
3312 if (na->na_flags & NAF_SKIP_INTR) {
3313 ND("use regular interrupt");
3317 return netmap_common_irq(na, q, work_done);
3322 * Module loader and unloader
3324 * netmap_init() creates the /dev/netmap device and initializes
3325 * all global variables. Returns 0 on success, errno on failure
3326 * (though in practice failure is not expected).
3328 * netmap_fini() destroys everything.
3331 static struct cdev *netmap_dev; /* /dev/netmap character device. */
3332 extern struct cdevsw netmap_cdevsw;
3339 destroy_dev(netmap_dev);
3340 /* we assume that there are no longer netmap users */
3342 netmap_uninit_bridges();
3345 nm_prinf("netmap: unloaded module.\n");
3356 error = netmap_mem_init();
3360 * MAKEDEV_ETERNAL_KLD avoids an expensive check on syscalls
3361 * when the module is compiled in.
3362 * XXX could use make_dev_credv() to get error number
3364 netmap_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD,
3365 &netmap_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600,
3370 error = netmap_init_bridges();
3375 nm_os_vi_init_index();
3378 error = nm_os_ifnet_init();
3382 nm_prinf("netmap: loaded module\n");
3386 return (EINVAL); /* may be incorrect */