2 * Copyright (C) 2011-2014 Matteo Landi
3 * Copyright (C) 2011-2016 Luigi Rizzo
4 * Copyright (C) 2011-2016 Giuseppe Lettieri
5 * Copyright (C) 2011-2016 Vincenzo Maffione
8 * Redistribution and use in source and binary forms, with or without
9 * modification, are permitted provided that the following conditions
11 * 1. Redistributions of source code must retain the above copyright
12 * notice, this list of conditions and the following disclaimer.
13 * 2. Redistributions in binary form must reproduce the above copyright
14 * notice, this list of conditions and the following disclaimer in the
15 * documentation and/or other materials provided with the distribution.
17 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
18 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
21 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
22 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
23 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
24 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
26 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
34 * This module supports memory mapped access to network devices,
37 * The module uses a large memory pool allocated by the kernel
38 * and accessible as mmapped memory by multiple userspace threads/processes.
39 * The memory pool contains packet buffers and "netmap rings",
40 * i.e. user-accessible copies of the interface's queues.
42 * Access to the network card works like this:
43 * 1. a process/thread issues one or more open() on /dev/netmap, to create
44 * select()able file descriptors on which events are reported.
45 * 2. on each descriptor, the process issues an ioctl() to identify
46 * the interface that should report events to the file descriptor.
47 * 3. on each descriptor, the process issues an mmap() request to
48 * map the shared memory region within the process' address space.
49 * The list of interesting queues is indicated by a location in
50 * the shared memory region.
51 * 4. using the functions in the netmap(4) userspace API, a process
52 * can look up the occupation state of a queue, access memory buffers,
53 * and retrieve received packets or enqueue packets to transmit.
54 * 5. using some ioctl()s the process can synchronize the userspace view
55 * of the queue with the actual status in the kernel. This includes both
56 * receiving the notification of new packets, and transmitting new
57 * packets on the output interface.
58 * 6. select() or poll() can be used to wait for events on individual
59 * transmit or receive queues (or all queues for a given interface).
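 *
 * A minimal userspace sketch of steps 1..6 (error handling omitted,
 * "em0" is only an example name; see netmap(4) and net/netmap_user.h
 * for the authoritative API):
 *
 *	struct nmreq req = { .nr_version = NETMAP_API };
 *	int fd = open("/dev/netmap", O_RDWR);			// 1.
 *	strlcpy(req.nr_name, "em0", sizeof(req.nr_name));
 *	ioctl(fd, NIOCREGIF, &req);				// 2.
 *	char *mem = mmap(NULL, req.nr_memsize,
 *		PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);	// 3.
 *	struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
 *	struct netmap_ring *txr = NETMAP_TXRING(nifp, 0);	// 4.
 *	// ... fill slots between txr->head and txr->tail ...
 *	ioctl(fd, NIOCTXSYNC, NULL);				// 5.
 *	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
 *	poll(&pfd, 1, -1);					// 6.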
62 SYNCHRONIZATION (USER)
64 The netmap rings and data structures may be shared among multiple
65 user threads or even independent processes.
66 Any synchronization among those threads/processes is delegated
67 to the threads themselves. Only one thread at a time can be in
68 a system call on the same netmap ring. The OS does not enforce
69 this and only guarantees against system crashes in case of invalid usage.
74 Within the kernel, access to the netmap rings is protected as follows:
76 - a spinlock on each ring, to handle producer/consumer races on
77 RX rings attached to the host stack (against multiple host
78 threads writing from the host stack to the same ring),
79 and on 'destination' rings attached to a VALE switch
80 (i.e. RX rings in VALE ports, and TX rings in NIC/host ports)
81 protecting multiple active senders for the same destination.
83 - an atomic variable to guarantee that there is at most one
84 instance of *_*xsync() on the ring at any time.
85 For rings connected to user file
86 descriptors, an atomic_test_and_set() protects this, and the
87 lock on the ring is not actually used.
88 For NIC RX rings connected to a VALE switch, an atomic_test_and_set()
89 is also used to prevent multiple executions (the driver might indeed
90 already guarantee this).
91 For NIC TX rings connected to a VALE switch, the lock arbitrates
92 access to the queue (both when allocating buffers and when pushing them out).
95 - *xsync() should be protected against initializations of the card.
96 On FreeBSD most devices have the reset routine protected by
97 a RING lock (ixgbe, igb, em) or core lock (re). lem is missing
98 the RING protection on rx_reset(); this should be added.
100 On linux there is an external lock on the tx path, which probably
101 also arbitrates access to the reset routine. XXX to be revised
103 - a per-interface core_lock protecting access from the host stack
104 while interfaces may be detached from netmap mode.
105 XXX there should be no need for this lock if we detach the interfaces
106 only while they are down.
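
As an illustration of the atomic-variable bullet above, the exclusion of
concurrent *xsync() instances boils down to a test-and-set; a sketch in
the spirit of nm_kr_tryget() (illustrative, not the exact code):

	if (NM_ATOMIC_TEST_AND_SET(&kring->nr_busy))
		return NM_KR_BUSY;	// someone else is in *xsync()
	// ... run the kring's nm_sync() callback ...
	NM_ATOMIC_CLEAR(&kring->nr_busy);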
111 NMG_LOCK() serializes all modifications to switches and ports.
112 A switch cannot be deleted until all ports are gone.
114 For each switch, an SX lock (RWlock on linux) protects
115 deletion of ports. When configuring or deleting a port, the
116 lock is acquired in exclusive mode (after holding NMG_LOCK).
117 When forwarding, the lock is acquired in shared mode (without NMG_LOCK).
118 The lock is held throughout the entire forwarding cycle,
119 during which the thread may incur a page fault.
120 Hence it is important that sleepable shared locks are used.
122 On the rx ring, the per-port lock is grabbed initially to reserve
123 a number of slots in the ring, then the lock is released,
124 packets are copied from source to destination, and then
125 the lock is acquired again and the receive ring is updated.
126 (A similar thing is done on the tx ring for NIC and host stack
127 ports attached to the switch)
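
In code, the cycle looks roughly like this (a sketch with hypothetical
helper names for the copy and commit steps; the real logic lives in
nm_bdg_flush() and nm_kr_lease() in netmap_vale.c):

	mtx_lock(&kring->q_lock);
	j = nm_kr_lease(kring, n, 1);	// reserve n rx slots
	mtx_unlock(&kring->q_lock);
	copy_packets(src, kring, j, n);	// no lock held during the copy
	mtx_lock(&kring->q_lock);
	complete_lease(kring, j, n);	// advance the receive ring
	mtx_unlock(&kring->q_lock);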
132 /* --- internals ----
134 * Roadmap to the code that implements the above.
136 * > 1. a process/thread issues one or more open() on /dev/netmap, to create
137 * > select()able file descriptor on which events are reported.
139 * Internally, we allocate a netmap_priv_d structure, that will be
140 * initialized on ioctl(NIOCREGIF). There is one netmap_priv_d
141 * structure for each open().
144 * FreeBSD: see netmap_open() (netmap_freebsd.c)
145 * linux: see linux_netmap_open() (netmap_linux.c)
147 * > 2. on each descriptor, the process issues an ioctl() to identify
148 * > the interface that should report events to the file descriptor.
150 * Implemented by netmap_ioctl(), NIOCREGIF case, with nmr->nr_cmd==0.
151 * Most important things happen in netmap_get_na() and
152 * netmap_do_regif(), called from there. Additional details can be
153 * found in the comments above those functions.
155 * In all cases, this action creates/takes-a-reference-to a
156 * netmap_*_adapter describing the port, and allocates a netmap_if
157 * and all necessary netmap rings, filling them with netmap buffers.
159 * In this phase, the sync callbacks for each ring are set (these are used
160 * in steps 5 and 6 below). The callbacks depend on the type of adapter.
161 * The adapter creation/initialization code puts them in the
162 * netmap_adapter (fields na->nm_txsync and na->nm_rxsync). Then, they
163 * are copied from there to the netmap_kring's during netmap_do_regif(), by
164 * the nm_krings_create() callback. All the nm_krings_create callbacks
165 * actually call netmap_krings_create() to perform this and the other
166 * common stuff. netmap_krings_create() also takes care of the host rings,
167 * if needed, by setting their sync callbacks appropriately.
169 * Additional actions depend on the kind of netmap_adapter that has been registered:
172 * - netmap_hw_adapter: [netmap.c]
173 * This is a system netdev/ifp with native netmap support.
174 * The ifp is detached from the host stack by redirecting:
175 * - transmissions (from the network stack) to netmap_transmit()
176 * - receive notifications to the nm_notify() callback for
177 * this adapter. The callback is normally netmap_notify(), unless
178 * the ifp is attached to a bridge using bwrap, in which case it
179 * is netmap_bwrap_intr_notify().
181 * - netmap_generic_adapter: [netmap_generic.c]
182 * A system netdev/ifp without native netmap support.
184 * (the decision about native/non native support is taken in
185 * netmap_get_hw_na(), called by netmap_get_na())
187 * - netmap_vp_adapter [netmap_vale.c]
188 * Returned by netmap_get_bdg_na().
189 * This is a persistent or ephemeral VALE port. Ephemeral ports
190 * are created on the fly if they don't already exist, and are
191 * always attached to a bridge.
192 * Persistent VALE ports must be created separately, and
193 * then attached like normal NICs. The NIOCREGIF we are examining
194 * will find them only if they had previously been created and
195 * attached (see VALE_CTL below).
197 * - netmap_pipe_adapter [netmap_pipe.c]
198 * Returned by netmap_get_pipe_na().
199 * Both pipe ends are created, if they didn't already exist.
201 * - netmap_monitor_adapter [netmap_monitor.c]
202 * Returned by netmap_get_monitor_na().
203 * If successful, the nm_sync callbacks of the monitored adapter
204 * will be intercepted by the returned monitor.
206 * - netmap_bwrap_adapter [netmap_vale.c]
207 * Cannot be obtained in this way, see VALE_CTL below
211 * linux: we first go through linux_netmap_ioctl() to
212 * adapt the FreeBSD interface to the linux one.
215 * > 3. on each descriptor, the process issues an mmap() request to
216 * > map the shared memory region within the process' address space.
217 * > The list of interesting queues is indicated by a location in
218 * > the shared memory region.
221 * FreeBSD: netmap_mmap_single (netmap_freebsd.c).
222 * linux: linux_netmap_mmap (netmap_linux.c).
224 * > 4. using the functions in the netmap(4) userspace API, a process
225 * > can look up the occupation state of a queue, access memory buffers,
226 * > and retrieve received packets or enqueue packets to transmit.
228 * These actions do not involve the kernel.
230 * > 5. using some ioctl()s the process can synchronize the userspace view
231 * > of the queue with the actual status in the kernel. This includes both
232 * > receiving the notification of new packets, and transmitting new
233 * > packets on the output interface.
235 * These are implemented in netmap_ioctl(), NIOCTXSYNC and NIOCRXSYNC
236 * cases. They invoke the nm_sync callbacks on the netmap_kring
237 * structures, as initialized in step 2 and maybe later modified
238 * by a monitor. Monitors, however, will always call the original
239 * callback before doing anything else.
242 * > 6. select() or poll() can be used to wait for events on individual
243 * > transmit or receive queues (or all queues for a given interface).
245 * Implemented in netmap_poll(). This will call the same nm_sync()
246 * callbacks as in step 5 above.
249 * linux: we first go through linux_netmap_poll() to adapt
250 * the FreeBSD interface to the linux one.
253 * ---- VALE_CTL -----
255 * VALE switches are controlled by issuing a NIOCREGIF with a non-null
256 * nr_cmd in the nmreq structure. These subcommands are handled by
257 * netmap_bdg_ctl() in netmap_vale.c. Persistent VALE ports are created
258 * and destroyed by issuing the NETMAP_BDG_NEWIF and NETMAP_BDG_DELIF
259 * subcommands, respectively.
261 * Any network interface known to the system (including a persistent VALE
262 * port) can be attached to a VALE switch by issuing the
263 * NETMAP_BDG_ATTACH subcommand. After the attachment, persistent VALE ports
264 * look exactly like ephemeral VALE ports (as created in step 2 above). The
265 * attachment of other interfaces, instead, requires the creation of a
266 * netmap_bwrap_adapter. Moreover, the attached interface must be put in
267 * netmap mode. This may require the creation of a netmap_generic_adapter if
268 * we have no native support for the interface, or if generic adapters have
269 * been forced by sysctl.
271 * Both persistent VALE ports and bwraps are handled by netmap_get_bdg_na(),
272 * called by nm_bdg_ctl_attach(), and discriminated by the nm_bdg_attach()
273 * callback. In the case of the bwrap, the callback creates the
274 * netmap_bwrap_adapter. The initialization of the bwrap is then
275 * completed by calling netmap_do_regif() on it, in the nm_bdg_ctl()
276 * callback (netmap_bwrap_bdg_ctl in netmap_vale.c).
277 * A generic adapter for the wrapped ifp will be created if needed, when
278 * netmap_get_bdg_na() calls netmap_get_hw_na().
281 * ---- DATAPATHS -----
283 * -= SYSTEM DEVICE WITH NATIVE SUPPORT =-
285 * na == NA(ifp) == netmap_hw_adapter created in DEVICE_netmap_attach()
287 * - tx from netmap userspace:
289 * 1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
290 * kring->nm_sync() == DEVICE_netmap_txsync()
291 * 2) device interrupt handler
292 * na->nm_notify() == netmap_notify()
293 * - rx from netmap userspace:
295 * 1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
296 * kring->nm_sync() == DEVICE_netmap_rxsync()
297 * 2) device interrupt handler
298 * na->nm_notify() == netmap_notify()
299 * - rx from host stack
303 * na->nm_notify == netmap_notify()
304 * 2) ioctl(NIOCRXSYNC)/netmap_poll() in process context
305 * kring->nm_sync() == netmap_rxsync_from_host
306 * netmap_rxsync_from_host(na, NULL, NULL)
308 * ioctl(NIOCTXSYNC)/netmap_poll() in process context
309 * kring->nm_sync() == netmap_txsync_to_host
310 * netmap_txsync_to_host(na)
312 * FreeBSD: na->if_input() == ether_input()
313 * linux: netif_rx() with NM_MAGIC_PRIORITY_RX
316 * -= SYSTEM DEVICE WITH GENERIC SUPPORT =-
318 * na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach()
320 * - tx from netmap userspace:
322 * 1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
323 * kring->nm_sync() == generic_netmap_txsync()
324 * nm_os_generic_xmit_frame()
325 * linux: dev_queue_xmit() with NM_MAGIC_PRIORITY_TX
326 * ifp->ndo_start_xmit == generic_ndo_start_xmit()
327 * gna->save_start_xmit == orig. dev. start_xmit
328 * FreeBSD: na->if_transmit() == orig. dev if_transmit
329 * 2) generic_mbuf_destructor()
330 * na->nm_notify() == netmap_notify()
331 * - rx from netmap userspace:
332 * 1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
333 * kring->nm_sync() == generic_netmap_rxsync()
336 * generic_rx_handler()
338 * na->nm_notify() == netmap_notify()
339 * - rx from host stack
340 * FreeBSD: same as native
341 * Linux: same as native except:
343 * dev_queue_xmit() without NM_MAGIC_PRIORITY_TX
344 * ifp->ndo_start_xmit == generic_ndo_start_xmit()
346 * na->nm_notify() == netmap_notify()
347 * - tx to host stack (same as native):
355 * ioctl(NIOCTXSYNC)/netmap_poll() in process context
356 * kring->nm_sync() == netmap_vp_txsync()
358 * - system device with native support:
361 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
362 * kring->nm_sync() == DEVICE_netmap_rxsync()
364 * kring->nm_sync() == DEVICE_netmap_rxsync()
367 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
368 * kring->nm_sync() == netmap_rxsync_from_host()
371 * - system device with generic support:
372 * from device driver:
373 * generic_rx_handler()
374 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
375 * kring->nm_sync() == generic_netmap_rxsync()
377 * kring->nm_sync() == generic_netmap_rxsync()
380 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
381 * kring->nm_sync() == netmap_rxsync_from_host()
384 * (all cases) --> nm_bdg_flush()
385 * dest_na->nm_notify() == (see below)
391 * 1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
392 * kring->nm_sync() == netmap_vp_rxsync()
393 * 2) from nm_bdg_flush()
394 * na->nm_notify() == netmap_notify()
396 * - system device with native support:
398 * na->nm_notify() == netmap_bwrap_notify()
400 * kring->nm_sync() == DEVICE_netmap_txsync()
404 * kring->nm_sync() == netmap_txsync_to_host
405 * netmap_vp_rxsync_locked()
407 * - system device with generic adapter:
409 * na->nm_notify() == netmap_bwrap_notify()
411 * kring->nm_sync() == generic_netmap_txsync()
415 * kring->nm_sync() == netmap_txsync_to_host
421 * OS-specific code that is used only within this file.
422 * Other OS-specific code that must be accessed by drivers
423 * is present in netmap_kern.h
426 #if defined(__FreeBSD__)
427 #include <sys/cdefs.h> /* prerequisite */
428 #include <sys/types.h>
429 #include <sys/errno.h>
430 #include <sys/param.h> /* defines used in kernel.h */
431 #include <sys/kernel.h> /* types used in module initialization */
432 #include <sys/conf.h> /* cdevsw struct, UID, GID */
433 #include <sys/filio.h> /* FIONBIO */
434 #include <sys/sockio.h>
435 #include <sys/socketvar.h> /* struct socket */
436 #include <sys/malloc.h>
437 #include <sys/poll.h>
438 #include <sys/rwlock.h>
439 #include <sys/socket.h> /* sockaddrs */
440 #include <sys/selinfo.h>
441 #include <sys/sysctl.h>
442 #include <sys/jail.h>
443 #include <net/vnet.h>
445 #include <net/if_var.h>
446 #include <net/bpf.h> /* BIOCIMMEDIATE */
447 #include <machine/bus.h> /* bus_dmamap_* */
448 #include <sys/endian.h>
449 #include <sys/refcount.h>
454 #include "bsd_glue.h"
456 #elif defined(__APPLE__)
458 #warning OSX support is only partial
459 #include "osx_glue.h"
461 #elif defined (_WIN32)
463 #include "win_glue.h"
467 #error Unsupported platform
469 #endif /* unsupported */
474 #include <net/netmap.h>
475 #include <dev/netmap/netmap_kern.h>
476 #include <dev/netmap/netmap_mem2.h>
479 /* user-controlled variables */
482 static int netmap_no_timestamp; /* don't timestamp on rxsync */
483 int netmap_mitigate = 1;
484 int netmap_no_pendintr = 1;
485 int netmap_txsync_retry = 2;
486 int netmap_flags = 0; /* debug flags */
487 static int netmap_fwd = 0; /* force transparent mode */
490 * netmap_admode selects the netmap mode to use.
491 * Invalid values are reset to NETMAP_ADMODE_BEST
493 enum { NETMAP_ADMODE_BEST = 0, /* use native, fallback to generic */
494 NETMAP_ADMODE_NATIVE, /* either native or none */
495 NETMAP_ADMODE_GENERIC, /* force generic */
496 NETMAP_ADMODE_LAST };
497 static int netmap_admode = NETMAP_ADMODE_BEST;
499 /* netmap_generic_mit controls mitigation of RX notifications for
500 * the generic netmap adapter. The value is a time interval in nanoseconds.
502 int netmap_generic_mit = 100*1000;
504 /* By default we use netmap-aware qdiscs with generic netmap adapters,
505 * even though there can be a small performance hit with hardware NICs.
506 * However, using the qdisc is the safer approach, for two reasons:
507 * 1) it prevents non-fifo qdiscs from breaking the TX notification
508 * scheme, which is based on mbuf destructors when txqdisc is not used.
510 * 2) it makes it possible to transmit over software devices that
511 * change skb->dev, like bridge, veth, ...
513 * In any case, users looking for the best performance should
514 * use native adapters.
516 int netmap_generic_txqdisc = 1;
518 /* Default number of slots and queues for generic adapters. */
519 int netmap_generic_ringsize = 1024;
520 int netmap_generic_rings = 1;
522 /* Non-zero if ptnet devices are allowed to use virtio-net headers. */
523 int ptnet_vnet_hdr = 1;
526 * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated
527 * in some other operating systems.
531 SYSCTL_DECL(_dev_netmap);
532 SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args");
533 SYSCTL_INT(_dev_netmap, OID_AUTO, verbose,
534 CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode");
535 SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp,
536 CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp");
537 SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, "");
538 SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr,
539 CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets.");
540 SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW,
541 &netmap_txsync_retry, 0 , "Number of txsync loops in bridge's flush.");
543 SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0 , "");
544 SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0 , "");
545 SYSCTL_INT(_dev_netmap, OID_AUTO, admode, CTLFLAG_RW, &netmap_admode, 0 , "");
546 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0 , "");
547 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0 , "");
548 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0 , "");
549 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0 , "");
550 SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0 , "");
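
/* Example (FreeBSD): the knobs above can be tuned at runtime, e.g.
 *	sysctl dev.netmap.admode=1		# native adapters only
 *	sysctl dev.netmap.generic_ringsize=4096
 */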
554 NMG_LOCK_T netmap_global_lock;
557 * mark the ring as stopped, and run through the locks
558 * to make sure other users get to see it.
559 * stopped must be either NM_KR_STOPPED (for unbounded stop)
560 * or NM_KR_LOCKED (brief stop for mutual exclusion purposes)
563 netmap_disable_ring(struct netmap_kring *kr, int stopped)
565 nm_kr_stop(kr, stopped);
566 // XXX check if nm_kr_stop is sufficient
567 mtx_lock(&kr->q_lock);
568 mtx_unlock(&kr->q_lock);
572 /* stop or enable a single ring */
574 netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped)
577 netmap_disable_ring(NMR(na, t) + ring_id, stopped);
579 NMR(na, t)[ring_id].nkr_stopped = 0;
583 /* stop or enable all the rings of na */
585 netmap_set_all_rings(struct netmap_adapter *na, int stopped)
590 if (!nm_netmap_on(na))
594 for (i = 0; i < netmap_real_rings(na, t); i++) {
595 netmap_set_ring(na, i, t, stopped);
601 * Convenience function used in drivers. Waits for current txsync()s/rxsync()s
602 * to finish and prevents any new one from starting. Call this before turning
603 * netmap mode off, or before removing the hardware rings (e.g., on module unload).
607 netmap_disable_all_rings(struct ifnet *ifp)
609 if (NM_NA_VALID(ifp)) {
610 netmap_set_all_rings(NA(ifp), NM_KR_STOPPED);
615 * Convenience function used in drivers. Re-enables rxsync and txsync on the
616 * adapter's rings. In linux drivers, this should be placed near each napi_enable().
620 netmap_enable_all_rings(struct ifnet *ifp)
622 if (NM_NA_VALID(ifp)) {
623 netmap_set_all_rings(NA(ifp), 0 /* enabled */);
628 netmap_make_zombie(struct ifnet *ifp)
630 if (NM_NA_VALID(ifp)) {
631 struct netmap_adapter *na = NA(ifp);
632 netmap_set_all_rings(na, NM_KR_LOCKED);
633 na->na_flags |= NAF_ZOMBIE;
634 netmap_set_all_rings(na, 0);
639 netmap_undo_zombie(struct ifnet *ifp)
641 if (NM_NA_VALID(ifp)) {
642 struct netmap_adapter *na = NA(ifp);
643 if (na->na_flags & NAF_ZOMBIE) {
644 netmap_set_all_rings(na, NM_KR_LOCKED);
645 na->na_flags &= ~NAF_ZOMBIE;
646 netmap_set_all_rings(na, 0);
652 * generic bound-checking function
655 nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg)
658 const char *op = NULL;
667 } else if (oldv > hi) {
672 printf("%s %s to %d (was %d)\n", op, msg, *v, oldv);
678 * packet-dump function, user-supplied or static buffer.
679 * The destination buffer must be at least 30+4*len
682 nm_dump_buf(char *p, int len, int lim, char *dst)
684 static char _dst[8192];
686 static char hex[] ="0123456789abcdef";
687 char *o; /* output position */
689 #define P_HI(x) hex[((x) & 0xf0)>>4]
690 #define P_LO(x) hex[((x) & 0xf)]
691 #define P_C(x) ((x) >= 0x20 && (x) <= 0x7e ? (x) : '.')
694 if (lim <= 0 || lim > len)
697 sprintf(o, "buf 0x%p len %d lim %d\n", p, len, lim);
699 /* hexdump routine */
700 for (i = 0; i < lim; ) {
701 sprintf(o, "%5d: ", i);
705 for (j=0; j < 16 && i < lim; i++, j++) {
707 o[j*3+1] = P_LO(p[i]);
710 for (j=0; j < 16 && i < lim; i++, j++)
711 o[j + 48] = P_C(p[i]);
724 * Fetch configuration from the device, to cope with dynamic
725 * reconfigurations after loading the module.
727 /* call with NMG_LOCK held */
729 netmap_update_config(struct netmap_adapter *na)
731 u_int txr, txd, rxr, rxd;
733 txr = txd = rxr = rxd = 0;
734 if (na->nm_config == NULL ||
735 na->nm_config(na, &txr, &txd, &rxr, &rxd))
737 /* take whatever we had at init time */
738 txr = na->num_tx_rings;
739 txd = na->num_tx_desc;
740 rxr = na->num_rx_rings;
741 rxd = na->num_rx_desc;
744 if (na->num_tx_rings == txr && na->num_tx_desc == txd &&
745 na->num_rx_rings == rxr && na->num_rx_desc == rxd)
746 return 0; /* nothing changed */
747 if (netmap_verbose || na->active_fds > 0) {
748 D("stored config %s: txring %d x %d, rxring %d x %d",
750 na->num_tx_rings, na->num_tx_desc,
751 na->num_rx_rings, na->num_rx_desc);
752 D("new config %s: txring %d x %d, rxring %d x %d",
753 na->name, txr, txd, rxr, rxd);
755 if (na->active_fds == 0) {
756 D("configuration changed (but fine)");
757 na->num_tx_rings = txr;
758 na->num_tx_desc = txd;
759 na->num_rx_rings = rxr;
760 na->num_rx_desc = rxd;
763 D("configuration changed while active, this is bad...");
767 /* nm_sync callbacks for the host rings */
768 static int netmap_txsync_to_host(struct netmap_kring *kring, int flags);
769 static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags);
771 /* create the krings array and initialize the fields common to all adapters.
772 * The array layout is this:
775 * na->tx_rings ----->| | \
776 * | | } na->num_tx_rings
780 * na->rx_rings ----> +----------+
782 * | | } na->num_rx_rings
787 * na->tailroom ----->| | \
788 * | | } tailroom bytes
792 * Note: for compatibility, host krings are created even when not needed.
793 * The tailroom space is currently used by vale ports for allocating leases.
795 /* call with NMG_LOCK held */
797 netmap_krings_create(struct netmap_adapter *na, u_int tailroom)
800 struct netmap_kring *kring;
804 /* account for the (possibly fake) host rings */
805 n[NR_TX] = na->num_tx_rings + 1;
806 n[NR_RX] = na->num_rx_rings + 1;
808 len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom;
810 na->tx_rings = malloc((size_t)len, M_DEVBUF, M_NOWAIT | M_ZERO);
811 if (na->tx_rings == NULL) {
812 D("Cannot allocate krings");
815 na->rx_rings = na->tx_rings + n[NR_TX];
818 * All fields in krings are 0 except the ones initialized below,
819 * but better be explicit on important kring fields.
822 ndesc = nma_get_ndesc(na, t);
823 for (i = 0; i < n[t]; i++) {
824 kring = &NMR(na, t)[i];
825 bzero(kring, sizeof(*kring));
829 kring->nkr_num_slots = ndesc;
830 kring->nr_mode = NKR_NETMAP_OFF;
831 kring->nr_pending_mode = NKR_NETMAP_OFF;
832 if (i < nma_get_nrings(na, t)) {
833 kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync);
835 kring->nm_sync = (t == NR_TX ?
836 netmap_txsync_to_host:
837 netmap_rxsync_from_host);
839 kring->nm_notify = na->nm_notify;
840 kring->rhead = kring->rcur = kring->nr_hwcur = 0;
842 * IMPORTANT: Always keep one slot empty.
844 kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0);
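		/* e.g. with ndesc == 512 a tx kring starts with hwcur == 0
		 * and hwtail == 511: 511 usable slots, so a full ring
		 * (hwtail == hwcur - 1 mod ndesc) is never confused with
		 * an empty one.
		 */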
845 snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name,
847 ND("ktx %s h %d c %d t %d",
848 kring->name, kring->rhead, kring->rcur, kring->rtail);
849 mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF);
850 nm_os_selinfo_init(&kring->si);
852 nm_os_selinfo_init(&na->si[t]);
855 na->tailroom = na->rx_rings + n[NR_RX];
861 /* undo the actions performed by netmap_krings_create */
862 /* call with NMG_LOCK held */
864 netmap_krings_delete(struct netmap_adapter *na)
866 struct netmap_kring *kring = na->tx_rings;
870 nm_os_selinfo_uninit(&na->si[t]);
872 /* we rely on the krings layout described above */
873 for ( ; kring != na->tailroom; kring++) {
874 mtx_destroy(&kring->q_lock);
875 nm_os_selinfo_uninit(&kring->si);
877 free(na->tx_rings, M_DEVBUF);
878 na->tx_rings = na->rx_rings = na->tailroom = NULL;
883 * Destructor for NIC ports. They also have an mbuf queue
884 * on the rings connected to the host, so we need to purge them first.
887 /* call with NMG_LOCK held */
889 netmap_hw_krings_delete(struct netmap_adapter *na)
891 struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue;
893 ND("destroy sw mbq with len %d", mbq_len(q));
896 netmap_krings_delete(na);
902 * Undo everything that was done in netmap_do_regif(). In particular,
903 * call nm_register(ifp,0) to stop netmap mode on the interface and
904 * revert to normal operation.
906 /* call with NMG_LOCK held */
907 static void netmap_unset_ringid(struct netmap_priv_d *);
908 static void netmap_krings_put(struct netmap_priv_d *);
910 netmap_do_unregif(struct netmap_priv_d *priv)
912 struct netmap_adapter *na = priv->np_na;
916 /* unset nr_pending_mode and possibly release exclusive mode */
917 netmap_krings_put(priv);
920 /* XXX check whether we have to do something with monitor
921 * when rings change nr_mode. */
922 if (na->active_fds <= 0) {
923 /* walk through all the rings and tell any monitor
924 * that the port is going to exit netmap mode
926 netmap_monitor_stop(na);
930 if (na->active_fds <= 0 || nm_kring_pending(priv)) {
931 na->nm_register(na, 0);
934 /* delete rings and buffers that are no longer needed */
935 netmap_mem_rings_delete(na);
937 if (na->active_fds <= 0) { /* last instance */
939 * (TO CHECK) We enter here
940 * when the last reference to this file descriptor goes
941 * away. This means we cannot have any pending poll()
942 * or interrupt routine operating on the structure.
943 * XXX The file may be closed in a thread while
944 * another thread is using it.
945 * Linux keeps the file opened until the last reference
946 * by any outstanding ioctl/poll or mmap is gone.
947 * FreeBSD does not track mmap()s (but we do) and
948 * wakes up any sleeping poll(). Need to check what
949 * happens if the close() occurs while a concurrent
950 * syscall is running.
953 D("deleting last instance for %s", na->name);
955 if (nm_netmap_on(na)) {
956 D("BUG: netmap on while going to delete the krings");
959 na->nm_krings_delete(na);
962 /* possibly decrement counter of tx_si/rx_si users */
963 netmap_unset_ringid(priv);
964 /* delete the nifp */
965 netmap_mem_if_delete(na, priv->np_nifp);
966 /* drop the allocator */
967 netmap_mem_deref(na->nm_mem, na);
968 /* mark the priv as unregistered */
970 priv->np_nifp = NULL;
973 /* call with NMG_LOCK held */
975 nm_si_user(struct netmap_priv_d *priv, enum txrx t)
977 return (priv->np_na != NULL &&
978 (priv->np_qlast[t] - priv->np_qfirst[t] > 1));
981 struct netmap_priv_d*
982 netmap_priv_new(void)
984 struct netmap_priv_d *priv;
986 priv = malloc(sizeof(struct netmap_priv_d), M_DEVBUF,
996 * Destructor of the netmap_priv_d, called when the fd is closed.
997 * Action: undo all the things done by NIOCREGIF.
998 * On FreeBSD we need to track whether there are active mmap()s,
999 * and we use np_active_mmaps for that. On linux, the field is always 0.
1000 * Return: 1 if we can free priv, 0 otherwise.
1003 /* call with NMG_LOCK held */
1005 netmap_priv_delete(struct netmap_priv_d *priv)
1007 struct netmap_adapter *na = priv->np_na;
1009 /* number of active references to this fd */
1010 if (--priv->np_refs > 0) {
1015 netmap_do_unregif(priv);
1017 netmap_unget_na(na, priv->np_ifp);
1018 bzero(priv, sizeof(*priv)); /* for safety */
1019 free(priv, M_DEVBUF);
1023 /* call with NMG_LOCK *not* held */
1025 netmap_dtor(void *data)
1027 struct netmap_priv_d *priv = data;
1030 netmap_priv_delete(priv);
1038 * Handlers for synchronization of the queues from/to the host.
1039 * Netmap has two operating modes:
1040 * - in the default mode, the rings connected to the host stack are
1041 * just another ring pair managed by userspace;
1042 * - in transparent mode (XXX to be defined) incoming packets
1043 * (from the host or the NIC) are marked as NS_FORWARD upon
1044 * arrival, and the user application has a chance to reset the
1045 * flag for packets that should be dropped.
1046 * On the RXSYNC or poll(), packets in RX rings between
1047 * kring->nr_hwcur and ring->cur with NS_FORWARD still set are moved
1048 * to the other side.
1049 * The transfer NIC --> host is relatively easy, just encapsulate
1050 * into mbufs and we are done. The host --> NIC side is slightly
1051 * harder because there might not be room in the tx ring so it
1052 * might take a while before releasing the buffer.
1057 * pass a chain of buffers to the host stack as coming from 'dst'.
1058 * We do not need to lock because the queue is private.
1061 netmap_send_up(struct ifnet *dst, struct mbq *q)
1064 struct mbuf *head = NULL, *prev = NULL;
1066 /* send packets up, outside the lock */
1067 while ((m = mbq_dequeue(q)) != NULL) {
1068 if (netmap_verbose & NM_VERB_HOST)
1069 D("sending up pkt %p size %d", m, MBUF_LEN(m));
1070 prev = nm_os_send_up(dst, m, prev);
1075 nm_os_send_up(dst, NULL, head);
1081 * put a copy of the buffers marked NS_FORWARD into an mbuf chain.
1082 * Take packets from hwcur to ring->head marked NS_FORWARD (or forced)
1083 * and pass them up. Drop remaining packets in the unlikely event
1084 * of an mbuf shortage.
1087 netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force)
1089 u_int const lim = kring->nkr_num_slots - 1;
1090 u_int const head = kring->rhead;
1092 struct netmap_adapter *na = kring->na;
1094 for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) {
1096 struct netmap_slot *slot = &kring->ring->slot[n];
1098 if ((slot->flags & NS_FORWARD) == 0 && !force)
1100 if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) {
1101 RD(5, "bad pkt at %d len %d", n, slot->len);
1104 slot->flags &= ~NS_FORWARD; // XXX needed ?
1105 /* XXX TODO: adapt to the case of a multisegment packet */
1106 m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL);
1115 _nm_may_forward(struct netmap_kring *kring)
1117 return ((netmap_fwd || kring->ring->flags & NR_FORWARD) &&
1118 kring->na->na_flags & NAF_HOST_RINGS &&
1119 kring->tx == NR_RX);
1123 nm_may_forward_up(struct netmap_kring *kring)
1125 return _nm_may_forward(kring) &&
1126 kring->ring_id != kring->na->num_rx_rings;
1130 nm_may_forward_down(struct netmap_kring *kring)
1132 return _nm_may_forward(kring) &&
1133 kring->ring_id == kring->na->num_rx_rings;
1137 * Send to the NIC rings packets marked NS_FORWARD between
1138 * kring->nr_hwcur and kring->rhead
1139 * Called under kring->rx_queue.lock on the sw rx ring,
1142 netmap_sw_to_nic(struct netmap_adapter *na)
1144 struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings];
1145 struct netmap_slot *rxslot = kring->ring->slot;
1146 u_int i, rxcur = kring->nr_hwcur;
1147 u_int const head = kring->rhead;
1148 u_int const src_lim = kring->nkr_num_slots - 1;
1151 /* scan rings to find space, then fill as much as possible */
1152 for (i = 0; i < na->num_tx_rings; i++) {
1153 struct netmap_kring *kdst = &na->tx_rings[i];
1154 struct netmap_ring *rdst = kdst->ring;
1155 u_int const dst_lim = kdst->nkr_num_slots - 1;
1157 /* XXX do we trust ring or kring->rcur,rtail ? */
1158 for (; rxcur != head && !nm_ring_empty(rdst);
1159 rxcur = nm_next(rxcur, src_lim) ) {
1160 struct netmap_slot *src, *dst, tmp;
1161 u_int dst_head = rdst->head;
1163 src = &rxslot[rxcur];
1164 if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd)
1169 dst = &rdst->slot[dst_head];
1173 src->buf_idx = dst->buf_idx;
1174 src->flags = NS_BUF_CHANGED;
1176 dst->buf_idx = tmp.buf_idx;
1178 dst->flags = NS_BUF_CHANGED;
1180 rdst->head = rdst->cur = nm_next(dst_head, dst_lim);
1182 /* if (sent) XXX txsync ? */
1189 * netmap_txsync_to_host() passes packets up. We are called from a
1190 * system call in user process context, and the only contention
1191 * can be among multiple user threads erroneously calling
1192 * this routine concurrently.
1195 netmap_txsync_to_host(struct netmap_kring *kring, int flags)
1197 struct netmap_adapter *na = kring->na;
1198 u_int const lim = kring->nkr_num_slots - 1;
1199 u_int const head = kring->rhead;
1202 /* Take packets from hwcur to head and pass them up.
1203 * force head = cur since netmap_grab_packets() stops at head
1204 * In case of no buffers we give up. At the end of the loop,
1205 * the queue is drained in all cases.
1208 netmap_grab_packets(kring, &q, 1 /* force */);
1209 ND("have %d pkts in queue", mbq_len(&q));
1210 kring->nr_hwcur = head;
1211 kring->nr_hwtail = head + lim;
1212 if (kring->nr_hwtail > lim)
1213 kring->nr_hwtail -= lim + 1;
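	/* i.e. hwtail = head - 1 mod num_slots: e.g. with lim == 511
	 * (512 slots), head == 10 gives hwtail == 9. */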
1215 netmap_send_up(na->ifp, &q);
1221 * rxsync backend for packets coming from the host stack.
1222 * They have been put in kring->rx_queue by netmap_transmit().
1223 * We protect access to the kring using kring->rx_queue.lock
1225 * This routine also does the selrecord if called from the poll handler
1226 * (we know because sr != NULL).
1228 * returns the number of packets delivered to tx queues in
1229 * transparent mode, or a negative value on error
1232 netmap_rxsync_from_host(struct netmap_kring *kring, int flags)
1234 struct netmap_adapter *na = kring->na;
1235 struct netmap_ring *ring = kring->ring;
1237 u_int const lim = kring->nkr_num_slots - 1;
1238 u_int const head = kring->rhead;
1240 struct mbq *q = &kring->rx_queue, fq;
1242 mbq_init(&fq); /* fq holds packets to be freed */
1246 /* First part: import newly received packets */
1248 if (n) { /* grab packets from the queue */
1252 nm_i = kring->nr_hwtail;
1253 stop_i = nm_prev(nm_i, lim);
1254 while ( nm_i != stop_i && (m = mbq_dequeue(q)) != NULL ) {
1255 int len = MBUF_LEN(m);
1256 struct netmap_slot *slot = &ring->slot[nm_i];
1258 m_copydata(m, 0, len, NMB(na, slot));
1259 ND("nm %d len %d", nm_i, len);
1261 D("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL));
1264 slot->flags = kring->nkr_slot_flags;
1265 nm_i = nm_next(nm_i, lim);
1266 mbq_enqueue(&fq, m);
1268 kring->nr_hwtail = nm_i;
1272 * Second part: skip past packets that userspace has released.
1274 nm_i = kring->nr_hwcur;
1275 if (nm_i != head) { /* something was released */
1276 if (nm_may_forward_down(kring)) {
1277 ret = netmap_sw_to_nic(na);
1279 kring->nr_kflags |= NR_FORWARD;
1283 kring->nr_hwcur = head;
1295 /* Get a netmap adapter for the port.
1297 * If it is possible to satisfy the request, return 0
1298 * with *na containing the netmap adapter found.
1299 * Otherwise return an error code, with *na containing NULL.
1301 * When the port is attached to a bridge, we always return
1303 * Otherwise, if the port is already bound to a file descriptor,
1304 * then we unconditionally return the existing adapter into *na.
1305 * In all the other cases, we return (into *na) either native,
1306 * generic or NULL, according to the following table:
1309 * active_fds dev.netmap.admode YES NO
1310 * -------------------------------------------------------
1311 * >0 * NA(ifp) NA(ifp)
1313 * 0 NETMAP_ADMODE_BEST NATIVE GENERIC
1314 * 0 NETMAP_ADMODE_NATIVE NATIVE NULL
1315 * 0 NETMAP_ADMODE_GENERIC GENERIC GENERIC
1318 static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */
1320 netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na)
1322 /* generic support */
1323 int i = netmap_admode; /* Take a snapshot. */
1324 struct netmap_adapter *prev_na;
1327 *na = NULL; /* default */
1329 /* reset in case of invalid value */
1330 if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST)
1331 i = netmap_admode = NETMAP_ADMODE_BEST;
1333 if (NM_NA_VALID(ifp)) {
1335 /* If an adapter already exists, return it if
1336 * there are active file descriptors or if
1337 * netmap is not forced to use generic
1340 if (NETMAP_OWNED_BY_ANY(prev_na)
1341 || i != NETMAP_ADMODE_GENERIC
1342 || prev_na->na_flags & NAF_FORCE_NATIVE
1344 /* ugly, but we cannot allow an adapter switch
1345 * if some pipe is referring to this one
1347 || prev_na->na_next_pipe > 0
1355 /* If there isn't native support and netmap is not allowed
1356 * to use generic adapters, we cannot satisfy the request.
1358 if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE)
1361 /* Otherwise, create a generic adapter and return it,
1362 * saving the previously used netmap adapter, if any.
1364 * Note that here 'prev_na', if not NULL, MUST be a
1365 * native adapter, and CANNOT be a generic one. This is
1366 * true because generic adapters are created on demand, and
1367 * destroyed when not used anymore. Therefore, if the adapter
1368 * currently attached to an interface 'ifp' is generic, it
1370 * (NA(ifp)->active_fds > 0 || NETMAP_OWNED_BY_KERN(NA(ifp))).
1371 * Consequently, if NA(ifp) is generic, we will enter one of
1372 * the branches above. This ensures that we never override
1373 * a generic adapter with another generic adapter.
1375 error = generic_netmap_attach(ifp);
1385 * MUST BE CALLED UNDER NMG_LOCK()
1387 * Get a refcounted reference to a netmap adapter attached
1388 * to the interface specified by nmr.
1389 * This is always called in the execution of an ioctl().
1391 * Return ENXIO if the interface specified by the request does
1392 * not exist, ENOTSUP if netmap is not supported by the interface,
1393 * EBUSY if the interface is already attached to a bridge,
1394 * EINVAL if parameters are invalid, ENOMEM if needed resources
1395 * could not be allocated.
1396 * If successful, hold a reference to the netmap adapter.
1398 * If the interface specified by nmr is a system one, also keep
1399 * a reference to it and return a valid *ifp.
1402 netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na,
1403 struct ifnet **ifp, int create)
1406 struct netmap_adapter *ret = NULL;
1408 *na = NULL; /* default return value */
1413 /* We cascade through all possible types of netmap adapter.
1414 * All netmap_get_*_na() functions return an error and an na,
1415 * with the following combinations:
1418 * 0 NULL type doesn't match
1419 * !0 NULL type matches, but na creation/lookup failed
1420 * 0 !NULL type matches and na created/found
1421 * !0 !NULL impossible
1424 /* try to see if this is a ptnetmap port */
1425 error = netmap_get_pt_host_na(nmr, na, create);
1426 if (error || *na != NULL)
1429 /* try to see if this is a monitor port */
1430 error = netmap_get_monitor_na(nmr, na, create);
1431 if (error || *na != NULL)
1434 /* try to see if this is a pipe port */
1435 error = netmap_get_pipe_na(nmr, na, create);
1436 if (error || *na != NULL)
1439 /* try to see if this is a bridge port */
1440 error = netmap_get_bdg_na(nmr, na, create);
1444 if (*na != NULL) /* valid match in netmap_get_bdg_na() */
1448 * This must be a hardware na, lookup the name in the system.
1449 * Note that by hardware we actually mean "it shows up in ifconfig".
1450 * This may still be a tap, a veth/epair, or even a
1451 * persistent VALE port.
1453 *ifp = ifunit_ref(nmr->nr_name);
1458 error = netmap_get_hw_na(*ifp, &ret);
1463 netmap_adapter_get(ret);
1468 netmap_adapter_put(ret);
1478 /* undo netmap_get_na() */
1480 netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp)
1485 netmap_adapter_put(na);
1489 #define NM_FAIL_ON(t) do { \
1490 if (unlikely(t)) { \
1491 RD(5, "%s: fail '" #t "' " \
1493 "rh %d rc %d rt %d " \
1496 head, cur, ring->tail, \
1497 kring->rhead, kring->rcur, kring->rtail, \
1498 kring->nr_hwcur, kring->nr_hwtail); \
1499 return kring->nkr_num_slots; \
1504 * validate parameters on entry for *_txsync()
1505 * Returns ring->cur if ok, or something >= kring->nkr_num_slots
1508 * rhead, rcur and rtail=hwtail are stored from previous round.
1509 * hwcur is the next packet to send to the ring.
1512 * hwcur <= *rhead <= head <= cur <= tail = *rtail <= hwtail
1514 * hwcur, rhead, rtail and hwtail are reliable
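 *
 * e.g., with 8 slots (all comparisons circular, mod num_slots) a valid
 * state is: hwcur 2, rhead 3, head 4, cur 5, tail = rtail = hwtail 7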
1517 nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
1519 u_int head = ring->head; /* read only once */
1520 u_int cur = ring->cur; /* read only once */
1521 u_int n = kring->nkr_num_slots;
1523 ND(5, "%s kcur %d ktail %d head %d cur %d tail %d",
1525 kring->nr_hwcur, kring->nr_hwtail,
1526 ring->head, ring->cur, ring->tail);
1527 #if 1 /* kernel sanity checks; but we can trust the kring. */
1528 NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n ||
1529 kring->rtail >= n || kring->nr_hwtail >= n);
1530 #endif /* kernel sanity checks */
1532 * user sanity checks. We only use head.
1533 * A, B, ... are possible positions for head:
1535 * 0 A rhead B rtail C n-1
1536 * 0 D rtail E rhead F n-1
1538 * B, F, D are valid. A, C, E are wrong
1540 if (kring->rtail >= kring->rhead) {
1541 /* want rhead <= head <= rtail */
1542 NM_FAIL_ON(head < kring->rhead || head > kring->rtail);
1543 /* and also head <= cur <= rtail */
1544 NM_FAIL_ON(cur < head || cur > kring->rtail);
1545 } else { /* here rtail < rhead */
1546 /* we need head outside rtail .. rhead */
1547 NM_FAIL_ON(head > kring->rtail && head < kring->rhead);
1549 /* two cases now: head <= rtail or head >= rhead */
1550 if (head <= kring->rtail) {
1551 /* want head <= cur <= rtail */
1552 NM_FAIL_ON(cur < head || cur > kring->rtail);
1553 } else { /* head >= rhead */
1554 /* cur must be outside rtail..head */
1555 NM_FAIL_ON(cur > kring->rtail && cur < head);
1558 if (ring->tail != kring->rtail) {
1559 RD(5, "%s tail overwritten was %d need %d", kring->name,
1560 ring->tail, kring->rtail);
1561 ring->tail = kring->rtail;
1563 kring->rhead = head;
1570 * validate parameters on entry for *_rxsync()
1571 * Returns ring->head if ok, kring->nkr_num_slots on error.
1573 * For a valid configuration,
1574 * hwcur <= head <= cur <= tail <= hwtail
1576 * We only consider head and cur.
1577 * hwcur and hwtail are reliable.
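 *
 * e.g., with 8 slots (circular comparisons) a valid state is:
 * hwcur 2, head 4, cur 5, tail 6, hwtail 6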
1581 nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
1583 uint32_t const n = kring->nkr_num_slots;
1586 ND(5,"%s kc %d kt %d h %d c %d t %d",
1588 kring->nr_hwcur, kring->nr_hwtail,
1589 ring->head, ring->cur, ring->tail);
1591 * Before storing the new values, we should check they do not
1592 * move backwards. However:
1593 * - head is not an issue because the previous value is hwcur;
1594 * - cur could in principle go back, however it does not matter
1595 * because we are processing a brand new rxsync()
1597 cur = kring->rcur = ring->cur; /* read only once */
1598 head = kring->rhead = ring->head; /* read only once */
1599 #if 1 /* kernel sanity checks */
1600 NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n);
1601 #endif /* kernel sanity checks */
1602 /* user sanity checks */
1603 if (kring->nr_hwtail >= kring->nr_hwcur) {
1604 /* want hwcur <= rhead <= hwtail */
1605 NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail);
1606 /* and also rhead <= rcur <= hwtail */
1607 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
1609 /* we need rhead outside hwtail..hwcur */
1610 NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail);
1611 /* two cases now: head <= hwtail or head >= hwcur */
1612 if (head <= kring->nr_hwtail) {
1613 /* want head <= cur <= hwtail */
1614 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
1616 /* cur must be outside hwtail..head */
1617 NM_FAIL_ON(cur < head && cur > kring->nr_hwtail);
1620 if (ring->tail != kring->rtail) {
1621 RD(5, "%s tail overwritten was %d need %d",
1623 ring->tail, kring->rtail);
1624 ring->tail = kring->rtail;
1631 * Error routine called when txsync/rxsync detects an error.
1632 * Can't do much more than resetting head = cur = hwcur, tail = hwtail
1633 * Return 1 on reinit.
1635 * This routine is only called by the upper half of the kernel.
1636 * It only reads hwcur (which is changed only by the upper half, too)
1637 * and hwtail (which may be changed by the lower half, but only on
1638 * a tx ring and only to increase it, so any error will be recovered
1639 * on the next call). For the above, we don't strictly need to call it under lock.
1643 netmap_ring_reinit(struct netmap_kring *kring)
1645 struct netmap_ring *ring = kring->ring;
1646 u_int i, lim = kring->nkr_num_slots - 1;
1649 // XXX KASSERT nm_kr_tryget
1650 RD(10, "called for %s", kring->name);
1651 // XXX probably wrong to trust userspace
1652 kring->rhead = ring->head;
1653 kring->rcur = ring->cur;
1654 kring->rtail = ring->tail;
1656 if (ring->cur > lim)
1658 if (ring->head > lim)
1660 if (ring->tail > lim)
1662 for (i = 0; i <= lim; i++) {
1663 u_int idx = ring->slot[i].buf_idx;
1664 u_int len = ring->slot[i].len;
1665 if (idx < 2 || idx >= kring->na->na_lut.objtotal) {
1666 RD(5, "bad index at slot %d idx %d len %d ", i, idx, len);
1667 ring->slot[i].buf_idx = 0;
1668 ring->slot[i].len = 0;
1669 } else if (len > NETMAP_BUF_SIZE(kring->na)) {
1670 ring->slot[i].len = 0;
1671 RD(5, "bad len at slot %d idx %d len %d", i, idx, len);
1675 RD(10, "total %d errors", errors);
1676 RD(10, "%s reinit, cur %d -> %d tail %d -> %d",
1678 ring->cur, kring->nr_hwcur,
1679 ring->tail, kring->nr_hwtail);
1680 ring->head = kring->rhead = kring->nr_hwcur;
1681 ring->cur = kring->rcur = kring->nr_hwcur;
1682 ring->tail = kring->rtail = kring->nr_hwtail;
1684 return (errors ? 1 : 0);
1687 /* interpret the ringid and flags fields of an nmreq, by translating them
1688 * into a pair of intervals of ring indices:
1690 * [priv->np_txqfirst, priv->np_txqlast) and
1691 * [priv->np_rxqfirst, priv->np_rxqlast)
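 *
 * (e.g., assuming a NIC with 4 rings per direction: NR_REG_ALL_NIC
 * yields [0,4) for both directions, while NR_REG_ONE_NIC with ring
 * id 2 yields [2,3))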
1695 netmap_interp_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
1697 struct netmap_adapter *na = priv->np_na;
1698 u_int j, i = ringid & NETMAP_RING_MASK;
1699 u_int reg = flags & NR_REG_MASK;
1700 int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY };
1703 if (reg == NR_REG_DEFAULT) {
1704 /* convert from old ringid to flags */
1705 if (ringid & NETMAP_SW_RING) {
1707 } else if (ringid & NETMAP_HW_RING) {
1708 reg = NR_REG_ONE_NIC;
1710 reg = NR_REG_ALL_NIC;
1712 D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg);
1715 if ((flags & NR_PTNETMAP_HOST) && (reg != NR_REG_ALL_NIC ||
1716 flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) {
1717 D("Error: only NR_REG_ALL_NIC supported with netmap passthrough");
1722 if (flags & excluded_direction[t]) {
1723 priv->np_qfirst[t] = priv->np_qlast[t] = 0;
1727 case NR_REG_ALL_NIC:
1728 case NR_REG_PIPE_MASTER:
1729 case NR_REG_PIPE_SLAVE:
1730 priv->np_qfirst[t] = 0;
1731 priv->np_qlast[t] = nma_get_nrings(na, t);
1732 ND("ALL/PIPE: %s %d %d", nm_txrx2str(t),
1733 priv->np_qfirst[t], priv->np_qlast[t]);
1737 if (!(na->na_flags & NAF_HOST_RINGS)) {
1738 D("host rings not supported");
1741 priv->np_qfirst[t] = (reg == NR_REG_SW ?
1742 nma_get_nrings(na, t) : 0);
1743 priv->np_qlast[t] = nma_get_nrings(na, t) + 1;
1744 ND("%s: %s %d %d", reg == NR_REG_SW ? "SW" : "NIC+SW",
1746 priv->np_qfirst[t], priv->np_qlast[t]);
1748 case NR_REG_ONE_NIC:
1749 if (i >= na->num_tx_rings && i >= na->num_rx_rings) {
1750 D("invalid ring id %d", i);
1753 /* if not enough rings, use the first one */
1755 if (j >= nma_get_nrings(na, t))
1757 priv->np_qfirst[t] = j;
1758 priv->np_qlast[t] = j + 1;
1759 ND("ONE_NIC: %s %d %d", nm_txrx2str(t),
1760 priv->np_qfirst[t], priv->np_qlast[t]);
1763 D("invalid regif type %d", reg);
1767 priv->np_flags = (flags & ~NR_REG_MASK) | reg;
1769 if (netmap_verbose) {
1770 D("%s: tx [%d,%d) rx [%d,%d) id %d",
1772 priv->np_qfirst[NR_TX],
1773 priv->np_qlast[NR_TX],
1774 priv->np_qfirst[NR_RX],
1775 priv->np_qlast[NR_RX],
1783 * Set the ring ID. For devices with a single queue, a request
1784 * for all rings is the same as a single ring.
1787 netmap_set_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
1789 struct netmap_adapter *na = priv->np_na;
1793 error = netmap_interp_ringid(priv, ringid, flags);
1798 priv->np_txpoll = (ringid & NETMAP_NO_TX_POLL) ? 0 : 1;
1800 /* optimization: count the users registered for more than
1801 * one ring, which are the ones sleeping on the global queue.
1802 * The default netmap_notify() callback will then
1803 * avoid signaling the global queue if nobody is using it
1806 if (nm_si_user(priv, t))
1813 netmap_unset_ringid(struct netmap_priv_d *priv)
1815 struct netmap_adapter *na = priv->np_na;
1819 if (nm_si_user(priv, t))
1821 priv->np_qfirst[t] = priv->np_qlast[t] = 0;
1824 priv->np_txpoll = 0;
1828 /* Set the nr_pending_mode for the requested rings.
1829 * If requested, also try to get exclusive access to the rings, provided
1830 * the rings we want to bind are not exclusively owned by a previous bind.
1833 netmap_krings_get(struct netmap_priv_d *priv)
1835 struct netmap_adapter *na = priv->np_na;
1837 struct netmap_kring *kring;
1838 int excl = (priv->np_flags & NR_EXCLUSIVE);
1841 ND("%s: grabbing tx [%d, %d) rx [%d, %d)",
1843 priv->np_qfirst[NR_TX],
1844 priv->np_qlast[NR_TX],
1845 priv->np_qfirst[NR_RX],
1846 priv->np_qlast[NR_RX]);
1848 /* first round: check that all the requested rings
1849 * are not already exclusively owned, and that we are not
1850 * requesting exclusive ownership of rings already in use
1853 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
1854 kring = &NMR(na, t)[i];
1855 if ((kring->nr_kflags & NKR_EXCLUSIVE) ||
1856 (kring->users && excl))
1858 ND("ring %s busy", kring->name);
1864 /* second round: increment usage count (possibly marking them
1865 * as exclusive) and set the nr_pending_mode
1868 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
1869 kring = &NMR(na, t)[i];
1872 kring->nr_kflags |= NKR_EXCLUSIVE;
1873 kring->nr_pending_mode = NKR_NETMAP_ON;
1881 /* Undo netmap_krings_get(). This is done by clearing the exclusive mode
1882 * if it was asked on regif, and unsetting the nr_pending_mode if we are the
1883 * last users of the involved rings. */
1885 netmap_krings_put(struct netmap_priv_d *priv)
1887 struct netmap_adapter *na = priv->np_na;
1889 struct netmap_kring *kring;
1890 int excl = (priv->np_flags & NR_EXCLUSIVE);
1893 ND("%s: releasing tx [%d, %d) rx [%d, %d)",
1895 priv->np_qfirst[NR_TX],
1896 priv->np_qlast[NR_TX],
1897 priv->np_qfirst[NR_RX],
1898 priv->np_qlast[NR_RX]);
1902 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
1903 kring = &NMR(na, t)[i];
1905 kring->nr_kflags &= ~NKR_EXCLUSIVE;
1907 if (kring->users == 0)
1908 kring->nr_pending_mode = NKR_NETMAP_OFF;
1914 * possibly move the interface to netmap mode.
1915 * On success it returns a pointer to netmap_if, otherwise NULL.
1916 * This must be called with NMG_LOCK held.
1918 * The following na callbacks are called in the process:
1920 * na->nm_config() [by netmap_update_config]
1921 * (get current number and size of rings)
1923 * We have a generic one for linux (netmap_linux_config).
1924 * The bwrap has to override this, since it has to forward
1925 * the request to the wrapped adapter (netmap_bwrap_config).
1928 * na->nm_krings_create()
1929 * (create and init the krings array)
1931 * One of the following:
1933 * * netmap_hw_krings_create, (hw ports)
1934 * creates the standard layout for the krings
1935 * and adds the mbq (used for the host rings).
1937 * * netmap_vp_krings_create (VALE ports)
1938 * add leases and scratchpads
1940 * * netmap_pipe_krings_create (pipes)
1941 * create the krings and rings of both ends and cross-link them
1944 * * netmap_monitor_krings_create (monitors)
1945 * avoid allocating the mbq
1947 * * netmap_bwrap_krings_create (bwraps)
1948 * create both the bwrap krings array,
1949 * the krings array of the wrapped adapter, and
1950 * (if needed) the fake array for the host adapter
1952 * na->nm_register(, 1)
1953 * (put the adapter in netmap mode)
1955 * This may be one of the following:
1957 * * netmap_hw_reg (hw ports)
1958 * checks that the ifp is still there, then calls
1959 * the hardware specific callback;
1961 * * netmap_vp_reg (VALE ports)
1962 * If the port is connected to a bridge,
1963 * set the NAF_NETMAP_ON flag under the
1964 * bridge write lock.
1966 * * netmap_pipe_reg (pipes)
1967 * inform the other pipe end that it is no
1968 * longer responsible for the lifetime of this
1971 * * netmap_monitor_reg (monitors)
1972 * intercept the sync callbacks of the monitored
1975 * * netmap_bwrap_reg (bwraps)
1976 * cross-link the bwrap and hwna rings,
1977 * forward the request to the hwna, override
1978 * the hwna notify callback (so that frames
1979 * coming from outside go through the bridge).
1984 netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na,
1985 uint16_t ringid, uint32_t flags)
1987 struct netmap_if *nifp = NULL;
1991 /* ring configuration may have changed, fetch from the card */
1992 netmap_update_config(na);
1993 priv->np_na = na; /* store the reference */
1994 error = netmap_set_ringid(priv, ringid, flags);
1997 error = netmap_mem_finalize(na->nm_mem, na);
2001 if (na->active_fds == 0) {
2003 * If this is the first registration of the adapter,
2004 * create the in-kernel view of the netmap rings,
2005 * the netmap krings.
2009 * Depending on the adapter, this may also create
2010 * the netmap rings themselves
2012 error = na->nm_krings_create(na);
2018 /* now the krings must exist and we can check whether some
2019 * previous bind has exclusive ownership on them, and set
2022 error = netmap_krings_get(priv);
2024 goto err_del_krings;
2026 /* create all missing netmap rings */
2027 error = netmap_mem_rings_create(na);
2031 /* in all cases, create a new netmap if */
2032 nifp = netmap_mem_if_new(na);
2038 if (na->active_fds == 0) {
2039 /* cache the allocator info in the na */
2040 error = netmap_mem_get_lut(na->nm_mem, &na->na_lut);
2043 ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal,
2044 na->na_lut.objsize);
2047 if (nm_kring_pending(priv)) {
2048 /* Some kring is switching mode, tell the adapter to react to it. */
2050 error = na->nm_register(na, 1);
2055 /* Commit the reference. */
2059 * advertise that the interface is ready by setting np_nifp.
2060 * The barrier is needed because readers (poll, *SYNC and mmap)
2061 * check for priv->np_nifp != NULL without locking
2063 mb(); /* make sure previous writes are visible to all CPUs */
2064 priv->np_nifp = nifp;
2069 if (na->active_fds == 0)
2070 memset(&na->na_lut, 0, sizeof(na->na_lut));
2072 netmap_mem_if_delete(na, nifp);
2074 netmap_krings_put(priv);
2076 netmap_mem_rings_delete(na);
2078 if (na->active_fds == 0)
2079 na->nm_krings_delete(na);
2081 netmap_mem_deref(na->nm_mem, na);
2089 * update kring and ring at the end of rxsync/txsync.
2092 nm_sync_finalize(struct netmap_kring *kring)
2095 * Update ring tail to what the kernel knows
2096 * After txsync: head/rhead/hwcur might be behind cur/rcur if there is no carrier. */
2099 kring->ring->tail = kring->rtail = kring->nr_hwtail;
2101 ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d",
2102 kring->name, kring->nr_hwcur, kring->nr_hwtail,
2103 kring->rhead, kring->rcur, kring->rtail);
2107 * ioctl(2) support for the "netmap" device.
2109 * The following is a list of accepted commands:
2111 * - SIOCGIFADDR just for convenience
2116 * Return 0 on success, errno otherwise.
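 *
 * A minimal userspace sketch of the NIOCTXSYNC path, using the
 * netmap_user.h helpers (illustrative only, error handling omitted;
 * fd and nifp are assumed to come from open()/NIOCREGIF/mmap()):
 *
 *	struct netmap_ring *ring = NETMAP_TXRING(nifp, 0);
 *
 *	while (nm_ring_space(ring) == 0)
 *		ioctl(fd, NIOCTXSYNC, NULL);	// reclaim completed slots
 *	// fill the slot at ring->cur, then advance head/cur
 *	ring->head = ring->cur = nm_ring_next(ring, ring->cur);
 *	ioctl(fd, NIOCTXSYNC, NULL);		// push the new packet out
 */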
2119 netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td)
2121 struct nmreq *nmr = (struct nmreq *) data;
2122 struct netmap_adapter *na = NULL;
2123 struct ifnet *ifp = NULL;
2125 u_int i, qfirst, qlast;
2126 struct netmap_if *nifp;
2127 struct netmap_kring *krings;
2130 if (cmd == NIOCGINFO || cmd == NIOCREGIF) {
2132 nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0';
2133 if (nmr->nr_version != NETMAP_API) {
2134 D("API mismatch for %s got %d need %d",
2136 nmr->nr_version, NETMAP_API);
2137 nmr->nr_version = NETMAP_API;
2139 if (nmr->nr_version < NETMAP_MIN_API ||
2140 nmr->nr_version > NETMAP_MAX_API) {
2146 case NIOCGINFO: /* return capabilities etc */
2147 if (nmr->nr_cmd == NETMAP_BDG_LIST) {
2148 error = netmap_bdg_ctl(nmr, NULL);
2154 /* memsize is always valid */
2155 struct netmap_mem_d *nmd = &nm_mem;
2158 if (nmr->nr_name[0] != '\0') {
2160 /* get a refcount */
2161 error = netmap_get_na(nmr, &na, &ifp, 1 /* create */);
2167 nmd = na->nm_mem; /* get memory allocator */
2170 error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags,
2174 if (na == NULL) /* only memory info */
2177 nmr->nr_rx_slots = nmr->nr_tx_slots = 0;
2178 netmap_update_config(na);
2179 nmr->nr_rx_rings = na->num_rx_rings;
2180 nmr->nr_tx_rings = na->num_tx_rings;
2181 nmr->nr_rx_slots = na->num_rx_desc;
2182 nmr->nr_tx_slots = na->num_tx_desc;
2184 netmap_unget_na(na, ifp);
2190 * If nmr->nr_cmd is not zero, this NIOCREGIF is not really
2191 * a regif operation, but a different one, specified by the
2192 * value of nmr->nr_cmd.
2195 if (i == NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH
2196 || i == NETMAP_BDG_VNET_HDR
2197 || i == NETMAP_BDG_NEWIF
2198 || i == NETMAP_BDG_DELIF
2199 || i == NETMAP_BDG_POLLING_ON
2200 || i == NETMAP_BDG_POLLING_OFF) {
2201 /* possibly attach/detach NIC and VALE switch */
2202 error = netmap_bdg_ctl(nmr, NULL);
2204 } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) {
2205 /* forward the command to the ptnetmap subsystem */
2206 error = ptnetmap_ctl(nmr, priv->np_na);
2208 } else if (i == NETMAP_VNET_HDR_GET) {
2209 /* get vnet-header length for this netmap port */
2213 error = netmap_get_na(nmr, &na, &ifp, 0);
2215 nmr->nr_arg1 = na->virt_hdr_len;
2217 netmap_unget_na(na, ifp);
2220 } else if (i == NETMAP_POOLS_INFO_GET) {
2221 /* get information from the memory allocator */
2222 error = netmap_mem_pools_info_get(nmr, priv->np_na);
2224 } else if (i != 0) {
2225 D("nr_cmd must be 0 not %d", i);
2230 /* protect access to priv from concurrent NIOCREGIF */
2236 if (priv->np_nifp != NULL) { /* thread already registered */
2240 /* find the interface and a reference */
2241 error = netmap_get_na(nmr, &na, &ifp,
2242 1 /* create */); /* keep reference */
2245 if (NETMAP_OWNED_BY_KERN(na)) {
2246 netmap_unget_na(na, ifp);
2251 if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) {
2252 netmap_unget_na(na, ifp);
2257 error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags);
2258 if (error) { /* reg. failed, release priv and ref */
2259 netmap_unget_na(na, ifp);
2262 nifp = priv->np_nifp;
2263 priv->np_td = td; // XXX kqueue, debugging only
2265 /* return the offset of the netmap_if object */
2266 nmr->nr_rx_rings = na->num_rx_rings;
2267 nmr->nr_tx_rings = na->num_tx_rings;
2268 nmr->nr_rx_slots = na->num_rx_desc;
2269 nmr->nr_tx_slots = na->num_tx_desc;
2270 error = netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, &memflags,
2273 netmap_do_unregif(priv);
2274 netmap_unget_na(na, ifp);
2277 if (memflags & NETMAP_MEM_PRIVATE) {
2278 *(uint32_t *)(uintptr_t)&nifp->ni_flags |= NI_PRIV_MEM;
2281 priv->np_si[t] = nm_si_user(priv, t) ?
2282 &na->si[t] : &NMR(na, t)[priv->np_qfirst[t]].si;
2287 D("requested %d extra buffers", nmr->nr_arg3);
2288 nmr->nr_arg3 = netmap_extra_alloc(na,
2289 &nifp->ni_bufs_head, nmr->nr_arg3);
2291 D("got %d extra buffers", nmr->nr_arg3);
2293 nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp);
2295 /* store ifp reference so that priv destructor may release it */
2303 nifp = priv->np_nifp;
2309 mb(); /* make sure following reads are not from cache */
2311 na = priv->np_na; /* we have a reference */
2314 D("Internal error: nifp != NULL && na == NULL");
2319 t = (cmd == NIOCTXSYNC ? NR_TX : NR_RX);
2320 krings = NMR(na, t);
2321 qfirst = priv->np_qfirst[t];
2322 qlast = priv->np_qlast[t];
2324 for (i = qfirst; i < qlast; i++) {
2325 struct netmap_kring *kring = krings + i;
2326 struct netmap_ring *ring = kring->ring;
2328 if (unlikely(nm_kr_tryget(kring, 1, &error))) {
2329 error = (error ? EIO : 0);
2333 if (cmd == NIOCTXSYNC) {
2334 if (netmap_verbose & NM_VERB_TXSYNC)
2335 D("pre txsync ring %d cur %d hwcur %d",
2338 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2339 netmap_ring_reinit(kring);
2340 } else if (kring->nm_sync(kring, NAF_FORCE_RECLAIM) == 0) {
2341 nm_sync_finalize(kring);
2343 if (netmap_verbose & NM_VERB_TXSYNC)
2344 D("post txsync ring %d cur %d hwcur %d",
2348 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2349 netmap_ring_reinit(kring);
2350 } else if (kring->nm_sync(kring, NAF_FORCE_READ) == 0) {
2351 nm_sync_finalize(kring);
2353 microtime(&ring->ts);
2362 error = netmap_bdg_config(nmr);
2368 ND("FIONBIO/FIOASYNC are no-ops");
2375 D("ignore BIOCIMMEDIATE/BIOCSHDRCMPLT/BIOCSHDRCMPLT/BIOCSSEESENT");
2378 default: /* allow device-specific ioctls */
2380 struct ifnet *ifp = ifunit_ref(nmr->nr_name);
2386 bzero(&so, sizeof(so));
2387 so.so_vnet = ifp->if_vnet;
2388 // so->so_proto not null.
2389 error = ifioctl(&so, cmd, data, td);
2406 * select(2) and poll(2) handlers for the "netmap" device.
2408 * Can be called for one or more queues.
2409 * Return the event mask corresponding to ready events.
2410 * If there are no ready events, do a selrecord on either the individual
2411 * selinfo or on the global one.
2412 * Device-dependent parts (locking and sync of tx/rx rings)
2413 * are done through callbacks.
2415 * On Linux, the arguments are really pwait, the poll table, and 'td' is a struct file *.
2416 * The first one is remapped to pwait as selrecord() uses the name as an argument.
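 *
 * A sketch of the matching userspace side (fd bound with NIOCREGIF,
 * nifp obtained via mmap(); illustrative only):
 *
 *	struct pollfd pfd = { .fd = fd, .events = POLLIN };
 *
 *	if (poll(&pfd, 1, 1000) > 0 && (pfd.revents & POLLIN)) {
 *		struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
 *		while (!nm_ring_empty(ring)) {
 *			// consume the slot at ring->cur ...
 *			ring->head = ring->cur = nm_ring_next(ring, ring->cur);
 *		}
 *	}
 */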
2420 netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr)
2422 struct netmap_adapter *na;
2423 struct netmap_kring *kring;
2424 struct netmap_ring *ring;
2425 u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0;
2426 #define want_tx want[NR_TX]
2427 #define want_rx want[NR_RX]
2428 struct mbq q; /* packets from hw queues to host stack */
2432 * In order to avoid nested locks, we need to "double check"
2433 * txsync and rxsync if we decide to do a selrecord().
2434 * retry_tx (and retry_rx, later) prevent looping forever.
2436 int retry_tx = 1, retry_rx = 1;
2438 /* transparent mode: send_down is 1 if we have found some
2439 * packets to forward during the rx scan and we have not
2440 * sent them down to the nic yet
2446 if (priv->np_nifp == NULL) {
2447 D("No if registered");
2450 mb(); /* make sure following reads are not from cache */
2454 if (!nm_netmap_on(na))
2457 if (netmap_verbose & 0x8000)
2458 D("device %s events 0x%x", na->name, events);
2459 want_tx = events & (POLLOUT | POLLWRNORM);
2460 want_rx = events & (POLLIN | POLLRDNORM);
2463 * check_all_{tx|rx} are set if the card has more than one queue AND
2464 * the file descriptor is bound to all of them. If so, we sleep on
2465 * the "global" selinfo, otherwise we sleep on individual selinfo
2466 * (FreeBSD only allows two selinfo's per file descriptor).
2467 * The interrupt routine in the driver wakes up one or the other
2468 * (or both) depending on which clients are active.
2470 * rxsync() is only called if we run out of buffers on a POLLIN.
2471 * txsync() is called if we run out of buffers on POLLOUT, or
2472 * there are pending packets to send. The latter can be disabled
2473 * passing NETMAP_NO_TX_POLL in the NIOCREG call.
2475 check_all_tx = nm_si_user(priv, NR_TX);
2476 check_all_rx = nm_si_user(priv, NR_RX);
2479 * We start with a lock free round which is cheap if we have
2480 * slots available. If this fails, then lock and call the sync routines.
2483 #if 1 /* new code: call rxsync if any of the rings needs to release or read buffers */
2486 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) {
2487 kring = &NMR(na, t)[i];
2488 /* XXX compare ring->cur and kring->tail */
2489 if (!nm_ring_empty(kring->ring)) {
2491 want[t] = 0; /* also breaks the loop */
2496 want_rx = 0; /* look for a reason to run the handlers */
2498 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) {
2499 kring = &NMR(na, t)[i];
2500 if (kring->ring->cur == kring->ring->tail /* try fetch new buffers */
2501 || kring->rhead != kring->ring->head /* release buffers */) {
2506 revents |= events & (POLLIN | POLLRDNORM); /* we have data */
2508 #else /* old code */
2510 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) {
2511 kring = &NMR(na, t)[i];
2512 /* XXX compare ring->cur and kring->tail */
2513 if (!nm_ring_empty(kring->ring)) {
2515 want[t] = 0; /* also breaks the loop */
2519 #endif /* old code */
2522 * If we want to push packets out (priv->np_txpoll) or
2523 * want_tx is still set, we must issue txsync calls
2524 * (on all rings, to avoid that the tx rings stall).
2525 * XXX should also check cur != hwcur on the tx rings.
2526 * Fortunately, normal tx mode has np_txpoll set.
2528 if (priv->np_txpoll || want_tx) {
2530 * The first round checks if anyone is ready, if not
2531 * do a selrecord and another round to handle races.
2532 * want_tx goes to 0 if any space is found, and is
2533 * used to skip rings with no pending transmissions.
2536 for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) {
2539 kring = &na->tx_rings[i];
2542 if (!send_down && !want_tx && ring->cur == kring->nr_hwcur)
2545 if (nm_kr_tryget(kring, 1, &revents))
2548 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2549 netmap_ring_reinit(kring);
2552 if (kring->nm_sync(kring, 0))
2555 nm_sync_finalize(kring);
2559 * If we found new slots, notify potential
2560 * listeners on the same ring.
2561 * Since we just did a txsync, look at the copies
2562 * of cur,tail in the kring.
2564 found = kring->rcur != kring->rtail;
2566 if (found) { /* notify other listeners */
2569 kring->nm_notify(kring, 0);
2572 /* if there were any packets to forward, we must have handled them by now */
2574 if (want_tx && retry_tx && sr) {
2575 nm_os_selrecord(sr, check_all_tx ?
2576 &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si);
2583 * If want_rx is still set, scan the receive rings.
2584 * Do it on all rings because otherwise we starve.
2587 /* two rounds here for race avoidance */
2589 for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) {
2592 kring = &na->rx_rings[i];
2595 if (unlikely(nm_kr_tryget(kring, 1, &revents)))
2598 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) {
2599 netmap_ring_reinit(kring);
2602 /* now we can use kring->rcur, rtail */
2605 * transparent mode support: collect packets
2606 * from the rxring(s).
2608 if (nm_may_forward_up(kring)) {
2609 ND(10, "forwarding some buffers up %d to %d",
2610 kring->nr_hwcur, ring->cur);
2611 netmap_grab_packets(kring, &q, netmap_fwd);
2614 kring->nr_kflags &= ~NR_FORWARD;
2615 if (kring->nm_sync(kring, 0))
2618 nm_sync_finalize(kring);
2619 send_down |= (kring->nr_kflags & NR_FORWARD); /* host ring only */
2620 if (netmap_no_timestamp == 0 ||
2621 ring->flags & NR_TIMESTAMP) {
2622 microtime(&ring->ts);
2624 found = kring->rcur != kring->rtail;
2629 kring->nm_notify(kring, 0);
2633 if (retry_rx && sr) {
2634 nm_os_selrecord(sr, check_all_rx ?
2635 &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si);
2637 if (send_down > 0 || retry_rx) {
2640 goto flush_tx; /* and retry_rx */
2647 * Transparent mode: marked bufs on rx rings between
2648 * kring->nr_hwcur and ring->head
2649 * are passed to the other endpoint.
2651 * Transparent mode requires binding all
2652 * rings to a single file descriptor.
2655 if (q.head && !nm_kr_tryget(&na->tx_rings[na->num_tx_rings], 1, &revents)) {
2656 netmap_send_up(na->ifp, &q);
2657 nm_kr_put(&na->tx_rings[na->num_tx_rings]);
2666 /*-------------------- driver support routines -------------------*/
2668 /* default notify callback */
2670 netmap_notify(struct netmap_kring *kring, int flags)
2672 struct netmap_adapter *na = kring->na;
2673 enum txrx t = kring->tx;
2675 nm_os_selwakeup(&kring->si);
2676 /* optimization: avoid a wake up on the global
2677 * queue if nobody has registered for more than one ring. */
2680 if (na->si_users[t] > 0)
2681 nm_os_selwakeup(&na->si[t]);
2683 return NM_IRQ_COMPLETED;
2688 netmap_notify(struct netmap_adapter *na, u_int n_ring,
2689 enum txrx tx, int flags)
2692 KeSetEvent(notes->TX_EVENT, 0, FALSE);
2696 KeSetEvent(notes->RX_EVENT, 0, FALSE);
2702 /* called by all routines that create netmap_adapters.
2703 * provide some defaults and get a reference to the memory allocator. */
2707 netmap_attach_common(struct netmap_adapter *na)
2709 if (na->num_tx_rings == 0 || na->num_rx_rings == 0) {
2710 D("%s: invalid rings tx %d rx %d",
2711 na->name, na->num_tx_rings, na->num_rx_rings);
2716 if (na->na_flags & NAF_HOST_RINGS && na->ifp) {
2717 na->if_input = na->ifp->if_input; /* for netmap_send_up */
2719 #endif /* __FreeBSD__ */
2720 if (na->nm_krings_create == NULL) {
2721 /* we assume that we have been called by a driver,
2722 * since other port types all provide their own nm_krings_create. */
2725 na->nm_krings_create = netmap_hw_krings_create;
2726 na->nm_krings_delete = netmap_hw_krings_delete;
2728 if (na->nm_notify == NULL)
2729 na->nm_notify = netmap_notify;
2732 if (na->nm_mem == NULL)
2733 /* use the global allocator */
2734 na->nm_mem = &nm_mem;
2735 netmap_mem_get(na->nm_mem);
2737 if (na->nm_bdg_attach == NULL)
2738 /* no special nm_bdg_attach callback. On VALE
2739 * attach, we need to interpose a bwrap
2741 na->nm_bdg_attach = netmap_bwrap_attach;
2748 /* standard cleanup, called by all destructors */
2750 netmap_detach_common(struct netmap_adapter *na)
2752 if (na->tx_rings) { /* XXX should not happen */
2753 D("freeing leftover tx_rings");
2754 na->nm_krings_delete(na);
2756 netmap_pipe_dealloc(na);
2758 netmap_mem_put(na->nm_mem);
2759 bzero(na, sizeof(*na));
2763 /* Wrapper for the register callback provided by netmap-enabled hardware drivers.
2765 * nm_iszombie(na) means that the driver module has been
2766 * unloaded, so we cannot call into it.
2767 * nm_os_ifnet_lock() must guarantee mutual exclusion with module unloading. */
2771 netmap_hw_reg(struct netmap_adapter *na, int onoff)
2773 struct netmap_hw_adapter *hwna =
2774 (struct netmap_hw_adapter*)na;
2779 if (nm_iszombie(na)) {
2782 } else if (na != NULL) {
2783 na->na_flags &= ~NAF_NETMAP_ON;
2788 error = hwna->nm_hw_register(na, onoff);
2791 nm_os_ifnet_unlock();
2797 netmap_hw_dtor(struct netmap_adapter *na)
2799 if (nm_iszombie(na) || na->ifp == NULL)
2802 WNA(na->ifp) = NULL;
2807 * Allocate a ``netmap_adapter`` object, and initialize it from the
2808 * 'arg' passed by the driver on attach.
2809 * We allocate a block of memory with room for a struct netmap_adapter
2810 * plus two sets of N+2 struct netmap_kring (where N is the number
2811 * of hardware rings):
2812 * krings 0..N-1 are for the hardware queues.
2813 * kring N is for the host stack queue
2814 * kring N+1 is only used for the selinfo for all queues. // XXX still true ?
2815 * Return 0 on success, ENOMEM otherwise.
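 *
 * A hypothetical driver would typically attach as in the sketch
 * below (the foo_* callbacks and the softc fields are placeholders,
 * not part of netmap):
 *
 *	struct netmap_adapter na;
 *
 *	bzero(&na, sizeof(na));
 *	na.ifp = sc->ifp;
 *	na.num_tx_desc = sc->num_tx_desc;
 *	na.num_rx_desc = sc->num_rx_desc;
 *	na.num_tx_rings = na.num_rx_rings = sc->num_queues;
 *	na.nm_txsync = foo_netmap_txsync;
 *	na.nm_rxsync = foo_netmap_rxsync;
 *	na.nm_register = foo_netmap_reg;
 *	netmap_attach(&na);
 */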
2818 _netmap_attach(struct netmap_adapter *arg, size_t size)
2820 struct netmap_hw_adapter *hwna = NULL;
2821 struct ifnet *ifp = NULL;
2823 if (arg == NULL || arg->ifp == NULL)
2826 hwna = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO);
2830 hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE;
2831 strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name));
2832 hwna->nm_hw_register = hwna->up.nm_register;
2833 hwna->up.nm_register = netmap_hw_reg;
2834 if (netmap_attach_common(&hwna->up)) {
2835 free(hwna, M_DEVBUF);
2838 netmap_adapter_get(&hwna->up);
2840 NM_ATTACH_NA(ifp, &hwna->up);
2843 if (ifp->netdev_ops) {
2844 /* prepare a clone of the netdev ops */
2845 #ifndef NETMAP_LINUX_HAVE_NETDEV_OPS
2846 hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops;
2848 hwna->nm_ndo = *ifp->netdev_ops;
2849 #endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */
2851 hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit;
2852 if (ifp->ethtool_ops) {
2853 hwna->nm_eto = *ifp->ethtool_ops;
2855 hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam;
2856 #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS
2857 hwna->nm_eto.set_channels = linux_netmap_set_channels;
2858 #endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */
2859 if (arg->nm_config == NULL) {
2860 hwna->up.nm_config = netmap_linux_config;
2863 if (arg->nm_dtor == NULL) {
2864 hwna->up.nm_dtor = netmap_hw_dtor;
2867 if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n",
2868 hwna->up.num_tx_rings, hwna->up.num_tx_desc,
2869 hwna->up.num_rx_rings, hwna->up.num_rx_desc);
2873 D("fail, arg %p ifp %p na %p", arg, ifp, hwna);
2874 return (hwna ? EINVAL : ENOMEM);
2879 netmap_attach(struct netmap_adapter *arg)
2881 return _netmap_attach(arg, sizeof(struct netmap_hw_adapter));
2885 #ifdef WITH_PTNETMAP_GUEST
2887 netmap_pt_guest_attach(struct netmap_adapter *arg, void *csb,
2888 unsigned int nifp_offset, unsigned int memid)
2890 struct netmap_pt_guest_adapter *ptna;
2891 struct ifnet *ifp = arg ? arg->ifp : NULL;
2895 arg->nm_mem = netmap_mem_pt_guest_new(ifp, nifp_offset, memid);
2896 if (arg->nm_mem == NULL)
2898 arg->na_flags |= NAF_MEM_OWNER;
2899 error = _netmap_attach(arg, sizeof(struct netmap_pt_guest_adapter));
2903 /* get the netmap_pt_guest_adapter */
2904 ptna = (struct netmap_pt_guest_adapter *) NA(ifp);
2907 /* Initialize a separate pass-through netmap adapter that is going to
2908 * be used by the ptnet driver only, and so never exposed to netmap
2909 * applications. We only need a subset of the available fields. */
2910 memset(&ptna->dr, 0, sizeof(ptna->dr));
2911 ptna->dr.up.ifp = ifp;
2912 ptna->dr.up.nm_mem = ptna->hwup.up.nm_mem;
2913 netmap_mem_get(ptna->dr.up.nm_mem);
2914 ptna->dr.up.nm_config = ptna->hwup.up.nm_config;
2916 ptna->backend_regifs = 0;
2920 #endif /* WITH_PTNETMAP_GUEST */
2924 NM_DBG(netmap_adapter_get)(struct netmap_adapter *na)
2930 refcount_acquire(&na->na_refcount);
2934 /* returns 1 iff the netmap_adapter is destroyed */
2936 NM_DBG(netmap_adapter_put)(struct netmap_adapter *na)
2941 if (!refcount_release(&na->na_refcount))
2947 netmap_detach_common(na);
2952 /* nm_krings_create callback for all hardware native adapters */
2954 netmap_hw_krings_create(struct netmap_adapter *na)
2956 int ret = netmap_krings_create(na, 0);
2958 /* initialize the mbq for the sw rx ring */
2959 mbq_safe_init(&na->rx_rings[na->num_rx_rings].rx_queue);
2960 ND("initialized sw rx queue %d", na->num_rx_rings);
2968 * Called on module unload by the netmap-enabled drivers
2971 netmap_detach(struct ifnet *ifp)
2973 struct netmap_adapter *na = NA(ifp);
2979 netmap_set_all_rings(na, NM_KR_LOCKED);
2980 na->na_flags |= NAF_ZOMBIE;
2982 * if the netmap adapter is not native, somebody
2983 * changed it, so we cannot release it here.
2984 * The NAF_ZOMBIE flag will notify the new owner that
2985 * the driver is gone.
2987 if (na->na_flags & NAF_NATIVE) {
2988 netmap_adapter_put(na);
2990 /* give active users a chance to notice that NAF_ZOMBIE has been
2991 * turned on, so that they can stop and return an error to userspace.
2992 * Note that this becomes a NOP if there are no active users and,
2993 * therefore, the put() above has deleted the na, since now NA(ifp) is NULL. */
2996 netmap_enable_all_rings(ifp);
3002 * Intercept packets from the network stack and pass them
3003 * to netmap as incoming packets on the 'software' ring.
3005 * We only store packets in a bounded mbq and then copy them
3006 * in the relevant rxsync routine.
3008 * We rely on the OS to make sure that the ifp and na do not go
3009 * away (typically the caller checks for IFF_DRV_RUNNING or the like).
3010 * In nm_register() or whenever there is a reinitialization,
3011 * we make sure to make the mode change visible here.
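 *
 * On FreeBSD the hook is installed by nm_set_native_flags(), roughly
 * as in this sketch:
 *
 *	na->if_transmit = ifp->if_transmit;	// save the stack transmit routine
 *	ifp->if_transmit = netmap_transmit;	// divert mbufs to the host rx ring
 */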
3014 netmap_transmit(struct ifnet *ifp, struct mbuf *m)
3016 struct netmap_adapter *na = NA(ifp);
3017 struct netmap_kring *kring, *tx_kring;
3018 u_int len = MBUF_LEN(m);
3019 u_int error = ENOBUFS;
3024 kring = &na->rx_rings[na->num_rx_rings];
3025 // XXX [Linux] we do not need this lock
3026 // if we follow the down/configure/up protocol -gl
3027 // mtx_lock(&na->core_lock);
3029 if (!nm_netmap_on(na)) {
3030 D("%s not in netmap mode anymore", na->name);
3036 if (txr >= na->num_tx_rings) {
3037 txr %= na->num_tx_rings;
3039 tx_kring = &NMR(na, NR_TX)[txr];
3041 if (tx_kring->nr_mode == NKR_NETMAP_OFF) {
3042 return MBUF_TRANSMIT(na, ifp, m);
3045 q = &kring->rx_queue;
3047 // XXX reconsider long packets if we handle fragments
3048 if (len > NETMAP_BUF_SIZE(na)) { /* too long for us */
3049 D("%s from_host, drop packet size %d > %d", na->name,
3050 len, NETMAP_BUF_SIZE(na));
3054 if (nm_os_mbuf_has_offld(m)) {
3055 RD(1, "%s drop mbuf requiring offloads", na->name);
3059 /* protect against rxsync_from_host(), netmap_sw_to_nic()
3060 * and maybe other instances of netmap_transmit (the latter
3061 * not possible on Linux).
3062 * Also avoid overflowing the queue.
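 * space = nr_hwtail - nr_hwcur (mod nkr_num_slots) counts the slots
 * already occupied in the host RX ring; together with the mbq backlog
 * it must stay below the ring size, otherwise the mbuf is dropped.
 */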
3066 space = kring->nr_hwtail - kring->nr_hwcur;
3068 space += kring->nkr_num_slots;
3069 if (space + mbq_len(q) >= kring->nkr_num_slots - 1) { // XXX
3070 RD(10, "%s full hwcur %d hwtail %d qlen %d len %d m %p",
3071 na->name, kring->nr_hwcur, kring->nr_hwtail, mbq_len(q),
3075 ND(10, "%s %d bufs in queue len %d m %p",
3076 na->name, mbq_len(q), len, m);
3077 /* notify outside the lock */
3086 /* unconditionally wake up listeners */
3087 kring->nm_notify(kring, 0);
3088 /* this is normally netmap_notify(), but for NICs
3089 * connected to a bridge it is netmap_bwrap_intr_notify(),
3090 * which possibly forwards the frames through the switch.
3098 * netmap_reset() is called by the driver routines when reinitializing
3099 * a ring. The driver is in charge of locking to protect the kring.
3100 * If native netmap mode is not set, just return NULL.
3101 * If native netmap mode is set, in particular we have to set nr_mode to NKR_NETMAP_ON.
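 *
 * Typical driver usage on ring (re)initialization (sketch; nslots and
 * the descriptor programming step are placeholders):
 *
 *	struct netmap_slot *slot = netmap_reset(na, NR_RX, ring_nr, 0);
 *
 *	if (slot) {	// the ring is in netmap mode
 *		for (i = 0; i < nslots; i++) {
 *			uint64_t paddr;
 *			void *addr = PNMB(na, slot + i, &paddr);
 *			// program rx descriptor i with paddr (and addr if needed)
 *		}
 *	}
 */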
3104 struct netmap_slot *
3105 netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n,
3108 struct netmap_kring *kring;
3111 if (!nm_native_on(na)) {
3112 ND("interface not in native netmap mode");
3113 return NULL; /* nothing to reinitialize */
3116 /* XXX note- in the new scheme, we are not guaranteed to be
3117 * under lock (e.g. when called on a device reset).
3118 * In this case, we should set a flag and do not trust too
3119 * much the values. In practice: TODO
3120 * - set a RESET flag somewhere in the kring
3121 * - do the processing in a conservative way
3122 * - let the *sync() fixup at the end.
3125 if (n >= na->num_tx_rings)
3128 kring = na->tx_rings + n;
3130 if (kring->nr_pending_mode == NKR_NETMAP_OFF) {
3131 kring->nr_mode = NKR_NETMAP_OFF;
3135 // XXX check whether we should use hwcur or rcur
3136 new_hwofs = kring->nr_hwcur - new_cur;
3138 if (n >= na->num_rx_rings)
3140 kring = na->rx_rings + n;
3142 if (kring->nr_pending_mode == NKR_NETMAP_OFF) {
3143 kring->nr_mode = NKR_NETMAP_OFF;
3147 new_hwofs = kring->nr_hwtail - new_cur;
3149 lim = kring->nkr_num_slots - 1;
3150 if (new_hwofs > lim)
3151 new_hwofs -= lim + 1;
3153 /* Always set the new offset value and realign the ring. */
3155 D("%s %s%d hwofs %d -> %d, hwtail %d -> %d",
3157 tx == NR_TX ? "TX" : "RX", n,
3158 kring->nkr_hwofs, new_hwofs,
3160 tx == NR_TX ? lim : kring->nr_hwtail);
3161 kring->nkr_hwofs = new_hwofs;
3163 kring->nr_hwtail = kring->nr_hwcur + lim;
3164 if (kring->nr_hwtail > lim)
3165 kring->nr_hwtail -= lim + 1;
3169 /* XXX check that the mappings are correct */
3170 /* need ring_nr, adapter->pdev, direction */
3171 buffer_info->dma = dma_map_single(&pdev->dev, addr, adapter->rx_buffer_len, DMA_FROM_DEVICE);
3172 if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) {
3173 D("error mapping rx netmap buffer %d", i);
3174 // XXX fix error handling
3179 * Wakeup on the individual and global selwait
3180 * We do the wakeup here, but the ring is not yet reconfigured.
3181 * However, we are under lock so there are no races.
3183 kring->nr_mode = NKR_NETMAP_ON;
3184 kring->nm_notify(kring, 0);
3185 return kring->ring->slot;
3190 * Dispatch rx/tx interrupts to the netmap rings.
3192 * "work_done" is non-null on the RX path, NULL for the TX path.
3193 * We rely on the OS to make sure that there is only one active
3194 * instance per queue, and that there is appropriate locking.
3196 * The 'notify' routine depends on what the ring is attached to.
3197 * - for a netmap file descriptor, do a selwakeup on the individual
3198 * waitqueue, plus one on the global one if needed
3199 * (see netmap_notify)
3200 * - for a nic connected to a switch, call the proper forwarding routine
3201 * (see netmap_bwrap_intr_notify)
3204 netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done)
3206 struct netmap_kring *kring;
3207 enum txrx t = (work_done ? NR_RX : NR_TX);
3209 q &= NETMAP_RING_MASK;
3211 if (netmap_verbose) {
3212 RD(5, "received %s queue %d", work_done ? "RX" : "TX" , q);
3215 if (q >= nma_get_nrings(na, t))
3216 return NM_IRQ_PASS; // not a physical queue
3218 kring = NMR(na, t) + q;
3220 if (kring->nr_mode == NKR_NETMAP_OFF) {
3225 kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ?
3226 *work_done = 1; /* do not fire napi again */
3229 return kring->nm_notify(kring, 0);
3234 * Default functions to handle rx/tx interrupts from a physical device.
3235 * "work_done" is non-null on the RX path, NULL for the TX path.
3237 * If the card is not in netmap mode, simply return NM_IRQ_PASS,
3238 * so that the caller proceeds with regular processing.
3239 * Otherwise call netmap_common_irq().
3241 * If the card is connected to a netmap file descriptor,
3242 * do a selwakeup on the individual queue, plus one on the global one
3243 * if needed (multiqueue card _and_ there are multiqueue listeners),
3244 * and return NM_IRQ_COMPLETED.
3246 * Finally, if called on rx from an interface connected to a switch,
3247 * it calls the proper forwarding routine.
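 *
 * Driver-side usage is typically (sketch):
 *
 *	// in the RX interrupt or napi handler:
 *	if (netmap_rx_irq(ifp, ring_nr, &work_done) != NM_IRQ_PASS)
 *		return;		// netmap took care of this queue
 *	// ... regular driver processing ...
 *
 * and netmap_tx_irq(ifp, ring_nr) on the TX path.
 */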
3250 netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done)
3252 struct netmap_adapter *na = NA(ifp);
3255 * XXX emulated netmap mode sets NAF_SKIP_INTR so
3256 * we still use the regular driver even though the previous
3257 * check fails. It is unclear whether we should use
3258 * nm_native_on() here.
3260 if (!nm_netmap_on(na))
3263 if (na->na_flags & NAF_SKIP_INTR) {
3264 ND("use regular interrupt");
3268 return netmap_common_irq(na, q, work_done);
3273 * Module loader and unloader
3275 * netmap_init() creates the /dev/netmap device and initializes
3276 * all global variables. Returns 0 on success, errno on failure
3277 * (in practice this should never fail).
3279 * netmap_fini() destroys everything.
3282 static struct cdev *netmap_dev; /* /dev/netmap character device. */
3283 extern struct cdevsw netmap_cdevsw;
3290 destroy_dev(netmap_dev);
3291 /* we assume that there are no netmap users left */
3293 netmap_uninit_bridges();
3296 printf("netmap: unloaded module.\n");
3307 error = netmap_mem_init();
3311 * MAKEDEV_ETERNAL_KLD avoids an expensive check on syscalls
3312 * when the module is compiled in.
3313 * XXX could use make_dev_credv() to get error number
3315 netmap_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD,
3316 &netmap_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600,
3321 error = netmap_init_bridges();
3326 nm_os_vi_init_index();
3329 error = nm_os_ifnet_init();
3333 printf("netmap: loaded module\n");
3337 return (EINVAL); /* may be incorrect */