Notes on the internal structure of dummynet (2010 version)
by Riccardo Panicucci and Luigi Rizzo
Work supported by the EC project ONELAB2

INDEX
Implementation of new dummynet
The reconfiguration routine
Compatibility with FreeBSD 7.2 and FreeBSD 8 ipfw binary
How to configure dummynet
How to implement a new scheduler
------------------------------
20100131: deleting an RR scheduler causes an infinite loop,
        presumably in the rr_free_queue() call; it seems to hang
        forever when deleting a live flow.
------------------------------

Dummynet is a traffic shaper and network emulator. Packets are
selected by an external filter such as ipfw, and passed to the emulator
with a tag such as "pipe 10" or "queue 5" which tells it what to
do with the packet. As an example:

        ipfw add queue 5 icmp from 10.0.0.2 to all

All packets with the same tag belong to a "flowset", i.e. a set
of flows which can be further partitioned according to a mask.
Flowsets are then passed to a scheduler for processing. The
association of flowsets and schedulers is configurable, e.g.

        ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
        ipfw queue 8 config sched 10 weight 1 ...
        ipfw queue 3 config sched 20 weight 1 ...

"sched 10" represents one or more scheduler instances,
selected through a mask on the 5-tuple itself.

        ipfw sched 20 config type FIFO sched_mask yyy ...

There are in fact two masks applied to each packet:
+ the "sched_mask" sends packets arriving at a scheduler_id to
  one of many instances.
+ the "flow_mask" together with the flowset_id is used to
  collect packets into independent flows on each scheduler.

As an example, the configuration

        ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
        ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00

means that sched 10 will have one instance per /24 source subnet,
and within that, each individual source will be a flow.
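
The effect of the two masks can be illustrated with a small sketch. It only
models the src-ip case of the example; the helper names (sched_instance_key,
flow_key, ipv4) are invented for illustration and are not part of dummynet,
which applies masks to the whole flow id.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the masked address selects the scheduler instance. */
uint32_t sched_instance_key(uint32_t src_ip, uint32_t sched_mask)
{
	return src_ip & sched_mask;
}

/* Hypothetical helper: the masked address selects the flow in the instance. */
uint32_t flow_key(uint32_t src_ip, uint32_t flow_mask)
{
	return src_ip & flow_mask;
}

/* Build an IPv4 address in host byte order from its four components. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}
```

With sched_mask 0xffffff00 and flow_mask 0x000000ff, 10.0.1.7 and 10.0.1.9
map to the same instance key but to different flow keys, while 10.0.2.7
maps to a different instance.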

Dummynet-related data is split into several data structures,
some of them constituting the userland-kernel API, and others
specific to the kernel.
NOTE: for up-to-date details please look at the relevant source
      headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h)

USERLAND-KERNEL API     (ip_dummynet.h)

    struct dn_link:
        contains data about the physical link such as
        bandwidth, delay, burst size;

    struct dn_fs:
        describes a flowset, i.e. a template for queues.
        Main parameters are the scheduler we attach to, a flow_mask,
        buckets, queue size, plr, weight, and other scheduler-specific
        parameters.

    struct dn_flow:
        contains information on a flow, including masks and
        statistics.

    struct dn_sch:
        defines a scheduler (and a link attached to it).
        Parameters include scheduler type, sched_mask, number of
        buckets, and possibly other scheduler-specific parameters.

    struct dn_profile:
        fields to simulate a delay profile.

KERNEL REPRESENTATION   (ip_dn_private.h)

    struct mq:
        a queue of mbufs with head and tail.

    struct dn_queue:
        individual queue of packets, created by a flowset using
        flow_mask and attached to a scheduler instance selected
        through sched_mask.
        A dn_queue has a pointer to the dn_fsk (which in turn counts
        how many queues point to it), a pointer to the
        dn_sch_inst it attaches to, and is in a hash table in the
        flowset. Scheduler instances should also store queues in
        their own containers used for scheduling (lists, trees, etc.)
        CREATE: done on packet arrivals when a flow matches a flowset.
        DELETE: done only when deleting the parent dn_sch_inst

    struct dn_fsk:
        includes a dn_fs; a pointer to the dn_schk; a link field
        for the list of dn_fsk attached to the same scheduler,
        or for the unlinked list;
        a refcount for the number of queues pointing to it.
        The dn_fsk is in a hash table, fshash.
        CREATE: done on configuration commands.
        DELETE: on configuration commands.

    struct dn_sch_inst:
        a scheduler instance, created from a dn_schk applying sched_mask.
        Contains a delay line, a reference to the parent, and
        scheduler-specific info. Both dn_sch_inst and its delay line can
        be in the evheap if they have events to be processed.
        CREATE: created from a dn_schk applying sched_mask
        DELETE: a configuration command deletes a scheduler, which in turn
                sweeps the hash table of instances, deleting them

    struct dn_schk:
        includes dn_sch, dn_link, a pointer to dn_profile,
        a hash table of dn_sch_inst, a list of dn_fsk
        CREATE: configuration command. If there are flowsets that
                refer to this number, they are attached and moved
                from the unlinked list to this scheduler's list.
        DELETE: manual, see dn_sch_inst

        +---------------+   sched        +--------------+
        |      sched *------------------>|   NEW_SCHK   |
    <----*sch_chain     |<----------------*fsk_list     |
        |NEW_FSK        |<----.          | [dn_link]    |
        +---------------+     |          +--------------+
        |qht (hash)     |     |          | siht (hash)  |
        |   [dn_queue]  |     |          |  [dn_si]     |
        |   [dn_queue]  |     |          |  [dn_si]     |
        |   +--------+  |     |          | +---------+  |
        |   |dn_queue|  |     |          | |dn_si    |  |
        |   |  fs *-----------'          | |         |  |
        |   |  si *----------------------->|         |  |
        |   +--------+  |                | +---------+  |
        +---------------+                +--------------+

The following global data structures contain all
schedulers and flowsets.

- schedhash[x]: contains all scheduler templates in the system.
        Looked up only on manual configurations, where flowsets
        are attached to matching schedulers.
        We have one entry per 'sched X config' command
        (plus one for each 'pipe X config').

- fshash[x]: contains all flowsets.
        We do a lookup on this for each packet.
        We have one entry for each 'queue X config'
        (plus one for each 'pipe X config').

Additionally, a list contains all unlinked flowsets:
- fsu: contains flowsets that are not linked with any scheduler.
        Flowsets are put in this list when they refer to a
        non-existing scheduler.
        We don't need an efficient data structure as we never search
        here on a packet arrival.

Scheduler instances and the delay lines associated with each scheduler
instance need to be woken up at certain times. Because we have many
such objects, we keep them in a priority heap (system_heap).
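
To illustrate why a priority heap fits this job, here is a minimal binary
min-heap keyed by wake-up time. It is a sketch, not the code in dn_heap.c:
the names (ev, ev_heap, ev_insert, ev_extract, ev_run_due) and the fixed
capacity are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_MAX 64	/* arbitrary capacity for the sketch */

struct ev { uint64_t key; void *obj; };	/* key = wake-up time */
struct ev_heap { struct ev e[HEAP_MAX]; size_t n; };

static void ev_swap(struct ev *a, struct ev *b)
{
	struct ev t = *a; *a = *b; *b = t;
}

/* Insert an object; returns 0 on success, -1 if the heap is full. */
int ev_insert(struct ev_heap *h, uint64_t key, void *obj)
{
	size_t i;

	if (h->n == HEAP_MAX)
		return -1;
	i = h->n++;
	h->e[i].key = key;
	h->e[i].obj = obj;
	/* sift up until the parent is not later than the new entry */
	while (i > 0 && h->e[(i - 1) / 2].key > h->e[i].key) {
		ev_swap(&h->e[i], &h->e[(i - 1) / 2]);
		i = (i - 1) / 2;
	}
	return 0;
}

/* Remove and return the earliest event; the heap must not be empty. */
struct ev ev_extract(struct ev_heap *h)
{
	struct ev top;
	size_t i = 0;

	assert(h->n > 0);
	top = h->e[0];
	h->e[0] = h->e[--h->n];
	for (;;) {	/* sift down */
		size_t l = 2 * i + 1, r = l + 1, m = i;

		if (l < h->n && h->e[l].key < h->e[m].key)
			m = l;
		if (r < h->n && h->e[r].key < h->e[m].key)
			m = r;
		if (m == i)
			break;
		ev_swap(&h->e[i], &h->e[m]);
		i = m;
	}
	return top;
}

/* Extract every event whose key is <= now; returns how many were due. */
int ev_run_due(struct ev_heap *h, uint64_t now)
{
	int served = 0;

	while (h->n > 0 && h->e[0].key <= now) {
		(void)ev_extract(h);
		served++;
	}
	return served;
}
```

The periodic task only has to compare the current time with the key at the
top of the heap, so objects that are not yet due cost nothing per tick.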

Almost all objects in this implementation are preceded by a structure
(struct dn_id) which makes it easier to identify them.
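
The idea of a common leading header can be sketched as follows. The header
layout, the type tags and the toy_* structs are assumptions made for the
example; the real struct dn_id is defined in ip_dummynet.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags; the real values live in ip_dummynet.h. */
enum obj_type { OBJ_LINK = 1, OBJ_FLOWSET = 2 };

/* Stand-in for struct dn_id: every object starts with this header. */
struct obj_id {
	uint16_t len;	/* total length of the object, header included */
	uint8_t type;	/* one of enum obj_type */
	uint32_t id;	/* user-visible number, e.g. the N in "pipe N" */
};

struct toy_link { struct obj_id oid; int bandwidth; };
struct toy_flowset { struct obj_id oid; int weight; };

/* Any object can be inspected through a pointer to its first member. */
const char *obj_name(const void *obj)
{
	const struct obj_id *o = obj;

	switch (o->type) {
	case OBJ_LINK:
		return "link";
	case OBJ_FLOWSET:
		return "flowset";
	default:
		return "unknown";
	}
}
```

Because the header is always the first member, generic code can find the
type and length of an object without knowing its full definition.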

The dummynet code is split into several files.
All kernel code is in sys/netpfil/ipfw except ip_dummynet.h.
All userland code is in sbin/ipfw.

- sys/netinet/ip_dummynet.h defines the kernel-userland API
- ip_dn_private.h contains the kernel-specific APIs
- dn_sched.h defines the scheduler API
- ip_dummynet.c contains module glue and sockopt handlers, with all
  functions to configure and list objects.
- ip_dn_io.c contains the functions directly related to packet processing,
  which run in the critical path. It also contains some functions
  exported to the schedulers.
- dn_heap.[ch] implement a binary heap and a generic hash table
- dn_sched_* implement the various scheduler modules

- dummynet.c implements the user side of dummynet.
  It contains the functions that parse the command line and the
  functions that print dummynet objects.

Moreover, there are two new files (ip_dummynet_glue.c and ip_fw_glue.c) that
provide compatibility with the "ipfw" binary from FreeBSD 7.2 and
FreeBSD 8.

At the moment the entire processing occurs under a single lock
which is expected to be acquired in exclusive mode:
DN_BH_WLOCK() / DN_BH_WUNLOCK().

In perspective we aim at the following:
- the 'busy' flag, 'pending' list and all structures modified by packet
  arrivals and departures are protected by the BH_WLOCK.
  This is normally acquired in exclusive mode by the packet processing
  functions for short sections of code (exception: the timer).
  If 'busy' is not set, we can do regular packet processing.
  If 'busy' is set, no pieces can be accessed:
  we must enqueue the packet on 'pending' and return immediately.

- the 'busy' flag is set/cleared by long sections of code as follows:
        UH_WLOCK(); KASSERT(busy == 0);
        BH_WLOCK(); busy=1; BH_WUNLOCK();
        ... do processing ...
        BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK();
        UH_WUNLOCK();
  This normally happens when the upper half has something heavy
  to do. The prologue and epilogue are not in the critical path.

- the main containers (fshash, schedhash, ...) are protected by
  the UH_WLOCK.
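
The intended protocol can be modelled in a few lines. This is a
single-threaded sketch meant only to show the order of operations: the
locks are toy objects that assert on misuse, and all names (toy_lock,
dn_state, packet_arrival, reconfigure_begin, reconfigure_end) as well as
the bounded 'pending' array are assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 16	/* arbitrary bound for the sketch */

/* Toy lock: only records ownership so that misuse trips an assert. */
struct toy_lock { int held; };
void lock(struct toy_lock *l)   { assert(!l->held); l->held = 1; }
void unlock(struct toy_lock *l) { assert(l->held);  l->held = 0; }

struct dn_state {
	struct toy_lock uh, bh;		/* upper-half and bottom-half locks */
	int busy;			/* protected by bh */
	int pending[PENDING_MAX];	/* packets parked while busy */
	size_t npending;
	int processed;			/* packets that were processed */
};

/*
 * Packet path: a short critical section under the bottom-half lock.
 * Returns 1 if processed at once, 0 if parked on 'pending', -1 if
 * dropped (dropping on overflow is a choice made for this sketch).
 */
int packet_arrival(struct dn_state *s, int pkt)
{
	int rc = 1;

	lock(&s->bh);
	if (!s->busy) {
		s->processed++;
	} else if (s->npending < PENDING_MAX) {
		s->pending[s->npending++] = pkt;
		rc = 0;
	} else {
		rc = -1;
	}
	unlock(&s->bh);
	return rc;
}

/* Prologue of a long section: take uh, then raise 'busy' under bh. */
void reconfigure_begin(struct dn_state *s)
{
	lock(&s->uh);
	assert(s->busy == 0);
	lock(&s->bh);
	s->busy = 1;
	unlock(&s->bh);
}

/* Epilogue: clear 'busy' and drain the parked packets under bh. */
void reconfigure_end(struct dn_state *s)
{
	lock(&s->bh);
	s->busy = 0;
	s->processed += (int)s->npending;	/* drain_queue(pending) */
	s->npending = 0;
	unlock(&s->bh);
	unlock(&s->uh);
}
```

The point of the design is that the bottom-half lock is never held during
the long processing step; packets arriving in that window only pay for an
append to 'pending'.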

A packet enters dummynet through dummynet_io(). We first look up
the flowset number in fshash using dn_ht_find(), then find the scheduler
instance using ipdn_si_find(), then possibly identify the correct
queue with ipdn_q_find().
If successful, we call the scheduler's enqueue() function, and
if needed start I/O on the link by calling serve_sched().
If the packet can be returned immediately, this is done by
leaving *m0 set. Otherwise, the packet is absorbed by dummynet
and we simply return, possibly with an appropriate error code.

The reconfiguration routine
---------------------------
Reconfiguration is the complex part of the system because we need to
keep track of the various objects and containers.
At the moment we do not use reference counts for objects so all
processing must be done under a lock.

The main entry point for configuration is the ip_dn_ctl() handler
for the IP_DUMMYNET3 sockopt (others are provided only for backward
compatibility). Modifications to the configuration call do_config().
The argument is a sequence of blocks each starting with a struct dn_id
which specifies its content.
The first dn_id must contain the DN_API_VERSION as obj.id.
The obj.type is a command such as DN_CMD_CONFIG (followed by actual
objects) or DN_CMD_DELETE (with the correct subtype and list of objects).

DN_CMD_CONFIG is followed by objects to add/reconfigure. In general,
if an object already exists it is reconfigured, otherwise it is
created in a way that keeps the structure consistent.
We have the following objects in the system, normally numbered with
an identifier N between 1 and 65535. For certain objects we have
"shadow" copies numbered N+NMAX and N+2*NMAX which are used to
implement certain backward compatibility features.

In general we have the following linking:

  TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..."
        corresponds to a dn_fs object numbered N

  TRADITIONAL DUMMYNET PIPES "pipe N config ..."
        dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX

  GENERIC SCHEDULER "sched N config ... "
        [dn_fs N+NMAX] --> dn_sch N --> dn_link N
        The flowset N+NMAX is created only if the scheduler is not

  DELAY PROFILE "pipe N config profile ..."
        it is always attached to an existing dn_link N

Because traditional dummynet pipes actually configure both a
'standalone' instance and one that can be used by queues,
we do the following:

    "pipe N config ..." configures:
        dn_sched N type WF2Q+
        dn_sched N+NMAX type FIFO
        dn_fs N+2*NMAX attached to dn_sched N+NMAX

    "queue N config" configures
        dn_fs N

    "sched N config" configures
        dn_sched N type as desired
        dn_fs N+NMAX attached to dn_sched N
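
The numbering scheme can be written down as plain arithmetic. In this
sketch the value of NMAX is an assumption (any value larger than the
biggest user number, 65535, keeps the three ranges disjoint), and the
function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NMAX 65536u	/* assumed: one more than the largest user number */

/* Objects created by "pipe N config ...". */
uint32_t pipe_wf2q_sched(uint32_t n)   { return n; }
uint32_t pipe_fifo_sched(uint32_t n)   { return n + NMAX; }
uint32_t pipe_fifo_flowset(uint32_t n) { return n + 2 * NMAX; }

/* Default flowset created by "sched N config ...". */
uint32_t sched_default_flowset(uint32_t n) { return n + NMAX; }

/* Recover the user-visible number from any of the shadow ids. */
uint32_t user_number(uint32_t id) { return id % NMAX; }
```

For example, "pipe 10 config ..." would touch scheduler 10, the FIFO
scheduler 10+NMAX and the FIFO flowset 10+2*NMAX, and user_number() maps
all three back to 10.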

The dummynet_task() function is the main dummynet processing function and is
called every tick. This function first calculates the new current time, then
checks whether it is time to wake up objects from the system_heap by comparing
the current time with the key at the top of the heap. Two types of objects
(really, the heap contains pointers to objects) are in the system_heap:

- scheduler instance: when a scheduler instance is woken up, the dequeue()
  function is called for as long as the instance has credit. If dequeue()
  returns packets, the scheduler instance is reinserted in the heap with a
  new key that depends on the amount of data to be sent out. If the scheduler
  instance is left with some credit, it has no other packets to send and so
  the instance is not reinserted in the heap.

  If the scheduler instance extracted from the heap has the DELETE flag set,
  dequeue() is not called and the instance is destroyed immediately.

- delay line: when a delay line is extracted, the transmit_event() function
  is called to send out packets from the delay line.

  If the scheduler instance associated with this delay line no longer exists,
  the delay line is deleted immediately.

To create a pipe, queue or scheduler, the user should type commands like:

        "ipfw pipe x config"
        "ipfw queue y config pipe x"
        "ipfw pipe x config sched <type>"

The userland side of dummynet prepares a buffer containing the data to pass
to the kernel side.
The buffer contains all the structs needed to configure an object. In more
detail, to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are
needed, plus the delay profile struct if the pipe has a delay profile.

When configuring a scheduler only the struct dn_sch is written to the buffer,
while when configuring a flowset only the dn_fs struct is written.

The first struct in the buffer contains the type of command requested, that is
whether it is configuring a pipe, a queue, or a scheduler. Then come the
structs needed to configure the object, and finally the struct that marks
the end of the buffer.
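
The buffer layout described here (a command block, then the object blocks,
then an end marker) can be sketched with a simplified header. The header
fields, the BLK_* tags and the helper names are assumptions made for the
example and do not reproduce the real struct definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the header that starts every block. */
struct blk_hdr {
	uint16_t len;	/* length of this block, header included */
	uint8_t type;	/* what the block contains */
	uint8_t pad;
};

/* Hypothetical block tags. */
enum { BLK_CMD = 1, BLK_SCH = 2, BLK_LINK = 3, BLK_FS = 4, BLK_END = 5 };

/*
 * Append one block with 'paylen' zeroed payload bytes.
 * Returns the new offset, or 0 if the block does not fit.
 */
size_t blk_append(uint8_t *buf, size_t cap, size_t off, uint8_t type,
    uint16_t paylen)
{
	struct blk_hdr h = { (uint16_t)(sizeof(h) + paylen), type, 0 };

	if (off + h.len > cap)
		return 0;
	memcpy(buf + off, &h, sizeof(h));
	memset(buf + off + sizeof(h), 0, paylen);
	return off + h.len;
}

/*
 * Walk the buffer and count blocks up to and including BLK_END.
 * Returns -1 on a malformed buffer (bad length or missing end marker).
 */
int blk_count(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct blk_hdr) <= len) {
		struct blk_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || off + h.len > len)
			return -1;
		n++;
		if (h.type == BLK_END)
			return n;
		off += h.len;
	}
	return -1;
}
```

A pipe configuration would then be the sequence CMD, SCH, LINK, FS, END,
while a scheduler or flowset configuration carries a single object block
between CMD and END.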

To support the insertion of pipes and queues using the old syntax, when adding
a pipe it is necessary to create a FIFO flowset and a FIFO scheduler, which
have number x + DN_PIPEOFFSET.

A pipe is only a template for a link.
If the pipe already exists, its parameters are updated. If a delay profile
exists it is deleted and a new one is created.
If the pipe doesn't exist a new one is created. After the creation, the
unlinked flowset list is scanned to see whether there are flowsets that should
be linked with this pipe. If so, these flowsets will be of wf2q+ type (for
compatibility) and a new wf2q+ scheduler is created at this point.

If the scheduler already exists, and the type and the mask are the same, the
scheduler is simply reconfigured by calling the scheduler's config_scheduler()
function with the RECONFIGURE flag set.
If the type or the mask differ, it is necessary to delete the old scheduler
and create a new one.
If the scheduler doesn't exist, a new one is created. If the scheduler has
a mask, a hash table is created to store pointers to scheduler instances.
When a new scheduler is created, it is necessary to scan the unlinked
flowset list to look for flowsets that should be linked with this
scheduler number. If some are found, the flowsets take the type of this
scheduler and are configured accordingly.

Flowset pointers are stored in the system in two lists. The unlinked flowset
list contains all flowsets that aren't linked with a scheduler; the flowset
list contains flowsets linked to a scheduler, which therefore have a type.
When adding a new flowset, we first check whether the flowset exists (that is,
whether it is in the flowset list). If it doesn't exist, a new flowset is
created; it is added to the unlinked flowset list if the scheduler the flowset
should be linked to doesn't exist, or added to the flowset list and configured
properly if the scheduler exists. If the flowset was in the unlinked flowset
list before being created, it is removed and deleted, and then recreated.
If the flowset exists, it can be reconfigured in place only if the
scheduler number and type match the ones in memory. If they don't,
the flowset is deleted and a new one is created. In fact the flowset
isn't deleted right away: it is removed from the flowset list and
deleted later, because there could be some queues still using it.

The user can request a list of the objects present in dummynet through the
command
        "ipfw [-v] pipe|queue [x] list|show"
The kernel side of dummynet sends the user side a buffer that contains all
pipes, all schedulers, all flowsets, plus all scheduler instances and all
queues. The dummynet userland formats the output and shows only the relevant
information.
The buffer starts with all the pipes in the system. The entire struct dn_link
is passed, except the delay_profile struct, which is useless in user space.
After the pipes, all flowsets are written to the buffer. The struct containing
the scheduler-specific flowset data is linked with the flowset by writing the
'obj' id of the extension into the 'alg_fs' pointer.
Then the schedulers are written. If a scheduler has one or more scheduler
instances, these are linked to the parent scheduler by writing the id of the
parent in the 'ptr_sched' pointer. If a scheduler instance has queues, they
are written to the buffer and linked through the 'obj' and 'sched_inst'
pointers.
Finally, the flowsets in the unlinked flowset list are written to the buffer,
and then a struct gen is saved in the buffer to mark the last struct in the
buffer.

An object is usually removed by the user through a command like
        "ipfw pipe|queue x delete".     XXX sched?
ipfw passes to the kernel a struct gen that contains the type and the number
of the object to remove.

Delete of pipe x
----------------
A pipe can be deleted by the user through the command 'ipfw pipe x delete'.
To delete a pipe, the pipe is removed from the pipe list, and then deleted.
The scheduler associated with this pipe should also be deleted.
For compatibility with the old dummynet syntax, the associated FIFO scheduler
and FIFO flowset must be deleted as well.

Delete of flowset x
-------------------
To remove a flowset, we must be sure that it is no longer referenced by any
object.
If the flowset to remove is in the unlinked flowset list, there is no
issue: the flowset can be safely removed by calling free() (the flowset
extension has not been created yet if the flowset is in this list).
If the flowset is in the flowset list, we first remove it from the list so
that new packets are discarded when they arrive. Next, the flowset is marked
as deleted.
Now we must check whether some queue is using this flowset.
To do this, a counter (active_f) is provided. This counter indicates how many
queues exist that use this flowset.
The active_f counter is automatically incremented when a queue is created
and decremented when a queue is deleted.
If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs()
scheduler function is called before the memory is deallocated.
If the counter is not 0, the flowset remains in memory until the counter
becomes zero. When a queue is deleted (by the dn_delete_queue() function) we
check whether the linked flowset is being deleted, and if so the counter is
decremented. If the counter reaches 0, the flowset is deleted.
The deletion of a queue can be done only by the scheduler, or when the
scheduler is deleted.
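
The deferred deletion driven by the active_f counter can be sketched as
follows. Only the counter name comes from the text; the toy_* types, the
function names and the fs_freed bookkeeping are assumptions made so that
the sketch can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy flowset: 'active_f' counts the queues that still use it. */
struct toy_fs {
	int active_f;
	int deleting;	/* set once the user asked for deletion */
};

struct toy_queue { struct toy_fs *fs; };

/* Counts flowsets actually freed, so that the sketch can be checked. */
int fs_freed;

static void fs_destroy(struct toy_fs *fs)
{
	/* the scheduler's delete_alg_fs() hook would run here */
	free(fs);
	fs_freed++;
}

/* Creating a queue takes a reference on its flowset. */
struct toy_queue *queue_create(struct toy_fs *fs)
{
	struct toy_queue *q = malloc(sizeof(*q));

	if (q != NULL) {
		q->fs = fs;
		fs->active_f++;
	}
	return q;
}

/* User-requested deletion: free now if unused, otherwise defer. */
void fs_delete(struct toy_fs *fs)
{
	fs->deleting = 1;
	if (fs->active_f == 0)
		fs_destroy(fs);
}

/* Deleting the last queue of a dying flowset frees the flowset too. */
void queue_delete(struct toy_queue *q)
{
	struct toy_fs *fs = q->fs;

	free(q);
	assert(fs->active_f > 0);
	if (--fs->active_f == 0 && fs->deleting)
		fs_destroy(fs);
}
```

With two queues attached, fs_delete() only marks the flowset; the memory is
released when the second queue_delete() brings the counter to zero.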

Delete of scheduler x
---------------------
To delete a scheduler we must be sure that no scheduler instance of this type
is in the system_heap. To do so, a counter (inst_counter) is provided.
This counter is managed by the system: it is incremented every time an
instance is inserted in the system_heap, and decremented every time one is
extracted from it.
To delete the scheduler, we first remove it from the scheduler list, so that
new packets are discarded when they arrive, and mark the scheduler as deleting.

If the counter is 0, we can remove the scheduler safely by calling the
really_deletescheduler() function. This function scans all scheduler
instances and calls the delete_scheduler_instance() function, which deletes
the instance. When all instances are deleted, the scheduler template is
deleted by calling delete_scheduler_template(). If the delay line associated
with the scheduler is empty, it is deleted now, otherwise it will be deleted
when it becomes empty.
If the counter is not 0, we wait for it. Every time the dummynet_task()
function extracts a scheduler from the system_heap, the counter is decremented.
If the scheduler has the delete flag set, dequeue() is not called and
delete_scheduler_instance() is called to delete the instance.
Obviously this scheduler instance is not reinserted in the system_heap.
If the counter reaches 0, the delete_scheduler_template() function is called
and all memory is released.
NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
    scheduler with the same number is inserted it will use these flowsets.
    To allow this, the best approach would be to insert these flowsets in the
    unlinked flowset list, but doing this now would be very expensive.
    So the flowsets remain in memory, linked with a scheduler that no
    longer exists, until a packet belonging to the flowset arrives. When
    this packet arrives, the reconfigure() function is called because the
    generation number does not match the one contained in the flowset, and so
    the flowset is moved into the unlinked flowset list, or is
    linked with the new scheduler if a new one was created.

COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
==========================================================
Dummynet is not compatible with old ipfw binaries because its internal structs
have changed. Moreover, the old ipfw binary is not compatible with new kernels
because the struct that represents a firewall rule has changed. So, if a user
installs a new kernel on FreeBSD 7.2, ipfw (and possibly many other
commands) will not work.
New dummynet uses a new socket option, IP_DUMMYNET3, for both set and get.
The old options can still be used, to allow compatibility with the 'ipfw'
binary of older versions of FreeBSD (tested with 7.2 and 8.0).
Two files are provided for this purpose:
- ip_dummynet_glue.c translates old dummynet requests to the new ones,
- ip_fw_glue.c converts the rule format between the 7.2 and 8 versions.
Let us look at these two files in detail.

ip_dummynet_glue.c
------------------
The internal structs of new dummynet are very different from the original ones.
Because there are some differences between dummynet in FreeBSD 7.2 and
dummynet in FreeBSD 8 (the FreeBSD 8 version includes support for pipe delay
profiles and the burst option), I had to include both header files. I copied
revision 191715 (for version 7.2) and revision 196045 (for version 8)
and appended a number to each struct to tell them apart.

The main function of this file is ip_dummynet_compat(), which is called by
ip_dn_ctl() when it receives a request for an old socket option.

A global variable ('is7') stores the version of 'ipfw' that FreeBSD is using.
This variable is set every time a configuration request is made, because
with this request we receive a buffer whose size depends on the ipfw version.
Because in general the first action is a configuration, this variable is
usually set correctly. If the first action is a request to list pipes
or queues, the system cannot know the version of ipfw, and we assume that
version 7.2 is used. If the version is wrong, the output may be meaningless,
but the application should not crash.
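
The size-based detection can be sketched as follows. Only the name 'is7'
comes from the text; the two request layouts and the note_config_size()
helper are invented for the example (the real layouts are the ones in the
two copied headers).

```c
#include <assert.h>
#include <stddef.h>

/* Invented request layouts standing in for the 7.2 and 8 versions. */
struct pipe_req7 { int pipe_nr; int bandwidth; int delay; };
struct pipe_req8 { int pipe_nr; int bandwidth; int delay; int burst; };

/*
 * Mirrors the 'is7' idea: remember which binary we last talked to.
 * It starts at 1 because 7.2 is assumed until a configuration is seen.
 */
int is7 = 1;

/*
 * Called with the length of a configuration request.
 * Returns 0 and updates is7 when the size matches a known layout,
 * -1 (leaving is7 untouched) otherwise.
 */
int note_config_size(size_t len)
{
	if (len == sizeof(struct pipe_req7))
		is7 = 1;
	else if (len == sizeof(struct pipe_req8))
		is7 = 0;
	else
		return -1;
	return 0;
}
```

A later list request has no payload to measure, so it simply uses whatever
value the last configuration left in is7.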

There are four requests for old dummynet:
- IP_DUMMYNET_FLUSH: the flush option has no parameters, so the
  dummynet_flush() function is simply called;
- IP_DUMMYNET_DEL: the delete option needs to be translated.
  It is only necessary to extract the number and the type of the object
  (pipe or queue) to delete from the buffer received and build a new struct
  gen containing the right parameters, then call the delete_object() function;
- IP_DUMMYNET_CONFIGURE: the configure command receives a buffer that depends
  on the ipfw version. After all the data has been extracted, in a way that
  depends on the ipfw version used, the new structures are filled in and then
  the dummynet config_link() function is called. Note that the 7.2 version
  does not support some parameters, such as burst or delay profile.
- IP_DUMMYNET_GET: the get command should send ipfw the correct buffer
  for its version. There are two functions that build the
  correct buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
  functions reproduce the buffer exactly as 'ipfw' expects. The only
  difference is that the weight parameter for a queue is no longer sent by
  dummynet.
  Moreover, because the internal structure has changed, the bucket size
  of a queue may not be correct, because now all flowsets share the hash
  table.
  If the version of ipfw is wrong, the output could be meaningless or
  truncated, but the application should not crash.

ip_fw_glue.c
------------
The ipfw binary is also used to add rules to the FreeBSD firewall. Because
struct ip_fw changed between FreeBSD 7.2 and FreeBSD 8, it is necessary
to write some glue code to allow the ipfw binary from FreeBSD 7.2 to be used
with the kernel provided with FreeBSD 8.
This file contains two functions to convert a rule from FreeBSD 7.2 format to
FreeBSD 8 format, and vice versa.
The conversion should be done when a rule passes from userspace to kernel space
and vice versa.
I had to modify the ip_fw2.c file to manage these two cases, and added a
variable (is7) to store the ipfw version used, using an approach like the
one in ip_dummynet_glue.c:
- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
  size of the rule received corresponds to the FreeBSD 7.2 ipfw version. If so,
  the rule is converted to version 8 by calling the convert_rule_to_8()
  function. Moreover, after the insertion of the rule, the rule is converted
  back to version 7 because the ipfw binary will print it.
- when the user requests a list of rules (option IP_FW_GET) the is7 variable
  should already be set correctly, because we assume that a configure command
  was issued before; otherwise we assume that the FreeBSD version is 8. The
  ipfw_getrules() function in ip_fw2.c returns all rules to the ipfw binary,
  converted to version 7 if is7 is set.
The conversion of a rule is quite simple. The only difference between the
two structures (struct ip_fw) is that the new one has a new field
(uint32_t id). So, I copy the entire rule into a buffer and then copy the rule
to the right position in the new (or old) struct. The size of the commands has
not changed, and the copy is done in a loop.
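
A conversion of this kind can be sketched with two toy rule layouts that
differ only by the extra 'id' field. The struct names, the other fields and
the fixed-size command area are assumptions made for the example; the real
struct ip_fw has more fields and a variable-length command list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCMD 4	/* fixed command area, only to keep the sketch short */

/* Invented layouts: the "new" rule differs by one extra field. */
struct rule7 {
	uint16_t rulenum;
	uint16_t cmd_len;
	uint32_t cmd[NCMD];
};
struct rule8 {
	uint16_t rulenum;
	uint16_t cmd_len;
	uint32_t id;	/* the field added in the new format */
	uint32_t cmd[NCMD];
};

/* Old to new: copy the common fields, zero the new one. */
void rule_7_to_8(const struct rule7 *in, struct rule8 *out)
{
	int i;

	memset(out, 0, sizeof(*out));
	out->rulenum = in->rulenum;
	out->cmd_len = in->cmd_len;
	out->id = 0;	/* unknown to the old binary */
	for (i = 0; i < NCMD; i++)	/* commands are unchanged */
		out->cmd[i] = in->cmd[i];
}

/* New to old: same copy, the extra field is simply dropped. */
void rule_8_to_7(const struct rule8 *in, struct rule7 *out)
{
	int i;

	memset(out, 0, sizeof(*out));
	out->rulenum = in->rulenum;
	out->cmd_len = in->cmd_len;
	for (i = 0; i < NCMD; i++)
		out->cmd[i] = in->cmd[i];
}
```

Converting a rule to the new format and back yields the original rule, which
is what lets the old binary print a rule it has just inserted.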

How to configure dummynet
=========================
Dummynet can be configured through two main commands:
'ipfw pipe' and 'ipfw queue'.
For compatibility with old versions, dummynet can be configured
using the old command syntax. In that case, obviously, it is only possible to
configure a FIFO scheduler or a wf2q+ scheduler.
A new command, 'ipfw pipe x config sched <type>', is supported to add a new
scheduler to the system.

- ipfw pipe x config ...
  create a new pipe with the link parameters
  create a new FIFO scheduler (x + offset)
  create a new FIFO flowset (x + offset)
  the mask, if any, is stored in the FIFO scheduler

- ipfw queue y config pipe x ...
  create a new flowset y linked to sched x.
    The type of the flowset depends on the specified scheduler.
    If the scheduler does not exist, this flowset is inserted in a special
    list and will not be active.
    If pipe x exists and sched does not exist, a new wf2q+ scheduler is
    created and the flowset is linked to this new scheduler (this is
    done for compatibility with the old syntax).

- ipfw pipe x config sched <type> ...
  create a new scheduler x of type <type>.
  Search the unlinked flowset list for flowsets that
  should be linked with this new scheduler.

- ipfw pipe x delete
  delete the pipe x
  delete the FIFO scheduler (x + offset)
  delete the scheduler x
  delete the FIFO flowset (x + offset)

- ipfw queue x delete
  delete the flowset x

- ipfw sched x delete ///XXX
  delete the scheduler x

Here are some examples of how to configure dummynet:

    ipfw pipe 10 config bw 1M delay 15
        // Create a pipe with bandwidth and delay.
        // A FIFO flowset and scheduler are also created.
    ipfw queue 5 config pipe 10 weight 56
        // Create a flowset. This flowset will be of wf2q+ type
        // because pipe 10 exists. Moreover, the wf2q+ scheduler
        // is created now.

    ipfw queue 5 config pipe 10 weight 56
        // Create a flowset. Scheduler 10 does not exist, so this
        // flowset is inserted in the unlinked flowset list.
    ipfw pipe 10 config bw...
        // Create a pipe, a FIFO flowset and scheduler.
        // Because a flowset with 'pipe 10' exists, a wf2q+
        // scheduler is created now and that flowset is linked
        // with this scheduler.

    ipfw pipe 10 config bw...
        // Create a pipe, a FIFO flowset and scheduler.
    ipfw pipe 10 config sched rr
        // Create a scheduler of type RR, linked to pipe 10.
    ipfw queue 5 config pipe 10 weight 56
        // Create flowset 5. This flowset will belong to
        // scheduler 10 and it is of type RR.

    ipfw pipe 10 config sched rr
        // Create a scheduler of type RR, linked to pipe 10
        // (which does not exist yet).
    ipfw pipe 10 config bw...
        // Create a pipe, a FIFO flowset and scheduler.
    ipfw queue 5 config pipe 10 weight 56
        // Create flowset 5. This flowset will belong to
        // scheduler 10 and it is of type RR.
    ipfw pipe 10 config sched wf2q+
        // Modify the type of scheduler 10. It becomes a wf2q+
        // scheduler. When a new packet of flowset 5 arrives,
        // flowset 5 becomes of wf2q+ type.

How to implement a new scheduler
================================
In dummynet, a scheduler algorithm is represented by two main structs, some
functions and other minor structs.
- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm
  implemented) contains data relative to the scheduler, such as global
  parameters that are common to all instances of the scheduler
- A struct dn_sch_inst_xyz contains data relative to a single scheduler
  instance, such as local status variables that depend for example on the
  flows linked with the scheduler, and so on.
To add a scheduler to dummynet, the user should type a command like:
'ipfw pipe x config sched <type> [mask ... ...]'
This command creates a new struct dn_sch_xyz of type <type>, and
stores the optional parameters in that struct.

The mask parameter determines how many scheduler instances of this
scheduler may exist. For example, it is possible to divide traffic
depending on the source port (or destination, or ip address...),
so that every scheduler instance acts as an independent scheduler.
If the mask is not set, all traffic goes to the same instance.

When a packet arrives at a scheduler, the system searches for the correct
scheduler instance, and if it does not exist it is created at that point (the
struct dn_sch_inst_xyz is allocated by the system, and the scheduler
fills in the fields correctly). It is the task of the scheduler to create
the struct that contains all queues for a scheduler instance.
Dummynet provides some functions to create a hash table to store
queues, but the scheduler algorithm can choose its own struct.

To link a flow to a scheduler, the user should type a command like:
'ipfw queue z config pipe x [mask... ...]'

This command creates a new 'dn_fs' struct that will be inserted
in the system. If the scheduler x exists, this flowset will be
linked to that scheduler and the flowset type becomes the same as
the scheduler type. At this point, the function create_alg_fs_xyz()
is called to allow the scheduler to store any flowset parameters that
depend on the scheduler (for example the 'weight' parameter for a wf2q+
scheduler, or some priority...). A mask parameter can be used for
a flowset. If the mask parameter is set, the scheduler instance can
separate packets according to their flow id (src and dst ip, ports...)
and assign each to a separate queue. This is done by the scheduler,
so it can ignore the mask if it wants.
724 See now the two main structs:
726 struct gen g; /* important the name g */
729 struct dn_sch_inst_xyz {
730 struct gen g; /* important the name g */
731 /* params of the instance */
It is important to embed the struct gen as the first member. The struct gen
contains some values that the scheduler instance must fill (the 'type' of
scheduler, the 'len' of the struct...).
The function create_scheduler_xyz() should be implemented to initialize the
global parameters in the first struct, and if it allocates memory it is
mandatory to implement the delete_scheduler_template() function to free that
memory.
The function create_scheduler_instance_xyz() must be implemented even if the
scheduler instance does not use extra parameters. In this function the struct
gen fields must be filled with the correct values. The
delete_scheduler_instance_xyz() function must be implemented if the instance
has allocated some memory in the previous function.
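
A minimal sketch of why struct gen goes first, and of what filling it
might look like. The layout of struct gen and the DN_XYZ constant are
assumptions made for this example (the text only says that struct gen
carries a 'type' and a 'len'); inst_to_gen() and the active_flows field
are likewise hypothetical.

```c
#include <stddef.h>

/* Assumed layout: the text only names the 'type' and 'len' fields. */
struct gen {
	int type;
	size_t len;
};

#define DN_XYZ 42	/* made-up scheduler type id */

struct dn_sch_inst_xyz {
	struct gen g;		/* must be the first member */
	int active_flows;	/* example instance parameter */
};

/*
 * Sketch of create_scheduler_instance_xyz(): the memory behind 's' has
 * already been allocated by the system, so only the fields are filled.
 */
int
create_scheduler_instance_xyz(void *s)
{
	struct dn_sch_inst_xyz *si = s;

	si->g.type = DN_XYZ;
	si->g.len = sizeof(*si);
	si->active_flows = 0;
	return 0;
}

/*
 * Because g is the first member, a pointer to the whole struct and a
 * pointer to g refer to the same address, so generic code can handle
 * any scheduler-specific struct through a struct gen pointer.
 */
struct gen *
inst_to_gen(struct dn_sch_inst_xyz *si)
{
	return (struct gen *)si;
}
```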

To store data belonging to a flowset, the following struct is used:

    struct alg_fs_xyz {
        struct gen g;
        /* fill the gen struct correctly:
         *   ...
         *   g.len = sizeof(struct alg_fs_xyz)
         *   ...
         */
        /* params for the flow */
    };
The create_alg_fs_xyz() function is mandatory, because it must fill the struct
gen; the delete_alg_fs_xyz() function is mandatory only if create_alg_fs_xyz()
has allocated some memory.

A struct dn_queue contains the packets belonging to a queue and some
statistical data. The scheduler may need to store its own data with each
queue, so it must define a dn_queue_xyz struct:

    struct dn_queue_xyz {
        ...
        /* parameters for a queue */
    };

All structures are allocated by the system. To make this possible, the
scheduler must set the size of its structs in the scheduler descriptor:
    scheduler_size:   sizeof(dn_sch_xyz)
    scheduler_i_size: sizeof(dn_sch_inst_xyz)
    flowset_size:     sizeof(alg_fs_xyz)
    queue_size:       sizeof(dn_queue_xyz)
The scheduler_size may be 0, but the other structs must contain at least a
struct gen.
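
One way to picture how the system might use these sizes is sketched
below. The struct sched_sizes and alloc_with_gen() are hypothetical
names, and the struct gen layout is assumed; the real descriptor holds
more than the four size fields named in the text.

```c
#include <stdlib.h>

/* Assumed layout: the text only names the 'type' and 'len' fields. */
struct gen {
	int type;
	size_t len;
};

struct dn_sch_inst_xyz {
	struct gen g;
	int credit;		/* example instance parameter */
};

/* Only the size fields named in the text; the rest is omitted. */
struct sched_sizes {
	size_t scheduler_size;
	size_t scheduler_i_size;
	size_t flowset_size;
	size_t queue_size;
};

/*
 * Allocate a zeroed object of 'size' bytes that starts with a struct
 * gen, and record the object length in it. Returns NULL if 'size' is
 * too small to hold a struct gen or if the allocation fails.
 */
struct gen *
alloc_with_gen(size_t size)
{
	struct gen *g;

	if (size < sizeof(struct gen))
		return NULL;
	g = calloc(1, size);
	if (g != NULL)
		g->len = size;
	return g;
}
```

The caller only needs the size from the descriptor, for instance
alloc_with_gen(sz.scheduler_i_size), and never has to know the
scheduler-specific struct type.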

After the structs are defined, it is necessary to implement the
following functions:

- int (*config_scheduler)(char *command, void *sch, int reconfigure);
    Configure a scheduler, or reconfigure it if 'reconfigure' == 1.
    This function performs any additional allocation and initialization
    of the global parameters for this scheduler.
    If memory is allocated here, the delete_scheduler_template() function
    should be implemented to release it.
- int (*delete_scheduler_template)(void* sch);
    Delete a scheduler template. This function is mandatory if the
    scheduler uses extra data in addition to the struct dn_sch.
- int (*create_scheduler_instance)(void *s);
    Create a new scheduler instance. The system allocates the necessary
    memory and the scheduler can access it through the 's' pointer.
    The scheduler instance stores all queues, and to do this it can use
    the hash table provided by the system.
- int (*delete_scheduler_instance)(void *s);
    Delete a scheduler instance. It is important to free the memory
    allocated by the create_scheduler_instance() function. The memory
    allocated by the system is freed by the system itself. The struct
    that holds all queues must also be deleted.
- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
                 struct ipfw_flow_id *id);
    Called when a packet arrives. The packet 'm' belongs to the scheduler
    instance 's' and to the flowset 'f', and its flow id 'id' has already
    been masked. enqueue() must call the dn_queue_packet(q, m) function to
    actually place the packet in the queue q. The queue 'q' is chosen by
    the scheduler and, if it does not exist, should be created by calling
    the dn_create_queue() function. If the scheduler wants to drop the
    packet, it must call the dn_drop_packet() function and then return 1.
- struct mbuf * (*dequeue)(void *s);
    Called when the timer expires (or when a packet arrives and the
    scheduler ...).
    This function is called when at least one packet can be sent out. The
    scheduler chooses the packet and returns it; if there are no packets
    in the scheduler instance, the function must return NULL.
    Before returning a packet, it is important to call the function
    dn_return_packet() to update some statistics of the queue and update
    the ...
- int (*drain_queue)(void *s, int flag);
    The system asks the scheduler to delete all queues that are not in
    use, in order to free memory. The flag parameter indicates whether a
    queue must be deleted even if it is active.

- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
    Called when a flowset is linked to a scheduler. This happens once
    the scheduler is defined, so the type of the flowset is known.
    The function initializes the flowset parameters by parsing the
    command line. The parameters are stored in the g struct, which the
    system has allocated with the right size. If the reconfigure flag is
    set, the flowset is being reconfigured.
- int (*delete_alg_fs)(struct gen *f);
    Called when a flowset is being deleted. It must free the memory
    allocated by the create_alg_fs() function.

- int (*create_queue_alg)(struct dn_queue *q, struct gen *f);
    Called when a queue is created. The function should link the queue
    to the struct used by the scheduler instance to store all queues.
- int (*delete_queue_alg)(struct dn_queue *q);
    Called when a queue is being deleted. The function should remove any
    extra data and update the struct that holds all queues in the
    scheduler instance.
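
To make the enqueue()/dequeue() contract more concrete, here is a toy
FIFO sketch. It is not dummynet code: struct pkt stands in for struct
mbuf, the instance holds a single queue with a fixed limit, and the
names fifo_inst, fifo_enqueue and fifo_dequeue are invented. The return
values follow the convention described above (enqueue returns 1 on
drop, dequeue returns NULL when nothing is queued).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf. */
struct pkt {
	struct pkt *next;
	int len;
};

/* Hypothetical scheduler instance holding one FIFO queue. */
struct fifo_inst {
	struct pkt *head, *tail;
	int count;	/* packets currently queued */
	int limit;	/* maximum queue length */
	int drops;	/* packets refused so far */
};

/* Returns 0 if the packet was queued, 1 if it was dropped. */
int
fifo_enqueue(struct fifo_inst *si, struct pkt *m)
{
	if (si->count >= si->limit) {
		si->drops++;	/* real code would call dn_drop_packet() */
		return 1;
	}
	m->next = NULL;
	if (si->tail != NULL)
		si->tail->next = m;
	else
		si->head = m;
	si->tail = m;
	si->count++;
	return 0;
}

/* Returns the next packet to send, or NULL if nothing is queued. */
struct pkt *
fifo_dequeue(struct fifo_inst *si)
{
	struct pkt *m = si->head;

	if (m == NULL)
		return NULL;
	si->head = m->next;
	if (si->head == NULL)
		si->tail = NULL;
	m->next = NULL;
	si->count--;	/* real code would call dn_return_packet() */
	return m;
}
```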

The struct scheduler is the scheduler descriptor that is passed to
dummynet when a scheduler module is loaded.
This struct contains the type of scheduler, the sizes of all structs and
all function pointers.
A function that is not implemented should be set to NULL. Some functions
are always mandatory; others are mandatory only if some memory must be freed.
Mandatory functions:
  - create_scheduler_instance()
  ...

Mandatory functions if the corresponding create...() has allocated memory:
  - delete_scheduler_template()
  - delete_scheduler_instance()
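
A loader could check the descriptor along these lines. The sketch uses
a reduced, hypothetical struct scheduler with only a few of the
callbacks, and it treats as mandatory only the two functions that the
text explicitly calls mandatory (create_scheduler_instance and
create_alg_fs); scheduler_check() and the xyz_* helpers are invented
names.

```c
#include <stddef.h>

struct gen;	/* opaque here; defined by dummynet */

/* Reduced, hypothetical descriptor: only a few callbacks are shown. */
struct scheduler {
	int type;
	int (*create_scheduler_instance)(void *s);
	int (*delete_scheduler_instance)(void *s);
	int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
	int (*delete_alg_fs)(struct gen *f);
};

/*
 * Returns 0 if every callback treated as mandatory in this sketch is
 * set, -1 otherwise. Optional callbacks may be left NULL.
 */
int
scheduler_check(const struct scheduler *d)
{
	if (d == NULL)
		return -1;
	if (d->create_scheduler_instance == NULL ||
	    d->create_alg_fs == NULL)
		return -1;
	return 0;
}

/* Trivial callbacks for a scheduler that allocates no extra memory. */
int
xyz_new_inst(void *s)
{
	(void)s;
	return 0;
}

int
xyz_new_fs(char *command, struct gen *g, int reconfigure)
{
	(void)command;
	(void)g;
	(void)reconfigure;
	return 0;
}
```

A scheduler that allocates nothing in its create functions can leave
both delete pointers NULL and still pass the check.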