<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="../style.css" rel="stylesheet" type="text/css" />
<title>LLDB Homepage</title>
<div class="www_title">
<strong>LLDB</strong> Data Formatters Architecture
<!--#include virtual="../sidebar.incl"-->
<h1 class ="postheader">Bird's eye view</h1>
<div class="postcontent">
<p>The LLDB data formatters subsystem allows both the debugger and its end users to customize the way
variables look upon inspection in the user interface (be it the command line tool, or one of the several
GUIs that are backed by LLDB).
<p>To this end, the formatters are hooked into the ValueObject model, providing entry points through which customization
questions can be answered, such as <i>what format should this number be printed as?</i>, <i>how many child elements does this
std::vector have?</i>, and more along those lines.
<p>The architecture of the subsystem is layered, with the highest-level layer being the user-visible interaction features
(e.g. the "type ***" commands, the SB classes, ...). Other layers of interest that will be analyzed in this document include:
<ul>
<li>Classes implementing individual data formatter types</li>
<li>Classes implementing formatter navigation, discovery and categorization</li>
<li>The FormatManager layer</li>
<li>The DataVisualization layer</li>
<li>The SWIG &lt;---&gt; LLDB communication layer</li>
</ul>
<div class="postfooter"></div>
<h1 class ="postheader">Data formatter types</h1>
<div class="postcontent">
<p>As described in the user documentation, there are four types of formatters:
<ul>
<li>formats</li>
<li>summaries</li>
<li>filters</li>
<li>synthetic children</li>
</ul>
<p>Architecturally, these are implemented by classes in the source/DataFormatters/ folder.<br/>
Formatters have descriptor classes, Type*Impl, each of which contains at least a nested "Flags" object. Flags hold both the rules used
by the matching algorithm (e.g. should the formatter for type Foo apply to a Foo*?) and the rules used
by the formatter itself (e.g. is this summary a one-liner?).
<p>Individual formatter descriptor classes also contain the data items they need to do their job.
For instance, TypeFormatImpl (backing formats) contains an lldb::Format, which is the format applied
if this formatter is selected. Upon issuing a "type format add", a new TypeFormatImpl is created that wraps
the user-specified format and matching options:<br/><br/>
<code>entry.reset(new TypeFormatImpl(format,<br/>
TypeFormatImpl::Flags().SetCascades(m_command_options.m_cascade).<br/>
SetSkipPointers(m_command_options.m_skip_pointers).<br/>
SetSkipReferences(m_command_options.m_skip_references)));</code><br/><br/>
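<p>The chained-setter style used to build the Flags object can be modeled in a few lines of Python. This is only a sketch: the names <code>FormatFlags</code> and <code>FormatEntry</code>, and the default flag values, are hypothetical stand-ins rather than LLDB classes.

```python
class FormatFlags:
    """Hypothetical stand-in for TypeFormatImpl::Flags."""

    def __init__(self):
        # Default values are assumptions made for this sketch.
        self.cascades = True
        self.skip_pointers = False
        self.skip_references = False

    def set_cascades(self, value):
        self.cascades = value
        return self  # returning self is what allows the calls to be chained

    def set_skip_pointers(self, value):
        self.skip_pointers = value
        return self

    def set_skip_references(self, value):
        self.skip_references = value
        return self


class FormatEntry:
    """Hypothetical stand-in for TypeFormatImpl: a format plus its flags."""

    def __init__(self, fmt, flags):
        self.fmt = fmt
        self.flags = flags


entry = FormatEntry(
    "hex",
    FormatFlags().set_cascades(True).set_skip_pointers(True).set_skip_references(False),
)
```

<p>Each setter returns the flags object itself, so a fully configured descriptor can be built in a single expression, as the C++ code does.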
<p>While formats are fairly simple and implemented by a single class, the other formatter types are backed by a class hierarchy.
<p>Summaries, for instance, can exist in one of three "flavors":
<ul>
<li>summary strings</li>
<li>Python script</li>
<li>native C++ code</li>
</ul>
<p>The base class for summaries, TypeSummaryImpl, is an abstract class that wraps, again, the Flags, and exports, among others, a
pure virtual method:<br/>
<code>virtual bool<br/>
FormatObject (ValueObject *valobj,<br/>
std::string&amp; dest) = 0;</code><br/>
<p>This is the core entry point, which allows subclasses to specify their mode of operation.
<p>StringSummaryFormat, the class that implements summary strings, first checks whether
the summary is a one-liner. If it is not, it passes its stored summary string to
Debugger::FormatPrompt and returns the resulting string in dest as the summary.
<p>For a Python summary, implemented in ScriptSummaryFormat, FormatObject() calls into the ScriptInterpreter,
which is expected to know how to bridge back and forth with the scripting language
(Python in the case of LLDB) in order to produce a valid string. Implementors of new ScriptInterpreters for other
languages are expected to provide a GetScriptedSummary() entry point for this purpose if they want to allow
users to provide formatters in the new language.
<p>Lastly, C++ summaries (CXXFunctionSummaryFormat) wrap a function pointer and call into it to do their job.
Note that there are no facilities for users to interact with C++ formatters. They are therefore extremely
opaque, effectively being a thin wrapper between plain function pointers and the LLDB formatters subsystem.<br/>
Also, dynamic loading of C++ formatters in LLDB is currently not implemented, so it is safe and reasonable
for these formatters to deal with internal ValueObject instances instead of public SBValue objects.
<p>An interesting data point is that summaries are expected to be stateless. While at the Python layer they are handed
an SBValue (since nothing else is visible to scripts), the SBValue is not expected to be cached
and reused. Any and all caching occurs on the LLDB side, completely transparent to the formatter itself.<br/><br/><br/>
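<p>The summary hierarchy can be pictured as one abstract entry point with interchangeable implementations. The Python sketch below is a simplified model, not LLDB code: the class names are hypothetical, a plain dictionary stands in for the ValueObject, <code>str.format</code> stands in for Debugger::FormatPrompt, and the method returns the string directly instead of filling a <code>dest</code> argument.

```python
from abc import ABC, abstractmethod


class SummaryBase(ABC):
    """Plays the role of TypeSummaryImpl: a single abstract entry point."""

    @abstractmethod
    def format_object(self, valobj):
        """Return the summary text, or None if no summary can be produced."""


class StringSummary(SummaryBase):
    """Plays the role of StringSummaryFormat."""

    def __init__(self, template):
        self.template = template

    def format_object(self, valobj):
        # str.format is only a stand-in for Debugger::FormatPrompt.
        return self.template.format(**valobj)


class ScriptSummary(SummaryBase):
    """Plays the role of ScriptSummaryFormat: delegates to a callable."""

    def __init__(self, function):
        self.function = function

    def format_object(self, valobj):
        result = self.function(valobj)
        return None if result is None else str(result)


point = {"x": 3, "y": 4}  # hypothetical value being summarized
summaries = [StringSummary("({x}, {y})"), ScriptSummary(lambda v: v["x"] + v["y"])]
rendered = [summary.format_object(point) for summary in summaries]
```

<p>Neither summary object keeps any state between calls, which mirrors the expectation that summaries are stateless.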
<p>The design of synthetic children is somewhat more intricate, because they are stateful objects.<br/>
The core idea of the design is that synthetic children act like a two-tier model, in which there is a <i>backend</i>
dataset (the underlying unformatted ValueObject) and a higher-level view (<i>frontend</i>) which vends the computed
children.
<p>To implement a new type of synthetic children, one implements a subclass of SyntheticChildren which, akin to TypeFormatImpl,
contains Flags for matching and data items to be used for formatting. For instance, TypeFilterImpl (which implements filters)
stores the list of expression paths of the children to be displayed.<br/>Filters are themselves synthetic children. Since all they
do is provide child values for a ValueObject, it does not truly matter whether these come from the real set of children or are
crafted through some intricate algorithm. As such, they fit perfectly within the realm of synthetic children and are only
shown as separate entities for user friendliness (being able to pick a subset of elements to show with relative ease is
valuable to users, and they should not have to write scripts to do so).
<p>Once the descriptor of the synthetic children has been coded, hooking it up requires implementing a subclass of
SyntheticChildrenFrontEnd. For a given type of synthetic children, there is a deep coupling with the matching front-end class,
given that the front-end usually needs data stored in the descriptor (e.g. a filter needs the list of child elements).
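<p>The front-end answers the interesting questions that are the true <i>raison d'être</i> of synthetic children:<br/>
<code>virtual size_t<br/>
CalculateNumChildren () = 0;</code><br/>
<code>virtual lldb::ValueObjectSP<br/>
GetChildAtIndex (size_t idx) = 0;</code><br/>
<code>virtual size_t<br/>
GetIndexOfChildWithName (const ConstString &amp;name) = 0;</code><br/>
<code>virtual bool<br/>
Update () = 0;</code><br/>
<code>virtual bool<br/>
MightHaveChildren () = 0;</code><br/>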
<p>Synthetic children providers (their front-ends) will be queried by LLDB for the number of children, and then, for each child
as necessary, they should be prepared to return a ValueObject describing it. They might also be asked to provide a
name-to-index mapping (e.g. to allow LLDB to resolve queries like <code>myFoo.myChild</code>).<br/>
Update() and MightHaveChildren() are described in the user documentation, and they mostly serve bookkeeping purposes.
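<p>On the scripting side, the same five questions show up as methods of a Python provider class, following the protocol described in the user documentation (<code>num_children</code>, <code>get_child_index</code>, <code>get_child_at_index</code>, <code>update</code>, <code>has_children</code>). The sketch below is runnable outside of LLDB only because <code>FakeBackend</code>, a hypothetical stand-in, replaces the SBValue that a real provider would receive, and children are plain Python objects rather than values.

```python
class FakeBackend:
    """Hypothetical stand-in for the backend value handed to a provider."""

    def __init__(self, items):
        self.items = items


class ListFrontEnd:
    """Sketch of a provider exposing the backend's items as children."""

    def __init__(self, valobj, internal_dict):
        self.valobj = valobj
        self.names = []
        self.update()

    def update(self):
        # Refresh cached state from the backend, which may have changed.
        self.names = ["[%d]" % i for i in range(len(self.valobj.items))]

    def has_children(self):
        return True

    def num_children(self):
        return len(self.names)

    def get_child_index(self, name):
        # Name-to-index mapping; -1 signals an unknown child name here.
        try:
            return self.names.index(name)
        except ValueError:
            return -1

    def get_child_at_index(self, index):
        if 0 <= index < len(self.names):
            return self.valobj.items[index]
        return None


front_end = ListFrontEnd(FakeBackend([10, 20, 30]), {})
```

<p>The state cached in <code>names</code> is what makes this object stateful, in contrast with summaries.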
<p>LLDB provides three kinds of synthetic children: filters, scripted synthetics, and the native C++ providers.<br/>
Filters are implemented by TypeFilterImpl/TypeFilterImpl::FrontEnd.<br/><br/>
Scripted synthetics are implemented by ScriptedSyntheticChildren/ScriptedSyntheticChildren::FrontEnd, plus
a set of callbacks provided by the ScriptInterpreter infrastructure to allow LLDB to pass the front-end queries
down to the scripting languages.<br/><br/>
As for C++ native synthetics, there is a CXXSyntheticChildren, but no corresponding FrontEnd class. The reason for this design is
that a CXXSyntheticChildren stores a callback to a creator function, which is responsible for providing a FrontEnd.
Each individual formatter (e.g. LibstdcppMapIteratorSyntheticFrontEnd, NSDictionaryMSyntheticFrontEnd, ...) is a standalone
frontend, and once created retains no relation to its underlying SyntheticChildren object.
<p>At the ValueObject level, upon being asked to generate synthetic children for a ValueObject, LLDB spawns a ValueObjectSynthetic object,
which is a subclass of ValueObject. Building upon the ValueObject infrastructure, it stores a backend and a shared pointer to
the SyntheticChildren.<br/>
Upon being asked about children, it uses the SyntheticChildren to generate a front-end for itself
and lets the front-end answer the questions. The reason for not storing the FrontEnd itself is that there is no guarantee that
the same FrontEnd will be used across updates (e.g. a SyntheticChildren object could serve an entire class hierarchy
and vend different frontends for different subclasses).
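<p>The relationship between the descriptor, the creator callback and the synthetic value can be sketched as follows. All names are hypothetical, and plain Python containers stand in for backends. The point illustrated is only that the synthetic value keeps the descriptor and asks it for a front-end when needed, so one descriptor can vend different front-ends.

```python
class PairFrontEnd:
    """Hypothetical front-end for two-element backends."""

    def __init__(self, backend):
        self.backend = backend

    def num_children(self):
        return 2


class SequenceFrontEnd:
    """Hypothetical front-end for any other sequence."""

    def __init__(self, backend):
        self.backend = backend

    def num_children(self):
        return len(self.backend)


class SyntheticDescriptor:
    """Models a descriptor that only stores a creator callback."""

    def __init__(self, creator):
        self.creator = creator

    def get_front_end(self, backend):
        return self.creator(backend)


class SyntheticValue:
    """Models the ValueObjectSynthetic role: backend plus descriptor."""

    def __init__(self, backend, descriptor):
        self.backend = backend
        self.descriptor = descriptor

    def num_children(self):
        # The front-end is requested from the descriptor, not stored up front.
        return self.descriptor.get_front_end(self.backend).num_children()


def creator(backend):
    # One descriptor may vend different front-ends for different backends.
    if isinstance(backend, tuple) and len(backend) == 2:
        return PairFrontEnd(backend)
    return SequenceFrontEnd(backend)


descriptor = SyntheticDescriptor(creator)
```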
<div class="postfooter"></div>
<h1 class ="postheader">Formatters matching</h1>
<div class="postcontent">
<p>The problem of formatters matching is going from
"I have a ValueObject" to "these are the formatters to be used for it".<br/>
A rather intricate set of user rules is involved, and the implementation of this model is equally intricate. All of these rules
relate to the type of the ValueObject. It is assumed that types are a strong enough contract that it is possible to format an object
based entirely on its type. If this turns out not to be correct, the existing model will have to be changed fairly deeply.
<p>The basic building block is that formatters can match by exact type name or by regular expression, i.e. one can describe matching
by saying things like "this formatter matches type __NSDictionaryI", or "this formatter matches all type names like ^std::__1::vector&lt;.+&gt;(( )?&amp;)?$".<br/>This match happens in class FormattersContainer. For exact matches, this goes straight to the FormatMap
(the actual storage area for formatters), whereas for regular expression matches the regular expression is matched against the
provided candidate type name. If one were to introduce a new type of matching (say, matching against the number of $ signs present
in the typename), FormattersContainer is the place where such a change would have to be introduced.<br/>It should be noted that this
code involves template specialization, and as such is somewhat trickier to update than other formatters code.
<p>On top of the string matching mechanism (exact or regex), there is a set of more advanced rules implemented
by the FormattersContainer
with the aid of the FormattersMatchCandidate. Namely, it is assumed that any formatter class will have flags to say whether
it allows <i>cascading</i> (i.e. seeing through typedefs), and whether it allows pointers-to-object and references-to-object to be formatted.
<br/>Once a formatter is found to be a textual match, its Flags are checked. If they do not allow the formatter
to be used (e.g. pointers are not allowed, and one is looking at a Foo*), the formatter is rejected and the search continues.
If the flags also match, the formatter is returned upstream and the search is over.
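<p>The two-step check (textual match first, then flags) can be sketched in Python. This is a simplified model with hypothetical names: formatters are plain strings, and the caller states how the candidate type name was reached through the <code>via_*</code> arguments, which are assumptions of this sketch rather than LLDB parameters.

```python
import re


class MatchFlags:
    """Hypothetical stand-in for a formatter's Flags object."""

    def __init__(self, cascades=True, skip_pointers=False, skip_references=False):
        self.cascades = cascades
        self.skip_pointers = skip_pointers
        self.skip_references = skip_references


class Container:
    """Sketch of a container with exact-name and regex entries."""

    def __init__(self):
        self.exact = {}   # type name -> (flags, formatter)
        self.regex = []   # list of (compiled pattern, flags, formatter)

    def add_exact(self, type_name, flags, formatter):
        self.exact[type_name] = (flags, formatter)

    def add_regex(self, pattern, flags, formatter):
        self.regex.append((re.compile(pattern), flags, formatter))

    def get(self, type_name, via_pointer=False, via_reference=False, via_typedef=False):
        # Step 1: collect textual matches, exact name first, then regexes.
        textual = []
        if type_name in self.exact:
            textual.append(self.exact[type_name])
        for pattern, flags, formatter in self.regex:
            if pattern.search(type_name):
                textual.append((flags, formatter))
        # Step 2: reject matches whose flags forbid this kind of match.
        for flags, formatter in textual:
            if via_pointer and flags.skip_pointers:
                continue
            if via_reference and flags.skip_references:
                continue
            if via_typedef and not flags.cascades:
                continue
            return formatter
        return None


container = Container()
container.add_exact("Foo", MatchFlags(skip_pointers=True), "foo-summary")
container.add_regex(r"^std::__1::vector<.+>(( )?&)?$", MatchFlags(), "vector-summary")
```

<p>With these entries, a plain Foo gets "foo-summary", while a Foo reached through a pointer is rejected because the formatter skips pointers.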
<p>One relevant fact is that this entire mechanism does not depend on the kind of formatter to be returned, which makes it
easier to devise new types of formatters at the lowest layers of the system. The demands on individual formatters are that they
define a few typedefs and export a Flags object; they can then be freely matched against types as needed.
<p>This mechanism is replicated across a number of <i>categories</i>. A category is a named bucket where formatters are grouped on
some basis. The most common reason for a category to exist is a library (e.g. libcxx formatters vs. libstdcpp formatters).
Categories can be enabled or disabled, and they have a priority number, called position. The priority imposes a strict order among
enabled categories. A category named "default" always has the highest priority, and it is where all formatters that
do not ask for a category of their own end up (e.g. "type summary add ...." without a "-w somecategory" flag passed).<br/>
The algorithm queries each category, in priority order, for a formatter for a type, and ends the search upon receiving a positive
answer from a category. Of course, no search occurs in disabled categories.
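<p>The category search can be sketched as a loop over enabled categories in priority order. The class below is a hypothetical model: each category is reduced to a dictionary lookup, and it is assumed that a lower position number means a higher priority.

```python
class Category:
    """Hypothetical model of a named bucket of formatters."""

    def __init__(self, name, position, enabled=True):
        self.name = name
        self.position = position  # assumption: lower number = higher priority
        self.enabled = enabled
        self.formatters = {}

    def get(self, type_name):
        return self.formatters.get(type_name)


def find_formatter(categories, type_name):
    """Query enabled categories in priority order; stop at the first hit."""
    enabled = [category for category in categories if category.enabled]
    for category in sorted(enabled, key=lambda c: c.position):
        formatter = category.get(type_name)
        if formatter is not None:
            return formatter
    return None


default = Category("default", 0)
default.formatters["Foo"] = "default-foo"

libcxx = Category("libcxx", 1)
libcxx.formatters["Foo"] = "libcxx-foo"
libcxx.formatters["Bar"] = "libcxx-bar"

disabled = Category("experimental", 2, enabled=False)
disabled.formatters["Baz"] = "experimental-baz"

categories = [libcxx, disabled, default]
```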
<p>The individual category level is where the first dependence on the type of formatter to be returned appears. Since both filters and
synthetic children proper are implemented through the same backing store, the matching code needs to ensure that, were both a
synthetic children provider and a filter to match a type, only the most recently added one is actually used.
<br/>The details of the algorithm can be found in TypeCategoryImpl::Get().<br/>
<p>Even a casual reader will notice that a number of complexities are involved in this algorithm.<br/>
For starters, the entire search process has to be repeated for every variable.<br/>
Moreover, for each category, one has to repeat the entire process of crawling the types (go to pointee, ...).<br/>
This is exactly the algorithm initially implemented by LLDB. Over the life of the formatters subsystem,
two main improvements have been made to the matching mechanism:
<ul>
<li>A caching mechanism</li>
<li>A pregeneration of all possible type matches</li>
</ul>
<p>The cache is a layer that sits between the FormatManager and the TypeCategoryMap. Upon being asked to figure out a formatter,
the FormatManager first queries the cache layer, and only if that fails are the categories queried using the full
search algorithm. The result of that full search is then stored in the cache. Even a negative answer (no formatter)
gets stored. The negative answer is actually the most beneficial to cache, as obtaining it requires traversing all possible
formatters in all categories just to get a no-op back.<br/>
Of course, once an answer is cached, getting it is much quicker than a full category search, as the cached
answers are of the form "type foo" --&gt; "formatter bar". But since formatters can be edited or removed by the user,
either at the command line or via the API, there needs to be a way to invalidate the cache.<br/>
This happens through the FormatManager::Changed() method. In general, anything that changes the formatters causes
FormatManager::Changed() to be called through the IFormatChangeListener interface. This call increases the
FormatManager's revision and clears the cache. The revision number is a monotonically increasing integer counter
that essentially corresponds to the number of changes made to the formatters throughout the current LLDB session.
This counter is used by ValueObjects to know when their formatters are out of date. Since a search is a potentially
expensive operation, before caching was introduced, individual ValueObjects remembered which revision of the FormatManager
they used to search for their formatter, so that they would not repeat the search unless a change in the
formatters had occurred. While caching has made this optimization less critical, it is still sensible and thus is kept.
<br/>Lastly, as a side note, it is worth highlighting that <strong>any</strong> change in the formatters invalidates the
<strong>entire</strong> cache. It would likely be possible to be smarter and figure out a subset of cache entries
to be deleted, letting others persist, instead of having to rebuild the entire cache from scratch. However, given that formatters
are not changed that frequently during a debug session, and that the algorithmic complexity to "get it right" seems larger than the
potential benefit, full cache invalidation is the chosen policy. An algorithm to selectively invalidate
entries is probably one of the major areas for improvement in formatters performance.
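<p>The interplay of cache, revision counter and per-value bookkeeping can be sketched as follows. This is a simplified model with hypothetical names: the full category search is reduced to a function passed in by the caller, and <code>searches</code> exists only to make the effect of caching observable.

```python
_MISSING = object()  # sentinel: distinguishes "not cached" from a cached None


class CachingManager:
    """Hypothetical model of the cache and revision logic."""

    def __init__(self, search):
        self.search = search   # stands in for the full category search
        self.revision = 0
        self.cache = {}
        self.searches = 0      # counts full searches, for illustration only

    def changed(self):
        # Any edit bumps the revision and drops the entire cache.
        self.revision += 1
        self.cache.clear()

    def get_formatter(self, type_name):
        cached = self.cache.get(type_name, _MISSING)
        if cached is not _MISSING:
            return cached      # may be None: negative answers are cached too
        self.searches += 1
        result = self.search(type_name)
        self.cache[type_name] = result
        return result


class ValueModel:
    """Models a value that remembers the revision it last searched at."""

    def __init__(self, manager, type_name):
        self.manager = manager
        self.type_name = type_name
        self.seen_revision = None
        self.formatter = None

    def get_formatter(self):
        if self.seen_revision != self.manager.revision:
            self.formatter = self.manager.get_formatter(self.type_name)
            self.seen_revision = self.manager.revision
        return self.formatter


table = {"Foo": "foo-summary"}  # hypothetical formatter store
manager = CachingManager(table.get)
```

<p>Asking twice for a type with no formatter triggers a single full search. A later edit to <code>table</code> is only seen after <code>changed()</code> is called, which is why every formatter edit must go through that notification.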
<p>The second major optimization, introduced fairly recently, is the pregeneration of type matches. The original algorithm was based upon
the notion of a FormatNavigator as a smart object, aware of all the intricacies of the matching rules. For each category, the
FormatNavigator would generate the possible matches (e.g. dynamic type, pointee type, ...), and check them one at a time.
If that failed for a category, the next one would generate the same matches all over again.<br/>
This worked well, but was of course inefficient. The FormattersMatchCandidate is the solution to this performance issue.
In top-of-tree LLDB, the FormatManager holds the centralized notion of the matching rules, and the former FormatNavigators are now
FormattersContainers, whose only job is to guarantee centralized storage of formatters and thread-safe access to that storage.
<br/>FormatManager::GetPossibleMatches() fills a vector of possible matches. It works by applying each rule,
generating the corresponding typename, and storing the typename plus the Flags required for that rule to be accepted
as a match candidate (e.g. if the match comes from fetching the pointee type, a matching formatter will have to allow pointees
as part of its Flags object). The TypeCategoryMap, when tasked with finding a formatter for a type, generates all possible matches
and passes them down to each category. In this model, the type system only does its (expensive) job once, and textual or regex
matches are the core of the work.
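<p>The pregeneration step can be sketched as a function that walks a type once and records, for each candidate name, how it was reached. This is a deliberately simplified model: the real code works on the type system, whereas this sketch manipulates type name strings and takes a hypothetical <code>typedefs</code> dictionary (assumed to be free of cycles) in place of typedef resolution.

```python
from collections import namedtuple

# Each candidate records the type name and what was stripped to reach it.
Candidate = namedtuple(
    "Candidate", "type_name stripped_pointer stripped_reference stripped_typedef"
)


def possible_matches(type_name, typedefs):
    """Generate every candidate name once, with its match requirements."""
    results = []

    def visit(name, pointer, reference, typedef):
        results.append(Candidate(name, pointer, reference, typedef))
        if name.endswith("*"):
            # Pointee type: a formatter must allow pointers to match it.
            visit(name[:-1].rstrip(), True, reference, typedef)
        elif name.endswith("&"):
            # Referenced type: a formatter must allow references.
            visit(name[:-1].rstrip(), pointer, True, typedef)
        elif name in typedefs:
            # Underlying type: a formatter must cascade through typedefs.
            visit(typedefs[name], pointer, reference, True)

    visit(type_name, False, False, False)
    return results


candidates = possible_matches("MyFoo *", {"MyFoo": "Foo"})
```

<p>The resulting list can then be handed unchanged to every category, so the walk over the type happens once rather than once per category.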
<div class="postfooter"></div>
<h1 class ="postheader">FormatManager and DataVisualization</h1>
<div class="postcontent">
<p>There are two main entry points in the data formatters: the FormatManager and the DataVisualization.<br/>
The FormatManager is the <i>internal</i> entry point. In this context, internal refers to the data formatters code
itself, as opposed to other parts of LLDB. For other components of the debugger, the DataVisualization provides a more
stable entry point. The FormatManager, on the other hand, is an aggregator of all moving parts, and as such is less stable
in the face of refactoring.<br/>People working on the data formatters code itself, however, will most likely have to deal with
the FormatManager for significant architecture changes.
<p>The FormatManager wraps a TypeCategoryMap (the list of all existing categories, enabled or not), the FormatCache, and several
utility objects. In addition, it is the repository of named summaries, since these don't logically belong anywhere else.<br/>
It is also responsible for creating all builtin formatters upon the launch of LLDB. It does so through a set
of Load***Formatters() methods, invoked as part of its constructor. The original design of data formatters anticipated
that individual libraries would load their formatters as part of their debug information. This work, however, has largely been
left unattended in practice, and as such core system libraries (mostly those for OSX/iOS development as of today) load their
formatters in a hardcoded fashion.
<p>For performance reasons, the FormatManager is constructed lazily, when it is first required.
This happens through the DataVisualization layer. Upon first being asked for anything formatters-related, DataVisualization
calls its own local static function GetFormatManager(), which in turn constructs and returns a local static FormatManager.<br/>
Unlike most things in LLDB, the lifetime of the FormatManager is the same as the entire session, rather than that of a specific Debugger
or Target instance. This is an area to be improved, but as of now it has not caused enough grief to warrant action. If this work
were to be undertaken, one could conceivably devise a per-architecture-triple model, on the assumption that an OS and CPU
combination is a good enough key to decide which formatters apply (e.g. Linux i386 is probably different from OSX x86_64, but two
OSX x86_64 targets will probably have the same formatters; of course versioning of the underlying OS is also to be considered,
but experience with OSX has shown that formatters can take care of that internally in most cases of interest).
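<p>The construct-on-first-use pattern described above can be sketched in Python with a module-level variable playing the role of the local static. The names are hypothetical, and unlike a C++11 function-local static, this sketch makes no attempt at thread safety.

```python
class LazyFormatManager:
    """Hypothetical model of a manager that is expensive to construct."""

    instances = 0  # counts constructions, for illustration only

    def __init__(self):
        LazyFormatManager.instances += 1
        # Stands in for the Load***Formatters() calls made by the constructor.
        self.builtin_formatters = ["hypothetical builtin formatter"]


_manager = None  # plays the role of the local static FormatManager


def get_format_manager():
    """Construct the manager on first use, then keep returning it."""
    global _manager
    if _manager is None:
        _manager = LazyFormatManager()
    return _manager
```

<p>Nothing is constructed until <code>get_format_manager()</code> is first called, and every later call returns the same session-wide object.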
<p>The public entry point is the DataVisualization layer. DataVisualization is a static class that can be queried
in a relatively refactoring-safe manner.
<br/>The main question asked of it is to obtain formatters for ValueObjects (or typenames).
One can also query DataVisualization for named summaries or individual categories, but of course those queries delve deeper
into the internal object model.<br/>As mentioned, the FormatManager holds a notion of revision number, which changes every time
formatters are edited (added, deleted, categories enabled or disabled, ...). Through DataVisualization::ForceUpdate() one
can trigger the same effects as a formatters edit without one actually having happened.<br/>
The main reason for this feature is that formatters can be dynamically created in Python, and one can then enter the
ScriptInterpreter and edit the formatter function or class. If formatters were not updated, they could be out of sync
with the new definitions of these objects. To avoid the issue, whenever the user exits the scripting mode, formatters force
an update to make sure any new definitions are reloaded on demand.
<div class="postfooter"></div>
<h1 class ="postheader">The SWIG layer</h1>
<div class="postcontent">
<p>In order to implement formatters written in Python, LLDB requires that ScriptInterpreter implementations provide a set
of functions that one can call to ask formatting questions of scripts.<br/>
For instance, in order to obtain a scripted summary, LLDB calls<br/>
<code>virtual bool<br/>
GetScriptedSummary (const char *function_name,<br/>
lldb::ValueObjectSP valobj,<br/>
lldb::ScriptInterpreterObjectSP&amp; callee_wrapper_sp,<br/>
std::string&amp; retval)</code><br/>
<p>For Python, this function is implemented by first checking whether callee_wrapper_sp is valid.
If so, LLDB knows that it does not need to search for a function with the passed name, and can directly
call the wrapped Python function object. Either way, the call is routed to a global callback, <code>g_swig_typescript_callback</code>.
<p>This callback pointer points to <code>LLDBSwigPythonCallTypeScript</code>, defined in python-wrapper.swig.<br/>
The details of the implementation require familiarity with the Python C API, plus a few utility objects defined
by LLDB to ease the burden of dealing with the scripting world. However, as a sketch of what happens, the code
tries to find a Python function object with the given name (i.e. if you say "type summary add -F module.function", LLDB will look
for a module named "module", and then for a function named "function" inside the module's namespace). If the function object is found,
it is wrapped in a PyCallable, which is an LLDB utility class that wraps the callable and allows for easier calling.
The callable gets invoked, and the return value, if any, is cast into a string. Originally, if a non-string object was returned,
LLDB would refuse to use it. This prevented such a simple construct as
<code><br/>def getSummary(value,*args):<br/>&nbsp;&nbsp;&nbsp;&nbsp;return 1<br/></code> from working.
<p>Similar considerations apply to other formatter-related (and non-formatter-related) scripting callbacks.
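<p>The lookup-then-call sequence for a summary function can be approximated in pure Python. This is only a sketch under stated assumptions: it resolves the module with <code>importlib</code>, whereas LLDB performs the lookup from C++ inside its own interpreter session, and the hypothetical <code>call_summary</code> passes a single argument instead of the arguments LLDB hands to real summary functions.

```python
import importlib


def resolve_callable(qualified_name):
    """Find "module.function": first the module, then the name inside it."""
    module_name, _, function_name = qualified_name.rpartition(".")
    if not module_name:
        return None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    candidate = getattr(module, function_name, None)
    return candidate if callable(candidate) else None


def call_summary(qualified_name, value):
    """Invoke the function and cast whatever it returns into a string."""
    function = resolve_callable(qualified_name)
    if function is None:
        return None
    result = function(value)
    return None if result is None else str(result)
```

<p>Because of the final cast, a function returning a number, such as the <code>getSummary</code> example, still yields a usable summary string.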
<div class="postfooter"></div>
<h1 class ="postheader">Conclusion</h1>
<div class="postcontent">
<p>This document is an introduction to the design of the LLDB data formatters subsystem.<br/>
The intended audience is people interested in understanding or modifying the formatters subsystem itself,
rather than writing a specific data formatter. For the latter purpose, the user documentation about formatters
is the main document to refer to.
<p>On the other hand, this page highlights some open areas for improvement to the general subsystem, and further evolutions
not anticipated here are certainly possible. As usual, the lldb-dev mailing list is the first point of contact for
discussing desired new features or changes to existing features.
<div class="postfooter"></div>