How X Window Managers Work, And How To Write One (Part I)
Window managers are one of the core components of the modern Linux/BSD desktop. It is not an exaggeration to say that they define to a large degree our day-to-day user experience, as they are responsible for deciding how individual windows look, move around, react to input, and organize themselves. Hence, almost 30 years since the first X window manager, we still argue over the merits of different window managers, and new window managers continue to reinvent how we interact with our digital world.
In this series of posts, I hope to demystify how window managers work, and how you might go about writing one yourself.
I will be quoting quite heavily from the seminal Xlib Programming Manual (3rd Ed, 1994) by Adrian Nye and published by O’Reilly. Despite its age, it remains amazingly relevant and is the best available introductory text to the internals of X, which has not changed over the past two decades as much as you’d think. Since you could buy the book plus shipping for less than the price of a cup of coffee, I strongly recommend it to anyone interested in learning more about X. Its chapter 16 also covers the basics of window management.
The Role of an X Window Manager
Let’s start with an examination of the role of the window manager in a modern Linux/BSD desktop environment.
The Rights of X Window Managers
Unlike other windowing systems such as Microsoft Windows or Mac OS X, X does not dictate a window manager or how a window manager should behave. This decision is to thank for the wild diversity of X window managers we see today.
X is somewhat unusual in that it does not mandate a particular type of window manager. Its developers have tried to make X itself as free of window management or user interface policy as possible.
— Xlib Programming Manual §1.2.3
In fact, it does not even require a window manager to be present at all:
Unlike citizens, the window manager has rights but not responsibilities. Programs must be prepared to cooperate with any type of window manager or with none at all […].
— Xlib Programming Manual §1.2.3
This is in stark contrast to the integrative approach of other GUI systems. On Mac OS X and Unity, for example, an application could not possibly function without the window manager, as the latter is responsible for rendering a part of the application’s interface (e.g., menus).
The Responsibilities of X Window Managers
As you probably already know, X operates in a server-client model. An X server controls one or more physical display devices as well as input devices (mouse, keyboard, etc.). An application that wants to interact with these devices assumes the role of an X client. An X server and its clients may run on the same computer, in which case they communicate via Unix domain sockets, or on different computers, in which case they communicate through TCP/IP.
A window manager is a regular X client. It doesn’t have any superuser privileges or keys to kernel backdoors; it is a normal user process that is allowed by the X server to call a set of special APIs. X ensures that no more than one window manager is running at any given point by denying a client access to these APIs if another client currently has access. The first client to attempt to access these APIs always succeeds.
A window manager communicates with the windows it manages through two X mechanisms: properties and events. We will discuss these in detail in later sections, but the takeaway is that the communication happens through the X server, not directly between the window manager and other applications.
This is illustrated by the following diagram:
How an X Window Manager Manages Windows
Let’s now dive into the details of how a window manager does its job.
The Window Hierarchy
When we think about modern GUIs, we usually use the term widgets or controls to refer to UI elements such as buttons, scrollbars, or text boxes, and the term windows to refer to a container for such widgets that has its own name and can be independently moved around, closed, resized, etc.
X, however, was designed to be as low-level as possible. The fundamental UI model that X provides, upon which UI frameworks such as GTK+ and Qt are built, is that of a hierarchy of rectangles. In X terminology, all top level windows and all UI elements within them are windows. In other words, a window is any rectangular area that is a unit of user interaction and/or graphic display.
Windows are organized into a tree hierarchy. At the root of the hierarchy is the root window, a virtual, invisible window that has the same size as the screen, and is always present. Top level windows are direct children of the root window. UI elements within a top level window are descendants of that window.
For example, consider the dialog box above from the Xfce desktop environment. The entire dialog is an X window. All UI elements in the dialog box - the magnifying glass icon, the text box, the green down arrow, the Close and Launch buttons, and the icons inside those buttons - are also X windows.
The whole dialog window is a child of the root window. The magnifying glass icon, the text box, and the Close and Launch buttons are children of the dialog window. The green down arrow is a child of the text box window, and the icons in the Close and Launch buttons are children of those buttons respectively.
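To make the hierarchy concrete, here is a toy model of the dialog's window tree in Python. It is only a sketch and does not talk to an X server: the `Window` class, its methods, and the window names are made up for illustration and simply mirror the description above.

```python
class Window:
    """Toy stand-in for an X window: a named node in a tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_top_level(self):
        # Top level windows are exactly the direct children of the root.
        return self.parent is not None and self.parent.parent is None

    def descendants(self):
        """Yield every window below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Rebuild the dialog described in the text.
root = Window("root")
dialog = Window("dialog", parent=root)
icon = Window("magnifying glass icon", parent=dialog)
text_box = Window("text box", parent=dialog)
arrow = Window("green down arrow", parent=text_box)
close_button = Window("Close button", parent=dialog)
close_icon = Window("Close icon", parent=close_button)
launch_button = Window("Launch button", parent=dialog)
launch_icon = Window("Launch icon", parent=launch_button)

top_level = [w.name for w in root.children if w.is_top_level()]
inside_dialog = [w.name for w in dialog.descendants()]
```

In this model only `dialog` counts as a top level window; the other seven windows are its descendants.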
An important thing to note about X windows is that a child window is clipped to the boundaries of its parent:
A child may be positioned partially or completely outside its parent window, but output to the child is displayed and input received only in the area where the child overlaps with the parent.
— Xlib Programming Manual §2.2.2
For example, if we increase the width of the text box in the dialog above by 2x without changing the size of the dialog box, the portion of the text box that extends outside of the dialog box will become invisible, and clicking on it will not send an event to the text box.
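The clipping rule boils down to a rectangle intersection. The following Python sketch works through the widened text box example; the `Rect` type, the helper names, and all the sizes are assumptions chosen for illustration, and it only models clipping against the immediate parent.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right - left, bottom - top)


def receives_click(child: Rect, parent: Rect, px: int, py: int) -> bool:
    """A click reaches the child only inside the area where it overlaps the parent."""
    visible = intersect(child, parent)
    if visible is None:
        return False
    return (visible.x <= px < visible.x + visible.width
            and visible.y <= py < visible.y + visible.height)


# Both rectangles are expressed in the dialog's coordinate space.
dialog = Rect(0, 0, 400, 200)
wide_text_box = Rect(100, 40, 500, 30)   # doubled from a hypothetical 250 pixels

visible_part = intersect(wide_text_box, dialog)
click_inside = receives_click(wide_text_box, dialog, 350, 50)
click_outside = receives_click(wide_text_box, dialog, 450, 50)
```

Here only the first 300 pixels of the 500 pixel wide text box remain visible, and a click at x = 450 never reaches it.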
A window manager manages top level windows - that is, direct children of the root window.
In the absence of a window manager, when an application wants to do something with a window - move it, resize it, show/hide it, etc. - its request is directly processed by the X server, and that’s the end of that. A window manager, however, needs to intercept these requests. For example, a window manager may need to know that a new top level window has been created and displayed, in order to draw window decorations (e.g. minimize / maximize / close buttons) around it. It may also need to know that an existing top level window has been resized, in order to redraw the window decorations to reflect the change.
The mechanism that allows a window manager to intercept such requests is called substructure redirection.
This is how substructure redirection works. Suppose we have a window W. If a program M registers for substructure redirection on W, a matching request to modify any direct child window of W will not be executed by the X server. Instead, the X server redirects this request to the program M, which can do whatever it wants with the request, including denying the request outright or granting the request with modifications. More formally,
The structure, as the term is used here, is the location, size, stacking order, border width, and mapping status of a window. The substructure is all these statistics about the children of a particular window. This is the complete set of information about screen layout that the window manager might need in order to implement its policy. Redirection means that an event is sent to the client selecting redirection (usually the window manager), and the original structure-changing request is not executed.
— Xlib Programming Manual §16.2
Note that only direct children of a window W are affected by substructure redirection on W, not any windows further down the hierarchy.
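Here is a toy simulation of that rule in Python. It is a sketch of the behavior described above, not of the real X protocol: the `Client` and `Window` classes and the `request_resize` function are hypothetical, and only resize requests are modeled.

```python
class Client:
    """Toy X client with a queue of events delivered by the server."""

    def __init__(self, name):
        self.name = name
        self.events = []


class Window:
    def __init__(self, name, parent=None, width=100, height=100):
        self.name = name
        self.parent = parent
        self.width = width
        self.height = height
        # Client that selected substructure redirection on this window.
        self.redirect_client = None


def request_resize(requester, window, width, height):
    """Return True if the server executed the resize, False if it was redirected."""
    parent = window.parent
    manager = parent.redirect_client if parent is not None else None
    if manager is not None and manager is not requester:
        # Not executed: the manager gets an event describing the request.
        manager.events.append((window.name, width, height))
        return False
    window.width, window.height = width, height
    return True


wm = Client("window manager")
app = Client("application")
root = Window("root", width=1024, height=768)
top = Window("top level", parent=root)
button = Window("button", parent=top)
root.redirect_client = wm

redirected = not request_resize(app, top, 300, 200)   # child of root: intercepted
direct = request_resize(app, button, 50, 20)          # grandchild: not intercepted
name, width, height = wm.events.pop(0)
granted = request_resize(wm, top, min(width, 250), height)  # honor with a modification
```

The application's request on the top level window is handed to the window manager, which re-issues it with a narrower width, while the request on the button goes straight through because the button is not a direct child of the root.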
This gets interesting when we consider substructure redirection on the root window:
When the window manager selects SubstructureRedirectMask on the root window, an attempt by any other client to change the configuration of any child of the root window will fail. Instead an event describing the layout change request will be sent to the window manager. The window manager then reads the event and determines whether to honor the request, modify it, or deny it completely. If it decides to honor the request, it calls the routine that the client called that triggered the event with the same arguments. If it decides to modify the request, it calls the same routine but with modified arguments.
— Xlib Programming Manual §16.2
In other words, a window manager must register for substructure redirection on the root window, which causes requests to map, reconfigure, or restack top level windows - which are direct children of the root window - to be routed to the window manager. This is the magic hook into the X server that window managers rely on to do their job.
This relationship is shown in the following diagram:
Finally, the X server only allows one running program to register for substructure redirection on any given window at any given time. An attempt to register for substructure redirection on a window will fail if another X client has already done the same on the same window, and has not unregistered, disconnected from the X server, or crashed. Since all window managers must register for substructure redirection on the root window, this acts as a locking mechanism that prevents two or more window managers from running simultaneously on the same screen.
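The locking behavior can be sketched in a few lines of Python. This is a toy model rather than real X code: `RootWindow`, `try_to_become_window_manager`, and the client names are hypothetical, and the `BadAccess` exception is merely named after the error the X server reports in this situation.

```python
class BadAccess(Exception):
    """Raised when another client already holds substructure redirection."""


class RootWindow:
    def __init__(self):
        self.redirect_owner = None

    def select_substructure_redirect(self, client):
        # Only one client at a time may hold redirection on a given window.
        if self.redirect_owner not in (None, client):
            raise BadAccess(self.redirect_owner + " already manages this screen")
        self.redirect_owner = client

    def disconnect(self, client):
        # The lock is released when its owner disconnects or crashes.
        if self.redirect_owner == client:
            self.redirect_owner = None


def try_to_become_window_manager(root, client):
    """Return True if client obtained the redirection lock on root."""
    try:
        root.select_substructure_redirect(client)
    except BadAccess:
        return False
    return True


root = RootWindow()
first = try_to_become_window_manager(root, "wm-a")    # first attempt succeeds
second = try_to_become_window_manager(root, "wm-b")   # refused while wm-a runs
root.disconnect("wm-a")
third = try_to_become_window_manager(root, "wm-b")    # succeeds once wm-a is gone
```

A real window manager typically uses the same trick at startup: if its registration is refused, it assumes another window manager is already running and exits.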
In the example dialog box above, we see a title bar with, for example, little buttons for minimizing, maximizing, and closing the window. These UI elements are not created by the application, but by the window manager, via a process known as reparenting or framing:
A window manager can decorate [top level] windows on the screen with titlebars and place little boxes on the titlebar with which the window can be moved or resized. This is only one possibility […].
To do this, the window manager creates a child of the root somewhat larger than the top level window of the application. Then it calls XReparentWindow(), specifying the top level window of the application as win and the new parent [window it just created] as parent. win and all its descendants will then be descendants of parent.
— Xlib Programming Manual §16.3
In other words, if we were to run an X application without a window manager present, the top level window of the application would be a direct child of the root window. With a window manager running, however, the top level window of the application may be reparented by the window manager; it becomes a child of a frame window which is created by the window manager, and which is itself a direct child of the root window. The window manager can add other UI elements inside this frame window alongside the application’s top level window as it sees fit.
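The following Python sketch models the framing step on a toy window tree. Everything in it is an assumption for illustration: the `Window` class, the `frame` helper, the decoration sizes `BORDER` and `TITLE_HEIGHT`, and the window geometry. A real window manager would do this through Xlib rather than on in-memory objects.

```python
BORDER = 3          # hypothetical frame border width, in pixels
TITLE_HEIGHT = 20   # hypothetical title bar height, in pixels


class Window:
    def __init__(self, name, x, y, width, height, parent=None):
        self.name = name
        self.x, self.y = x, y              # position relative to the parent
        self.width, self.height = width, height
        self.parent = None
        self.children = []
        if parent is not None:
            self.reparent(parent, x, y)

    def reparent(self, new_parent, x, y):
        """Move this window, and with it its whole subtree, under new_parent."""
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        new_parent.children.append(self)
        self.x, self.y = x, y


def frame(root, client):
    """Create a slightly larger frame under the root and reparent client into it."""
    frame_window = Window(
        "frame of " + client.name,
        client.x, client.y,
        client.width + 2 * BORDER,
        client.height + TITLE_HEIGHT + BORDER,
        parent=root,
    )
    # Leave room for the title bar above and the border to the left.
    client.reparent(frame_window, BORDER, TITLE_HEIGHT)
    return frame_window


root = Window("root", 0, 0, 1024, 768)
dialog = Window("dialog", 200, 150, 400, 200, parent=root)
text_box = Window("text box", 100, 40, 250, 30, parent=dialog)
dialog_frame = frame(root, dialog)
```

After framing, the root's only child is the frame, the dialog sits inside it at an offset that leaves room for the decorations, and the text box is untouched because it follows its parent.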
Therefore, I kind of lied to you several paragraphs ago: the dialog box shown earlier is really a child window within a frame window created by Xfce's window manager, Xfwm, along with other UI elements for window management:
Reparenting is what allows different window managers to draw different window decorations, and thereby achieve a consistent look-and-feel across windows. However, there are also window managers that do not reparent at all: these are called non-reparenting window managers. There are two reasons why a window manager would not want to reparent:
If a window manager does not draw window decorations around top level windows, it obviously has no need to reparent them. Examples: xmonad, dwm.
Compositing window managers do not always need to reparent windows; we will discuss why below. Example: Compiz. This is not true for all compositing window managers, however; for example, GNOME’s default window manager, Mutter, is a reparenting compositing window manager.
Let’s now consider substructure redirection in the context of reparenting. When a top level window W is first shown (mapped in X jargon), the window manager is notified because it has registered for substructure redirection on the root window, and a top level window is a direct child of the root window. It then creates a frame F and reparents W, so that W becomes a child of F, which itself is a child of the root window. But since now W is no longer a direct child of the root window, the window manager will no longer be able to intercept changes to W!
Therefore, a reparenting window manager must also subsequently register for substructure redirection on each frame window it creates.
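A tiny Python model shows why the extra registration is needed. It is a sketch under the same simplifying assumptions as the prose: the `Window` class and the `is_intercepted` helper are hypothetical, and clients are represented by plain strings.

```python
class Window:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Client that selected substructure redirection on this window.
        self.redirect_client = None


def is_intercepted(window, requester):
    """True if a structure request on window would be redirected to a manager."""
    parent = window.parent
    return (parent is not None
            and parent.redirect_client is not None
            and parent.redirect_client != requester)


root = Window("root")
root.redirect_client = "wm"
w = Window("W", parent=root)
before_reparent = is_intercepted(w, "app")   # W is a child of the root

f = Window("F", parent=root)
w.parent = f                                 # the window manager reparents W into F
after_reparent = is_intercepted(w, "app")    # redirection on the root no longer applies

f.redirect_client = "wm"                     # register for redirection on the frame too
after_select = is_intercepted(w, "app")
```

Requests on W are intercepted before reparenting, slip through once W has moved under F, and are intercepted again only after the window manager registers on F.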
Compositing window managers are a relatively new development. Compositing support in X was added in late 2004, a full decade after the last edition of Xlib Programming Manual. The first compositing window managers, Xfwm and Compiz, launched in early 2005.
So, what exactly does a compositing window manager do?
In our discussion above on substructure redirection and reparenting, we saw how a window manager can respond to various requests for a top level window - to display/hide it (map/unmap in X jargon), to resize it, to move it, etc. But we didn’t talk about how to deal with what’s inside the top level windows.
Indeed, from the perspective of the window manager, top level windows are black boxes; they each manage their own descendant windows (UI elements), perhaps through a framework such as GTK+ or Qt, and the window manager has no right to interfere there. The application that creates a top level window is responsible for rendering and handling events for any descendant windows (UI elements), and does so directly through X. This is shown in the first diagram above.
As the computing power of graphics hardware grew, so did people’s expectations from their window managers. With hardware acceleration, it became possible to build much more computationally intensive user interfaces, such as the (in)famous Desktop Cube in Compiz or the Shift Switcher.
Let’s take a moment to think about how we can implement an interface such as the Shift Switcher above. When the user triggers this interface, we need to:
Render each top level window and all its descendant windows (UI elements) to an off-screen, in-memory buffer, instead of directly to the hardware.
Transform (rotate, contort, etc.) each buffer according to our design.
Composite the transformed buffers into a final buffer along with a background and any other floating UI elements we need to display.
Create an overlay window that covers the entire screen and hides all other windows.
Render the final buffer into the overlay window.
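The steps above can be sketched with tiny character grids standing in for pixel buffers. This Python toy is only an illustration of the pipeline: the function names, the "shrink" transform, and the window sizes are all made up, and a real compositor would do this work on the GPU.

```python
def render_window(width, height, fill):
    """Step 1: 'render' a window into its own off-screen buffer."""
    return [[fill] * width for _ in range(height)]


def shrink(buffer, factor):
    """Step 2: a very simple transform that keeps every factor-th pixel."""
    return [row[::factor] for row in buffer[::factor]]


def composite(background, layers):
    """Step 3: paint buffers onto a copy of the background, bottom to top."""
    final = [row[:] for row in background]
    for buffer, x, y in layers:
        for dy, row in enumerate(buffer):
            for dx, pixel in enumerate(row):
                inside_rows = 0 <= y + dy < len(final)
                inside_cols = 0 <= x + dx < len(final[0])
                if inside_rows and inside_cols:
                    final[y + dy][x + dx] = pixel
    return final


background = render_window(8, 4, ".")
a = shrink(render_window(8, 4, "A"), 2)   # becomes 4 wide, 2 high
b = shrink(render_window(6, 4, "B"), 2)   # becomes 3 wide, 2 high

# Steps 4 and 5: the final buffer is what would be drawn into the
# full-screen overlay window.
overlay = composite(background, [(b, 4, 1), (a, 1, 1)])
screen = ["".join(row) for row in overlay]
```

Joining the rows of `screen` gives a 8 by 4 picture in which the shrunken A sits on top of the shrunken B, hiding B's leftmost column.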
There are a number of challenges:
We must be able to retrieve the displayed contents of top level windows. However, as we described earlier, top level windows render their contents directly through X, without going through the window manager.
We need to update our interface in real time as the contents of the top level windows change. However, top level windows do not notify window managers when their contents change, again because they render their contents directly through X.
A top level window A may overlap with another top level window B below, which means a portion of B isn’t currently displayed. Our interface, however, must capture the full rendering of A and B, regardless of overlapping regions.
This whole compositing process is computationally intensive and requires hardware acceleration to function adequately.
It is clear that none of this would be possible without some heavy cooperation from the X server. Enter the Composite extension:
Many user interface operations would benefit from having pixel contents of window hierarchies available without respect to sibling and antecedent clipping. In addition, placing control over the composition of these pixel contents into a final screen image in an external application will enable a flexible system for dynamic application content presentation.
— X Composite Extension
The Composite extension provides a mechanism to request the X server not to render a specific window and its descendants directly to hardware, but to a special buffer maintained by the X server, and do so without the normal clipping and overlap computations. This buffer can then be read and used by the client that made the request.
That’s exactly what a compositing window manager does: it will ask X to render each top level window to an off-screen, in-memory buffer and composite the results into an overlay window itself. And it needs to do this not just for fancy task switcher interfaces as in our example, but also to achieve effects like translucency, animations, soft shadows, and the like.
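To see why off-screen buffers matter, compare the two rendering modes in a toy Python model. The helper names and window geometry are invented for illustration, and each "pixel" is just the name of the window that painted it.

```python
def draw_direct(screen_width, screen_height, windows):
    """No redirection: all windows paint onto one shared screen, bottom to top,
    so a lower window loses whatever a higher window covers."""
    screen = [[" "] * screen_width for _ in range(screen_height)]
    for name, x, y, width, height in windows:
        for row in range(y, y + height):
            for col in range(x, x + width):
                screen[row][col] = name
    return screen


def draw_redirected(windows):
    """Redirection: every window gets its own complete off-screen buffer."""
    return {
        name: [[name] * width for _ in range(height)]
        for name, x, y, width, height in windows
    }


def count_pixels(buffer, name):
    return sum(row.count(name) for row in buffer)


# B is below, A is above and overlaps B's bottom-right corner.
windows = [("B", 0, 0, 4, 3), ("A", 2, 1, 4, 3)]

screen = draw_direct(8, 5, windows)
buffers = draw_redirected(windows)

visible_b = count_pixels(screen, "B")        # what survives on the shared screen
full_b = count_pixels(buffers["B"], "B")     # what the off-screen buffer holds
```

On the shared screen only 8 of B's 12 pixels survive, while B's own buffer still holds all 12, which is what a compositing window manager needs in order to draw B rotated, translucent, or moved somewhere else.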
This is illustrated in the following diagram:
Let’s end this section by considering whether a compositing window manager should reparent top level windows.
Since a compositing window manager already knows the size and position of all top level windows, it’s easy for it to just draw window decorations during compositing into the overlay window using graphics operations (e.g. OpenGL), without ever creating an actual X frame window and reparenting. Some compositing window managers do operate this way.
On the other hand, a window manager may need to support both a compositing and a non-compositing mode, for compatibility with older or unsupported graphics hardware. In this case, it needs to implement reparenting and frame windows for non-compositing mode anyway, so additionally implementing drawing window decorations using graphics operations becomes redundant. This is why many other compositing window managers still choose to reparent.
Ready For Some Code?
If you’ve read everything up to this point, you’re probably holding back the urge to cry out "Enough talk - show me some code!" I don’t blame you.
In the next installment in this series, I will walk you through a basic implementation of a reparenting, non-compositing window manager. Impatient? Check out the code on GitHub!
Next chapter: How X Window Managers Work, And How To Write One (Part II)