Sometimes while using my computer I notice random slowness when launching a certain application or some feature that just doesn’t run very well. That’s always reason enough for me to take a deeper look.
My tool of choice for analyzing performance issues is Hotspot, KDAB’s excellent perf visualizer. It comes with an easy-to-use GUI for browsing the recorded results. Its flame graph in particular lets you quickly detect, well, hotspots during execution. Just launch an application through Hotspot or attach it to a running one and look at the graphs. Depending on your system configuration you might need to adjust the perf_event_paranoid kernel setting in order for it to inspect other processes.
Within a few seconds I found what caused the noticeable slowdown in the startup of Spectacle that I observed. One of its features is an image annotator for adding arrows and other highlights to screenshots. Through Hotspot I noticed that it was being loaded even when I didn’t actually use it – there’s a button for entering annotation mode. The trivial remedy was to delay loading the annotator until this button was clicked. I’m sure you can make use of Hotspot, too, and help us improve the performance of our software!
Another simple fix was on Okular’s new Welcome page: for rendering the large app icon, a new KIconLoader instance was created. Doing so parses and loads the icon theme from disk, which is quite heavy. There’s no need to do that, though, as QIcon can rasterize an icon at any size, and will use the shared icon loader just like any other icon in the application.
KUrlNavigator, the class behind the address bar in Dolphin and all file dialogs, has two modes: navigate (breadcrumb) and edit mode. When in edit mode, a button with a tick icon accepts the input URL and returns to breadcrumb mode. Of course, not loading that icon until actually entering edit mode saves a few CPU cycles.
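All three fixes above boil down to the same idea: don't create an expensive object until somebody actually asks for it. Here is a minimal sketch of that pattern in plain C++17. The Lazy class is a hypothetical helper made up for illustration, it is not part of Qt or KDE Frameworks and not what the actual patches look like.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical helper (an assumption for illustration): wraps a value that
// is expensive to create and builds it on first access only.
template <typename T>
class Lazy
{
public:
    explicit Lazy(std::function<T()> factory)
        : m_factory(std::move(factory))
    {
    }

    // Runs the factory on the first call and returns the cached value on
    // every later call.
    T &get()
    {
        if (!m_value) {
            m_value.emplace(m_factory());
        }
        return *m_value;
    }

    // Tells whether the factory has already run.
    bool isLoaded() const
    {
        return m_value.has_value();
    }

private:
    std::function<T()> m_factory;
    std::optional<T> m_value;
};
```

A widget would hold something like a Lazy member for its icon or annotator and only call get() from the code path that really needs it, for example the button's click handler.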
It stands or falls with the compositor
An important candidate for performance optimizations is KWin, our excellent Wayland compositor. The more efficient it is, the faster it can start up and render frames, and the smoother our user experience is. For example, the Wayland compositor just sends key codes to applications and they are then responsible for figuring out which actual character was pressed. In order to do that, the compositor provides clients with the keymap, a table several tens of kilobytes in size, describing which rules to apply. This is not something you want to send over the Wayland socket, so instead a file descriptor is passed around. Previously, KWin wrote the data into a temporary file and passed it to the client. Each client also got its own file so that they couldn’t tamper with another client’s keymap.
While /tmp is typically a RAM drive, going through file system IO still adds overhead. That’s why I ported it to use memfd_create, which creates an anonymous file in memory. Since it supports fd seals, we can create it once, seal it against shrinking, growing, and writing, and reuse it for all clients. Sadly, this is only an option for clients supporting wl_keyboard in version 7 or above (Qt 5.15 is on version 5), since only from that version on is it specified that the client must map the keymap as MAP_PRIVATE, which is what allows the aforementioned seals to be applied.
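To give an idea of what creating and sealing such a file involves, here is a minimal Linux-only sketch. It is not KWin's actual code: the function name createSealedMemfd and the "keymap" debug name are assumptions for illustration, and error handling is kept to the bare minimum.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // memfd_create and the F_SEAL_* constants are GNU extensions
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (an assumption for illustration): copies `data` into
// an anonymous in-memory file and seals it. Returns the file descriptor,
// or -1 on failure.
int createSealedMemfd(const std::string &data)
{
    const int fd = memfd_create("keymap", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1) {
        return -1;
    }

    // Write the whole buffer, coping with short writes.
    size_t written = 0;
    while (written < data.size()) {
        const ssize_t n = write(fd, data.data() + written, data.size() - written);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        written += static_cast<size_t>(n);
    }

    // From now on the file can no longer be shrunk, grown or written to,
    // and the set of seals itself cannot be changed anymore either.
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can be handed to any number of readers. Each of them can map it with MAP_PRIVATE, while attempts to write to it or to resize it fail.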
Further work has been done to delay loading of resources until they’re actually used. For instance, the noise texture used by the blur effect is now only generated when there is a window enabling blur. While most users will have a panel in Plasma with blur applied to it, this still means KWin starts up ever so slightly faster, and thus all applications requiring the Wayland compositor can begin starting up sooner. The same principle was applied to the window palette. It is only needed for colorizing a window’s title bar. Naturally, the splash screen has neither blur nor a title bar.
I love DBus. Qt has its own module for easily exporting classes to the bus as well as issuing method calls and listening to signals. The most important thing to remember with Qt DBus: Never use QDBusInterface. This innocent-looking class does a blocking introspection of the interface in its constructor! Instead, use a C++ class generated by qdbusxml2cpp from an XML interface description. Alternatively, if you really have to do a manual call, you can do that through QDBusMessage. If you set the Q_DBUS_BLOCKING_CALL_MAIN_THREAD_WARNING_MS environment variable, Qt will print a warning whenever a blocking DBus call takes longer than the specified time.
On my laptop, I was able to speed up Dolphin’s startup by 50ms just by removing some QDBusInterface usage in Solid (the Framework which enumerates storage devices). Solid is very modular and thus each device it queries results in quite some DBus traffic. I started investigating a port to the DBus ObjectManager API so it can just query all storage devices in a single call. However, with an optical drive and the medium inside of it being separate objects, and the fact that I don’t own a computer with such a drive anymore, this has proven to be quite a challenge.
Furthermore, the integration of DBus properties with Qt’s property system isn’t ideal. For a start, it doesn’t relay change signals. More importantly, though, changing a DBus property through a generated interface class again does a blocking call, because setProperty returns whether the call was successful or not, even if you don’t care. Replacing a property change request with a custom DBus call in Plasma-NM fixed one of the reasons the Network applet took a while to open: it changes the network stats collection interval when it opens.
Good old gdb
Especially when it’s about blocking file system IO, gdb can be surprisingly useful. Typically, a process enters a state of “uninterruptible sleep” when stuck waiting for a device to deliver requested data. This means that a well-timed Ctrl+C while running the application in a debugger can already give interesting insights into why it’s stuck.
One notable example is a seemingly random freeze I encountered when I right-clicked on a file while browsing a network share in Dolphin. Turns out, the plug-in provided by Ark for the “Extract here…” action had a bug which had it disregard the URL scheme of the file that was clicked on. This meant, instead of accessing smb://server/some/directory it would try to look in the local /some/directory. If that particular location existed and happened to be on an autofs mount, it would stall the UI until the mount point woke up again, explaining the apparent randomness of the issue. To figure that out I just ran Dolphin in gdb all day and after a few hours I witnessed the freeze which I could then solve.
Another blocking call that I am currently working on happens when entering a directory. Unix uses a name that starts with a dot to denote a hidden file. Some file managers, Dolphin included, support reading a list of files to hide from a text file named .hidden. This is currently done in the directory lister on the application’s main thread. The obvious fix for this is to move it to the KIO worker thread/process. However, there are jobs, such as copying files, where it doesn’t matter that a file is hidden and it would be wasteful to check for that in this case.
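For illustration, the check itself is cheap once the list has been read; it is the file access that you don't want on the main thread. Here is a rough sketch, assuming that .hidden simply contains one file name per line. Both function names are made up and this is not the KIO implementation.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration): takes the contents
// of a .hidden file and returns the listed names, assuming one file name
// per line and skipping empty lines.
std::set<std::string> readHiddenList(const std::string &contents)
{
    std::set<std::string> names;
    std::istringstream in(contents);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            names.insert(line);
        }
    }
    return names;
}

// A file counts as hidden if its name starts with a dot or if it is listed
// in the directory's .hidden file.
bool isHidden(const std::string &name, const std::set<std::string> &hiddenList)
{
    return (!name.empty() && name.front() == '.') || hiddenList.count(name) > 0;
}
```

A job that doesn't care about hidden files, such as a copy job, would simply never call readHiddenList in the first place.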
Other bits here and there
Thumbnail generation will always take some time, so you need to resort to additional tricks. One change I made to Dolphin is generating previews for files before those of folders, instead of strictly top-left to bottom-right. Folder thumbnails have to walk the directory to figure out which files to add to the collage. A file on the other hand is just a file and easily cached. Additionally, the JPEG thumbnailer now also uses a thumbnail embedded in EXIF metadata, if available, which is significantly faster than reading the full image. By the way, KNotifications no longer includes the alpha channel in a custom notification icon, if the image is fully opaque, which is usually the case for user photos.
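The reordering of preview jobs can be pictured as a stable partition: files move to the front, folders to the back, and the visual order within each group stays the same. This is only a sketch of the idea with a made-up Item type and function name; Dolphin’s real data structures and code look different.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical item type (an assumption for illustration).
struct Item {
    std::string name;
    bool isFolder;
};

// Moves all files in front of all folders while keeping the original
// relative order within each of the two groups.
void queueFilesBeforeFolders(std::vector<Item> &items)
{
    std::stable_partition(items.begin(), items.end(), [](const Item &item) {
        return !item.isFolder;
    });
}
```

Using the stable variant matters here: within each group, previews still appear from top-left to bottom-right.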
Finally, a neat party trick for most of your QML applications: Since Qt 5.15, ListView is capable of re-using delegate instances rather than creating and destroying them as you scroll around. When you set reuseItems, it will just re-assign the model index and update any bindings without constantly instantiating a massive tree of objects. You need to be careful not to store any state in your delegates but that is something you shouldn’t be doing anyway. Enabling this gave KRunner a massive speed boost. It’s not yet enabled in notifications – I am aware that scrolling the history is quite slow – but we have several types of delegates in there (e.g. list header and notification body) and there’s a Loader picking them. I need to refactor this, otherwise there’s no performance improvement since the Loader’s source item is still recreated as you scroll around.
3 thoughts on “Performance Musings”
This work is very much welcomed. Thanks for doing it and writing about it.
Compliments for this very clear and useful article. Have a nice day!
Kai Uwe, whenever you blog about a topic, I start waiting eagerly to get the stuff into my hands. Performance improvements are always welcome, learning about the backgrounds how they were achieved, is very insightful!