Friday, August 14, 2009

More 2D in KDE

An interesting question is: if the raster engine is faster for the vast majority of graphics operations, and the Qt X11 engine falls back on quite a few features anyway, why shouldn't we make raster the default until the OpenGL implementations/engine are stable enough?

There are two technical reasons for that. Actually that's not quite true, there are more, but these two are the ones that will bother a lot of people.

The first is that we'd kill KDE performance over the network. Everyone who uses X/KDE over the network would suddenly find that their setup had become unusable: their sessions would be sending images for absolutely everything, all the time... As you can imagine, the institutions, schools and companies who use KDE in exactly this way wouldn't be particularly impressed if updating their installations suddenly rendered them unusable.

The second reason is that I kinda like being able to read, and sometimes even write, text. Text tends to be pretty helpful during the process of reading. Especially things like email and web browsing get a lot easier with text. I think a lot of people share that opinion with me. To render text we need fonts, which in turn are composed of glyphs. To make text rendering efficient, we need to cache the glyphs. When running with the X11 engine we render text using Xrender, which means that there's a central process that can technically manage all the glyphs used by applications running on a desktop. That process is the Xserver. With the raster engine we take the Xserver out of the equation, and suddenly every single application on the desktop needs to cache the glyphs for all the fonts it's using. This means every application suddenly uses megs and megs of extra memory. They all need to individually cache all the glyphs, even if they all use the same font. It tends to work OK for languages with a few glyphs, e.g. 40+ for English (26 letters + 10 digits + a few punctuation marks). It doesn't work at all for languages with more. So unless it's decided that KDE can only be used by people whose languages' alphabets contain about 30 letters or less, I'd hold off on making the raster engine the default.
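
To put a rough, purely illustrative number on that (my back-of-envelope estimate, not a measurement): 40-odd Latin glyphs cached as, say, 20x20 8-bit alpha bitmaps come to about 16 KB per application, which nobody notices. A CJK font can easily contain 20,000+ glyphs; at the same size that's on the order of 8 MB, per font and per size, duplicated in every running application. In practice only the glyphs actually used get cached, but for text-heavy applications that's still a lot of duplicated memory.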

While the latter problem could be solved with some clever shared memory usage, or by forcing Xrender on top of the raster engine (actually I shouldn't state that as a fact; I haven't looked at font rendering in the raster engine in a while and maybe that has been implemented lately), it's worth noting that the X11 engine is simply fast enough that arguing over a few frames one way or the other isn't worth it. Those few frames you'd gain would mean an unusable KDE for others.

And if you think that neither of the above two points bothers you and you'd still want to use the raster engine by default, you'll have to understand that I just won't post instructions on how to do that here. If you're a developer you already know how to do it, and if not, there are trivial ways of finding out from the Qt sources. If you're not a developer, then you really should stick to the defaults globally and simply test applications with the -graphicssystem switches.

2D in KDE

So it seems a lot of people are wondering about this. By this I mean why dwarfs always have beards. Underground, big ears would probably be a better evolutionary trait, but elves got dibs on those.

Qt, and therefore KDE, deals with three predominant ways of rendering graphics. I don't feel like bothering with transitions today, so find your own way from beards and dwarfs to Qt/KDE graphics. Those three ways are:
  • On the CPU with no help from the GPU using the raster engine
  • Using X11/Xrender with the X11 engine
  • Using OpenGL with the OpenGL engine
There are a couple of ways in which the decision about which of those engines gets used is made.

First there's the default global engine. This is what you get when you open a QPainter on a QWidget and its derivatives. So whenever you have code like

void MyWidget::paintEvent(QPaintEvent *)
{
    QPainter p(this);
    ...
}

you know the default engine is being used. The rules for that are as follows:
  • GNU/Linux : X11 engine is being used
  • Windows : Raster engine is being used
  • Application has been started with -graphicssystem= option :
    • -graphicssystem=native the rules above apply
    • -graphicssystem=raster the raster engine is being used by default
    • -graphicssystem=opengl the OpenGL engine is being used by default
Furthermore, depending on which QPaintDevice is being used, different engines will be selected. The rules for that are as follows:
  • QWidget the default engine is being used (picked as described above)
  • QPixmap the default engine is being used (picked as described above)
  • QImage the raster engine is being used (always, it doesn't matter what engine has been selected as the default)
  • QGLWidget, QGLFramebufferObject, QGLPixelBuffer the OpenGL engine is being used (always, it doesn't matter what engine has been selected as the default)
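
To make those rules concrete, here's a minimal sketch (mine, not from the post; it assumes Qt 4.x) that opens a QPainter on different paint devices and asks which engine Qt picked via QPaintEngine::type():

#include <QApplication>
#include <QImage>
#include <QPixmap>
#include <QPainter>
#include <QPaintEngine>
#include <QDebug>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);

    QImage image(64, 64, QImage::Format_ARGB32_Premultiplied);
    QPainter imagePainter(&image);
    // A QImage is always painted by the raster engine, regardless of -graphicssystem.
    qDebug() << "QImage uses raster:"
             << (imagePainter.paintEngine()->type() == QPaintEngine::Raster);

    QPixmap pixmap(64, 64);
    QPainter pixmapPainter(&pixmap);
    // A QPixmap uses the default engine, so on X11 this reports the X11 engine
    // unless a different graphics system was selected.
    qDebug() << "QPixmap engine type:" << int(pixmapPainter.paintEngine()->type());

    return 0;
}
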
Now here's where things get tricky: if an engine doesn't support certain features, it will have to fall back to the one engine that is guaranteed to work on all platforms and supports all the features required by the QPainter API, which is the raster engine. This was done to ensure that all engines have the same feature set.

While the OpenGL engine should in general never fall back, that is not the case for the X11 engine, where fallbacks do happen. One of the biggest immediate optimizations you can make to speed up your application is to make sure you don't hit fallbacks. A good way to check for that is to export QT_PAINT_FALLBACK_OVERLAY and run your application against a debug build of Qt; the region which caused a fallback will then be highlighted (the other method is to break on QPainter::draw_helper in gdb). Unfortunately this will only detect fallbacks within Qt itself.

All of those engines also use drastically different methods of rendering primitives.
The raster engine rasterizes primitives directly.
The X11 engine tessellates primitives into trapezoids, that's because Xrender composites trapezoids.
The GL engine either uses the stencil method (described in this blog a long time ago) or shaders to decompose the primitives and the rest is handled by the normal GL rasterization rules.

Tessellation is a fairly complicated process (also described long ago in this blog). To handle degenerate cases, the first step of the algorithm is to find the intersections of the primitive. In the simplest form, think about rendering a figure 8. There's no way of testing whether a given primitive is self-intersecting without actually running the algorithm.
To render with anti-aliasing on the X11 engine we have to tessellate. We have to tessellate because Xrender requires trapezoids to render anti-aliased primitives. So if the X11 engine is being used and the rendering is anti-aliased, then whether you're rendering a line, a heart or a moose, we have to tessellate.
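
For example (a sketch of mine, not code from the post; the widget name is made up), this is all it takes to end up on the tessellation path when the X11 engine is the default:

#include <QWidget>
#include <QPainter>

class MooseWidget : public QWidget // hypothetical example widget
{
protected:
    void paintEvent(QPaintEvent *)
    {
        QPainter p(this);
        // Requesting anti-aliasing is what forces the X11 engine to tessellate
        // whatever gets drawn into trapezoids before handing it to Xrender.
        p.setRenderHint(QPainter::Antialiasing, true);
        p.drawEllipse(rect().adjusted(10, 10, -10, -10)); // stands in for the moose
    }
};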

Someone was worried that this is an O(n^2) process, which is of course completely incorrect. We're not using a brute force algorithm here. The process is obviously O(n log n). O(n log n) complexity on the CPU side is something that both the raster and X11 engines need to deal with. The question is what happens next, and what happens in the subsequent calls.

While the raster engine can deal with all of it while rasterizing, the X11 engine can't. It has to tessellate, send the results to the server and hope for the best. If the X11 driver doesn't implement composition of trapezoids (which, realistically speaking, most of them don't), this operation is done by Pixman. In the raster engine the sheer spatial locality almost forces better cache utilization than what could realistically be achieved by the "application tessellates -> server rasterizes" process that the X11 engine has to deal with. So without all-out acceleration, the X11 engine can't compete with the raster engine in this case. While simplifying a lot, it's worth remembering that in terms of cycles a register access most likely takes one cycle or less, an access to the L1 data cache about 3 cycles, L2 about 14 cycles, while main memory is about 240 cycles away. So for CPU-based graphics, efficient memory utilization is one of the most crucial undertakings.

With that in mind, this is also the reason why a heavily optimized, purely software-based OpenGL implementation would be a lot faster at 2D graphics than the raster engine is. In terms of memory usage, the OpenGL pipeline is simply a lot better at handling memory than the API QPainter provides.

So what you should take away from this is that in a perfect world the GL engine is so much better than absolutely anything else Qt/KDE have that it's not even funny; X11 follows it, and the raster engine trails far behind.

The reality you're dealing with is that when using the X11 engine, due to the fallbacks you will also be using the raster engine (either on the application side with Qt's raster engine, or on the server side with Pixman), and unfortunately in this case "the more the better" doesn't apply and you will suffer tremendously. Our X11 drivers don't accelerate large chunks of Xrender, applications don't have good means of testing what is accelerated, so what Qt does is simply not use many of those features. So even if a driver did accelerate, for example, gradient fills and source picture transformations, it wouldn't help you, because Qt simply doesn't use them and always falls back to the raster engine. It's a bit of a chicken-and-egg problem: Qt doesn't use it because it's slow, and it's slow because no one uses it.
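
A gradient fill is the classic illustration. The following sketch (illustrative only; the widget is made up) typically ends up on the raster fallback path when painted with the X11 engine, exactly because Qt doesn't hand gradients to Xrender:

#include <QWidget>
#include <QPainter>
#include <QLinearGradient>

class GradientWidget : public QWidget // hypothetical example widget
{
protected:
    void paintEvent(QPaintEvent *)
    {
        QPainter p(this);
        QLinearGradient gradient(0, 0, 0, height());
        gradient.setColorAt(0.0, Qt::white);
        gradient.setColorAt(1.0, Qt::darkBlue);
        // On the X11 engine this fill is likely rasterized client-side by the
        // raster fallback rather than composited by the driver.
        p.fillRect(rect(), gradient);
    }
};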

The best solution to that conundrum is to try running your applications with -graphicssystem=opengl and report any problems you see to both the Qt Software and Mesa3D/DRI bugzillas, because the only way out is to make sure that both our OpenGL implementations and the OpenGL usage in the rendering code on the application side work efficiently and correctly. The quicker we get the rendering stack to work on top of OpenGL, the better off we'll be.