How I Improved the Performance of a Popular Open-Source Graphics Rendering Library by 1500%

Contributing to open-source software as a high school student!

December 27, 2023
by George Shao


Cause when you're fifteen...

When I was 15 (in 2019), I took a computer science course where we were tasked with creating a final project using the open-source Python graphics library Arcade.

Arcade is one of the most popular Python game development libraries, now with over 1,300,000 downloads on the Python Package Index (PyPI). It is built on top of Pyglet and OpenGL, and provides methods for rendering shapes, images, and text, as well as handling I/O, object collision, hitboxes, and physics.

For my final project, I created PyArcadePaint (now Drawing2Code). It was a simple Microsoft Paint-inspired program that allowed the user to draw shapes and lines on a canvas. With the click of a button, your drawing would be exported as Python code that could be run to render the shapes on the screen.

Looking back, the code is surprisingly clean and well-structured, at least if you ignore that it all lives in a single 700-line file. Overall, I'm proud of my 15-year-old self for writing it.

Arcade's Performance Issues

However, my program was slow, and so were all the other applications my classmates created with Arcade. On my laptop with an Intel i7-7500U CPU and integrated graphics, it took a second or two to render a simple drawing.

For those creating complex games with Arcade, it took even longer to render each frame. As development continued and features were added, frames-per-second (FPS) dropped even further for anybody without a dedicated graphics card in their laptop.

I was curious why Arcade was so slow and whether it could be improved, so I decided to dig into the source code.

Vertex Buffer Objects

I started by forking the Arcade repository on GitHub and cloning it to my computer. I found that the main rendering code wasn't taking advantage of OpenGL VBOs (Vertex Buffer Objects), which are used to store vertex data in the GPU's memory, and was instead rendering each shape immediately for each frame.

From Wikipedia:

A vertex buffer object (VBO) is an OpenGL feature that provides methods for uploading vertex data (position, normal vector, color, etc.) to the video device for non-immediate-mode rendering.

VBOs offer substantial performance gains over immediate mode rendering primarily because the data reside in video device memory rather than system memory and so it can be rendered directly by the video device.
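
To make the quoted distinction concrete, here is a toy model in plain Python with no real OpenGL calls. The `FakeGPU` class and its methods are hypothetical stand-ins that only count how many vertices cross from system memory to the video device. Immediate-mode drawing re-sends the vertices on every frame, while a buffered draw uploads them once and then refers to the buffer by its handle.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]


class FakeGPU:
    """Hypothetical stand-in for a video device; it only counts transfers."""

    def __init__(self) -> None:
        self.vertices_transferred = 0
        self._buffers: Dict[int, List[Vertex]] = {}

    def draw_immediate(self, vertices: List[Vertex]) -> None:
        # Immediate mode: the vertex data is sent again on every draw call.
        self.vertices_transferred += len(vertices)

    def create_buffer(self, vertices: List[Vertex]) -> int:
        # Buffered mode: upload the data once and keep it on the device.
        handle = len(self._buffers)
        self._buffers[handle] = list(vertices)
        self.vertices_transferred += len(vertices)
        return handle

    def draw_buffer(self, handle: int) -> None:
        # Drawing from a buffer transfers no vertex data, only the handle.
        assert handle in self._buffers


def simulate(frames: int, vertices: List[Vertex]) -> Tuple[int, int]:
    """Return (immediate_transfers, buffered_transfers) for the same scene."""
    immediate = FakeGPU()
    for _ in range(frames):
        immediate.draw_immediate(vertices)

    buffered = FakeGPU()
    handle = buffered.create_buffer(vertices)
    for _ in range(frames):
        buffered.draw_buffer(handle)

    return immediate.vertices_transferred, buffered.vertices_transferred


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(simulate(frames=60, vertices=square))  # (240, 4)
```

In this model the immediate path scales with the number of frames, while the buffered path pays the transfer cost once. Real drivers are more complicated, but the shape of the saving is the same.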

Arcade was already using VBOs in some secondary functions, but they weren't used by default in the core rendering functions. I migrated some of that VBO code over to Arcade's core rendering functions, and created an object lookup table to store the VBOs so that the same polygon never needed its VBO created more than once.

After a few days, I had a working implementation of Arcade using VBOs. Here's a tiny bit of what that code looked like:

def draw_line_strip(point_list: PointList, color: Color, line_width: float = 1):
    """
    Draw a line made up of multiple points.

    :param PointList point_list: Points making up the line.
    :param Color color: Color of the line.
    :param float line_width: Width of the line in pixels.

    """
    for i in range(1, len(point_list)):
        start_x = point_list[i - 1][0]
        start_y = point_list[i - 1][1]
        end_x = point_list[i][0]
        end_y = point_list[i][1]
        # One cached shape per unique segment, color, and width.
        shape_id = f"line-{start_x}-{start_y}-{end_x}-{end_y}-{color}-{line_width}"
        if shape_id not in buffered_shapes:
            points = get_points_for_thick_line(start_x, start_y, end_x, end_y, line_width)
            # Build the lists per segment so a cached shape holds only its own geometry.
            triangle_point_list: List[Point] = [points[1], points[0], points[2], points[3]]
            new_color_list: List[Color] = [color, color, color, color]
            buffered_shapes[shape_id] = _create_triangles_filled_with_colors(triangle_point_list, new_color_list)
        # Draw from the buffered shape instead of rebuilding it every frame.
        buffered_shapes[shape_id].draw()
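
The lookup-table idea can also be shown on its own, separate from Arcade. The sketch below is not Arcade's code: `ShapeCache`, `build_line`, and the key layout are illustrative names, and the "shape" is just a tuple standing in for a GPU buffer. It shows the general pattern of building an expensive object once per unique key and reusing it on later draws.

```python
from typing import Callable, Dict, Hashable, Tuple


class ShapeCache:
    """Hypothetical cache that builds each shape once per unique key."""

    def __init__(self, build_shape: Callable[..., object]) -> None:
        self._build_shape = build_shape
        self._shapes: Dict[Hashable, object] = {}
        self.builds = 0  # how many times the expensive builder actually ran

    def get(self, *key: Hashable) -> object:
        # The tuple of arguments serves as the lookup key.
        if key not in self._shapes:
            self._shapes[key] = self._build_shape(*key)
            self.builds += 1
        return self._shapes[key]


def build_line(start: Tuple[float, float], end: Tuple[float, float], width: float) -> object:
    # Stand-in for creating a vertex buffer; a real builder would upload data.
    return ("line", start, end, width)


cache = ShapeCache(build_line)
for _frame in range(100):
    cache.get((0, 0), (10, 10), 2.0)  # same two segments every frame
    cache.get((10, 10), (20, 0), 2.0)

print(cache.builds)  # 2
```

One trade-off of this pattern is that the table grows with every distinct shape ever drawn, so a program whose geometry changes constantly would need some way to evict old entries.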

Benchmarking & Advising

The Arcade repository included some benchmarking scripts to test the performance of the library. I modified and ran them multiple times on my laptop, comparing the performance of Arcade with the performance of my fork, which I called ArcadePlus.
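
For readers who want to run a similar comparison, an FPS measurement can be sketched in a few lines. This is not one of Arcade's benchmarking scripts: `measure_fps` and the two stand-in frame functions are hypothetical, and the sketch simply times a callable with `time.perf_counter`.

```python
import time
from typing import Callable


def measure_fps(draw_frame: Callable[[], None], frames: int = 200) -> float:
    """Call draw_frame `frames` times and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        draw_frame()
    elapsed = time.perf_counter() - start
    return frames / elapsed


def slow_frame() -> None:
    # Stand-in for an expensive frame: redo all the work every time.
    sum([i * i for i in range(20_000)])


# Work done once, ahead of time, for the cheap frame below.
precomputed = sum([i * i for i in range(20_000)])


def fast_frame() -> None:
    # Stand-in for a cheap frame: reuse the precomputed result.
    _ = precomputed


print(f"slow: {measure_fps(slow_frame):.0f} FPS")
print(f"fast: {measure_fps(fast_frame):.0f} FPS")
```

The absolute numbers depend entirely on the machine, which is why it helps to run both versions several times on the same hardware and compare them against each other.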

[Image: Performance of Arcade 2.3 versus ArcadePlus]

It was a 1500% improvement! ArcadePlus was rendering polygons during a stress test at approximately 140-160 FPS compared to Arcade's 10 FPS.
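
The percentage follows from the frame rates quoted above. As a quick check, using only the 10 FPS baseline and the 140-160 FPS range from the stress test (the helper name `percent_improvement` is just for illustration):

```python
def percent_improvement(old_fps: float, new_fps: float) -> float:
    """Percentage increase from old_fps to new_fps."""
    return (new_fps - old_fps) / old_fps * 100


print(percent_improvement(10, 140))  # 1300.0
print(percent_improvement(10, 160))  # 1500.0
```

So 1500% corresponds to the top of the measured range, where the frame rate is 16 times the baseline.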

Soon after, I advised some of Arcade's open-source contributors on integrating my performance improvements back into the main Arcade repository for the next release (Arcade 2.4).

Since Arcade 2.4 was supposed to be more performant, I increased the number of polygons in the benchmarking stress test and got the following results:

[Image: Performance of Arcade 2.4 versus ArcadePlus]

With my changes successfully integrated into the main Arcade repository, Arcade had reached performance parity with my fork, so I deprecated ArcadePlus.

Looking Back

It's been 4 years since I contributed to Arcade, and I'm happy to see that it's still being actively developed and used by many people, now with over 1,300,000 downloads.

This was a lot more technically complex than anything I'd worked on previously, and I learned a lot about graphics rendering and object-oriented programming in Python.

I'm glad I took the initiative to investigate and improve the performance of Arcade, and I hope that my story inspires others to contribute to open-source software as well.