JIT options and visualization using Pandas

Author: Jørgen S. Dokken

In this chapter, we will explore how to optimize and inspect the integration kernels used in DOLFINx. As we have seen in the previous demos, DOLFINx uses the Unified Form Language (UFL) to describe variational problems.

These descriptions have to be translated into code for assembling the left- and right-hand sides of the discrete variational problem.

DOLFINx uses FFCx to generate efficient C code for assembling the element matrices. This C code is in turn compiled using CFFI, and we can specify a variety of compile options.

We start by specifying where to place the generated C files: a .cache folder inside the current directory, which we obtain using pathlib.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn
import time
import ufl

from ufl import TestFunction, TrialFunction, dx, inner
from dolfinx.mesh import create_unit_cube
from dolfinx.fem.petsc import assemble_matrix
from dolfinx.fem import FunctionSpace, form

from mpi4py import MPI
from pathlib import Path
from typing import Dict

cache_dir = f"{str(Path.cwd())}/.cache"
print(f"Directory to put C files in: {cache_dir}")
Directory to put C files in: /__w/dolfinx-tutorial/dolfinx-tutorial/chapter4/.cache
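The same path can also be built with the `/` operator that pathlib provides, which avoids manual string formatting. The following is a minimal sketch using only the standard library; the name `jit_cache` is introduced here for illustration and is not part of the tutorial code.

```python
from pathlib import Path

# Join path components with the `/` operator instead of an f-string.
cache_dir = Path.cwd() / ".cache"

# Convert to a plain string before storing it in an options dictionary,
# mirroring the string used in the tutorial code.
jit_cache = str(cache_dir)
print(f"Directory to put C files in: {jit_cache}")
```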

Next, we define a general function that assembles the mass matrix for a unit cube and returns the assembly time. Note that we use dolfinx.fem.form to compile the variational form. For codes using dolfinx.fem.petsc.LinearProblem, you can supply jit_options as a keyword argument.

def compile_form(space: str, degree: int, jit_options: Dict):
    """Assemble the mass matrix on a unit cube and return the assembly time in seconds."""
    N = 10
    mesh = create_unit_cube(MPI.COMM_WORLD, N, N, N)
    V = FunctionSpace(mesh, (space, degree))
    u = TrialFunction(V)
    v = TestFunction(V)
    a = inner(u, v) * dx
    a_compiled = form(a, jit_options=jit_options)
    start = time.perf_counter()
    assemble_matrix(a_compiled)
    end = time.perf_counter()
    return end - start
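A single timing such as the one above can be noisy, since other processes on the machine affect the wall-clock time. One common remedy is to repeat the call and keep the smallest measurement. The sketch below illustrates this with a hypothetical helper, `best_of`, and a stand-in workload; it is not part of the tutorial code, and in the setting above the timed callable would be something like `lambda: assemble_matrix(a_compiled)`.

```python
import time
from typing import Callable


def best_of(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the smallest wall-clock time (in seconds) over `repeats` calls of `func`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload (an assumption for this sketch), used so that the
# example runs without DOLFINx installed.
elapsed = best_of(lambda: sum(i * i for i in range(10_000)))
print(f"Best of 5 runs: {elapsed:.3e} s")
```

Taking the minimum rather than the mean discards runs that were slowed down by unrelated system activity.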

We start by considering the different levels of optimization the C compiler can apply to the generated code. A list of optimization options and explanations can be found here

optimization_options = ["-O1", "-O2", "-O3", "-Ofast"]

The next choice is whether or not to compile the code with -march=native. This option enables instructions specific to the local machine, and can give different results on different systems. More information can be found here

march_native = [True, False]

We choose a subset of finite element spaces, varying the order of the space to look at the effects it has on the assembly time with different compile options.

results = {"Space": [], "Degree": [], "Options": [], "Time": []}
for space in ["N1curl", "Lagrange", "RT"]:
    for degree in [1, 2, 3]:
        for native in march_native:
            for option in optimization_options:
                if native:
                    cffi_options = [option, "-march=native"]
                else:
                    cffi_options = [option]
                jit_options = {"cffi_extra_compile_args": cffi_options,
                               "cache_dir": cache_dir, "cffi_libraries": ["m"]}
                runtime = compile_form(space, degree, jit_options=jit_options)
                results["Space"].append(space)
                results["Degree"].append(str(degree))
                results["Options"].append("\n".join(cffi_options))
                results["Time"].append(runtime)
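The two innermost loops above enumerate every combination of optimization level and -march=native setting. As a sketch, the same set of option dictionaries can be produced with itertools.product. The helper `make_jit_options` and the placeholder cache path are assumptions made for this illustration; only the dictionary keys are taken from the code above.

```python
from itertools import product

optimization_options = ["-O1", "-O2", "-O3", "-Ofast"]
march_native = [True, False]
cache_dir = ".cache"  # placeholder path for this sketch


def make_jit_options(option: str, native: bool, cache_dir: str) -> dict:
    """Hypothetical helper: build one jit_options dictionary."""
    args = [option, "-march=native"] if native else [option]
    return {"cffi_extra_compile_args": args,
            "cache_dir": cache_dir, "cffi_libraries": ["m"]}


# Same ordering as the nested loops: all levels with -march=native first.
configs = [make_jit_options(option, native, cache_dir)
           for native, option in product(march_native, optimization_options)]

# Labels in the same format as the "Options" column of the results.
labels = ["\n".join(c["cffi_extra_compile_args"]) for c in configs]
print(len(configs), "configurations")
```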

We have now stored all the results in a dictionary. To visualize them, we use pandas and its DataFrame class. We can inspect the data in a Jupyter notebook as follows

results_df = pd.DataFrame.from_dict(results)
results_df
     Space   Degree  Options                 Time
0    N1curl  1       -O1\n-march=native      0.012112
1    N1curl  1       -O2\n-march=native      0.010686
2    N1curl  1       -O3\n-march=native      0.010467
3    N1curl  1       -Ofast\n-march=native   0.010452
4    N1curl  1       -O1                     0.010915
...  ...     ...     ...                     ...
67   RT      3       -Ofast\n-march=native   0.234924
68   RT      3       -O1                     0.361228
69   RT      3       -O2                     0.347010
70   RT      3       -O3                     0.284614
71   RT      3       -Ofast                  0.281989

72 rows × 4 columns
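Besides plotting, a DataFrame of this shape can be summarized directly with groupby. The sketch below is an illustration, not part of the tutorial: it rebuilds only the first five rows shown in the table above (N1curl, degree 1) under the name `sample`, picks the fastest configuration per element, and adds a hypothetical "Speedup" column relative to the slowest configuration in each group.

```python
import pandas as pd

# First five rows of the table above (N1curl, degree 1); times in seconds.
sample = pd.DataFrame({
    "Space": ["N1curl"] * 5,
    "Degree": ["1"] * 5,
    "Options": ["-O1\n-march=native", "-O2\n-march=native",
                "-O3\n-march=native", "-Ofast\n-march=native", "-O1"],
    "Time": [0.012112, 0.010686, 0.010467, 0.010452, 0.010915],
})

# Index label of the smallest time within each (Space, Degree) group.
fastest_idx = sample.groupby(["Space", "Degree"])["Time"].idxmin()
fastest = sample.loc[fastest_idx]

# Speedup of each configuration relative to the slowest one in its group.
slowest = sample.groupby(["Space", "Degree"])["Time"].transform("max")
sample["Speedup"] = slowest / sample["Time"]

print(fastest[["Space", "Degree", "Options", "Time"]])
print(sample[["Options", "Speedup"]])
```

Applied to the full results_df, the same two groupby calls would give one fastest row per element and a speedup value for each of the 72 measurements.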

We can now make a plot for each element type to see the variation given the different compile options. We create a new column that combines the element type and degree.

seaborn.set(style="ticks")
seaborn.set(font_scale=1.2)
seaborn.set_style("darkgrid")
results_df["Element"] = results_df["Space"] + " " + results_df["Degree"]
elements = sorted(set(results_df["Element"]))
for element in elements:
    df_e = results_df[results_df["Element"] == element]
    g = seaborn.catplot(x="Options", y="Time", kind="bar", data=df_e, col="Element")
    g.fig.set_size_inches(16, 4)
[Figures: nine bar plots of Time against Options, one for each Element (N1curl, Lagrange and RT, each with degree 1, 2 and 3)]

We observe that the assembly time increases with the degree of the function space, and that the largest speedup comes from using “-O3” or “-Ofast” combined with “-march=native”.