Example Kernighan-Lin inference

The algorithm for bipartite community detection is independent of the graph partitioning algorithm used. In principle, one could switch to a different partitioning engine and infer the number of communities in the same manner. Here, we illustrate the use of the Kernighan-Lin algorithm within the heuristic.

If one wants to do the graph partitioning using Kernighan-Lin, one instantiates a different engine class:

from engines.kl import *
kl = KL(f_engine="engines/bipartiteSBM-KL/biSBM",
        n_sweeps=2,                                        # note that this will generate <n_sweeps> output sub-folders in <f_kl_output>
        kl_edgelist_delimiter="\t",                        # [KL] because the KL code accepts 1-indexed nodes by default, we use the delimiter to transform our 0-indexed input
        kl_steps=4,                                        # [KL] the number of random initializations (see the README_cplusplus.txt file)
        kl_itertimes=1,                                    # [KL] the number of KL runs (within each <outputFOLDER>) used to return an optimal result
        f_kl_output="engines/bipartiteSBM-KL/f_kl_output"  # [KL] path to the KL output dir; recommended to be in the same folder as the binary
)
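The re-indexing that kl_edgelist_delimiter works around amounts to shifting every node id up by one before the edge list reaches the KL binary. A stand-alone sketch of that transformation (`shift_edgelist` is a hypothetical helper, not a library function):

```python
def shift_edgelist(lines, delimiter="\t"):
    """Shift a 0-indexed edge list (one 'u<delimiter>v' pair per line)
    to the 1-indexed form the KL binary expects."""
    shifted = []
    for line in lines:
        u, v = line.strip().split(delimiter)
        shifted.append(f"{int(u) + 1}{delimiter}{int(v) + 1}")
    return shifted

pairs = shift_edgelist(["0\t1", "2\t0"])  # -> ["1\t2", "3\t1"]
```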

It performs <n_sweeps> * <kl_itertimes> KL runs before returning the membership assignment with the highest likelihood. Note that since all outputs appear in the single specified f_kl_output and there may be n_sweeps (parallel) runs, the sub-folders in f_kl_output are hashed to keep the output data organized.
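Selecting among those runs is simply an argmax over the per-run likelihoods. A toy stand-in with made-up numbers (`pick_best_run` and the run data are illustrative, not part of the library):

```python
def pick_best_run(runs):
    """Each run is a (log_likelihood, membership) pair; keep the likeliest."""
    return max(runs, key=lambda run: run[0])

# Three fictitious KL runs on a 3-node toy graph.
runs = [
    (-1520.3, [0, 0, 1]),
    (-1498.7, [0, 1, 1]),
    (-1505.2, [0, 1, 0]),
]
best_ll, best_mb = pick_best_run(runs)  # the run with log-likelihood -1498.7
```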

Similarly, one can generate the string for command-line computation. This time we test the algorithm on an example graph in dataset/test/bisbm-n_1000-ka_4-kb_6-r-1.0-Ka_30-Ir_1.75.gt.edgelist. This is a synthetic network with \(K_a=4\) and \(K_b=6\), generated by the bipartite SBM.

kl.prepare_engine("dataset/test/bisbm-n_1000-ka_4-kb_6-r-1.0-Ka_30-Ir_1.75.gt.edgelist", 500, 500, 6, 7, delimiter="\t")

The remaining code is similar.

edgelist = get_edgelist("dataset/test/bisbm-n_1000-ka_4-kb_6-r-1.0-Ka_30-Ir_1.75.gt.edgelist", "\t")
types = kl.gen_types(500, 500)

oks = OptimalKs(kl, edgelist, types)
oks.set_params(init_ka=10, init_kb=10, i_0=0.1)
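The types vector just records which side of the bipartition each node sits on. Assuming gen_types labels the first 500 nodes as one type and the rest as the other (the exact return format of the library call is an assumption here), a stand-in would be:

```python
def gen_types_standin(n_type_a, n_type_b):
    """Hypothetical stand-in for kl.gen_types: label the first n_type_a
    nodes as type 1 and the remaining n_type_b nodes as type 2."""
    return [1] * n_type_a + [2] * n_type_b

types = gen_types_standin(500, 500)  # 1000 node-type labels in total
```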

We will see that it correctly finds \(K_a=4\) and \(K_b=6\) as a result.

Try printing out the inferred labels,

print(oks.bookkeeping_mb["mcmc"][(4, 6)])

which may be relevant for other decision-making processes.

Note that Kernighan-Lin is generally slower than the MCMC algorithm when the number of communities is large.