biSBM package

Module contents

det_k_bisbm - Statistical inference of the bipartite stochastic block model

This module contains algorithms for the identification of large-scale network structure via the statistical inference of the bipartite stochastic block model.

Note

TODO.

class biSBM.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]

Bases: object

Base class for OptimalKs.

Parameters
engineengine (required)

The inference engine class.

edgelistiterable or numpy.ndarray, required

Edgelist (bipartite network) for model selection.

typesiterable or numpy.ndarray, required

Types of each node specifying the type membership.

verbosebool (optional, default: False)

Logging level used. If True, progress information will be shown.

default_argsbool (optional, default: True)

Arguments for initiating the heuristic. If False, we should use set_params(), set_adaptive_ratio(), set_k_th_neighbor_to_search(), and set_nm() to set the default parameters.

random_init_kbool (optional, default: False)
bipartite_priorbool (optional, default: True)

Whether to use the bipartite prior for edge counts for the algorithm. If False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used in graph_tool.inference.minimize.minimize_blockmodel_dl(), even if its state_args is customized to be bipartite.

tempdirstr (optional, default: None)

The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If tempdir is unset or None, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, as tempfile.gettempdir() dictates. We will pass the file to the inference engines.

compute_and_update(self, ka, kb, recompute=True)[source]

Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.

Parameters
kaint

Number of type-a communities that we want to partition.

kbint

Number of type-b communities that we want to partition.

recomputebool (optional, default: True)
compute_dl(self, ka, kb, recompute=False)[source]

Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.

Parameters
kaint

Number of type-a communities that we want to partition.

kbint

Number of type-b communities that we want to partition.

recomputebool (optional, default: False)

TODO.

Returns
dlfloat

The description length of the partition found.

e_rsnumpy.ndarray

the affinity matrix via the group membership vector found by the partitioning engine

mblist[int]

group membership vector calculated by the partitioning engine

get__q_cache(self)[source]
get_f_edgelist_name(self)[source]
minimize_bisbm_dl(self, bipartite_prior=True)[source]

Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters
bipartite_priorbool (optional, default True)
Returns
OptimalKs.bookkeeping_dlcollections.OrderedDict

References

yen-bipartite-2019

Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.

natural_merge(self)[source]

Phase 1 natural e_rs-block merge

set_adaptive_ratio(self, adaptive_ratio=0.95)[source]

Set the adaptive ratio (float between 0 to 1, defaults to 0.95).

set_c(self, c)[source]
set_nm(self, s=10)[source]

Set the \(n_m\) parameter (defaults to 10).

set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]

Set the parameters for the heuristic.

Parameters
init_kaint (required, default: 10)
init_kbint (required, default: 10)
i_0float (required, default: 0.005)

Notes

TODO.

Warning

If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.

summary(self, mode=None)[source]

Return a summary of the algorithmic outcome.

Returns
OptimalKs._summarydict

A summary of the algorithmic outcome with minimal description length (in nats).

summary_dl(self, ka, kb)[source]

biSBM.optimalks module

class biSBM.optimalks.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]

Bases: object

Base class for OptimalKs.

Parameters
engineengine (required)

The inference engine class.

edgelistiterable or numpy.ndarray, required

Edgelist (bipartite network) for model selection.

typesiterable or numpy.ndarray, required

Types of each node specifying the type membership.

verbosebool (optional, default: False)

Logging level used. If True, progress information will be shown.

default_argsbool (optional, default: True)

Arguments for initiating the heuristic. If False, we should use set_params(), set_adaptive_ratio(), set_k_th_neighbor_to_search(), and set_nm() to set the default parameters.

random_init_kbool (optional, default: False)
bipartite_priorbool (optional, default: True)

Whether to use the bipartite prior for edge counts for the algorithm. If False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used in graph_tool.inference.minimize.minimize_blockmodel_dl(), even if its state_args is customized to be bipartite.

tempdirstr (optional, default: None)

The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If tempdir is unset or None, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, as tempfile.gettempdir() dictates. We will pass the file to the inference engines.

compute_and_update(self, ka, kb, recompute=True)[source]

Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.

Parameters
kaint

Number of type-a communities that we want to partition.

kbint

Number of type-b communities that we want to partition.

recomputebool (optional, default: True)
compute_dl(self, ka, kb, recompute=False)[source]

Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.

Parameters
kaint

Number of type-a communities that we want to partition.

kbint

Number of type-b communities that we want to partition.

recomputebool (optional, default: False)

TODO.

Returns
dlfloat

The description length of the partition found.

e_rsnumpy.ndarray

the affinity matrix via the group membership vector found by the partitioning engine

mblist[int]

group membership vector calculated by the partitioning engine

get__q_cache(self)[source]
get_f_edgelist_name(self)[source]
minimize_bisbm_dl(self, bipartite_prior=True)[source]

Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters
bipartite_priorbool (optional, default True)
Returns
OptimalKs.bookkeeping_dlcollections.OrderedDict

References

yen-bipartite-2019

Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.

natural_merge(self)[source]

Phase 1 natural e_rs-block merge

set_adaptive_ratio(self, adaptive_ratio=0.95)[source]

Set the adaptive ratio (float between 0 to 1, defaults to 0.95).

set_c(self, c)[source]
set_nm(self, s=10)[source]

Set the \(n_m\) parameter (defaults to 10).

set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]

Set the parameters for the heuristic.

Parameters
init_kaint (required, default: 10)
init_kbint (required, default: 10)
i_0float (required, default: 0.005)

Notes

TODO.

Warning

If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.

summary(self, mode=None)[source]

Return a summary of the algorithmic outcome.

Returns
OptimalKs._summarydict

A summary of the algorithmic outcome with minimal description length (in nats).

summary_dl(self, ka, kb)[source]

biSBM.utils module

Utilities for network data manipulation and entropy computation.

biSBM.utils.accept_mb_merge(mb, mlist)[source]

Accept partition merge.

Parameters
mbiterable or numpy.ndarray

The partition to be merged.

mlistiterable or numpy.ndarray

The two block labels to be merged.

Returns
_mbnumpy.ndarray

The merged partition.

biSBM.utils.adjacency_entropy(edgelist, mb, exact=True, multigraph=True)[source]

Calculate the entropy (a.k.a. negative log-likelihood) associated with the current block partition. It does not include the model entropy.

Parameters
edgelistnumpy.ndarray
mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

exactbool
multigraphbool
Returns
entfloat

The description length (or entropy) in nat of the fitting.

biSBM.utils.assemble_e_rs_from_mb(edgelist, mb)[source]

Get \(e_{rs}\), i.e., the matrix of edge counts between blocks.

Parameters
edgelistnumpy.ndarray

List of edge tuples.

mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

Returns
e_rsnumpy.ndarray

Edge count matrix \(e_{rs}\).

biSBM.utils.assemble_edgelist_old2new(edgelist, old2new)[source]

Assemble the new edgelist array via an old2new mapping.

Parameters
edgelistnumpy.ndarray
old2newdict

Dictionary that maps the old node index to a new one.

Returns
elnumpy.ndarray

The new edgelist of a group of bi-cliques (directly pluggable to det_k_bisbm.OptimalKs)

biSBM.utils.assemble_eta_rk_from_edgelist_and_mb(edgelist, mb)[source]

Get \(\eta_{rk}\), or the number \(\eta_{rk}\) of nodes of degree \(k\) that belong to group \(r\).

Parameters
edgelistnumpy.ndarray
mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

Returns
eta_rknumpy.ndarray
biSBM.utils.assemble_mb_new2old(mb, new2old)[source]

Assemble the partition that corresponds to the old space of node indices.

Parameters
mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

new2olddict

Dictionary that maps the new node index to the old one; i.e., a reverse mapping of old2new.

Returns
old_mbnumpy.ndarray

The partition that corresponds to the old space of node indices.

biSBM.utils.assemble_n_k_from_edgelist(edgelist, mb)[source]

Get \(n_k\), i.e., the number \(n_k\) of nodes of degree \(k\).

Parameters
edgelistnumpy.ndarray

List of edge tuples.

mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

Returns
n_knumpy.ndarray

Array of the number of nodes of degree \(k\).

biSBM.utils.assemble_n_r_from_mb(mb)[source]

Get \(n_r\), i.e., the number of nodes in each group, from the partition \(b\).

Parameters
mbiterable or numpy.ndarray

Partition \(b\) of nodes into blocks.

Returns
n_rnumpy.ndarray
biSBM.utils.assemble_old2new_mapping(types)[source]

Create a mapping that map the old node-id’s to new ones, such that the types-array is sorted orderly.

Parameters
typesnumpy.ndarray
Returns
old2newdict

Dictionary that maps the old node index to a new one.

new2olddict

Dictionary that maps the new node index to the old one; i.e., a reverse mapping of old2new.

new_typesnumpy.ndarray

The new types-array, which is sorted orderly and directly applicable to det_k_bisbm.OptimalKs.

biSBM.utils.compute_profile_likelihood(edgelist, mb, ka=None, kb=None, k=None)[source]
Parameters
edgelistnumpy.ndarray
mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

kint

Total number of communities.

Returns
italic_ifloat
biSBM.utils.compute_profile_likelihood_from_e_rs(e_rs)[source]
Parameters
e_rsnumpy.ndarray
Returns
italic_ifloat
biSBM.utils.db_factorial_ln(val)[source]
biSBM.utils.degree_entropy(edgelist, mb, __q_cache=array([], shape=(1, 0), dtype=float64), degree_dl_kind='distributed', q_cache_max_e_r=10000)[source]

degree_entropy

Parameters
edgelistiterable or numpy.ndarray
mbiterable or numpy.ndarray

Partition \(b\) of nodes into blocks.

__q_cachenumpy.ndarray (required, default: np.array([], ndmin=2))
degree_dl_kind: ``str``
  1. degree_dl_kind == "uniform"

This corresponds to a non-informative prior, where the node degrees are sampled from an uniform distribution.

  1. degree_dl_kind == "distributed"

This option should be preferred in most cases.

Returns
entfloat

The entropy.

biSBM.utils.gen_bicliques_edgelist(b, num_nodes)[source]

Generate an array of edgelist and node-type mapping for a group of bi-cliques.

Parameters
bint

Number of bi-cliques.

num_nodesint

Number of nodes (size) for each bi-clique.

Returns
elnumpy.ndarray

The edgelist of a group of bi-cliques.

typesnumpy.ndarray

The types-array that maps each node id to a bipartite type (0 or 1).

biSBM.utils.gen_e_rs(b, n_edges, p=0)[source]
Parameters
bint

Number of communities within each type. (suppose Ka = Kb)

n_edgesint

Number of edges planted in the system.

pfloat

Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).

Returns
e_rsnumpy.ndarray

Edge counts matrix.

biSBM.utils.gen_e_rs_hard(ka, kb, n_edges, p=0)[source]
Parameters
kaint

Mumber of communities within type-a.

kbint

Number of communities within type-b.

n_edgesint

Number of edges planted in the system.

pfloat

Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).

Returns
e_rsnumpy.ndarray

Edge counts matrix.

biSBM.utils.gen_e_rs_harder(ka, kb, n_edges, samples=1, top_k=1)[source]
Parameters
kaint

Number of communities within type-a.

kbint

Number of communities within type-b.

n_edgesint

Number of edges planted in the system.

samplesint

Number of random draws made on \(e_{rs}\).

top_kint

Number of samples selected. These are top-k samples with higher profile likelihood.

Returns
e_rsnumpy.ndarray or list[numpy.ndarray] (when top_k > 1)

Edge counts matrix.

biSBM.utils.gen_equal_bipartite_partition(na, nb, ka, kb)[source]
Parameters
naint

Number of nodes in type-a.

nbint

Number of nodes in type-b.

kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

Returns
nnumpy.ndarray
biSBM.utils.gen_equal_partition(n, total)[source]

gen_equal_partition

Parameters
nint
totalint
Returns
n_blockslist
biSBM.utils.gen_unequal_partition(n, total, avg_deg, alpha)[source]
Parameters
nint

Number of communities.

totalint

Number of nodes.

avg_degfloat

Average degree of the network.

alphafloat

The parameter of the Dirichlet distribution (the smaller the unevener the distribution is). We usually use alpha=1 to generate a case and filter it out if the lowest size of a community is less than (2 * total / avg_deg) ** 0.5 (resolution limit).

Returns
alist

The sizes of communities to sum to total.

_float

The ratio of largest-sized community to the lowest-sized one.

biSBM.utils.get_desc_len_from_data(na, nb, n_edges, ka, kb, edgelist, mb, diff=False, nr=None, allow_empty=False, degree_dl_kind='distributed', q_cache=array([], shape=(1, 0), dtype=float64), is_bipartite=True)[source]

Description length difference to a randomized instance

Parameters
naint

Number of nodes in type-a.

nbint

Number of nodes in type-b.

n_edgesint

Number of edges.

kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

edgelistnumpy.ndarray

Edgelist in Python list structure.

mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

diffbool

When diff == True, the returned description value will be the difference to that of a random bipartite network. Otherwise, it will return the entropy (a.k.a. negative log-likelihood) associated with the current block partition.

allow_emptybool
nrnumpy.ndarray
degree_dl_kindstr (optional, default: “distributed”)
  1. degree_dl_kind == “uniform”

  2. degree_dl_kind == “distributed” (default)

  3. degree_dl_kind == “entropy”

is_bipartite: `bool` (default: `”True”`)
Returns
desc_len_bfloat

Difference of the description length to the bipartite ER network, per edge.

biSBM.utils.get_desc_len_from_data_uni(n, n_edges, k, edgelist, mb)[source]

Description length difference to a randomized instance, via PRL 110, 148701 (2013).

Parameters
nint

Number of nodes.

n_edgesint

Number of edges.

kint

Number of communities.

edgelistnumpy.ndarray

A list of edges.

mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

Returns
desc_len_bfloat

Difference of the description length to the ER network, per edge.

biSBM.utils.get_flat_entropies(state)[source]
Parameters
stategraph_tool.inference.blockmodel.BlockState

The stochastic block model state of a given graph, as defined in Graph-tool.

Returns
dldict

The entropic report of the partition.

biSBM.utils.get_nested_entropies(state)[source]
Parameters
stategraph_tool.inference.nested_blockmodel.NestedBlockState
Returns
dldict
biSBM.utils.loky_executor(max_workers, timeout, func, feeds)[source]
biSBM.utils.model_entropy(e, ka=None, kb=None, na=None, nb=None, nr=None, allow_empty=False, is_bipartite=True)[source]

Computes the amount of information necessary for the parameters of the (bipartite) blockmodel ensemble, for ka type-a blocks, kb type-b blocks, na type-a vertices, nb vertices, e edges, and either bipartite or general as a prior. This includes the entropy from modeling the edge counts and the partition.

Note that if we know a priori that the network is bipartite, we can further compress the model.

Parameters
eint

Number of edges.

kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

naint

Number of vertices in type-a.

nbint

Number of vertices in type-b.

nrnumpy.ndarray

Vertex property array of the block graph which contains the block sizes.

allow_emptybool (optional, default: False)

If True, partition description length computed will allow for empty groups.

is_bipartitebool (optional, default: True)

If False, edge counts description length computed will assume a purely flat \(e_{rs}\).

Returns
dlfloat

The description length (or entropy) in nat of the model.

References

peixoto-parsimonious-2013

Tiago P. Peixoto, “Parsimonious module inference in large networks”, Phys. Rev. Lett. 110, 148701 (2013), DOI: 10.1103/PhysRevLett.110.148701 [sci-hub], arXiv: 1212.4794.

peixoto-nonparametric-2017

Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703

biSBM.utils.partition_entropy(ka=None, kb=None, k=None, na=None, nb=None, n=None, nr=None, allow_empty=False)[source]

Compute the partition entropy, P(b), for the current partition. It has several variations depending on the priors used. In the crudest way (compute_profile_likelihood_from_e_rsnr == None), we formulate P(b) = P(b | B) * P(B). Or, by a two-level Bayesian hierarchy, we can do P(b) = P(b | n) * P(n | B) * P(B).

Parameters
kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

kint

Number of communities in the graph.

naint

Number of vertices in type-a.

nbint

Number of vertices in type-b.

nint

Number of vertices in the graph.

nrnumpy.ndarray

Vertex property array of the block graph which contains the block sizes.

allow_emptybool (optional, default: False)

If True, partition description length computed will allow for empty groups.

Returns
entfloat

The description length (or entropy) in nat of the partition.

biSBM.utils.virtual_moves_ds(ori_e_rs, mlists, ka)[source]

virtual_moves_ds

Parameters
ori_e_rsnumpy.ndarray
mlistsset
kaint

Number of communities in type-a.

Returns
dSfloat
_mlistnumpy.ndarray

biSBM.painter module

plots

biSBM.painter.paint_block_mat(mb, edgelist, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]
biSBM.painter.paint_block_mat_from_e_rs(e_rs, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]
biSBM.painter.paint_dl_trace(oks, output=None, figsize=(4, 2), dpi=200)[source]
biSBM.painter.paint_landscape(oks, max_ka, max_kb, output=None, dpi=200)[source]
biSBM.painter.paint_mds(oks, figsize=(20, 20))[source]
biSBM.painter.paint_similarity_trace(b, oks, output=None, figsize=(3, 3), dpi=200)[source]
biSBM.painter.paint_sorted_adj_mat(mb, edgelist, output=None, figsize=(10, 10), dpi=300, invert=True)[source]
biSBM.painter.paint_trace(oks, output=None, figsize=(4, 4), dpi=200)[source]

biSBM.int_part module

The int_part module computes the number of restricted integer partitions \(q(m, n)\) of the integer \(m\) into at most \(n\) parts.

Specifically, it counts the number of different degree counts with the sum of degrees being exactly \(m\) and that have at most \(n\) nonzero counts. Since the quantity can only computed via a recurrence, we pre-compute the values to fill up a memoization table when the number of edges and nodes is not too large; otherwise, we use accurate asymptotic expressions to efficiently compute the values for large arguments. [Rcd2a8c25c865-peixoto-nonparametric-2017]

References

Rcd2a8c25c865-peixoto-nonparametric-2017

Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703

biSBM.int_part.get_v(u, epsilon=1e-08)[source]
Parameters
uint
epsilonfloat
biSBM.int_part.init_q_cache(n_max, __q_cache=array([], shape=(1, 0), dtype=float64))[source]

Initiate the look-up table for \(q(m, n)\).

Parameters
n_maxint
__q_cachenumpy.ndarray (required, default: np.array([], ndmin=2))
biSBM.int_part.lbinom(n, k)[source]

Return log of binom(n, k).

biSBM.int_part.log_q(n, k, __q_cache)[source]
Parameters
nint
kint
__q_cachenumpy.ndarray
biSBM.int_part.log_q_approx(n, k)[source]
Parameters
nint
kint
biSBM.int_part.log_q_approx_small(n, k)[source]
Parameters
nint
kint
biSBM.int_part.log_sum(a, b)[source]
Parameters
aint
bint

biSBM.ioutils module

i/o utilities

biSBM.ioutils.get_edgelist(f_edgelist, delimiter=None)[source]

This function returns an edgelist list from a file.

Parameters
f_edgeliststr

The path to the edgelist text file.

delimiterstr

The delimiter that separate the edges.

Returns
edgelistnumpy.ndarray

The numpy list of tupled edges.

biSBM.ioutils.get_types(f_types)[source]

This function returns an edgelist list from a file.

Parameters
f_edgeliststr

The path to the types file

Returns
edgelistlist

The list of types of each node.

Examples

>>> from biSBM.ioutils import get_types
>>> edgelist = get_types(f_types)
>>> print(edgelist)
biSBM.ioutils.save_mb_to_file(path, mb)[source]

Save the group membership list to a file path.

Parameters
pathstr, required

File path for the list to save to.

mblist[int], required

Group membership list.