biSBM package¶

Module contents¶

`det_k_bisbm` - Statistical inference of the bipartite stochastic block model¶

This module contains algorithms for the identification of large-scale network structure via the statistical inference of the bipartite stochastic block model.

Note

TODO.

class biSBM.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶

Bases: object

Base class for OptimalKs.

Parameters

engineengine (required): The inference engine class.
edgelistiterable or numpy.ndarray, required: Edgelist (bipartite network) for model selection.
typesiterable or numpy.ndarray, required: Types of each node specifying the type membership.
verbosebool (optional, default: False): Logging level used. If True, progress information will be shown.
default_argsbool (optional, default: True): Arguments for initiating the heuristic. If False, we should use set_params(), set_adaptive_ratio(), set_k_th_neighbor_to_search(), and set_nm() to set the default parameters.
random_init_kbool (optional, default: False)
bipartite_priorbool (optional, default: True): Whether to use the bipartite prior for edge counts for the algorithm. If False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used in graph_tool.inference.minimize.minimize_blockmodel_dl(), even if its state_args is customized to be bipartite.
tempdirstr (optional, default: None): The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If tempdir is unset or None, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, as tempfile.gettempdir() dictates. We will pass the file to the inference engines.

compute_and_update(self, ka, kb, recompute=True)[source]¶

Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.

Parameters

kaint: Number of type-a communities that we want to partition.
kbint: Number of type-b communities that we want to partition.
recomputebool (optional, default: True)

compute_dl(self, ka, kb, recompute=False)[source]¶

Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.

Parameters

kaint: Number of type-a communities that we want to partition.
kbint: Number of type-b communities that we want to partition.
recomputebool (optional, default: False): TODO.

Returns

dlfloat: The description length of the partition found.
e_rsnumpy.ndarray: the affinity matrix via the group membership vector found by the partitioning engine
mblist[int]: group membership vector calculated by the partitioning engine

get__q_cache(self)[source]¶

get_f_edgelist_name(self)[source]¶

minimize_bisbm_dl(self, bipartite_prior=True)[source]¶

Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters

bipartite_priorbool (optional, default True)

Returns

OptimalKs.bookkeeping_dlcollections.OrderedDict

References

yen-bipartite-2019: Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.

natural_merge(self)[source]¶: Phase 1 natural e_rs-block merge

set_adaptive_ratio(self, adaptive_ratio=0.95)[source]¶: Set the adaptive ratio (float between 0 to 1, defaults to 0.95).

set_c(self, c)[source]¶

set_k_th_neighbor_to_search(self, k)[source]¶

set_nm(self, s=10)[source]¶: Set the \(n_m\) parameter (defaults to 10).

set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶

Set the parameters for the heuristic.

Parameters

init_kaint (required, default: 10)
init_kbint (required, default: 10)
i_0float (required, default: 0.005)

Notes

TODO.

Warning

If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.

summary(self, mode=None)[source]¶

Return a summary of the algorithmic outcome.

Returns

OptimalKs._summarydict: A summary of the algorithmic outcome with minimal description length (in nats).

summary_dl(self, ka, kb)[source]¶

biSBM.optimalks module¶

class biSBM.optimalks.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶

Bases: object

Base class for OptimalKs.

Parameters

engineengine (required): The inference engine class.
edgelistiterable or numpy.ndarray, required: Edgelist (bipartite network) for model selection.
typesiterable or numpy.ndarray, required: Types of each node specifying the type membership.
verbosebool (optional, default: False): Logging level used. If True, progress information will be shown.
default_argsbool (optional, default: True): Arguments for initiating the heuristic. If False, we should use set_params(), set_adaptive_ratio(), set_k_th_neighbor_to_search(), and set_nm() to set the default parameters.
random_init_kbool (optional, default: False)
bipartite_priorbool (optional, default: True): Whether to use the bipartite prior for edge counts for the algorithm. If False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used in graph_tool.inference.minimize.minimize_blockmodel_dl(), even if its state_args is customized to be bipartite.
tempdirstr (optional, default: None): The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If tempdir is unset or None, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, as tempfile.gettempdir() dictates. We will pass the file to the inference engines.

compute_and_update(self, ka, kb, recompute=True)[source]¶

Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.

Parameters

kaint: Number of type-a communities that we want to partition.
kbint: Number of type-b communities that we want to partition.
recomputebool (optional, default: True)

compute_dl(self, ka, kb, recompute=False)[source]¶

Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.

Parameters

kaint: Number of type-a communities that we want to partition.
kbint: Number of type-b communities that we want to partition.
recomputebool (optional, default: False): TODO.

Returns

dlfloat: The description length of the partition found.
e_rsnumpy.ndarray: the affinity matrix via the group membership vector found by the partitioning engine
mblist[int]: group membership vector calculated by the partitioning engine

get__q_cache(self)[source]¶

get_f_edgelist_name(self)[source]¶

minimize_bisbm_dl(self, bipartite_prior=True)[source]¶

Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.

Parameters

bipartite_priorbool (optional, default True)

Returns

OptimalKs.bookkeeping_dlcollections.OrderedDict

References

yen-bipartite-2019: Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.

natural_merge(self)[source]¶: Phase 1 natural e_rs-block merge

set_adaptive_ratio(self, adaptive_ratio=0.95)[source]¶: Set the adaptive ratio (float between 0 to 1, defaults to 0.95).

set_c(self, c)[source]¶

set_k_th_neighbor_to_search(self, k)[source]¶

set_nm(self, s=10)[source]¶: Set the \(n_m\) parameter (defaults to 10).

set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶

Set the parameters for the heuristic.

Parameters

init_kaint (required, default: 10)
init_kbint (required, default: 10)
i_0float (required, default: 0.005)

Notes

TODO.

Warning

If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.

summary(self, mode=None)[source]¶

Return a summary of the algorithmic outcome.

Returns

OptimalKs._summarydict: A summary of the algorithmic outcome with minimal description length (in nats).

summary_dl(self, ka, kb)[source]¶

biSBM.utils module¶

Utilities for network data manipulation and entropy computation.

biSBM.utils.accept_mb_merge(mb, mlist)[source]¶

Accept partition merge.

Parameters

mbiterable or numpy.ndarray: The partition to be merged.
mlistiterable or numpy.ndarray: The two block labels to be merged.

Returns

_mbnumpy.ndarray: The merged partition.

biSBM.utils.adjacency_entropy(edgelist, mb, exact=True, multigraph=True)[source]¶

Calculate the entropy (a.k.a. negative log-likelihood) associated with the current block partition. It does not include the model entropy.

Parameters

edgelistnumpy.ndarray
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.
exactbool
multigraphbool

Returns

entfloat: The description length (or entropy) in nat of the fitting.

biSBM.utils.assemble_e_rs_from_mb(edgelist, mb)[source]¶

Get \(e_{rs}\), i.e., the matrix of edge counts between blocks.

Parameters

edgelistnumpy.ndarray: List of edge tuples.
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.

Returns

e_rsnumpy.ndarray: Edge count matrix \(e_{rs}\).

biSBM.utils.assemble_edgelist_old2new(edgelist, old2new)[source]¶

Assemble the new edgelist array via an old2new mapping.

Parameters

edgelistnumpy.ndarray
old2newdict: Dictionary that maps the old node index to a new one.

Returns

elnumpy.ndarray: The new edgelist of a group of bi-cliques (directly pluggable to det_k_bisbm.OptimalKs)

biSBM.utils.assemble_eta_rk_from_edgelist_and_mb(edgelist, mb)[source]¶

Get \(\eta_{rk}\), or the number \(\eta_{rk}\) of nodes of degree \(k\) that belong to group \(r\).

Parameters

edgelistnumpy.ndarray
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.

Returns

eta_rknumpy.ndarray

biSBM.utils.assemble_mb_new2old(mb, new2old)[source]¶

Assemble the partition that corresponds to the old space of node indices.

Parameters

mbnumpy.ndarray: Partition \(b\) of nodes into blocks.
new2olddict: Dictionary that maps the new node index to the old one; i.e., a reverse mapping of old2new.

Returns

old_mbnumpy.ndarray: The partition that corresponds to the old space of node indices.

biSBM.utils.assemble_n_k_from_edgelist(edgelist, mb)[source]¶

Get \(n_k\), i.e., the number \(n_k\) of nodes of degree \(k\).

Parameters

edgelistnumpy.ndarray: List of edge tuples.
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.

Returns

n_knumpy.ndarray: Array of the number of nodes of degree \(k\).

biSBM.utils.assemble_n_r_from_mb(mb)[source]¶

Get \(n_r\), i.e., the number of nodes in each group, from the partition \(b\).

Parameters

mbiterable or numpy.ndarray: Partition \(b\) of nodes into blocks.

Returns

n_rnumpy.ndarray

biSBM.utils.assemble_old2new_mapping(types)[source]¶

Create a mapping that map the old node-id’s to new ones, such that the types-array is sorted orderly.

Parameters

typesnumpy.ndarray

Returns

old2newdict: Dictionary that maps the old node index to a new one.
new2olddict: Dictionary that maps the new node index to the old one; i.e., a reverse mapping of old2new.
new_typesnumpy.ndarray: The new types-array, which is sorted orderly and directly applicable to det_k_bisbm.OptimalKs.

biSBM.utils.compute_profile_likelihood(edgelist, mb, ka=None, kb=None, k=None)[source]¶

Parameters

edgelistnumpy.ndarray
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.
kaint: Number of communities in type-a.
kbint: Number of communities in type-b.
kint: Total number of communities.

Returns

italic_ifloat

biSBM.utils.compute_profile_likelihood_from_e_rs(e_rs)[source]¶

Parameters

e_rsnumpy.ndarray

Returns

italic_ifloat

biSBM.utils.db_factorial_ln(val)[source]¶

biSBM.utils.degree_entropy(edgelist, mb, __q_cache=array([], shape=(1, 0), dtype=float64), degree_dl_kind='distributed', q_cache_max_e_r=10000)[source]¶

degree_entropy

Parameters

edgelistiterable or numpy.ndarray

mbiterable or numpy.ndarray

Partition \(b\) of nodes into blocks.

__q_cachenumpy.ndarray (required, default: np.array([], ndmin=2))

degree_dl_kind: ``str``

degree_dl_kind == "uniform"

This corresponds to a non-informative prior, where the node degrees are sampled from an uniform distribution.

degree_dl_kind == "distributed"

This option should be preferred in most cases.

Returns

entfloat: The entropy.

biSBM.utils.gen_bicliques_edgelist(b, num_nodes)[source]¶

Generate an array of edgelist and node-type mapping for a group of bi-cliques.

Parameters

bint: Number of bi-cliques.
num_nodesint: Number of nodes (size) for each bi-clique.

Returns

elnumpy.ndarray: The edgelist of a group of bi-cliques.
typesnumpy.ndarray: The types-array that maps each node id to a bipartite type (0 or 1).

biSBM.utils.gen_e_rs(b, n_edges, p=0)[source]¶

Parameters

bint: Number of communities within each type. (suppose Ka = Kb)
n_edgesint: Number of edges planted in the system.
pfloat: Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).

Returns

e_rsnumpy.ndarray: Edge counts matrix.

biSBM.utils.gen_e_rs_hard(ka, kb, n_edges, p=0)[source]¶

Parameters

kaint: Mumber of communities within type-a.
kbint: Number of communities within type-b.
n_edgesint: Number of edges planted in the system.
pfloat: Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).

Returns

e_rsnumpy.ndarray: Edge counts matrix.

biSBM.utils.gen_e_rs_harder(ka, kb, n_edges, samples=1, top_k=1)[source]¶

Parameters

kaint: Number of communities within type-a.
kbint: Number of communities within type-b.
n_edgesint: Number of edges planted in the system.
samplesint: Number of random draws made on \(e_{rs}\).
top_kint: Number of samples selected. These are top-k samples with higher profile likelihood.

Returns

e_rsnumpy.ndarray or list[numpy.ndarray] (when top_k > 1): Edge counts matrix.

biSBM.utils.gen_equal_bipartite_partition(na, nb, ka, kb)[source]¶

Parameters

naint: Number of nodes in type-a.
nbint: Number of nodes in type-b.
kaint: Number of communities in type-a.
kbint: Number of communities in type-b.

Returns

nnumpy.ndarray

biSBM.utils.gen_equal_partition(n, total)[source]¶

gen_equal_partition

Parameters

nint
totalint

Returns

n_blockslist

biSBM.utils.gen_unequal_partition(n, total, avg_deg, alpha)[source]¶

Parameters

nint: Number of communities.
totalint: Number of nodes.
avg_degfloat: Average degree of the network.
alphafloat: The parameter of the Dirichlet distribution (the smaller the unevener the distribution is). We usually use alpha=1 to generate a case and filter it out if the lowest size of a community is less than (2 * total / avg_deg) ** 0.5 (resolution limit).

Returns

alist: The sizes of communities to sum to total.
_float: The ratio of largest-sized community to the lowest-sized one.

biSBM.utils.get_desc_len_from_data(na, nb, n_edges, ka, kb, edgelist, mb, diff=False, nr=None, allow_empty=False, degree_dl_kind='distributed', q_cache=array([], shape=(1, 0), dtype=float64), is_bipartite=True)[source]¶

Description length difference to a randomized instance

Parameters

naint

Number of nodes in type-a.

nbint

Number of nodes in type-b.

n_edgesint

Number of edges.

kaint

Number of communities in type-a.

kbint

Number of communities in type-b.

edgelistnumpy.ndarray

Edgelist in Python list structure.

mbnumpy.ndarray

Partition \(b\) of nodes into blocks.

diffbool

When diff == True, the returned description value will be the difference to that of a random bipartite network. Otherwise, it will return the entropy (a.k.a. negative log-likelihood) associated with the current block partition.

allow_emptybool

nrnumpy.ndarray

degree_dl_kindstr (optional, default: “distributed”)

degree_dl_kind == “uniform”
degree_dl_kind == “distributed” (default)
degree_dl_kind == “entropy”

is_bipartite: `bool` (default: `”True”`)

Returns

desc_len_bfloat: Difference of the description length to the bipartite ER network, per edge.

biSBM.utils.get_desc_len_from_data_uni(n, n_edges, k, edgelist, mb)[source]¶

Description length difference to a randomized instance, via PRL 110, 148701 (2013).

Parameters

nint: Number of nodes.
n_edgesint: Number of edges.
kint: Number of communities.
edgelistnumpy.ndarray: A list of edges.
mbnumpy.ndarray: Partition \(b\) of nodes into blocks.

Returns

desc_len_bfloat: Difference of the description length to the ER network, per edge.

biSBM.utils.get_flat_entropies(state)[source]¶

Parameters

stategraph_tool.inference.blockmodel.BlockState: The stochastic block model state of a given graph, as defined in Graph-tool.

Returns

dldict: The entropic report of the partition.

biSBM.utils.get_nested_entropies(state)[source]¶

Parameters

stategraph_tool.inference.nested_blockmodel.NestedBlockState

Returns

dldict

biSBM.utils.loky_executor(max_workers, timeout, func, feeds)[source]¶

biSBM.utils.model_entropy(e, ka=None, kb=None, na=None, nb=None, nr=None, allow_empty=False, is_bipartite=True)[source]¶

Computes the amount of information necessary for the parameters of the (bipartite) blockmodel ensemble, for ka type-a blocks, kb type-b blocks, na type-a vertices, nb vertices, e edges, and either bipartite or general as a prior. This includes the entropy from modeling the edge counts and the partition.

Note that if we know a priori that the network is bipartite, we can further compress the model.

Parameters

eint: Number of edges.
kaint: Number of communities in type-a.
kbint: Number of communities in type-b.
naint: Number of vertices in type-a.
nbint: Number of vertices in type-b.
nrnumpy.ndarray: Vertex property array of the block graph which contains the block sizes.
allow_emptybool (optional, default: False): If True, partition description length computed will allow for empty groups.
is_bipartitebool (optional, default: True): If False, edge counts description length computed will assume a purely flat \(e_{rs}\).

Returns

dlfloat: The description length (or entropy) in nat of the model.

References

peixoto-parsimonious-2013: Tiago P. Peixoto, “Parsimonious module inference in large networks”, Phys. Rev. Lett. 110, 148701 (2013), DOI: 10.1103/PhysRevLett.110.148701 [sci-hub], arXiv: 1212.4794.
peixoto-nonparametric-2017: Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703

biSBM.utils.partition_entropy(ka=None, kb=None, k=None, na=None, nb=None, n=None, nr=None, allow_empty=False)[source]¶

Compute the partition entropy, P(b), for the current partition. It has several variations depending on the priors used. In the crudest way (compute_profile_likelihood_from_e_rsnr == None), we formulate P(b) = P(b | B) * P(B). Or, by a two-level Bayesian hierarchy, we can do P(b) = P(b | n) * P(n | B) * P(B).

Parameters

kaint: Number of communities in type-a.
kbint: Number of communities in type-b.
kint: Number of communities in the graph.
naint: Number of vertices in type-a.
nbint: Number of vertices in type-b.
nint: Number of vertices in the graph.
nrnumpy.ndarray: Vertex property array of the block graph which contains the block sizes.
allow_emptybool (optional, default: False): If True, partition description length computed will allow for empty groups.

Returns

entfloat: The description length (or entropy) in nat of the partition.

biSBM.utils.virtual_moves_ds(ori_e_rs, mlists, ka)[source]¶

virtual_moves_ds

Parameters

ori_e_rsnumpy.ndarray
mlistsset
kaint: Number of communities in type-a.

Returns

dSfloat
_mlistnumpy.ndarray

biSBM.painter module¶

plots

biSBM.painter.paint_block_mat(mb, edgelist, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶

biSBM.painter.paint_block_mat_from_e_rs(e_rs, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶

biSBM.painter.paint_dl_trace(oks, output=None, figsize=(4, 2), dpi=200)[source]¶

biSBM.painter.paint_landscape(oks, max_ka, max_kb, output=None, dpi=200)[source]¶

biSBM.painter.paint_mds(oks, figsize=(20, 20))[source]¶

biSBM.painter.paint_similarity_trace(b, oks, output=None, figsize=(3, 3), dpi=200)[source]¶

biSBM.painter.paint_sorted_adj_mat(mb, edgelist, output=None, figsize=(10, 10), dpi=300, invert=True)[source]¶

biSBM.painter.paint_trace(oks, output=None, figsize=(4, 4), dpi=200)[source]¶

biSBM.int_part module¶

The int_part module computes the number of restricted integer partitions \(q(m, n)\) of the integer \(m\) into at most \(n\) parts.

Specifically, it counts the number of different degree counts with the sum of degrees being exactly \(m\) and that have at most \(n\) nonzero counts. Since the quantity can only computed via a recurrence, we pre-compute the values to fill up a memoization table when the number of edges and nodes is not too large; otherwise, we use accurate asymptotic expressions to efficiently compute the values for large arguments. [Rcd2a8c25c865-peixoto-nonparametric-2017]

References¶

Rcd2a8c25c865-peixoto-nonparametric-2017: Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703

biSBM.int_part.get_v(u, epsilon=1e-08)[source]¶

Parameters

uint
epsilonfloat

biSBM.int_part.init_q_cache(n_max, __q_cache=array([], shape=(1, 0), dtype=float64))[source]¶

Initiate the look-up table for \(q(m, n)\).

Parameters

n_maxint
__q_cachenumpy.ndarray (required, default: np.array([], ndmin=2))

biSBM.int_part.lbinom(n, k)[source]¶: Return log of binom(n, k).

biSBM.int_part.log_q(n, k, __q_cache)[source]¶

Parameters

nint
kint
__q_cachenumpy.ndarray

biSBM.int_part.log_q_approx(n, k)[source]¶

Parameters

nint
kint

biSBM.int_part.log_q_approx_small(n, k)[source]¶

Parameters

nint
kint

biSBM.int_part.log_sum(a, b)[source]¶

Parameters

aint
bint

biSBM.ioutils module¶

i/o utilities

biSBM.ioutils.get_edgelist(f_edgelist, delimiter=None)[source]¶

This function returns an edgelist list from a file.

Parameters

f_edgeliststr: The path to the edgelist text file.
delimiterstr: The delimiter that separate the edges.

Returns

edgelistnumpy.ndarray: The numpy list of tupled edges.

biSBM.ioutils.get_types(f_types)[source]¶

This function returns an edgelist list from a file.

Parameters

f_edgeliststr: The path to the types file

Returns

edgelistlist: The list of types of each node.

Examples

>>> from biSBM.ioutils import get_types
>>> edgelist = get_types(f_types)
>>> print(edgelist)

biSBM.ioutils.save_mb_to_file(path, mb)[source]¶

Save the group membership list to a file path.

Parameters

pathstr, required: File path for the list to save to.
mblist[int], required: Group membership list.

biSBM package¶

Module contents¶

det_k_bisbm - Statistical inference of the bipartite stochastic block model¶

biSBM.optimalks module¶

biSBM.utils module¶

biSBM.painter module¶

biSBM.int_part module¶

References¶

biSBM.ioutils module¶

`det_k_bisbm` - Statistical inference of the bipartite stochastic block model¶