biSBM package¶
Module contents¶
det_k_bisbm - Statistical inference of the bipartite stochastic block model¶
This module contains algorithms for the identification of large-scale network structure via the statistical inference of the bipartite stochastic block model.
Note
TODO.
-
class
biSBM.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶ Bases:
objectBase class for OptimalKs.
- Parameters
- engine
engine(required) The inference engine class.
- edgelist
iterableornumpy.ndarray, required Edgelist (bipartite network) for model selection.
- types
iterableornumpy.ndarray, required Types of each node specifying the type membership.
- verbose
bool(optional, default:False) Logging level used. If
True, progress information will be shown.- default_args
bool(optional, default:True) Arguments for initiating the heuristic. If
False, we should useset_params(),set_adaptive_ratio(),set_k_th_neighbor_to_search(), andset_nm()to set the default parameters.- random_init_k
bool(optional, default: False) - bipartite_prior
bool(optional, default:True) Whether to use the bipartite prior for edge counts for the algorithm. If
False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used ingraph_tool.inference.minimize.minimize_blockmodel_dl(), even if itsstate_argsis customized to be bipartite.- tempdir
str(optional, default:None) The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If
tempdiris unset orNone, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, astempfile.gettempdir()dictates. We will pass the file to the inferenceengines.
- engine
-
compute_and_update(self, ka, kb, recompute=True)[source]¶ Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.
- Parameters
- ka
int Number of type-a communities that we want to partition.
- kb
int Number of type-b communities that we want to partition.
- recompute
bool(optional, default:True)
- ka
-
compute_dl(self, ka, kb, recompute=False)[source]¶ Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.
- Parameters
- ka
int Number of type-a communities that we want to partition.
- kb
int Number of type-b communities that we want to partition.
- recompute
bool(optional, default:False) TODO.
- ka
- Returns
- dl
float The description length of the partition found.
- e_rs
numpy.ndarray the affinity matrix via the group membership vector found by the partitioning engine
- mb
list[int] group membership vector calculated by the partitioning engine
- dl
-
minimize_bisbm_dl(self, bipartite_prior=True)[source]¶ Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.
- Parameters
- bipartite_prior
bool(optional, defaultTrue)
- bipartite_prior
- Returns
- OptimalKs.bookkeeping_dl
collections.OrderedDict
- OptimalKs.bookkeeping_dl
References
- yen-bipartite-2019
Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.
-
set_adaptive_ratio(self, adaptive_ratio=0.95)[source]¶ Set the adaptive ratio (
floatbetween 0 to 1, defaults to0.95).
-
set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶ Set the parameters for the heuristic.
- Parameters
- init_ka
int(required, default:10) - init_kb
int(required, default:10) - i_0
float(required, default:0.005)
- init_ka
Notes
TODO.
Warning
If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.
biSBM.optimalks module¶
-
class
biSBM.optimalks.OptimalKs(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶ Bases:
objectBase class for OptimalKs.
- Parameters
- engine
engine(required) The inference engine class.
- edgelist
iterableornumpy.ndarray, required Edgelist (bipartite network) for model selection.
- types
iterableornumpy.ndarray, required Types of each node specifying the type membership.
- verbose
bool(optional, default:False) Logging level used. If
True, progress information will be shown.- default_args
bool(optional, default:True) Arguments for initiating the heuristic. If
False, we should useset_params(),set_adaptive_ratio(),set_k_th_neighbor_to_search(), andset_nm()to set the default parameters.- random_init_k
bool(optional, default: False) - bipartite_prior
bool(optional, default:True) Whether to use the bipartite prior for edge counts for the algorithm. If
False, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used ingraph_tool.inference.minimize.minimize_blockmodel_dl(), even if itsstate_argsis customized to be bipartite.- tempdir
str(optional, default:None) The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If
tempdiris unset orNone, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, astempfile.gettempdir()dictates. We will pass the file to the inferenceengines.
- engine
-
compute_and_update(self, ka, kb, recompute=True)[source]¶ Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.
- Parameters
- ka
int Number of type-a communities that we want to partition.
- kb
int Number of type-b communities that we want to partition.
- recompute
bool(optional, default:True)
- ka
-
compute_dl(self, ka, kb, recompute=False)[source]¶ Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.
- Parameters
- ka
int Number of type-a communities that we want to partition.
- kb
int Number of type-b communities that we want to partition.
- recompute
bool(optional, default:False) TODO.
- ka
- Returns
- dl
float The description length of the partition found.
- e_rs
numpy.ndarray the affinity matrix via the group membership vector found by the partitioning engine
- mb
list[int] group membership vector calculated by the partitioning engine
- dl
-
minimize_bisbm_dl(self, bipartite_prior=True)[source]¶ Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.
- Parameters
- bipartite_prior
bool(optional, defaultTrue)
- bipartite_prior
- Returns
- OptimalKs.bookkeeping_dl
collections.OrderedDict
- OptimalKs.bookkeeping_dl
References
- yen-bipartite-2019
Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.
-
set_adaptive_ratio(self, adaptive_ratio=0.95)[source]¶ Set the adaptive ratio (
floatbetween 0 to 1, defaults to0.95).
-
set_params(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶ Set the parameters for the heuristic.
- Parameters
- init_ka
int(required, default:10) - init_kb
int(required, default:10) - i_0
float(required, default:0.005)
- init_ka
Notes
TODO.
Warning
If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.
biSBM.utils module¶
Utilities for network data manipulation and entropy computation.
-
biSBM.utils.accept_mb_merge(mb, mlist)[source]¶ Accept partition merge.
- Parameters
- mb
iterableornumpy.ndarray The partition to be merged.
- mlist
iterableornumpy.ndarray The two block labels to be merged.
- mb
- Returns
- _mb
numpy.ndarray The merged partition.
- _mb
-
biSBM.utils.adjacency_entropy(edgelist, mb, exact=True, multigraph=True)[source]¶ Calculate the entropy (a.k.a. negative log-likelihood) associated with the current block partition. It does not include the model entropy.
- Parameters
- edgelist
numpy.ndarray - mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- exact
bool - multigraph
bool
- edgelist
- Returns
- ent
float The description length (or entropy) in nat of the fitting.
- ent
-
biSBM.utils.assemble_e_rs_from_mb(edgelist, mb)[source]¶ Get \(e_{rs}\), i.e., the matrix of edge counts between blocks.
- Parameters
- edgelist
numpy.ndarray List of edge tuples.
- mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- e_rs
numpy.ndarray Edge count matrix \(e_{rs}\).
- e_rs
-
biSBM.utils.assemble_edgelist_old2new(edgelist, old2new)[source]¶ Assemble the new edgelist array via an old2new mapping.
- Parameters
- edgelist
numpy.ndarray - old2new
dict Dictionary that maps the old node index to a new one.
- edgelist
- Returns
- el
numpy.ndarray The new edgelist of a group of bi-cliques (directly pluggable to
det_k_bisbm.OptimalKs)
- el
-
biSBM.utils.assemble_eta_rk_from_edgelist_and_mb(edgelist, mb)[source]¶ Get \(\eta_{rk}\), or the number \(\eta_{rk}\) of nodes of degree \(k\) that belong to group \(r\).
- Parameters
- edgelist
numpy.ndarray - mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- eta_rk
numpy.ndarray
- eta_rk
-
biSBM.utils.assemble_mb_new2old(mb, new2old)[source]¶ Assemble the partition that corresponds to the old space of node indices.
- Parameters
- mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- new2old
dict Dictionary that maps the new node index to the old one; i.e., a reverse mapping of
old2new.
- mb
- Returns
- old_mb
numpy.ndarray The partition that corresponds to the old space of node indices.
- old_mb
-
biSBM.utils.assemble_n_k_from_edgelist(edgelist, mb)[source]¶ Get \(n_k\), i.e., the number \(n_k\) of nodes of degree \(k\).
- Parameters
- edgelist
numpy.ndarray List of edge tuples.
- mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- n_k
numpy.ndarray Array of the number of nodes of degree \(k\).
- n_k
-
biSBM.utils.assemble_n_r_from_mb(mb)[source]¶ Get \(n_r\), i.e., the number of nodes in each group, from the partition \(b\).
- Parameters
- mb
iterableornumpy.ndarray Partition \(b\) of nodes into blocks.
- mb
- Returns
-
biSBM.utils.assemble_old2new_mapping(types)[source]¶ Create a mapping that map the old node-id’s to new ones, such that the types-array is sorted orderly.
- Parameters
- types
numpy.ndarray
- types
- Returns
- old2new
dict Dictionary that maps the old node index to a new one.
- new2old
dict Dictionary that maps the new node index to the old one; i.e., a reverse mapping of
old2new.- new_types
numpy.ndarray The new types-array, which is sorted orderly and directly applicable to
det_k_bisbm.OptimalKs.
- old2new
-
biSBM.utils.compute_profile_likelihood(edgelist, mb, ka=None, kb=None, k=None)[source]¶ - Parameters
- edgelist
numpy.ndarray - mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- ka
int Number of communities in type-a.
- kb
int Number of communities in type-b.
- k
int Total number of communities.
- edgelist
- Returns
- italic_i
float
- italic_i
-
biSBM.utils.compute_profile_likelihood_from_e_rs(e_rs)[source]¶ - Parameters
- e_rs
numpy.ndarray
- e_rs
- Returns
- italic_i
float
- italic_i
-
biSBM.utils.degree_entropy(edgelist, mb, __q_cache=array([], shape=(1, 0), dtype=float64), degree_dl_kind='distributed', q_cache_max_e_r=10000)[source]¶ degree_entropy
- Parameters
- edgelist
iterableornumpy.ndarray - mb
iterableornumpy.ndarray Partition \(b\) of nodes into blocks.
- __q_cache
numpy.ndarray(required, default:np.array([], ndmin=2)) - degree_dl_kind: ``str``
degree_dl_kind == "uniform"
This corresponds to a non-informative prior, where the node degrees are sampled from an uniform distribution.
degree_dl_kind == "distributed"
This option should be preferred in most cases.
- edgelist
- Returns
- ent
float The entropy.
- ent
-
biSBM.utils.gen_bicliques_edgelist(b, num_nodes)[source]¶ Generate an array of edgelist and node-type mapping for a group of bi-cliques.
- Parameters
- b
int Number of bi-cliques.
- num_nodes
int Number of nodes (size) for each bi-clique.
- b
- Returns
- el
numpy.ndarray The edgelist of a group of bi-cliques.
- types
numpy.ndarray The types-array that maps each node id to a bipartite type (
0or1).
- el
-
biSBM.utils.gen_e_rs(b, n_edges, p=0)[source]¶ - Parameters
- b
int Number of communities within each type. (suppose Ka = Kb)
- n_edges
int Number of edges planted in the system.
- p
float Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).
- b
- Returns
- e_rs
numpy.ndarray Edge counts matrix.
- e_rs
-
biSBM.utils.gen_e_rs_hard(ka, kb, n_edges, p=0)[source]¶ - Parameters
- ka
int Mumber of communities within type-a.
- kb
int Number of communities within type-b.
- n_edges
int Number of edges planted in the system.
- p
float Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).
- ka
- Returns
- e_rs
numpy.ndarray Edge counts matrix.
- e_rs
-
biSBM.utils.gen_e_rs_harder(ka, kb, n_edges, samples=1, top_k=1)[source]¶ - Parameters
- ka
int Number of communities within type-a.
- kb
int Number of communities within type-b.
- n_edges
int Number of edges planted in the system.
- samples
int Number of random draws made on \(e_{rs}\).
- top_k
int Number of samples selected. These are top-k samples with higher profile likelihood.
- ka
- Returns
- e_rs
numpy.ndarrayorlist[numpy.ndarray](whentop_k > 1) Edge counts matrix.
- e_rs
-
biSBM.utils.gen_equal_bipartite_partition(na, nb, ka, kb)[source]¶ - Parameters
- na
int Number of nodes in type-a.
- nb
int Number of nodes in type-b.
- ka
int Number of communities in type-a.
- kb
int Number of communities in type-b.
- na
- Returns
-
biSBM.utils.gen_equal_partition(n, total)[source]¶ gen_equal_partition
- Parameters
- n
int - total
int
- n
- Returns
- n_blocks
list
- n_blocks
-
biSBM.utils.gen_unequal_partition(n, total, avg_deg, alpha)[source]¶ - Parameters
- n
int Number of communities.
- total
int Number of nodes.
- avg_deg
float Average degree of the network.
- alpha
float The parameter of the Dirichlet distribution (the smaller the unevener the distribution is). We usually use alpha=1 to generate a case and filter it out if the lowest size of a community is less than (2 * total / avg_deg) ** 0.5 (resolution limit).
- n
- Returns
- a
list The sizes of communities to sum to total.
- _
float The ratio of largest-sized community to the lowest-sized one.
- a
-
biSBM.utils.get_desc_len_from_data(na, nb, n_edges, ka, kb, edgelist, mb, diff=False, nr=None, allow_empty=False, degree_dl_kind='distributed', q_cache=array([], shape=(1, 0), dtype=float64), is_bipartite=True)[source]¶ Description length difference to a randomized instance
- Parameters
- na
int Number of nodes in type-a.
- nb
int Number of nodes in type-b.
- n_edges
int Number of edges.
- ka
int Number of communities in type-a.
- kb
int Number of communities in type-b.
- edgelist
numpy.ndarray Edgelist in Python list structure.
- mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- diff
bool When diff == True, the returned description value will be the difference to that of a random bipartite network. Otherwise, it will return the entropy (a.k.a. negative log-likelihood) associated with the current block partition.
- allow_empty
bool - nr
numpy.ndarray - degree_dl_kindstr (optional, default: “distributed”)
degree_dl_kind == “uniform”
degree_dl_kind == “distributed” (default)
degree_dl_kind == “entropy”
- is_bipartite: `bool` (default: `”True”`)
- na
- Returns
- desc_len_b
float Difference of the description length to the bipartite ER network, per edge.
- desc_len_b
-
biSBM.utils.get_desc_len_from_data_uni(n, n_edges, k, edgelist, mb)[source]¶ Description length difference to a randomized instance, via PRL 110, 148701 (2013).
- Parameters
- n
int Number of nodes.
- n_edges
int Number of edges.
- k
int Number of communities.
- edgelist
numpy.ndarray A list of edges.
- mb
numpy.ndarray Partition \(b\) of nodes into blocks.
- n
- Returns
- desc_len_b
float Difference of the description length to the ER network, per edge.
- desc_len_b
-
biSBM.utils.get_flat_entropies(state)[source]¶ - Parameters
- state
graph_tool.inference.blockmodel.BlockState The stochastic block model state of a given graph, as defined in Graph-tool.
- state
- Returns
- dl
dict The entropic report of the partition.
- dl
-
biSBM.utils.model_entropy(e, ka=None, kb=None, na=None, nb=None, nr=None, allow_empty=False, is_bipartite=True)[source]¶ Computes the amount of information necessary for the parameters of the (bipartite) blockmodel ensemble, for
katype-a blocks,kbtype-b blocks,natype-a vertices,nbvertices,eedges, and either bipartite or general as a prior. This includes the entropy from modeling the edge counts and the partition.Note that if we know a priori that the network is bipartite, we can further compress the model.
- Parameters
- e
int Number of edges.
- ka
int Number of communities in type-a.
- kb
int Number of communities in type-b.
- na
int Number of vertices in type-a.
- nb
int Number of vertices in type-b.
- nr
numpy.ndarray Vertex property array of the block graph which contains the block sizes.
- allow_empty
bool(optional, default:False) If
True, partition description length computed will allow for empty groups.- is_bipartite
bool(optional, default:True) If
False, edge counts description length computed will assume a purely flat \(e_{rs}\).
- e
- Returns
- dl
float The description length (or entropy) in nat of the model.
- dl
References
- peixoto-parsimonious-2013
Tiago P. Peixoto, “Parsimonious module inference in large networks”, Phys. Rev. Lett. 110, 148701 (2013), DOI: 10.1103/PhysRevLett.110.148701 [sci-hub], arXiv: 1212.4794.
- peixoto-nonparametric-2017
Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703
-
biSBM.utils.partition_entropy(ka=None, kb=None, k=None, na=None, nb=None, n=None, nr=None, allow_empty=False)[source]¶ Compute the partition entropy, P(b), for the current partition. It has several variations depending on the priors used. In the crudest way (compute_profile_likelihood_from_e_rsnr == None), we formulate P(b) = P(b | B) * P(B). Or, by a two-level Bayesian hierarchy, we can do P(b) = P(b | n) * P(n | B) * P(B).
- Parameters
- ka
int Number of communities in type-a.
- kb
int Number of communities in type-b.
- k
int Number of communities in the graph.
- na
int Number of vertices in type-a.
- nb
int Number of vertices in type-b.
- n
int Number of vertices in the graph.
- nr
numpy.ndarray Vertex property array of the block graph which contains the block sizes.
- allow_empty
bool(optional, default:False) If
True, partition description length computed will allow for empty groups.
- ka
- Returns
- ent
float The description length (or entropy) in nat of the partition.
- ent
-
biSBM.utils.virtual_moves_ds(ori_e_rs, mlists, ka)[source]¶ virtual_moves_ds
- Parameters
- ori_e_rs
numpy.ndarray - mlists
set - ka
int Number of communities in type-a.
- ori_e_rs
- Returns
- dS
float - _mlist
numpy.ndarray
- dS
biSBM.painter module¶
plots
-
biSBM.painter.paint_block_mat(mb, edgelist, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶
-
biSBM.painter.paint_block_mat_from_e_rs(e_rs, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶
biSBM.int_part module¶
The int_part module computes the number of restricted integer partitions \(q(m, n)\) of the integer \(m\) into at most \(n\) parts.
Specifically, it counts the number of different degree counts with the sum of degrees being exactly \(m\) and that have at most \(n\) nonzero counts. Since the quantity can only computed via a recurrence, we pre-compute the values to fill up a memoization table when the number of edges and nodes is not too large; otherwise, we use accurate asymptotic expressions to efficiently compute the values for large arguments. [Rcd2a8c25c865-peixoto-nonparametric-2017]
References¶
- Rcd2a8c25c865-peixoto-nonparametric-2017
Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703
-
biSBM.int_part.init_q_cache(n_max, __q_cache=array([], shape=(1, 0), dtype=float64))[source]¶ Initiate the look-up table for \(q(m, n)\).
- Parameters
- n_max
int - __q_cache
numpy.ndarray(required, default:np.array([], ndmin=2))
- n_max
-
biSBM.int_part.log_q(n, k, __q_cache)[source]¶ - Parameters
- n
int - k
int - __q_cache
numpy.ndarray
- n
biSBM.ioutils module¶
i/o utilities
-
biSBM.ioutils.get_edgelist(f_edgelist, delimiter=None)[source]¶ This function returns an edgelist list from a file.
- Parameters
- f_edgelist
str The path to the edgelist text file.
- delimiter
str The delimiter that separate the edges.
- f_edgelist
- Returns
- edgelist
numpy.ndarray The numpy list of tupled edges.
- edgelist