biSBM package¶
Module contents¶
det_k_bisbm
- Statistical inference of the bipartite stochastic block model¶
This module contains algorithms for the identification of large-scale network structure via the statistical inference of the bipartite stochastic block model.
Note
TODO.
-
class
biSBM.
OptimalKs
(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶ Bases:
object
Base class for OptimalKs.
- Parameters
- engine
engine
(required) The inference engine class.
- edgelist
iterable
ornumpy.ndarray
, required Edgelist (bipartite network) for model selection.
- types
iterable
ornumpy.ndarray
, required Types of each node specifying the type membership.
- verbose
bool
(optional, default:False
) Logging level used. If
True
, progress information will be shown.- default_args
bool
(optional, default:True
) Arguments for initiating the heuristic. If
False
, we should useset_params()
,set_adaptive_ratio()
,set_k_th_neighbor_to_search()
, andset_nm()
to set the default parameters.- random_init_k
bool
(optional, default: False) - bipartite_prior
bool
(optional, default:True
) Whether to use the bipartite prior for edge counts for the algorithm. If
False
, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used ingraph_tool.inference.minimize.minimize_blockmodel_dl()
, even if itsstate_args
is customized to be bipartite.- tempdir
str
(optional, default:None
) The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If
tempdir
is unset orNone
, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, astempfile.gettempdir()
dictates. We will pass the file to the inferenceengines
.
- engine
-
compute_and_update
(self, ka, kb, recompute=True)[source]¶ Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.
- Parameters
- ka
int
Number of type-a communities that we want to partition.
- kb
int
Number of type-b communities that we want to partition.
- recompute
bool
(optional, default:True
)
- ka
-
compute_dl
(self, ka, kb, recompute=False)[source]¶ Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.
- Parameters
- ka
int
Number of type-a communities that we want to partition.
- kb
int
Number of type-b communities that we want to partition.
- recompute
bool
(optional, default:False
) TODO.
- ka
- Returns
- dl
float
The description length of the partition found.
- e_rs
numpy.ndarray
the affinity matrix via the group membership vector found by the partitioning engine
- mb
list[int]
group membership vector calculated by the partitioning engine
- dl
-
minimize_bisbm_dl
(self, bipartite_prior=True)[source]¶ Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.
- Parameters
- bipartite_prior
bool
(optional, defaultTrue
)
- bipartite_prior
- Returns
- OptimalKs.bookkeeping_dl
collections.OrderedDict
- OptimalKs.bookkeeping_dl
References
- yen-bipartite-2019
Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.
-
set_adaptive_ratio
(self, adaptive_ratio=0.95)[source]¶ Set the adaptive ratio (
float
between 0 to 1, defaults to0.95
).
-
set_params
(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶ Set the parameters for the heuristic.
- Parameters
- init_ka
int
(required, default:10
) - init_kb
int
(required, default:10
) - i_0
float
(required, default:0.005
)
- init_ka
Notes
TODO.
Warning
If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.
biSBM.optimalks module¶
-
class
biSBM.optimalks.
OptimalKs
(engine, edgelist, types, verbose=False, default_args=True, random_init_k=False, bipartite_prior=True, tempdir=None)[source]¶ Bases:
object
Base class for OptimalKs.
- Parameters
- engine
engine
(required) The inference engine class.
- edgelist
iterable
ornumpy.ndarray
, required Edgelist (bipartite network) for model selection.
- types
iterable
ornumpy.ndarray
, required Types of each node specifying the type membership.
- verbose
bool
(optional, default:False
) Logging level used. If
True
, progress information will be shown.- default_args
bool
(optional, default:True
) Arguments for initiating the heuristic. If
False
, we should useset_params()
,set_adaptive_ratio()
,set_k_th_neighbor_to_search()
, andset_nm()
to set the default parameters.- random_init_k
bool
(optional, default: False) - bipartite_prior
bool
(optional, default:True
) Whether to use the bipartite prior for edge counts for the algorithm. If
False
, we will pretend that we do not assume a bipartite structure at the level of the \(e_{rs}\) matrix. This is what has been used ingraph_tool.inference.minimize.minimize_blockmodel_dl()
, even if itsstate_args
is customized to be bipartite.- tempdir
str
(optional, default:None
) The directory entry for generated temporary network data, which will be removed immediately after the class is deleted. If
tempdir
is unset orNone
, we will search for a standard list of directories and sets tempdir to the first one which the calling user can create files in, astempfile.gettempdir()
dictates. We will pass the file to the inferenceengines
.
- engine
-
compute_and_update
(self, ka, kb, recompute=True)[source]¶ Infer the partitions at a specific \((K_a, K_b)\) and then update the base class.
- Parameters
- ka
int
Number of type-a communities that we want to partition.
- kb
int
Number of type-b communities that we want to partition.
- recompute
bool
(optional, default:True
)
- ka
-
compute_dl
(self, ka, kb, recompute=False)[source]¶ Execute the partitioning code by spawning child processes in the shell; saves its output afterwards.
- Parameters
- ka
int
Number of type-a communities that we want to partition.
- kb
int
Number of type-b communities that we want to partition.
- recompute
bool
(optional, default:False
) TODO.
- ka
- Returns
- dl
float
The description length of the partition found.
- e_rs
numpy.ndarray
the affinity matrix via the group membership vector found by the partitioning engine
- mb
list[int]
group membership vector calculated by the partitioning engine
- dl
-
minimize_bisbm_dl
(self, bipartite_prior=True)[source]¶ Fit the bipartite stochastic block model, by minimizing its description length using an agglomerative heuristic.
- Parameters
- bipartite_prior
bool
(optional, defaultTrue
)
- bipartite_prior
- Returns
- OptimalKs.bookkeeping_dl
collections.OrderedDict
- OptimalKs.bookkeeping_dl
References
- yen-bipartite-2019
Tzu-Chi Yen and Daniel B. Larremore, “Blockmodeling a Bipartite Network with Bipartite Priors”, in preparation.
-
set_adaptive_ratio
(self, adaptive_ratio=0.95)[source]¶ Set the adaptive ratio (
float
between 0 to 1, defaults to0.95
).
-
set_params
(self, init_ka=10, init_kb=10, i_0=0.005)[source]¶ Set the parameters for the heuristic.
- Parameters
- init_ka
int
(required, default:10
) - init_kb
int
(required, default:10
) - i_0
float
(required, default:0.005
)
- init_ka
Notes
TODO.
Warning
If \(i_0\) is set too small, the heuristic will be slow and tends to get trapped in a local minimum (in the description length landscape) where \(K_a\) and \(K_b\) are large.
biSBM.utils module¶
Utilities for network data manipulation and entropy computation.
-
biSBM.utils.
accept_mb_merge
(mb, mlist)[source]¶ Accept partition merge.
- Parameters
- mb
iterable
ornumpy.ndarray
The partition to be merged.
- mlist
iterable
ornumpy.ndarray
The two block labels to be merged.
- mb
- Returns
- _mb
numpy.ndarray
The merged partition.
- _mb
-
biSBM.utils.
adjacency_entropy
(edgelist, mb, exact=True, multigraph=True)[source]¶ Calculate the entropy (a.k.a. negative log-likelihood) associated with the current block partition. It does not include the model entropy.
- Parameters
- edgelist
numpy.ndarray
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- exact
bool
- multigraph
bool
- edgelist
- Returns
- ent
float
The description length (or entropy) in nat of the fitting.
- ent
-
biSBM.utils.
assemble_e_rs_from_mb
(edgelist, mb)[source]¶ Get \(e_{rs}\), i.e., the matrix of edge counts between blocks.
- Parameters
- edgelist
numpy.ndarray
List of edge tuples.
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- e_rs
numpy.ndarray
Edge count matrix \(e_{rs}\).
- e_rs
-
biSBM.utils.
assemble_edgelist_old2new
(edgelist, old2new)[source]¶ Assemble the new edgelist array via an old2new mapping.
- Parameters
- edgelist
numpy.ndarray
- old2new
dict
Dictionary that maps the old node index to a new one.
- edgelist
- Returns
- el
numpy.ndarray
The new edgelist of a group of bi-cliques (directly pluggable to
det_k_bisbm.OptimalKs
)
- el
-
biSBM.utils.
assemble_eta_rk_from_edgelist_and_mb
(edgelist, mb)[source]¶ Get \(\eta_{rk}\), or the number \(\eta_{rk}\) of nodes of degree \(k\) that belong to group \(r\).
- Parameters
- edgelist
numpy.ndarray
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- eta_rk
numpy.ndarray
- eta_rk
-
biSBM.utils.
assemble_mb_new2old
(mb, new2old)[source]¶ Assemble the partition that corresponds to the old space of node indices.
- Parameters
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- new2old
dict
Dictionary that maps the new node index to the old one; i.e., a reverse mapping of
old2new
.
- mb
- Returns
- old_mb
numpy.ndarray
The partition that corresponds to the old space of node indices.
- old_mb
-
biSBM.utils.
assemble_n_k_from_edgelist
(edgelist, mb)[source]¶ Get \(n_k\), i.e., the number \(n_k\) of nodes of degree \(k\).
- Parameters
- edgelist
numpy.ndarray
List of edge tuples.
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- edgelist
- Returns
- n_k
numpy.ndarray
Array of the number of nodes of degree \(k\).
- n_k
-
biSBM.utils.
assemble_n_r_from_mb
(mb)[source]¶ Get \(n_r\), i.e., the number of nodes in each group, from the partition \(b\).
- Parameters
- mb
iterable
ornumpy.ndarray
Partition \(b\) of nodes into blocks.
- mb
- Returns
-
biSBM.utils.
assemble_old2new_mapping
(types)[source]¶ Create a mapping that map the old node-id’s to new ones, such that the types-array is sorted orderly.
- Parameters
- types
numpy.ndarray
- types
- Returns
- old2new
dict
Dictionary that maps the old node index to a new one.
- new2old
dict
Dictionary that maps the new node index to the old one; i.e., a reverse mapping of
old2new
.- new_types
numpy.ndarray
The new types-array, which is sorted orderly and directly applicable to
det_k_bisbm.OptimalKs
.
- old2new
-
biSBM.utils.
compute_profile_likelihood
(edgelist, mb, ka=None, kb=None, k=None)[source]¶ - Parameters
- edgelist
numpy.ndarray
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- ka
int
Number of communities in type-a.
- kb
int
Number of communities in type-b.
- k
int
Total number of communities.
- edgelist
- Returns
- italic_i
float
- italic_i
-
biSBM.utils.
compute_profile_likelihood_from_e_rs
(e_rs)[source]¶ - Parameters
- e_rs
numpy.ndarray
- e_rs
- Returns
- italic_i
float
- italic_i
-
biSBM.utils.
degree_entropy
(edgelist, mb, __q_cache=array([], shape=(1, 0), dtype=float64), degree_dl_kind='distributed', q_cache_max_e_r=10000)[source]¶ degree_entropy
- Parameters
- edgelist
iterable
ornumpy.ndarray
- mb
iterable
ornumpy.ndarray
Partition \(b\) of nodes into blocks.
- __q_cache
numpy.ndarray
(required, default:np.array([], ndmin=2)
) - degree_dl_kind: ``str``
degree_dl_kind == "uniform"
This corresponds to a non-informative prior, where the node degrees are sampled from an uniform distribution.
degree_dl_kind == "distributed"
This option should be preferred in most cases.
- edgelist
- Returns
- ent
float
The entropy.
- ent
-
biSBM.utils.
gen_bicliques_edgelist
(b, num_nodes)[source]¶ Generate an array of edgelist and node-type mapping for a group of bi-cliques.
- Parameters
- b
int
Number of bi-cliques.
- num_nodes
int
Number of nodes (size) for each bi-clique.
- b
- Returns
- el
numpy.ndarray
The edgelist of a group of bi-cliques.
- types
numpy.ndarray
The types-array that maps each node id to a bipartite type (
0
or1
).
- el
-
biSBM.utils.
gen_e_rs
(b, n_edges, p=0)[source]¶ - Parameters
- b
int
Number of communities within each type. (suppose Ka = Kb)
- n_edges
int
Number of edges planted in the system.
- p
float
Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).
- b
- Returns
- e_rs
numpy.ndarray
Edge counts matrix.
- e_rs
-
biSBM.utils.
gen_e_rs_hard
(ka, kb, n_edges, p=0)[source]¶ - Parameters
- ka
int
Mumber of communities within type-a.
- kb
int
Number of communities within type-b.
- n_edges
int
Number of edges planted in the system.
- p
float
Edge propensity between groups; i.e., the ratio \(c_{out} / c_{in}\).
- ka
- Returns
- e_rs
numpy.ndarray
Edge counts matrix.
- e_rs
-
biSBM.utils.
gen_e_rs_harder
(ka, kb, n_edges, samples=1, top_k=1)[source]¶ - Parameters
- ka
int
Number of communities within type-a.
- kb
int
Number of communities within type-b.
- n_edges
int
Number of edges planted in the system.
- samples
int
Number of random draws made on \(e_{rs}\).
- top_k
int
Number of samples selected. These are top-k samples with higher profile likelihood.
- ka
- Returns
- e_rs
numpy.ndarray
orlist[numpy.ndarray]
(whentop_k > 1
) Edge counts matrix.
- e_rs
-
biSBM.utils.
gen_equal_bipartite_partition
(na, nb, ka, kb)[source]¶ - Parameters
- na
int
Number of nodes in type-a.
- nb
int
Number of nodes in type-b.
- ka
int
Number of communities in type-a.
- kb
int
Number of communities in type-b.
- na
- Returns
-
biSBM.utils.
gen_equal_partition
(n, total)[source]¶ gen_equal_partition
- Parameters
- n
int
- total
int
- n
- Returns
- n_blocks
list
- n_blocks
-
biSBM.utils.
gen_unequal_partition
(n, total, avg_deg, alpha)[source]¶ - Parameters
- n
int
Number of communities.
- total
int
Number of nodes.
- avg_deg
float
Average degree of the network.
- alpha
float
The parameter of the Dirichlet distribution (the smaller the unevener the distribution is). We usually use alpha=1 to generate a case and filter it out if the lowest size of a community is less than (2 * total / avg_deg) ** 0.5 (resolution limit).
- n
- Returns
- a
list
The sizes of communities to sum to total.
- _
float
The ratio of largest-sized community to the lowest-sized one.
- a
-
biSBM.utils.
get_desc_len_from_data
(na, nb, n_edges, ka, kb, edgelist, mb, diff=False, nr=None, allow_empty=False, degree_dl_kind='distributed', q_cache=array([], shape=(1, 0), dtype=float64), is_bipartite=True)[source]¶ Description length difference to a randomized instance
- Parameters
- na
int
Number of nodes in type-a.
- nb
int
Number of nodes in type-b.
- n_edges
int
Number of edges.
- ka
int
Number of communities in type-a.
- kb
int
Number of communities in type-b.
- edgelist
numpy.ndarray
Edgelist in Python list structure.
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- diff
bool
When diff == True, the returned description value will be the difference to that of a random bipartite network. Otherwise, it will return the entropy (a.k.a. negative log-likelihood) associated with the current block partition.
- allow_empty
bool
- nr
numpy.ndarray
- degree_dl_kindstr (optional, default: “distributed”)
degree_dl_kind == “uniform”
degree_dl_kind == “distributed” (default)
degree_dl_kind == “entropy”
- is_bipartite: `bool` (default: `”True”`)
- na
- Returns
- desc_len_b
float
Difference of the description length to the bipartite ER network, per edge.
- desc_len_b
-
biSBM.utils.
get_desc_len_from_data_uni
(n, n_edges, k, edgelist, mb)[source]¶ Description length difference to a randomized instance, via PRL 110, 148701 (2013).
- Parameters
- n
int
Number of nodes.
- n_edges
int
Number of edges.
- k
int
Number of communities.
- edgelist
numpy.ndarray
A list of edges.
- mb
numpy.ndarray
Partition \(b\) of nodes into blocks.
- n
- Returns
- desc_len_b
float
Difference of the description length to the ER network, per edge.
- desc_len_b
-
biSBM.utils.
get_flat_entropies
(state)[source]¶ - Parameters
- state
graph_tool.inference.blockmodel.BlockState
The stochastic block model state of a given graph, as defined in Graph-tool.
- state
- Returns
- dl
dict
The entropic report of the partition.
- dl
-
biSBM.utils.
model_entropy
(e, ka=None, kb=None, na=None, nb=None, nr=None, allow_empty=False, is_bipartite=True)[source]¶ Computes the amount of information necessary for the parameters of the (bipartite) blockmodel ensemble, for
ka
type-a blocks,kb
type-b blocks,na
type-a vertices,nb
vertices,e
edges, and either bipartite or general as a prior. This includes the entropy from modeling the edge counts and the partition.Note that if we know a priori that the network is bipartite, we can further compress the model.
- Parameters
- e
int
Number of edges.
- ka
int
Number of communities in type-a.
- kb
int
Number of communities in type-b.
- na
int
Number of vertices in type-a.
- nb
int
Number of vertices in type-b.
- nr
numpy.ndarray
Vertex property array of the block graph which contains the block sizes.
- allow_empty
bool
(optional, default:False
) If
True
, partition description length computed will allow for empty groups.- is_bipartite
bool
(optional, default:True
) If
False
, edge counts description length computed will assume a purely flat \(e_{rs}\).
- e
- Returns
- dl
float
The description length (or entropy) in nat of the model.
- dl
References
- peixoto-parsimonious-2013
Tiago P. Peixoto, “Parsimonious module inference in large networks”, Phys. Rev. Lett. 110, 148701 (2013), DOI: 10.1103/PhysRevLett.110.148701 [sci-hub], arXiv: 1212.4794.
- peixoto-nonparametric-2017
Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703
-
biSBM.utils.
partition_entropy
(ka=None, kb=None, k=None, na=None, nb=None, n=None, nr=None, allow_empty=False)[source]¶ Compute the partition entropy, P(b), for the current partition. It has several variations depending on the priors used. In the crudest way (compute_profile_likelihood_from_e_rsnr == None), we formulate P(b) = P(b | B) * P(B). Or, by a two-level Bayesian hierarchy, we can do P(b) = P(b | n) * P(n | B) * P(B).
- Parameters
- ka
int
Number of communities in type-a.
- kb
int
Number of communities in type-b.
- k
int
Number of communities in the graph.
- na
int
Number of vertices in type-a.
- nb
int
Number of vertices in type-b.
- n
int
Number of vertices in the graph.
- nr
numpy.ndarray
Vertex property array of the block graph which contains the block sizes.
- allow_empty
bool
(optional, default:False
) If
True
, partition description length computed will allow for empty groups.
- ka
- Returns
- ent
float
The description length (or entropy) in nat of the partition.
- ent
-
biSBM.utils.
virtual_moves_ds
(ori_e_rs, mlists, ka)[source]¶ virtual_moves_ds
- Parameters
- ori_e_rs
numpy.ndarray
- mlists
set
- ka
int
Number of communities in type-a.
- ori_e_rs
- Returns
- dS
float
- _mlist
numpy.ndarray
- dS
biSBM.painter module¶
plots
-
biSBM.painter.
paint_block_mat
(mb, edgelist, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶
-
biSBM.painter.
paint_block_mat_from_e_rs
(e_rs, output=None, figsize=(3, 3), dpi=200, **kwargs)[source]¶
biSBM.int_part module¶
The int_part module computes the number of restricted integer partitions \(q(m, n)\) of the integer \(m\) into at most \(n\) parts.
Specifically, it counts the number of different degree counts with the sum of degrees being exactly \(m\) and that have at most \(n\) nonzero counts. Since the quantity can only computed via a recurrence, we pre-compute the values to fill up a memoization table when the number of edges and nodes is not too large; otherwise, we use accurate asymptotic expressions to efficiently compute the values for large arguments. [Rcd2a8c25c865-peixoto-nonparametric-2017]
References¶
- Rcd2a8c25c865-peixoto-nonparametric-2017
Tiago P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model”, Phys. Rev. E 95 012317 (2017), DOI: 10.1103/PhysRevE.95.012317 [sci-hub], arXiv: 1610.02703
-
biSBM.int_part.
init_q_cache
(n_max, __q_cache=array([], shape=(1, 0), dtype=float64))[source]¶ Initiate the look-up table for \(q(m, n)\).
- Parameters
- n_max
int
- __q_cache
numpy.ndarray
(required, default:np.array([], ndmin=2)
)
- n_max
-
biSBM.int_part.
log_q
(n, k, __q_cache)[source]¶ - Parameters
- n
int
- k
int
- __q_cache
numpy.ndarray
- n
biSBM.ioutils module¶
i/o utilities
-
biSBM.ioutils.
get_edgelist
(f_edgelist, delimiter=None)[source]¶ This function returns an edgelist list from a file.
- Parameters
- f_edgelist
str
The path to the edgelist text file.
- delimiter
str
The delimiter that separate the edges.
- f_edgelist
- Returns
- edgelist
numpy.ndarray
The numpy list of tupled edges.
- edgelist