Skip to content

Commit 32579d6

Browse files
authored
Fix #19 for real and document extensively
* Mission fix the off by factor of 2 error begins * Add expected edges back for use in vignette * Air * Improve degree tests * Test population and sample spectra are close * Add consistency vignette * Re-document * Fixes #19 for real. Closes #43. Closes #33. Closes #31. * Fix typos in NEWS * Explicitly load matrix for rowScale/colScale * Air * Drop old release checks we're not tidyverse
1 parent f2d511f commit 32579d6

34 files changed

+778
-348
lines changed

.github/workflows/R-CMD-check.yaml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -26,15 +26,12 @@ jobs:
2626
- {os: macos-latest, r: 'release'}
2727

2828
- {os: windows-latest, r: 'release'}
29-
# use 4.0 or 4.1 to check with rtools40's older compiler
30-
- {os: windows-latest, r: 'oldrel-4'}
3129

3230
- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
3331
- {os: ubuntu-latest, r: 'release'}
3432
- {os: ubuntu-latest, r: 'oldrel-1'}
3533
- {os: ubuntu-latest, r: 'oldrel-2'}
3634
- {os: ubuntu-latest, r: 'oldrel-3'}
37-
- {os: ubuntu-latest, r: 'oldrel-4'}
3835

3936
env:
4037
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

DESCRIPTION

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
Package: fastRG
22
Title: Sample Generalized Random Dot Product Graphs in Linear Time
3-
Version: 0.3.3.9000
3+
Version: 0.3.3.9001
44
Authors@R: c(
55
person("Alex", "Hayes", , "alexpghayes@gmail.com", role = c("aut", "cre", "cph"),
66
comment = c(ORCID = "0000-0002-4985-5160")),
@@ -38,11 +38,15 @@ Imports:
3838
tidyr
3939
Suggests:
4040
covr,
41+
irlba,
4142
knitr,
4243
magrittr,
44+
purrr,
4345
rmarkdown,
46+
scales,
4447
testthat (>= 3.0.0)
4548
Config/testthat/edition: 3
4649
Encoding: UTF-8
4750
Roxygen: list(markdown = TRUE)
4851
RoxygenNote: 7.3.2
52+
VignetteBuilder: knitr

NEWS.md

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,32 @@
11
# fastRG (development version)
22

3-
- Added option to specify precise number of nodes in each block of a `dcsbm()` or `sbm()` via the `block_sizes` argument. This makes it easier to construct blockmodels with exactly repeated eigenvalues.
4-
- The default behavior of `dcsbm()`, `sbm()` and `planted_partition()` has changed: when `block_sizes` or `pi` is unspecified, the new default is to balance block sizes as evenly as possible. Previously, `pi` was set to a constant vector, balancing block sizes in expectation only.
5-
- Specifying both `k` and `B` in `dcsbm()` and `sbm()` now results in an error; only specify one of these arguments.
3+
## Major changes
4+
5+
- This release fixes a long-standing inconsistency in how degrees are counted, which resulted in sampling twice as many edges as desired in `undirected_factor_model()`. See below for details (#19). This also caused population singular values for undirected factors models to be off by a factor of 2 (#31).
6+
7+
## Breaking changes
8+
9+
- Specifying both `k` and `B` in `dcsbm()` and `sbm()` now results in an error. You should only specify one of these arguments.
10+
11+
- It is now possible to specify the precise number of nodes in each block of a `dcsbm()` (and subclasses `sbm()` and `planted_partition()`) via the `block_sizes` argument. This makes it easier to construct blockmodels with exactly repeated eigenvalues. Additionally, the default behavior is now to use this argument and to balance block sizes as evenly as possible. Previously, the default behavior was to sample blocks memberships with equal probability.
12+
13+
## Non-breaking changes
14+
15+
- Added `vignette("consistency")` demonstrating how to check consistency of spectral estimators using `fastRG` for sampling and population spectra computations (#33, #43)
16+
17+
## Details about degree over-sampling bug and the fix
18+
19+
The `fastRG` sampling algorithm, as implemented in `sample_edgelist.matrix()`, is fundamentally a sampler for asymmetric, directed networks with conditional expectation $\mathbb E[A \mid X, S, Y] = X S Y^\top \in \mathbb R^{n_1 \times n_2}$. That is, you can think of the sampler as a very efficient procedure for iterating through $i = 1, ..., n_1$ and $j = 1, ..., n_2$ and sampling from a Poisson with rate $(X S Y^\top)_{ij}$.
20+
21+
However, we would also like to use this same sampler to sample from undirected networks. In an undirected networks, the conditional expectation $\mathbb E[A \mid X, S] = X S X^\top \in \mathbb R^{n \times n}$ is a square matrix with $(X S X^\top)_{ij} = (X S X^\top)_{ji}$. To sample from this matrix, it's typical to sample the upper triangle of $A$ from a Poisson with rate $(X S X^\top)_{ij}$ for all $1 \le i \le j \le n$, and then symmetrize $A$.
22+
23+
Since the `fastRG` algorithm samples $A_{ij}$ for all $i, j$, not just the upper triangle of $A$, we use a trick to sample from undirected networks. First, we force the conditional expectation to the symmetric by symmetrizing $S$. Then, we still sample for all $i, j$. To set $A_{ij}$ we sample once from a Poisson with rate $(X S X^\top)_{ij}$ and once from a Poisson with rate $(X S X^\top)_{ji}$ (these rates are equal by symmetry!). Then we set $A_{ij} = A_{ji}$ to the sum of these Poisson random variables. The issue is that this doubles the expected value of $A_{ij} = A_{ji}$ and so we sample twice as many edges as we should. Up until this release of `fastRG`, we've unfortunately been doing this double sampling in undirected networks (#19).
24+
25+
In this release, we fix this over-sampling. The key is that we divide $S$ by two at sampling time. We do not modify $S$ at all in the `undirected_factor_model()`! You can always use $X S X^\top$ to compute the expected value of $A$. This new change *only affects sampling*.
26+
27+
That is, instead of passing the $S$ from an `undirected_factor_model()` to the sampler `sample_edgelist.matrix()`, we pass $S / 2$ (see `sample_edgelist.undirected_factor_model()`). This fixes double sampling on the off-diagonal of $A$. The downside is that we're now undersampling by half the diagonal of $A$. I'm assuming that for most use cases this doesn't matter. We could correct for this undersampling of the diagonal of $A$, so please open an issue if self-loops are important to your project.
28+
29+
As a consequence of this change, $A$ and $\mathbb E[A | X, S]$ show now be on the same scale, rather than off by a factor of 2. Importantly, the spectrums should match up now, so you can now use `fastRG` to check how closely you're recovering the spectrum of the your model. See `vignette("consistency")` for a quick demonstration showing consistency of spectral estimates.
630

731
# fastRG 0.3.3
832

R/directed_dcsbm.R

Lines changed: 43 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,17 @@
11
new_directed_dcsbm <- function(
2-
X, S, Y,
3-
theta_out,
4-
theta_in,
5-
z_out,
6-
z_in,
7-
pi_out,
8-
pi_in,
9-
sorted,
10-
...,
11-
subclass = character()) {
2+
X,
3+
S,
4+
Y,
5+
theta_out,
6+
theta_in,
7+
z_out,
8+
z_in,
9+
pi_out,
10+
pi_in,
11+
sorted,
12+
...,
13+
subclass = character()
14+
) {
1215
subclass <- c(subclass, "directed_dcsbm")
1316
dcsbm <- directed_factor_model(X, S, Y, ..., subclass = subclass)
1417
dcsbm$theta_out <- theta_out
@@ -334,16 +337,20 @@ validate_directed_dcsbm <- function(x) {
334337
#' population_svd <- svds(ddcsbm)
335338
#'
336339
directed_dcsbm <- function(
337-
n = NULL,
338-
theta_out = NULL, theta_in = NULL,
339-
k_out = NULL, k_in = NULL, B = NULL,
340-
...,
341-
pi_out = rep(1 / k_out, k_out),
342-
pi_in = rep(1 / k_in, k_in),
343-
sort_nodes = TRUE,
344-
force_identifiability = TRUE,
345-
poisson_edges = TRUE,
346-
allow_self_loops = TRUE) {
340+
n = NULL,
341+
theta_out = NULL,
342+
theta_in = NULL,
343+
k_out = NULL,
344+
k_in = NULL,
345+
B = NULL,
346+
...,
347+
pi_out = rep(1 / k_out, k_out),
348+
pi_in = rep(1 / k_in, k_in),
349+
sort_nodes = TRUE,
350+
force_identifiability = TRUE,
351+
poisson_edges = TRUE,
352+
allow_self_loops = TRUE
353+
) {
347354
# NOTE:
348355
# - X corresponds to outgoing communities, outgoing edges and rows of A
349356
# - Y corresponds to incoming communities, incoming edges and columns of A
@@ -462,8 +469,6 @@ directed_dcsbm <- function(
462469
z_in <- sort(z_in)
463470
}
464471

465-
466-
467472
if (k_out > 1) {
468473
X <- sparse.model.matrix(~ z_out + 0)
469474
} else {
@@ -525,7 +530,10 @@ directed_dcsbm <- function(
525530
#' @export
526531
print.directed_dcsbm <- function(x, ...) {
527532
cat(glue("Directed Degree-Corrected Stochastic Blockmodel\n", .trim = FALSE))
528-
cat(glue("-----------------------------------------------\n\n", .trim = FALSE))
533+
cat(glue(
534+
"-----------------------------------------------\n\n",
535+
.trim = FALSE
536+
))
529537

530538
sorted <- if (x$sorted) "arranged by block" else "not arranged by block"
531539

@@ -550,7 +558,16 @@ print.directed_dcsbm <- function(x, ...) {
550558
cat("Allow self loops:", as.character(x$allow_self_loops), "\n\n")
551559

552560
cat(glue("Expected edges: {round(expected_edges(x))}\n", .trim = FALSE))
553-
cat(glue("Expected in degree: {round(expected_out_degree(x), 1)}\n", .trim = FALSE))
554-
cat(glue("Expected out degree: {round(expected_in_degree(x), 1)}\n", .trim = FALSE))
555-
cat(glue("Expected density: {round(expected_density(x), 5)}\n", .trim = FALSE))
561+
cat(glue(
562+
"Expected in degree: {round(expected_out_degree(x), 1)}\n",
563+
.trim = FALSE
564+
))
565+
cat(glue(
566+
"Expected out degree: {round(expected_in_degree(x), 1)}\n",
567+
.trim = FALSE
568+
))
569+
cat(glue(
570+
"Expected density: {round(expected_density(x), 5)}\n",
571+
.trim = FALSE
572+
))
556573
}

R/directed_erdos_renyi.R

Lines changed: 25 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,17 @@
1-
new_directed_erdos_renyi <- function(X, S, Y, p, poisson_edges, allow_self_loops, ...) {
1+
new_directed_erdos_renyi <- function(
2+
X,
3+
S,
4+
Y,
5+
p,
6+
poisson_edges,
7+
allow_self_loops,
8+
...
9+
) {
210
er <- directed_factor_model(
3-
X, S, Y, ...,
11+
X,
12+
S,
13+
Y,
14+
...,
415
subclass = "directed_erdos_renyi",
516
poisson_edges = poisson_edges,
617
allow_self_loops = allow_self_loops
@@ -57,14 +68,18 @@ validate_directed_erdos_renyi <- function(x) {
5768
#' A
5869
#'
5970
directed_erdos_renyi <- function(
60-
n, ..., p = NULL,
61-
poisson_edges = TRUE,
62-
allow_self_loops = TRUE) {
71+
n,
72+
...,
73+
p = NULL,
74+
poisson_edges = TRUE,
75+
allow_self_loops = TRUE
76+
) {
6377
X <- Matrix(1, nrow = n, ncol = 1)
6478
Y <- Matrix(1, nrow = n, ncol = 1)
6579

66-
if (is.null(p) && is.null(expected_in_degree) &&
67-
is.null(expected_out_degree)) {
80+
if (
81+
is.null(p) && is.null(expected_in_degree) && is.null(expected_out_degree)
82+
) {
6883
stop(
6984
"Must specify either `p`, `expected_in_degree` or ",
7085
" `expected_out_degree`.",
@@ -79,7 +94,9 @@ directed_erdos_renyi <- function(
7994
S <- matrix(p, nrow = 1, ncol = 1)
8095

8196
er <- new_directed_erdos_renyi(
82-
X, S, Y,
97+
X,
98+
S,
99+
Y,
83100
p = p,
84101
poisson_edges = poisson_edges,
85102
allow_self_loops = allow_self_loops,

R/directed_factor_model.R

Lines changed: 21 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,12 @@
11
new_directed_factor_model <- function(
2-
X, S, Y,
3-
poisson_edges,
4-
allow_self_loops,
5-
...,
6-
subclass = character()) {
2+
X,
3+
S,
4+
Y,
5+
poisson_edges,
6+
allow_self_loops,
7+
...,
8+
subclass = character()
9+
) {
710
rlang::check_dots_unnamed()
811

912
n <- nrow(X)
@@ -160,13 +163,16 @@ validate_directed_factor_model <- function(x) {
160163
#' fm2
161164
#'
162165
directed_factor_model <- function(
163-
X, S, Y,
164-
...,
165-
expected_in_degree = NULL,
166-
expected_out_degree = NULL,
167-
expected_density = NULL,
168-
poisson_edges = TRUE,
169-
allow_self_loops = TRUE) {
166+
X,
167+
S,
168+
Y,
169+
...,
170+
expected_in_degree = NULL,
171+
expected_out_degree = NULL,
172+
expected_density = NULL,
173+
poisson_edges = TRUE,
174+
allow_self_loops = TRUE
175+
) {
170176
X <- Matrix(X)
171177
S <- Matrix(S)
172178
Y <- Matrix(Y)
@@ -186,7 +192,9 @@ directed_factor_model <- function(
186192
}
187193

188194
fm <- new_directed_factor_model(
189-
X, S, Y,
195+
X,
196+
S,
197+
Y,
190198
poisson_edges = poisson_edges,
191199
allow_self_loops = allow_self_loops,
192200
...

R/expected-degrees.R

Lines changed: 17 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,23 @@ expected_edges.undirected_factor_model <- function(factor_model, ...) {
9393
sum(Cx %*% S %*% Cx)
9494
}
9595

96+
#' @rdname expected_edges
97+
#' @export
98+
expected_degrees <- function(factor_model, ...) {
99+
UseMethod("expected_degrees")
100+
}
101+
102+
#' @export
103+
expected_degrees.undirected_factor_model <- function(factor_model, ...) {
104+
# rowSums of E[A|X, S] = XSX' are XSX'1 for 1 a column vector of ones
105+
# want to avoid memory cost of instantiating all of E[A|X, S], which is
106+
# typically large and dense
107+
X <- factor_model$X
108+
S <- factor_model$S
109+
ones <- matrix(1, nrow = nrow(X))
110+
as.numeric(X %*% (tcrossprod(S, X) %*% ones))
111+
}
112+
96113
#' @rdname expected_edges
97114
#' @export
98115
expected_degree <- function(factor_model, ...) {
@@ -123,24 +140,6 @@ expected_degree.undirected_factor_model <- function(factor_model, ...) {
123140
expected_edges(factor_model) / as.numeric(factor_model$n)
124141
}
125142

126-
127-
#' @rdname expected_edges
128-
#' @export
129-
expected_degrees <- function(factor_model, ...) {
130-
UseMethod("expected_degrees")
131-
}
132-
133-
#' @export
134-
expected_degrees.undirected_factor_model <- function(factor_model, ...) {
135-
# rowSums of E[A|X, S] = XSX' are XSX'1 for 1 a column vector of ones
136-
# want to avoid memory cost of instantiating all of E[A|X, S], which is
137-
# typically large and dense
138-
X <- factor_model$X
139-
S <- factor_model$S
140-
ones <- matrix(1, nrow = nrow(X))
141-
as.numeric(X %*% (tcrossprod(S, X) %*% ones))
142-
}
143-
144143
#' @export
145144
expected_density.undirected_factor_model <- function(factor_model, ...) {
146145
expected_edges(factor_model) / as.numeric(choose(factor_model$n, 2))

R/expected-spectra.R

Lines changed: 21 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -16,10 +16,13 @@ RSpectra::eigs_sym
1616
#' @method eigs_sym undirected_factor_model
1717
#' @export
1818
eigs_sym.undirected_factor_model <- function(
19-
A, k = A$k,
20-
which = "LM", sigma = NULL,
21-
opts = list(),
22-
...) {
19+
A,
20+
k = A$k,
21+
which = "LM",
22+
sigma = NULL,
23+
opts = list(),
24+
...
25+
) {
2326
rlang::check_installed("RSpectra")
2427

2528
Ax <- function(x, args) as.numeric(args$X %*% (args$SXt %*% x))
@@ -37,12 +40,13 @@ eigs_sym.undirected_factor_model <- function(
3740
#' @method svds undirected_factor_model
3841
#' @export
3942
svds.undirected_factor_model <- function(
40-
A,
41-
k = A$k,
42-
nu = k,
43-
nv = k,
44-
opts = list(),
45-
...) {
43+
A,
44+
k = A$k,
45+
nu = k,
46+
nv = k,
47+
opts = list(),
48+
...
49+
) {
4650
rlang::check_installed("RSpectra")
4751

4852
Ax <- function(x, args) {
@@ -76,12 +80,13 @@ svds.undirected_factor_model <- function(
7680
#' @method svds directed_factor_model
7781
#' @export
7882
svds.directed_factor_model <- function(
79-
A,
80-
k = min(A$k1, A$k2),
81-
nu = k,
82-
nv = k,
83-
opts = list(),
84-
...) {
83+
A,
84+
k = min(A$k1, A$k2),
85+
nu = k,
86+
nv = k,
87+
opts = list(),
88+
...
89+
) {
8590
rlang::check_installed("RSpectra")
8691

8792
Ax <- function(x, args) {

0 commit comments

Comments
 (0)