on
Convert a linear index to lower triangular subscripts
Recently, while browsing an article
dealing with parallel euclidean distance computation, I discovered an interesting formula.
This formula allows to convert a linear index to lower triangular subscripts (i, j).
So, given an integer k, we can easily determine the row and the column of
the k-th element in a lower triangular matrix.
As an example, in the lower triangle of a 4 x 4 matrix,
the row and column numbers of the 7th element are 4 and 1, respectively (row-wise, diagonal included).
Below is the algorithm used for the computation (source: Angeletti et al. 2019):
This is a computational trick to avoid using nested loops over rows and columns and it can therefore be very useful to compute distances, or for similar operations involving pairwise comparisons.
The aforementioned paper presents formulas for row-wise numbering
(diagonal included) and col-wise numbering (diagonal included and excluded,
with some errors in this last case, I think).
There is a related
stackoverflow question,
with an insightful comment by Mikhail Maltsev pointing to the theory behind
this approach, namely triangular numbers.
Based on these elements, I wrote some R functions for this kind of index conversion.
The proposed functions can be used for row-wise, col-wise, and diag-wise numbering,
including or excluding the diagonal.
I wrote a specific function for each case and a wrapper function for more convenience.
It implies some redundancy but makes things easier to follow.
Functions for row-wise numbering
# diagonal included
index_to_i_j_rowwise_diag <- function(k, n) {
p <- (sqrt(1 + 8 * k) - 1) / 2
i0 <- floor(p)
if (i0 == p) {
return(c(i0, i0)) # (i, j)
} else {
i <- i0 + 1
j <- k - i0 * (i0 + 1) / 2
c(i, j)
}
}
# diagonal excluded
index_to_i_j_rowwise_nodiag <- function(k, n) {
kp <- k - 1
p <- floor((sqrt(1 + 8 * kp) - 1) / 2)
i <- p + 2
j <- kp - p * (p + 1) / 2 + 1
c(i, j)
}
Functions for col-wise numbering
# diagonal included
index_to_i_j_colwise_diag <- function(k, n) {
kp <- n * (n + 1) / 2 - k
p <- floor((sqrt(1 + 8 * kp) - 1) / 2)
i <- n - (kp - p * (p + 1) / 2)
j <- n - p
c(i, j)
}
# diagonal excluded
index_to_i_j_colwise_nodiag <- function(k, n) {
kp <- n * (n - 1) / 2 - k
p <- floor((sqrt(1 + 8 * kp) - 1) / 2)
i <- n - (kp - p * (p + 1) / 2)
j <- n - 1 - p
c(i, j)
}
Functions for diag-wise numbering
# diagonal included
index_to_i_j_diagwise_diag <- function(k, n) {
kp <- n * (n + 1) / 2 - k
p <- floor((sqrt(1 + 8 * kp) - 1) / 2)
i <- n - (kp - p * (p + 1) / 2)
j <- i + 1 - (n - p)
c(i, j)
}
# diagonal excluded
index_to_i_j_diagwise_nodiag <- function(k, n) {
kp <- n * (n - 1) / 2 - k
p <- floor((sqrt(1 + 8 * kp) - 1) / 2)
i <- n - (kp - p * (p + 1) / 2)
j <- i - (n - 1 - p)
c(i, j)
}
Wrapper function
index_to_i_j <- function(k, n,
direction = c("diagwise", "colwise", "rowwise"),
diagonal = FALSE) {
direction <- match.arg(direction)
if (diagonal) {
switch(direction,
diagwise = index_to_i_j_diagwise_diag(k, n),
colwise = index_to_i_j_colwise_diag(k, n),
rowwise = index_to_i_j_rowwise_diag(k, n))
} else {
switch(direction,
diagwise = index_to_i_j_diagwise_nodiag(k, n),
colwise = index_to_i_j_colwise_nodiag(k, n),
rowwise = index_to_i_j_rowwise_nodiag(k, n))
}
}
Examples
index_to_i_j(3, 7) # get i and j for the 3rd element in a matrix of size 7
## [1] 4 3
index_to_i_j(3, 7, "colwise")
## [1] 4 1
index_to_i_j(3, 7, "rowwise")
## [1] 3 2
index_to_i_j(3, 7, "diagwise", TRUE)
## [1] 3 3
index_to_i_j(3, 7, "colwise", TRUE)
## [1] 3 1
index_to_i_j(3, 7, "rowwise", TRUE)
## [1] 2 2
Test
Here is some code to illustrate the six cases with a 7*7 matrix.
The index_to_i_j
function is meant to be used for one element at a time and
for a specific numbering case. The code below is therefore quite unelegant,
but this is just to check the results.
res <- list() # used to store the results (for each direction*diagonal)
z <- 1 # counter used in the loops
n <- 7 # size of the matrix
for (direction in c("Diagwise", "Colwise", "Rowwise")) {
for (diagonal in c(FALSE, TRUE)) {
# compute the number of elements
if (diagonal) {
K <- n * (n + 1) / 2
} else {
K <- n * (n - 1) / 2
}
# run index_to_i_j for each k
ij <- lapply(1:K, function(x) {
index_to_i_j(x, n, tolower(direction), diagonal)
})
ij <- do.call(rbind, ij)
# create a full square matrix with dots (".") and replace lower triangle
mat <- matrix(".", nrow = n, ncol = n)
mat[ij] <- 1:K
# format to data.frame
row_col <- expand.grid(1:n, 1:n)
diagonal <- ifelse(diagonal, "Diagonal included", "Diagonal excluded")
res[[z]] <- data.frame(row_col, as.vector(mat), direction, diagonal)
z <- z + 1
}
}
# combine and format the results (using base R)
res <- do.call(rbind, res)
colnames(res)[1:3] <- c("i", "j", "k")
res$i <- factor(res$i, levels = n:1)
res$j <- factor(res$j, levels = 1:n)
library(ggplot2) # plot the results
p <- ggplot(data = res,
aes(x = j,
y = i)) +
geom_tile(aes(fill = (i == j)),
colour = "lightgray",
show.legend = FALSE) +
geom_text(aes(label = k),
size = 4) +
scale_fill_manual(values = c("white", "lightgray")) +
scale_x_discrete(position = "top") +
facet_grid(rows = vars(diagonal),
cols = vars(direction),
switch = "x") +
theme_minimal() +
theme(strip.text = element_text(face = "bold", size = 12)) +
xlab("") +
ylab("")
p
Conclusion
It seems to work. The potential benefit offered by these functions will be evaluated in the next article: Comparison of different for loops methods in parallel distance computation with R, C, and OpenMP.
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 knitr_1.26 magrittr_1.5 tidyselect_1.1.0
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.1 rlang_0.4.6
## [9] dplyr_1.0.0 stringr_1.4.0 tools_3.6.3 grid_3.6.3
## [13] gtable_0.3.0 xfun_0.11 withr_2.1.2 htmltools_0.4.0
## [17] yaml_2.2.0 digest_0.6.23 tibble_2.1.3 lifecycle_0.2.0
## [21] crayon_1.3.4 bookdown_0.16 farver_2.0.1 purrr_0.3.3
## [25] vctrs_0.3.1 glue_1.4.1 evaluate_0.14 rmarkdown_2.0
## [29] blogdown_0.17 stringi_1.4.3 compiler_3.6.3 pillar_1.4.3
## [33] generics_0.0.2 scales_1.1.0 pkgconfig_2.0.3