Marco Lopez-Cruz, Paulino Pérez-Rodríguez, Gustavo de los Campos

A fast algorithm to factorize high-dimensional Tensor Product matrices used in Genetic Models

  • Genetics (clinical)
  • Genetics
  • Molecular Biology

Abstract Many genetic models (including models for epistatic effects and for genetic-by-environment interactions) involve covariance structures that are Hadamard products of lower-rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available factorization algorithms do not scale well to big data, which makes some of these models infeasible with large sample sizes. Here, based on properties of Hadamard products and the related Kronecker products, we propose an algorithm that produces an approximate decomposition and is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the method with an analysis of data from the northern testing locations of the G×E project of the Genomes-to-Fields Initiative (n∼60,000). We implemented the proposed algorithm in the open-source ‘tensorEVD’ R-package.
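The abstract does not spell out which properties of Hadamard and Kronecker products are used, so the following is only a minimal sketch of two standard identities that plausibly underlie this kind of approach, not the tensorEVD implementation. It checks, on toy data, that (a) a Hadamard product of two "expanded" covariance matrices equals a set of rows and columns selected from the Kronecker product of the small matrices, and (b) the eigenpairs of a Kronecker product are products of the eigenpairs of its factors. All names (`kron`, `hadamard`, `expand`, `K1`, `K2`, `g`, `e`) and the toy matrices are assumptions made for illustration.

```python
# Toy illustration in pure Python (no third-party packages).

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expand(k, idx):
    """Z K Z' where row r of Z has a single 1 in column idx[r]."""
    return [[k[i][j] for j in idx] for i in idx]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical small covariance matrices, e.g. 2 genotypes and 2 environments.
K1 = [[2, 1], [1, 2]]   # eigenpairs: (3, [1, 1]) and (1, [1, -1])
K2 = [[5, 3], [3, 5]]   # eigenpairs: (8, [1, 1]) and (2, [1, -1])
eig1 = [(3, [1, 1]), (1, [1, -1])]
eig2 = [(8, [1, 1]), (2, [1, -1])]

# Five records, each linked to one genotype (g) and one environment (e).
g = [0, 1, 1, 0, 1]
e = [0, 0, 1, 1, 1]

# (a) The n x n Hadamard product is a selection from the small Kronecker product.
K_had = hadamard(expand(K1, g), expand(K2, e))
K_kron = kron(K1, K2)
q = len(K2)
rows = [gi * q + ei for gi, ei in zip(g, e)]   # position of each (g, e) pair
K_sub = [[K_kron[r][s] for s in rows] for r in rows]

# (b) Eigenpairs of K1 (x) K2 are (lam1 * lam2, v1 (x) v2).
eigen_ok = []
for lam1, v1 in eig1:
    for lam2, v2 in eig2:
        v = [x * y for x in v1 for y in v2]      # Kronecker product of vectors
        lhs = matvec(K_kron, v)
        rhs = [lam1 * lam2 * x for x in v]
        eigen_ok.append(lhs == rhs)

print(K_had == K_sub, all(eigen_ok))
```

Under these identities, the decomposition of a large Hadamard product can in principle be assembled from the decompositions of the much smaller factor matrices. The selected rows of the Kronecker eigenvectors are generally not orthonormal, which is consistent with the abstract describing the result as an approximate decomposition.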

