scKTLD is a method designed for the identification of TAD-like domains on single-cell Hi-C data. It treats the Hi-C contact matrix as a graph, embeds its structures into a low-dimensional space by combining sparse matrix factorization and spectral propagation, and identifies the TAD-like domains in the embedding space via a kernel model optimized by PELT
2.1 OS
- ubuntu 18.04
2.2 Required Python Packages Make sure that all the packages listed in the requirements.txt are installed.
- Python >= 3.6
- scipy >= 1.5.2
- numpy >= 1.18.0
- networkx >= 2.5
- scikit-learn >= 0.24.2
2.3 Install from Github
(1) Download the folder scKTLD by git clone
$ git clone https://github.com/lhqxinghun/scKTLD/
(2) Install the package scKTLD with the following command:
$ conda create -n scKTLD python=3.6
$ conda activate scKTLD
$ pip install Cython
$ cd scKTLD
$ pip install . #or you can try python setup.py install
2.4 Or install from the standard package source PyPI
$ conda create -n scKTLD python=3.6
$ conda activate scKTLD
$ pip install Cython scKTLD
2.5 Run example
$ cd scKTLD
$ python example.py
# If it works properly, You can find the result files in the output directory, including the .txt file that contains the
identified TAD-like domain boundaries and the .tiff file for visualization. The .tiff file in the example is shown as follows:
More detailed examples can be find in the jupyter notebook example.ipynb
(1) The key function in this package is callTLD, it has the following input and output:
- graph np.ndarray, the dense format of a contact matrix, i.e. n×n matrix, An example is shown in "./data/exp-sc/gm12878_cell7_chr3_dense.txt"
- dimension int, dimension of the embedding vectors of nodes.
- penalty float, penalty constant during changepoint detection.
- brecon bool, whether or not to return the reconstructed Hi-C map.
If brecon is false, the callTLD function will only return a list of domain boundaires, else it will return the domain boundaries as well as a reconstructed Hi-C map
(2) For sparse format of a contact matrix, scKTLD provides function edge2adj to convert it to an adjacency matrix (dense format), which can be directly input to the fucntion callTLD
- edge np.ndarray, the sparse format of a contact matrix, i.e. three columns. An example is shown in "./data/exp-sc/gm12878_cell7_chr3_sparse.txt".
- chr string, chromosome number, e.g. 'chr1'
- resolution int, resolution of the contact matrix, e.g. 50000
- reference string, reference genome, e.g. "mm9"
The contact matrix in dense format