This is Sihao's personal page. Please feel free to have a look at the profiles and the GitHub repos.
In this project, we developed NanoSPA, a computational pipeline that identifies m6A and pseudouridine modifications transcriptome-wide and simultaneously in the same nanopore direct RNA sequencing sample. We developed a new neural network model for m6A prediction and fused it with the published pseudouridine prediction pipeline NanoPsu. Applying NanoSPA to human cells, we found a negative correlation between pseudouridine and m6A. Both modifications were found to promote translation, with pseudouridine having a stronger effect than m6A. This is a pioneering study of the interplay of multiple RNA modifications.
In this project, we used more than 100 features from host tRNA to predict the severity of COVID-19. I mainly helped build logistic regression (LR) models to classify mild and severe cases. Model performance was evaluated on selected features covering tRNA abundance, modifications, and fragmentation. This project showed that tRNA can feasibly serve as a biomarker for predicting COVID-19 severity.
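As a rough illustration of the kind of binary classifier described above, here is a minimal logistic regression sketch in plain Python. Everything in it is an assumption made for illustration: the three synthetic features stand in for tRNA-derived features, the labels 0 and 1 stand in for mild and severe cases, and the function names and training settings are hypothetical. It is not the code or data used in the project.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss.
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    # Label 1 (hypothetical "severe") when the predicted probability
    # reaches the threshold, otherwise 0 (hypothetical "mild").
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def make_toy_data(n_per_class=50, seed=0):
    # Synthetic stand-in data: two Gaussian clusters in three dimensions.
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.5) for _ in range(3)])
            y.append(label)
    return X, y


X, y = make_toy_data()
w, b = train_logistic_regression(X, y)
preds = predict(X, w, b)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

The accuracy here is measured on the training data only, which is enough for a toy example. A real evaluation would use held-out samples or cross-validation.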
In this project, we developed “NanoPsu”, a machine-learning-based method for identifying pseudouridine (psU) modifications in the human transcriptome from Nanopore direct RNA sequencing data. We trained the models on known psU sites in rRNA from multiple species, including the human stool microbiome. We applied NanoPsu to interferon (IFN) treated samples and found that IFN-induced genes carry more psU. The open-source Python package for the psU identification protocol can be found here. A news report about this work can be found here.
In this project, we developed dU-seq, an NGS-based method for genome-wide mapping of deoxyuridine (dU) in human cells. The method revealed thousands of dU sites, which were enriched in centromeric DNA, especially in the binding regions of CENP-A (a histone H3 variant).