Student Projects, Spring 2023 (S23)
In Silico Identification of Key Genes and Pathways Associated with Bipolar Disorder Using GWAS
Abstract
Bipolar disorder (BD) is a chronic and recurrent disorder that affects more than 1% of the global population. Symptoms most commonly begin around 20 years of age, and early onset is associated with a worse prognosis. BD is a leading cause of disability in young people, as it can lead to cognitive and functional impairment and increased mortality, particularly from suicide and cardiovascular disease.
Our analysis drew upon Genome-Wide Association Studies (GWAS) of BD patients from the Psychiatric Genomics Consortium (PGC) and the GWAS Catalog. The analysis mapped 118 genomic risk loci and 539 genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses provided a deeper understanding of the underlying biological processes and crucial pathways related to BD. A comprehensive protein-protein interaction (PPI) network was then established, revealing 16 central hub genes and two notable modules.
Using the Comparative Toxicogenomics Database (CTD), we performed in silico validation of the hub genes. Functional enrichment analysis highlighted the crucial functions of these key genes in biological processes such as antigen processing and presentation and regulation of T-cell mediated immunity. Additionally, we identified 762 microRNAs and 28 transcription factors that target these hub genes, further supporting their significance in BD.
Through a thorough bioinformatics analysis, we gained insights into the underlying mechanisms of BD, identified potential biomarkers for clinical treatment, and uncovered drug targets. These findings enhance our understanding of BD and show potential for improving diagnosis and treatment methods in the future.
Prepared by: student Heba Fikrat Mohammad (هبه فكرت محمد)
Supervised by: Dr. Lama Youssef (لمى يوسف)
Deep learning in clinical epigenetics: shedding new light on pathological processes of Alzheimer's disease in the perspective of therapeutic approaches
Abstract
This thesis discusses Alzheimer’s disease (AD), its impact and pathogenesis, and delves into DNA methylation, the most prominent epigenetic mechanism in the disease. Several analytic methods exist for detecting DNA methylation at cytosine-guanine dinucleotides (CpGs), but none can cover all loci, and they are limited by computational capacity and high cost. Artificial intelligence can therefore be of benefit: the results of analytic methods such as epigenome-wide association studies (EWAS) and whole-genome bisulfite sequencing can be used to train and test models that predict, for instance, previously undetected loci in the genome. It must be noted, however, that artificial intelligence in epigenetics is very new, and only a handful of studies have applied it to identify loci carrying epigenetic tags in Alzheimer’s disease. Moreover, no deep learning models targeted at identifying methylated CpGs in an AD context have been reported so far. Our reference study, EWASplus, for example, uses data from large resources at supercomputer scale to predict new methylated CpG loci related to the disease.
Our goal was to take inspiration from this reference study and try a new kind of model, deep learning, in the hope of coming closer to new therapeutic approaches. We used whole-genome bisulfite sequencing data and EWAS data to train two models: Random Forest Regressor, a machine learning model, and Keras Regressor, a deep learning model. Both were applied to predict previously undetected methylated CpGs on chromosome 19, which contains the most important risk gene in Alzheimer’s disease: APOE.
The models predicted four CpGs on chromosome 19 whose methylation shows a higher correlation with Alzheimer’s disease than the rest.
However, many technical and computational limitations arose in applying the models, leading to low performance. Applying a deep learning model in this epigenetic context nevertheless remains promising, given its generally higher efficacy compared with conventional machine learning.
It is therefore important that studies such as the one presented in this thesis gain access to broader resources, so that the models and datasets can reach their full potential, leading to higher precision and steps closer to Alzheimer’s disease therapy.
Prepared by: student Shahd Nabil Nahhal (شهد نبيل نحال)
Supervised by: Dr. Raouf Hamdan (رؤوف حمدان)
In silico analysis of Continuous glucose monitoring (CGM) results in diabetes mellitus patients; and Automatic Event Detection Using Neural Networks
Abstract
Background: Diabetes mellitus (DM) is a chronic metabolic disorder that results in abnormal blood glucose regulation. People with diabetes are prone to devastating long-term complications, including cardiovascular disease, neuropathy, retinopathy, renal failure, and even death. Keeping blood glucose near normal levels and normalizing patients’ HbA1c lowers the frequency of macrovascular and microvascular complications. Blood glucose monitoring therefore plays a vital role in diabetes care, especially continuous glucose monitoring (CGM), which measures interstitial glucose in real time. However, the huge amount of data obtained from CGM sensors calls for more efficient ways of analysis, which motivates the use of artificial intelligence and deep learning models to better interpret these results. In particular, recurrent neural networks (RNNs) based on Long Short-Term Memory (LSTM) units, which were designed for time-sequence prediction problems, have enabled researchers to propose specialized models that predict future blood glucose values from a patient’s existing data.
Aim: The main purpose of this study is to use artificial intelligence to better analyze patients’ CGM data. A further aim is to use a deep learning model based on an LSTM neural network to predict future trends in patients’ data and help prevent episodes of hyperglycemia or hypoglycemia, in order to improve patients’ treatment plans and quality of life.
Materials and Methods: This study used the Shanghai_T1DM and Shanghai_T2DM datasets. The data were collected through the Diabetes Data Registry and Individualized Lifestyle Intervention (DiaDRIL), which was initiated in 2019 at Shanghai East Hospital and Shanghai Fourth People’s Hospital, affiliated with Tongji University. The datasets contain 3 to 14 days of CGM data for 12 patients with T1DM and 100 patients with T2DM, respectively; some patients have multiple periods of CGM recordings. The CGM data were analyzed using artificial intelligence to characterize each dataset. Furthermore, we calculated the autocorrelation function (ACF) and the percentage of time above range (TAR), below range (TBR), and in range (TIR) for patients in both datasets. We then mapped the data onto risk scores and used an RNN-based neural network to predict future blood glucose values.
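The time-in-range percentages and the autocorrelation function mentioned above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the 70-180 mg/dL target range is an assumed (commonly used) threshold, the function names are hypothetical, and the readings are invented.

```python
from statistics import fmean

# Assumed thresholds in mg/dL (the commonly used 70-180 target range).
LOW, HIGH = 70.0, 180.0


def time_in_ranges(glucose):
    """Return (TBR, TIR, TAR) as percentages of all readings."""
    n = len(glucose)
    below = sum(1 for g in glucose if g < LOW)
    above = sum(1 for g in glucose if g > HIGH)
    in_range = n - below - above
    return tuple(100.0 * k / n for k in (below, in_range, above))


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Hypothetical CGM readings (not taken from the Shanghai datasets).
readings = [65, 80, 110, 150, 190, 210, 170, 120, 90, 68]
tbr, tir, tar = time_in_ranges(readings)
print(f"TBR={tbr:.0f}% TIR={tir:.0f}% TAR={tar:.0f}%")
print(f"ACF(lag=1)={autocorrelation(readings, 1):.2f}")
```

Because the percentages are computed per reading, they equal time percentages only when readings are evenly spaced.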
Results: After applying the model to both Shanghai_T1DM and Shanghai_T2DM, we evaluated its performance using the root mean square error (RMSE). The LSTM model achieved an RMSE of 9.78 mg/dL on the T1DM data and 4.40 mg/dL on the T2DM data. Overall, the models demonstrated high prediction accuracy, as reflected by the low RMSE values, with better performance on T2DM than on T1DM. We also assessed the clinical safety of the glucose predictions using the Clarke Error Grid (CEG). In the T1DM data, most predictions fell in zones A or B, which are clinically accurate or clinically benign, respectively, and very few were inaccurate or potentially clinically harmful. In the T2DM data, most predictions fell in zone A (clinically accurate) and the rest in zone B (clinically benign).
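For reference, the two evaluation measures can be sketched as below. This is an illustration under stated assumptions rather than the thesis code: only zone A of the Clarke Error Grid is checked (prediction within 20% of the reference, or both values below 70 mg/dL), the function names are hypothetical, and the paired values are invented.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between paired values (same unit as the inputs)."""
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def in_clarke_zone_a(reference, predicted):
    """Zone A: within 20% of the reference, or both values below 70 mg/dL."""
    if reference < 70 and predicted < 70:
        return True
    return abs(predicted - reference) <= 0.2 * reference


# Hypothetical paired values in mg/dL (not results from the study).
reference = [100, 150, 200, 60]
predicted = [110, 140, 190, 65]
zone_a = sum(in_clarke_zone_a(r, p) for r, p in zip(reference, predicted))
print(f"RMSE = {rmse(reference, predicted):.2f} mg/dL")
print(f"Zone A: {zone_a}/{len(reference)} predictions")
```

A full Clarke Error Grid also assigns zones B to E, which requires the complete set of region boundaries and is omitted here.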
Conclusion: In this study, we show that our LSTM model was able to predict glucose values accurately and safely. Translation of our prediction models to individuals with type 1 diabetes showed encouraging results, with high precision in the predictions. As such, the prediction model could be used to improve closed-loop insulin delivery systems by overcoming sensor delay, and longer prediction intervals may be used to safely bridge periods of sensor malfunction. In addition, analyzing CGM data in T2DM and accurately predicting a patient’s glucose at different intervals can help improve drug choices based on the trends in the data. Future research could include meals and delivered insulin doses in the model in order to computationally determine the optimal insulin dose independently of the patient’s input.
Prepared by: student Saraa Mahmoud Abdulwahab (سراء محمود عبد الوهاب)
Supervised by: Dr. Raouf Hamdan (رؤوف حمدان)
Broad Neutralization Effects of Monoclonal Antibodies Targeting the Stem Helix of MERS-CoV: A Computational Study using AutoDock Vina, HADDOCK and PyMOL Analysis
Abstract
With the emergence of SARS-CoV-2 variants of concern (VOCs) and other zoonotic coronaviruses with pandemic potential, research efforts are focusing on vaccines and antibodies that target the most conserved regions of the spike protein. Middle East Respiratory Syndrome Coronavirus (MERS-CoV) continues to pose a significant global health threat. Across the coronavirus family, the receptor binding domain is poorly conserved, so therapeutics that target the receptor binding function have low potential as a pan-coronavirus solution. An alternative, relatively conserved target on the coronavirus spike is the stem helix in the S2 region, which harbors neutralizing epitopes and is therefore of interest for generating vaccines effective against pan-beta-coronaviruses.
Monoclonal antibodies (mAbs) possessing broad neutralization capabilities against HCoVs offer a promising avenue for treatment, as there is currently no vaccine or treatment approved against MERS-CoV. This thesis leverages computational methodologies, notably AutoDock Vina and HADDOCK, to explore the neutralizing effects of broadly neutralizing antibodies (bnAbs) targeting the stem helix of MERS-CoV and SARS-CoV-2. Through method optimization and validation against experimental data, the study aims to efficiently identify potential drug candidates among bnAbs. This approach promises to reduce resource expenditure and streamline subsequent clinical investigations, potentially accelerating targeted therapy development against MERS-CoV while minimizing research costs.
This work builds on the comprehensive study by Zhou et al., which isolated a substantial panel of β-CoV stem-helix bnAbs and whose structural analyses unveiled the molecular underpinnings of their broad reactivity. That study determined crystal structures of four bnAbs (CC25.106, CC95.108, CC68.109, and CC99.103) in complex with beta-coronavirus spike stem-helix peptides at resolutions ranging from 1.9 to 2.9 Å. Employing molecular docking simulations via AutoDock Vina and HADDOCK 2.4, this investigation aims to predict the binding modes and affinities of five bnAbs (CC25.106, CC95.108, CC99.103, CC9.113, CC25.36) against the stem helix epitopes of both viruses. Additionally, it explores the dynamic behavior and conformational changes of these complexes through molecular dynamics simulations.
The analysis integrates PyMOL visualization to elucidate and interpret binding modes, emphasizing crucial residue interactions governing binding specificity, affinity, and stability of bnAb-stem helix complexes. The synthesis of computational outcomes with experimental data and existing literature aims to enhance the reliability and relevance of findings. By elucidating the molecular mechanisms governing bnAb interactions with conserved MERS-CoV epitopes, this study seeks to contribute to the development of broad-spectrum antiviral strategies targeting coronaviruses. Evaluated across both viruses, the assessment of five distinct bnAbs reveals comparable neutralization potency against SARS-CoV-2 and heightened efficacy against replication-competent MERS-CoV. Notably, while CC25.106 displayed superior performance in combating beta-coronavirus disease, CC9.113 emerged as a promising therapeutic candidate due to its favorable binding characteristics. Despite inherent limitations, this study underscores CC9.113's potential for therapeutic development against coronaviruses, advocating for further exploration across a broader spectrum of bnAbs to streamline future therapeutic initiatives.
Prepared by: student Shatha Munjed Al-Freijat (شذى منجد الفريجات)
Supervised by: Dr. Bassem Asfour (باسم عصفور)
PGx ExploreEZ
A Web-Based User-Friendly Tool for Exploration of Pharmacogenomics Reference Resources
Abstract
One size does not fit all: a medication given at the same dosing regimen may not be effective or safe for every patient with the same disease, owing to various factors. One key factor is genetic variation between individuals.
Pharmacogenomics (PGx), a rapidly evolving field within precision medicine, has the potential to revolutionize healthcare by tailoring treatments based on an individual's genetic makeup, which can optimize treatment outcomes and minimize the risks of adverse reactions.
However, several barriers hinder the successful implementation of pharmacogenomics in clinical practice. A major challenge is the lack of knowledge among healthcare providers, researchers, and other targeted communities.
To address this barrier, we present “PGx ExploreEZ”, a web-based, user-friendly tool for exploring pharmacogenomics reference resources, developed using the R Shiny package. Supplied with manually collected and curated data about gene-drug associations, clinical recommendations, and other relevant data from pharmacogenomics reference resources, this tool serves as a gateway to explore these resources easily.
With its user-friendly and interactive interface, “PGx ExploreEZ” enables healthcare professionals, researchers, and interested users to easily access and explore essential information in pharmacogenomics.
“PGx ExploreEZ” aims to simplify the process of accessing valuable insights in the field of pharmacogenomics in order to bridge the knowledge gap and pave the way for the implementation of pharmacogenomics into clinical practice.
Prepared by: student Rasha Abdulkader Hamameh (رشا عبد القادر حمامه)
Supervised by: Dr. Lama Youssef (لمى يوسف)
In-silico analysis of culture media miRNA as potential non-invasive biomarker for embryo selection in IVF cycles
Abstract
Background: In vitro fertilization (IVF) is widely used to overcome numerous reproductive challenges, but implantation failure and early pregnancy loss are common issues that affect IVF success rates. Biological markers of embryo viability still need optimization and require invasive biopsies. Less invasive methods are therefore needed for selecting the embryos with the highest implantation potential, especially when only one embryo is going to be transferred back to the uterus.
MiRNAs have been detected in spent culture media (SCM), with unique expression profiles associated with embryonic developmental and chromosomal status, sexual dimorphism, reproductive competence after transfer to the uterus, fertilization method, blastocyst day (day 6 versus day 5), and trophectoderm (TE) morphology grade. This indicates that miRNAs deserve further exploration for non-invasive embryo selection.
Methods: We selected the study by Wang S. (2021) and used its raw count dataset, available in the GEO database, for in silico analysis. Differentially expressed miRNAs (DEmiRNAs) between the non-pregnant and pregnant groups on day 3 and day 5 of in vitro embryo development were identified using the DESeq2 tool in the RStudio graphical user interface. The genes targeted by the resulting DEmiRNAs were then identified using the miRDB and TargetScan tools, and functional enrichment analysis was performed using DAVID and Metascape. The SRplot website was used to produce additional plots throughout the study. Finally, the results were interpreted by integrating all of the information produced and comparing it with the results of previous studies.
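To give a sense of the first step of such a count-based comparison, the sketch below computes median-of-ratios size factors (the normalization idea DESeq2 is built on) and a simple log2 fold change between two groups. It is a simplified stand-in written in plain Python, not the DESeq2 analysis itself: DESeq2 additionally fits negative binomial models and estimates dispersions, none of which is reproduced here. The function names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[i][j] is the raw count of miRNA i in sample j."""
    n_samples = len(counts[0])
    # Only rows without zeros have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    return [
        math.exp(median(math.log(row[j]) - g for row, g in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]


def log2_fold_change(row, factors, group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean normalized counts, group_b over group_a (sample indices)."""
    norm = [c / f for c, f in zip(row, factors)]
    mean_a = sum(norm[j] for j in group_a) / len(group_a)
    mean_b = sum(norm[j] for j in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical raw counts: 3 miRNAs x 4 samples (2 non-pregnant, 2 pregnant).
counts = [
    [100, 120, 210, 190],
    [50, 60, 55, 45],
    [30, 36, 33, 27],
]
factors = size_factors(counts)
lfc = log2_fold_change(counts[0], factors, group_a=[0, 1], group_b=[2, 3])
print([round(f, 2) for f in factors], round(lfc, 2))
```

A positive value indicates higher normalized expression in the second group; statistical significance would still require a proper test such as the one DESeq2 provides.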
Results: On day 5, the significant DESeq2 results for miRNAs differentially expressed in embryo culture media (CM) according to pregnancy outcome comprised 11 novel DEmiRNAs and 5 known DEmiRNAs (hsa-miR-629-5p, hsa-miR-30a-3p, hsa-miR-99a-5p, miR-199a-3p > miR-199b-3p, hsa-miR-199a-5p). On day 3, there were 14 differentially expressed miRNAs, all novel. The known differentially expressed miRNAs were pooled together with their original raw counts for comparison. Day 5 samples showed better separation between the outcome-labeled clusters (non-pregnant, pregnant). Of these pooled DEmiRNAs, two showed the clearest and most consistent pattern in day 5 SBCM: hsa-miR-99a-5p and hsa-miR-30a-3p.
Functional enrichment analysis of hsa-miR-99a-5p indicated its association with biological processes including embryonic morphogenesis and signaling pathways regulating pluripotency of stem cells. miR-30a-3p was associated with embryo development ending in birth or egg hatching.
Conclusion: The miRNAs differentially expressed in day 5 embryo culture media according to pregnancy outcome included 11 novel and 5 known DEmiRNAs (hsa-miR-629-5p, hsa-miR-30a-3p, hsa-miR-99a-5p, miR-199a-3p > miR-199b-3p, hsa-miR-199a-5p).
Prepared by: student Alia Khaled Al-Dairi (علياء خالد الديري)
Supervised by: Dr. Majd Aljamali (مجد الجمالي)
Prediction of Malignant Tumors in Breast Cancer Using Artificial Intelligence Tools (Machine Learning)
Abstract:
Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict the prognosis of breast cancer. However, most of these analyses were performed using basic statistical methods. Therefore, this study aims to use machine learning techniques to build models with high accuracy and sensitivity for detecting breast cancer malignancy based on many variables, so that the patient's treatment protocol can be adjusted quickly and mortality reduced as much as possible.
We used a dataset from Kaggle after processing and visualizing it. The final dataset consisted of 569 samples, 21 inputs, and one output (malignant or benign tumor).
Our study showed that all machine learning algorithms achieved an accuracy greater than 99% under the first approach (test set = 25%): the decision tree, logistic regression, and random forest ranked first with an accuracy of 100%, followed by the rest of the algorithms at 99.3%.
We also found that the accuracy of many algorithms decreased slightly under the second approach (test set = 40%), reaching 99.56%. Moreover, after hyperparameter optimization, the accuracy of the SVM increased from 99.56% to 100%. The performance of this classifier can be described as balanced.
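As a quick arithmetic check on these figures, the sketch below works out the test-set sizes implied by the two splits and the accuracy that a single misclassified sample would give. It assumes the test size is the fraction of the 569 samples rounded up, which is how scikit-learn's `train_test_split` handles a fractional test size; whether the thesis used that function is not stated. The helper names are hypothetical.

```python
import math

N_SAMPLES = 569  # dataset size reported above


def holdout_size(n_samples, fraction):
    """Assumed rounding rule: the held-out size is fraction * n_samples, rounded up."""
    return math.ceil(fraction * n_samples)


def accuracy_percent(n_test, n_errors):
    """Accuracy in percent for a given number of misclassified test samples."""
    return 100.0 * (n_test - n_errors) / n_test


for fraction in (0.25, 0.40):
    n_test = holdout_size(N_SAMPLES, fraction)
    print(fraction, n_test, round(accuracy_percent(n_test, 1), 2))
```

Under this assumption, the reported 99.3% and 99.56% are each consistent with exactly one misclassified sample in a test set of 143 and 228 samples, respectively, so the differences between the algorithms rest on very few samples.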
In conclusion, this study underscores the importance of selecting appropriate classification algorithms for predicting breast cancer patient outcomes. These findings contribute to the field of breast cancer prognosis and provide insights for improving personalized treatment strategies.
Prepared by: student Raghad Rifai Abdulaziz (رغد رفاعي عبد العزيز)
Supervised by: Dr. Yanal Alkuddsi (ينال القدسي)
QSAR and 3D-QSAR Principles and applications in Drug Design (antineoplastic drugs)
Abstract:
Quantitative structure-activity relationship (QSAR) and 3D-QSAR techniques marked a major milestone in drug design, especially for antineoplastic drugs.
QSAR models use molecular descriptors to predict the relationship between chemical structure and biological activity, which aids in designing and developing potent compounds.
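The idea of relating a descriptor to activity can be illustrated with the simplest possible case, a one-descriptor linear model fitted by ordinary least squares. This is a toy sketch only: the descriptor (logP), the activity scale (pIC50), the function name, and all values are hypothetical, and real QSAR models typically use many descriptors with proper validation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: one descriptor (logP) and one activity (pIC50) per compound.
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.1, 5.9, 7.1, 7.9]
slope, intercept = fit_line(logp, pic50)
predicted = slope * 2.5 + intercept  # activity estimate for a new compound
print(f"pIC50 = {slope:.2f} * logP + {intercept:.2f}; prediction at logP 2.5: {predicted:.2f}")
```

The fitted equation is then used to estimate the activity of compounds that have not been synthesized or tested yet, which is the core use of a QSAR model in design.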
Classical QSAR models have limitations in drug design. Thus, 3D-QSAR methods were developed in order to provide more accurate results of the drug-target interactions.
The application of these methods has advanced drug design and led to antineoplastic drugs with improved efficacy and reduced side effects and toxicity.
In conclusion, QSAR and 3D-QSAR play a vital role in the development of more effective and more selective drugs.
Prepared by: student Aya Ahmad Hassan Al-Masri (آيه احمد حسان المصري)
Supervised by: Dr. Khansaa Hussein (خنساء حسين)
Investigating the correlation between Colorectal cancer mutational profile and the associated microbiota on Tumor and matched normal healthy tissue; A computational analysis
Abstract:
Colorectal cancer (CRC) is a prevalent and deadly malignancy with a significant global burden. It arises from the accumulation of genetic and epigenetic changes that transform normal colonic epithelial cells into adenocarcinomas. The microbiome plays a crucial role in CRC development. Bacterial biomarkers have prognostic value and hold potential for CRC detection and clinical outcome prediction.
The human gut microbiota is a vibrant ecosystem teeming with bacteria, viruses, fungi, and archaea, residing in a harmonious relationship with the host. It profoundly influences various aspects of human health, playing a crucial role in maintaining gut homeostasis, immune function, and metabolism.
In recent years, the association between colorectal cancer (CRC) and the microbiota has gained significant attention. Emerging evidence suggests that dysbiosis, a disruption in the gut microbiota's composition, may contribute to the initiation and progression of CRC. Studies have revealed distinct alterations in gut microbiota composition and diversity in individuals with CRC compared to healthy controls. These alterations encompass shifts in microbial taxa, decreased microbial diversity, and changes in microbial metabolites. Specific bacterial species, such as Fusobacterium nucleatum, Bacteroides fragilis, and certain Enterococcus and Escherichia coli strains, have been implicated in CRC pathogenesis due to their capacity to promote inflammation, produce genotoxins, or modulate the tumor microenvironment.
In this study, we therefore used 16S rRNA data from 60 samples belonging to 30 patients, taken from tumor tissue and matched normal healthy tissue. The data were characterized using Linux shell commands, Bash scripting, and the R programming language in RStudio, with various microbiome processing packages and tools. We used the DADA2 package for an amplicon sequence variant (ASV) based approach. DADA2 (Divisive Amplicon Denoising Algorithm 2) is a widely used pipeline for analysing amplicon sequencing data. It identifies and quantifies microbial communities in three main steps: error estimation, denoising, and chimera removal. Its denoising algorithm is particularly effective in handling error-prone amplicon sequencing data and can significantly improve the accuracy of microbial community analysis.
The final product of the DADA2 workflow is the taxonomy table corresponding to the data, which is then passed to other packages for further manipulation, filtering, and downstream analysis.
After further statistical analysis with various measures commonly used in microbiome studies, we compared the microbiome composition between tumor and matched healthy tissue in patients with CRC. Our findings align with previous studies highlighting the dominance of the Firmicutes and Bacteroidetes phyla in the gut microbiome. While overall diversity may not be affected, the presence of a tumor may influence the abundance of specific rare taxa. Differential abundance analysis identified the genus Ruminococcus, within the Firmicutes phylum, as significantly enriched in cancer tissues. This finding is intriguing, considering the potential role of Ruminococcus species in promoting tumor growth and pro-inflammatory responses.
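One diversity measure commonly used in microbiome studies is the Shannon index; the abstract does not name the measures actually applied, so its use here is an assumption. The sketch below computes it for one hypothetical tumor/normal pair with invented read counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon for one tumor/normal pair (illustrative only).
tumor = [500, 300, 150, 50]
normal = [480, 310, 160, 50]
print(f"tumor H = {shannon_index(tumor):.3f}")
print(f"normal H = {shannon_index(normal):.3f}")
```

Similar index values for the two tissues, as in this toy pair, would correspond to the case where overall diversity is unaffected even if individual taxa differ in abundance.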
Prepared by: student Zeina Hussam Al-Jandali (زينه حسام الجندلي)
Supervised by: Dr. Majd Aljamali (مجد الجمالي)
Studying the cases of heart and arteries nutrition for cardiac patients in the Syrian community using bioinformatics tools
Abstract:
Cardiovascular diseases (CVD) are among the leading causes of death worldwide. Although many habits, such as smoking, and comorbidities are recognized risk factors for developing CVD, poor eating habits should also be taken into consideration.
Bioinformatics tools provide powerful computational methods for analyzing CVD data. The aim of this study was therefore to develop machine learning models that predict the different CVDs, to support the right decisions in patients' treatment protocols.
The dataset was collected from Al-WATANI hospital in Sweida and included patients' demographic, comorbidity, and dietary data. It was split into training (60%) and test (40%) sets for building and evaluating the models.
Our study included 183 patients, of whom 111 had hypertension, 33 had infarction, 20 had congestive heart failure, and 19 had arrhythmia. The accuracy of the algorithms varied: the support vector machine achieved the lowest accuracy, 71.62%, which increased remarkably to 91.74% when balanced weights were applied. The decision tree and random forest (tree depth = 5) achieved the same accuracy of 85.14%. When the tree depth was increased to 10, 15, or 20, the accuracy rose to 87.84% and remained steady.
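The effect of the class imbalance reported above can be made concrete with a short calculation. The sketch assumes that "balanced weights" means the usual inverse-frequency weighting, weight = n_samples / (n_classes * class count), which is the formula behind scikit-learn's `class_weight="balanced"`; the abstract does not confirm which implementation was used. The function name is hypothetical.

```python
# Class counts as reported in the abstract.
counts = {
    "hypertension": 111,
    "infarction": 33,
    "congestive heart failure": 20,
    "arrhythmia": 19,
}


def balanced_weights(class_counts):
    """Weight for class c = n_samples / (n_classes * count_c)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * k) for c, k in class_counts.items()}


weights = balanced_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")

# Share of the largest class, i.e. the accuracy of always predicting it.
majority_share = 100.0 * max(counts.values()) / sum(counts.values())
print(f"majority-class baseline: {majority_share:.2f}%")
```

Because hypertension accounts for about 61% of the patients, accuracy alone can look favorable for a model that mostly predicts that class; weighting the rarer classes more heavily, as in this formula, counteracts that tendency.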
These models demonstrated high accuracy and reliability, making them valuable tools for clinical decision-making.
Prepared by: student Basma Asaad Al-Ashoush (بسمه أسعد العشعوش)
Supervised by: Dr. Yanal Alkuddsi (ينال القدسي)