Computational Drug Design against Ebola Virus Targeting Viral Matrix Protein VP30

Ebola virus (EBOV) is the deadly etiological agent of Ebola viral disease (EVD), earlier known as Ebola hemorrhagic fever. It belongs to the Filovirus family and order Mononegavirales, which contains negative-sense single-stranded RNA with a protective envelope (Geisbert & Jahrling, 1995). The first outbreak was recorded in 1976, following which EVD outbreaks have proved to be persistently pandemic while the mortality rate is 90% (Feldmann & Geisbert, 2011). According to WHO, five strains of EBOV have been identified, causing a total of 25 outbreaks and differentiated based on virulence properties and geographical patterns observed (Picazo & Giordanetto, 2015). Zaire Ebola virus (ZEBOV), BundibugyoEbola virus (BDBV), Reston Ebola virus (RESTV), Sudan Ebola virus (SUDV), and Taï Forest Ebola virus (TAFV) are the five strains named after the places affected by the outbreaks. The latest outbreak in 2014 was caused by the most lethal Zaire strain in West Africa, resulting in 11,000 deaths (Baize et al.., 2014). Though the reservoirs are yet to be confirmed, fruit bats, gorilla, antelopes and chimpanzees are considered to be potential hosts for serious transmission (Ascenzi et al., 2008). Followed by this, in humans, it is spread through blood and contact with fluid body secretions. The infection projects itself in two phases with symptoms ranging from fever, fatigue, rashes, vomiting in the early phase, whereas in the late phase hemorrhagic shock occurs eventually resulting in death (Chiappelli et al., 2015). Several diagnostic and laboratory tests are used to examine this deadly virus. Some of the investigations are Computational Drug Design against Ebola Virus Targeting Viral Matrix Protein VP30


INTRODUCTION
Ebola virus (EBOV) is the deadly etiological agent of Ebola viral disease (EVD), earlier known as Ebola hemorrhagic fever. It belongs to the Filovirus family and order Mononegavirales, which contains negative-sense single-stranded RNA with a protective envelope (Geisbert & Jahrling, 1995). The first outbreak was recorded in 1976, following which EVD outbreaks have proved to be persistently pandemic while the mortality rate is 90% (Feldmann & Geisbert, 2011). According to WHO, five strains of EBOV have been identified, causing a total of 25 outbreaks and differentiated based on virulence properties and geographical patterns observed (Picazo & Giordanetto, 2015). Africa, resulting in 11,000 deaths (Baize et al.., 2014).
Though the reservoirs are yet to be confirmed, fruit bats, gorilla, antelopes and chimpanzees are considered to be potential hosts for serious transmission (Ascenzi et al., 2008). Followed by this, in humans, it is spread through blood and contact with fluid body secretions. The infection projects itself in two phases with symptoms ranging from fever, fatigue, rashes, vomiting in the early phase, whereas in the late phase hemorrhagic shock occurs eventually resulting in death (Chiappelli et al., 2015). Several diagnostic and laboratory tests are used to examine this deadly virus. Some of the investigations are CBC profile, ELISA, RT-PCR and cell culture techniques (Broadhurst et al., 2016).
The filamentous EBOV is composed of approximately 19kb genome (Yu et al., 2017;Zhao, 2016) with genes coding for 4 structural proteins such as VP30, VP35, nucleoprotein (NP), polymerase protein and three membrane-associated proteins such as VP40, glycoprotein and VP24 (Mühlberger, 2007;Easton et al., 2018). The two molecular forms of viral glycoprotein (GP1 & GP2) in combination with host proteins like Niemann-Pick C1 (NPC1) mediate viral entry (Lee & Saphire, 2009). Once the viral attachment is complete through the GP1 -host cellular receptor interaction, the virions enter the cell through endocytosis. The threedomain matrix protein VP40 facilitates virion maturation and viral assembly (Hoenen et al., 2005). VP40, which is present profusely, also maintains viral integrity (Passi et al., 2015). VP35 is bi-functional by both being a part of viral replication and also inhibiting IFNs production in the host cell (Basler, 2015 (Cantoni & Rossman, 2018). The interaction of VP30 with nucleocapsid (NP) brought by the phosphorylation of the N-terminal region, directly affects the transcription process (Biedenkopf et al., 2013). Thus, the VP30 bound NP complex is required for transcription initiation (Weik et al., 2002). The ribonucleoprotein is released from the virion and serves as a template for transcription. Mutation studies reveal that the Phosphorylation sites of VP30 at serine residues is the core requirement for transcription (Martínez et al., 2008).
At present, there is no licenced anti-viral treatment available for EVD (Schuler et al., 2017). Factors such as high resistance, the requirement of advanced biosafety levels in clinical trials and the need for further exploration of pathophysiology and complex molecular mechanism makes it highly challenging to combat EBOV (Dhama et al., 2018). Several in silico drug design approaches prove potentially useful for developing anti-viral drugs against EBOV disease as computational biology provides various tools to obtain inherent knowledge of protein structure and interactions to identify potential lead molecules (Schuler et al., 2017). Also, they aim to simulate models of the real world that requires validation through wet-lab experiments.
Most of the Ebola drug design research focuses on targeting VP35 and VP40 because of their importance in viral replication and assembly (Nasution et al., 2018;Roca et al., 2015) but the recent findings describe that intermolecular interactions between VP30 and NP regulate viral mRNA synthesis. NP binds to VP30 at a strongly conserved region in a narrow groove (Xu et al., 2017). This indicates that VP30 is a promising drug target to block viral replication at the initial stage. Hence, this computational approach identified and analyzes the properties of drug-likeness to filter compounds and were subjected to molecular docking identifying potential inhibitors based on binding energy and interaction properties.

Identification of lead molecules
One hundred and twenty lead molecules with known antiviral properties were selected from literature search (De Clercq, 2015;Huggins et al., 1999;Kouznetsova et al., 2014;Schuler et al., 2017;Shen et al., 2015;Vlietinck et al., 1998). These 120 compounds were from both synthetic and non-synthetic (plant) sources. The SMILES file format of the compounds were retrieved from PubChem database (Kim et al., 2016). SMILES are line notations distinct for different structures that are required to obtain two-dimensional and three-dimensional models of the molecules. Later the SMILES format was converted into PDB format using online CORINA server that generates 3D structures (Sadowski et al., 1994).

Protein structure preparation
The protein crystal structure of Ebola VP30 protein (PDB code: 5DVW) with the resolution of 1.75Å was retrieved from RCSB Protein Data Bank (PDB), a repository providing crystal structures of biotic macromolecules (Berman et al., 2000). The data in PDB is deposited from various researches based on the results of X-ray crystallography and NMR spectroscopy and is validated using defined standards of stereochemistry (Gore et al., 2017). The water molecules and heteroatoms were removed, and the protein molecule was subjected to energy minimization using Swiss PDB Viewer, a tool for comparative protein analysis (Guex & Peitsch, 1997).

Analyzing properties of drug-likeness
All the 120 lead compounds were analyzed for Lipinski rule satisfaction and subjected to ADMET analysis using pkCSM online server (Pires et al., 2015). Lipinski rule of five is the primary criteria to evaluate small lead molecules. This aids in discerning capable drug-like molecules using filters and thus enabling to predict clinical success with higher probability. The five parameters of this rule (Lipinski, 2004), demands the molecular weight to be higher than 500, more than five hydrogen donors and greater than 10 Hydrogen acceptors. Also, the Log P-value should be greater than 5.0 and with ten or more than ten rotatable bonds. The first three parameters correlate with 90% clinical success.
Rotational bonds higher than ten indicate low bioavailability and the ideal value of log P or log D indicating lipophilicity is between 0-3. Compounds that met these criteria were subjected to ADMET analysis.
ADMET analysis allows to quickly scan through 140 properties to identify lead compounds that are potential with high binding affinity and non-toxic to the human body while absorption and distribution rate are other key factors (Venkatesan et al., 2018). Performing ADMET tests in vivo are highly time-consuming and are not costeffective. Computational tools serve to provide quick screening of toxicity and pharmacokinetic properties.
One such tool is pkCSM that represent chemical entities mathematically based on graphical modeling. Some parameters are drug-specific, and in our study parameters such as water solubility, intestinal permeability, not less than 30%, CaCO-2 permeability that predicts absorption of orally administered drugs, AMES toxicity to test mutagenicity, hepatotoxicity and inhibition of hERG potassium channels which indicates the drug can lead to QT syndrome were considered.
Finally, the filtered compounds were subjected to molecular docking study.

Binding site analysis of VP30
The functional residues of VP30 were analyzed to know the active binding site. Based on mutagenesis analysis ARG179, LYS180, and LYS183 in the secondary nucleoprotein VP30 have been identified as the base residues, which are involved in consorting itself with the nucleocapsid activation of transcription (de La Vega et al., 2015). Mutations in these residues lead to loss of transcription function. The binding site centered on Lys 180 can thus be considered as an expedient target for small inhibitory molecules (Salata et al., 2019). CASTp server (Binkowski et al., 2003) is an online resource for studying surface topography and functional region of proteins used to analysis binding site residues. It depicts concave cavities on protein surfaces and allows the user to explore the interior of the protein providing information about the functional notations.

Molecular docking
Molecular docking is an in silico tool that has an increasing significance in drug design (de Ruyck et al., 2016). It allows characterization of protein-ligand interaction at the atomic level and combines the properties of quantum and molecular mechanics (Aminpour et al., 2019). The requisites of docking include the tertiary structure of the protein, pdbqt format of ligand, binding mode and involves four classes of scoring. While docking can be rigid or flexible, we performed rigid docking where the bond length and angles are kept constant. Compounds filtered from ADMET analysis were subjected to docking using AutoDock 4.2 (Forli et al., 2016). Protein preparation is the vital step of docking that requires the removal of water molecules, whereas hydrogen atoms and Kollman charges were added (Xie et al., 2015). The size of the grid box was set to 84 × 90 × 92 to enable docking only within the binding site. A threshold of -6.5 kcal/mol was set to select potential inhibitors. The Lamarckian genetic algorithm was used to allow flexible docking between ligands and the receptor. Out of 10 confirmations obtained, the first two compounds with the least binding energy were taken as the best-docked conformations (Garcia-Godoy et al., 2015). The docking results were analyzed by comparing with the control drug BCX4430, an antiviral adenosine analog from BioCryst Pharmaceuticals. It blocks the function of viral polymerase and acts against a broad range of Filovirus by instigating RNA chain termination (Warren et al., 2014). Intramuscular injection of this drug confers protection in rodents exposed to EBOV, thus, making it a controlled drug for docking studies of EBOV (Raj & Varadwaj, 2016). The control drug was docked with VP30, and its interactions were studied in similar conditions. The docking results were visualized using PyMOL software that delivers high-quality 3D images of the ligand-bound to its protein (Seeliger & de Groot, 2010). The interacting residues for the control and test compounds were analyzed using LigPlot+, a graphical system that reads 3D coordinates and fosters details of protein-ligand interaction as several two-dimensional figures (Laskowski & Swindells, 2011).

Bioactivity score prediction
In silico screening of structural properties related to the pharmacological activity of the selected molecules can aid in understanding their similarity to the known drugs.
Molinspiration, a web-based tool was utilized to calculate the bioactivity score of the best-docked complexes and the control drug. Indispensable properties involved for a lead molecule to bind to its receptors of drug target like GPCRs, nuclear receptors, ion channel modulation, inhibition of kinases, proteases and enzymes were calculated (Khan et al., 2017). The input is imported as SMILES structure, and the predicted bioactivity of the screened compounds is given as a numerical score between -5 and +5.

ADMET results and active site residues
This computational study was initiated with the screening of retrieved 120 antiviral compounds for ADMET properties. The results from pkCSM tool for  (Hartlieb et al., 2007).

Molecular docking
These thirty-six compounds were subjected to molecular docking to anticipate the protein-ligand binding affinities, modes of interaction and inhibition activity.
Compounds with the least binding energy are known to be better hits (Venkatesan & Febin, 2017) and based on the set threshold (binding energy < -6.5 kcal/mol), 10 compounds were selected. Further results from LigPlot+ revealed that two out of ten compounds with PubChemId 5281642 and 64981 are the most effective of all. Both are phytochemicals with previously reported antiviral activity (Vlietinck et al., 1998) and binds at the predicted active binding site forming strong hydrogen bonds. Visualising docking results revealed that ten amino acid residues are in contact with Compound 1 and 13 amino acids with compound 2.

Binding mode analysis of control compound BCX4430
Docking VP30 with BCX4430 indicates binding energy of -4.16 kcal/mol with 11 amino acids in contact with three hydrogen bonds GLU191, ALA255 and ASP217. Among these residues, only ASP217 belong to the predicted active site. Two more hydrogens bonds were visualized within BCX4430 atoms stabilizing the complex. Residues involved in hydrophobic interactions were ARG179, LYS183, VAL256, SER253, LEU218, GLY220, GLY219, LEU188. The intermolecular energy was -5.95 kcal/mol, which involves both desolvation and electrostatic energy.
This indicates the energy of non-bounded atoms. The total internal energy was -1.93 kcal/mol explaining the total of all energy changes and entropy.

Binding mode analysis of compound 1
Compound 1-6-Hydroxyluteolin (PubChem ID: 5281642) also called as 5,6,7,3ˈ,4ˈ-pentahydroxyflavone is a flavonoid isolated from Valerianella eriocapa (Italian corn salad) and also found in other herbs and spices (Yannai, 2003). Unlike other antiviral compounds, flavones have a wide range of activity. Though the action of flavones in ebola is not fully explored, it is possibly by binding to a glycoprotein of RNA virus (Vlietinck et al., 1998) blocking VP30 interaction with GP necessary for EBOV transcription. The binding energy of this compound was -7.71 kcal/mol forming one hydrogen bond within itself and five hydrogen bonds with LYS180, GLU252, LYS218, ASP217 and ARG179 residues that were a part of predicted binding site of VP30. The hydrophobic interactions of ligand were with ARG179, LYS180, PHE181, HIS215 and SER216 of VP30. The intermolecular energy was -7.99 kcal/mol with the total internal energy being -1.37 kcal/mol.

Binding mode analysis of compound 2
The second compound (-)-Arctigenin (PubChem ID: 64981) is a lignanolide of the dibenzyl butyrolactone type and is known to block HIV-1 viral replication in human cell line experiments (Vlietinck et al., 1998). It is isolated from certain plants of Asteraceae and possesses antiviral and anticancer effects (Afendi et al., 2012). With a docking score of -7.65 kcal/mol, it formed three hydrogen bonds with active site residues LYS180, GLU252 and LYS218.
Hydrophobic interactions with ARG179, HIS215, HIS215, GLU252, SER182, ARG179, PHE181, LYS180, SER182 and PHE181. The intermolecular energy was -10.04 kcal/mol, and the total internal energy change was -1.97 kcal/mol. The docking results for the control compound and the above discussed best hits are summarized as presented in Table I

Bioactivity score prediction
In general, bioactivity score of more than 0.0 indicates the complex is active whereas if the score is between -5.0 and 0.0 the compound is moderately active and if the score is less than -5.0 it is probably inactive. The scores for the test compounds are listed in Table II, which indicates that both these test compounds possess bioactive properties of potential drugs as all the scores were above -5.0 and 0.0 (Joshi et al., 2018).

ACKNOWLEDGMENT
We sincerely thank the management of Vellore Institute of Technology (VIT), Vellore, India for providing us with the computational lab facility.