Development of SARS-CoV-2 Inhibitors Using Molecular Docking Study with Different Coronavirus Spike Protein and ACE2

Abstract
The novel coronavirus SARS-CoV-2 causes an acute respiratory tract infection that emerged in Wuhan city, China. The spike protein of coronaviruses is the main driving force for host cell recognition; it binds to the ACE2 receptor on the host cell and mediates the fusion of host and viral membranes. Identifying compounds that form a stable complex with the spike protein (S-protein) could therefore help inhibit SARS-CoV-2 infection. Docking software was used to screen 300 plant natural compounds or derivatives for their ability to bind the SARS-CoV-2 S-protein. The docking score of each ligand towards each protein was calculated to estimate the binding free energy. Four compounds showed a strong ability to bind the S-protein (neohesperidin, quercetin 3-O-rutinoside-7-O-glucoside, 14-ketostypodiol diacetate, and hydroxypropyl methylcellulose) and were used to predict docking models and binding regions. The highest predicted ligand/protein affinity was with quercetin 3-O-rutinoside-7-O-glucoside, followed by neohesperidin.
The four compounds were also tested against other related coronaviruses and bound the S-protein of the bat, SARS, and MERS coronavirus strains, indicating that they could bind and block spike activity and subsequently prevent infection by different coronaviruses. Molecular docking also showed that the four ligands are likely to bind the host cell receptor ACE2. The interacting residues and the binding energies of the complexes were identified. The strong binding of the four compounds to the S-protein and to ACE2 indicates that they might be used to develop therapeutics specific to SARS-CoV-2 and closely related human coronaviruses.

INTRODUCTION
Coronaviruses (CoVs) belong to the order Nidovirales, family Coronaviridae. Human coronaviruses cause respiratory infections associated with influenza-like illness, ranging from the common cold to more severe symptoms 1. The 21st century has witnessed three outbreaks of deadly human pneumonia caused by coronaviruses: Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) in 2003, Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in 2012, and a SARS-like CoV named 2019-nCoV (also known as SARS-CoV-2) in December 2019 2,3. Genomic analysis revealed that bat coronavirus RaTG13 appears to be the closest relative of SARS-CoV-2, closer than SARS-CoV 4. SARS-CoV-2, like other CoVs, is an enveloped, positive-sense, long single-stranded RNA virus that translates two groups of proteins: structural proteins such as Spike (S), Nucleocapsid (N), Matrix (M), and Envelope (E), and non-structural proteins such as proteases and RNA-dependent RNA polymerase (RdRp) 5. Coronaviruses depend on RdRp for their high frequency of RNA recombination, which is among the main factors that cause the phenotypic and genotypic diversity of CoVs and make them capable of jumping across species 6. The homotrimeric spike glycoprotein helps the virus initiate infection by attaching to the host cell receptor and mediates virus fusion and genome entry into the host cell 7. It is a large type I transmembrane protein composed of two subunits. The S1 subunit mainly contains a receptor-binding domain (RBD) responsible for recognizing and binding the host cell surface receptor angiotensin-converting enzyme 2 (ACE2). The second subunit (S2) contains the basic elements required for membrane fusion and entry into the host cell 8, 9. The SARS-CoV-2 S-protein and its interaction with the cell receptor ACE2 have been studied using cryo-EM, and the results confirmed the functions of the S1 and S2 subunits 10. The atomic-scale 3D structure of the SARS-CoV-2 S-protein was recently reported, along with structural evidence that it binds ACE2 with 10- to 20-fold higher affinity than the SARS-CoV S-protein. The binding residues between the SARS-CoV-2 RBD and ACE2 were determined and compared to those of SARS-CoV 11,12. Structural analysis showed that these residues are highly conserved or share similar side-chain properties with those in the SARS-CoV RBD.
The SARS-CoV-2 RBD has an extended insertion containing short β5 and β6 strands, α4 and α5 helices, and loops, which together form the receptor-binding motif (RBM) containing most of the SARS-CoV-2 residues that contact ACE2 13, 14. The epitopes of two SARS-CoV antibodies targeting the RBD have also been analyzed against the SARS-CoV-2 RBD, providing insights into the future identification of cross-reactive antibodies 15. Scientists have focused on the SARS-CoV-2 S-protein as a key target for vaccines, therapeutic antibodies, and diagnostics. However, discovering a new vaccine or therapeutic antibody takes many years of laborious work 16. Bioinformatics analysis offers a fast way to find potential molecules among marketed drugs for developing a new drug against SARS-CoV-2. Once efficacy is determined, a candidate can be approved through the Green Channel or by a hospital ethics committee for rapid clinical treatment 17. Through this technology, several compounds, including natural plant compounds, have been screened and confirmed to directly inhibit essential proteins responsible for viral entry and replication, such as the S-protein of SARS or MERS coronavirus. Commercial antiviral molecules and chemical compounds extracted from traditional Chinese medicinal herbs are currently under investigation 18, 19. Molecular docking with AutoDock Vina is a popular tool for the virtual screening of small molecules against proteins and is also used to investigate the interactions of natural products with a target protein 20. Pharmacokinetic studies and in silico absorption, distribution, metabolism, and excretion (ADME) modeling are used to speed up drug approval because they indicate whether new compounds may have side effects on human health 21.
In this study, molecular docking was used to screen natural plant compounds and their derivatives for binding affinity to the SARS-CoV-2 S-protein and to the host cell receptor ACE2. Such compounds might be used to block virion binding to host cells and subsequently prevent viral infection and spread.

Docking protocol
Preparation of SARS-CoV-2 S-protein structures
The sequence of the SARS-CoV-2 spike protein (GenBank accession no. QHD43416.1) was downloaded from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/). The amino acid sequence was retrieved in FASTA format and used to build monomeric, trimeric, and trimeric binding 3D structure models using I-TASSER. The 3D structures of the S-proteins of other CoVs were built using SWISS-MODEL. Before docking, all water molecules and ligands were removed and hydrogen atoms were added to the target protein. In addition, energy minimization was performed using the 3Drefine server. The docking system was built using SAMSON 2020.
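The removal of water molecules and bound ligands described above can be illustrated with a minimal sketch. It assumes the structure is available as PDB-format text and simply drops every HETATM record; the function name and the toy coordinates are hypothetical, and the study itself performed this step within its own tool chain. Adding hydrogens and energy minimization are not covered here.

```python
def strip_waters_and_ligands(pdb_text: str) -> str:
    """Keep protein ATOM records and TER/END markers; drop HETATM records.

    Caveat: modified residues stored as HETATM (for example MSE) are
    also dropped by this simple filter.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record name occupies columns 1-6
        if record in {"ATOM", "TER", "END"}:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Toy input (hypothetical coordinates): two protein atoms, one water, one sugar.
example = """\
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N
ATOM      2  CA  MET A   1      12.560  13.207   2.100  1.00 20.00           C
HETATM    3  O   HOH A 201      15.000  10.000   5.000  1.00 30.00           O
HETATM    4  C1  NAG A 301      18.000  11.000   6.000  1.00 40.00           C
TER
END
"""

cleaned = strip_waters_and_ligands(example)
print(cleaned)
```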

Dataset and ligands selection
The 3D structures of 300 natural and synthetic compounds with drug-like properties, derived from natural plants, together with their derivatives, were selected. Sub-structural features of the ligands were carefully selected from references, and the ligands were downloaded separately from PubChem (https://pubchem.ncbi.nlm.nih.gov/) in SDF format and converted into MOL2 format using Open Babel.
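As an illustration of the SDF to MOL2 conversion, the sketch below builds an Open Babel command line and runs it only if the `obabel` executable is found on the PATH. The file names and helper functions are hypothetical; the exact Open Babel options used in the study are not stated.

```python
import shutil
import subprocess
from pathlib import Path


def obabel_command(sdf_path: str, mol2_path: str) -> list[str]:
    """Build the Open Babel call; formats are inferred from the file extensions."""
    return ["obabel", sdf_path, "-O", mol2_path]


def convert_sdf_to_mol2(sdf_path: str, mol2_path: str) -> bool:
    """Convert one ligand file; return False if Open Babel is not installed."""
    if shutil.which("obabel") is None:
        return False
    subprocess.run(obabel_command(sdf_path, mol2_path), check=True)
    return Path(mol2_path).exists()


# Hypothetical file names, shown only to illustrate the command that would run.
print(" ".join(obabel_command("ligand_001.sdf", "ligand_001.mol2")))
```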

Screening with SwissADME
The selected compound structures were converted to SMILES notation and submitted to the SwissADME web server, which calculated their physicochemical features and predicted their ADME parameters, drug-likeness, pharmacokinetic properties, and medicinal chemistry friendliness. ADME prediction depends on collecting data and developing models to assess and predict pharmacokinetic properties. Based on the SwissADME results for solubility and cytotoxicity to humans, the set of compounds ready for docking with the target protein was reduced to 250 ligands.
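The filtering step can be sketched as a simple pass over a table of predicted properties. Everything in this example is an assumption made for illustration: the field names, the solubility cut-off of log S = -6, the toxicity flag, and the three placeholder compounds are hypothetical and are not the thresholds or data used in the study.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    log_s: float       # predicted aqueous solubility (log mol/L), hypothetical value
    toxic_alert: bool  # True if any toxicity concern was flagged, hypothetical value


def passes_filter(compound: Compound, min_log_s: float = -6.0) -> bool:
    """Keep compounds that are soluble enough and carry no toxicity flag."""
    return compound.log_s >= min_log_s and not compound.toxic_alert


# Placeholder library, not real screening data.
library = [
    Compound("compound_A", log_s=-2.1, toxic_alert=False),
    Compound("compound_B", log_s=-7.4, toxic_alert=False),  # too insoluble
    Compound("compound_C", log_s=-3.0, toxic_alert=True),   # flagged
]

kept = [c for c in library if passes_filter(c)]
print([c.name for c in kept])
```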

Spike protein-ligand docking
The SARS-CoV-2 S-protein model and the S-proteins of the related CoVs were docked against the test ligands using SAMSON 2020. This software uses AutoDock Vina to maximize the accuracy of these predictions while minimizing computer time. The program works based on quantum mechanics. It predicts the potential affinity, molecular structure, geometry optimization of the structure, vibration frequencies of coordinates of atoms, bond length, and bond angle 20. Following the exhaustive search, 100 poses were analyzed, and the best-scoring poses were used to calculate the binding affinity of the ligands. The ligands that bound tightly to a target protein with a high score were selected.
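The pose selection described above amounts to taking the pose with the most negative score. The sketch below also converts a docking score into an approximate dissociation constant through the standard relation ΔG = RT ln K_d. The pose list is made up for illustration, the temperature of 298.15 K is an assumption, and docking scores are only rough estimates of the true binding free energy, so the resulting K_d values should be read as order-of-magnitude indications at best.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def best_pose(poses: list[dict]) -> dict:
    """Return the pose with the lowest (most favorable) score in kcal/mol."""
    return min(poses, key=lambda pose: pose["score"])


def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """K_d in mol/L from delta_g = R * T * ln(K_d)."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Hypothetical poses for one ligand (scores in kcal/mol).
poses = [
    {"pose_id": 1, "score": -12.4},
    {"pose_id": 2, "score": -16.7},
    {"pose_id": 3, "score": -15.9},
]

top = best_pose(poses)
print(top["pose_id"], top["score"], f"{dissociation_constant(top['score']):.2e}")
```

A more negative ΔG always maps to a smaller K_d, which is why the lowest score is treated as the tightest predicted binder.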

Assessment
Virtual screening and docking parameter
Virtual screening involved docking and scoring each compound from the previous dataset. This technique relied on predicting the binding modes and binding affinities of each compound by docking it to four protein structures (the experimental protein and the 3D structure models of the other proteins) 22. The docking parameters were set in SAMSON 2020, which can dock a library of ligands against a single protein. In this way, diverse plant compounds and protein targets were evaluated. In general, it was important to visualize the docked poses of high-scoring compounds because many ligands docked in different orientations. This kind of analysis becomes difficult as the size of the dataset increases. Therefore, it was important to eliminate unsuitable compounds with SwissADME before docking, restricting the dataset to drug-like compounds and taking into consideration appropriate properties, sub-structural features, solubility, and cytotoxicity, so that the remaining ligands were suitable for human use with a reduced probability of side effects 23. The bound ligands were then analyzed with Discovery Studio Visualizer, which was used to analyze and screen the ligand properties needed to reach the functional domain of the protein in the human body.

Sequence alignment and phylogenetic tree
The amino acid sequences of 30 CoV S-proteins were obtained from the NCBI database. The sequences were aligned with the MUSCLE algorithm, and the alignment was used to construct a phylogenetic tree in MEGA X. The phylogenetic relationships among the 30 spike proteins were inferred using the nearest-neighbor interchange (NNI) heuristic with the WAG+G+I substitution model and 500 bootstrap replicates. To obtain a rational phylogenetic tree, duplicate proteins and repetitive sequences from the same species were eliminated. Multiple sequence alignment (MSA) of the phylogenetically closely related CoV S-protein sequences was performed with Clustal Omega using default parameters to determine the conserved regions of these sequences. In addition, ESPript 3.0 was used to display the conserved sequences among the selected proteins together with the secondary structure of the SARS-CoV-2 S-protein. Amino acid alignment of the three related CoV S-proteins was performed using the default parameters.

Sequence structure analysis of SARS-CoV-2 S-protein
Comparison of the amino acid sequence of the SARS-CoV-2 S-protein used for modeling (GenBank accession no. QHD43416.1) with that of the experimental structure (PDB ID 6VSB) showed that the two sequences were identical at the N-terminus (1208 amino acids) but differed at the C-terminus: the modeled SARS-CoV-2 S-protein had a longer and different sequence of 82 amino acids in place of the corresponding 62 amino acids of the experimental one 24.

Protein model of SARS-CoV-2 S-protein
The predicted 3D structure of the SARS-CoV-2 S-protein model was built with I-TASSER (Figure 1A) using the sequence published at the NCBI. The built model was compared with the experimental cryo-EM structure downloaded from the Protein Data Bank (Figure 1B). The structure of the built model was divided into four regions to facilitate the comparison with the experimental one. The two models showed different configurations. The folding of the experimental structure was more compact, and region "iv" was not present. Although the two models were similar, the differences between them could be misleading when the model folding is relied upon for docking with any compound; hence, the best results were obtained when using the experimental data 25. The homotrimer of the experimental SARS-CoV-2 spike protein was used to design the ligand-protein interactions with the four selected ligands (Figure 2) using SAMSON 2020. The assembled complexes were analyzed using Discovery Studio Visualizer to identify the interacting residues and how the 3D structure of each ligand allows it to bind the protein.
[...] and -15.2 kcal/mol, respectively) were lower than those of 14-ketostypodiol diacetate and hydroxypropyl methylcellulose, which had the same ΔG of -13.7 kcal/mol. The interacting residues for the phytochemicals and derivative compounds were identified in the two models, except for 14-ketostypodiol diacetate, which could interact with the S-protein through van der Waals interactions. The amino acids of the S-protein in alignment with the four ligands are presented in Figure 3.
The locations of the interacting residues in the experimental docking complexes (Table I) showed that the binding of neohesperidin and 14-ketostypodiol diacetate might prevent the target protein from attaching to the host cell receptor (ACE2), as they were located at the S1 ectodomain subunit, and might thereby prevent the infection process 26.

Identifying sequence related to SARS-CoV-2 S-protein
To determine whether the four ligands dock specifically with the SARS-CoV-2 S-protein or also with those of related viruses, the S-proteins of the most closely related CoVs that infect humans were selected for ligand docking analysis. To select the CoV S-proteins, phylogenetic analysis was carried out with 30 CoV isolates (Figure 4). The constructed tree divided the CoV sequences according to species and host (human or bat). The S-proteins of the other CoVs were highly divergent, with less than 77% identity to SARS-CoV-2, except for the Bat-RaTG13 S-protein, which showed a close phylogenetic relationship to the SARS-CoV-2 S-protein, indicating that the virus might have originated from bats 30,31. One isolate was selected from each of the two most closely related clusters, i.e., Bat-RaTG13 (QHR63300.2) and SARS-GD01 (AAP51227.1), and one from the out-group, MERS (QGV13484.1), to study their docking ability with the test ligands 32. The amino acid sequences revealed that the Bat-RaTG13 S-protein was the closest to the SARS-CoV-2 S-protein, with 97.41% identity. In contrast, SARS-GD01 and MERS showed only 76.19% and 35% identity, respectively, to the SARS-CoV-2 S-protein.
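Percent identity values such as those quoted above are computed from a pairwise alignment. The sketch below uses one common convention, counting matches over the aligned columns in which neither sequence has a gap. This convention is an assumption, since alignment tools differ in the denominator they use, and the two short sequences are toy examples rather than spike protein sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences: 8 ungapped columns, 7 of them identical.
seq_a = "MKT-AYIAK"
seq_b = "MKTQAYLAK"
print(percent_identity(seq_a, seq_b))
```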

Docking test ligands with different CoVs S-proteins
Two S-proteins closely related to that of SARS-CoV-2 (Bat-RaTG13 and SARS-GD01) and one out-group (MERS) were used to study the binding abilities of the test ligands 33. The binding models of the four ligands on the target protein surface are presented in Figure 5.
The binding ΔG, which reflects the interaction between each test ligand and the S-protein of each CoV, was calculated and is presented in Table I. The results show that the four ligands might bind with high affinity to the CoVs, indicating that they may bind the S-proteins of a wide range of CoVs 34, 35. For the SARS-CoV-2 S-protein, the best ligand was quercetin 3-O-rutinoside-7-O-glucoside, which binds to active site residues (Tyr-756, Phe-970, Thr-998, Gly-999) in the S2 subunit with a ΔG of -16.7 kcal/mol. It was followed by neohesperidin, which binds to the S1 and S2 subunits with a ΔG of -15.2 kcal/mol and interacts with active site residues (Thr-547, Thr-549, Thr-587, Thr-673, Phe-855); then 14-ketostypodiol diacetate, which binds in the S1 subunit with a ΔG of -13.7 kcal/mol but without direct binding to amino acid residues (structurally constrained binding); and hydroxypropyl methylcellulose, which binds in the S1 and S2 subunits with a ΔG of -13.7 kcal/mol. For the SARS-GD01 S-protein, quercetin 3-O-rutinoside-7-O-glucoside interacts with active site residues including Tyr-144, Cys-176, Gly-177, His-208, Pro-210, Asp-213, Cys-214, and Glu-247 (Figure 6). Quercetin 3-O-rutinoside-7-O-glucoside showed the lowest ΔG (-18.1 kcal/mol), while neohesperidin, 14-ketostypodiol diacetate, and hydroxypropyl methylcellulose showed ΔG values of -16.3, -13.8, and -11.7 kcal/mol, respectively. All ligands bind in the S1 subunit of the SARS-GD01 S-protein, which is responsible for initiating infection of the host cell by the virion. The binding sites of hydroxypropyl methylcellulose and quercetin 3-O-rutinoside-7-O-glucoside overlapped. Therefore, a complex of the S-protein with the two bound ligands could form (Figure 6).
In the Bat-RaTG13 S-protein, the ligand with the best binding score was quercetin 3-O-rutinoside-7-O-glucoside, which interacts with six active site residues (Ser-726, Thr-774, Pyp-859, Asp-863, His-1054, and Gly-1055) with a ΔG of -17.3 kcal/mol. 14-Ketostypodiol diacetate and neohesperidin had ΔG values of -13.9 and -13.6 kcal/mol, respectively, and bound to the S2 subunit. Meanwhile, hydroxypropyl methylcellulose had the highest ΔG (-11.7 kcal/mol) and interacted with His-245 and Thr-250. The results show that all ligands can fit within the S1 and S2 subunits and could work as inhibitors for Bat-RaTG13. Because the binding sites of quercetin 3-O-rutinoside-7-O-glucoside and 14-ketostypodiol diacetate overlapped, the two ligands could bind together before binding to the S-protein (Figure 6).
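The ranking of the four ligands for each S-protein can be reproduced from the ΔG values quoted in this section. The sketch below is only a bookkeeping aid: the dictionary layout and the function name are illustrative, and MERS is omitted because only one of its four ΔG values is quoted in the text.

```python
QUERCETIN = "quercetin 3-O-rutinoside-7-O-glucoside"
NEOHESPERIDIN = "neohesperidin"
KETOSTYPODIOL = "14-ketostypodiol diacetate"
HPMC = "hydroxypropyl methylcellulose"

# Binding ΔG in kcal/mol, as quoted in the text for each S-protein.
DELTA_G = {
    "SARS-CoV-2": {QUERCETIN: -16.7, NEOHESPERIDIN: -15.2,
                   KETOSTYPODIOL: -13.7, HPMC: -13.7},
    "SARS-GD01": {QUERCETIN: -18.1, NEOHESPERIDIN: -16.3,
                  KETOSTYPODIOL: -13.8, HPMC: -11.7},
    "Bat-RaTG13": {QUERCETIN: -17.3, NEOHESPERIDIN: -13.6,
                   KETOSTYPODIOL: -13.9, HPMC: -11.7},
}


def rank_ligands(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort ligands from most to least favorable (most negative ΔG first)."""
    return sorted(scores.items(), key=lambda item: item[1])


for protein, scores in DELTA_G.items():
    best_name, best_dg = rank_ligands(scores)[0]
    print(f"{protein}: {best_name} ({best_dg} kcal/mol)")
```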

Docking test ligands with ACE2
Molecular docking of the four ligands with ACE2 was performed to determine whether they bind strongly to the S-protein only or to its cell receptor as well. The docking results showed that the four ligands were able to interact with the ACE2 receptor with high scores, as shown in Figure 8. Table II summarizes the number of contacting residues, domain residues, interacting residues, interaction mode, and ΔG. [...] with ΔG of -8.2 kcal/mol.
The molecular docking showed that out of 300 natural plant and plant derivative compounds from the PubChem database, only four could bind the SARS-CoV-2 S-protein with high affinity. These ligands were natural plant products, so they were considered to be safe for humans 37. The SwissADME server was used to analyze the solubility and cytotoxicity of these compounds. In addition, the isolation of these compounds from their plants is well established 38-40. Previous publications also showed that plant phytochemicals were predicted to be potent inhibitors of the SARS-CoV-2 protease using homology modeling 41. Six citrus flavonoids (naringenin, naringin, hesperetin, hesperidin, neohesperidin, and nobiletin) were used for molecular docking and prediction of ΔG with ACE2. However, the results showed that the ΔG required for binding between the receptor and the ligands was relatively high 42, 43. Comparison of the docking results for the built SARS-CoV-2 S-protein model and the experimental SARS-CoV-2 S-protein with the four ligands revealed that the binding residues were different, although there was no significant difference in ΔG. In this respect, the four ligands were tested for their binding ability with other related human CoVs 44. From the phylogenetic evaluation of the S-proteins of 30 CoVs, the two most closely related to the SARS-CoV-2 S-protein, Bat-RaTG13 (97.41% identity) and SARS-GD01 (76.19% identity), were selected. In addition, MERS, which showed only 35% identity, was used as an out-group protein.
Alignment of the three related CoVs showed that they share consensus positions and structural domains, such as the N-terminal domain (NTD), the RBD, the heptad-repeat regions (HR), the central helix (CH), and the connector domain (CD). Docking analysis of the CoV S-proteins and the four ligands showed that all of them could bind strongly to the S-proteins with low ΔG, with quercetin 3-O-rutinoside-7-O-glucoside showing the lowest ΔG for SARS-CoV-2 (-16.7 kcal/mol), MERS (-16.4 kcal/mol), Bat-RaTG13 (-17.3 kcal/mol), and SARS-GD01 (-18.1 kcal/mol). For ACE2, docking of the four ligands showed slight differences in ΔG, ranging from -10.6 to -8.2 kcal/mol. Quercetin 3-O-rutinoside-7-O-glucoside displayed the lowest ΔG to ACE2, while neohesperidin displayed the highest. Binding of the ligands to the host cell receptor would decrease the rate of viral infection 45.
The homotrimer of the SARS-CoV-2 S-protein was modeled and compared with the experimental 3D structure, which showed slight differences between them. Although protein structure homology modeling has become a routine technique for generating 3D models of proteins, it is not as accurate as experimental structure determination 46.