In silico subtractive hybridization of Nostoc azollae 0708 reveals that the majority of the unique proteins in its proteome have hypothetical functions

Nostoc azollae 0708 is a cyanobacterium that lives as an endosymbiont. Other Nostoc species, such as Nostoc sp. PCC 7107, Nostoc sp. PCC 7120, Nostoc sp. PCC 7524, and Nostoc punctiforme PCC 73102, have adapted to different lifestyles, and their habitats range from freshwater hot springs to soil. The availability of sequenced proteomes in public databases made it possible to apply the in silico subtractive hybridization method to predict the unique proteins of Nostoc azollae, which may be responsible for its endophytic behavior. Comparative proteome analysis of Nostoc azollae against the other Nostoc species considered shows that the unique protein content ranges from 4.2% to 5.2%, and that the majority of these unique proteins have hypothetical functions.

Keywords: Nostoc, in silico subtractive hybridization, endophytic, hypothetical proteins, unique proteins.


I. INTRODUCTION
Cyanobacteria are oxygenic photosynthetic organisms possessing a variety of metabolic pathways [1]. The genomes of several cyanobacterial species have been sequenced and deposited in public databases. These sequenced genomes serve as valuable resources for researchers addressing questions related to the biochemistry, ecology, and adaptation of cyanobacteria [1]. Among the many sequenced cyanobacterial strains, a few belong to the genus Nostoc. The species and strains of this genus show diversity in their adaptation. For example, Nostoc sp. PCC 7120 lives in soil, whereas Nostoc sp. PCC 7107 lives in shallow ponds. Similarly, Nostoc sp. PCC 7524 is adapted to freshwater hot springs, whereas Nostoc azollae 0708 lives as an endosymbiont [2]. This raises the fundamental question of how these Nostoc strains came to have such diverse adaptations even though they belong to the same genus. Nostoc azollae is an endosymbiont found in symbiotic association with the water fern Azolla filiculoides [3].
It is a filamentous diazotrophic cyanobacterium capable of nitrogen fixation [4]. The genome of N. azollae consists of one chromosome and two plasmids [5].
In this report, Nostoc azollae 0708 was selected as the target organism and the other Nostoc species as references, and the proteins unique to Nostoc azollae 0708 were predicted using the in silico subtractive hybridization method. This use of in silico subtractive hybridization to identify unique genes, which are a probable cause of the endophytic behavior of Nostoc azollae 0708, is the first of its kind on Nostoc species.

II. RELATED WORK
The in silico subtractive hybridization method was first employed in a comparative study of E. coli and Shigella strains [6]. Later, a web server was developed to perform in silico subtractive hybridization using parallel computing [7]. The method has since been applied in comparative genome studies of bacteria such as Erwinia amylovora, Pseudomonas savastanoi pv. glycinea, and many others [8,9].

III. METHODOLOGY
The *.faa files of Nostoc azollae 0708, Nostoc sp. PCC 7120, Nostoc sp. PCC 7524, Nostoc punctiforme PCC 73102, and Nostoc sp. PCC 7107 were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/). A local database of these proteomes was constructed. BLASTP was performed with Nostoc azollae as the target organism and the other Nostoc species considered in our analysis as reference organisms. For each target protein of Nostoc azollae, the top hit with the highest bit score was filtered from the BLAST output, and a homology value was calculated between the target protein and this top hit, as described earlier [6,7]. The homology value is calculated as follows:

$$H = \frac{I \times L_h}{L_q}$$

where
$H$ = homology value,
$I$ = percentage identity of the top hit with the highest bit score from the reference organism,
$L_h$ = protein sequence length of the top hit with the highest bit score from the reference organism,
$L_q$ = protein sequence length of the query protein from the target organism.
In-house Perl scripts were developed to perform the BLASTP searches and to calculate the homology value.
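The top-hit filtering and homology-value calculation described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' in-house Perl code. It assumes that BLASTP was run with a tab-separated output containing the columns qseqid, sseqid, pident, qlen, slen, and bitscore, that the homology value takes the form H = (I × Lh) / Lq as suggested by the symbol definitions in the text, and that the percentage identity is converted to a fraction so that H is comparable with a cutoff below 1. All function names are hypothetical.

```python
from collections import namedtuple

# One BLASTP hit; the column layout is an assumption of this sketch.
Hit = namedtuple("Hit", "query subject pident qlen slen bitscore")


def parse_hits(lines):
    """Parse tab-separated rows: qseqid sseqid pident qlen slen bitscore."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        q, s, pident, qlen, slen, bits = line.split("\t")
        yield Hit(q, s, float(pident), int(qlen), int(slen), float(bits))


def top_hits(hits):
    """Keep, for each query protein, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


def homology_value(hit):
    """H = (I * Lh) / Lq, with I converted from percent to a fraction
    (an assumption of this sketch)."""
    return (hit.pident / 100.0) * hit.slen / hit.qlen


def homology_table(lines):
    """Map each query protein ID to the H value of its top hit."""
    return {q: homology_value(h) for q, h in top_hits(parse_hits(lines)).items()}
```

For instance, a target protein of length 200 whose top hit has 50% identity and length 100 would receive H = 0.5 × 100 / 200 = 0.25 under these assumptions.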

IV. RESULTS AND DISCUSSION
The genome sizes of the Nostoc species considered in this analysis range from 5.5 to 9.06 Mbp (Table 1). Of these species, Nostoc azollae 0708 has the smallest genome and Nostoc punctiforme PCC 73102 the largest. The target organism Nostoc azollae 0708 has a total of 3,413 protein-coding genes on its chromosome, along with two plasmids [5]. As the main goal of this report is to predict the proteins that are unique to the endosymbiont Nostoc azollae relative to the other Nostoc species, target proteins with a homology value less than or equal to 0.43 were retained as unique. This analysis shows that, even though the genus Nostoc comprises species with different adaptations, only a small number of proteins are unique to Nostoc azollae and may therefore be responsible for its endophytic behavior. Moreover, examination of the unique proteins obtained from the comparison of Nostoc azollae with the other Nostoc species shows that the majority of them are hypothetical proteins, that is, proteins for which no function has been determined to date.
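The filtering step described above can be illustrated with a short Python sketch. It assumes that homology values against one reference proteome are already available as a dictionary keyed by target protein ID, that target proteins with no BLASTP hit are treated as having a homology value of 0, and that hypothetical proteins can be recognized by the word "hypothetical" in their annotation text. These conventions and all function names are assumptions made for illustration; only the 0.43 cutoff comes from the text.

```python
H_CUTOFF = 0.43  # cutoff stated in the text


def unique_proteins(target_ids, h_values, cutoff=H_CUTOFF):
    """Return target proteins whose H value against one reference is <= cutoff.

    Proteins without any hit are treated as H = 0 (assumption of this sketch).
    """
    return [pid for pid in target_ids if h_values.get(pid, 0.0) <= cutoff]


def percent_unique(target_ids, h_values, cutoff=H_CUTOFF):
    """Percentage of the target proteome classified as unique."""
    target_ids = list(target_ids)
    if not target_ids:
        return 0.0
    n_unique = len(unique_proteins(target_ids, h_values, cutoff))
    return 100.0 * n_unique / len(target_ids)


def hypothetical_fraction(unique_ids, annotations):
    """Fraction of unique proteins whose annotation mentions 'hypothetical'."""
    if not unique_ids:
        return 0.0
    n_hyp = sum(
        "hypothetical" in annotations.get(pid, "").lower() for pid in unique_ids
    )
    return n_hyp / len(unique_ids)
```

Running `percent_unique` once per reference proteome would give one percentage per pairwise comparison, which is how a range of values across reference species could be obtained.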

V. CONCLUSION AND FUTURE SCOPE
The application of in silico subtractive hybridization to Nostoc proteomes in this report is the first of its kind. The analysis opens a new gateway for researchers to study the adaptation of Nostoc species and to characterize the hypothetical proteins in relation to their adaptation and survival.