Detection of Cross Language Clones of C and Java Language using Levenshtein Distance Algorithm

Abstract: In code engineering, studies on software forking report that 10-15% of the code in large codebases consists of clones (gcc 8.7%, JDK 29%, Linux 22%). State-of-the-art clone detection tools such as CCFinderX, EqMiner, Dup, Simjava and NiCad exist, but they cannot work with IDEs. To reduce software maintenance effort in the development process, it is important to propose efficient techniques for identifying clones, especially Type-III and Type-IV clones. We propose a dictionary-based approach to detect cross-language clones of C and Java. It provides proper inputs to developers who engage in software forking or porting activities by detecting and correcting the porting and copying errors that arise during the porting process, for IDEs such as NetBeans and Eclipse.


I. INTRODUCTION
Code clones generally result from the copy/paste activity that programmers widely use to reuse existing code and save time. Large software codebases consist of 10-15% duplicate code [1]. Code cloning is considered harmful to software quality [2]: if code containing an error is copied, the same error is distributed across all the target code fragments [1]. Thus, it is important to develop approaches for clone detection in software systems. Code clones are divided into four classes [4]:
Type I: Commonly referred to as exact clones. Type I clone fragments are identical code fragments; variations in comments and white space are tolerated.
Type II: Fragments that are identical from a structural and syntactic point of view, with variations in identifiers, literals, types, layout and comments.
Type III: Copied fragments with some modifications. The modifications consist of adding, changing and removing statements.
Type IV: Two or more code fragments that have the same behavior but are implemented differently.
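The distinction between the first three clone types can be illustrated with a token-level Levenshtein distance. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the names `tokenize`, `levenshtein` and `KEYWORDS`, the tiny keyword set, and the sample fragments are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword list; a real tool would use the full C and Java keyword sets.
KEYWORDS = {"int", "for", "if", "else", "while", "return"}

def tokenize(code, normalize=False):
    """Split code into tokens, dropping comments and white space.
    With normalize=True, identifiers become ID and integer literals become NUM."""
    code = re.sub(r"/\*.*?\*/|//[^\n]*", " ", code, flags=re.S)
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if not normalize:
        return tokens
    out = []
    for t in tokens:
        if t in KEYWORDS:
            out.append(t)
        elif t[0].isalpha() or t[0] == "_":
            out.append("ID")
        elif t.isdigit():
            out.append("NUM")
        else:
            out.append(t)
    return out

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

original = "int sum = 0; for (i = 0; i < n; i++) sum += a[i];"
type1 = "int sum = 0; // total\nfor (i = 0; i < n; i++)   sum += a[i];"
type2 = "int total = 0; for (k = 0; k < len; k++) total += v[k];"
type3 = "int total = 0; for (k = 0; k < len; k++) { total += v[k]; count++; }"

print(levenshtein(tokenize(original), tokenize(type1)))              # 0: Type I
print(levenshtein(tokenize(original, True), tokenize(type2, True)))  # 0: Type II after normalization
print(levenshtein(tokenize(original, True), tokenize(type3, True)))  # 6: Type III, six inserted tokens
```

A Type IV clone would generally not be caught by a purely textual measure like this one, because two fragments with the same behavior can have entirely different token sequences.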
To reduce software maintenance effort in the development process, it is important to propose efficient techniques to identify Type-III and Type-IV clones. Bee Bee Chua [4] found that Java, Python and C are the most preferred languages for implementing open source software such as Apache, Mozilla and Ubuntu. Clone detection is therefore an important technique for helping developers who port applications among C, Java and Python.
The paper is organized as follows: Section 2 presents the related work, Section 3 the architecture design and algorithm, Section 4 the results and discussion, Section 5 the limitations, and Section 6 the conclusions.

II. RELATED WORK
Based on the survey of Fang-Hsiang Su et al. [5] Static Approaches
The C code is converted into Java code using the online converter www.mtsystems.com. DAC then takes input from the prediction model and finds the amount of fair, copied, ported and forged code snippets. The necessary modifications are then made in either of the codes to make it a cross-language clone. The prediction model creates a bar chart indicating the number of lines that are part of a clone. The same result is displayed graphically to help developers monitor and analyze the amount of porting that has taken place.
Fig. 2 shows the steps to find the frequency for analysis in the second phase. The prediction model generates the intelligent code comparators for the relevant languages.
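The idea of a dictionary-based comparison followed by a line-frequency count can be sketched roughly as follows. This is an illustrative assumption, not the DAC or prediction model described above: the dictionary entries in `C_TO_JAVA`, the helper names, and the two labels `matched` and `unmatched` are hypothetical, and the sketch does not attempt to reproduce the fair, copied, ported and forged categories.

```python
import re
from collections import Counter

# Hypothetical C-to-Java dictionary; the entries are illustrative assumptions only.
C_TO_JAVA = {
    "printf": "System.out.printf",
    "puts": "System.out.println",
    "NULL": "null",
}

def translate_c_line(line):
    """Rewrite each C identifier that has a dictionary entry into its Java form."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: C_TO_JAVA.get(m.group(), m.group()),
                  line)

def squeeze(line):
    """Remove all white space so that layout differences do not matter."""
    return "".join(line.split())

def line_frequencies(c_code, java_code):
    """Count C lines that do or do not have a counterpart in the Java code."""
    java_lines = {squeeze(l) for l in java_code.splitlines() if l.strip()}
    counts = Counter()
    for line in c_code.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key = squeeze(translate_c_line(line))
        counts["matched" if key in java_lines else "unmatched"] += 1
    return counts

c_code = r'''
int count = 0;
count = count + 1;
printf("%d\n", count);
return 0;
'''
java_code = r'''
int count = 0;
count = count + 1;
System.out.printf("%d\n", count);
'''

freq = line_frequencies(c_code, java_code)
print(freq["matched"], freq["unmatched"])  # 3 1
```

The dictionary is applied to the C side only, since applying it to Java lines would rewrite the `printf` inside `System.out.printf` a second time. Counts of this kind are the sort of per-line frequencies that could feed a bar chart.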

IV. CONCLUSION
The proposed method detects all four types of cross-language clones under a common umbrella and presents the results graphically. This helps maintenance engineers develop porting analysis tools such as REPERTOIRE [3], which answer questions such as:
i. What percentage of mainline commits is back-ported?
ii. What are the characteristics of back-ported patches (bug fixes, feature additions, new functionalities, etc.)?
iii. How different is a back-ported patch from its original mainline patch?
iv. How much time does it take to test a back-ported patch?
These questions could help us understand the effort of maintaining parallel versions of a project and of studying bug report similarities in a product family. The proposed work also helps developers involved in software porting to detect and correct porting and copying errors.

Fig. 3 shows Type-2 clone detection for C to Java code

II. Farouq Al-omari, Iman Keivanloo, Chanchal
Dynamic Approaches: Work based on dynamic profiling. Some of the techniques are listed below.
I. F. Deissenboeck, L. Heinemann, B. Hummel, and S.

III. Yang Yuan and Yao Guo, Boreas [7] proposed
Limitations: 1. No accurate calculation of the false positive rate. 2. No results found for large codebases.
IV. Bayu Priyambadha and Siti Rochimah [8] proposed a PDG-based clone detection method that identifies similar methods in a given large codebase. Limitations: 1. Does not detect Type-IV clones. 2. Static variables may not be detected properly. 3. Applying the method to medium and large codebases may be challenging.
V. Flavius-Mihai Lazar and Ovidiu Banias [9] proposed an AST-based clone detection method that works on a sequence detection and generalization algorithm. Limitations: 1. Works only on C code clone detection. 2. Cannot be scaled to large data sets. 3. Cannot be integrated into an IDE.
VI. Fang-Hsiang Su, Jonathan Bell, Gail Kaiser and Simha Sethumadhavan [5]: A technique that detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarities between programs based on their inputs and outputs, which mitigates the problems in object-oriented languages reported by prior work. The technique is implemented in a system, HitoshiIO, which is open source and freely available.
VII. Chaiyong Ragkhitwetsagul, Measuring [10]: The technique uses the "Internet-scaled Similar Code Search (ISiCS)" framework, a code search framework that is scalable and resistant to code incompleteness. Limitations: 1. No results found on large codebases. 2. Reliability needs to be tested on the frequency of false positives.
VIII. Vaibhav Saini, Hitesh Sajnani, Jaewoo Kim and Cristina Lopes [11]: A token-based clone detector that targets the first three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. It uses an optimized inverted index to quickly query the potential clones of a given code block. Limitations: 1. Does not detect near-miss clones and Type-IV clones. 2. Reduced efficiency because of heuristic filtering. 3. Not enough results presented to prove efficiency.
IX.

RESULT AND DISCUSSION
Experimental Results
Table 1: Amount of copied and forged lines

Table 1 shows the similarity measures of three C and Java programs, namely Clock, Counter and String print, which were tested for similarity.
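The exact formula behind these similarity measures is not stated here. One common way to turn a Levenshtein distance into a similarity score, offered as an assumption rather than as the formula used in this work, is to normalize the distance $d(a, b)$ between two sequences $a$ and $b$ by the length of the longer one:

```latex
\[
  \mathrm{sim}(a, b) \;=\; 1 - \frac{d(a, b)}{\max(|a|, |b|)},
  \qquad 0 \le \mathrm{sim}(a, b) \le 1 .
\]
% Worked example on the strings "kitten" and "sitting":
% d = 3 (two substitutions and one insertion), max length = 7, so
\[
  \mathrm{sim}(\text{kitten}, \text{sitting}) \;=\; 1 - \frac{3}{7} \;\approx\; 0.571 .
\]
```

Under this normalization, identical sequences score 1, and sequences with nothing in common score 0, so the value can be reported as a percentage per program pair.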