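KOMBINASI N-GRAM TOKEN BASED DAN DICE COEFFICIENT PADA SISTEM DETEKSI INDIKASI PLAGIARISME (Combining Token-Based N-Grams and the Dice Coefficient in a Plagiarism Indication Detection System)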


Authors Agung Toto Wibowo, Kadek Wisnu Sudarmadi, Ari M. Barmawi, Suyanto
Abstract “Plagiarism detection is a frequently discussed problem these days. Several algorithms have been proposed to detect plagiarism; Longest Common Subsequence (LCS), Edit Distance, Document Fingerprinting and Winnowing are examples. When LCS or Edit Distance is applied to a document D1 with m tokens and a document D2 with n tokens, the matching complexity becomes O(m×n). This complexity requires more processing time when a document must be matched against a large corpus.
The time complexity can be addressed with a different approach: histogram intersection computation. Similarity is calculated using the Dice coefficient, applied to documents that have been converted into token-based N-grams. Test results show that token-based N-grams with the Dice coefficient produce the best accuracy, 96.11% on the dataset.”
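To make the complexity contrast in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. `lcs_length` is the textbook dynamic-programming LCS, which fills an (m+1)×(n+1) table and therefore costs O(m×n). `dice_similarity` instead builds one n-gram histogram per document and scores the pair with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), where the intersection keeps the smaller count of each shared n-gram (histogram intersection). The function names, whitespace tokenization, lowercasing and the default n = 3 are hypothetical choices for illustration; the paper's actual preprocessing and N value are not stated here.

```python
from collections import Counter


def lcs_length(a: list[str], b: list[str]) -> int:
    """Textbook dynamic-programming LCS over two token lists.

    Fills an (m+1) x (n+1) table, so time is O(m*n) for m = len(a), n = len(b).
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def token_ngrams(tokens: list[str], n: int = 3) -> Counter:
    """Histogram (multiset) of contiguous token n-grams; empty if len(tokens) < n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def dice_similarity(doc1: str, doc2: str, n: int = 3) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over token n-gram histograms.

    Assumption: whitespace tokenization and lowercasing (hypothetical preprocessing).
    Counter's & operator keeps the minimum count per key, i.e. histogram intersection.
    Building both histograms is linear in document length for a fixed n.
    """
    h1 = token_ngrams(doc1.lower().split(), n)
    h2 = token_ngrams(doc2.lower().split(), n)
    total = sum(h1.values()) + sum(h2.values())
    if total == 0:
        return 0.0  # both documents are shorter than n tokens
    common = sum((h1 & h2).values())
    return 2.0 * common / total


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox leaps over the lazy dog"
    print(lcs_length(a.split(), b.split()))  # prints 8
    print(round(dice_similarity(a, b), 4))   # prints 0.5714 (4 shared trigrams out of 7 + 7)
```

The LCS version compares every token of D1 with every token of D2, while the n-gram version touches each document once and then intersects two hash-based histograms, which is why the histogram approach scales better when one document is checked against many documents in a corpus.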
Conference Name Konferensi Nasional Sistem Informasi 2013
Organizer STMIK Bumigora Mataram
Date Mataram,
Conference Link http://www.knsi.us/
ISSN / ISBN
Paper Link / Download https://www.dropbox.com/sh/hhbyf928efew1fo/6qjlbedjuO/2013/Proceedings%20KNSI%202013.pdf