TECHNOLOGIES OF CREATING SPELL CHECKER

Authors

DOI:

https://doi.org/10.14308/ite000698

Keywords:

spell checker, language model, regular expression, programming language Python, teaching programming

Abstract

Spell checkers detect and correct spelling mistakes in a user's document. They compare every word against a spelling dictionary and apply algorithms that identify the correct spelling. The article covers technologies for creating a spell checker, as well as methods of teaching these technologies. Peter Norvig's spell checker has been studied, and the modifications this program needs in order to process Ukrainian texts have been defined. An approach to implementing the language model, that is, to creating the spelling dictionary, based on the Ukrainian Brown Corpus has been suggested. The peculiarities of designing a regular expression that picks out words in Ukrainian text have been defined. Ukrainian subtitles created within the volunteer translation project «To Be Announced» served as test material for the spell checker. The program that processes this text material in order to check spelling has been described, and the obtained results have been analysed. The results were found to be correct, which encourages further research.
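
The approach summarised in the abstract can be illustrated with a minimal sketch in the style of Norvig's spelling corrector, adapted to the Ukrainian alphabet. This is not the authors' program: the names `UK_LETTERS`, `WORD_RE`, `words` and `correction`, the exact regular expression, and the tiny inline corpus are assumptions made for illustration only. In the article the language model is built from the Ukrainian Brown Corpus, which is replaced here by a few sentences so that the example stays self-contained.

```python
import re
from collections import Counter

# Assumption: the 33 lowercase letters of the Ukrainian alphabet.
UK_LETTERS = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
# Assumption: apostrophe variants that may occur inside a Ukrainian word.
APOSTROPHES = "'’ʼ"

# A word is a run of Ukrainian letters, optionally continued by
# apostrophe-plus-letters groups (as in "м'який").
WORD_RE = re.compile(rf"[{UK_LETTERS}]+(?:[{APOSTROPHES}][{UK_LETTERS}]+)*")


def words(text):
    """Lowercase the text, extract Ukrainian words, unify apostrophes to '."""
    found = WORD_RE.findall(text.lower())
    return [w.replace("’", "'").replace("ʼ", "'") for w in found]


def edits1(word):
    """All strings that are one edit away from `word`."""
    letters = UK_LETTERS + "'"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings that are two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(candidates, counts):
    """The subset of candidates that appear in the spelling dictionary."""
    return {w for w in candidates if w in counts}


def correction(word, counts):
    """Most frequent known candidate; falls back to the word itself."""
    candidates = (known([word], counts)
                  or known(edits1(word), counts)
                  or known(edits2(word), counts)
                  or {word})
    return max(candidates, key=lambda w: counts[w])


# Toy stand-in for a real corpus (hypothetical text, for illustration only).
corpus = ("Мова програмування Python. Мова і мовлення. "
          "Перевірка орфографії в тексті. М’який знак.")
counts = Counter(words(corpus))  # word -> frequency: the language model

print(correction("мва", counts))
```

Restricting the character class to Ukrainian letters means that Latin-script tokens such as "Python" are skipped rather than flagged as misspellings; whether that is the desired behaviour depends on the texts being checked.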

References

<en>
1. Doyle, A. (2018). The balance careers. Software Engineer Skills List. Retrieved from https://www.thebalancecareers.com/software-engineer-skills-list-2062483.
2. IT career finder (2018). Computer Programmer. Retrieved from https://www.itcareerfinder.com/it-careers/computer-programmer.html.
3. Gudmundsson, J. & Menkes, F. (2018). Swedish Natural Language Processing with Long Short-term Memory Neural Networks: A Machine Learning-powered Grammar and Spell-checker for the Swedish Language. Retrieved from http://lnu.diva-portal.org/smash/get/diva2:1232482/FULLTEXT01.pdf.
4. Nejja, M. & Yousfi, A. (2018). The vocabulary and the morphology in spell checker. Retrieved from https://www.sciencedirect.com/science/article/pii/S187705091830111X#.
5. Choudhury, M., Thomas, M., Mukherjee, A., Basu, A. & Ganguly, N. (2007). How difficult is it to develop a perfect spell-checker? A Cross-linguistic Analysis through Complex Network Approach. Retrieved from https://arxiv.org/abs/physics/0703198.
6. Cappelatti, E., De Oliveira Heidrich, R., Oliveira, R., Monticelli, C., Rodrigues, R., Goulart, R. & Velho, E. (2018). Post-correction of OCR Errors Using PyEnchant Spelling Suggestions Selected Through a Modified Needleman–Wunsch Algorithm. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-92270-6_1.
7. Striuk, A., Semerikov, S. & Tarasov, I. (2015). Bachelor of informatics competence in programming. Information Technologies and Learning Tools, 2(46), 91-108. Retrieved from http://lib.iitta.gov.ua/11134/1/1225-4544-1-PB.pdf.
8. Spirin, O. & Vakaliuk, T. (2017). Criteria of open web-operated technologies of teaching the fundamentals of programs of future teachers of informatics. Information Technologies and Learning Tools, 4(60), 275-287. Retrieved from https://journal.iitta.gov.ua/index.php/itlt/article/view/1815.
9. Proskura, S. & Lytvynova, S. (2018). Information technologies specialists training in higher education institutions of Ukraine: general state, problems and perspectives. Information Technologies in Education, 2(35), 72-88. Retrieved from http://ite.kspu.edu/issue_35/p-72-88.
10. Kryvonos, O. (2014). Using of task approach method while teaching programming to the future informatics teachers. Information Technologies and Learning Tools, 2(40), 83-91. Retrieved from https://journal.iitta.gov.ua/index.php/itlt/article/view/1005.
11. Chollampatt, S. & Hwee Tou Ng. (2018). A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction. Retrieved from https://arxiv.org/abs/1801.08831.
12. Norvig, P. (2016). How to Write a Spelling Corrector. Retrieved from http://norvig.com/spell-correct.html.
13. NLTK 3.4 documentation (2018). Source code for nltk.corpus.reader.plaintext. Retrieved from http://www.nltk.org/_modules/nltk/corpus/reader/plaintext.html.
14. NLTK 3.4 documentation (2018). Source code for nltk.corpus.reader.api. Retrieved from https://www.nltk.org/_modules/nltk/corpus/reader/api.html#CorpusReader.f....
15. Semerikov, S. (2009). Theoretical and methodic foundations of fundamentalization teaching of the Computer Science at the high educational institutions (abstract of Doctor of Pedagogical Sciences thesis). National Dragomanov Pedagogical University, Kyiv. Retrieved from http://enpuir.npu.edu.ua/bitstream/123456789/226/3/Semerikov.pdf.
</en>

Published

25.03.2019

How to Cite

Riezina О. В., & Kosiuh Р. М. (2019). TECHNOLOGIES OF CREATING SPELL CHECKER. Journal of Information Technologies in Education (ITE), (39), 78–88. https://doi.org/10.14308/ite000698