S. V. Znamenskij, “Stable assessment of the quality of similarity algorithms of character strings and their normalizations”, Program Systems: Theory and Applications, 9:4 (2018), 561

Program Systems: Theory and Applications, 2018, Volume 9, Issue 4, Pages 561–578
DOI: https://doi.org/10.25209/2079-3316-2018-9-4-561-578 (Mi ps328)

Mathematical Foundations of Programming

Stable assessment of the quality of similarity algorithms of character strings and their normalizations

S. V. Znamenskij

Ailamazyan Program Systems Institute of Russian Academy of Sciences

Abstract: Choosing tools to search for hidden commonality in data of a new nature requires stable and reproducible comparative assessments of the quality of abstract algorithms that measure the proximity of character strings. Conventional estimates based on artificially generated or manually labeled tests vary significantly and evaluate the method of artificial generation rather than the similarity algorithms themselves, while estimates based on user data cannot be accurately reproduced.
A simple, transparent, objective and reproducible numerical quality assessment of a string metric is proposed. It uses parallel texts of book translations in different languages. The quality of a measure is estimated by the percentage of errors over all possible distinct trials in which the translation of a given paragraph must be identified among two paragraphs of the book in another language, exactly one of which is the actual translation. The stability of the assessments is verified by their independence from the choice of book and language pair.
The numerical experiment consistently ranked algorithms for abstract character string comparison by quality and showed a strong dependence on the choice of normalization.
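The trial-based error percentage described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names `error_rate` and `ratio_similarity` are hypothetical, the `difflib` ratio is only a stand-in for whichever string similarity is being assessed, and counting a tie as half an error is an assumption made here, since the abstract does not say how ties are scored.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in similarity (assumption): difflib's matching ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def error_rate(
    source: Sequence[str],
    target: Sequence[str],
    similarity: Callable[[str, str], float] = ratio_similarity,
) -> float:
    """Fraction of failed trials over all (paragraph, decoy) pairs.

    source[i] and target[i] are assumed to be aligned paragraphs of the
    same book in two languages. One trial: given source[i], pick its
    translation from target[i] and a decoy target[j], j != i.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two aligned lists with at least two paragraphs")

    errors = 0.0
    trials = 0
    for i, paragraph in enumerate(source):
        true_score = similarity(paragraph, target[i])
        for j, decoy in enumerate(target):
            if j == i:
                continue
            decoy_score = similarity(paragraph, decoy)
            if decoy_score > true_score:
                errors += 1.0   # the decoy looks more similar: an error
            elif decoy_score == true_score:
                errors += 0.5   # tie scored as a coin flip (assumption)
            trials += 1
    return errors / trials


if __name__ == "__main__":
    # Toy data only; real use would load aligned paragraphs of a book.
    src = ["alpha one", "beta two", "gamma three"]
    tgt = ["alfa uno", "beta dos", "gamma tres"]
    print(f"error rate: {error_rate(src, tgt):.3f}")
```

Passing a different `similarity` callable, or applying a normalization to its scores before comparison, lets several measures be compared on the same aligned paragraphs; a lower error rate indicates a better measure in the sense of the abstract.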

Key words and phrases: string similarity, data analysis, similarity metric, distance metric, numeric evaluation, quality assessment.

Received: 17.04.2018
Revised: 03.12.2018
Accepted: 28.12.2018

Document Type: Article

UDC: 519.652.3

Language: English

Citation: S. V. Znamenskij, “Stable assessment of the quality of similarity algorithms of character strings and their normalizations”, Program Systems: Theory and Applications, 9:4 (2018), 561–578

Citation in format AMSBIB

\Bibitem{Zna18}
\by S.~V.~Znamenskij
\paper Stable assessment of the quality of similarity algorithms of character strings and their normalizations
\jour Program Systems: Theory and Applications
\yr 2018
\vol 9
\issue 4
\pages 561--578
\mathnet{http://mi.mathnet.ru/ps328}
\crossref{https://doi.org/10.25209/2079-3316-2018-9-4-561-578}

Linking options:

https://www.mathnet.ru/eng/ps328

https://www.mathnet.ru/eng/ps/v9/i4/p561

Translation

Stable assessment of the quality of similarity algorithms of character strings and their normalizations
S. V. Znamenskij
Program Systems: Theory and Applications, 2018, 9:4, 579–596
