Abstract:
An approach to identifying informational objects (IOs) in automated information systems used for data collection, storage, and processing is presented. Such systems consist of multiple nodes and acquire data from multiple sources. In most cases, the data array of an information system takes the form of a continuously updated event log. Each event record includes characteristics of the event's participant (the IO) and of the event's conditions. To solve analytical problems involving IOs, one must first identify them, i.e., determine the set of IOs that, with a certain probability, are the same entity. The paper defines the typical IO identification tasks that arise in developing large-scale information systems: IO fusion, and IO clustering, that is, forming a group of IOs that are similar with respect to certain criteria. The identification task is closely connected to the task of identifying links between IOs, since IOs are more likely to be identical if each of them is associated with another object. Methods for solving these tasks are presented, the specific features of IO identification in a flow of events are studied, and a correlation search method for detecting associations between IOs is described. A method for comparing proper names that accounts for probable distortions (phonetic and transcriptional) and misprints is presented. The effectiveness of using Cyrillic and Latin first name/last name blocks simultaneously for personal identification is substantiated, and methods for transcription from Cyrillic to Latin and vice versa are presented.
Keywords:
identification of informational objects; identification of objects; correlation search; search for associations; identity of objects; fusion of informational objects; fusion of objects; text attributes; data distortions; phonetic distortions; transcriptional errors; Latin to Cyrillic transcription; Cyrillic to Latin transcription; Metaphone; Levenshtein distance; spread systems; area-spread systems; hierarchical systems; flow of events.
Received: 26.02.2014
Document Type:
Article
Language: Russian
Citation:
M. M. Gershkovich, T. K. Birukova, “The tasks of identification of informational objects in area-spread data arrays”, Sistemy i Sredstva Inform., 24:1 (2014), 224–243
\Bibitem{GerBir14}
\by M.~M.~Gershkovich, T.~K.~Birukova
\paper The tasks of identification of~informational objects in~area-spread data arrays
\jour Sistemy i Sredstva Inform.
\yr 2014
\vol 24
\issue 1
\pages 224--243
\mathnet{http://mi.mathnet.ru/ssi339}
\crossref{https://doi.org/10.14357/08696527140114}
\elib{https://elibrary.ru/item.asp?id=21811519}
Linking options:
https://www.mathnet.ru/eng/ssi339
https://www.mathnet.ru/eng/ssi/v24/i1/p224
This publication is cited in the following 3 articles:
T. K. Biryukova, M. M. Gershkovich, “Metody optimizatsii skorosti vypolneniya funktsionalnykh zaprosov v avtomatizirovannykh informatsionnykh sistemakh s uchetom smyslovogo analiza informatsii”, Sistemy i sredstva inform., 33:4 (2023), 82–91
Vadym Mukhin, Valerii Zavgorodnii, Viacheslav Liskin, Sergiy Syrota, Vasyl Koval, Liudmyla Honchar, 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), 2023, 1189
S. I. Suyatinov, A. M. Khudyakov, M. S. Uvarova, “A Regularization-Based Method of Identification of Information Objects”, Autom. Doc. Math. Linguist., 56:6 (2022), 324