Abstract:
We explore a probabilistic model of a literary text: the words of the text are chosen independently of each other
according to a discrete probability distribution on an infinite dictionary. The words are enumerated 1, 2, $\ldots$,
and the probability of the $i$th word is asymptotically a power function of $i$.
Bahadur proved that in this case the number of distinct words, as a function of the length of the text, also
behaves asymptotically like a power function.
On the other hand, the applied statistics community knows two statements supported by empirical observations, Zipf's and Heaps' laws.
We highlight the links between Bahadur's results and Zipf's and Heaps' laws, and
introduce and analyse a corresponding statistical test.
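The model described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' statistical test; it only draws independent words with probabilities proportional to $i^{-1/\theta}$ and counts the distinct words seen so far. The function names `distinct_word_counts` and `heaps_exponent`, the truncation of the infinite dictionary at `dict_size`, the choice $\theta = 0.5$, and the checkpoints are all assumptions made for illustration. Under these assumptions one would expect the fitted log-log slope to be roughly $\theta$.

```python
import math
import random
from itertools import accumulate


def distinct_word_counts(theta, dict_size, text_len, checkpoints, seed=0):
    """Return {n: number of distinct words among the first n draws}.

    Words are drawn independently with p_i proportional to i^(-1/theta).
    The infinite dictionary is truncated at dict_size (an assumption
    made only so that the distribution can be sampled).
    """
    rng = random.Random(seed)
    # Cumulative (unnormalised) weights of the truncated power law.
    cum = list(accumulate(i ** (-1.0 / theta) for i in range(1, dict_size + 1)))
    words = rng.choices(range(1, dict_size + 1), cum_weights=cum, k=text_len)

    seen, counts = set(), {}
    marks = set(checkpoints)
    for n, word in enumerate(words, start=1):
        seen.add(word)
        if n in marks:
            counts[n] = len(seen)
    return counts


def heaps_exponent(counts):
    """Least-squares slope of log(distinct words) against log(text length)."""
    xs = [math.log(n) for n in counts]
    ys = [math.log(r) for r in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


counts = distinct_word_counts(
    theta=0.5,
    dict_size=100_000,
    text_len=20_000,
    checkpoints=[1000, 2000, 5000, 10_000, 20_000],
)
print(counts)                            # distinct-word counts at each checkpoint
print(round(heaps_exponent(counts), 2))  # empirical Heaps-type exponent
```

The slope is estimated by ordinary least squares on a handful of checkpoints, which is enough to see the power-law growth but is not a substitute for the test analysed in the paper.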
This work was supported by RFBR (grant 17-01-00683) and by the program of fundamental scientific research of the SB RAS No. I.1.3, project No. 0314-2019-0008.
Received September 24, 2019; published December 4, 2019.
Citation:
M. G. Chebunin, A. P. Kovalevskii, “A statistical test for the Zipf's law by deviations from the Heaps' law”, Сиб. электрон. матем. изв., 16 (2019), 1822–1832
\RBibitem{CheKov19}
\by M.~G.~Chebunin, A.~P.~Kovalevskii
\paper A statistical test for the Zipf's law by deviations from the Heaps' law
\jour Сиб. электрон. матем. изв.
\yr 2019
\vol 16
\pages 1822--1832
\mathnet{http://mi.mathnet.ru/semr1170}
\crossref{https://doi.org/10.33048/semi.2019.16.129}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000501163400009}
Links to this page:
https://www.mathnet.ru/rus/semr1170
https://www.mathnet.ru/rus/semr/v16/p1822
This publication is cited in the following 3 articles:
Berhane Abebe, Roy Cerqueti, “Application of elementary probability models for text homogeneity and segmentation: A case study of Bible”, PLoS ONE, 19:6 (2024), e0303432
M. G. Chebunin, “On the Accuracy of the Poissonisation in the Infinite Occupancy Scheme”, Sib. Electron. Math. Rep., 18:2 (2021), 1035–1045
A. Chakrabarty, M. G. Chebunin, A. P. Kovalevskii, I. M. Pupyshev, N. S. Zakrevskaya, Q. Zhou, “A statistical test for correspondence of texts to the Zipf—Mandelbrot law”, Сиб. электрон. матем. изв., 17 (2020), 1959–1974