Analysing New Entropy Measures for Tries
Published in International Symposium on String Processing and Information Retrieval, pp. 18-27. Cham: Springer Nature Switzerland, 2025
We introduce new entropy measures for tries taking into account the distribution of the edge labels. To do that, we study the combinatorial problem of counting the number of tries with a given symbol distribution. We provide an alternative proof for the closed formula counting the tries belonging to such a class. This formula allows us to directly define the worst-case entropy for the aforementioned set of tries. Moreover, we propose a new notion of k-th order empirical entropy for tries and we show that the relationships between these two entropy measures are similar to those between the corresponding well-known measures for strings. Contrary to the label entropy [FOCS ‘05], which was designed to compress the labels of a node-labeled ordered tree, our empirical entropy considers not only the labels, but the entire structure of the trie. Finally, we relate our empirical entropy to the repetitiveness measure r proposed by Prezza [SODA ‘21], which counts the number of runs in the XBWT of the trie. We show that these measures exhibit relations analogous to those of their string counterparts.
