An Axiomatic Approach to the Notion of Similarity of Individual Sequences and their Classification
Speaker: Jacob Ziv, Technion- Israel Institute of Technology
Series: Topical Seminars
Location: Engineering Quadrangle B205
Date/Time: Thursday, May 12, 2011, 10:30 a.m. - 11:30 a.m.
An axiomatic approach to the notion of similarity of sequences, that seems to be natural in many cases (e.g. Phylogenetic analysis), is proposed. Despite of the fact that it is not assume that the sequences are a realization of a probabilistic process (e.g. a variable-order Markov process), it is demonstrated that any classifier that fully complies with the proposed similarity axioms must be based on modeling of the training data that is contained in a (long) individual training sequence via a suffix tree with no more than O(N) leaves (or, alternatively, a table with O(N) entries) where N is the length of the test sequence. Some common classification algorithms may be slightly modified to comply with the proposed axiomatic conditions and the resulting organization of the training data, thus yielding a formal justification for their good empirical performance without relying on any a-priori (sometimes unjustified) probabilistic assumption. One such case is discussed in details.
Jacob Ziv is Distinguished Professor at the Technion. His research interests include data-compression, information theory and statistical communication. Together with Abraham Lempel he is the inventor of the most widely used class of data compression algorithms. Ziv was the Chairman of the Israeli Universities Planning and Grants Committee from 1985 to 1991 and the President of the Israel Academy of Sciences and Humanities from 1995 to 2004. He is a foreign associate of the United States National Academy of Sciences, and the recipient of the 1993 Israel Prize, the 1995 IEEE Richard W. Hamming Medal, the 1997 Claude E. Shannon Award, and the 2008 BBVA Foundation Frontiers of Knowledge Award for Information and Communication Technologies.