String searching algorithm

related topics
{math, number, function}
{language, word, form}
{@card@, make, design}
{rate, high, increase}
{car, race, vehicle}

String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.

Let Σ be an alphabet (finite set). Formally, both the pattern and searched text are concatenations of elements of Σ. The Σ may be a usual human alphabet (for example, the letters A through Z in the latin alphabet). Other applications may use binary alphabet (Σ = {0,1}) or DNA alphabet (Σ = {A,C,G,T}) in bioinformatics.

In practice, how the string is encoded can affect the feasible string search algorithms. In particular if a variable width encoding is in use then it is slow (time proportional to N) to find the Nth character. This will significantly slow down many of the more advanced search algorithms. A possible solution is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.

Contents

Basic classification

The various algorithms can be classified by the number of patterns each uses.

Single pattern algorithms

Let m be the length of the pattern and let n be the length of the searchable text.

1Asymptotic times are expressed using O, Ω, and Θ notation

The Boyer–Moore string search algorithm has been the standard benchmark for the practical string search literature.[1]

Algorithms using finite set of patterns

Full article ▸

related documents
Bounded set
Pre-Abelian category
Fermat's little theorem
Tuple
Square-free integer
Enriched category
Elementary function
Banach algebra
Linear classifier
Commutator
Floor and ceiling functions
Möbius function
Cyclone (programming language)
Twin prime
Möbius inversion formula
HMAC
Burali-Forti paradox
PSPACE
Connected space
ElGamal encryption
Caesar cipher
Fuzzy set
Principal ideal
Loss of significance
Binary space partitioning
Multiplication table
Discrete space
Principal ideal domain
Entailment
Initial and terminal objects