String Pattern Matching Algorithms

An \(\textbf{alphabet }\)\(\Sigma\) is a finite set. A \(\textbf{symbol}\) is an element \(s \in \Sigma\). A \(\textbf{string}\) is a finite sequence of symbols, i.e, elements of an alphabet. The set of all strings over \(\Sigma\) is denoted by \(\Sigma^*\). We say a string \(p \in \Sigma^*\) is a \(\textbf{pattern}\) over a fixed alphabet \(\Sigma\) if \(p\) consists of symbols from \(\Sigma\) Let \(s\) be a string and denote \(s := s_1\cdots s_n\). We say that a subsequence \(s_i\cdots s_j\) of \(s\) is a \(\textbf{substring}\) of \(s\). A \(\textbf{ocurrence}\) of \(u\) in \(s\) is a pair \((i,j)\) such that \(u := s_i \cdots s_j\) is a substring of \(s\).

Let \(P\) be a set of patterns over a fixed alphabet \(\Sigma\) and let \(T\) be a fixed string. The Multiple Pattern String Matching (MPSM) is the problem of finding all occurrences of all patterns of \(P\) in \(T\). The Single Pattern String Matching (SPSM) is a special case of (MPSM) by adding the constraint \(|P| = 1\).

The Multiple Pattern String Matching is one of the basic string algorithms problems and several algorithms have been proposed to solve it. There are practical solutions to real-world problems that can be developed using these algorithms, including, but not limited to, intrusion detection systems, evolutionary biology, computational linguistics, and data retrieval.


Proposal

Poster

Monography

Implementations