# Matching with wildcards (Java)

### From LiteratePrograms

## Contents |

## Overview

An implementation of string matching with wildcards, as described in Gusfield 1999:199. When used with a *longest common extension* computation with suffix trees, it allows for searching a pattern with *k* wildcards in a text *T* of length *m* in *O(km)* time (Gusfield 1999:199). This implementation uses a simple computation of the *longest common extension*.

## Implementation

The basic idea is to align the pattern at every position of the text, computing the *longest common extension* and ensuring that mismatches occur on positions with wildcards. The matching method takes two arguments: The text and the pattern. The asterisk (*) is used as the wildcard character. The method returns a collection of strings: the matches of the pattern in the text. The algorithm, as described in Gusfield 1999:199 (modified to fit the names of some variables renamed in the implementation), consists of four steps:

- "
*Step 1: Set j to 1 and h to i.*" (ib.) Let n be the length of p.

<<init>>=intj = 0;inth = i;intn = p.length();

- "
*Step 2: Compute the length L of the longest common extension starting at positions j of P and h of T*" (ib.)

<<compute_lce>>=intL = SimpleLongestCommonExtension.longestCommonExtension(p, j, t, h);

- "
*Step 3: if j + L = n + 1 then P occurs in T starting at i; stop.*" (ib.)

<<collect>>=if(j + 1 + L == n + 1){result.add(t.substring(i, i + n));break;}

- "
*Step 4: Check if a wildcard occurs in position j + L of P or position h + L in T. If so then set j to j + L + 1, set h to h + L + 1, and then go to step 2. Else, P does not occur in T starting at i; stop.*" (ib.)

<<match>>=if(((j + L)< p.length()&& p.charAt(j + L)== '*')||((h + L)< t.length()&& t.charAt(h + L)== '*')){j = j + L + 1; h = h + L + 1;}elsebreak;

## Usage

- A JUnit 4 unit test to demonstrate the usage:

<<test>>=Collection<String> results = getMatches("abentbananaend bend", "ben*"); assertEquals(Arrays.asList("bent", "bend"), results);

- The complete program:

<<MatchingWithWildcards.java>>=import static org.junit.Assert.assertEquals;import org.junit.Test;import java.util.ArrayList;import java.util.Arrays;import java.util.Collection;publicclassMatchingWithWildcards{publicstaticCollection<String> getMatches(String t, String p){Collection<String> result =newArrayList<String>();for(inti = 0; i < t.length(); i++){initwhile(true){compute_lce collect match}}returnresult;}@TestpublicvoidtestGetMatches(){test}}

- The required simple implementation of
*longest common extension*computation:

<<SimpleLongestCommonExtension.java>>=Longest common extension (Java)#8162#SimpleLongestCommonExtension.java

## References

- Gusfield, Dan (1999),
*Algorithms on Strings, Sequences and Trees*. Cambridge: University Press.

Download code |