Designing a system that supports both exact matches and wildcard searching is a classic interview challenge. While a simple hash set works for exact matches, it fails the moment you introduce a . wildcard. ๐งฉ
The key to solving this efficiently lies in the Trie (Prefix Tree), combined with a touch of Depth-First Search (DFS) to handle the branching logic of wildcards. ๐
๐งฉ The Problem
Design a data structure that supports adding new words and finding if a string matches any previously added string.
wordmay contain dots'.'where dots can be matched with any letter.
Example
Input: ["WordDictionary","addWord","addWord","search","search"]
Args: [[],["bad"],["dad"],["bad"],[".ad"]]
Output: [null,null,null,true,true]
๐ ๏ธ Approach: Trie + Recursive DFS
To support prefix-based storage and wildcard matching, we use a Trie.
- Storage: Each node in the Trie represents a character.
- Search:
- If the character is a letter (e.g.,
'a'), follow that specific child. - If the character is a wildcard (
'.'), we must try all possible children at that level.
- If the character is a letter (e.g.,
Why Simpler Wins in Python ๐
In Python, "clever" often loses to "simple." Our first instinct might be to create a TrieNode class. However, using plain dictionaries is significantly faster because CPython internals are highly optimized for hash maps. No object overhead means fewer operations per character.
๐ Visualizing the Approach (Dry Run)
Let's look at how the Trie evolves with addWord("bad"), addWord("dad"), and then a search(".ad").
| Step | Operation | Trie Structure (Simplified) | Result |
|---|---|---|---|
| 1 | addWord("bad") |
root -> b -> a -> d -> $ |
Word stored |
| 2 | addWord("dad") |
root -> {b, d} -> a -> d -> $ |
Shared branches |
| 3 | search("pad") |
root has no p |
False |
| 4 | search(".ad") |
. tries b branch: a -> d -> $ |
True โ
|
๐ป Solution
This implementation uses a nested dictionary structure for the Trie, which achieves 97%+ performance on LeetCode.
class WordDictionary(object):
def __init__(self):
# A simple dict to store our Trie levels
self.trie = {}
def addWord(self, word):
"""
:type word: str
:rtype: None
"""
node = self.trie
for ch in word:
if ch not in node:
node[ch] = {}
node = node[ch]
node['$'] = True # '$' marks the end of a valid word
def search(self, word):
"""
:type word: str
:rtype: bool
"""
return self._dfs(word, 0, self.trie)
def _dfs(self, word, i, node):
# Base Case: We've matched every character in the word
if i == len(word):
return '$' in node
ch = word[i]
if ch == '.':
# Wildcard: Recurse into every possible child
for child in node:
if child != '$' and self._dfs(word, i + 1, node[child]):
return True
return False
# Exact Match: Follow the path or return False
if ch not in node:
return False
return self._dfs(word, i + 1, node[ch])โฑ๏ธ Complexity Analysis
| Complexity | Rating | Description |
|---|---|---|
| Time (addWord) | is the length of the word being added. | |
| Time (search) | is the number of dots. Worst case: we branch 26 times per dot. | |
| Space | is the total number of characters across all unique words. |
[!NOTE] Since the constraints specify at most 2 dots per search, the branching factor is capped at , making it extremely efficient in practice.
๐ง How to Recognize This Pattern
Whenever you see:
- Prefix-based searching or "starting with" requirements.
- Wildcard/Regex-lite matching in a set of strings.
- Need for memory-efficient storage of large vocabularies.
Think: "Is this a Trie problem?"
This same pattern is used in:
- Implement Trie (Prefix Tree)
- Word Search II
- Autocomplete systems.
๐ฏ Interview Tips
- Pythonic Optimization: Mention that you're using nested dictionaries instead of a
TrieNodeclass to reduce object creation overhead. It shows you understand language-specific performance. - The Base Case: Be careful with the DFS base case. Matching the length of the string isn't enough; you must also check if the current node is marked as the end of a word (
'$'). - Read Constraints: Notice the "at most 2 dots" constraint. This tells you that a recursive DFS approach won't lead to a TLE (Time Limit Exceeded) because the branching factor is small. ๐
