Algorithms

Suffix Trie Construction

5 months ago ; 78 views
Share this

Suffix Trie Construction

This solution implements a data structure called a Suffix Trie, which is used for efficiently storing and searching for substrings within a given string.

Problem Solution

The Suffix Trie is designed to solve the problem of efficiently storing all the suffixes of a given string and quickly determining if a given substring exists within the original string.

class suffixTrie:
	def __init__(self, string):
		self.root ={}
		self.endSymbol="*"
		self.populateSuffixTrieFrom(string)
		
	# O(n^2) time | O(n^2) space
	def populateSuffixTrieFrom(self, string):
		for i in range(len(string)):
			self.insertSubstringStartingAt(i, string)
			
	def insertSubstringStartingAt(self, i, string):
		node = self.root
		for j in range(i, len(string)):
			letter =string[j]
			if letter not in node:
				node[letter] ={}
			node =node[letter]
		node[self.endSymbol] = True
	
	# O(m) time | O(1) space
	def contains(self, string):
		node = self.root
		for letter in string:
			if letter not in node:
				return False
			node =node[letter]
		return self.endSymbol in node

Solution Analysis

I will now walk you through how this solution works.

Initialization (__init__)

Initializes the Suffix Trie with an empty root node and an end symbol to mark the end of each substring. The constructor then calls the populateSuffixTrieFrom method to populate the trie with all the suffixes of the input string.

Populate Suffix Trie (populateSuffixTrieFrom)

This method iterates over each character in the input string and calls the insertSubstringStartingAt method to insert all suffixes starting from that character into the trie.

Insert Substring (insertSubstringStartingAt)

This method inserts a substring starting from a given index i into the trie. It iterates over the characters of the substring and creates new nodes in the trie as necessary.

Contains (contains)

This method checks whether a given substring exists in the trie. It iterates over the characters of the substring and traverses the trie accordingly. If the substring is found in the trie, it returns True; otherwise, it returns False.

Time Complexity

 The time complexity of populating the suffix trie(populateSuffixTrieFrom) is O(n^2), where n is the length of the input string. This is because each character of the string is considered as the starting point of a suffix, and for each starting point, all subsequent characters are inserted into the trie.

The time complexity of inserting a substring(insertSubstringStartingAt) into the trie is O(m), where m is the length of the substring being inserted.

 The time complexity of searching for a substring(contains) in the trie is O(m), where m is the length of the substring being searched for.

Space Complexity

The space complexity of the suffix trie is O(n^2), where n is the length of the input string. This is because the trie may store up to n^2 nodes in the worst case, where each node represents a character in a substring of the input string.

Application of the Suffix Trie

The Suffix Trie data structure provides an efficient way to store all suffixes of a string and quickly search for substrings within the original string. It is particularly useful in applications such as text processing, pattern matching, and bioinformatics.

Unit Tests

Here are some unit tests to test the robustness of this solution. These tests cover various scenarios including:

    Testing if existing substrings are found in the trie.
    Testing if non-existing substrings are not found in the trie.
    Testing for edge cases such as empty strings, single character strings, and large strings.

import unittest

class TestSuffixTrie(unittest.TestCase):
    def setUp(self):
        self.trie = suffixTrie("banana")

    def test_contains_existing_substrings(self):
        self.assertTrue(self.trie.contains("banana"))
        self.assertFalse(self.trie.contains("ban"))
        self.assertTrue(self.trie.contains("ana"))
        self.assertTrue(self.trie.contains("nana"))
        self.assertTrue(self.trie.contains("a"))

    def test_contains_non_existing_substrings(self):
        self.assertFalse(self.trie.contains("apple"))
        self.assertFalse(self.trie.contains("bana"))
        self.assertFalse(self.trie.contains("ananas"))
        self.assertFalse(self.trie.contains(""))

    def test_contains_empty_string(self):
        empty_trie = suffixTrie("")
        self.assertFalse(empty_trie.contains(""))  # Empty string should be present in the trie
        self.assertFalse(empty_trie.contains("a"))  # Any non-empty string should not be present in the trie

    def test_contains_edge_cases(self):
        # Test for single character
        single_char_trie = suffixTrie("x")
        self.assertTrue(single_char_trie.contains("x"))
        self.assertFalse(single_char_trie.contains("y"))

        # Test for large string
        large_str_trie = suffixTrie("abcdefghijklmnopqrstuvwxyz")
        self.assertTrue(large_str_trie.contains("wxyz"))
        self.assertTrue(large_str_trie.contains("xyz"))
        self.assertTrue(large_str_trie.contains("z"))
        self.assertFalse(large_str_trie.contains("aaaaa"))

if __name__ == '__main__':
    unittest.main()

 

 

Become a member
Get the latest news right in your inbox. We never spam!

Read next

Island Perimeter

 This solution seeks to find the perimeter of an island in a grid.  The problem considers a grid where each cell represents land (1) or … Read More

Kibsoft 3 months, 3 weeks ago . 76 views

Pacific Atlantic Waterflow

3 months, 3 weeks ago . 82 views