Text Justify
Text Justification problem is known by various names. These include word wrap and typographical alignment.
When using word processing tools like Microsoft word, they all have different text alignment options e.g. left, center, right & justify where the paragraph looks like a block of text.
This problem is about choosing which words go onto which lines so your paragraph looks prettier.
The genesis of solving this problem is finding out how we go about measuring how good or bad a particular paragraph looks.
Essentially, we need a function that will take a line and subsequently return it as a score of how ugly that particular line looks.
The idea is that if you have more spaces at the end of your line the higher the score of that function.
The best bit of this problem is recognizing that this specific problem can be solved using a dynamic approach. To achieve this, we need to establish what the sub-problems are and if they are recurring.
Afterwards, we come up with a recurrence relationship for the text-justify problem.
Solution: Recursive Text Justify
The provided code defines a class TextjustifyRec that implements a recursive algorithm for text justification. The goal is to format a given text into lines of a specified length (line_length) while minimizing the "ugliness" of each line.
The "ugliness" of a line is calculated based on the square of the difference between the line length and the specified length.
import sys # Import the sys module for sys.maxsize
class TextjustifyRec:
def __init__(self, txt, line_length):
"""
Constructor initializes the class with a list of words and the specified line length.
Parameters:
- txt (list): List of words in the text.
- line_length (int): The specified line length.
"""
self.txt = txt
self.line_length = line_length
def ugly_score(self, txt_length):
"""
Calculate the 'ugliness' score for a given line length.
Parameters:
- txt_length (int): Length of the line.
Returns:
- int: Ugliness score.
"""
if txt_length <= self.line_length:
return (self.line_length - txt_length) ** 2
else:
return sys.maxsize
def count_chars(self, fr, to):
"""
Count the total number of characters in a range of words.
Parameters:
- fr (int): Start index.
- to (int): End index.
Returns:
- int: Total number of characters.
"""
total_chars = 0
for i in range(fr, to):
total_chars += len(self.txt[i])
if i < to - 1:
total_chars += 1 # Add one extra character for spaces between words
return total_chars
def format_txt(self, i):
"""
Recursive method to calculate the total 'ugliness' score for formatting the text.
Parameters:
- i (int): Current index.
Returns:
- int: Total 'ugliness' score.
"""
if i == len(self.txt):
return 0 # Base case: end of the text
score = sys.maxsize
for x in range(i + 1, len(self.txt) + 1):
line_len = self.count_chars(i, x)
curr_score = self.ugly_score(line_len)
curr_score += self.format_txt(x)
score = min(curr_score, score)
return score
# Example usage:
justify = TextjustifyRec("Isabel sat on the step".split(), 10)
print(justify.format_txt(0))
Code Walkthrough
Initialization
- The TextjustifyRec class is initialized with a list of words (txt) and the specified line length (line_length).
ugly_score Method
- The ugly_score method calculates the "ugliness" score for a given line length (txt_length).
- If the line length is less than or equal to the specified line length, it calculates the square of the difference.
- If the line length exceeds the specified length, it returns a large value (sys.maxsize) indicating that this configuration is not valid.
count_chars Method
- The count_chars method calculates the total number of characters in a range of words from index fr to to.
- It accounts the length of each word and adds one extra character for spaces between words.
format_txt Method
- The format_txt method is the main recursive function that calculates the total "ugliness" score for formatting the text.
- It iterates through possible line breaks, calculating the "ugliness" score for each configuration.
- The minimum score among all configurations is selected.
- The base case is when the end of the text is reached (i == len(self.txt)), and the score is set to 0.
Example Usage
- An instance of TextjustifyRec is created with the text "Isabel sat on the step" and a line length of 10.
- The format_txt method is called with the starting index 0, and the result is printed.
Time Complexity
The time complexity of the provided recursive solution is exponential, specifically O(2^n), where n is the length of the input text.
This is because, for each word in the text, the algorithm explores two possibilities (include the word in the current line or start a new line).
The recursive nature leads to an exponential number of recursive calls, resulting in a time complexity that grows rapidly with the size of the input.
Space Complexity
The space complexity is determined by the maximum depth of the recursive call stack.
In the worst case, the depth of the recursive calls is equal to the length of the input text, leading to a space complexity of O(n),
where n is the length of the input text.
Each recursive call consumes space on the call stack.
Solution: Bottom-Up Dynamic Programming
import sys
class TextJustifyDyn:
def __init__(self, txt, line_length):
"""
Constructor initializes the class with a list of words and the specified line length.
Parameters:
- txt (list): List of words in the text.
- line_length (int): The specified line length.
"""
self.txt = txt
self.line_length = line_length
def ugly_score(self, txt_length):
"""
Calculate the 'ugliness' score for a given line length.
Parameters:
- txt_length (int): Length of the line.
Returns:
- int: Ugliness score.
"""
if txt_length <= self.line_length:
return (self.line_length - txt_length) ** 2
else:
return sys.maxsize
def count_chars(self, fr, to):
"""
Count the total number of characters in a range of words.
Parameters:
- fr (int): Start index.
- to (int): End index.
Returns:
- int: Total number of characters.
"""
total_chars = 0
for i in range(fr, to):
total_chars += len(self.txt[i])
if i < to - 1:
total_chars += 1 # Add one extra character for spaces between words
return total_chars
def format_txt(self):
"""
Dynamic programming method to format the text with minimum 'ugliness' score.
Returns:
- int: Total 'ugliness' score.
"""
scores = [0] * (len(self.txt) + 1)
ptrs = [0] * len(self.txt)
for i in range(len(self.txt) - 1, -1, -1):
score = sys.maxsize
for j in range(i + 1, len(self.txt) + 1):
curr_score = self.ugly_score(self.count_chars(i, j)) + scores[j]
if curr_score < score:
score = curr_score
ptrs[i] = j
scores[i] = score
self.print_txt(ptrs)
return scores[0]
def print_txt(self, ptrs):
"""
Print the formatted text based on the pointers.
Parameters:
- ptrs (list): List of pointers indicating line breaks.
"""
i = 0
while i < len(ptrs):
line = self.txt[i:ptrs[i]]
print(" ".join(line))
i = ptrs[i]
justify = TextJustifyDyn("Isabel sat on the step".split(), 10)
print(justify.format_txt())
format_txt Method
The format_txt method is the dynamic programming approach to format the text with minimum "ugliness" score.
It uses a bottom-up approach, calculating scores and storing pointers to optimal line breaks in reverse order.
The print_txt method is called to print the formatted text based on the obtained pointers.
print_txt Method
The print_txt method prints the formatted text based on the pointers indicating line breaks.
Time Complexity
The time complexity is determined by the nested loops in the format_txt method.
In the worst case, it iterates over all possible combinations of indices, leading to a time complexity of O(n^2), where n is the length of the input text.
Space Complexity
The space complexity is mainly influenced by the two arrays, scores and ptrs, used to store intermediate results and pointers.
The lengths of these arrays are proportional to the length of the input text. Therefore, the space complexity is O(n), where n is the length of the input text.
Solution: Optimized Bottom-Up
import sys
class TextJustifyOpt:
def __init__(self, txt, line_length):
"""
Constructor initializes the class with a list of words and the specified line length.
Parameters:
- txt (list): List of words in the text.
- line_length (int): The specified line length.
"""
self.txt = txt
self.line_length = line_length
self.word_length = [[]] * len(txt)
# Populate the word_length matrix
for i in range(len(txt)):
self.word_length[i] = [0] * len(txt)
self.word_length[i][i] = len(txt[i])
for j in range(i + 1, len(txt)):
self.word_length[i][j] = self.word_length[i][j - 1] + 1 + len(txt[j])
# Print the word_length matrix
for arr in self.word_length:
print(arr)
def ugly_score(self, txt_length):
"""
Calculate the 'ugliness' score for a given line length.
Parameters:
- txt_length (int): Length of the line.
Returns:
- int: Ugliness score.
"""
if txt_length <= self.line_length:
return (self.line_length - txt_length) ** 2
else:
return sys.maxsize
def format_txt(self):
"""
Dynamic programming method to format the text with minimum 'ugliness' score.
Returns:
- int: Total 'ugliness' score.
"""
scores = [0] * (len(self.txt) + 1)
ptrs = [0] * len(self.txt)
for i in range(len(self.txt) - 1, -1, -1):
score = sys.maxsize
for j in range(i + 1, len(self.txt) + 1):
curr_score = self.ugly_score(self.word_length[i][j - 1]) + scores[j]
if curr_score < score:
score = curr_score
ptrs[i] = j
scores[i] = score
self.print_txt(ptrs)
return scores[0]
def print_txt(self, ptrs):
"""
Print the formatted text based on the pointers.
Parameters:
- ptrs (list): List of pointers indicating line breaks.
"""
i = 0
while i < len(ptrs):
line = self.txt[i:ptrs[i]]
print(" ".join(line))
i = ptrs[i]
# Example usage:
justify = TextJustifyOpt("Isabel sat on the step".split(), 10)
print(justify.format_txt())
Time Complexity
The initialization loop runs in O(n^2) time, where n is the length of the input text.
This is because it has two nested loops to calculate and populate the word_length matrix.
The dynamic programming loop(format_txt method) also has nested loops but only iterates over each element once.
It runs in O(n^2) time, similar to the initialization loop. As a result, the overall time complexity is O(n^2).
Space Complexity
The space complexity is influenced by the word_length matrix, which has dimensions len(txt) x len(txt). Therefore, its space complexity is O(n^2).
The space complexity for the scores and ptrs lists is O(n).
Other variables used in the class have constant space requirements.
Therefore, the overall space complexity is O(n^2).