Data Structures and Algorithms

Text Justify

1 year, 1 month ago ; F(visit_count) + Value(1) views
Share this

Text Justify

Text Justification problem is known by various names. These include word wrap and typographical alignment.

When using word processing tools like Microsoft word, they all have different text alignment options e.g. left, center, right & justify where the paragraph looks like a block of text.

This problem is about choosing which words go onto which lines so your paragraph looks prettier.

The genesis of solving this problem is finding out how we go about measuring how good or bad a particular paragraph looks.

Essentially, we need a function that will take a line and subsequently return it as a score of how ugly that particular line looks.

The idea is that if you have more spaces at the end of your line the higher the score of that function.

The best bit of this problem is recognizing that this specific problem can be solved using a dynamic approach. To achieve this, we need to establish what the sub-problems are and if they are recurring.

Afterwards, we come up with a recurrence relationship for the text-justify problem.

Solution: Recursive Text Justify

The provided code defines a class TextjustifyRec that implements a recursive algorithm for text justification. The goal is to format a given text into lines of a specified length (line_length) while minimizing the "ugliness" of each line.

The "ugliness" of a line is calculated based on the square of the difference between the line length and the specified length.

 

import sys  # Import the sys module for sys.maxsize

class TextjustifyRec:
    def __init__(self, txt, line_length):
        """
        Constructor initializes the class with a list of words and the specified line length.

        Parameters:
             - txt (list): List of words in the text.
             - line_length (int): The specified line length.
        """
        self.txt = txt
        self.line_length = line_length

    def ugly_score(self, txt_length):
        """
        Calculate the 'ugliness' score for a given line length.

        Parameters:
             - txt_length (int): Length of the line.

        Returns:
             - int: Ugliness score.
        """
        if txt_length <= self.line_length:
            return (self.line_length - txt_length) ** 2
        else:
            return sys.maxsize

    def count_chars(self, fr, to):
        """
        Count the total number of characters in a range of words.

        Parameters:
             - fr (int): Start index.
             - to (int): End index.

        Returns:
             - int: Total number of characters.
        """
        total_chars = 0
        for i in range(fr, to):
            total_chars += len(self.txt[i])
            if i < to - 1:
                total_chars += 1  # Add one extra character for spaces between words
        return total_chars

    def format_txt(self, i):
        """
        Recursive method to calculate the total 'ugliness' score for formatting the text.

        Parameters:
             - i (int): Current index.

        Returns:
             - int: Total 'ugliness' score.
        """
        if i == len(self.txt):
            return 0  # Base case: end of the text
        score = sys.maxsize
        for x in range(i + 1, len(self.txt) + 1):
            line_len = self.count_chars(i, x)
            curr_score = self.ugly_score(line_len)
            curr_score += self.format_txt(x)
            score = min(curr_score, score)
        return score


# Example usage:
justify = TextjustifyRec("Isabel sat on the step".split(), 10)
print(justify.format_txt(0))

Code Walkthrough

Initialization

  • The TextjustifyRec class is initialized with a list of words (txt) and the specified line length (line_length).

ugly_score Method

  • The ugly_score method calculates the "ugliness" score for a given line length (txt_length).
  • If the line length is less than or equal to the specified line length, it calculates the square of the difference.
  • If the line length exceeds the specified length, it returns a large value (sys.maxsize) indicating that this configuration is not valid.

 count_chars Method

  • The count_chars method calculates the total number of characters in a range of words from index fr to to.
  • It accounts the length of each word and adds one extra character for spaces between words.

 format_txt Method

  • The format_txt method is the main recursive function that calculates the total "ugliness" score for formatting the text.
  • It iterates through possible line breaks, calculating the "ugliness" score for each configuration.
  • The minimum score among all configurations is selected.
  • The base case is when the end of the text is reached (i == len(self.txt)), and the score is set to 0.

    Example Usage

  •   An instance of TextjustifyRec is created with the text "Isabel sat on the step" and a line length of 10.
  •   The format_txt method is called with the starting index 0, and the result is printed.

Time Complexity

The time complexity of the provided recursive solution is exponential, specifically O(2^n), where n is the length of the input text.

This is because, for each word in the text, the algorithm explores two possibilities (include the word in the current line or start a new line).

The recursive nature leads to an exponential number of recursive calls, resulting in a time complexity that grows rapidly with the size of the input.

 

Space Complexity

The space complexity is determined by the maximum depth of the recursive call stack.

In the worst case, the depth of the recursive calls is equal to the length of the input text, leading to a space complexity of O(n),

where n is the length of the input text.

Each recursive call consumes space on the call stack.

 

Solution: Bottom-Up Dynamic Programming

import sys

class TextJustifyDyn:

    def __init__(self, txt, line_length):
        """
        Constructor initializes the class with a list of words and the specified line length.

        Parameters:
             - txt (list): List of words in the text.
             - line_length (int): The specified line length.
        """
        self.txt = txt
        self.line_length = line_length

    def ugly_score(self, txt_length):
        """
        Calculate the 'ugliness' score for a given line length.

        Parameters:
             - txt_length (int): Length of the line.

        Returns:
             - int: Ugliness score.
        """
        if txt_length <= self.line_length:
            return (self.line_length - txt_length) ** 2
        else:
            return sys.maxsize

    def count_chars(self, fr, to):
        """
        Count the total number of characters in a range of words.

        Parameters:
             - fr (int): Start index.
             - to (int): End index.

        Returns:
             - int: Total number of characters.
        """
        total_chars = 0
        for i in range(fr, to):
            total_chars += len(self.txt[i])
            if i < to - 1:
                total_chars += 1  # Add one extra character for spaces between words
        return total_chars

    def format_txt(self):
        """
        Dynamic programming method to format the text with minimum 'ugliness' score.

        Returns:
             - int: Total 'ugliness' score.
        """
        scores = [0] * (len(self.txt) + 1)
        ptrs = [0] * len(self.txt)

        for i in range(len(self.txt) - 1, -1, -1):
            score = sys.maxsize
            for j in range(i + 1, len(self.txt) + 1):
                curr_score = self.ugly_score(self.count_chars(i, j)) + scores[j]
                if curr_score < score:
                    score = curr_score
                    ptrs[i] = j
            scores[i] = score
        self.print_txt(ptrs)
        return scores[0]

    def print_txt(self, ptrs):
        """
        Print the formatted text based on the pointers.

        Parameters:
             - ptrs (list): List of pointers indicating line breaks.
        """
        i = 0
        while i < len(ptrs):
            line = self.txt[i:ptrs[i]]
            print(" ".join(line))
            i = ptrs[i]


justify = TextJustifyDyn("Isabel sat on the step".split(), 10)
print(justify.format_txt())

 

 format_txt Method

 The format_txt method is the dynamic programming approach to format the text with minimum "ugliness" score.

 It uses a bottom-up approach, calculating scores and storing pointers to optimal line breaks in reverse order.

The print_txt method is called to print the formatted text based on the obtained pointers.

 print_txt Method

The print_txt method prints the formatted text based on the pointers indicating line breaks.

Time Complexity

The time complexity is determined by the nested loops in the format_txt method.

In the worst case, it iterates over all possible combinations of indices, leading to a time complexity of O(n^2), where n is the length of the input text.

Space Complexity

The space complexity is mainly influenced by the two arrays, scores and ptrs, used to store intermediate results and pointers.

The lengths of these arrays are proportional to the length of the input text. Therefore, the space complexity is O(n), where n is the length of the input text.

 

Solution: Optimized Bottom-Up

import sys

class TextJustifyOpt:
    def __init__(self, txt, line_length):
        """
        Constructor initializes the class with a list of words and the specified line length.

        Parameters:
             - txt (list): List of words in the text.
             - line_length (int): The specified line length.
        """
        self.txt = txt
        self.line_length = line_length
        self.word_length = [[]] * len(txt)

        # Populate the word_length matrix
        for i in range(len(txt)):
            self.word_length[i] = [0] * len(txt)
            self.word_length[i][i] = len(txt[i])
            for j in range(i + 1, len(txt)):
                self.word_length[i][j] = self.word_length[i][j - 1] + 1 + len(txt[j])

        # Print the word_length matrix
        for arr in self.word_length:
            print(arr)

    def ugly_score(self, txt_length):
        """
        Calculate the 'ugliness' score for a given line length.

        Parameters:
             - txt_length (int): Length of the line.

        Returns:
             - int: Ugliness score.
        """
        if txt_length <= self.line_length:
            return (self.line_length - txt_length) ** 2
        else:
            return sys.maxsize

    def format_txt(self):
        """
        Dynamic programming method to format the text with minimum 'ugliness' score.

        Returns:
             - int: Total 'ugliness' score.
        """
        scores = [0] * (len(self.txt) + 1)
        ptrs = [0] * len(self.txt)

        for i in range(len(self.txt) - 1, -1, -1):
            score = sys.maxsize
            for j in range(i + 1, len(self.txt) + 1):
                curr_score = self.ugly_score(self.word_length[i][j - 1]) + scores[j]
                if curr_score < score:
                    score = curr_score
                    ptrs[i] = j
            scores[i] = score
        self.print_txt(ptrs)
        return scores[0]

    def print_txt(self, ptrs):
        """
        Print the formatted text based on the pointers.

        Parameters:
             - ptrs (list): List of pointers indicating line breaks.
        """
        i = 0
        while i < len(ptrs):
            line = self.txt[i:ptrs[i]]
            print(" ".join(line))
            i = ptrs[i]


# Example usage:
justify = TextJustifyOpt("Isabel sat on the step".split(), 10)
print(justify.format_txt())

Time Complexity

The initialization loop runs in O(n^2) time, where n is the length of the input text.

This is because it has two nested loops to calculate and populate the word_length matrix.

The dynamic programming loop(format_txt method) also has nested loops but only iterates over each element once.

It runs in O(n^2) time, similar to the initialization loop. As a result, the overall time complexity is O(n^2).


Space Complexity

The space complexity is influenced by the word_length matrix, which has dimensions len(txt) x len(txt). Therefore, its space complexity is O(n^2).

The space complexity for the scores and ptrs lists is O(n).

Other variables used in the class have constant space requirements.

Therefore, the overall space complexity is O(n^2).

 

 

Become a member
Get the latest news right in your inbox. We never spam!

Read next

Practical Applications of Derangements

Practical Applications of Derangements in Real-World Coding Derangements are a very common concept … Read More

Kibsoft 1 week, 3 days ago . 46 views