String - Top Interview Questions[EASY] (2/9) : Java

728x90

SMALL

Strings

String type of questions were asked in interviews frequently. You will most likely encounter one during your interviews.

We recommend:

Reverse String,
First Unique Character in a String,
String to Integer (atoi)
Implement strStr().

총 8문제.

344. Reverse String

투포인터는 이렇게 쓰는 것이다. 이런 느낌의 문제.

leetcode.com/problems/reverse-string/solution/

Easy

Write a function that reverses a string. The input string is given as an array of characters char[].

Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.

You may assume all the characters consist of printable ascii characters.

Example 1:

Input: ["h","e","l","l","o"] Output: ["o","l","l","e","h"]

Example 2:

Input: ["H","a","n","n","a","h"] Output: ["h","a","n","n","a","H"]

Speaking seriously, let's use this problem to discuss two things:

Does in-place mean constant space complexity?
Two pointers approach.

Approach 1: Recursion, In-Place, \mathcal{O}(N) Space

Does in-place mean constant space complexity?

No. By definition, an in-place algorithm is an algorithm which transforms input using no auxiliary data structure.

The tricky part is that space is used by many actors, not only by data structures. The classical example is to use recursive function without any auxiliary data structures.

Is it in-place? Yes.

Is it constant space? No, because of recursion stack.

Algorithm

Here is an example. Let's implement recursive function helper which receives two pointers, left and right, as arguments.

Base case: if left >= right, do nothing.
Otherwise, swap s[left] and s[right] and call helper(left + 1, right - 1).

To solve the problem, call helper function passing the head and tail indexes as arguments: return helper(0, len(s) - 1).

Implementation

class Solution {
  public void helper(char[] s, int left, int right) {
    if (left >= right) return;
    char tmp = s[left];
    s[left++] = s[right];
    s[right--] = tmp;
    helper(s, left, right);
  }

  public void reverseString(char[] s) {
    helper(s, 0, s.length - 1);
  }
}

Complexity Analysis

Time complexity : \mathcal{O}(N) time to perform N/2 swaps.
Space complexity : \mathcal{O}(N) to keep the recursion stack.

Approach 2: Two Pointers, Iteration, \mathcal{O}(1) Space

Two Pointers Approach

In this approach, two pointers are used to process two array elements at the same time. Usual implementation is to set one pointer in the beginning and one at the end and then to move them until they both meet.

Sometimes one needs to generalize this approach in order to use three pointers, like for classical Sort Colors problem.

Algorithm

Set pointer left at index 0, and pointer right at index n - 1, where n is a number of elements in the array.
While left < right:
- Swap s[left] and s[right].
- Move left pointer one step right, and right pointer one step left.

Implementation

class Solution {
    public void reverseString(char[] s) {
        int left = 0, right = s.length - 1;
        while (left < right) {
            char tmp = s[left];
            s[left++] = s[right];
            s[right--] = tmp;
        }
    }
}

Complexity Analysis

Time complexity : \mathcal{O}(N) to swap N/2 element.
Space complexity : \mathcal{O}(1), it's a constant space solution.

7. Reverse Integer

leetcode.com/problems/reverse-integer/solution/

Given a 32-bit signed integer, reverse digits of an integer.

Example 1:

Input: 123 Output: 321

Example 2:

Input: -123 Output: -321

Example 3:

Input: 120 Output: 21

Note:
Assume we are dealing with an environment which could only store integers within the 32-bit signed integer range: [−231, 231 − 1]. For the purpose of this problem, assume that your function returns 0 when the reversed integer overflows.

Solution

Approach 1: Pop and Push Digits & Check before Overflow

Intuition

We can build up the reverse integer one digit at a time. While doing so, we can check beforehand whether or not appending another digit would cause overflow.

Algorithm

Reversing an integer can be done similarly to reversing a string.

We want to repeatedly "pop" the last digit off of x and "push" it to the back of the \text{rev}. In the end, \text{rev} will be the reverse of the x.

To "pop" and "push" digits without the help of some auxiliary stack/array, we can use math.

//pop operation: pop = x % 10; x /= 10; //push operation: temp = rev * 10 + pop; rev = temp;

However, this approach is dangerous, because the statement \text{temp} = \text{rev} \cdot 10 + \text{pop} can cause overflow.

Luckily, it is easy to check beforehand whether or this statement would cause an overflow.

To explain, lets assume that \text{rev} is positive.

If temp = \text{rev} \cdot 10 + \text{pop} causes overflow, then it must be that \text{rev} \geq \frac{INTMAX}{10}
If \text{rev} > \frac{INTMAX}{10}, then temp = \text{rev} \cdot 10 + \text{pop} is guaranteed to overflow.
If \text{rev} == \frac{INTMAX}{10}, then temp = \text{rev} \cdot 10 + \text{pop} will overflow if and only if \text{pop} > 7

Similar logic can be applied when \text{rev} is negative.

Complexity Analysis

Time Complexity: O(\log(x)). There are roughly \log_{10}(x) digits in x.
Space Complexity: O(1).

접근 방식 1 : 숫자 팝 및 푸시 및 오버플로 전에 확인

직관

한 번에 한 자리 씩 역 정수를 만들 수 있습니다. 이렇게하는 동안 다른 숫자를 추가하면 오버플로가 발생하는지 미리 확인할 수 있습니다.

연산

정수 반전은 문자열 반전과 유사하게 수행 할 수 있습니다.

xx의 마지막 숫자를 반복해서 "팝"하고 \ text {rev} rev의 뒤쪽으로 "밀어 넣고"싶습니다. 결국 \ text {rev} rev는 xx의 반대가됩니다.

보조 스택 / 배열의 도움없이 숫자를 "팝"및 "푸시"하려면 수학을 사용할 수 있습니다.

// 팝 연산 : pop = x % 10; x / = 10; // 푸시 작업 : temp = rev * 10 + pop; rev = 임시;

그러나이 방법은 위험합니다. \ text {temp} = \ text {rev} \ cdot 10 + \ text {pop} temp = rev⋅10 + pop 문이 오버플로를 일으킬 수 있기 때문입니다.

운 좋게도이 문이 오버플로를 유발하는지 여부를 미리 확인하는 것은 쉽습니다.

설명하기 위해 \ text {rev} rev가 양수라고 가정합니다.

temp = \ text {rev} \ cdot 10 + \ text {pop} temp = rev⋅10 + pop으로 인해 오버플로가 발생하면 \ text {rev} \ geq \ frac {INTMAX} {10} rev≥10INTMAX 여야합니다.
\ text {rev}> \ frac {INTMAX} {10} rev> 10INTMAX이면 temp = \ text {rev} \ cdot 10 + \ text {pop} temp = rev⋅10 + pop은 오버플로가 보장됩니다.
\ text {rev} == \ frac {INTMAX} {10} rev == 10INTMAX이면 temp = \ text {rev} \ cdot 10 + \ text {pop} temp = rev⋅10 + pop은 다음과 같은 경우 오버플로됩니다. \ text {pop}> 7pop> 7 인 경우에만
\ text {rev} rev가 음수 일 때도 유사한 논리를 적용 할 수 있습니다.

복잡성 분석

시간 복잡성 : O (\ log (x)) O (log (x)). xx에는 대략 \ log_ {10} (x) log10 (x) 자리가 있습니다.
공간 복잡성 : O (1) O (1).
댓글 : 679

class Solution {
    public int reverse(int x) {
      int rev = 0;
      while(x != 0){
        int pop = x % 10;
        x /= 10;
        // this is to mitigate the largest, and smallest numbers in java.
        if(rev > Integer.MAX_VALUE/10 || (rev == Integer.MAX_VALUE / 10 && pop > 7)) return 0;
        if(rev < Integer.MIN_VALUE/10 || (rev == Integer.MIN_VALUE / 10 && pop < -8)) return 0;
        rev = (rev * 10) + pop;
      }
      return rev;
    }
}

mitigate

mitigate [ˈmɪtɪɡeɪt]

동사

1. [격식]완화[경감]시키다alleviate

action to mitigate poverty

빈곤 완화 조치

387. First Unique Character in a String

해쉬는 이렇게 쓰는 것이다. 이런 느낌의 문제.

leetcode.com/problems/first-unique-character-in-a-string/

Solution

Approach 1: Linear time solution

The best possible solution here could be of a linear time because to ensure that the character is unique you have to check the whole string anyway.

The idea is to go through the string and save in a hash map the number of times each character appears in the string. That would take \mathcal{O}(N) time, where N is a number of characters in the string.

And then we go through the string the second time, this time we use the hash map as a reference to check if a character is unique or not.
If the character is unique, one could just return its index. The complexity of the second iteration is \mathcal{O}(N) as well.

8 / 15

class Solution {
    public int firstUniqChar(String s) {
        HashMap<Character, Integer> count = new HashMap<Character, Integer>();
        int n = s.length();
        // build hash map : character and how often it appears
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            count.put(c, count.getOrDefault(c, 0) + 1);
        }
        
        // find the index
        for (int i = 0; i < n; i++) {
            if (count.get(s.charAt(i)) == 1) 
                return i;
        }
        return -1;
    }
}

Complexity Analysis

Time complexity : \mathcal{O}(N) since we go through the string of length N two times.
Space complexity : \mathcal{O}(1) because English alphabet contains 26 letters.

242. Valid Anagram

알파벳 공간만큼만의 해쉬가 필요한경우 char [26] 을 사용.

leetcode.com/problems/valid-anagram/

Solution

Approach #1 (Sorting) [Accepted]

Algorithm

An anagram is produced by rearranging the letters of s into t. Therefore, if t is an anagram of s, sorting both strings will result in two identical strings. Furthermore, if s and t have different lengths, t must not be an anagram of s and we can return early.

public boolean isAnagram(String s, String t) { if (s.length() != t.length()) { return false; } char[] str1 = s.toCharArray(); char[] str2 = t.toCharArray(); Arrays.sort(str1); Arrays.sort(str2); return Arrays.equals(str1, str2); }

Complexity analysis

Time complexity : O(n \log n). Assume that n is the length of s, sorting costs O(n \log n) and comparing two strings costs O(n). Sorting time dominates and the overall time complexity is O(n \log n).
Space complexity : O(1). Space depends on the sorting implementation which, usually, costs O(1) auxiliary space if heapsort is used. Note that in Java, toCharArray() makes a copy of the string so it costs O(n) extra space, but we ignore this for complexity analysis because:
- It is a language dependent detail.
- It depends on how the function is designed. For example, the function parameter types can be changed to char[].

Approach #2 (Hash Table) [Accepted]

Algorithm

To examine if t is a rearrangement of s, we can count occurrences of each letter in the two strings and compare them. Since both s and t contain only letters from a-z, a simple counter table of size 26 is suffice.

Do we need two counter tables for comparison? Actually no, because we could increment the counter for each letter in s and decrement the counter for each letter in t, then check if the counter reaches back to zero.

public boolean isAnagram(String s, String t) { if (s.length() != t.length()) { return false; } int[] counter = new int[26]; for (int i = 0; i < s.length(); i++) { counter[s.charAt(i) - 'a']++; counter[t.charAt(i) - 'a']--; } for (int count : counter) { if (count != 0) { return false; } } return true; }

Or we could first increment the counter for s, then decrement the counter for t. If at any point the counter drops below zero, we know that t contains an extra letter not in s and return false immediately.

public boolean isAnagram(String s, String t) { if (s.length() != t.length()) { return false; } int[] table = new int[26]; for (int i = 0; i < s.length(); i++) { table[s.charAt(i) - 'a']++; } for (int i = 0; i < t.length(); i++) { table[t.charAt(i) - 'a']--; if (table[t.charAt(i) - 'a'] < 0) { return false; } } return true; }

class Solution {

public boolean isAnagram(String s, String t) {
    if (s.length() != t.length()) {
        return false;
    }
    int[] table = new int[26];
    for (int i = 0; i < s.length(); i++) {
        table[s.charAt(i) - 'a']++;
    }
    for (int i = 0; i < t.length(); i++) {
        table[t.charAt(i) - 'a']--;
        if (table[t.charAt(i) - 'a'] < 0) {
            return false;
        }
    }
    return true;
}

}

Complexity analysis

Time complexity : O(n). Time complexity is O(n) because accessing the counter table is a constant time operation.
Space complexity : O(1). Although we do use extra space, the space complexity is O(1) because the table's size stays constant no matter how large n is.

Follow up

What if the inputs contain unicode characters? How would you adapt your solution to such case?

Answer

Use a hash table instead of a fixed size counter. Imagine allocating a large size array to fit the entire range of unicode characters, which could go up to more than 1 million. A hash table is a more generic solution and could adapt to any range of characters.

125. Valid Palindrome

투포인터를 알고나면 투포인터가 가장쉽다. 어프로치2

leetcode.com/problems/valid-palindrome/

Solution

Approach 1: Compare with Reverse

Intuition

A palindrome is a word, phrase, or sequence that reads the same backwards as forwards. e.g. madam

A palindrome, and its reverse, are identical to each other.

Algorithm

We'll reverse the given string and compare it with the original. If those are equivalent, it's a palindrome.

Since only alphanumeric characters are considered, we'll filter out all other types of characters before we apply our algorithm.

Additionally, because we're treating letters as case-insensitive, we'll convert the remaining letters to lower case. The digits will be left the same.

Complexity Analysis

Time complexity : O(n), in length n of the string.

We need to iterate thrice through the string:
1. When we filter out non-alphanumeric characters, and convert the remaining characters to lower-case.
2. When we reverse the string.
3. When we compare the original and the reversed strings.
Each iteration runs linear in time (since each character operation completes in constant time). Thus, the effective run-time complexity is linear.
Space complexity : O(n), in length n of the string. We need O(n) additional space to stored the filtered string and the reversed string.

Approach 2: Two Pointers

Intuition

If you take any ordinary string, and concatenate its reverse to it, you'll get a palindrome. This leads to an interesting insight about the converse: every palindrome half is reverse of the other half.

Simply speaking, if one were to start in the middle of a palindrome, and traverse outwards, they'd encounter the same characters, in the exact same order, in both halves!

1 / 6

Algorithm

Since the input string contains characters that we need to ignore in our palindromic check, it becomes tedious to figure out the real middle point of our palindromic input.

Instead of going outwards from the middle, we could just go inwards towards the middle!

So, if we start traversing inwards, from both ends of the input string, we can expect to see the same characters, in the same order.

The resulting algorithm is simple:

Set two pointers, one at each end of the input string
If the input is palindromic, both the pointers should point to equivalent characters, at all times. [1]
- If this condition is not met at any point of time, we break and return early. [2]
We can simply ignore non-alphanumeric characters by continuing to traverse further.
Continue traversing inwards until the pointers meet in the middle.

class Solution {
  public boolean isPalindrome(String s) {
    for (int i = 0, j = s.length() - 1; i < j; i++, j--) {
      while (i < j && !Character.isLetterOrDigit(s.charAt(i))) {
        i++;
      }
      while (i < j && !Character.isLetterOrDigit(s.charAt(j))) {
        j--;
      }

      if (i < j && Character.toLowerCase(s.charAt(i)) != Character.toLowerCase(s.charAt(j)))
        return false;
    }

    return true;
  }
}

Complexity Analysis

Time complexity : O(n), in length n of the string. We traverse over each character at-most once, until the two pointers meet in the middle, or when we break and return early.
Space complexity : O(1). No extra space required, at all.

Such a property is formally known as a loop invariant. ↩︎
Such a property is often called a loop termination condition. It is one of several used in this solution. Can you identify the others? ↩︎

8. String to Integer (atoi)

leetcode.com/problems/string-to-integer-atoi/

Solution

Approach 1: Scan from left to right

Intuition

The problem looks simple at first glance. However, there are a few edge conditions that we must handle. The basic idea is to scan the string from left to right and build the result for numerical characters. In this process, we must carefully handle the integer overflow and underflow conditions.

Algorithm

We have to build numerical value by repeatedly shifting the result to left by one digit and adding the next number at the unit place. Since numerical value is a decimal number represented as base 10 in the number system, each digit can be expressed as multiples of powers of 10.

Example - "142" can be represented as 1 * (10^2) + 4 * (10^1) + 2 * (10^0)

1 is at the hundredth place and left-shifted twice, 4 at tens place and shifted once, and so on.

This can be formulated as :

result * 10 + (str[i] - '0')

However, this could lead to integer overflow or underflow conditions. Since integer must be within the 32-bit signed integer range.

Handling overflow and underflow conditions

If the integer is positive, for 32 bit signed integer, \text{INT\_MAX} is 2147483647 (2^{31}-1) To avoid integer overflow, we must ensure that it doesn't exceed this value. This condition needs to be handled when the result is greater than or equal to \frac{\text{INT\_MAX}}{10} (214748364)

Case 1). If \text{result} = \frac{\text{INT\_MAX}}{10}, it would result in integer overflow if next integer character is greater than 7. (7 in this case is last digit of \text{INT\_MAX} (2147483647)). We can use \text{INT\_MAX} \% 10 to generically represent the last digit.
Case 2). If \text{result} > \frac{\text{INT\_MAX}}{10}, we are sure that adding next number would result in integer overflow.

This holds for negative integer as well. In the 32-bit integer, \text{INT\_MIN} value is -2147483648 (-2^{31}). As the last digit is greater than 7 (\text{INT\_MAX} \% 10), integer underflow can also be handled using the above approach.

We must return \text{INT\_MAX} in case of integer overflow and \text{INT\_MIN} in case of integer underflow.

Also, it must be noted that in some languages like Python, an integer value is not restricted by the number of bits. Handling of overflow and underflow won't be required in such cases. We could simply check if the value of an integer is out of specified range [{−2}^{31}, {2}^{31} − 1]

Steps

The algorithm can be summarised as follows

Discard all the whitespaces at the beginning of the string.
There could be an optional sign of a numerical value +/-. It should be noted that the integer is positive by default if there is no sign present and there could be at most one sign character.
Build the result using the above algorithm until there exists a non-whitespace character that is a number (0 to 9). Simultaneously, check for overflow/underflow conditions at each step.

Note: If there exists any non-integer character at step 3, we return 0 by default since it is not a valid integer. Also, we have to just discard all the characters after the first numerical value.

class Solution {
    public int myAtoi(String str) {
        int i = 0;
        int sign = 1;
        int result = 0;
        if (str.length() == 0) return 0;

        //Discard whitespaces in the beginning
        while (i < str.length() && str.charAt(i) == ' ')
            i++;

        // Check if optional sign if it exists
        if (i < str.length() && (str.charAt(i) == '+' || str.charAt(i) == '-'))
            sign = (str.charAt(i++) == '-') ? -1 : 1;

        // Build the result and check for overflow/underflow condition
        while (i < str.length() && str.charAt(i) >= '0' && str.charAt(i) <= '9') {
            if (result > Integer.MAX_VALUE / 10 ||
                    (result == Integer.MAX_VALUE / 10 && str.charAt(i) - '0' > Integer.MAX_VALUE % 10)) {
                return (sign == 1) ? Integer.MAX_VALUE : Integer.MIN_VALUE;
            }
            result = result * 10 + (str.charAt(i++) - '0');
        }
        return result * sign;

    }
}

Complexity Analysis

Time Complexity: \mathcal{O}(N). Here, N is the length of string str. We perform only one iteration over string str.
Space Complexity: \mathcal{O}(1) We use constant extra space for storing sign of the result.

28. Implement strStr()

라빈카프를 꼭 알고 넘어가도록 하자.

leetcode.com/problems/implement-strstr/

Implement strStr() - LeetCode

Level up your coding skills and quickly land a job. This is the best place to expand your knowledge and get prepared for your next interview.

leetcode.com

Easy

Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "hello", needle = "ll" Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba" Output: -1

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's strstr() and Java's indexOf().

Constraints:

haystack and needle consist only of lowercase English characters.

Solution

Overview

The problem is to find needle of length L in the haystack of length N.

Let's discuss three different ideas how to proceed. They are all based on sliding window idea. The key point is how to implement a window slice.

Linear time window slice \mathcal{O}(L) is quite easy, move the window of length L along the haystack and compare substring in the window with the needle. Overall that would result in \mathcal{O}((N - L)L) time complexity.

Could that be improved? Yes. Two pointers approach is still the case of linear time slice, though the comparison happens not for all substrings, and that improves the best time complexity up to \mathcal{O}(N). The worst time complexity is still \mathcal{O}((N - L)L) though.

Could that be improved to \mathcal{O}(N)? Yes, but one has to implement constant time slice \mathcal{O}(1). There are two ways to do it:

Rabin-Karp = constant-time slice using rolling hash algorithm.
Bit manipulation = constant-time slice using bitmasks.

Bit Manipulation approach in Java is more suitable for the short strings or strings with very limited number of characters, ex. Repeated DNA Sequences. That's a consequence of overflow issues in Java (in Python there is no such a problem). Here we deal with quite long strings and it's more simple to implement the basic version of Rabin Karp algorithm.

Approach 1: Substring: Linear Time Slice

Quite straightforward approach - move sliding window along the string and compare substring in the window with the needle.

Implementation

Complexity Analysis

Time complexity: \mathcal{O}((N - L)L), where N is a length of haystack and L is a length of needle. We compute a substring of length L in a loop, which is executed (N - L) times.
Space complexity: \mathcal{O}(1).

Approach 2: Two Pointers: Linear Time Slice

Drawback of the previous algorithm is that one compares absolutely all substrings of length L with the needle. There is no need to that.

First, let's compare only substrings which starts from the first character in the needle substring.

Second, let's compare the characters one by one and stop immediately in the case of mismatch.

Here it was impossible to manage the full match up to the length of needle string, which is L = 5. Let's backtrack then. Note, that we move pn pointer back to the position pn = pn - curr_len + 1, and not to the position pn = pn - curr_len, since this last one was already investigated.

Let's try again. Here we've managed to get the full match during the second attempt, so let's return the start position of that match, pn - L.

Algorithm

Move pn till you'll find the first character of the needle string in the haystack.
Compute the max string match by increasing pn, pL and curr_len in the case of equal characters.
If you managed to get the full match, curr_len == L, return the start position of that match, pn - L.
If you didn't, backtrack: pn = pn - curr_len + 1, pL = 0, curr_len = 0.

Implementation

class Solution {
  public int strStr(String haystack, String needle) {
    int L = needle.length(), n = haystack.length();
    if (L == 0) return 0;

    int pn = 0;
    while (pn < n - L + 1) {
      // find the position of the first needle character
      // in the haystack string
      while (pn < n - L + 1 && haystack.charAt(pn) != needle.charAt(0)) ++pn;

      // compute the max match string
      int currLen = 0, pL = 0;
      while (pL < L && pn < n && haystack.charAt(pn) == needle.charAt(pL)) {
        ++pn;
        ++pL;
        ++currLen;
      }

      // if the whole needle string is found,
      // return its start position
      if (currLen == L) return pn - L;

      // otherwise, backtrack
      pn = pn - currLen + 1;
    }
    return -1;
  }
}

Complexity Analysis

Time complexity: \mathcal{O}((N - L)L) in the worst case of numerous almost complete false matches, and \mathcal{O}(N) in the best case of one single match.
Space complexity: \mathcal{O}(1).

Approach 3: Rabin Karp: Constant Time Slice

Let's now design the algorithm with \mathcal{O}(N) time complexity even in the worst case. The idea is simple: move along the string, generate hash of substring in the sliding window and compare it with the reference hash of the needle string.

There are two technical problems:

How to implement a string slice in a constant time?
How to generate substring hash in a constant time?

String slice in a constant time

Strings are immutable in Java and Python, and to move sliding window in a constant time one has to convert string to another data structure, for example, to integer array of ascii-values.

Rolling hash: hash generation in a constant time

To generate hash of array of length L, one needs \mathcal{O}(L) time.

How to have constant time of hash generation? Use the advantage of slice: only one integer in, and only one - out.

That's the idea of rolling hash. Here we'll implement the simplest one, polynomial rolling hash. Beware that's polynomial rolling hash is NOT the Rabin fingerprint.

Since one deals here with lowercase English letters, all values in the integer array are between 0 and 25 : arr[i] = (int)S.charAt(i) - (int)'a'.
So one could consider string abcd -> [0, 1, 2, 3] as a number in a numeral system with the base 26. Hence abcd -> [0, 1, 2, 3] could be hashed as

h_0 = 0 \times 26^3 + 1 \times 26^2 + 2 \times 26^1 + 3 \times 26^0

Let's write the same formula in a generalised way, where c_i is an integer array element and a = 26 is a system base.

h_0 = c_0 a^{L - 1} + c_1 a^{L - 2} + ... + c_i a^{L - 1 - i} + ... + c_{L - 1} a^1 + c_L a^0

h_0 = \sum_{i = 0}^{L - 1}{c_i a^{L - 1 - i}}

Now let's consider the slice abcd -> bcde. For int arrays that means [0, 1, 2, 3] -> [1, 2, 3, 4], to remove number 0 and to add number 4.

h_1 = (h_0 - 0 \times 26^3) \times 26 + 4 \times 26^0

In a generalised way

h_1 = (h_0 a - c_0 a^L) + c_{L + 1}

Now hash regeneration is perfect and fits in a constant time. There is one more issue to address: possible overflow problem.

How to avoid overflow

a^L could be a large number and hence the idea is to set limits to avoid the overflow. To set limits means to limit a hash by a given number called modulus and use everywhere not hash itself but h % modulus.

It's quite obvious that modulus should be large enough, but how large? Here one could read more about the topic, for the problem here 2^{31} is enough.

Algorithm

Compute the hash of substring haystack.substring(0, L) and reference hash of needle.substring(0, L).
Iterate over the start position of possible match: from 1 to N - L.
- Compute rolling hash based on the previous hash value.
- Return start position if the hash is equal to the reference one.
Return -1, that means that needle is not found.

Implementation

class Solution {
  // function to convert character to integer
  public int charToInt(int idx, String s) {
    return (int)s.charAt(idx) - (int)'a';
  }

  public int strStr(String haystack, String needle) {
    int L = needle.length(), n = haystack.length();
    if (L > n) return -1;

    // base value for the rolling hash function
    int a = 26;
    // modulus value for the rolling hash function to avoid overflow
    long modulus = (long)Math.pow(2, 31);

    // compute the hash of strings haystack[:L], needle[:L]
    long h = 0, ref_h = 0;
    for (int i = 0; i < L; ++i) {
      h = (h * a + charToInt(i, haystack)) % modulus;
      ref_h = (ref_h * a + charToInt(i, needle)) % modulus;
    }
    if (h == ref_h) return 0;

    // const value to be used often : a**L % modulus
    long aL = 1;
    for (int i = 1; i <= L; ++i) aL = (aL * a) % modulus;

    for (int start = 1; start < n - L + 1; ++start) {
      // compute rolling hash in O(1) time
      h = (h * a - charToInt(start - 1, haystack) * aL
              + charToInt(start + L - 1, haystack)) % modulus;
      if (h == ref_h) return start;
    }
    return -1;
  }
}

Complexity Analysis

Time complexity: \mathcal{O}(N), one computes the reference hash of the needle string in \mathcal{O}(L) time, and then runs a loop of (N - L) steps with constant time operations in it.
Space complexity: \mathcal{O}(1).

라빈카프 알고리즘: inner-game.tistory.com/380

어프로치2에서 포인터 각 위치의 의미 생각해보기

38. Count and Say

leetcode.com/problems/count-and-say/

broute-force로 짰기 때문에 다시 공부해야함.

Solution

Overview

Before we proceed to the solutions, in this section, we would like to rephrase the problem.

Given two adjacent sequences of number, [S_{n}, S_{n+1}], there exists a pattern that one can produce the sequence S_{n+1} from its previous sequence S_{n}.

More specifically, one can consider the sequence S_{n+1} as a sort of summary to its previous sequence S_{n}, i.e. S_{n+1} contains a list of pairs as |\text{count}, \text{digit}| which encodes all the information about its previous sequence S_{n}.

Let us take the sequence S_{4} = 1211 as an example, from left to right, we then can divide the sequence into three sub-groups where each sub-group contains a list of identical and adjacent digit, i.e. S_{4} = \{1\}\{2\}\{11\} , as shown in the following:

We then count the number of digits in each sub-group, and then output the summary in the format of |\text{count}, \text{digit}|. As the end, we would obtain the exact sequence of S_5.

With the generated sequence S_5, we then recursively apply the above rule to generate the next sequence.

Now that the description of the problem is clear, one might dismiss it as yet another strange and artificial problem to solve. Well, it is not true in this case.

Actually, we could consider this problem as a naive compression algorithm for a sequence of numbers. Instead of storing repetitive adjacent digits as they are, we could summarize them a bit with the method presented in the problem, which could save us some space as long as there are indeed repetitive occurring patterns.

Approach 1: Sliding Window

Intuition

Now that the problem has been clarified, the solution should be intuitive.

Following the rule as we described above, in order to generate the next sequence, we could scan the current sequence with a sort of sliding window which would hold the identical and adjacent digits. With the sliding window, we would divide the original sequence into a list of sub-sequences. We then count the number of digits within each sub-sequence and output the summary as pairs of |\text{count}, \text{digit}|.

Algorithm

Here we define a function nextSequence() to generate a following sequence from a previous sequence, and we recursively call this function to get the desired sequence that is located at a specific index.

Within the function, we scan the sequence with two contextual variables: prevDigit and digitCnt which refers to respectively the digit that we are expecting in the sub-sequence and the number of occurrence of the digit in the sub-sequence.
At the end of each sub-sequence, we append the summary to the result and then we reset the above two contextual variables for the next sub-sequence.
Note that, we use an artificial delimiter in the sequence to facilitate the iteration.

class Solution {
  public String countAndSay(int n) {

    LinkedList<Integer> prevSeq = new LinkedList<Integer>();
    prevSeq.add(1);
    // Using -1 as the delimiter
    prevSeq.add(-1);

    List<Integer> finalSeq = this.nextSequence(n, prevSeq);
    StringBuffer seqStr = new StringBuffer();
    for (Integer digit : finalSeq) {
      seqStr.append(String.valueOf(digit));
    }
    return seqStr.toString();
  }

  protected LinkedList<Integer> nextSequence(int n, LinkedList<Integer> prevSeq) {
    if (n <= 1) {
      // remove the delimiter before return
      prevSeq.pollLast();
      return prevSeq;
    }

    LinkedList<Integer> nextSeq = new LinkedList<Integer>();
    Integer prevDigit = null;
    Integer digitCnt = 0;
    for (Integer digit : prevSeq) {
      if (prevDigit == null) {
        prevDigit = digit;
        digitCnt += 1;
      } else if (digit == prevDigit) {
        // in the middle of the sub-sequence
        digitCnt += 1;
      } else {
        // end of a sub-sequence
        nextSeq.add(digitCnt);
        nextSeq.add(prevDigit);
        // reset for the next sub-sequence
        prevDigit = digit;
        digitCnt = 1;
      }
    }

    // add the delimiter for the next recursion
    nextSeq.add(-1);
    return this.nextSequence(n - 1, nextSeq);
  }
}

Complexity

Time Complexity: \mathcal{O}(2^n) where n is the index of the desired sequence.
- First of all, we would invoke the function nextSequence() n-1 times to get the desired sequence.
- For each invocation of the function, we would scan the current sequence whose length is difficult to determine though.
- Let us image in the worst scenario where any two adjacent digit in the sequence are not identical, then its next sequence would double the length, rather than having a reduced length. As a result, we could assume that in the worst case, the length of the sequence would grow exponentially.
- As a result, the overall time complexity of the algorithm would be \mathcal{O}(\sum_{i=0}^{n-1}{2^i}) = \mathcal{O}(2^n).
Space Complexity: \mathcal{O}(2^{n-1}).
- Within each invocation of the nextSequence() function, we are using a container to keep the result of the next sequence. The memory consumption of the container is proportional to the length of the sequence that the function needs to process, i.e 2^{n-1}.
- Though we were applying the recursion function, which typically incurs some additional memory consumption in call stack. In our case though, the recursion is implemented in the form of tail recursion, and we assume that the compiler could optimize its execution which would not incur additional memory consumption.
- One could also easily replace the recursion with the iteration in this case.
- As a result, the overall space complexity of the algorithm would be dominated by the space needed to hold the final sequence, i.e. \mathcal{O}({2^{n-1}}).

Approach 2: Regular Expression

Intuition

This problem could be a good exercise to apply pattern matching, where in our case we need to find out all those repetitive groups of digits.

A regular expression (a.k.a regex) is a sequence of characters that defines a search pattern. The regex serves as a common tool for many pattern matching problems. And many programming languages provides regex capacities either with build-in constructs or via some libraries.

Note that, although the syntax of regex is mostly universal across all programming languages, there might exist some subtile differences. Here we show two examples in Java and Python respectively.

Java: regex = "(.)\1*"

We could break down the above regex expression into three 3 parts:

"(.)": it defines a group that contains a single character that could be of anything.
"\1": this part refers to the defined group with the index of 1.
"*": this qualifier followed by the group reference \1, indicates that we would like to see the group repeats itself zero or more times.

Python: regex = "((.)*)"

Slightly different than the above regex in Java, here we define two groups instead of one.

"(.)": again, this is a group that contains a single character that could be of anything.
"": this part refers to the second group (i.e. (.)) that we define.
"((.)*)": the outer bracket defines the scope of the first group, which contains the repetitive appearance of the second group above.

Algorithm

Based on the above definitions of regex, we then find all the matches to the regex and then concatenate the results together.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Solution {
  public String countAndSay(int n) {
    String currSeq = "1";

    // Pattern to match the repetitive digits
    String regexPattern = "(.)\\1*";
    Pattern pattern = Pattern.compile(regexPattern);

    for (int i = 1; i < n; ++i) {
      Matcher m = pattern.matcher(currSeq);
      StringBuffer nextSeq = new StringBuffer();

      // each group contains identical and adjacent digits
      while (m.find()) {
        nextSeq.append(m.group().length() + String.valueOf(m.group().charAt(0)));
      }
      // prepare for the next iteration
      currSeq = nextSeq.toString();
    }

    return currSeq;
  }
}

Complexity

Time Complexity: \mathcal{O}(2^n) where n is the index of the desired sequence.
- Similar with our sliding window approach, the overall algorithm consists of a nested loop.
- We could assume that the time complexity of the regex matching is linear to the length of the input string.
- As a result, the overall time complexity of the algorithm would be \mathcal{O}(\sum_{i=0}^{n-1}{2^i}) = \mathcal{O}(2^n).
Space Complexity: \mathcal{O}(2^{n-1}).
- Within the function, we are using a container to keep the result of the next sequence. The memory consumption of the container is proportional to the length of the sequence that the function needs to process, i.e 2^{n-1}.
- As a result, the space complexity of the algorithm would be dominated by the space needed to hold the final sequence, i.e. \mathcal{O}({2^{n-1}}).

14. Longest Common Prefix

leetcode.com/problems/longest-common-prefix/

가장 쉬운 1번 어프로치부터 마지막 트라이까지해서 별로 나아지는게 없는데 어떤 답이 가장 좋은건지..

머리가 맑을때 다시한번 들여다 봐야겠다.

Solution

Approach 1: Horizontal scanning

Intuition

For a start we will describe a simple way of finding the longest prefix shared by a set of strings LCP(S_1 \ldots S_n). We will use the observation that :

LCP(S_1 \ldots S_n) = LCP(LCP(LCP(S_1, S_2),S_3),\ldots S_n)

Algorithm

To employ this idea, the algorithm iterates through the strings [S_1 \ldots S_n], finding at each iteration i the longest common prefix of strings LCP(S_1 \ldots S_i) When LCP(S_1 \ldots S_i) is an empty string, the algorithm ends. Otherwise after n iterations, the algorithm returns LCP(S_1 \ldots S_n).

Figure 1. Finding the longest common prefix (Horizontal scanning)

 public String longestCommonPrefix(String[] strs) {
    if (strs.length == 0) return "";
    String prefix = strs[0];
    for (int i = 1; i < strs.length; i++)
        while (strs[i].indexOf(prefix) != 0) {
            prefix = prefix.substring(0, prefix.length() - 1);
            if (prefix.isEmpty()) return "";
        }        
    return prefix;
}

Complexity Analysis

Time complexity : , where S is the sum of all characters in all strings.

In the worst case all n strings are the same. The algorithm compares the string S1 with the other strings [S_2 \ldots S_n] There are S character comparisons, where S is the sum of all characters in the input array.
Space complexity : O(1). We only used constant extra space.

Approach 2: Vertical scanning

Algorithm

Imagine a very short string is at the end of the array. The above approach will still do S comparisons. One way to optimize this case is to do vertical scanning. We compare characters from top to bottom on the same column (same character index of the strings) before moving on to the next column.

public String longestCommonPrefix(String[] strs) {
    if (strs == null || strs.length == 0) return "";
    for (int i = 0; i < strs[0].length() ; i++){
        char c = strs[0].charAt(i);
        for (int j = 1; j < strs.length; j ++) {
            if (i == strs[j].length() || strs[j].charAt(i) != c)
                return strs[0].substring(0, i);             
        }
    }
    return strs[0];
}

Complexity Analysis

Time complexity : O(S) , where S is the sum of all characters in all strings. In the worst case there will be n equal strings with length m and the algorithm performs S = m \cdot n character comparisons. Even though the worst case is still the same as Approach 1, in the best case there are at most n \cdot minLen comparisons where minLen is the length of the shortest string in the array.
Space complexity : O(1). We only used constant extra space.

Approach 3: Divide and conquer

Intuition

The idea of the algorithm comes from the associative property of LCP operation. We notice that : LCP(S_1 \ldots S_n) = LCP(LCP(S_1 \ldots S_k), LCP (S_{k+1} \ldots S_n)) , where LCP(S_1 \ldots S_n) is the longest common prefix in set of strings [S_1 \ldots S_n] , 1 < k < n

Algorithm

To apply the observation above, we use divide and conquer technique, where we split the LCP(S_i \ldots S_j) problem into two subproblems LCP(S_i \ldots S_{mid}) and LCP(S_{mid+1} \ldots S_j), where mid is \frac{i + j}{2}. We use their solutions lcpLeft and lcpRight to construct the solution of the main problem LCP(S_i \ldots S_j). To accomplish this we compare one by one the characters of lcpLeft and lcpRight till there is no character match. The found common prefix of lcpLeft and lcpRight is the solution of the LCP(S_i \ldots S_j).

Figure 2. Finding the longest common prefix of strings using divide and conquer technique

public String longestCommonPrefix(String[] strs) {
    if (strs == null || strs.length == 0) return "";    
        return longestCommonPrefix(strs, 0 , strs.length - 1);
}

private String longestCommonPrefix(String[] strs, int l, int r) {
    if (l == r) {
        return strs[l];
    }
    else {
        int mid = (l + r)/2;
        String lcpLeft =   longestCommonPrefix(strs, l , mid);
        String lcpRight =  longestCommonPrefix(strs, mid + 1,r);
        return commonPrefix(lcpLeft, lcpRight);
   }
}

String commonPrefix(String left,String right) {
    int min = Math.min(left.length(), right.length());       
    for (int i = 0; i < min; i++) {
        if ( left.charAt(i) != right.charAt(i) )
            return left.substring(0, i);
    }
    return left.substring(0, min);
}

Complexity Analysis

In the worst case we have n equal strings with length m

Time complexity : O(S), where S is the number of all characters in the array, S = m \cdot n Time complexity is 2 \cdot T\left ( \frac{n}{2} \right ) + O(m). Therefore time complexity is O(S). In the best case this algorithm performs O(minLen \cdot n) comparisons, where minLen is the shortest string of the array
Space complexity : O(m \cdot \log n)

There is a memory overhead since we store recursive calls in the execution stack. There are \log n recursive calls, each store need m space to store the result, so space complexity is O(m \cdot \log n)

Approach 4: Binary search

The idea is to apply binary search method to find the string with maximum value L, which is common prefix of all of the strings. The algorithm searches space is the interval (0 \ldots minLen), where minLen is minimum string length and the maximum possible common prefix. Each time search space is divided in two equal parts, one of them is discarded, because it is sure that it doesn't contain the solution. There are two possible cases:

S[1...mid] is not a common string. This means that for each j > i S[1..j] is not a common string and we discard the second half of the search space.
S[1...mid] is common string. This means that for for each i < j S[1..i] is a common string and we discard the first half of the search space, because we try to find longer common prefix.

Figure 3. Finding the longest common prefix of strings using binary search technique

public String longestCommonPrefix(String[] strs) {
    if (strs == null || strs.length == 0)
        return "";
    int minLen = Integer.MAX_VALUE;
    for (String str : strs)
        minLen = Math.min(minLen, str.length());
    int low = 1;
    int high = minLen;
    while (low <= high) {
        int middle = (low + high) / 2;
        if (isCommonPrefix(strs, middle))
            low = middle + 1;
        else
            high = middle - 1;
    }
    return strs[0].substring(0, (low + high) / 2);
}

private boolean isCommonPrefix(String[] strs, int len){
    String str1 = strs[0].substring(0,len);
    for (int i = 1; i < strs.length; i++)
        if (!strs[i].startsWith(str1))
            return false;
    return true;
}

Complexity Analysis

In the worst case we have n equal strings with length m

Time complexity : O(S \cdot \log m), where S is the sum of all characters in all strings.

The algorithm makes \log m iterations, for each of them there are S = m \cdot n comparisons, which gives in total O(S \cdot \log m) time complexity.
Space complexity : O(1). We only used constant extra space.

Further Thoughts / Follow up

Let's take a look at a slightly different problem:

Given a set of keys S = [S_1,S_2 \ldots S_n], find the longest common prefix among a string q and S. This LCP query will be called frequently.

We could optimize LCP queries by storing the set of keys S in a Trie. For more information about Trie, please see this article Implement a trie (Prefix trie). In a Trie, each node descending from the root represents a common prefix of some keys. But we need to find the longest common prefix of a string q and all key strings. This means that we have to find the deepest path from the root, which satisfies the following conditions:

it is prefix of query string q
each node along the path must contain only one child element. Otherwise the found path will not be a common prefix among all strings.
the path doesn't comprise of nodes which are marked as end of key. Otherwise the path couldn't be a prefix a of key which is shorter than itself.

Algorithm

The only question left, is how to find the deepest path in the Trie, that fulfills the requirements above. The most effective way is to build a trie from [S_1 \ldots S_n] strings. Then find the prefix of query string q in the Trie. We traverse the Trie from the root, till it is impossible to continue the path in the Trie because one of the conditions above is not satisfied.

Figure 4. Finding the longest common prefix of strings using Trie

public String longestCommonPrefix(String q, String[] strs) {
    if (strs == null || strs.length == 0)
         return "";  
    if (strs.length == 1)
         return strs[0];
    Trie trie = new Trie();      
    for (int i = 1; i < strs.length ; i++) {
        trie.insert(strs[i]);
    }
    return trie.searchLongestPrefix(q);
}

class TrieNode {

    // R links to node children
    private TrieNode[] links;

    private final int R = 26;

    private boolean isEnd;

    // number of children non null links
    private int size;    
    public void put(char ch, TrieNode node) {
        links[ch -'a'] = node;
        size++;
    }

    public int getLinks() {
        return size;
    }
    //assume methods containsKey, isEnd, get, put are implemented as it is described
   //in  https://leetcode.com/articles/implement-trie-prefix-tree/)
}

public class Trie {

    private TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

//assume methods insert, search, searchPrefix are implemented as it is described
//in  https://leetcode.com/articles/implement-trie-prefix-tree/)
    private String searchLongestPrefix(String word) {
        TrieNode node = root;
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char curLetter = word.charAt(i);
            if (node.containsKey(curLetter) && (node.getLinks() == 1) && (!node.isEnd())) {
                prefix.append(curLetter);
                node = node.get(curLetter);
            }
            else
                return prefix.toString();

         }
         return prefix.toString();
    }
}

Complexity Analysis

In the worst case query q has length m and it is equal to all n strings of the array.

Time complexity : preprocessing , where S is the number of all characters in the array, LCP query O(m).

Trie build has O(S) time complexity. To find the common prefix of q in the Trie takes in the worst case O(m).
Space complexity : . We only used additional S extra space for the Trie.