8. Encode and Decode Strings
View on NeetCode
View on LeetCode
Problem
Design an algorithm to encode a list of strings to a single string. The encoded string is then decoded back to the original list of strings.
Please implement encode and decode.
Example 1:
Input: ["neet","code","love","you"]
Output: ["neet","code","love","you"]
Explanation: The strings are encoded and then decoded back to the original list.
Example 2:
Input: ["we","say",":","yes"]
Output: ["we","say",":","yes"]
Constraints:
0 <= strs.length < 2000 <= strs[i].length < 200strs[i]contains any possible characters out of256valid ASCII characters.
Follow up: Could you write a generalized algorithm to work on any possible set of characters?
Solution
Approach: Length-Prefixed Encoding
The key insight is to prefix each string with its length followed by a delimiter. This way, we know exactly how many characters to read for each string, even if the strings contain special characters or delimiters.
The format is: length#string for each string. For example, ["neet","code"] becomes "4#neet4#code".
This approach handles all edge cases:
- Empty strings (encoded as
"0#") - Strings with special characters (including
#) - Any ASCII characters
Implementation
class Solution:
def encode(self, strs: List[str]) -> str:
result = ""
for s in strs:
result += str(len(s)) + "#" + s
return result
def decode(self, s: str) -> List[str]:
result = []
i = 0
while i < len(s):
# Find the delimiter '#'
j = i
while s[j] != '#':
j += 1
# Extract length
length = int(s[i:j])
# Extract string of that length
i = j + 1 # Start after '#'
result.append(s[i:i + length])
# Move to next encoded string
i += length
return result
Alternative (Using Join for Encoding):
class Solution:
def encode(self, strs: List[str]) -> str:
return ''.join(f"{len(s)}#{s}" for s in strs)
def decode(self, s: str) -> List[str]:
result = []
i = 0
while i < len(s):
# Find the delimiter '#'
delimiter_pos = s.index('#', i)
# Extract length
length = int(s[i:delimiter_pos])
# Extract string of that length
start = delimiter_pos + 1
result.append(s[start:start + length])
# Move to next encoded string
i = start + length
return result
Complexity Analysis
- Time Complexity: O(n) for both encode and decode, where n is the total number of characters across all strings. We process each character once.
- Space Complexity: O(1) extra space (not counting the output). We build the result as we go without using additional data structures proportional to input size.
Key Insights
-
Length Prefix is Key: By encoding the length before each string, we know exactly how many characters to read, eliminating ambiguity even with special characters.
-
Delimiter Choice: We use
#as a delimiter between length and string. Since we know the exact length, it doesn’t matter if#appears in the string itself. - Handles Edge Cases: This approach naturally handles:
- Empty strings: encoded as
"0#" - Single character strings: encoded as
"1#x" - Strings with numbers, delimiters, or any ASCII characters
- Empty strings: encoded as
-
No Escaping Needed: Unlike approaches that use a single delimiter (like
","), we don’t need to escape special characters in the strings. - Generalized Solution: This algorithm works for any possible set of characters, satisfying the follow-up requirement. The length-prefix approach is language-agnostic and robust.