Finding the Closest Matching String in Python using Difflib
Learn how to use Python's difflib library to find the closest matching string from a list of strings.

Introduction
In many applications, such as data cleaning, text processing, and search functionality, finding the closest matching string from a list of strings is a common task. Python's standard-library difflib module handles this without any third-party dependencies. In this post, we'll explore how to use difflib to find the closest matching string.
Using Difflib to Find Closest Matches
The difflib library provides a function called get_close_matches() that returns a list of the best matches among the given list of strings. Here's an example:
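```python
import difflib

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']

# Matches come back sorted from most to least similar, so [0] is the best one.
# Note: [0] raises IndexError if no candidate is close enough.
best_match = difflib.get_close_matches(my_str, str_list)[0]
```

In this code, get_close_matches() takes two arguments: the string to be matched (my_str) and the list of strings to search (str_list). It returns a list of up to three candidates (the default for its n parameter) whose similarity to my_str is at least 0.6 (the default cutoff), sorted with the most similar first, so [0] retrieves the best match. If no candidate reaches the cutoff, the list is empty and indexing it with [0] raises an IndexError.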
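The behavior described above is controlled by two optional parameters of get_close_matches(): n, the maximum number of results (default 3), and cutoff, the minimum similarity score a candidate needs (default 0.6). The sketch below reuses the post's sample data to show both parameters and a guard for the empty-list case; the 'zzz' query and the 0.3 cutoff are arbitrary values chosen only for illustration.

```python
import difflib

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']

# Defaults: n=3 (at most three results), cutoff=0.6 (minimum similarity)
default_matches = difflib.get_close_matches(my_str, str_list)

# Lowering the cutoff (0.3 is an arbitrary illustrative value) admits weaker candidates
loose_matches = difflib.get_close_matches(my_str, str_list, n=3, cutoff=0.3)

# A query with no close candidate yields an empty list, so check before indexing
no_matches = difflib.get_close_matches('zzz', str_list)
best = no_matches[0] if no_matches else None

print(default_matches)
print(loose_matches)
print(best)
```

With the sample list, only 'ape' clears the default cutoff. Lowering the cutoff lets weaker candidates through, which can help with noisy input at the cost of more false positives.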
Calculating the Score of the Best Match
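get_close_matches() ranks candidates by similarity but returns only the strings, not their scores. To determine how close the best match is to the original string, we can compute the score ourselves with the SequenceMatcher.ratio() method. This method returns a measure of the sequences' similarity as a float in the range [0, 1], where 1.0 means the two strings are identical. Here's how to calculate the score: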
```python
score = difflib.SequenceMatcher(None, my_str, best_match).ratio()
```

The first argument to SequenceMatcher is a junk function; passing None means no characters are treated as junk. The other two arguments are the strings to be compared (my_str and best_match), and the ratio() method then returns their similarity score.
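The difflib documentation defines ratio() as 2.0*M/T, where T is the total number of elements in both sequences and M is the number of matches. As a rough check of that definition, the sketch below recomputes the score for the post's example pair from get_matching_blocks(), and then shows the documented caveat that ratio() can depend on argument order (the 'tide'/'diet' pair is the example the docs use).

```python
import difflib

my_str = 'apple'
best_match = 'ape'

matcher = difflib.SequenceMatcher(None, my_str, best_match)

# M: total size of all matching blocks (the final block is a zero-size sentinel)
m = sum(block.size for block in matcher.get_matching_blocks())
# T: total number of characters in both strings
t = len(my_str) + len(best_match)

manual_score = 2.0 * m / t
library_score = matcher.ratio()
print(m, t, manual_score, library_score)  # 3 8 0.75 0.75

# ratio() is not guaranteed to be symmetric in its two arguments
forward = difflib.SequenceMatcher(None, 'tide', 'diet').ratio()
backward = difflib.SequenceMatcher(None, 'diet', 'tide').ratio()
print(forward, backward)  # 0.25 0.5
```

In CPython's implementation, get_close_matches() scores each candidate with the candidate as the first sequence and the query as the second, the opposite order from the snippet above, so a score recomputed this way can occasionally differ from the one used for ranking.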
Example Usage
Let's put it all together with a complete example:
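```python
import difflib

def find_closest_match(input_str, str_list):
    # n=1: only the single best candidate is needed
    matches = difflib.get_close_matches(input_str, str_list, n=1)
    if not matches:
        # No candidate reached the default cutoff of 0.6
        return None, 0.0
    best_match = matches[0]
    # Similarity of the best match, a float in [0, 1]
    score = difflib.SequenceMatcher(None, input_str, best_match).ratio()
    return best_match, score

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']
best_match, score = find_closest_match(my_str, str_list)
print(f'Best match: {best_match}, Score: {score:.2f}')
```

With the sample list above, this prints Best match: ape, Score: 0.75. If nothing in the list is close enough, the function returns (None, 0.0) instead of raising an IndexError.

Conclusion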
In this post, we've seen how to use Python's difflib module to find the closest matching string from a list of strings. By combining the get_close_matches() function with the SequenceMatcher.ratio() method, you can add simple fuzzy matching to your applications using only the standard library.