Finding the Closest Matching String in Python using Difflib

Karan Goyal

Learn how to use Python's difflib library to find the closest matching string from a list of strings.

Introduction

Finding the closest match for a string in a list of candidates is a common task in data cleaning, text processing, and search. Python's difflib module, which ships with the standard library, handles this without any third-party dependencies. In this post, we'll explore how to use difflib to find the closest matching string and measure how similar it is to the input.

Using Difflib to Find Closest Matches

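The difflib module provides a function called get_close_matches() that returns a list of the best matches from a given list of strings, sorted from most to least similar. By default it returns at most three results and only includes candidates with a similarity score of at least 0.6. Here's an example: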

python
import difflib

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']
best_match = difflib.get_close_matches(my_str, str_list)[0]

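In this code, get_close_matches() takes two arguments: the string to be matched (my_str) and the list of strings to search (str_list). The function returns a list of the best matches with the closest one first, so [0] retrieves the best match, which is 'ape' here. Note that the list is empty when no candidate is similar enough, in which case [0] raises an IndexError.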
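get_close_matches() also accepts two optional arguments, n (the maximum number of results, 3 by default) and cutoff (the minimum similarity score, 0.6 by default). The following is a minimal sketch of how they change the result and of one way to guard against an empty result. The variable names (candidates, loose, no_match, best) and the 'zzzz' input are made up for illustration.

```python
import difflib

candidates = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']

# Default settings: at most n=3 results with a similarity of at least cutoff=0.6
default = difflib.get_close_matches('apple', candidates)
print(default)  # → ['ape']

# A lower cutoff admits weaker matches; n caps how many are returned
loose = difflib.get_close_matches('apple', candidates, n=2, cutoff=0.3)
print(loose)

# No candidate resembles 'zzzz', so the result is an empty list,
# and indexing it with [0] would raise IndexError
no_match = difflib.get_close_matches('zzzz', candidates)
best = no_match[0] if no_match else None
print(best)  # → None
```

Checking the list before indexing keeps the lookup safe when the input may not resemble any candidate.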

Calculating the Score of the Best Match

To determine how close the best match is to the original string, we can use the SequenceMatcher.ratio() method. This method returns a measure of the sequences' similarity as a float in the range [0, 1]. Here's how to calculate the score:

python
score = difflib.SequenceMatcher(None, my_str, best_match).ratio()

The SequenceMatcher class is initialized with None as the first argument (which means we're not using a junk function), and the two strings to be compared (my_str and best_match). The ratio() method then returns the similarity score.
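To see where the number comes from: ratio() is defined as 2.0 * M / T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below recomputes the score by hand for 'apple' and 'ape' using get_matching_blocks(). The names matched and manual_ratio are illustrative only.

```python
import difflib

a, b = 'apple', 'ape'
matcher = difflib.SequenceMatcher(None, a, b)

# Each block is (start in a, start in b, size); the last one is a zero-size sentinel
blocks = matcher.get_matching_blocks()

# M: total matched characters ('ap' and 'e' give 3)
matched = sum(block.size for block in blocks)

# T: combined length of both strings (5 + 3 = 8)
manual_ratio = 2.0 * matched / (len(a) + len(b))

print(manual_ratio, matcher.ratio())  # both are 0.75
```

A score of 1.0 means the strings are identical, and 0.0 means they have nothing in common.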

Example Usage

Let's put it all together with a complete example:

python
import difflib

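def find_closest_match(input_str, str_list):
    # Ask for the single best match; the list is empty if nothing reaches the cutoff
    matches = difflib.get_close_matches(input_str, str_list, n=1)
    if not matches:
        return None, 0.0
    best_match = matches[0]
    # Score the best match against the input on a 0-to-1 scale
    score = difflib.SequenceMatcher(None, input_str, best_match).ratio()
    return best_match, score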

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']
best_match, score = find_closest_match(my_str, str_list)
print(f'Best match: {best_match}, Score: {score:.2f}')
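When tuning the cutoff, it can help to see the score of every candidate rather than only the best one. The helper below, rank_matches, is a hypothetical name and not part of difflib. It is a rough sketch that scores each candidate with SequenceMatcher.ratio() and sorts the results, without applying any cutoff.

```python
import difflib

def rank_matches(input_str, str_list):
    """Return (candidate, score) pairs sorted from most to least similar."""
    scored = [
        (candidate, difflib.SequenceMatcher(None, input_str, candidate).ratio())
        for candidate in str_list
    ]
    # Sort by score, highest first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_matches('apple', ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd'])
for candidate, score in ranked:
    print(f'{candidate}: {score:.2f}')
```

This compares the input against every candidate in full, so for long lists get_close_matches() is usually the better choice because it can discard poor candidates with cheaper checks first.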

Conclusion

In this post, we've seen how to use Python's difflib library to find the closest matching string from a list of strings. By leveraging the get_close_matches() function and SequenceMatcher.ratio() method, you can efficiently implement fuzzy matching in your applications.
