“Your car has failed its MOT.”
“We’ve run out of milk.”
“I’d give that ten minutes.”
There are certain things we just don’t want to hear. However, receiving reliable and accurate information, even if it’s bad news, is essential in all aspects of our lives.
When your address data is being verified, you don’t want to be told that a large percentage doesn’t match. But sometimes it is what you need to hear.
Different address validation software uses different methods to verify data. How these systems manage the process can look the same, but may produce very different results and metrics.
One of the key issues is the logic that is used within the system. Some systems use Fuzzy Logic, while others employ Rules-Based Logic.
To understand which system you need, it’s important to understand how both approaches work.
The Fuzzy Way
Fuzzy logic is a decision-making solution that attempts to match as many addresses as possible using phonetics and algorithms, along with percentage scores to decide on the results. It is a discipline that intersects with mathematics, machine learning, and computer science. However, even the most advanced algorithms can produce false results when mining data, whether for search queries, address matching or predicting exam results. Reliance on this approach can often create difficulties.
Proponents of Fuzzy Logic occasionally attempt to position the method as Artificial Intelligence, but this can be misleading and provide false confidence in the expert system. In reality, fuzzy logic generally attempts to find patterns in data and use those patterns to attempt to correct mistakes, fill in missing fields and make a ‘best guess’ at the result.
Fuzzy systems are about making as many matches as possible using the minimal number of input variables, even if that means changing the data. If something is 80% correct, then it’s also 20% incorrect in the real world.
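To make the percentage-score idea concrete, here is a minimal sketch in Python. It uses the standard library's difflib.SequenceMatcher as a stand-in for a fuzzy scorer; real fuzzy address products use their own (often phonetic) algorithms, and the function names, sample addresses and 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how alike two strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(candidate: str, reference: list[str], threshold: float = 0.8):
    """Pick the highest-scoring reference address (hypothetical helper).

    Returns (best_address, score) if the score clears the threshold,
    otherwise (None, score).
    """
    best = max(reference, key=lambda r: similarity(candidate, r))
    score = similarity(candidate, best)
    return (best, score) if score >= threshold else (None, score)


# Made-up reference data: number 72 does not exist in it.
reference = ["12 High Street", "14 Low Road"]
print(fuzzy_match("72 High Street", reference))
```

Here "72 High Street" clears the threshold against "12 High Street" because only one character differs, even though the two are different premises. A high score says the strings look alike, not that the addresses are the same.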
Following The Rules
Rules-based logic is about determining what is, and what is not, allowed to match based on specific rules. These rules are based on natural language representations and models.
For instance, the English language allows for both ie and ei, along with the age-old ‘i before e, except after c’ rule, even if that is not always true! Using a rules-based system, if the word does not match, the program will test a different rule to see whether it produces a match, rather than deciding something is the same, just because it could sound similar.
Other rules cover silent letters, such as allowing an ‘e’ at the end of a word to be added or removed, and double letters, which can be reduced to a single letter.
By defining the rules and working through them in a logical manner, it is easy to provide reasons for why something has matched or not.
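The spelling rules described above can be sketched as an ordered list of normalisations, each tried in turn, with the name of the rule that produced the match returned as the reason. This is an illustrative sketch only, not Hopewiser's implementation; the rule set, the rule names and the match_word function are assumptions.

```python
import re

# Each rule is (name, normaliser). Two words match under a rule when the
# normaliser maps both to the same string. All rules here are illustrative.
RULES = [
    ("exact", lambda w: w),
    ("ie/ei spelling", lambda w: w.replace("ei", "ie")),
    ("silent final e", lambda w: w[:-1] if w.endswith("e") else w),
    ("double letters", lambda w: re.sub(r"(.)\1", r"\1", w)),
]


def match_word(candidate: str, reference: str):
    """Return the name of the first rule under which the words match,
    or None if no rule applies."""
    c, r = candidate.lower(), reference.lower()
    for name, normalise in RULES:
        if normalise(c) == normalise(r):
            return name
    return None


for pair in [("Feild", "Field"), ("Thorne", "Thorn"),
             ("Willmot", "Wilmot"), ("Smith", "Smyth")]:
    print(pair, "->", match_word(*pair))
```

Because every match is tied to a named rule, the system can report why two words were treated as the same. "Smith" and "Smyth" sound alike, but no rule links them, so they are left unmatched rather than guessed at.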
The best methodology to use alongside this, for address matching purposes, is to work ‘up’ through an address: start with the town (potentially within a county), then the thoroughfare within the town, followed by the individual address details (premise number, premise name, sub-premise, organisation). This is why rules-based matching for addressing works so well: addresses, per country, have a known hierarchy.
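As a rough sketch of working through the hierarchy, the Python below matches the town first, then the thoroughfare within that town, then the premise on that thoroughfare, and reports the level at which matching stopped. The REFERENCE data, place names and match_address function are invented for illustration; a real system would match against a full reference dataset and handle counties, premise names, sub-premises and organisations.

```python
# Hypothetical reference data: town -> thoroughfare -> set of premise numbers.
REFERENCE = {
    "exampleton": {
        "sample road": {"1", "2", "3"},
        "station lane": {"10", "12"},
    },
    "testbury": {
        "sample road": {"5", "7"},
    },
}


def match_address(town: str, thoroughfare: str, premise: str):
    """Match one level at a time and say where matching stopped."""
    streets = REFERENCE.get(town.strip().lower())
    if streets is None:
        return (False, "town not found")
    premises = streets.get(thoroughfare.strip().lower())
    if premises is None:
        return (False, "thoroughfare not found in town")
    if premise.strip().lower() not in premises:
        return (False, "premise not found on thoroughfare")
    return (True, "matched")


print(match_address("Exampleton", "Sample Road", "2"))
print(match_address("Testbury", "Sample Road", "2"))
```

Both towns have a "Sample Road", but number 2 only exists in Exampleton. Narrowing by town first means the second lookup fails with a clear reason instead of matching the wrong street.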
Alongside this, by ‘squeezing’ the address to start with, it is possible to locate the likely components, such as postcode, town, thoroughfare and premise to help hone the matching rules for specific items, such as industrial estates, where the abbreviations can be varied.
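One possible reading of ‘squeezing’ is sketched below in Python: strip punctuation, collapse whitespace, and pull out anything shaped like a UK postcode so the remaining text can be examined for town, thoroughfare and premise. The squeeze function and the simplified postcode pattern are assumptions for illustration; the pattern does not cover every valid postcode format, and the sample address is made up.

```python
import re

# Simplified UK postcode shape: outward code, optional space, inward code.
# This is an approximation, not a full validation pattern.
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")


def squeeze(raw: str) -> dict:
    """Normalise an address string and separate out a likely postcode."""
    text = re.sub(r"[^\w\s]", " ", raw.upper())   # drop punctuation
    text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
    postcode = None
    found = POSTCODE.search(text)
    if found:
        postcode = f"{found.group(1)} {found.group(2)}"
        text = text[:found.start()] + " " + text[found.end():]
        text = re.sub(r"\s+", " ", text).strip()
    return {"postcode": postcode, "rest": text}


print(squeeze("12, High  St.   Exampleton ab12cd"))
```

With the postcode isolated and the rest of the text normalised, later rules (for example, expanding abbreviations such as "St" or "Ind Est") have a cleaner input to work on.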
Postcode and premise could be utilised for an ‘easy’ match, but a simple typo could have a radical effect. For instance, AB is the area code for Aberdeen, whilst BA is the area code for Bath. This is why Hopewiser would not recommend this form of matching where a computer is making the sole decision, such as on a web form. However, if the address is being validated by a call centre during a call, then looking up just the postcode and premise number is a great way to reduce keystrokes when validating data interactively.
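The AB/BA example can be illustrated with a small Python sketch that cross-checks the postcode area against the town supplied. The two area codes come from the example above; the function names and sample postcodes are hypothetical, and the one-town-per-area mapping is a deliberate simplification, since a real postcode area covers many towns.

```python
import re

# Only the two areas mentioned in the text; a real table would be far larger.
AREA_NAMES = {"AB": "Aberdeen", "BA": "Bath"}


def postcode_area(postcode: str):
    """Return the leading letters of a postcode (its area), upper-cased."""
    found = re.match(r"[A-Za-z]{1,2}", postcode.strip())
    return found.group(0).upper() if found else None


def area_agrees_with_town(postcode: str, town: str) -> bool:
    """True only when the postcode area is known and names the given town."""
    expected = AREA_NAMES.get(postcode_area(postcode))
    return expected is not None and expected.lower() == town.strip().lower()


print(area_agrees_with_town("AB10 1AA", "Aberdeen"))
print(area_agrees_with_town("BA10 1AA", "Aberdeen"))
```

The transposed postcode is still a plausible-looking value, so a lookup on postcode and premise alone has no way of knowing it is wrong. Checking it against another part of the address exposes the conflict.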
Alongside the generic rules, it is possible to add more specific rules based on analysis of the items that fail to match. This matters because abbreviations and colloquialisms change over time, and it allows improvements to be made in a structured manner, building on the existing knowledge base.
So, do you want the good news or the bad news?
Both methods (fuzzy and rules) are based on matching to a primary dataset, so the better the quality of the data behind the processes, the higher the quality of the matches. However, rules-based matching produces fewer incorrect matches because it is a defined process, rather than a best guess.
This means rules-based logic can sometimes produce fewer address matches, but you can be confident of the results produced.
Our advice is that a decision-maker should not use fuzzy logic systems for mission-critical data, although they may be acceptable for some uses. However, if you want to have complete control of your address matching and be confident in the accuracy of your data and decision tree, you should only use rules-based logic.
Updated 10th September 2024.