Data Cleaning using a Matching Dependency Technique
| dc.contributor.author | Jain, Shashank | |
| dc.contributor.supervisor | Coady, Yvonne | |
| dc.date.accessioned | 2019-01-03T00:47:56Z | |
| dc.date.available | 2019-01-03T00:47:56Z | |
| dc.date.copyright | 2018 | en_US |
| dc.date.issued | 2019-01-02 | |
| dc.degree.department | Department of Computer Science | |
| dc.degree.level | Master of Science M.Sc. | en_US |
| dc.description.abstract | In today’s digital society, people are often required to enter their home or office addresses on online forms. It is not uncommon for them to introduce minor mistakes, such as misspelled addresses or incorrect postal or ZIP codes. Such mistakes can be quite problematic when automated systems must process their requests. For example, if a person orders something online and provides an incorrect postal code in the entered address, the mistake could lead to a delay in the delivery of the item or, even worse, leave the item undelivered. To avoid such situations, these systems often use a machine learning technique called ‘Matching Dependency’, which has proven helpful in recommending corrections for incorrect values in the input address. This technique uses a binary search algorithm to reduce the number of cycles the process has to go through to make recommendations. Our exploration of one possible implementation of this algorithm uses our own synthesized sample data sets in place of real user input, together with external data that serves as the authenticated source for verifying user input. We compare our synthesized user input data with the external data, which is considered to be completely trustworthy. The system then makes possible recommendations based on the correctness of the user input. The evaluation was mainly done on data sets of two different sizes, 1000 and 15000. The results had zero false negatives, few false positives, and mostly relevant recommendations. | en_US |
| dc.description.scholarlevel | Graduate | en_US |
| dc.identifier.uri | http://hdl.handle.net/1828/10477 | |
| dc.language.iso | en | en_US |
| dc.rights | Available to the World Wide Web | en_US |
| dc.title | Data Cleaning using a Matching Dependency Technique | en_US |
| dc.type | Project | en_US |
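
The workflow described in the abstract (check user input against trusted external data, using binary search to limit the work, then recommend corrections) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the record layout (street, city, postal code), the sample addresses, the 0.8 similarity threshold, and the function names `exact_lookup`, `similar`, and `recommend` are all hypothetical. The matching rule assumed here is that if the street and city of an input record are similar to those of a trusted record, the trusted postal code is recommended.

```python
from bisect import bisect_left
from difflib import SequenceMatcher

# Hypothetical trusted external data: (street, city, postal_code) tuples,
# kept sorted so that exact matches can be found by binary search.
REFERENCE = sorted([
    ("10 main street", "victoria", "V8W 1A1"),
    ("22 oak avenue", "victoria", "V8P 2B2"),
    ("5 harbour road", "vancouver", "V6B 3C3"),
])


def exact_lookup(record):
    """Binary-search the sorted reference list for an exact match."""
    i = bisect_left(REFERENCE, record)
    return i < len(REFERENCE) and REFERENCE[i] == record


def similar(a, b, threshold=0.8):
    """Assumed similarity test: character-level ratio above a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def recommend(record):
    """Return [] when the record is already trusted, else candidate fixes.

    Assumed matching rule: similar street AND similar city imply that the
    trusted record (including its postal code) is a plausible correction.
    """
    if exact_lookup(record):
        return []
    street, city, _ = record
    return [
        ref for ref in REFERENCE
        if similar(street, ref[0]) and similar(city, ref[1])
    ]


if __name__ == "__main__":
    # A misspelled street with a wrong postal code.
    print(recommend(("10 main stret", "victoria", "V8W 9Z9")))
```

In this sketch only the exact-match check uses binary search; the similarity pass still scans every trusted record, so a larger data set would need an index or blocking step to stay fast.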