TOAA

Train Once, Apply Anywhere

Abstract: Generative large language models (LLMs), such as OpenAI's GPT-3, have demonstrated remarkable success in a variety of natural language processing (NLP) tasks by generating human-like content. A significant advantage of these models is their ability to generalize across domains without further fine-tuning. This paper investigates the applicability of generative LLMs to entity matching, a crucial task for entity resolution and data quality. We began our study by following the conventional research protocol: training and testing LLMs, specifically OpenAI's base GPT-3 model (ada) and Databricks' open-source model Dolly 2.0, on domain-specific datasets. Under this conventional approach, both models showed strong performance, exceeding that of other established methods on many tasks. The paper then takes an innovative turn: we train the model on a single dataset and test its applicability across a spectrum of datasets from different domains. This approach yielded noteworthy results, marking a pioneering step in entity-matching research. Although the outcomes did not consistently outperform traditional domain-specific models, they underline the potential versatility of LLMs for entity matching. We also provide a detailed analysis of the differential performance across datasets, shedding light on the factors that contributed to the varying results. Our findings suggest that the proposed approach can significantly impact the field of entity matching by providing a robust and scalable solution applicable across multiple domains.

(Pre-Print)