A Hybrid Approach for Automatic Information Extraction in Real Estate Documents using IBM Watson NLU Services and Semantic Rules
With the high volume of unstructured text documents that exist today and the advancement of AI technologies, the demand for automatic extraction of valuable information has never been higher. In this paper, we propose a hybrid information extraction method for noisy real estate documents that combines the robustness of machine learning models with the tractable behavior of the rule-based approach. Our problem involves extracting convoluted Property objects from text documents produced by an OCR system. To solve this, we propose a pipeline system with three main components: (1) a pre-processing module, (2) custom machine learning models trained using IBM Watson NLU services, and (3) a Semantic Rules module. Our experiments on nearly three thousand real estate documents show promising results.
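The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Property` fields, the function names, the OCR clean-up steps, and the single semantic rule are all hypothetical, and `stub_entity_model` is a regex stand-in for the custom models trained with IBM Watson NLU services (no Watson API is called).

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Entity = Tuple[str, str]  # (label, surface text), in document order


@dataclass
class Property:
    # Hypothetical, simplified schema; the paper's Property objects are richer.
    address: Optional[str] = None
    price: Optional[str] = None


def preprocess(raw: str) -> str:
    """Stage 1: repair simple OCR noise (assumed clean-up steps)."""
    # Rejoin words hyphenated across a line break, e.g. "prop-\nerty".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse runs of whitespace, including line breaks, to single spaces.
    return re.sub(r"\s+", " ", text).strip()


def stub_entity_model(text: str) -> List[Entity]:
    """Stage 2 stand-in: a real system would call a trained entity model."""
    patterns = {
        "ADDRESS": r"\d+ [A-Z][a-z]+ (?:Street|Avenue|Road)",
        "PRICE": r"\$\d[\d,]*",
    }
    hits = [
        (m.start(), label, m.group())
        for label, pattern in patterns.items()
        for m in re.finditer(pattern, text)
    ]
    # Sort by character offset so entities come back in document order.
    return [(label, value) for _, label, value in sorted(hits)]


def apply_semantic_rules(entities: List[Entity]) -> List[Property]:
    """Stage 3: group flat entities into Property objects.

    Hypothetical rule: each ADDRESS opens a new Property, and a PRICE
    attaches to the most recent Property that has no price yet.
    """
    properties: List[Property] = []
    for label, value in entities:
        if label == "ADDRESS":
            properties.append(Property(address=value))
        elif label == "PRICE" and properties and properties[-1].price is None:
            properties[-1].price = value
    return properties


def extract_properties(
    raw: str, model: Callable[[str], List[Entity]] = stub_entity_model
) -> List[Property]:
    """Run the three stages in sequence; the model is swappable."""
    return apply_semantic_rules(model(preprocess(raw)))


sample = "Lot at 12 Oak Street sold for $450,000.\nSecond prop-\nerty: 7 Elm Road, $320,000"
for prop in extract_properties(sample):
    print(prop)
```

Passing the model in as a parameter keeps the learned component separate from the deterministic rules, which reflects the hybrid design: the rules stay easy to inspect and adjust even when the entity model is replaced or retrained.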
Trang M. Nguyen, Kim H. Nguyen, Tan T. Mai
9/26/2025