Open Data QnA

Background

A major commercial bank in Vietnam needed to assess the financial capacity and risks of enterprises when considering credit approvals or investments. The bank possessed a large dataset of bidding contracts won by enterprises, including details such as bid name, enterprise name, tax code, procuring entity, bid value, approval date, and approval decision number. However, querying and analyzing this data using traditional SQL required advanced technical expertise and was time-consuming, making it difficult for credit officers to access information quickly and efficiently.
To address this issue, the bank decided to implement a solution leveraging artificial intelligence (AI) to allow staff to ask questions in natural language and receive visualized analytical results, such as trends in bid values or dependencies on specific procuring entities. The Open Data QnA system from Google Cloud was chosen for its ability to convert natural language queries into SQL and generate visual charts.

Challenge

The bank faced the following challenges:
  1. Difficulty in Data Querying: Credit officers lacked SQL skills to query the BigQuery database containing bidding data.
  2. Delayed Analysis: Manual analysis of bidding data (e.g., trends, enterprise performance) was time-consuming and failed to meet the need for rapid decision-making.
  3. Complex Risk Assessment: The bank needed to understand enterprises' dependency on specific procuring entities and bidding trends to evaluate credit risks.
  4. Data Visualization: There was a lack of tools to present data in intuitive charts or dashboards for managers.

Solution

The bank implemented Open Data QnA, an open-source Python library from Google Cloud, to build a system for analyzing bidding data. This system enabled staff to ask questions in natural language (e.g., "Which enterprise won the most bids in 2024?" or "What is the trend of bid values for Enterprise A over the years?") and receive responses in the form of SQL results or visualized charts. The solution was deployed on Google Cloud Platform (GCP), leveraging BigQuery for data storage and large language models (LLMs) for query processing.

Key Components

  • Open Data QnA: An open-source Python library that uses LLMs to convert natural language into SQL queries (NL2SQL) and generate visualized charts.
  • Google BigQuery: A database for storing bidding data, supporting fast queries on large datasets.
  • Google Cloud AI Platform: Provides the environment for deploying LLMs and processing natural language queries.
  • VisualizeAgent: A component of Open Data QnA that generates interactive charts (bar, line, pie, etc.) from query results.
  • Jupyter Notebooks: Used for setting up and testing the system before production deployment.

Architecture

The system was built on Google Cloud with the following architecture:
  1. Data Storage: Bidding data was stored in BigQuery, with tables grouped by use case (e.g., "Enterprise Analysis") to optimize querying.
  2. Natural Language Processing: Open Data QnA used LLMs to analyze user questions, generate corresponding SQL queries, and return results.
  3. Chart Generation: VisualizeAgent converted query results into interactive charts (using Plotly or Chart.js) to display trends and insights.
  4. User Interface: A web interface (using Streamlit or Flutter) was implemented for staff to input questions and view results/charts.
  5. Multiturn Support: Enabled users to ask follow-up questions, e.g., "Show the enterprise with the highest bid value" followed by "Compare it to the previous year."

Implementation

  1. Data Preparation:
    • Bidding data was imported into BigQuery from CSV sources, with columns including bid name, enterprise name, tax code, procuring entity, bid value, approval date, and approval decision number.
    • Used the 0_CopyDataToBigQuery.ipynb notebook from Open Data QnA to create and manage tables in BigQuery.
  2. Setting Up Open Data QnA:
    • Cloned the Open Data QnA repository.
    • Installed Python >= 3.10 and dependencies via Poetry.
    • Enabled required APIs (BigQuery, AI Platform, etc.) on GCP.
  3. System Configuration:
    • Set up a vector store to store table metadata, facilitating quick retrieval of relevant tables for queries.
    • Used the 1_Setup_OpenDataQnA.ipynb notebook to configure the environment and save settings in config.ini.
  4. Interface Integration:
    • Deployed a Streamlit interface for staff to input questions and view results/charts.
    • Customized VisualizeAgent to generate charts such as bar charts (total bid value by enterprise) and line charts (trends over time).
  5. Testing and Deployment:
    • Tested the system in a non-production environment using Jupyter Notebooks.
    • Deployed to production using Terraform to automate GCP infrastructure setup.

Results

  • Rapid Querying: Credit officers could ask questions like "Which enterprise had the highest bid value in 2024?" and receive results in seconds, eliminating the need for SQL.
  • Visualized Charts: The system generated charts such as:
    • Bar chart: Comparing total bid values across enterprises.
    • Line chart: Showing bid value trends over time.
    • Pie chart: Displaying bid value distribution by procuring entity.
  • Increased Efficiency: Data analysis time was reduced from hours to minutes, enabling faster credit decisions.
  • Multiturn Support: Staff could ask follow-up questions to refine insights, e.g., "Show details of Enterprise A's bids in 2023."

Impact

  • Improved Decision-Making: The bank could more accurately assess enterprises' financial capacity and risks based on bidding data.
  • Reduced Dependency on Technical Experts: Credit officers no longer needed SQL knowledge, lowering training costs.
  • Enhanced Transparency: Dashboards and charts provided a clear view of bidding activities, aiding managers in strategic planning.
  • Scalability: The system could integrate additional data sources (e.g., enterprise financial data) for more comprehensive analysis.

Lessons Learned

  1. Data Quality Matters: Bidding data needed to be cleaned and standardized before ingestion into BigQuery to ensure accurate results.
  2. Query Optimization: Grouping tables by use case improved LLM performance and reduced query processing time.
  3. Thorough Testing: Non-production testing was critical to avoid errors, especially with sensitive data.
  4. User Interaction: The Streamlit interface was simple and effective, but users needed guidance on phrasing clear questions to avoid ambiguous results.
  5. Cost Considerations: Large-scale BigQuery queries could incur costs, requiring quota limits and sample tables to optimize expenses.

Conclusion

The implementation of Open Data QnA provided a transformative solution for the bank, enabling rapid, intuitive, and non-technical analysis of bidding data. The system not only improved decision-making efficiency but also offered deep insights into enterprise performance and risks. With its scalability and integration capabilities, this solution promises to support the bank in tackling more complex data analyses in the future.
6/6/2025
bia.png
NEW
AI for Procurement: Optimizing Vietnam's Largest IT Giant
Apply AI-Model to optimize the identification and classification of products (based on analysis, product details, unit price, unit of measurement, etc.)
thumb.png
NEW
Synthesis Data
"Use a large language model (LLM) to identify descriptive features from a reference image. Based on these features, users can customize specific attributes and use the customized information to search for or generate images with similar characteristics."
thumb.png
NEW
Crawler and Extract Information
Crawl data from websites and use a large language model (LLM) to extract and summarize information that aligns with the user's needs.
QaiDora Products
QaiDora draws inspiration from the myth of Pandora’s box—a symbol of unexpected possibilities and hope. For us, AI models are like modern Pandora’s boxes, holding untapped potential to turn challenges into opportunities. At QAI, QaiDora serves as an ecosystem of AI products designed to drive innovation and deliver competitive advantages.
Trusted by
Contact us
Copyright by qaidora.com