Synthesis Data

Challenge

The development team needed to train a model to detect weapons appearing at tourist sites, but they could not find a suitable image dataset. Searching for images on Google using English or Vietnamese keywords did not yield the desired results, and text-to-image generation tools failed to produce images that matched the context. Describing diverse and complex scenarios (e.g., weapon type, setting, number of people, time of day) made the task even more challenging.

Solution

The development team built two systems:
  • Prompt Engine: Assists users in creating detailed, realistic descriptions (prompts) to search for or generate images for training.
  • Data Creator System: Automatically identifies key attributes from real images, verifies and validates compatibility between real and generated images, and allows users to supplement or edit information to enhance data diversity and quality.

Key Components

  • Prompt Engine: A list of descriptive attributes (subject, action, object, location, time), a template framework for prompt creation, a model to identify individual attribute components, and a model to generate complete prompts.
  • Data Creator System: A tool for extracting attributes from reference images (e.g., weapon type, environment, number of people, context), a system for verifying and validating information, and a manual confirmation and editing interface.
  • Zero-shot Classification: Identifies and generates data for new weapon types or scenarios not present in the training set.
  • Smart Search: Performs precise image searches on the internet based on attributes identified from images.

Implementation

  • Developed a library of attributes, description templates, and NLP models to assist users in creating prompts.
  • Built a tool for attribute recognition from real images (using computer vision and NLP).
  • Integrated a verification, editing, and manual attribute supplementation interface for users.
  • Applied zero-shot classification to identify new cases.
  • Optimized the iterative process of image generation, verification, and validation to ensure the quality of output data.

A. PROMPT ENGINE

1. Motivation

Suppose you need to build a model to detect weapons in a tourist area but have no training dataset. You search Google Images with the keyword "weapons in tourist area", and the results do not show the scenario you need.
You then try a more detailed description, “A man brought a weapon into the tourist area”, and the results are still unsuitable.
Finally, you give the same prompt, “A man brought a weapon into the tourist area”, to text-to-image models, and the generated images do not match the context either.
➡️ You want to search for or generate more data, but you do not know how to write a good description.

2. Solution

We created a Prompt Engine to help users create descriptions so they can search or generate the most suitable images.
For example, consider searching for or generating image data with the prompt “Images from the camera recorded a man holding a knife wandering around in front of the park”.
➡️ With prompts produced by the Prompt Engine, users can search for or generate image data more effectively.
Our Prompt Engine includes 4 components:
  1. List of required properties for users to fill out
  2. A set of formats for generating prompts
  3. An LLM to identify sub-objects of each property
  4. A model to generate the output prompt
How it works:
  • First, the user fills in the listed properties (component 1). The user can skip properties they find unnecessary and add other properties if needed.
  • A series of formats from component 2 is queued for prompt generation.
  • The LLM in component 3 generates a list of sub-objects for each property the user filled in. Several other properties are also analyzed to generate more diverse data.
  • The model in component 4 combines the outputs of components 2 and 3 to generate the output prompts.
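
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the property names, the single format string, and the SUB_OBJECTS lookup (a stand-in for the LLM in component 3) are all hypothetical, and the final combination step is plain string formatting rather than a model.

```python
import string
from itertools import product

# Component 1 (hypothetical values): properties filled in by the user.
# A value of None means the user skipped that property.
user_properties = {
    "subject": "a man",
    "action": "holding",
    "object": "a knife",
    "location": "the park",
    "time": None,
}

# Component 2 (hypothetical): prompt formats with named placeholders.
FORMATS = [
    "Images from the camera recorded {subject} {action} {object} "
    "wandering around in front of {location}",
    "Images from the camera recorded {subject} {action} {object} "
    "in front of {location} at {time}",
]

# Component 3 stand-in: a fixed lookup instead of an LLM call.
SUB_OBJECTS = {
    "a knife": ["a knife", "a machete"],
    "the park": ["the park", "the museum"],
}


def expand_sub_objects(value):
    """Return variants of one property value (the value itself if none are known)."""
    return SUB_OBJECTS.get(value, [value])


def required_fields(fmt):
    """Collect the placeholder names used by a format string."""
    return {name for _, name, _, _ in string.Formatter().parse(fmt) if name}


def generate_prompts(properties, formats):
    """Component 4 stand-in: fill every usable format with every variant combination."""
    filled = {k: v for k, v in properties.items() if v}
    variants = {k: expand_sub_objects(v) for k, v in filled.items()}
    prompts = []
    for fmt in formats:
        fields = sorted(required_fields(fmt))
        if not set(fields).issubset(filled):
            continue  # this format needs a property the user skipped
        for combo in product(*(variants[name] for name in fields)):
            prompts.append(fmt.format(**dict(zip(fields, combo))))
    return prompts


prompts = generate_prompts(user_properties, FORMATS)
```

Because the user skipped the time property, the second format is dropped and only the first is expanded. Replacing SUB_OBJECTS with a real model call would change the variety of the prompts but not the structure of the loop.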

3. The Novelty

  • Create a set of formats for generating prompts
  • Create a set of specific properties for weapon detection
  • Create prompts for the weapons detection problem that match the properties entered by the user

B. DATA CREATION

1. Motivation

Suppose you have several descriptions for generating data, but each description produces output images set in a different environment.
➡️ How can we ensure that the generated images have the same properties as real images?
We created a Data Creator System to identify properties from the reference images, thereby controlling the image generation process to match the properties of the real images.
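
One way to picture the matching step is a comparison between the properties of a reference image and those extracted from a generated image. The sketch below is an assumption for illustration only: the property names, the equality-based comparison, and the 0.75 acceptance threshold are hypothetical and are not taken from the Data Creator System.

```python
def compare_properties(reference, generated):
    """Return (score, mismatches) for two property dictionaries.

    score is the fraction of reference properties that the generated image
    reproduces exactly; mismatches maps each differing property name to a
    (reference value, generated value) pair.
    """
    if not reference:
        raise ValueError("reference properties must not be empty")
    mismatches = {
        name: (expected, generated.get(name))
        for name, expected in reference.items()
        if generated.get(name) != expected
    }
    score = (len(reference) - len(mismatches)) / len(reference)
    return score, mismatches


def is_compatible(reference, generated, threshold=0.75):
    """Accept a generated image when enough properties match (hypothetical threshold)."""
    score, _ = compare_properties(reference, generated)
    return score >= threshold


# Hypothetical properties of one real image and two generated images.
reference = {
    "weapon_type": "knife",
    "environment": "park",
    "number_of_people": 1,
    "time_of_day": "day",
}
close_match = dict(reference, time_of_day="night")
poor_match = dict(reference, weapon_type="gun", environment="street")
```

A rejected image would go back to generation with the mismatched properties highlighted, which is where the manual review described in the next section comes in.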

2. Architecture

a. Identify description features from reference images.
In the first step, a Large Language Model (LLM) extracts properties from the reference images. We build questions so that the LLM gives an answer for each property.
b. Interaction between the user and the model to check the model's results.
In this step, users review the results from step 1. Users can:
  • Confirm correct properties
  • Correct incorrect properties
  • Add new properties
The properties confirmed by the user are then updated and used in steps 3 and 5.
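
Steps a and b can be sketched as follows. Everything here is a hypothetical illustration: the question wording, the property names, and the rule that properties the user neither confirms nor corrects are dropped are assumptions, and the LLM call that would answer the questions is left out.

```python
def build_questions(property_names):
    """Step a: one question per property, to be answered by the LLM for an image."""
    return {
        name: f"What is the {name.replace('_', ' ')} in this image?"
        for name in property_names
    }


def apply_review(extracted, confirmed=(), corrections=None, additions=None):
    """Step b: merge user feedback into the properties extracted by the model.

    confirmed   -- names of properties the user accepted as correct
    corrections -- {name: new value} for properties the user fixed
    additions   -- {name: value} for properties the user added
    Properties that are neither confirmed nor corrected are dropped
    (an assumption of this sketch).
    """
    corrections = corrections or {}
    additions = additions or {}
    reviewed = {}
    for name, value in extracted.items():
        if name in corrections:
            reviewed[name] = corrections[name]
        elif name in confirmed:
            reviewed[name] = value
    reviewed.update(additions)
    return reviewed


questions = build_questions(["weapon_type", "environment"])

# Hypothetical model output for one reference image, followed by user feedback.
extracted = {"weapon_type": "knife", "environment": "street", "number_of_people": "2"}
reviewed = apply_review(
    extracted,
    confirmed={"weapon_type"},
    corrections={"environment": "park"},
    additions={"time_of_day": "evening"},
)
```

Keeping the review as a pure function over dictionaries makes it easy to log what the user changed and to rerun later steps with the reviewed properties.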
Zero-shot Classification
Zero-shot classification is the task of predicting a class that the model did not see during training.
In zero-shot classification, we provide the model with a prompt and a sequence of text that describes, in natural language, what we want the model to do.
  • Use the model to classify new weapon types, depending on their intended use
  • Generate more data with more weapons
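
As a rough illustration of the zero-shot idea, the helper below builds a natural-language classification prompt and maps a model's free-text answer back to one of the candidate labels. The label set, the prompt wording, and both function names are hypothetical, and the model call itself is omitted because the source does not say which model is used.

```python
def build_zero_shot_prompt(description, candidate_labels):
    """Describe the classification task in natural language for a model."""
    labels = ", ".join(candidate_labels)
    return (
        "Classify the weapon described below into exactly one of these "
        f"categories: {labels}.\n"
        f"Description: {description}\n"
        "Answer with the category name only."
    )


def parse_label(answer, candidate_labels):
    """Map a free-text answer to a candidate label, or None if nothing matches."""
    cleaned = answer.strip().lower().rstrip(".")
    for label in candidate_labels:
        if cleaned == label.lower():
            return label
    return None


# Hypothetical intended-use categories for weapon types not seen in training.
LABELS = ["cutting", "shooting", "blunt force"]

prompt = build_zero_shot_prompt("a long curved blade with a wooden handle", LABELS)
# The prompt would be sent to a model here; its answer might look like " Cutting.\n".
label = parse_label(" Cutting.\n", LABELS)
```

Returning None for an unrecognized answer lets the system route that case to the manual confirmation interface instead of guessing a label.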

3. The Novelty

  • Build an LLM to identify description features from reference images (1)
  • Build interaction between the user and the model to check the model's results and verify the generated data (2)
  • Allow users to create descriptions for new objects (2)
  • Double-check the final result of the system (5)
  • Build a data generation system by combining LLMs with references from real images

Results

  • Successfully created a diverse, realistic image dataset covering various scenarios.
  • The weapon detection model trained on the new dataset achieved significantly higher accuracy compared to previous versions.
  • Enhanced user autonomy and creativity in expanding and enriching the dataset.

Impact

  • Addressed the challenge of data scarcity for specific, hard-to-find scenarios on the internet.
  • Reduced dependency on existing data sources, increasing flexibility in building training datasets.
  • Ensured diversity, realism, and scalability for various computer vision-related tasks.

Lessons Learned

  • Clearly defining descriptive attributes is critical for generating relevant data.
  • Combining real image references with manual validation effectively controls data quality.
  • Establishing an iterative process for verification and validation is essential for continuously improving and refining the dataset.

Conclusion

By developing the Prompt Engine and Data Creator System, the team effectively addressed the challenge of data scarcity for the weapon detection model. This approach produced a diverse and realistic dataset while enhancing adaptability to new scenarios that traditional data solutions could not address.
6/13/2025