A system for detecting fraudulent merchant applications using vector similarity search and pattern matching.
The system consists of three main components:
- Training Module (fraud_detection_training):
  - Generates synthetic training data
  - Creates and populates the database with merchant records
  - Handles vector embeddings for similarity search
- API Module (fraud_detection_api):
  - Provides REST API endpoints for fraud detection
  - Implements pattern matching and similarity search
  - Returns detailed fraud analysis results
- Common Module (fraud_detection_common):
  - Shared utilities and configurations
  - Database operations
  - Dynamic model generation

- Project Structure
- Prerequisites
- Setup
- Usage
- How It Works
- System Flow
- Development
- Contributing
- License
- Acknowledgments
The system is divided into three main components:
- fraud_detection_common - Shared utilities and models
  - Database operations with pgvector
  - Custom embedding generation using feature engineering and PCA
  - Dynamic model generation based on configuration
  - Common data models and types
- fraud_detection_training - Training and embedding generation
  - Processes training data from JSON or CSV files
  - Generates custom embeddings
  - Stores embeddings in the database
- fraud_detection_api - API service
  - FastAPI-based REST API
  - Evaluates new applications
  - Returns fraud detection results
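
As a rough illustration of how stored embeddings could be queried through pgvector, the sketch below builds a nearest-neighbour query in Python. The table name `merchant_embeddings`, the columns `merchant_id` and `embedding`, and the psycopg-style `%s` placeholders are assumptions made for illustration and are not taken from the project; `<=>` is pgvector's cosine distance operator.

```python
from typing import List, Sequence, Tuple

# Hypothetical table name; the project's real schema may differ.
TABLE = "merchant_embeddings"


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format numbers in pgvector's text input form, e.g. '[0.1,2.0,-3.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_neighbours_query(
    embedding: Sequence[float], limit: int = 5
) -> Tuple[str, List[object]]:
    """Return (sql, params) for a cosine-distance nearest-neighbour search."""
    sql = (
        "SELECT merchant_id, embedding <=> %s::vector AS distance "
        "FROM " + TABLE + " ORDER BY distance LIMIT %s"
    )
    # The vector is passed as a parameter rather than interpolated into the SQL.
    return sql, [to_vector_literal(embedding), int(limit)]
```

With a driver such as psycopg, the returned pair would typically be passed as `cursor.execute(sql, params)`.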
graph TD
A[Training Data] --> B[Training Module]
B --> C[Feature Engineering]
C --> D[PCA Transformation]
D --> E[Store Embeddings]
F[New Application] --> G[API Service]
G --> H[Feature Engineering]
H --> I[PCA Transformation]
I --> J[Vector Similarity Search]
J --> K[Field Matching]
K --> L[Decision Making]
L --> M[Return Result]
E --> J
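
The evaluation half of the flow above (similarity search, field matching, decision) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the choice of `MATCH_FIELDS`, the `0.9` threshold, the record layout, and the rule "flag if similarity is high or any identifying field matches" are all hypothetical, and the similarity search is done in memory rather than in pgvector.

```python
import math
from typing import Any, Dict, List, Sequence

# Hypothetical set of fields where an exact match with a stored record is suspicious.
MATCH_FIELDS = ("owner_ssn", "business_fed_tax_id", "owner_drivers_license")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def evaluate(
    application: Dict[str, Any],
    embedding: Sequence[float],
    stored: List[Dict[str, Any]],
    threshold: float = 0.9,  # assumed cut-off, not taken from the project
) -> Dict[str, Any]:
    """Find the most similar stored record, compare fields, and decide."""
    if not stored:
        return {"flagged": False, "similarity": 0.0, "matched_fields": []}

    # Vector similarity search: pick the closest stored embedding.
    best = max(stored, key=lambda r: cosine_similarity(embedding, r["embedding"]))
    similarity = cosine_similarity(embedding, best["embedding"])

    # Field matching: exact matches on identifying fields.
    matched = [
        f for f in MATCH_FIELDS
        if application.get(f) and application.get(f) == best["fields"].get(f)
    ]

    # Decision making: assumed rule combining both signals.
    flagged = similarity >= threshold or bool(matched)
    return {"flagged": flagged, "similarity": similarity, "matched_fields": matched}
```

In the real system the embedding would come from the feature engineering and PCA steps, and the nearest record from the database rather than a Python list.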
- Python 3.9+
- PostgreSQL 15+ with pgvector extension
- Docker and Docker Compose (for containerized deployment)
- Make (optional, for using Makefile commands)
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install common module
pip install -e fraud_detection_common/
# Install training module
pip install -e fraud_detection_training/
# Install API module
pip install -e fraud_detection_api/

# Start PostgreSQL with pgvector
docker compose up -d db
# Wait for database to be ready
docker compose exec db pg_isready -U postgres

# Generate training data and populate database
python -m fraud_detection_training.train

# Start the API server
python -m fraud_detection_api.api

# Build all services
docker compose build
# Start all services
docker compose up -d

# View logs
docker compose logs -f
# Check service status
docker compose ps

# Stop all services
docker compose down
# Stop and remove volumes
docker compose down -v

# Get API schema
curl http://localhost:8000/schema

# Submit a merchant application for fraud detection
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "merchant_id": "test-merchant-1",
    "owner_ssn": "123-45-6789",
    "business_fed_tax_id": "12-3456789",
    "owner_drivers_license": "DL12345678",
    "business_phone_number": "+1-555-123-4567",
    "owner_phone_number": "+1-555-987-6543",
    "email": "[email protected]",
    "address_line1": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip_code": "10001",
    "country": "US",
    "website": "https://example.com"
  }'

# Start database
docker compose up -d db
# Install dependencies
pip install -e fraud_detection_common/
pip install -e fraud_detection_training/
pip install -e fraud_detection_api/
# Run training
python -m fraud_detection_training.train
# Run API
python -m fraud_detection_api.api

# Build and start services
docker compose up --build
# View logs
docker compose logs -f
# Stop services
docker compose down

The system uses a centralized configuration in the config/ directory:
- database_config.json: Database connection settings
- database_config.local.json: Local development overrides

To modify configuration:
- Copy the template:
  cp config/database_config.json config/database_config.local.json
- Edit the local configuration file with your settings
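
One plausible way the local override file could be applied is a shallow merge on top of the base settings. The sketch below assumes that behaviour; the function name `load_database_config` and the shallow-merge rule are hypothetical, and the project's own loader may work differently.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_database_config(config_dir: str = "config") -> Dict[str, Any]:
    """Load database_config.json, then overlay database_config.local.json if present."""
    base_path = Path(config_dir) / "database_config.json"
    local_path = Path(config_dir) / "database_config.local.json"

    config: Dict[str, Any] = json.loads(base_path.read_text(encoding="utf-8"))
    if local_path.exists():
        # Assumed semantics: top-level keys in the local file replace base values.
        config.update(json.loads(local_path.read_text(encoding="utf-8")))
    return config
```

Under this assumption, a local file only needs to list the keys that differ from the base configuration.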
# Check database status
docker compose exec db psql -U postgres -c "\l"
# Check table structure
docker compose exec db psql -U postgres -d fraud_detection -c "\dt"

# View API logs
docker compose logs -f api
# View training logs
docker compose logs -f training
# View database logs
docker compose logs -f db

- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- pgvector for vector similarity search
- FastAPI for the API framework
- scikit-learn for feature engineering and PCA