OCR Engine

Turqoa's OCR engine is a purpose-built optical character recognition system optimized for port and terminal environments. It handles three distinct recognition tasks — license plates, container codes, and seal numbers — each with specialized models trained on millions of real-world port images.

Capabilities

License Plate Recognition

The plate OCR model recognizes national and regional license plate formats from more than 120 countries. It handles:

Standard passenger and commercial vehicle plates
Multi-line plate formats (common in European and Middle Eastern regions)
Temporary and dealer plates
Dirty, damaged, or partially obscured plates
Plates captured under infrared illumination

Container Code Recognition

The container code model is trained specifically on ISO 6346 container markings. It extracts:

Owner code — 3 letters identifying the container operator
Equipment category — 1 letter (U, J, or Z)
Serial number — 6 digits
Check digit — 1 digit, validated automatically against ISO 6346

The model also reads:

Container size and type codes
Maximum gross weight markings
Tare weight markings
CSC plate data (when visible)

Seal Recognition

The seal OCR model reads alphanumeric codes from high-security bolt seals, cable seals, and e-seals:

ISO 17712 compliant seal formats
Carrier-specific seal numbering
QR and barcode seals (via separate barcode decoder)

Supported Formats

Container Codes (ISO 6346)

Component	Format	Example
Owner code	3 uppercase letters	`MSC`
Category	1 letter (U/J/Z)	`U`
Serial number	6 digits	`123456`
Check digit	1 digit	`6`
Full code	Combined	`MSCU1234566`

Note: The engine automatically validates the check digit using the ISO 6346 algorithm. Mismatched check digits are flagged and the transaction confidence is reduced.
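
The check digit rule can be illustrated with a short Python sketch. This is not Turqoa's implementation; the function names `compute_check_digit` and `parse_container_code` are hypothetical, and only the published ISO 6346 arithmetic is assumed: letters map to values from 10 upward (skipping multiples of 11), each of the first ten characters is weighted by a power of two, and the sum modulo 11 gives the digit, with a remainder of 10 written as 0.

```python
import re


def _letter_values():
    # ISO 6346 letter values start at 10 and skip every multiple of 11
    # (11, 22, 33), giving A=10, B=12, ..., K=21, L=23, ..., U=32, V=34, ..., Z=38.
    values, n = {}, 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if n % 11 == 0:
            n += 1
        values[ch] = n
        n += 1
    return values


LETTER_VALUES = _letter_values()
# Owner code (3 letters), category (U, J, or Z), serial (6 digits), check digit.
CODE_PATTERN = re.compile(r"^([A-Z]{3})([UJZ])([0-9]{6})([0-9])$")


def compute_check_digit(first_ten: str) -> int:
    """Return the ISO 6346 check digit for the first 10 characters of a code."""
    total = 0
    for position, ch in enumerate(first_ten):
        value = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += value * (2 ** position)
    # A remainder of 10 is written as 0.
    return total % 11 % 10


def parse_container_code(code: str) -> dict:
    """Split a full code into its components and verify the check digit."""
    match = CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ISO 6346 container code: {code!r}")
    owner, category, serial, check = match.groups()
    expected = compute_check_digit(owner + category + serial)
    return {
        "owner_code": owner,
        "category": category,
        "serial_number": serial,
        "check_digit": int(check),
        "check_digit_valid": int(check) == expected,
    }


if __name__ == "__main__":
    print(parse_container_code("CSQU3054383"))  # check_digit_valid is True
    print(parse_container_code("CSQU3054384"))  # check_digit_valid is False
```

A caller could treat `check_digit_valid: False` as the trigger for flagging the read, in the spirit of the note above.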

Regional Plate Formats

Turqoa ships with recognition profiles for all major port regions:

Region	Formats	Notes
North America	US state plates, Canadian provincial, Mexican federal	Including temporary tags
Europe	EU standard, UK post-Brexit, Turkish	Multi-line support
Middle East	UAE, Saudi, Oman, Bahrain, Qatar, Kuwait	Arabic and Latin dual-script plates
East Asia	Chinese provincial, Japanese, Korean	CJK character support
Southeast Asia	Malaysian, Singaporean, Thai, Indonesian	Variable formats
Africa	South African, Egyptian, Moroccan, Nigerian	Regional variants

Configuration

# ~/.turqoa/sites/my-terminal/ocr.yaml
ocr:
  container:
    model: turqoa-container-v4
    confidence_threshold: 0.90
    check_digit_validation: true
    retry_on_low_confidence: true
    max_retries: 2

  plate:
    model: turqoa-plate-v3
    confidence_threshold: 0.85
    regions:
      - middle_east
      - europe
    multi_line: true
    infrared_mode: auto

  seal:
    model: turqoa-seal-v2
    confidence_threshold: 0.80
    barcode_fallback: true
    formats:
      - iso_17712
      - carrier_specific

Confidence Thresholds

Confidence thresholds control when OCR results are considered reliable enough for automated processing. Setting these correctly is critical for balancing automation rate against accuracy.

Threshold	Effect
Too high (> 0.98)	Many transactions routed to manual review; low automation rate
Recommended (0.90–0.95)	Good balance of automation and accuracy
Too low (< 0.80)	Higher automation rate but increased risk of misreads

Tune thresholds using historical data:

turqoa ocr analyze --site "my-terminal" --period 7d

This produces a report showing accuracy at different threshold levels, allowing data-driven tuning.
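
To make the trade-off concrete, the sketch below shows the kind of calculation such a report rests on. It is an illustration only, not the logic behind `turqoa ocr analyze`: the `HistoricalRead` type and `sweep_thresholds` function are hypothetical, and it assumes each historical read has a confidence score and a verified correct or incorrect outcome.

```python
from dataclasses import dataclass


@dataclass
class HistoricalRead:
    confidence: float  # engine confidence for the read, 0.0 to 1.0
    correct: bool      # whether the read matched the verified value


def sweep_thresholds(reads, thresholds):
    """Report automation rate and automated-read accuracy for each threshold."""
    rows = []
    for t in thresholds:
        # Reads at or above the threshold would be processed automatically.
        automated = [r for r in reads if r.confidence >= t]
        automation_rate = len(automated) / len(reads) if reads else 0.0
        # Accuracy is undefined when nothing is automated.
        accuracy = (
            sum(r.correct for r in automated) / len(automated) if automated else None
        )
        rows.append(
            {"threshold": t, "automation_rate": automation_rate, "accuracy": accuracy}
        )
    return rows


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        HistoricalRead(0.99, True),
        HistoricalRead(0.92, True),
        HistoricalRead(0.85, False),
        HistoricalRead(0.70, False),
    ]
    for row in sweep_thresholds(sample, [0.80, 0.90, 0.95]):
        print(row)
```

Raising the threshold lowers the automation rate while the accuracy of the automated reads rises, which is the balance the table above describes.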

Multi-Language Support

The OCR engine natively supports multiple scripts:

Latin — English, Spanish, Portuguese, French, German, Turkish
Arabic — Standard Arabic, Farsi numerals
CJK — Simplified Chinese, Traditional Chinese, Japanese, Korean
Cyrillic — Russian, Ukrainian
Devanagari — Hindi

For plates with dual-script formats (common in the Middle East), the engine reads both scripts and cross-validates:

ocr:
  plate:
    dual_script:
      enabled: true
      primary: arabic
      secondary: latin
      cross_validate: true
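
One simple form of cross-validation is to compare the digit sequences read in each script. The sketch below assumes that approach; it is not the engine's actual method, and `normalize_digits` and `cross_validate` are hypothetical names. It covers digits only, since letter transliteration between Arabic and Latin varies by issuing country and is left out.

```python
import unicodedata


def normalize_digits(text: str) -> str:
    """Extract decimal digits from any script and map them to ASCII 0-9."""
    return "".join(
        str(unicodedata.decimal(ch)) for ch in text if ch.isdecimal()
    )


def cross_validate(primary_read: str, secondary_read: str) -> bool:
    """Return True when both readings contain the same non-empty digit sequence."""
    primary_digits = normalize_digits(primary_read)
    return bool(primary_digits) and primary_digits == normalize_digits(secondary_read)


if __name__ == "__main__":
    # "\u0661\u0662\u0663\u0664\u0665" is 12345 written in Arabic-Indic digits.
    arabic_read = "\u0661\u0662\u0663\u0664\u0665"
    print(cross_validate(arabic_read, "A 12345"))  # digits agree
    print(cross_validate(arabic_read, "A 12845"))  # digits disagree
```

Both Arabic-Indic digits (U+0660 to U+0669) and the Persian variants (U+06F0 to U+06F9) are Unicode decimal digits, so the same normalization handles either.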

Performance Tuning

Image Quality

OCR accuracy is directly tied to image quality. Key factors:

Resolution — Minimum 150 pixels across the plate or container code
Exposure — Avoid overexposed or underexposed captures
Focus — The character region must be in sharp focus
Angle — Maximum 30-degree skew from perpendicular
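
The two numeric guidelines above can be turned into a pre-check that runs before OCR. This is a minimal sketch under stated assumptions: the `Capture` fields and the `quality_issues` function are hypothetical, "150 pixels across" is read as the pixel width of the detected region, and exposure and focus are omitted because no numeric limits are given for them.

```python
from dataclasses import dataclass

MIN_REGION_WIDTH_PX = 150  # resolution guideline from the list above
MAX_SKEW_DEGREES = 30.0    # angle guideline from the list above


@dataclass
class Capture:
    region_width_px: int  # pixel width of the detected plate or code region
    skew_degrees: float   # estimated angle from perpendicular, either direction


def quality_issues(capture: Capture) -> list[str]:
    """Return the guidelines this capture fails; an empty list means it passes."""
    issues = []
    if capture.region_width_px < MIN_REGION_WIDTH_PX:
        issues.append("resolution")
    if abs(capture.skew_degrees) > MAX_SKEW_DEGREES:
        issues.append("angle")
    return issues


if __name__ == "__main__":
    print(quality_issues(Capture(region_width_px=220, skew_degrees=12.0)))
    print(quality_issues(Capture(region_width_px=120, skew_degrees=-41.0)))
```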

Processing Speed

Model	Average Latency	GPU Memory
Container OCR v4	45 ms	1.2 GB
Plate OCR v3	30 ms	0.8 GB
Seal OCR v2	55 ms	1.0 GB

To optimize throughput on multi-lane installations:

ocr:
  processing:
    batch_size: 4          # Process multiple images per GPU call
    worker_threads: 2      # Parallel processing threads per model
    gpu_memory_fraction: 0.8  # Reserve 80% of GPU memory for OCR
    preload_models: true   # Load models at startup, not on first request

Warning: Setting gpu_memory_fraction too high may starve the damage detection model of GPU resources. Ensure total GPU memory allocation across all models does not exceed 90% of available VRAM.
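
The budget arithmetic behind this warning can be checked before deployment. The sketch below is a rough illustration, not a Turqoa tool: `check_gpu_budget` is a hypothetical helper, the per-model figures come from the Processing Speed table, and the total VRAM and the fraction given to other models (such as damage detection) are values you would supply for your own hardware.

```python
# GPU memory per OCR model in GB, taken from the Processing Speed table.
OCR_MODEL_MEMORY_GB = {"container_v4": 1.2, "plate_v3": 0.8, "seal_v2": 1.0}


def check_gpu_budget(total_vram_gb, ocr_fraction, other_fraction, limit=0.90):
    """Compare the OCR memory need and the total allocation against the limits."""
    ocr_budget_gb = total_vram_gb * ocr_fraction
    ocr_needed_gb = sum(OCR_MODEL_MEMORY_GB.values())
    total_fraction = ocr_fraction + other_fraction
    return {
        "ocr_needed_gb": round(ocr_needed_gb, 2),
        "ocr_budget_gb": round(ocr_budget_gb, 2),
        # Do the three OCR models fit inside the share reserved for OCR?
        "ocr_models_fit": ocr_needed_gb <= ocr_budget_gb,
        "total_fraction": round(total_fraction, 2),
        # Small tolerance so that, for example, 0.7 + 0.2 counts as 0.90.
        "within_limit": total_fraction <= limit + 1e-9,
    }


if __name__ == "__main__":
    # Assumed example: an 8 GB card, 80% for OCR, 15% for other models.
    print(check_gpu_budget(total_vram_gb=8.0, ocr_fraction=0.8, other_fraction=0.15))
```

In this assumed example the OCR models fit comfortably (3.0 GB needed against a 6.4 GB share), but the combined allocation is 95% of VRAM, which exceeds the 90% ceiling; lowering `gpu_memory_fraction` would bring it back in range.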

Retry Logic

When a read falls below the confidence threshold, the engine can request a re-capture:

ocr:
  retry:
    enabled: true
    max_attempts: 3
    delay_ms: 500
    escalate_on_failure: true  # Route to manual review after all retries fail
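
The control flow these settings describe can be sketched as a loop. This is an illustration under assumptions, not the engine's code: `read_with_retry` and its `capture_and_read` callback are hypothetical, and `max_attempts` is assumed to count the initial capture as the first attempt.

```python
import time
from typing import Callable, Optional, Tuple

OcrRead = Tuple[str, float]  # (recognized text, confidence)


def read_with_retry(
    capture_and_read: Callable[[], OcrRead],
    confidence_threshold: float,
    max_attempts: int = 3,
    delay_ms: int = 500,
    escalate_on_failure: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Re-capture until a read meets the threshold or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[OcrRead] = None
    for attempt in range(1, max_attempts + 1):
        text, confidence = capture_and_read()
        # Keep the highest-confidence read in case every attempt falls short.
        if best is None or confidence > best[1]:
            best = (text, confidence)
        if confidence >= confidence_threshold:
            return {"status": "accepted", "text": text,
                    "confidence": confidence, "attempts": attempt}
        if attempt < max_attempts:
            sleep(delay_ms / 1000)  # wait before requesting a re-capture
    # All attempts failed: escalate to manual review if configured to do so.
    status = "manual_review" if escalate_on_failure else "rejected"
    return {"status": status, "text": best[0],
            "confidence": best[1], "attempts": max_attempts}


if __name__ == "__main__":
    # Simulated reads: the second capture clears a 0.90 threshold.
    reads = iter([("MSCU123456", 0.72), ("MSCU1234566", 0.96)])
    print(read_with_retry(lambda: next(reads), confidence_threshold=0.90, delay_ms=0))
```

Passing `sleep` as a parameter keeps the loop testable without real delays; returning the best read on failure gives a manual reviewer a starting point.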