Data Pipeline: Python Autoscript

Overview

The file autoscript/sidesdecode.py is the data ingestion pipeline that:

  1. Connects to an FTP server where telemetry stations upload CSV data
  2. Downloads and parses CSV files for the current day
  3. Inserts rainfall, water level, and siren data into PostgreSQL
  4. Triggers push notifications when thresholds are exceeded

How It Runs

The script is designed to run on a schedule (likely as a cron job), processing new data files that remote telemetry stations upload throughout the day.

FTP Connection

Server: myvscada.com
Username: tck
Password: tck6789
Path: files/SIDES/SUCCESS/{year}/{month}/{day}/

The script navigates to today's date folder and lists all files.
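
The navigation step can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names are hypothetical, and zero-padded month and day folders are an assumption, since the path template above only shows the placeholders.

```python
from datetime import date
from ftplib import FTP


def build_remote_path(day: date) -> str:
    """Return the dated folder path for `day`.

    Assumption: month and day are zero-padded (e.g. 05, 21).
    """
    return f"files/SIDES/SUCCESS/{day:%Y}/{day:%m}/{day:%d}/"


def list_files_for_day(ftp: FTP, day: date) -> list[str]:
    """Change into the dated folder on a logged-in connection and list it."""
    ftp.cwd(build_remote_path(day))
    return ftp.nlst()
```

With a connection that is already logged in, list_files_for_day(ftp, date.today()) would return the names in today's folder.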

File Filtering

  • Skips files containing "rf" in the filename (Tideda format files)
  • Only processes files with today's date (yymmdd format) in the filename
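
The two filters above could look roughly like this. The function name is hypothetical, and treating the "rf" check as a case-sensitive substring match is an assumption.

```python
from datetime import date


def should_process(filename: str, today: date) -> bool:
    """Apply the two filename filters described above."""
    # Assumption: "rf" is matched case-sensitively anywhere in the name.
    if "rf" in filename:
        return False  # Tideda format file, skipped
    # Keep only files whose name contains today's date as yymmdd.
    return today.strftime("%y%m%d") in filename
```

The filenames used to try this out (for example KBLG0026_260521.csv) are made up for illustration; the real naming scheme is not shown in this document.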

CSV Format

Each line in the CSV file contains 37+ comma-separated columns. Key columns extracted:

| Column Index | Field | Description |
|---|---|---|
| 1 | station_id | Station identifier (e.g., KBLG0026) |
| 4 | timestamp | Timestamp in yymmddHHMMSS format |
| 6 | battery | Battery voltage |
| 15 | wlalert | Water level alert threshold |
| 16 | wlwarn | Water level warning threshold |
| 17 | wldgr | Water level danger threshold |
| 18 | sirenid | Siren identifier |
| 19 | siren | Siren status (H=Danger/High, L=Warning/Low, N=Normal) |
| 21 | anncumm | Annual cumulative rainfall |
| 22 | dailycumm | Daily cumulative rainfall |
| 23 | hourlycumm | Hourly rainfall |
| 24 | currrf | Current rainfall |
| 36 | waterlevel | Current water level reading |
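
A minimal sketch of extracting these columns from one CSV line. It assumes the indices are zero-based (index 36 with 37+ columns suggests this) and that empty fields mean "no value". The names COLUMNS and parse_line are hypothetical, and numeric fields are left as strings.

```python
from datetime import datetime

# Column positions from the table above, assumed to be zero-based.
COLUMNS = {
    "station_id": 1, "timestamp": 4, "battery": 6,
    "wlalert": 15, "wlwarn": 16, "wldgr": 17,
    "sirenid": 18, "siren": 19,
    "anncumm": 21, "dailycumm": 22, "hourlycumm": 23, "currrf": 24,
    "waterlevel": 36,
}


def parse_line(line: str) -> dict:
    """Split one CSV line and pick out the key columns."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 37:
        raise ValueError(f"expected at least 37 columns, got {len(fields)}")
    # Empty strings become None so later "is not null" checks are simple.
    record = {name: fields[idx] or None for name, idx in COLUMNS.items()}
    # Timestamp is documented as yymmddHHMMSS.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%y%m%d%H%M%S")
    return record
```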

Data Processing Logic

Rainfall Data

  1. Check if dailycumm or hourlycumm is not null
  2. Check if record already exists for this station+timestamp
  3. If new, INSERT into rainfall table
  4. Threshold check: If hourlycumm >= 30:
    • 30 <= hourly < 60: Warning level
    • hourly >= 60: Danger level
    • INSERT into notification table
    • Send push notification via Laravel API
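
The rainfall threshold check in step 4 can be sketched as a small function. The name rainfall_level is hypothetical; the 30 and 60 cut-offs come from the list above.

```python
from typing import Optional


def rainfall_level(hourlycumm: float) -> Optional[str]:
    """Map hourly rainfall to a notification level, or None below 30."""
    if hourlycumm >= 60:
        return "Danger"
    if hourlycumm >= 30:
        return "Warning"
    return None  # below the threshold: no notification
```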

Water Level Data

  1. Check if waterlevel is not null
  2. Check if record already exists for this station+datetime
  3. If new, INSERT into waterlevel table (with alert/warning/danger thresholds)
  4. Threshold check: If waterlevel >= alert:
    • alert <= wl < warning: Alert level
    • warning <= wl < danger: Warning level
    • wl >= danger: Danger level
    • INSERT into notification table
    • Send push notification via Laravel API
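
The water level check in step 4 follows the same pattern with per-station thresholds. This sketch uses a hypothetical function name and assumes alert <= warning <= danger for every station.

```python
from typing import Optional


def waterlevel_level(wl: float, alert: float, warning: float,
                     danger: float) -> Optional[str]:
    """Map a water level reading to a level, or None below the alert threshold.

    Assumption: alert <= warning <= danger.
    """
    if wl >= danger:
        return "Danger"
    if wl >= warning:
        return "Warning"
    if wl >= alert:
        return "Alert"
    return None  # below the alert threshold: no notification
```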

Siren Data

  1. Check if sirenid is not null
  2. Check if record already exists for this station+active_time
  3. Determine level from siren status:
    • H: Danger
    • L: Warning
    • N: Normal
  4. INSERT into siren table
  5. If level is not Normal, send push notification via Laravel API
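
The status mapping in step 3 and the notification rule in step 5 can be sketched as below. The names are hypothetical, and returning None for an unrecognised status code is an assumption; this document does not say how the script handles that case.

```python
from typing import Optional

# Siren status codes as listed above.
SIREN_LEVELS = {"H": "Danger", "L": "Warning", "N": "Normal"}


def siren_level(status: str) -> Optional[str]:
    """Translate a siren status code; None for unknown codes (assumption)."""
    return SIREN_LEVELS.get(status)


def should_notify(level: Optional[str]) -> bool:
    """A push notification is sent only for non-Normal levels."""
    return level is not None and level != "Normal"
```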

Alert Notification Flow

When a threshold is triggered, the script calls send_alert_to_laravel():

import requests

def send_alert_to_laravel(stationid, level, stationtype):
    # Payload consumed by the Laravel alert endpoint
    payload = {
        "stationid": stationid,
        "level": level,
        "stationtype": stationtype,  # 1=rainfall, 2=waterlevel, 3=siren
    }
    # POST the payload as JSON; give up after 5 seconds
    response = requests.post("https://sides.tck.com.my/api/alert", json=payload, timeout=5)

This hits the Laravel AlertController which:

  1. Builds notification title/body based on station type and level
  2. Calls FcmService::sendToTopic() which:
    • Reads Firebase service account credentials
    • Gets an OAuth2 access token from Google
    • Sends FCM message to topic (e.g., rainfall_warning)
    • Push notification arrives on subscribed mobile devices

PostgreSQL Connection

The script connects directly to PostgreSQL:

pg_host = "192.168.0.211"
pg_database = "sides_db"
pg_user = "tck"
pg_password = "projectdev##1"

Note: 192.168.0.211 is a hardcoded private-network address outside the Docker setup, so the script does not use the Docker database container. The database name is sides_db, which differs from the tckdev database configured in the Docker .env.
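
One way to avoid hardcoding these values is to read them from environment variables. The sketch below is illustrative only: the variable names (SIDES_PG_HOST and so on) and the function name are hypothetical, and the dictionary keys are chosen to match the keyword arguments of psycopg2.connect, on the assumption that the script uses psycopg2.

```python
import os


def load_pg_config(env=os.environ) -> dict:
    """Read PostgreSQL settings from the environment.

    A missing variable raises KeyError, so a misconfigured run fails
    at startup rather than falling back to a hardcoded value.
    """
    return {
        "host": env["SIDES_PG_HOST"],
        "dbname": env["SIDES_PG_DATABASE"],
        "user": env["SIDES_PG_USER"],
        "password": env["SIDES_PG_PASSWORD"],
    }
```

The result could then be passed on as psycopg2.connect(**load_pg_config()).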

File Management (Commented Out)

The script contains commented-out functions for:

  • move_to_error_folder() — Move malformed files to an FTP error folder
  • move_to_success_folder() — Move processed files to a success archive folder

These are currently disabled — files remain in the source folder after processing.

Log Files

  • autoscript/sidesdecode.log — Processing output
  • autoscript/sidesdecode_error.log — Error output

Known Issues

  1. Hardcoded credentials: FTP and PostgreSQL credentials are embedded in the script
  2. Limited deduplication: if the script runs twice, it skips records with the same station and timestamp but performs no broader deduplication
  3. Commented-out file management: processed files are not moved or archived
  4. Wrong station type: the water level alert sends stationtype=1 instead of 2 (likely a bug at line 378)
  5. No error recovery: if the script crashes mid-processing, some data may be partially inserted
  6. No connection pooling: new FTP and database connections are opened on each run