Data Pipeline: Python Autoscript
Overview
The file autoscript/sidesdecode.py is the data ingestion pipeline that:
- Connects to an FTP server where telemetry stations upload CSV data
- Downloads and parses CSV files for the current day
- Inserts rainfall, water level, and siren data into PostgreSQL
- Triggers push notifications when thresholds are exceeded
How It Runs
The script is designed to run on a schedule (likely as a cron job), processing new data files that remote telemetry stations upload throughout the day.
FTP Connection
Server: myvscada.com
Username: tck
Password: tck6789
Path: files/SIDES/SUCCESS/{year}/{month}/{day}/
The script navigates to today's date folder and lists all files.
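The date folder lookup can be sketched as follows. This is a minimal illustration, not the script's actual code: build_remote_path is a hypothetical helper, and the four-digit year and zero-padded month and day are assumptions, since the source only gives the {year}/{month}/{day} template.

```python
from datetime import date

BASE_PATH = "files/SIDES/SUCCESS"  # base folder from the path template above


def build_remote_path(day: date) -> str:
    """Return the FTP folder for the given day.

    Assumption: four-digit year, zero-padded month and day.
    """
    return f"{BASE_PATH}/{day.year:04d}/{day.month:02d}/{day.day:02d}/"


if __name__ == "__main__":
    print(build_remote_path(date.today()))
```

With the standard ftplib module, the returned path would typically be passed to FTP.cwd() before listing the folder with FTP.nlst().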
File Filtering
- Skips files containing "rf" in the filename (Tideda format files)
- Only processes files with today's date (yymmdd format) in the filename
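The two filter rules could be combined into a single predicate, sketched below. The helper name should_process and the case-insensitive match on "rf" are assumptions for illustration; the source does not say how the script performs the comparison.

```python
from datetime import date


def should_process(filename: str, today: date) -> bool:
    """Return True if the file passes both filter rules."""
    # Rule 1: skip Tideda-format files, identified by "rf" in the name.
    # Assumption: the match is case-insensitive.
    if "rf" in filename.lower():
        return False
    # Rule 2: the name must contain today's date as yymmdd.
    return today.strftime("%y%m%d") in filename
```

Note that a plain substring test for "rf" would also skip any station file whose name happens to contain those two letters, so a stricter pattern may be worth considering.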
CSV Format
Each line in the CSV file contains 37+ comma-separated columns. Key columns extracted:
| Column Index | Field | Description |
|---|---|---|
| 1 | station_id | Station identifier (e.g., KBLG0026) |
| 4 | timestamp | Timestamp in yymmddHHMMSS format |
| 6 | battery | Battery voltage |
| 15 | wlalert | Water level alert threshold |
| 16 | wlwarn | Water level warning threshold |
| 17 | wldgr | Water level danger threshold |
| 18 | sirenid | Siren identifier |
| 19 | siren | Siren status (H=Danger/High, L=Warning/Low, N=Normal) |
| 21 | anncumm | Annual cumulative rainfall |
| 22 | dailycumm | Daily cumulative rainfall |
| 23 | hourlycumm | Hourly rainfall |
| 24 | currrf | Current rainfall |
| 36 | waterlevel | Current water level reading |
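A minimal sketch of how one line could be split into the fields listed in the table. The helper parse_line is hypothetical, and the indices are assumed to be zero-based (which is consistent with index 36 appearing in a line of 37 or more columns, but is not confirmed by the source).

```python
from datetime import datetime

# Field name -> column index, taken from the table above.
# Assumption: indices are zero-based.
COLUMNS = {
    "station_id": 1,
    "timestamp": 4,
    "battery": 6,
    "wlalert": 15,
    "wlwarn": 16,
    "wldgr": 17,
    "sirenid": 18,
    "siren": 19,
    "anncumm": 21,
    "dailycumm": 22,
    "hourlycumm": 23,
    "currrf": 24,
    "waterlevel": 36,
}


def parse_line(line: str) -> dict:
    """Extract the key fields from one CSV line as strings (None if empty)."""
    parts = [p.strip() for p in line.rstrip("\r\n").split(",")]
    if len(parts) < 37:
        raise ValueError(f"expected at least 37 columns, got {len(parts)}")
    record = {name: (parts[idx] or None) for name, idx in COLUMNS.items()}
    if record["timestamp"] is None:
        raise ValueError("missing timestamp")
    # Timestamps use the yymmddHHMMSS layout described in the table.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%y%m%d%H%M%S")
    return record
```

Numeric fields are left as strings here so that the caller decides how to handle empty or malformed readings.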
Data Processing Logic
Rainfall Data
- Check if dailycumm or hourlycumm is not null
- Check if record already exists for this station+timestamp
- If new, INSERT into rainfall table
- Threshold check: If hourlycumm >= 30:
  - 30 <= hourly < 60 → Warning level
  - hourly >= 60 → Danger level
  - INSERT into notification table
  - Send push notification via Laravel API
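The rainfall threshold check can be expressed as a small pure function. This is a sketch: the name classify_rainfall and the lowercase level strings are assumptions, while the 30 and 60 cut-offs come from the list above.

```python
from typing import Optional


def classify_rainfall(hourlycumm: float) -> Optional[str]:
    """Map hourly rainfall to a notification level, or None below threshold."""
    if hourlycumm >= 60:
        return "danger"
    if hourlycumm >= 30:
        return "warning"
    return None
```

Checking the higher threshold first keeps each branch to a single comparison and makes the boundary values (exactly 30, exactly 60) unambiguous.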
Water Level Data
- Check if waterlevel is not null
- Check if record already exists for this station+datetime
- If new, INSERT into waterlevel table (with alert/warning/danger thresholds)
- Threshold check: If waterlevel >= alert:
  - alert <= wl < warning → Alert level
  - warning <= wl < danger → Warning level
  - wl >= danger → Danger level
  - INSERT into notification table
  - Send push notification via Laravel API
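The three-band water level check can be sketched the same way. The function name and level strings are assumptions, and the sketch assumes each station's thresholds satisfy alert <= warning <= danger, which the source implies but does not state.

```python
from typing import Optional


def classify_water_level(
    wl: float, alert: float, warning: float, danger: float
) -> Optional[str]:
    """Map a water level reading to a level using per-station thresholds.

    Assumption: alert <= warning <= danger.
    """
    if wl >= danger:
        return "danger"
    if wl >= warning:
        return "warning"
    if wl >= alert:
        return "alert"
    return None
```

Because the thresholds arrive in the same CSV line as the reading (columns wlalert, wlwarn, wldgr), they are passed in per call rather than stored as constants.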
Siren Data
- Check if sirenid is not null
- Check if record already exists for this station+active_time
- Determine level from siren status:
  - H → Danger
  - L → Warning
  - N → Normal
- INSERT into siren table
- If level is not Normal, send push notification via Laravel API
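The siren status mapping is a plain lookup, sketched below. The helper names and the normalisation of whitespace and letter case are assumptions; only the H/L/N codes and their meanings come from the source.

```python
# Status code -> level, as listed above.
SIREN_LEVELS = {"H": "danger", "L": "warning", "N": "normal"}


def siren_level(status: str) -> str:
    """Translate a siren status code into a level name."""
    code = status.strip().upper()  # assumption: tolerate padding and lowercase
    if code not in SIREN_LEVELS:
        raise ValueError(f"unknown siren status: {status!r}")
    return SIREN_LEVELS[code]


def should_notify(status: str) -> bool:
    """A push notification is sent for every level except normal."""
    return siren_level(status) != "normal"
```

Raising on an unknown code, rather than defaulting to normal, avoids silently suppressing an alert when a station sends an unexpected value.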
Alert Notification Flow
When a threshold is triggered, the script calls send_alert_to_laravel():
    import requests

    def send_alert_to_laravel(stationid, level, stationtype):
        # stationtype: 1=rainfall, 2=waterlevel, 3=siren
        payload = {
            "stationid": stationid,
            "level": level,
            "stationtype": stationtype,
        }
        response = requests.post("https://sides.tck.com.my/api/alert", json=payload, timeout=5)
This hits the Laravel AlertController which:
- Builds notification title/body based on station type and level
- Calls FcmService::sendToTopic() which:
  - Reads Firebase service account credentials
  - Gets an OAuth2 access token from Google
  - Sends FCM message to topic (e.g., rainfall_warning)
  - Push notification arrives on subscribed mobile devices
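Since stationtype is passed as a bare integer, call sites are easy to get wrong. One possible way to make them self-describing is an enumeration, sketched below. StationType and build_alert_payload are hypothetical names and the "danger" level value in the usage note is illustrative; the numeric codes 1, 2, 3 are the ones documented for send_alert_to_laravel().

```python
from enum import IntEnum


class StationType(IntEnum):
    """Numeric station type codes expected by the alert endpoint."""

    RAINFALL = 1
    WATERLEVEL = 2
    SIREN = 3


def build_alert_payload(stationid: str, level: str, stationtype: StationType) -> dict:
    """Build the JSON body for the alert request."""
    return {
        "stationid": stationid,
        "level": level,
        "stationtype": int(stationtype),  # serialise as a plain integer
    }
```

A water level alert would then be built with build_alert_payload("KBLG0026", "danger", StationType.WATERLEVEL), which makes a mismatched type code visible at the call site.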
PostgreSQL Connection
The script connects directly to PostgreSQL:
    pg_host = "192.168.0.211"
    pg_database = "sides_db"
    pg_user = "tck"
    pg_password = "projectdev##1"
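Note: The host is a hardcoded private LAN address (192.168.0.211), so the script connects to a database outside the Docker setup rather than to the Docker database container. The database name is sides_db, which differs from the tckdev database configured in the Docker .env.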
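One way to avoid embedding these values in the script is to read them from the environment, sketched below. The PG_* variable names and the load_pg_config helper are invented for this illustration and are not part of the existing project.

```python
import os

# Assumption: these variable names are hypothetical, chosen for this sketch.
REQUIRED_VARS = ("PG_HOST", "PG_DATABASE", "PG_USER", "PG_PASSWORD")


def load_pg_config(env=os.environ) -> dict:
    """Read PostgreSQL connection settings from environment variables."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["PG_HOST"],
        "dbname": env["PG_DATABASE"],
        "user": env["PG_USER"],
        "password": env["PG_PASSWORD"],
    }
```

If the script uses psycopg2, the returned dictionary can be passed as psycopg2.connect(**load_pg_config()). The same variables could then point at either the LAN database or the Docker container without code changes.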
File Management (Commented Out)
The script contains (commented out) functions for:
- move_to_error_folder(): moves malformed files to an FTP error folder
- move_to_success_folder(): moves processed files to a success archive folder

These are currently disabled, so files remain in the source folder after processing.
Log Files
- autoscript/sidesdecode.log: processing output
- autoscript/sidesdecode_error.log: error output
Known Issues
- Hardcoded credentials: FTP and PostgreSQL credentials are embedded in the script
- No deduplication beyond same-timestamp: if the script runs twice, it skips records with an identical station and timestamp but performs no broader deduplication
- Commented-out file management: processed files are not moved or archived
- Water level alert sends stationtype=1 instead of 2 (likely a bug at line 378)
- No error recovery: if the script crashes mid-processing, some data may be partially inserted
- No connection pooling: new FTP and database connections are opened on each run
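The partial-insert and duplicate issues could both be addressed by inserting each file's rows in one transaction against a unique constraint. The sketch below uses the standard sqlite3 module as a stand-in so it runs without a server; the column names are hypothetical, and on PostgreSQL the equivalent statement would be INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows) -> int:
    """Insert rows atomically; duplicates on (station_id, ts) are ignored.

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO rainfall (station_id, ts, hourlycumm) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return conn.total_changes - before


# In-memory stand-in for the real database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rainfall ("
    " station_id TEXT, ts TEXT, hourlycumm REAL,"
    " UNIQUE (station_id, ts))"
)
```

With this shape, re-running the script over the same file inserts nothing new, and a failure partway through a file leaves none of that file's rows behind.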