docs: rewrite all documentation to reflect current state

- Remove adminer references (service was removed)
- Remove mermaid diagrams (ASCII only)
- Remove hardcoded credentials (use env var references)
- Update all Docker references to 4-container setup (app, postgres, web, pgadmin)
- Document env-based admin credentials (ADMIN_EMAIL/ADMIN_PASSWORD)
- Document parameterized queries (SQL injection fixed)
- Document FCM topic routing by stationtype+level
- Document siren stationtype=3 fix in sidesdecode.py
- Document idempotent seeder (firstOrCreate)
- Document reverse proxy setup in deployment guide
- Remove Makefile references (Docker Compose only)

@@ -1,48 +1,62 @@
<!-- generated-by: gsd-doc-writer -->
# Data Pipeline: Python Autoscript

## Overview

The file `autoscript/sidesdecode.py` is the data ingestion pipeline that:
`autoscript/sidesdecode.py` is the data ingestion pipeline that:

1. Connects to an FTP server where telemetry stations upload CSV data
2. Downloads and parses CSV files for the current day
3. Inserts rainfall, water level, and siren data into PostgreSQL
4. Triggers push notifications when thresholds are exceeded

## How It Runs
The script is designed to run on a schedule (e.g., a cron job), processing new files uploaded by remote telemetry stations throughout the day.

The script is designed to be run on a **schedule** (likely a cron job), processing new data files uploaded by remote telemetry stations throughout the day.

## Environment Variables

All connection settings and credentials are read from environment variables, with the following defaults:

| Variable | Default | Description |
|----------|---------|-------------|
| `FTP_SERVER` | `myvscada.com` | FTP server hostname |
| `FTP_USERNAME` | `tck` | FTP login username |
| `FTP_PASSWORD` | *(empty)* | FTP login password |
| `PG_HOST` | `postgres` | PostgreSQL host (`postgres` on the Docker network, `localhost` on the Docker host) |
| `PG_DATABASE` | `sides_db` | PostgreSQL database name |
| `PG_USER` | `tck` | PostgreSQL username |
| `PG_PASSWORD` | *(empty)* | PostgreSQL password |

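As a minimal sketch of the environment-variable table above, the lookup could work as follows. The names `DEFAULTS` and `load_settings` are hypothetical and not taken from `sidesdecode.py`; only the variable names and default values come from the table.

```python
import os

# Defaults copied from the table above. Empty passwords must be supplied
# through the environment in any real deployment.
DEFAULTS = {
    "FTP_SERVER": "myvscada.com",
    "FTP_USERNAME": "tck",
    "FTP_PASSWORD": "",
    "PG_HOST": "postgres",
    "PG_DATABASE": "sides_db",
    "PG_USER": "tck",
    "PG_PASSWORD": "",
}


def load_settings(environ=None):
    """Return all settings, preferring environment values over defaults."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing an explicit mapping (for example `load_settings({"PG_HOST": "localhost"})`) makes it easy to check the override behaviour without touching the real environment.
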
## FTP Connection

The script connects to the configured FTP server and navigates to today's date folder:

```
Server: myvscada.com
Username: tck
Password: tck6789
Path: files/SIDES/SUCCESS/{year}/{month}/{day}/
```

The script navigates to today's date folder and lists all files.
Example path for 21 May 2026: `files/SIDES/SUCCESS/2026/05/21/`

### File Filtering

- Skips files containing "rf" in the filename (Tideda format files)
- Only processes files with today's date (`yymmdd` format) in the filename
- Only processes files with today's date in `yymmdd` format in the filename

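The date folder and the two filename filters described above can be sketched as below. The helper names `remote_dir` and `should_process` are hypothetical, and the case-sensitive match on "rf" is an assumption, since the source does not say how the script compares filenames.

```python
from datetime import date


def remote_dir(day):
    """Build the FTP folder for a given day, e.g. files/SIDES/SUCCESS/2026/05/21/."""
    return f"files/SIDES/SUCCESS/{day:%Y}/{day:%m}/{day:%d}/"


def should_process(filename, day):
    """Apply the two filters: skip Tideda ("rf") files, require today's yymmdd stamp."""
    if "rf" in filename:  # assumption: case-sensitive substring match
        return False
    return day.strftime("%y%m%d") in filename
```
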
## CSV Format

Each line in the CSV file contains 37+ comma-separated columns. Key columns extracted:
Each line in the CSV file contains 37+ comma-separated columns. The script requires at least 25 columns per line; shorter lines are skipped.

| Column Index | Field | Description |
|-------------|-------|-------------|
| 1 | `station_id` | Station identifier (e.g., KBLG0026) |

Key columns extracted:

| Column Index | Variable | Description |
|-------------|----------|-------------|
| 1 | `station_id` | Station identifier (e.g., `KBLG0026`) |
| 4 | `timestamp` | Timestamp in `yymmddHHMMSS` format |
| 6 | `battery` | Battery voltage |
| 15 | `wlalert` | Water level alert threshold |
| 16 | `wlwarn` | Water level warning threshold |
| 17 | `wldgr` | Water level danger threshold |
| 18 | `sirenid` | Siren identifier |
| 19 | `siren` | Siren status (`H`=Danger/High, `L`=Warning/Low, `N`=Normal) |
| 19 | `siren` | Siren status: `H`=Danger, `L`=Warning, `N`=Normal |
| 21 | `anncumm` | Annual cumulative rainfall |
| 22 | `dailycumm` | Daily cumulative rainfall |
| 23 | `hourlycumm` | Hourly rainfall |

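To illustrate the column layout above, here is a small parsing sketch. It assumes the column indices are 0-based Python list indices, which the source does not state, and it only extracts fields that appear in the table. `parse_line` and `MIN_COLUMNS` are hypothetical names; values other than the timestamp are left as strings.

```python
from datetime import datetime

MIN_COLUMNS = 25  # lines with fewer columns are skipped, per the text above


def parse_line(line):
    """Split one CSV line and pick out the documented columns, or return None."""
    cols = line.strip().split(",")
    if len(cols) < MIN_COLUMNS:
        return None
    return {
        "station_id": cols[1],
        # yymmddHHMMSS, e.g. 260521134500 -> 2026-05-21 13:45:00
        "timestamp": datetime.strptime(cols[4], "%y%m%d%H%M%S"),
        "battery": cols[6],
        "wlalert": cols[15],
        "wlwarn": cols[16],
        "wldgr": cols[17],
        "sirenid": cols[18],
        "siren": cols[19],
        "anncumm": cols[21],
        "dailycumm": cols[22],
        "hourlycumm": cols[23],
    }
```
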
@@ -51,39 +65,129 @@ Each line in the CSV file contains 37+ comma-separated columns. Key columns extr

## Data Processing Logic

The `process_line()` function handles each CSV line. All database operations use `psycopg2` parameterized queries to prevent SQL injection.

### Rainfall Data

1. Check if `dailycumm` or `hourlycumm` is not null
2. Check if record already exists for this station+timestamp
3. If new, INSERT into `rainfall` table
4. **Threshold check**: If `hourlycumm >= 30`:
   - `30 <= hourly < 60` → **Warning** level
   - `hourly >= 60` → **Danger** level
   - INSERT into `notification` table
   - Send push notification via Laravel API
```
CSV line
  |
  v
[dailycumm or hourlycumm not null?]
  |-- No --> skip
  |-- Yes
  v
[record exists for station+timestamp?]
  |-- Yes --> skip insert
  |-- No --> INSERT INTO rainfall (stationid, timestamp, anncum, daily, hourly, currentrf, battery)
  |
  v
[hourlycumm >= 30?]
  |-- No --> done
  |-- Yes
  |
  v
Determine level:
  30 <= hourly < 60 --> Warning
  hourly >= 60 --> Danger
  |
  v
[notification exists for station+timestamp+stationtype=1?]
  |-- Yes --> skip insert
  |-- No --> INSERT INTO notification (stationid, timestamp, stationtype=1, level, active_time)
  |
  v
send_alert_to_laravel(station_id, level, 1)
```

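The rainfall threshold step can be sketched as a small pure function. This is an illustration of the 30/60 thresholds described above, not the script's actual code; `rainfall_level` is a hypothetical name, and returning `None` below the threshold is an assumption about how "no alert" is represented.

```python
def rainfall_level(hourly):
    """Map hourly rainfall to an alert level, or None when no alert applies."""
    if hourly is None or hourly < 30:
        return None
    if hourly >= 60:
        return "Danger"
    return "Warning"  # 30 <= hourly < 60
```
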
### Water Level Data

1. Check if `waterlevel` is not null
2. Check if record already exists for this station+datetime
3. If new, INSERT into `waterlevel` table (with alert/warning/danger thresholds)
4. **Threshold check**: If `waterlevel >= alert`:
   - `alert <= wl < warning` → **Alert** level
   - `warning <= wl < danger` → **Warning** level
   - `wl >= danger` → **Danger** level
   - INSERT into `notification` table
   - Send push notification via Laravel API
```
CSV line
  |
  v
[waterlevel not null?]
  |-- No --> skip
  |-- Yes
  v
[record exists for station+datetime?]
  |-- Yes --> skip insert
  |-- No --> INSERT INTO waterlevel (stationid, datetime, waterlevel, alert, warning, danger)
  |
  v
[waterlevel >= wlalert?]
  |-- No --> done
  |-- Yes
  |
  v
Determine level:
  alert <= wl < warning --> Alert
  warning <= wl < danger --> Warning
  wl >= danger --> Danger
  |
  v
[notification exists for station+timestamp+stationtype=2?]
  |-- Yes --> skip insert
  |-- No --> INSERT INTO notification (stationid, timestamp, stationtype=2, level, active_time)
  |
  v
send_alert_to_laravel(station_id, level, 2)
```

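The three-tier water level check can be sketched the same way. The per-station thresholds are passed in as arguments because they come from each CSV line; `water_level_status` is a hypothetical name, and the sketch assumes the thresholds are ordered `alert <= warning <= danger`.

```python
def water_level_status(wl, alert, warning, danger):
    """Classify a water level against per-station thresholds, or return None."""
    if wl is None or wl < alert:
        return None
    if wl >= danger:
        return "Danger"
    if wl >= warning:
        return "Warning"
    return "Alert"  # alert <= wl < warning
```
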
### Siren Data

1. Check if `sirenid` is not null
2. Check if record already exists for this station+active_time
3. Determine level from siren status:
   - `H` → **Danger**
   - `L` → **Warning**
   - `N` → **Normal**
4. INSERT into `siren` table
5. If level is not Normal, send push notification via Laravel API
```
CSV line
  |
  v
[sirenid not null?]
  |-- No --> skip
  |-- Yes
  v
Determine level from siren status:
  H --> Danger
  L --> Warning
  N --> Normal
  |
  v
[record exists for station+active_time?]
  |-- Yes --> skip insert
  |-- No --> INSERT INTO siren (stationid, stationtype=3, active_time, level)
  |
  v
[level != Normal?]
  |-- No --> done
  |-- Yes --> send_alert_to_laravel(station_id, level, 3)
```

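The siren status mapping is a simple lookup. In the sketch below, `SIREN_LEVELS`, `siren_level`, and `should_notify` are hypothetical names, and returning `None` for an unrecognised status code is an assumption, since the source only documents `H`, `L`, and `N`.

```python
SIREN_LEVELS = {"H": "Danger", "L": "Warning", "N": "Normal"}


def siren_level(status):
    """Translate a siren status code into a level; None for unknown codes."""
    return SIREN_LEVELS.get(status)


def should_notify(level):
    """Only non-Normal levels trigger a push notification."""
    return level is not None and level != "Normal"
```
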
## Station Types

The `stationtype` integer identifies the data source in notifications and alerts:

| stationtype | Data Source |
|------------|-------------|
| 1 | Rainfall |
| 2 | Water Level |
| 3 | Siren |

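Since FCM topics are routed by `stationtype` and level, the naming scheme implied by the example topics (`rainfall_warning`, `waterlevel_danger`) might look like the sketch below. The real routing lives in the Laravel `FcmService`, so this Python version is illustrative only. The `siren` prefix is an assumption because no siren topic name appears in the source, and `STATION_TOPICS` and `fcm_topic` are hypothetical names.

```python
# Topic prefixes per stationtype. "rainfall" and "waterlevel" match the example
# topic names in this document; "siren" is an assumed prefix.
STATION_TOPICS = {1: "rainfall", 2: "waterlevel", 3: "siren"}


def fcm_topic(stationtype, level):
    """Build a topic name such as rainfall_warning from stationtype and level."""
    return f"{STATION_TOPICS[stationtype]}_{level.lower()}"
```
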
## Threshold Summary

### Rainfall Thresholds

| Condition | Level |
|-----------|-------|
| `hourlycumm >= 30` and `< 60` | Warning |
| `hourlycumm >= 60` | Danger |

### Water Level Thresholds

Thresholds are per-station values from CSV columns 15-17.

| Condition | Level |
|-----------|-------|
| `waterlevel >= wlalert` and `< wlwarn` | Alert |
| `waterlevel >= wlwarn` and `< wldgr` | Warning |
| `waterlevel >= wldgr` | Danger |

## Alert Notification Flow
@@ -99,45 +203,57 @@ def send_alert_to_laravel(stationid, level, stationtype):
    response = requests.post("https://sides.tck.com.my/api/alert", json=payload, timeout=5)
```

This hits the Laravel `AlertController` which:
1. Builds notification title/body based on station type and level
2. Calls `FcmService::sendToTopic()` which:
   - Reads Firebase service account credentials
   - Gets an OAuth2 access token from Google
   - Sends FCM message to topic (e.g., `rainfall_warning`)
   - Push notification arrives on subscribed mobile devices
<!-- VERIFY: Alert API endpoint is https://sides.tck.com.my/api/alert -->

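Only one line of `send_alert_to_laravel` is visible in this diff, so the following is a sketch of what such a call might look like, not the script's implementation. The payload keys are taken from the `POST /api/alert {stationid, level, stationtype}` step in the notification chain. The injected `post` parameter is an addition made here so the function can be exercised without network access; in the real script the call is `requests.post`. `send_alert` and `ALERT_URL` are hypothetical names.

```python
import logging

ALERT_URL = "https://sides.tck.com.my/api/alert"
log = logging.getLogger("sidesdecode")


def send_alert(stationid, level, stationtype, post):
    """POST an alert; log and swallow failures so line processing continues.

    `post` is any callable with the shape of requests.post(url, json=..., timeout=...).
    """
    payload = {"stationid": stationid, "level": level, "stationtype": stationtype}
    try:
        response = post(ALERT_URL, json=payload, timeout=5)
        return response.status_code
    except Exception as exc:  # alert failures must not halt ingestion
        log.error("alert for %s failed: %s", stationid, exc)
        return None
```

With the `requests` library installed, a call would look like `send_alert("KBLG0026", "Danger", 1, requests.post)`.
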
The full notification chain:

```
sidesdecode.py
  | POST /api/alert {stationid, level, stationtype}
  v
AlertController (Laravel)
  | Builds notification title/body from station type and level
  v
FcmService::sendToTopic()
  | Routes to FCM topic by stationtype and level
  | (e.g., rainfall_warning, rainfall_danger, waterlevel_alert, waterlevel_danger)
  v
Firebase Cloud Messaging
  | Push notification delivered to subscribed mobile devices
  v
Mobile App
```

## PostgreSQL Connection

The script connects directly to PostgreSQL:

```python
pg_host = "192.168.0.211"
pg_database = "sides_db"
pg_user = "tck"
pg_password = "projectdev##1"
```

**Note**: This is a hardcoded IP address outside the Docker network rather than the Docker `postgres` container. The database name is `sides_db` (different from the Docker `.env`, which uses `tckdev`).

## Deduplication

Before every INSERT, the script checks for an existing record:

- **Rainfall**: `SELECT COUNT(*) FROM rainfall WHERE stationid = %s AND timestamp = %s`
- **Water Level**: `SELECT COUNT(*) FROM waterlevel WHERE stationid = %s AND datetime = %s`
- **Siren**: `SELECT COUNT(*) FROM siren WHERE stationid = %s AND active_time = %s`
- **Notification**: `SELECT COUNT(*) FROM notification WHERE stationid = %s AND timestamp = %s AND stationtype = %s`

If a record exists, the INSERT is skipped. This makes the script safe to re-run for the same time period.

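The check-then-insert pattern above can be sketched for the rainfall table as follows. The function takes any DB-API cursor that accepts `%s` placeholders (as `psycopg2` does), so values are always passed as parameters and never formatted into the SQL string. `insert_rainfall_if_new` is a hypothetical name; the column list is taken from the rainfall flow diagram.

```python
def insert_rainfall_if_new(cur, stationid, timestamp, anncum, daily, hourly,
                           currentrf, battery):
    """Insert a rainfall row unless one exists for this station and timestamp.

    Returns True when a row was inserted, False when it was skipped.
    """
    cur.execute(
        "SELECT COUNT(*) FROM rainfall WHERE stationid = %s AND timestamp = %s",
        (stationid, timestamp),
    )
    if cur.fetchone()[0] > 0:
        return False  # duplicate: safe to re-run over the same files
    cur.execute(
        "INSERT INTO rainfall "
        "(stationid, timestamp, anncum, daily, hourly, currentrf, battery) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (stationid, timestamp, anncum, daily, hourly, currentrf, battery),
    )
    return True
```

Note that a separate SELECT followed by an INSERT is not atomic. If two copies of the script could run concurrently, a unique constraint on the station and timestamp columns would be a more robust guard, though the source does not say whether one exists.
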
## File Management (Commented Out)

The script contains (commented out) functions for:
- `move_to_error_folder()` -- Move malformed files to an FTP error folder
- `move_to_success_folder()` -- Move processed files to a success archive folder

These are currently disabled -- files remain in the source folder after processing.

Two file management functions are defined but currently commented out:

- `move_to_error_folder()` -- Move malformed files to an FTP error subfolder
- `move_to_success_folder()` -- Move processed files to a success archive subfolder

When active, these functions create the target FTP directory if it does not exist, upload the file, and delete the original. They are currently disabled, so processed files remain in the source FTP folder.

## Log Files

- `autoscript/sidesdecode.log` -- Processing output
- `autoscript/sidesdecode_error.log` -- Error output

## Error Handling

- Malformed lines (fewer than 25 columns) are skipped with a log message
- Any exception during line processing triggers `conn.rollback()` to prevent partial inserts
- Alert sending failures are caught and logged but do not halt processing
- The script closes both FTP and database connections on exit

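The per-line rollback behaviour can be sketched as a loop like the one below. It assumes one commit per successfully processed line, which matches the statement elsewhere in this document that already-processed lines stay committed after a crash. The `process_line(line, conn)` signature and the name `process_lines` are assumptions for illustration.

```python
import logging

log = logging.getLogger("sidesdecode")


def process_lines(lines, conn, process_line):
    """Process lines one by one; roll back and continue when a line fails.

    Returns the number of lines that were committed.
    """
    committed = 0
    for line in lines:
        try:
            process_line(line, conn)
            conn.commit()
            committed += 1
        except Exception as exc:
            conn.rollback()  # discard any partial inserts for this line
            log.error("failed to process line: %s", exc)
    return committed
```
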
## Known Issues

1. **Hardcoded credentials** -- FTP and PostgreSQL credentials are embedded in the script
2. **No deduplication beyond same-timestamp** -- If the script runs twice, it skips exact duplicates but has no broader deduplication
3. **Commented out file management** -- Processed files are not moved/archived
4. **Water level alert sends `stationtype=1`** instead of `2` (likely a bug at line 378)
5. **No error recovery** -- If the script crashes mid-processing, some data may be partially inserted
6. **No connection pooling** -- New FTP and database connections are opened on each run

1. **No file archiving** -- `move_to_error_folder` and `move_to_success_folder` are commented out, so files are never moved after processing
2. **No broader deduplication** -- Deduplication only checks exact station+timestamp matches; no handling for near-duplicate records
3. **No connection retry** -- If FTP or PostgreSQL is unreachable, the script fails immediately with no retry logic
4. **Partial processing risk** -- If the script crashes mid-file, lines already processed are committed but remaining lines are lost until the next run
