fix: seeder idempotent with firstOrCreate
Use firstOrCreate instead of create so db:seed can run safely on container restart without duplicate key violation.
143
docs/04-DATA-PIPELINE.md
Normal file
@@ -0,0 +1,143 @@
# Data Pipeline: Python Autoscript

## Overview

The file `autoscript/sidesdecode.py` is the data ingestion pipeline that:

1. Connects to an FTP server where telemetry stations upload CSV data
2. Downloads and parses CSV files for the current day
3. Inserts rainfall, water level, and siren data into PostgreSQL
4. Triggers push notifications when thresholds are exceeded

## How It Runs

The script is designed to run on a **schedule** (likely as a cron job), processing new data files that remote telemetry stations upload throughout the day.

## FTP Connection

```
Server: myvscada.com
Username: tck
Password: tck6789
Path: files/SIDES/SUCCESS/{year}/{month}/{day}/
```

The script navigates to today's date folder and lists all files.
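
The navigation step can be sketched with the standard-library `ftplib` module. This is a minimal illustration, not the script's actual code: the helper names `build_remote_path` and `list_todays_files` are hypothetical, and zero-padded month and day folders are an assumption, since the path template above does not specify padding.

```python
from datetime import date
from ftplib import FTP


def build_remote_path(day: date) -> str:
    """Build the dated folder path from the documented template.

    Assumption: month and day folders are zero-padded (e.g. 03, 07).
    """
    return f"files/SIDES/SUCCESS/{day:%Y}/{day:%m}/{day:%d}/"


def list_todays_files(host: str, user: str, password: str, day: date = None) -> list:
    """Log in, change into the dated folder, and return the file names in it."""
    day = day or date.today()
    with FTP(host, timeout=30) as ftp:  # the connection is closed on exit
        ftp.login(user, password)
        ftp.cwd(build_remote_path(day))
        return ftp.nlst()
```

Keeping the path construction in its own function makes it possible to check the folder layout without opening a connection.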

### File Filtering

- Skips files containing `rf` in the filename (Tideda format files)
- Processes only files whose filename contains today's date in `yymmdd` format
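
The two filter rules above can be expressed as a single predicate. The sketch below assumes a case-sensitive substring match for `rf`; the function name `should_process` and the file names in the usage lines are hypothetical.

```python
from datetime import date


def should_process(filename: str, today: date) -> bool:
    """Return True if the file passes both documented filter rules."""
    # Rule 1: skip Tideda format files, identified by "rf" in the name.
    # Assumption: the match is case-sensitive.
    if "rf" in filename:
        return False
    # Rule 2: the name must contain today's date in yymmdd format.
    return today.strftime("%y%m%d") in filename


today = date(2024, 3, 7)
print(should_process("KBLG0026_240307.csv", today))     # hypothetical file name
print(should_process("KBLG0026_rf_240307.csv", today))  # hypothetical Tideda file
```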

## CSV Format

Each line in the CSV file contains at least 37 comma-separated columns. Key columns extracted:

| Column Index | Field | Description |
|-------------|-------|-------------|
| 1 | `station_id` | Station identifier (e.g., KBLG0026) |
| 4 | `timestamp` | Timestamp in `yymmddHHMMSS` format |
| 6 | `battery` | Battery voltage |
| 15 | `wlalert` | Water level alert threshold |
| 16 | `wlwarn` | Water level warning threshold |
| 17 | `wldgr` | Water level danger threshold |
| 18 | `sirenid` | Siren identifier |
| 19 | `siren` | Siren status (`H`=Danger/High, `L`=Warning/Low, `N`=Normal) |
| 21 | `anncumm` | Annual cumulative rainfall |
| 22 | `dailycumm` | Daily cumulative rainfall |
| 23 | `hourlycumm` | Hourly rainfall |
| 24 | `currrf` | Current rainfall |
| 36 | `waterlevel` | Current water level reading |
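
As an illustration of how one line maps onto these fields, the sketch below splits a line and picks out the listed columns. It assumes the indices are 0-based (which would be consistent with index 36 being the 37th column); the name `parse_line` is hypothetical, and numeric conversion is left out because the source does not describe it.

```python
from datetime import datetime

# Column index -> field name, taken from the table above.
# Assumption: indices are 0-based positions in the split line.
COLUMNS = {
    1: "station_id", 4: "timestamp", 6: "battery",
    15: "wlalert", 16: "wlwarn", 17: "wldgr",
    18: "sirenid", 19: "siren",
    21: "anncumm", 22: "dailycumm", 23: "hourlycumm", 24: "currrf",
    36: "waterlevel",
}


def parse_line(line: str) -> dict:
    """Extract the key fields from one CSV line; empty cells become None."""
    cols = [cell.strip() for cell in line.split(",")]
    if len(cols) < 37:
        raise ValueError(f"expected at least 37 columns, got {len(cols)}")
    record = {name: (cols[index] or None) for index, name in COLUMNS.items()}
    # Timestamps use the yymmddHHMMSS format; strptime raises ValueError if malformed.
    record["timestamp"] = datetime.strptime(cols[4], "%y%m%d%H%M%S")
    return record
```

Values other than the timestamp are kept as strings here, so callers would still need to convert readings such as `hourlycumm` before comparing them with thresholds.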

## Data Processing Logic

### Rainfall Data

1. Check if `dailycumm` or `hourlycumm` is not null
2. Check if a record already exists for this station and timestamp
3. If new, INSERT into the `rainfall` table
4. **Threshold check**: If `hourlycumm >= 30`:
   - `30 <= hourlycumm < 60` → **Warning** level
   - `hourlycumm >= 60` → **Danger** level
   - INSERT into the `notification` table
   - Send a push notification via the Laravel API
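
The rainfall threshold check in step 4 reduces to a small classification function. The sketch below uses the documented cut-offs of 30 and 60; the function name `classify_rainfall` and the choice of returning `None` below the threshold are assumptions for illustration.

```python
from typing import Optional

WARNING_THRESHOLD = 30.0  # hourly rainfall at which a Warning is raised
DANGER_THRESHOLD = 60.0   # hourly rainfall at which a Danger is raised


def classify_rainfall(hourlycumm: float) -> Optional[str]:
    """Map an hourly rainfall value to a notification level, or None if below 30."""
    if hourlycumm >= DANGER_THRESHOLD:
        return "Danger"
    if hourlycumm >= WARNING_THRESHOLD:
        return "Warning"
    return None
```

A caller would insert into `notification` and send the push notification only when the result is not `None`.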

### Water Level Data

1. Check if `waterlevel` is not null
2. Check if a record already exists for this station and datetime
3. If new, INSERT into the `waterlevel` table (with alert/warning/danger thresholds)
4. **Threshold check**: If `waterlevel >= wlalert`:
   - `wlalert <= waterlevel < wlwarn` → **Alert** level
   - `wlwarn <= waterlevel < wldgr` → **Warning** level
   - `waterlevel >= wldgr` → **Danger** level
   - INSERT into the `notification` table
   - Send a push notification via the Laravel API
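
The water level check differs from the rainfall check in that the thresholds come from the CSV row itself. A minimal sketch follows, assuming the three thresholds are ordered `wlalert <= wlwarn <= wldgr`; the function name `classify_water_level` is hypothetical.

```python
from typing import Optional


def classify_water_level(waterlevel: float, wlalert: float,
                         wlwarn: float, wldgr: float) -> Optional[str]:
    """Map a water level reading to a level using the per-station thresholds.

    Assumption: wlalert <= wlwarn <= wldgr, so checking from the highest
    threshold down matches the documented ranges.
    """
    if waterlevel >= wldgr:
        return "Danger"
    if waterlevel >= wlwarn:
        return "Warning"
    if waterlevel >= wlalert:
        return "Alert"
    return None
```

Checking from the highest threshold downward avoids writing out both bounds of each range.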

### Siren Data

1. Check if `sirenid` is not null
2. Check if a record already exists for this station and active_time
3. Determine the level from the siren status:
   - `H` → **Danger**
   - `L` → **Warning**
   - `N` → **Normal**
4. INSERT into the `siren` table
5. If the level is not Normal, send a push notification via the Laravel API
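
The status-to-level mapping in step 3 and the notification rule in step 5 can be sketched as a lookup table plus a predicate. The names `siren_level` and `should_notify` are hypothetical, and raising `ValueError` on an unknown status code is an assumption, since the source does not say how such codes are handled.

```python
# Siren status code -> level, as listed in step 3.
SIREN_LEVELS = {"H": "Danger", "L": "Warning", "N": "Normal"}


def siren_level(status: str) -> str:
    """Translate a siren status code into its level."""
    try:
        return SIREN_LEVELS[status]
    except KeyError:
        # Assumption: unknown codes are treated as errors rather than ignored.
        raise ValueError(f"unknown siren status: {status!r}") from None


def should_notify(level: str) -> bool:
    """A push notification is sent for every level except Normal."""
    return level != "Normal"
```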

## Alert Notification Flow

When a threshold is triggered, the script calls `send_alert_to_laravel()`:

```python
import requests

def send_alert_to_laravel(stationid, level, stationtype):
    # Build the JSON payload for the Laravel alert endpoint.
    payload = {
        "stationid": stationid,
        "level": level,
        "stationtype": stationtype,  # 1=rainfall, 2=waterlevel, 3=siren
    }
    # POST the alert; the 5-second timeout keeps a slow API from stalling the run.
    response = requests.post("https://sides.tck.com.my/api/alert", json=payload, timeout=5)
```

This request reaches the Laravel `AlertController`, which:

1. Builds the notification title and body based on station type and level
2. Calls `FcmService::sendToTopic()`, which:
   - Reads the Firebase service account credentials
   - Gets an OAuth2 access token from Google
   - Sends an FCM message to a topic (e.g., `rainfall_warning`)
   - Delivers the push notification to subscribed mobile devices

## PostgreSQL Connection

The script connects directly to PostgreSQL:

```python
pg_host = "192.168.0.211"
pg_database = "sides_db"
pg_user = "tck"
pg_password = "projectdev##1"
```

**Note**: The host is a hardcoded IP address outside the Docker setup, so the script does not use the Docker database container. The database name is `sides_db`, which differs from the `tckdev` database configured in the Docker `.env`.

## File Management (Commented Out)

The script contains commented-out functions for:

- `move_to_error_folder()`: Move malformed files to an FTP error folder
- `move_to_success_folder()`: Move processed files to a success archive folder

These are currently disabled, so files remain in the source folder after processing.

## Log Files

- `autoscript/sidesdecode.log`: Processing output
- `autoscript/sidesdecode_error.log`: Error output

## Known Issues

1. **Hardcoded credentials**: FTP and PostgreSQL credentials are embedded in the script
2. **No deduplication beyond same-timestamp**: If the script runs twice, it skips exact duplicates (same station and timestamp) but performs no broader deduplication
3. **Commented-out file management**: Processed files are not moved or archived
4. **Water level alert sends `stationtype=1`** instead of `2` (likely a bug at line 378)
5. **No error recovery**: If the script crashes mid-processing, some data may be only partially inserted
6. **No connection pooling**: Each run opens new FTP and database connections
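
One possible way to address the first issue is to read connection settings from environment variables instead of embedding them in the script. The sketch below is only an illustration: the `PipelineConfig` class, the `load_config` function, and the `SIDES_*` variable names are all hypothetical and do not exist in the current code.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PipelineConfig:
    """Connection settings the script currently hardcodes."""
    ftp_host: str
    ftp_user: str
    ftp_password: str
    pg_host: str
    pg_database: str
    pg_user: str
    pg_password: str


def load_config(env=None) -> PipelineConfig:
    """Read each setting from SIDES_<FIELD> (hypothetical names); fail if any is unset."""
    env = os.environ if env is None else env
    values, missing = {}, []
    for field in fields(PipelineConfig):
        var = f"SIDES_{field.name.upper()}"
        if env.get(var):
            values[field.name] = env[var]
        else:
            missing.append(var)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return PipelineConfig(**values)
```

Failing fast on missing variables means a misconfigured scheduled run stops before it opens any connection.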