fix: seeder idempotent with firstOrCreate

Use firstOrCreate instead of create so that db:seed can run safely
on container restart without duplicate key violations.
root
2026-05-21 02:31:47 +08:00
parent bb8d951287
commit 9122deaacd
11 changed files with 1935 additions and 0 deletions

docs/04-DATA-PIPELINE.md Normal file

@@ -0,0 +1,143 @@
# Data Pipeline: Python Autoscript
## Overview
The file `autoscript/sidesdecode.py` is the data ingestion pipeline that:
1. Connects to an FTP server where telemetry stations upload CSV data
2. Downloads and parses CSV files for the current day
3. Inserts rainfall, water level, and siren data into PostgreSQL
4. Triggers push notifications when thresholds are exceeded
## How It Runs
The script is designed to run on a **schedule** (likely as a cron job), processing new data files that remote telemetry stations upload throughout the day.
## FTP Connection
```
Server: myvscada.com
Username: tck
Password: tck6789
Path: files/SIDES/SUCCESS/{year}/{month}/{day}/
```
The script navigates to today's date folder and lists all files.
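
The path construction can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `remote_dir_for` is hypothetical, and the zero-padding of `{month}` and `{day}` is an assumption the source does not confirm.

```python
from datetime import date


def remote_dir_for(day: date, base: str = "files/SIDES/SUCCESS") -> str:
    """Return the FTP directory for the given day.

    Assumption: month and day are zero-padded to two digits.
    """
    return f"{base}/{day.year}/{day.month:02d}/{day.day:02d}/"


print(remote_dir_for(date(2026, 5, 21)))  # files/SIDES/SUCCESS/2026/05/21/
```

If the script uses the standard-library `ftplib`, a path like this could be passed to `FTP.cwd()` and followed by `FTP.nlst()` to list the folder.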
### File Filtering
- Skips files containing "rf" in the filename (Tideda format files)
- Only processes files with today's date (`yymmdd` format) in the filename
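
The two filtering rules above could be expressed as a single predicate. The function name `should_process` and the sample filenames are hypothetical, and treating the "rf" check as case-insensitive is an assumption.

```python
from datetime import date


def should_process(filename: str, today: date) -> bool:
    """Apply the two filename rules: skip "rf" files, require today's date."""
    # Assumption: the "rf" check ignores case; the source does not say.
    if "rf" in filename.lower():
        return False
    # Keep only files whose name contains today's date as yymmdd.
    return today.strftime("%y%m%d") in filename


today = date(2026, 5, 21)
candidates = ["KBLG0026_260521023000.csv", "rf_260521.csv", "KBLG0026_260520.csv"]
selected = [name for name in candidates if should_process(name, today)]
```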
## CSV Format
Each line in the CSV file contains 37+ comma-separated columns. Key columns extracted:
| Column Index | Field | Description |
|-------------|-------|-------------|
| 1 | `station_id` | Station identifier (e.g., KBLG0026) |
| 4 | `timestamp` | Timestamp in `yymmddHHMMSS` format |
| 6 | `battery` | Battery voltage |
| 15 | `wlalert` | Water level alert threshold |
| 16 | `wlwarn` | Water level warning threshold |
| 17 | `wldgr` | Water level danger threshold |
| 18 | `sirenid` | Siren identifier |
| 19 | `siren` | Siren status (`H`=Danger/High, `L`=Warning/Low, `N`=Normal) |
| 21 | `anncumm` | Annual cumulative rainfall |
| 22 | `dailycumm` | Daily cumulative rainfall |
| 23 | `hourlycumm` | Hourly rainfall |
| 24 | `currrf` | Current rainfall |
| 36 | `waterlevel` | Current water level reading |
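
As a rough sketch of how the table above maps onto code, the following extracts the listed columns from one CSV line. It assumes the column indices are zero-based (the source does not state this), keeps values as strings, and uses the hypothetical names `COLUMNS` and `parse_line`.

```python
import csv
from datetime import datetime

# Column positions from the table above.
# Assumption: indices are zero-based, as in Python list indexing.
COLUMNS = {
    "station_id": 1, "timestamp": 4, "battery": 6,
    "wlalert": 15, "wlwarn": 16, "wldgr": 17,
    "sirenid": 18, "siren": 19,
    "anncumm": 21, "dailycumm": 22, "hourlycumm": 23, "currrf": 24,
    "waterlevel": 36,
}


def parse_line(line: str) -> dict:
    """Extract the key columns from one CSV line."""
    fields = next(csv.reader([line]))
    if len(fields) < 37:
        raise ValueError(f"expected at least 37 columns, got {len(fields)}")
    # Treat empty cells as missing values (None).
    record = {name: (fields[i].strip() or None) for name, i in COLUMNS.items()}
    if record["timestamp"] is None:
        raise ValueError("missing timestamp")
    # The timestamp column uses the yymmddHHMMSS format.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%y%m%d%H%M%S")
    return record
```

Numeric conversion is left out on purpose, since the source does not describe how the script handles non-numeric cells.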
## Data Processing Logic
### Rainfall Data
1. Check if `dailycumm` or `hourlycumm` is not null
2. Check if record already exists for this station+timestamp
3. If new, INSERT into `rainfall` table
4. **Threshold check**: If `hourlycumm >= 30`:
   - `30 <= hourly < 60` → **Warning** level
   - `hourly >= 60` → **Danger** level
   - INSERT into `notification` table
   - Send push notification via Laravel API
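
The rainfall threshold rule can be written as a small function. This is an illustrative sketch: the name `rainfall_level` and the exact level strings are assumptions, and only the 30 and 60 cut-offs come from the steps above.

```python
from typing import Optional


def rainfall_level(hourlycumm: float) -> Optional[str]:
    """Classify hourly rainfall; None means no notification is needed."""
    if hourlycumm >= 60:
        return "Danger"
    if hourlycumm >= 30:
        return "Warning"
    return None
```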
### Water Level Data
1. Check if `waterlevel` is not null
2. Check if record already exists for this station+datetime
3. If new, INSERT into `waterlevel` table (with alert/warning/danger thresholds)
4. **Threshold check**: If `waterlevel >= alert`:
   - `alert <= wl < warning` → **Alert** level
   - `warning <= wl < danger` → **Warning** level
   - `wl >= danger` → **Danger** level
   - INSERT into `notification` table
   - Send push notification via Laravel API
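
The three water level bands can be sketched the same way. The function name `water_level_status` and the level strings are hypothetical, and the sketch assumes each station's thresholds satisfy `alert <= warning <= danger`.

```python
from typing import Optional


def water_level_status(wl: float, alert: float, warning: float,
                       danger: float) -> Optional[str]:
    """Classify a reading against the station's own thresholds.

    Assumption: alert <= warning <= danger. Returns None below alert.
    """
    if wl >= danger:
        return "Danger"
    if wl >= warning:
        return "Warning"
    if wl >= alert:
        return "Alert"
    return None
```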
### Siren Data
1. Check if `sirenid` is not null
2. Check if record already exists for this station+active_time
3. Determine level from siren status:
   - `H` → **Danger**
   - `L` → **Warning**
   - `N` → **Normal**
4. INSERT into `siren` table
5. If level is not Normal, send push notification via Laravel API
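
The siren status mapping lends itself to a lookup table. In this sketch the names `SIREN_LEVELS`, `siren_level`, and `should_notify` are hypothetical, and raising an error on an unknown status code is an assumption, since the source does not say how the script handles one.

```python
SIREN_LEVELS = {"H": "Danger", "L": "Warning", "N": "Normal"}


def siren_level(status: str) -> str:
    """Map a siren status code to its level."""
    try:
        return SIREN_LEVELS[status]
    except KeyError:
        # Assumption: unknown codes are rejected rather than ignored.
        raise ValueError(f"unknown siren status: {status!r}") from None


def should_notify(level: str) -> bool:
    """Only non-Normal levels trigger a push notification."""
    return level != "Normal"
```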
## Alert Notification Flow
When a threshold is triggered, the script calls `send_alert_to_laravel()`:
```python
import requests


def send_alert_to_laravel(stationid, level, stationtype):
    # Build the JSON body for the Laravel alert endpoint.
    payload = {
        "stationid": stationid,
        "level": level,
        "stationtype": stationtype,  # 1=rainfall, 2=waterlevel, 3=siren
    }
    # The 5-second timeout keeps a slow API from stalling ingestion.
    response = requests.post("https://sides.tck.com.my/api/alert", json=payload, timeout=5)
```
This hits the Laravel `AlertController` which:
1. Builds notification title/body based on station type and level
2. Calls `FcmService::sendToTopic()` which:
   - Reads Firebase service account credentials
   - Gets an OAuth2 access token from Google
   - Sends FCM message to topic (e.g., `rainfall_warning`)
   - Push notification arrives on subscribed mobile devices
## PostgreSQL Connection
The script connects directly to PostgreSQL:
```python
pg_host = "192.168.0.211"
pg_database = "sides_db"
pg_user = "tck"
pg_password = "projectdev##1"
```
**Note**: The host is a hardcoded address outside the Docker setup (`192.168.0.211` is a private LAN address), not the Docker database container. The database name `sides_db` also differs from the one in the Docker `.env`, which uses `tckdev`.
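
One possible way to avoid hardcoding these values is to read them from the environment. The sketch below is not what the script currently does: the function name `pg_settings` and the variable names `PG_HOST`, `PG_DATABASE`, `PG_USER`, and `PG_PASSWORD` are hypothetical, as are the defaults.

```python
import os
from typing import Mapping


def pg_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build connection settings from environment variables.

    Assumption: the variable names are illustrative, not taken from the script.
    """
    return {
        "host": env.get("PG_HOST", "localhost"),
        "dbname": env.get("PG_DATABASE", "sides_db"),
        # No defaults for credentials: a missing value raises KeyError.
        "user": env["PG_USER"],
        "password": env["PG_PASSWORD"],
    }
```

The dictionary keys follow common PostgreSQL connection parameter names, so a driver that accepts keyword arguments could take them directly. Which driver the script uses is not stated in the source.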
## File Management (Commented Out)
The script contains (commented out) functions for:
- `move_to_error_folder()` — Move malformed files to an FTP error folder
- `move_to_success_folder()` — Move processed files to a success archive folder
These are currently disabled — files remain in the source folder after processing.
## Log Files
- `autoscript/sidesdecode.log` — Processing output
- `autoscript/sidesdecode_error.log` — Error output
## Known Issues
1. **Hardcoded credentials** — FTP and PostgreSQL credentials are embedded in the script
2. **No deduplication beyond same-timestamp** — If the script runs twice, it skips exact duplicates but has no broader deduplication
3. **Commented out file management** — Processed files are not moved/archived
4. **Water level alert sends `stationtype=1`** instead of `2` (likely a bug at line 378)
5. **No error recovery** — If the script crashes mid-processing, some data may be partially inserted
6. **No connection pooling** — The script opens new FTP and database connections on every run
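
Issue 5 could be mitigated by wrapping each file's inserts in a single transaction, so a failure leaves no partial data. The sketch below uses an in-memory SQLite database purely as a runnable stand-in for PostgreSQL. The table columns and the helper name `insert_batch` are hypothetical, and the unique constraint on station and timestamp is an assumption rather than a description of the real schema.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows: list) -> None:
    """Insert all rows or none of them."""
    # Used as a context manager, a sqlite3 connection commits on success
    # and rolls back if the block raises.
    with conn:
        conn.executemany(
            "INSERT INTO rainfall (stationid, ts, hourlycumm) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rainfall (stationid TEXT, ts TEXT, hourlycumm REAL, "
    "UNIQUE (stationid, ts))"
)
conn.commit()

good = [("KBLG0026", "2026-05-21 02:00:00", 12.5)]
bad = [
    ("KBLG0026", "2026-05-21 03:00:00", 1.0),
    ("KBLG0026", "2026-05-21 02:00:00", 9.9),  # violates the unique constraint
]

insert_batch(conn, good)
try:
    insert_batch(conn, bad)
except sqlite3.IntegrityError:
    pass  # the whole batch is rolled back, including its first row

count = conn.execute("SELECT COUNT(*) FROM rainfall").fetchone()[0]
```

After the failed batch, only the row from the first batch remains in the table.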