Hello GMTers,
I’ve encountered a significant issue while using the gmt sample1d
command to resample a tidal gauge dataset. Despite defining a specific time range and interval, the output file is excessively large (13 GB) compared to the input file (75 MB). Additionally, an analysis of the output reveals an unexpectedly high number of duplicate timestamps.
Here is the command I executed:
gmt sample1d dummy2 -gx300s -T2018-09-14T07:25:00/2024-02-05T09:00:00/300s -Fa -V > dummy3
Problem Details:
- Input File (`dummy2`):
  - Contains a tidal gauge time series (1-minute sample rate, spanning 2018-09-14T07:25:00 to 2024-02-05T09:00:00).
  - The record is mostly continuous within the defined time range, with some manageable gaps and out-of-order periods.
- Output File (`dummy3`):
  - File size: 13 GB.
  - Total lines: 569,083,143.
  - Unique timestamps: 990,725 (determined via `cut -d' ' -f1 dummy3 | sort | uniq | wc -l`).
- Expected Behavior:
  - The defined time range (2018-09-14T07:25:00 to 2024-02-05T09:00:00) at a 5-minute interval corresponds to 567,379 unique timestamps (see the arithmetic sketch below).
  - The output should contain one record per 5-minute step, with no duplicate or extra lines.
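For reference, a minimal sketch of the arithmetic behind that count (assuming GNU date; add 1 if both endpoints are included):

```
# Number of 300 s steps between the two endpoints
t0=$(date -u -d '2018-09-14T07:25:00' +%s)
t1=$(date -u -d '2024-02-05T09:00:00' +%s)
echo $(( (t1 - t0) / 300 ))   # prints 567379
```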
Observations:
- The verbose output during execution reports many segment headers inserted at data gaps detected via -g:
  sample1d [INFORMATION]: Data gap detected via -g; Segment header inserted near/at line # XXXXXXX
- This suggests that the gap-induced segments are being handled improperly, leading to the duplicated lines.
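The segment count can be checked directly, since GMT ASCII segment headers start with ">" (a sketch; the interpretation in the comment is my guess, not a confirmed diagnosis):

```
# Count segment headers in the output table
grep -c '^>' dummy3
# Note: 569,083,143 / 567,379 ≈ 1003, so if each segment were resampled
# over the full -T range, ~1003 segments would explain the file size.
```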
Actions Taken:
1. Verified the number of unique timestamps:
   cut -d' ' -f1 dummy3 | sort | uniq | wc -l
   Result: 990,725 unique timestamps.
2. Inspected the input file for discontinuities or anomalies (e.g., unexpected gaps or NaN values).
3. Attempted to identify the lines with duplicate timestamps, but found the 13 GB file challenging to process efficiently (see the sketch after this list).
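A streaming approach avoids re-sorting the full file; here is a minimal sketch (assuming GNU coreutils and awk; the ~1M distinct timestamps fit comfortably in an awk hash):

```
# Tally repeated timestamps and report the worst offender
cut -d' ' -f1 dummy3 | awk '
  { c[$1]++ }
  END {
    for (t in c) if (c[t] > 1) { dups++; if (c[t] > max) { max = c[t]; worst = t } }
    printf "timestamps repeated: %d (max %d at %s)\n", dups, max, worst
  }'

# ISO 8601 timestamps sort chronologically as plain strings,
# so sort -C checks whether the input records are in order
cut -d' ' -f1 dummy2 | sort -C && echo "in order" || echo "out-of-order records present"
```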
Questions:
Any ideas why the output contains such a large number of duplicate lines, far exceeding the expected number of unique timestamps?
Any insights or recommendations for addressing this issue would be greatly appreciated.
Thank you in advance for your help!
Best regards,