Want delimiter to be shown on exception
# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz \
# ...                          } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz /
#
# \___________/ \___________/
#  1000 chars    1000 chars
# 10 rows
# "," is used as the delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok
# 100 rows
# " " is used as the delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.
In the latter case, the sample is truncated and loses the header colA,colB, so a space " " is used as the delimiter.
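To look at the sniffing step without csvstat, here is a minimal sketch. It assumes csvkit ultimately relies on Python's standard csv.Sniffer. The sketch rebuilds the same input and sniffs it both in full and cut to a fixed-size prefix. The limit of 1024 below is only an illustrative value and may differ from what your csvstat version uses. make_input and sniff_delimiter are hypothetical helpers, not csvkit functions.

```python
import csv
from typing import Optional


def make_input(rows: int) -> str:
    """Hypothetical helper: rebuild the input from the commands above,
    i.e. a comma-separated header followed by `rows` lines of
    1000 'a's, one space, and 1000 'z's."""
    lines = ["colA,colB"] + ["a" * 1000 + " " + "z" * 1000 for _ in range(rows)]
    return "\n".join(lines) + "\n"


def sniff_delimiter(text: str, limit: Optional[int] = None) -> Optional[str]:
    """Hypothetical helper: return the delimiter csv.Sniffer infers from the
    first `limit` characters of `text` (all of it if limit is None), or None
    if the sniffer gives up. Characters equal bytes here because the input
    is pure ASCII."""
    sample = text if limit is None else text[:limit]
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        # csv.Sniffer raises "Could not determine delimiter" in this case
        return None


for rows in (10, 100):
    text = make_input(rows)
    print(f"{rows} rows: whole input -> {sniff_delimiter(text)!r}, "
          f"first 1024 chars -> {sniff_delimiter(text, 1024)!r}")
```

Comparing the two columns shows how strongly the guess depends on how much of the input the sniffer gets to see.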
This behavior was hard to figure out. How about showing which delimiter was used in:
- Debug output
$ csvstat -v ...
inferred delimiter: ' '
- Error message
$ csvstat -v ...
Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').
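Both proposals could be sketched roughly as follows. This uses Python's logging module for the debug line and appends the delimiter to the row-length error. load_table and the logger name are hypothetical illustrations, not csvkit's or agate's actual code. The error text simply mirrors the message format proposed above.

```python
import csv
import io
import logging
from typing import List

logger = logging.getLogger("csv_sketch")  # hypothetical logger name


def load_table(text: str, delimiter: str) -> List[List[str]]:
    """Hypothetical reader: parse `text` with `delimiter`, log the delimiter
    at debug level, and name it in the error when a row is too long."""
    logger.debug("inferred delimiter: %r", delimiter)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for i, row in enumerate(reader):  # data rows are counted from 0
        if len(row) > len(header):
            raise ValueError(
                f"Row {i} has {len(row)} values, but Table only has "
                f"{len(header)} columns (delimiter: {delimiter!r})."
            )
        rows.append(row)
    return rows


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)  # stands in for -v
    try:
        load_table("colA,colB\n1,2,3\n", ",")
    except ValueError as err:
        print(err)
```

Putting the delimiter into the exception message means it appears even without -v, which is the case where the confusion above actually arose.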
And how about showing a warning when the input exceeds SNIFF_LIMIT? For example:
$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
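A minimal sketch of that warning is shown below. It assumes the whole input is already in memory as bytes so its size is known, which may not hold for a streamed stdin. It also assumes csv.Sniffer is the guessing mechanism. sniff_with_warnings and the fallback message for a failed guess are hypothetical. The first warning's wording is copied from the proposal above.

```python
import csv
from typing import List, Optional, Tuple


def sniff_with_warnings(data: bytes, sniff_limit: int) -> Tuple[Optional[str], List[str]]:
    """Hypothetical helper: guess the delimiter from the first `sniff_limit`
    bytes of `data` and collect the proposed warning lines."""
    messages = []
    if len(data) > sniff_limit:
        messages.append(
            f"warning: input ({len(data)} bytes) exceeds SNIFF_LIMIT "
            f"({sniff_limit} bytes), delimiter guessing may be incorrect "
            "(NOTE: SNIFF_LIMIT can be changed by -y flag)"
        )
    # errors="ignore" drops a multibyte character cut in half at the limit
    sample = data[:sniff_limit].decode("utf-8", errors="ignore")
    try:
        delimiter: Optional[str] = csv.Sniffer().sniff(sample).delimiter
        messages.append(f"warning: guessed delimiter: {delimiter!r}")
    except csv.Error:
        delimiter = None
        messages.append("warning: could not guess delimiter")  # hypothetical wording
    return delimiter, messages


if __name__ == "__main__":
    data = b"colA,colB\n" + b"a" * 2000
    _, lines = sniff_with_warnings(data, 1024)
    print("\n".join(lines))
```

Returning the messages instead of printing them directly keeps the check easy to test; a CLI would write them to stderr.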