wireservice / csvkit / Issues / #1011
Issue created Jan 09, 2019 by Wataru Ashihara (@wataash)

Want delimiter to be shown on exception

# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz  \
# ...                           } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz  /
#
# \___________/ \___________/
#  1000chars     1000chars

# 10 rows
# "," is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok

# 100 rows
# " " is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.

In the latter case, the sample is trimmed and loses the header colA,colB, so a space " " is used as the delimiter.

This behavior was hard for me to track down. So how about showing which delimiter was used in:

  1. Debug output
$ csvstat -v ...
inferred delimiter: ' '
  2. Error message
$ csvstat -v ...
Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').

Also, how about showing a warning when the input exceeds SNIFF_LIMIT?

$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
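
To make the proposal concrete, here is a minimal sketch of the kind of warning I have in mind. It is not csvkit's actual code: it uses the standard library's csv.Sniffer, the function name sniff_delimiter and the default limit of 1024 are hypothetical, and it counts characters rather than bytes for simplicity.

```python
import csv
import warnings

# Hypothetical default for this sketch; not taken from csvkit's source.
SNIFF_LIMIT = 1024


def sniff_delimiter(text, sniff_limit=SNIFF_LIMIT):
    """Guess the delimiter from at most `sniff_limit` characters of `text`.

    Emits a warning when the input is longer than the sample, because the
    guess is then based on only part of the data.
    """
    if len(text) > sniff_limit:
        warnings.warn(
            "input (%d chars) exceeds sniff limit (%d chars), "
            "delimiter guessing may be incorrect" % (len(text), sniff_limit)
        )
    # csv.Sniffer.sniff raises csv.Error if it cannot determine a delimiter.
    dialect = csv.Sniffer().sniff(text[:sniff_limit])
    return dialect.delimiter


if __name__ == "__main__":
    delimiter = sniff_delimiter("a,b,c\n1,2,3\n4,5,6\n")
    # Report the guess the same way the proposed debug output would.
    print("inferred delimiter: %r" % delimiter)
```

The caller can then include the returned delimiter in any later error message, which is the second half of the proposal.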