wireservice / csvkit / Issues / #1158
Closed
Issue created Dec 23, 2021 by techhobbit@techhobbit

Encoding error on Win10 UTF8 file

There is one other report about encoding errors on Win10, but this may be different. Including more info:

Using PowerShell, I'm retrieving data from a site with curl.exe and saving it as a UTF-16 file, which fails csvkit conversion unless I convert it to UTF-8. After re-encoding with PowerShell 'Set-Content', all files succeed in JSON-to-CSV conversion except this one file, which complains about a newline character. I've tried converting it multiple ways using multiple apps (including Notepad, Notepad++, etc.) and multiple encodings (UTF-8/ANSI and back), but no joy on this one text file.

Calling from PowerShell: in2csv.py 1.0.6, Python 3.10.1, Windows 10 Version 10.0.19043.1415

python.exe : Traceback (most recent call last):
At C:\...\Curl.ps1:8 char:3
+   python.exe $script -v -k items $folder$i.json > $folder$i.csv
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
  File "C:\...\AppData\Local\Programs\Python\Python310\Scripts\csvkit-master\csvkit\utilities\in2csv.py", line 207, in <module>
    launch_new_instance()
  File "C:\...\AppData\Local\Programs\Python\Python310\Scripts\csvkit-master\csvkit\utilities\in2csv.py", line 203, in launch_new_instance
    utility.run()
  File "C:\...\AppData\Local\Programs\Python\Python310\lib\site-packages\csvkit\cli.py", line 118, in run
    self.main()
  File "C:\...\AppData\Local\Programs\Python\Python310\Scripts\csvkit-master\csvkit\utilities\in2csv.py", line 171, in main
    table.to_csv(self.output_file, **self.writer_kwargs)
  File "C:\...\AppData\Local\Programs\Python\Python310\lib\site-packages\agate\table\to_csv.py", line 43, in to_csv
    writer.writerow(tuple(csv_funcs[i](d) for i, d in enumerate(row)))
  File "C:\...\AppData\Local\Programs\Python\Python310\lib\site-packages\agate\csv_py3.py", line 92, in writerow
    self.writer.writerow(row)
  File "C:\...\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2028' in position 11113: character maps to <undefined>
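
For reference, the last frame of the traceback can be reproduced in isolation. This is only a minimal sketch, assuming the redirected output stream is encoded as cp1252 (as the cp1252.py frame suggests) and that the file contains U+2028 (LINE SEPARATOR), which is the "newline character" in the message. The helper names `can_encode` and `write_row` and the sample text are hypothetical and not part of csvkit or agate.

```python
import io

LINE_SEPARATOR = "\u2028"  # the character named in the UnicodeEncodeError


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_row(text: str, encoding: str) -> bytes:
    """Write text through a text stream with the given encoding,
    roughly what happens when stdout is redirected to a file."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Hypothetical sample value containing the problem character.
sample = "first part" + LINE_SEPARATOR + "second part"

print(can_encode(sample, "cp1252"))  # cp1252 has no mapping for U+2028
print(can_encode(sample, "utf-8"))   # UTF-8 can represent any code point
```

If this is indeed the cause, then the input file's encoding is not the problem; the output side is. Setting the environment variable `PYTHONIOENCODING=utf-8` (or `PYTHONUTF8=1`) before calling python.exe may avoid the error, though that has not been tested against this file.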