wireservice / csvkit · Issues · #1114
Closed
Issue created Mar 30, 2021 by Csaba Szilveszter (@csabaszilveszter)

csvsql: ability to pass arguments to create_engine

On macOS and using:

pyodbc==4.0.30
SQLAlchemy==1.4.3
csvkit==1.0.5

...and the Microsoft ODBC Driver 17 for SQL Server (x64).

Inserts are really, really slow.

For example:

$ time csvsql --db 'mssql+pyodbc:///?odbc_connect=...' --db-schema foo --table bar --insert 1000_rows.csv

real	0m56.999s
user	0m0.662s
sys	0m0.177s

Or:

$ time csvsql --db 'mssql+pyodbc:///?odbc_connect=...' --db-schema foo --table bar --insert 10000_rows.csv

real	9m17.993s
user	0m2.520s
sys	0m0.771s

Whereas with mysql-connector-python==8.0.23, inserts run as expected:

$ time csvsql --db 'mysql+mysqlconnector://...' --db-schema foo --table bar --insert 284224_rows.csv

real	0m9.189s
user	0m7.366s
sys	0m0.301s

Obviously I haven't tried to time an insert of 284224 rows into SQL Server; I don't have that much time 😜. But extrapolating from the data above, it would take roughly 280 minutes. 😢
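For reference, scaling the 10000-row timing linearly gives a figure in the same ballpark (a quick sanity check, added for illustration):

```python
# Linear extrapolation of insert time from the measured 10000-row pyodbc run.
seconds_per_row = (9 * 60 + 17.993) / 10000   # 9m17.993s for 10000 rows
estimated_minutes = seconds_per_row * 284224 / 60
print(round(estimated_minutes))  # 264
```

Scaling from the 1000-row run instead gives about 270 minutes, so the 280-minute estimate is in the right range either way.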

Fortunately, all of this was quite easy to fix by adding fast_executemany=True to the create_engine call here.

However, arguments to the create_engine call can't currently be passed to csvsql from the command line.

So I'd suggest adding another argument to csvsql, perhaps called --engine-kwargs, that takes a delimiter-separated list of key=value pairs to be passed along as create_engine(self.args.connection_string, **self.args.engine_kwargs). I'm sure others would also appreciate the power this would give the user, though the documentation would have to note that each engine only accepts certain arguments.
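A minimal sketch of how such parsing might work (the helper name parse_engine_kwargs and the --engine-kwargs flag are hypothetical, not part of csvkit):

```python
def parse_engine_kwargs(spec):
    """Parse a comma-separated list of key=value pairs into a kwargs dict.

    Values are coerced to bool/int where they look like one, since
    create_engine options such as fast_executemany expect real booleans,
    not the string "True".
    """
    kwargs = {}
    if not spec:
        return kwargs
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        value = value.strip()
        if value.lower() in ("true", "false"):
            coerced = value.lower() == "true"
        elif value.isdigit():
            coerced = int(value)
        else:
            coerced = value
        kwargs[key.strip()] = coerced
    return kwargs

# e.g. csvsql --engine-kwargs 'fast_executemany=True,pool_size=5' would yield:
print(parse_engine_kwargs("fast_executemany=True,pool_size=5"))
# {'fast_executemany': True, 'pool_size': 5}
# ...which csvsql could then splat into:
#   engine = create_engine(connection_string, **kwargs)
```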

Or am I missing something and there's actually a way to pass arguments to the create_engine call? Or is there another way to speed things up?
