csvsql: ability to pass arguments to create_engine
On macOS and using:
pyodbc==4.0.30
SQLAlchemy==1.4.3
csvkit==1.0.5
.. and the Microsoft ODBC Driver 17 for SQL Server (x64)
Inserts are really-really slow.
For example:
$ time csvsql --db 'mssql+pyodbc:///?odbc_connect=...' --db-schema foo --table bar --insert 1000_rows.csv
real 0m56.999s
user 0m0.662s
sys 0m0.177s
Or:
$ time csvsql --db 'mssql+pyodbc:///?odbc_connect=...' --db-schema foo --table bar --insert 10000_rows.csv
real 9m17.993s
user 0m2.520s
sys 0m0.771s
Whereas with the mysql-connector-python==8.0.23
they work as expected:
$ time csvsql --db 'mysql+mysqlconnector://...' --db-schema foo --table bar --insert 284224_rows.csv
real 0m9.189s
user 0m7.366s
sys 0m0.301s
Obviously I haven't tried to time an insert of 284224 rows into the SQL Server, I don't have that much time
But all of this was quite easy to fix by adding fast_executemany=True
to the create_engine
call here.
But arguments to the create_engine
call can't be passed to csvsql
from the command line.
So I'd suggest another argument to csvsql
, maybe something called --engine-kwargs
or something like that, that should take a something-separated list of key=value
pairs that could be passed as create_engine(self.args.connection_string, **self.args.engine_kwargs)
.
I'm sure others would also appreciate the power this would give to the user but then it must be specified in the documentation that some engines only accept some arguments.
Or am I missing something and there's actually a way to pass arguments to the create_engine
call? Or is there another way to speed things up?