Closed
Issue created Jul 24, 2017 by Administrator (@root)

Round-trip GeoJSON

Created by: jayvdb

I would like to be able to round-trip GeoJSON through csvkit, so that small operations can be done using csvkit in the middle, without resulting in strange diffs. https://github.com/wireservice/csvkit/pull/867 is part of this effort.

That is, the final diff in the following sequence should produce very little output.

$ wget https://raw.githubusercontent.com/lyzidiamond/learn-geojson/master/geojson/pdxplaces.geojson
$ in2csv -f geojson pdxplaces.geojson > pdxplaces.csv
$ csvjson --lon longitude --lat latitude --indent=2 pdxplaces.csv > pdxplaces.csv.geojson
$ diff -u pdxplaces.geojson pdxplaces.csv.geojson
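A textual diff also reports differences that are only formatting, so a structural comparison can help separate real data changes from layout noise. The following is a minimal sketch using only the Python standard library, not part of csvkit. The function name `property_differences` and the `MISSING` sentinel are hypothetical, and it assumes both files are FeatureCollections whose features appear in the same order.

```python
import json

# Hypothetical sentinel marking a property that is absent on one side.
MISSING = object()


def property_differences(original, roundtripped):
    """Return (feature_index, key, before, after) for each property that differs.

    Assumes both arguments are parsed GeoJSON FeatureCollections with
    features in the same order. Values are compared by type as well as by
    equality, so 2 vs "2" and "Yes" vs True are both reported.
    """
    diffs = []
    pairs = zip(original["features"], roundtripped["features"])
    for index, (before_feature, after_feature) in enumerate(pairs):
        before_props = before_feature.get("properties") or {}
        after_props = after_feature.get("properties") or {}
        for key in sorted(set(before_props) | set(after_props)):
            before = before_props.get(key, MISSING)
            after = after_props.get(key, MISSING)
            if type(before) is not type(after) or before != after:
                diffs.append((index, key, before, after))
    return diffs


def load_geojson(path):
    """Read a GeoJSON file into a Python dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `property_differences(load_geojson("pdxplaces.geojson"), load_geojson("pdxplaces.csv.geojson"))` would list only value-level changes, ignoring key order and indentation.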

Some of the diff results which need to be controllable using command line args:

  1. sequence of keys in a feature; i.e. should geometry appear before properties or the opposite. Different tools seem to make different choices for this ordering, and ideally in2csv can annotate its output with this ordering (assuming it is consistent throughout the input) so that csvjson can re-use the same ordering.
  2. do not emit empty properties. Currently they are being emitted, and mostly they shouldn't be. (https://github.com/wireservice/csvkit/pull/869)
  3. order the properties by key. in2csv adds a column when it sees a new property, which doesn't work well if many properties do not exist on the first Feature but appear in later Features. Sorting could be a feature of in2csv or csvjson.
  4. generation of the bbox. It should be possible to disable this being computed and added, and it is non-trivial to compute it for complex GeoJSON (cf. https://github.com/wireservice/csvkit/pull/867).
  5. other metadata (https://github.com/wireservice/csvkit/issues/870)
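To make the first four items concrete, here is a rough sketch of the per-feature normalization they describe, written as a standalone Python function. It is not csvkit code and does not reflect how the linked pull requests implement anything; the function name `normalize_feature` and all of its parameters are hypothetical stand-ins for the command line args requested above.

```python
def normalize_feature(feature, key_order=("type", "properties", "geometry"),
                      sort_properties=True, drop_empty=True, keep_bbox=False):
    """Return a copy of a GeoJSON Feature with controllable layout.

    key_order       -- order of the top-level members (item 1)
    drop_empty      -- omit properties whose value is "" or None (item 2)
    sort_properties -- order the properties by key (item 3)
    keep_bbox       -- keep an existing bbox member, never compute one (item 4)
    """
    props = feature.get("properties") or {}
    if drop_empty:
        props = {k: v for k, v in props.items() if v is not None and v != ""}
    if sort_properties:
        props = dict(sorted(props.items()))

    normalized = {}
    # Emit the requested members first, in the requested order.
    for key in key_order:
        if key == "properties":
            normalized[key] = props
        elif key in feature and (key != "bbox" or keep_bbox):
            normalized[key] = feature[key]
    # Then keep any remaining members (e.g. "id") in their original order.
    for key, value in feature.items():
        if key in normalized:
            continue
        if key == "bbox" and not keep_bbox:
            continue
        normalized[key] = value
    return normalized
```

Since Python dicts preserve insertion order, passing the result to `json.dumps(..., indent=2)` writes the members in the chosen sequence. Passing `key_order=("type", "geometry", "properties")` would reproduce the other common layout.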

After simplistic solutions for those three, the diff looks like:

$ diff -u pdxplaces.geojson pdxplaces.csv.geojson
--- pdxplaces.geojson	2017-07-24 10:22:24.234446535 +0700
+++ pdxplaces.csv.geojson	2017-07-24 12:58:46.408943761 +0700
@@ -106,7 +106,7 @@
       "properties": {
         "Name": "place 2",
         "Contributor": "geografa",
-        "Reason": 2
+        "Reason": "2"
       },
       "geometry": {
         "type": "Point",
@@ -164,8 +164,7 @@
     {
       "type": "Feature",
       "properties": {
-        "my polygon": "it's here",
-        "Name": ""
+        "my polygon": "it's here"
       },
       "geometry": {
         "type": "Polygon",
@@ -251,8 +250,8 @@
       "type": "Feature",
       "properties": {
         "Name": "The Commons Brewery",
-        "Reason": "Farmhouse ales, duh.",
-        "Contributor": "Dillon Mahmoudi"
+        "Contributor": "Dillon Mahmoudi",
+        "Reason": "Farmhouse ales, duh."
       },
       "geometry": {
         "type": "Point",
@@ -267,7 +266,7 @@
       "properties": {
         "Name": "Kenilworth Coffeehouse",
         "Utility": "Excellent biscuits",
-        "Coffee": "Yes"
+        "Coffee": true
       },
       "geometry": {
         "type": "Point",
@@ -327,8 +326,7 @@
       "properties": {
         "Name": "Square 54",
         "Contributor": "Henrik",
-        "Reason": "Mystic zombie area",
-        "my polygon": ""
+        "Reason": "Mystic zombie area"
       },
       "geometry": {
         "type": "Polygon",

The change to "Coffee" is concerning (and maybe there is a command line arg which would prevent that), but the rest of those changes are IMO a good "linted" output.
