`LookupError: unknown encoding: ascii` when parsing Buck files in virtualenv

Created by: jiangty-addepar

Problem

Reproduction steps on current Buck master

Example commit: https://github.com/facebook/buck/commit/a20f04a8c9202abd1207abd425d89a10522590b8

In a Buck project that uses the Python DSL, find a BUCK file that uses glob, for example foo/BUCK.
Enter a virtual environment that uses Python 2.7. If your default Python is already 2.7, you can just run:

virtualenv -q venv
source venv/bin/activate

Parse that file with Buck, for example by running:

buck targets //foo:

These steps reproduced the error on both Ubuntu and Mac OS X.

Result:

The following error is thrown:

Buck wasn't able to parse /repo/foo/BUCK:
LookupError: unknown encoding: ascii
Call stack:
  File "/repo/foo/BUCK", line 4
    ['src/main/java/**/*.java'],
  File "/repo/.buckd/resources/fff631c0eeb99191988f1cf0304b26113626ffbe/pathlib.py", line 984, in __new__
    self = cls._from_parts(args, init=False)
  File "/repo/.buckd/resources/fff631c0eeb99191988f1cf0304b26113626ffbe/pathlib.py", line 627, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/repo/.buckd/resources/fff631c0eeb99191988f1cf0304b26113626ffbe/pathlib.py", line 620, in _parse_args
    return cls._flavour.parse_parts(parts)
  File "/repo/.buckd/resources/fff631c0eeb99191988f1cf0304b26113626ffbe/pathlib.py", line 75, in parse_parts
    parts = _py2_fsencode(parts)
  File "/repo/.buckd/resources/fff631c0eeb99191988f1cf0304b26113626ffbe/pathlib.py", line 58, in _py2_fsencode
    else part for part in parts]

Expected: buck targets succeeds.

Investigation

After a git bisect, we identified https://github.com/facebook/buck/commit/f947921a1afb7f766021bb9ad977de4b70e5d87d as the offending commit. The cause turned out to be that commit's removal of all 3 of the .encode("ascii") calls.

In fact, if we insert the line "123".encode("ascii") at (almost) any point in buck.py (for example, at the top of the file, or at a random line in the process_with_diagnostics method), the error goes away.
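
For illustration, the workaround described above could look roughly like the sketch below. The helper name warm_up_ascii_codec is hypothetical and does not exist in buck.py; the only part taken from our experiments is the "123".encode("ascii") call itself, and we have not verified why it helps.

```python
def warm_up_ascii_codec():
    """Hypothetical helper: perform one ASCII encode early in startup."""
    # This is the same call that made the LookupError disappear in our tests.
    # The return value is not needed; only the side effect of running it matters.
    return "123".encode("ascii")


# Assumption: this runs near the top of buck.py, before any BUCK file is parsed.
encoded = warm_up_ascii_codec()
```

Wrapping the call in a named function is only meant to make its purpose obvious to whoever reads the file next; a bare statement at module level behaved the same way for us.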

It's not clear what the root cause is, but apparently, unless we call .encode("ascii") sometime early in the program's execution, we get the above error.

Calling a different codec instead, for example .encode("utf-8"), does not fix it.
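
If someone wants to dig further, a small probe like the one below might help narrow down which codec lookups fail at a given point in execution. This is only a sketch: probe_codecs is a hypothetical helper, and where to call it from (buck.py, or a BUCK file inside the virtualenv) is an assumption. We have not confirmed that it reproduces the failure.

```python
import codecs


def probe_codecs(names):
    """Hypothetical helper: map each codec name to None, or to the lookup error text."""
    results = {}
    for name in names:
        try:
            codecs.lookup(name)
            results[name] = None
        except LookupError as exc:
            # Record the message, e.g. "unknown encoding: ascii".
            results[name] = str(exc)
    return results


results = probe_codecs(["ascii", "utf-8"])
for name in sorted(results):
    print("%s: %s" % (name, results[name] or "ok"))
```

In a healthy interpreter both lookups succeed, so any non-"ok" line would point at the codec that cannot be resolved.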

I'm not sure what a good fix would be, but for now we are applying https://github.com/Addepar/buck/commit/6a4c88c439b00206db5128ae736c288b52e30170 on our fork as a workaround.