[Feature request / discussion] Buck performance bottleneck in java_binary
Created by: romanoid
When engineers build backend code, the output is usually some kind of java_binary that they can test and debug locally.
While most of the build graph is cacheable and incremental, java_binary needs to be fully rebuilt every time, so it ends up being a bottleneck that takes significant time for large enough binaries.
Sources of the bottleneck:

- When combining multiple jars into a single fat jar, Buck inflates individual entries and then deflates them again (see the sketch after this list).
- java_binary needs to do entry sorting / deduplication, which makes any parallel processing much more difficult.
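For concreteness, this is roughly the copy loop that the default java.util.zip API forces on a merge step (the class and method names here are mine, for illustration only): getInputStream() inflates the stored bytes and putNextEntry()/write() deflate them again, so every entry pays a full decompress/recompress cycle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class RoundTripCopy {
  /** Copies every entry of `in` into `out`, re-compressing all data. */
  static void copyEntries(ZipFile in, ZipOutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    for (Enumeration<? extends ZipEntry> e = in.entries(); e.hasMoreElements(); ) {
      ZipEntry entry = e.nextElement();
      out.putNextEntry(new ZipEntry(entry.getName())); // data written next is deflated again
      try (InputStream inflated = in.getInputStream(entry)) { // inflates the stored bytes
        int n;
        while ((n = inflated.read(buf)) != -1) {
          out.write(buf, 0, n);
        }
      }
      out.closeEntry();
    }
  }
}
```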
A couple of notes:

- Buck has custom deflation / zip-writing code (https://github.com/facebook/buck/blob/74a4ea8f385ad8eb17568c6763930e6bea61352b/src/com/facebook/buck/util/zip/CustomJarOutputStream.java).
- Buck uses the default Java API for zip file reading; moreover, the comments hint that this workflow was designed before the deflation/writing side was re-implemented (https://github.com/facebook/buck/blob/a6c7d7391a7a8e575c6372cac6690a240b5907ea/src/com/facebook/buck/util/zip/ZipFileJarEntryContainer.java#L76).
- The default Java API does not allow access to a raw entry as stored, but Apache Commons Compress does (https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java#L510). Since Commons Compress re-implements the zip logic, its behavior may not match the default API 100%.
It seems possible to update the fat-jar logic to transfer deflated entries directly to the output jar, though I may be missing some corner cases. Please advise.
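For illustration, here is a minimal sketch of such a raw transfer built on Commons Compress. ZipFile.getRawInputStream and ZipArchiveOutputStream.addRawArchiveEntry are real Commons Compress APIs; the class name, the first-wins deduplication policy, and the omission of manifest merging, META-INF handling, and Buck's sorting requirements are simplifications of mine:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
import org.apache.commons.compress.archivers.zip.ZipFile;

/** Sketch: merge jars into a fat jar without re-compressing any entry. */
public final class RawJarMerger {

  public static void merge(List<File> inputJars, File outputJar) throws IOException {
    Set<String> seen = new HashSet<>();
    try (ZipArchiveOutputStream out = new ZipArchiveOutputStream(outputJar)) {
      for (File jar : inputJars) {
        try (ZipFile in = new ZipFile(jar)) {
          for (Enumeration<ZipArchiveEntry> e = in.getEntries(); e.hasMoreElements(); ) {
            ZipArchiveEntry entry = e.nextElement();
            if (!seen.add(entry.getName())) {
              continue; // naive first-wins deduplication
            }
            // getRawInputStream hands back the entry bytes exactly as stored
            // (still deflated); addRawArchiveEntry writes them out verbatim,
            // so no inflate/deflate round trip happens.
            try (InputStream raw = in.getRawInputStream(entry)) {
              out.addRawArchiveEntry(entry, raw);
            }
          }
        }
      }
    }
  }
}
```

This works because addRawArchiveEntry reuses the size, compressed size, CRC, and compression method already recorded on the source entry, all of which Commons Compress reads from the central directory; stored (uncompressed) entries copy through the same path unchanged.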