How to git diff epub files
Track epub, docx, pptx, and sqlite files in Git
I'm not a big fan of binary formats because it's hard to track changes to them. Fortunately, there are ways to handle binary files like epub, docx, pptx, and even SQLite databases in Git and make them more manageable. In this article, we will explore techniques to git diff these binary formats and enable version control for them.
Git Configuration for Different Binary Formats
Tracking ZIP-Based File Formats
As mentioned in an article from tante.cc, some binary formats like docx and pptx are essentially zipped packages of XML files, which means some form of diffing is possible. To enable Git to handle these formats, follow these steps:
Windows users: In Windows, the path ~/.config/git/attributes is equivalent to
- Open your ~/.gitconfig file (create it if it does not exist yet) and add the following stanza:

      [diff "zip"]
          textconv = unzip -c -a

  This tells Git to run unzip on the file before diffing. The -c option prints every archive member to standard output and -a converts text files, so Git compares the unpacked contents instead of the compressed bytes.

- Create or modify the file YOUR_GIT_REPOSITORY/.gitattributes or ~/.config/git/attributes and add the following lines:

      *.pptx diff=zip
      *.docx diff=zip
      *.epub diff=zip
      *.odt diff=zip

  These lines specify which file extensions Git should diff with the zip driver.

Now, when you use git diff, Git will automatically unzip these files and show you the differences in their text contents, making it easier to track changes in these binary formats.
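One way to confirm the setup is to ask Git which diff driver it would pick for a given path. The following is a minimal sketch that assumes a throwaway repository and a hypothetical file name, book.epub. It uses git check-attr, which only reads the attribute files, so the epub does not need to exist.

```shell
#!/bin/sh
# Sketch: confirm that Git maps *.epub to the "zip" diff driver.
set -e

repo=$(mktemp -d)            # throwaway repository, used only for this check
cd "$repo"
git init -q .

# Same content as the attributes lines and the [diff "zip"] stanza,
# written here at repository level instead of globally.
printf '*.epub diff=zip\n*.docx diff=zip\n' > .gitattributes
git config diff.zip.textconv "unzip -c -a"

# book.epub is a hypothetical name; check-attr does not open the file.
attr=$(git check-attr diff -- book.epub)
driver=$(git config --get diff.zip.textconv)

echo "$attr"      # book.epub: diff: zip
echo "$driver"    # unzip -c -a
```

If the first line reports "unspecified" instead of "zip", the pattern in the attributes file is not matching the path.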
Handling Older Microsoft Office Files
For Microsoft Office files like .doc, .xls, and .ppt, you can configure Git to handle them as well. Here are the steps:
- Add the following lines to your $HOME/.config/git/attributes file:

      *.doc diff=doc
      *.xls diff=xls
      *.xlsx diff=xls
      *.ppt diff=ppt

  This associates each file extension with a named diff driver.

- Add the following to your global configuration file at $HOME/.gitconfig or $HOME/.config/git/config:

      [diff "doc"]
          textconv = catdoc
          binary = true
      [diff "xls"]
          textconv = xls2csv
          binary = true
      [diff "ppt"]
          textconv = catppt
          binary = true

  The name in each [diff "..."] header must match the name used after diff= in the attributes file. Each driver uses a different text converter so that the diffs of that format become readable.
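These drivers only work if the converters are installed and on your PATH; on many systems catdoc, xls2csv, and catppt come from the catdoc package, though that is an assumption about your distribution. The sketch below reports which of them are available. The helper name check_tool is made up for this example.

```shell
#!/bin/sh
# Sketch: report which text converters are available on PATH.
# check_tool is a hypothetical helper, not part of Git.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in catdoc xls2csv catppt; do
    check_tool "$tool"
done
```

If a converter is missing, Git reports an error when it tries to run the textconv command for that file type.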
Managing Older Open Office Files
If you are using Open Office, you can follow a similar approach as with Microsoft Office files. Here's how:
- In your attributes file, add:

      *.odt diff=odt

  This associates the .odt file extension with the odt diff driver. If the same attributes file already maps *.odt to another driver, the later matching line takes precedence.

- In your config file, add:

      [diff "odt"]
          textconv = odt2txt
          binary = true

  This configuration uses odt2txt to convert .odt files to plain text for diffing.
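When two lines in one attributes file match the same path, Git uses the later one. The sketch below illustrates this with a throwaway repository and a hypothetical file name, notes.odt, mapping *.odt first to a zip driver and then to odt.

```shell
#!/bin/sh
# Sketch: a later attributes line overrides an earlier one for the same path.
set -e

repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q .

# Both patterns match notes.odt; the second line is the one that applies.
printf '*.odt diff=zip\n*.odt diff=odt\n' > .gitattributes

odt_attr=$(git check-attr diff -- notes.odt)
echo "$odt_attr"             # notes.odt: diff: odt
```

Swapping the two lines in .gitattributes makes the zip driver apply instead.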
Handling PDF Files
Even PDF files can be managed with Git. To achieve this, make the following changes in your attributes file:
*.pdf diff=pdf
And in your config file:
[diff "pdf"]
textconv = pdf2txt.py
binary = true
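This configuration lets Git extract the text from both versions of a PDF and diff that text. The pdf2txt.py command is the command-line tool that ships with pdfminer, so it has to be installed and on your PATH.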
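Text extraction from large PDFs can be slow, and Git reruns the converter each time it needs a diff. Git's diff drivers have an optional cachetextconv setting that caches the converted text. The following sketch enables it for the pdf driver in a throwaway repository; whether caching pays off depends on your files.

```shell
#!/bin/sh
# Sketch: define the pdf driver with git config and enable textconv caching.
set -e

repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q .

echo '*.pdf diff=pdf' > .gitattributes
git config diff.pdf.textconv pdf2txt.py
git config diff.pdf.binary true
git config diff.pdf.cachetextconv true   # cache converter output between diffs

cache=$(git config --get diff.pdf.cachetextconv)
echo "cachetextconv=$cache"  # cachetextconv=true
```

Add --global to the git config commands to apply the same settings to every repository.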
Tracking SQLite Database Changes
SQLite databases are binary files, and tracking changes to them can be challenging. However, you can still get meaningful diffs for SQLite databases in Git. Here's how:
- Add a diff driver called "sqlite3" to your config:

      git config diff.sqlite3.binary true
      git config diff.sqlite3.textconv "echo .dump | sqlite3"

  Alternatively, add this snippet to your ~/.gitconfig or to .git/config in your repository:

      [diff "sqlite3"]
          binary = true
          textconv = "echo .dump | sqlite3"

- Create a file called .gitattributes if it's not already present and add this line:

      *.sqlite diff=sqlite3

Now, when you run git diff or any other Git command that produces a diff on an SQLite file, Git compares the SQL dumps of the two versions, so schema and row changes show up as added and removed SQL statements.
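The steps above can be tried end to end in a scratch repository. This is a minimal sketch that assumes the sqlite3 command-line tool is installed; the database name demo.sqlite, the notes table, and the committer identity are invented for the example.

```shell
#!/bin/sh
# Sketch: diff a SQLite database through the sqlite3 textconv driver.
set -e

repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q .
git config user.name "Demo"                  # placeholder identity for the commit
git config user.email "demo@example.com"

git config diff.sqlite3.binary true
git config diff.sqlite3.textconv "echo .dump | sqlite3"
echo '*.sqlite diff=sqlite3' > .gitattributes

if command -v sqlite3 >/dev/null 2>&1; then
    # Commit a first version of the database, then change it.
    sqlite3 demo.sqlite "CREATE TABLE notes(body TEXT); INSERT INTO notes VALUES('first');"
    git add .gitattributes demo.sqlite
    git commit -q -m "Add database"
    sqlite3 demo.sqlite "INSERT INTO notes VALUES('second');"

    # Git dumps both versions to SQL and diffs the dumps.
    git diff -- demo.sqlite
else
    echo "sqlite3 CLI not found; skipping the diff"
fi

attr=$(git check-attr diff -- demo.sqlite)
echo "$attr"                 # demo.sqlite: diff: sqlite3
```

With sqlite3 available, the diff should contain an added INSERT statement for the second row; the exact quoting of the statement depends on the sqlite3 version.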
Conclusion
In this article, we've explored various techniques to handle binary formats such as epub, docx, pptx, and SQLite databases in Git. By configuring Git to use specific text converters and diffing strategies, you can make these binary files more manageable and gain better visibility into their changes. With these approaches, you can effectively track and version control a wider range of file formats in your Git repositories.
For more information and further customization options, refer to the provided sources and Git documentation: