Deprecated method to delete node_modules from GitHub repository commit history

WARNING: git filter-branch is no longer officially recommended.

The official recommendation is to use git-filter-repo.

See André Anjos’ answer below for details.


If you are here to copy-paste code:

This example removes node_modules from history:

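# Rewrite every commit reachable from HEAD, deleting node_modules from each tree
git filter-branch --tree-filter "rm -rf node_modules" --prune-empty HEAD
# Delete the backup refs that filter-branch leaves under refs/original/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Keep node_modules out of future commits
echo node_modules/ >> .gitignore
git add .gitignore
git commit -m 'Removing node_modules from git history'
# Clean up unreferenced objects, then overwrite the remote branch
git gc
git push origin master --force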

What git actually does:

The first line rewrites every commit reachable from HEAD (your current branch). For each commit, --tree-filter checks out that commit's tree, runs rm -rf node_modules in it, and records the result as the new commit. The -r flag lets rm delete directories (without it, rm won’t delete folders), and -f suppresses prompts and does not fail when the folder is missing. The added --prune-empty drops commits that no longer change anything once node_modules has been removed.

The second line deletes the backup references that filter-branch saves under refs/original/, so the old commits are no longer kept alive by them.

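The rest of the commands are relatively straightforward: they add node_modules/ to .gitignore so it is not committed again, commit that change, run garbage collection, and force-push the rewritten branch to the remote.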
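
To see the effect of the filter-branch commands without touching a real project, here is a minimal sketch that builds a throwaway repository and counts the commits that touch node_modules before and after the rewrite. The temporary directory, file names, and demo identity are assumptions made for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice and its delay.

```shell
#!/bin/sh
# Throwaway demo: nothing here touches an existing repository.
set -eu
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

mkdir node_modules
echo "module.exports = 1" > node_modules/dep.js
echo "console.log('app')" > index.js
# -f in case a global gitignore already excludes node_modules
git add -f .
git commit -q -m "Add app and node_modules"

# Number of commits on any ref that touch node_modules, before rewriting.
before=$(git rev-list --all --count -- node_modules)

# Same rewrite as in the article, with the warning delay silenced.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --tree-filter "rm -rf node_modules" --prune-empty HEAD >/dev/null 2>&1
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Same count after rewriting and dropping the backup refs.
after=$(git rev-list --all --count -- node_modules)
echo "before=$before after=$after"
```

If the second count is zero, no commit reachable from any ref still contains node_modules.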

André Anjos’ answer using git-filter-repo

Requires git-filter-repo.

It appears that the up-to-date answer is to not use filter-branch directly (git itself no longer recommends it) and to defer that work to an external tool. In particular, git-filter-repo is currently recommended. The author of that tool explains why using filter-branch directly can lead to issues.

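The multi-line filter-branch script above for removing node_modules from the history could be rewritten as follows. git-filter-repo removes the origin remote after rewriting, which is why the second command adds it back: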

git-filter-repo --path node_modules --invert-paths --refs BRANCH_NAME
git remote add origin https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPOSITORY_NAME
git fetch --all --prune
git push --force-with-lease -u origin BRANCH_NAME

To remove node_modules recursively within all subfolders:

git-filter-repo --invert-paths --path-glob "**/node_modules" --refs BRANCH_NAME
git remote add origin https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPOSITORY_NAME
git fetch --all --prune
git push --force-with-lease -u origin BRANCH_NAME
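
Before and after a rewrite like the ones above, it can help to list which paths in the history still sit under a node_modules directory at any depth. The following sketch uses only plain git and grep on a throwaway repository; the directory layout and file names are hypothetical, and it does not run git-filter-repo itself.

```shell
#!/bin/sh
# Throwaway demo repository with node_modules at two depths (hypothetical layout).
set -eu
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

mkdir -p node_modules packages/a/node_modules
echo "1" > node_modules/root-dep.js
echo "2" > packages/a/node_modules/nested-dep.js
echo "app" > index.js
# -f in case a global gitignore already excludes node_modules
git add -f .
git commit -q -m "Add app with nested node_modules"

# Every path committed on any ref that lies under a node_modules directory.
matches=$(git log --all --pretty=format: --name-only \
  | grep -E '(^|/)node_modules/' | sort -u)
echo "$matches"
```

Run in a real repository, the git log pipeline alone is enough; an empty result after the rewrite suggests that no commit on any ref still carries node_modules files.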

The tool is more powerful than just that. You can apply filters by author, email, refname and more (full manpage here). It is also fast, and installation is easy: it is distributed in a variety of formats.
