Advanced Git Techniques

Learn powerful Git features for complex workflows and scenarios

As you become more proficient with Git, you'll encounter situations that require more advanced techniques. This section covers powerful Git features that can help you handle complex workflows and scenarios.

Advanced Rebasing Techniques

Interactive Rebase with Fixup

When you need to make changes to a previous commit, you can use the fixup option:

# Make your changes
git add .
git commit --fixup commit-hash

# Then rebase to incorporate the fixup
git rebase -i --autosquash HEAD~5

The --autosquash flag automatically moves fixup commits next to the commits they're fixing and marks them as fixup.

Rebase onto a Different Base

Sometimes you need to move a branch from one base to another:

git rebase --onto new-base old-base feature-branch

This rebases the commits between old-base and feature-branch onto new-base.

Example: Moving a branch from one feature branch to another:

# Before:
#  main
#    |
#    A---B---C feature-1
#         \
#          D---E---F feature-2

git rebase --onto main feature-1 feature-2

# After:
#  main
#    |
#    A---D'---E'---F' feature-2
#    |
#    B---C feature-1

Preserving Merges During Rebase

By default, rebasing flattens merge commits. To preserve the merge structure:

git rebase -i --rebase-merges HEAD~10

This is useful when you want to keep the context of merges in your history.

Stashing Advanced Techniques

Stashing Specific Files

To stash only specific files:

git stash push -m "Work on login component" src/components/Login.js

Creating a Branch from a Stash

If you want to apply a stash to a new branch:

git stash branch new-branch stash@{1}

This creates a new branch starting from the commit where the stash was created, applies the stash, and drops it if successful.

Managing Multiple Stashes

When you have multiple stashes, you can:

List all stashes with details:

git stash list --stat

Show the contents of a specific stash:

git stash show -p stash@{2}

Apply a specific stash without removing it:

git stash apply stash@{2}

Remove a specific stash:

git stash drop stash@{2}

Stashing Untracked Files

By default, git stash only saves tracked files. To include untracked files:

git stash -u

To include ignored files as well:

git stash -a

Submodules and Subtrees

Working with Submodules

Submodules allow you to include other Git repositories within your repository:

Adding a submodule:

git submodule add https://github.com/username/library.git lib/library

Initializing submodules after cloning:

git submodule init
git submodule update

Or clone with submodules in one step:

git clone --recurse-submodules https://github.com/username/project.git

Updating all submodules:

git submodule update --remote

Git Subtrees

Subtrees are an alternative to submodules that merge external repositories into a subdirectory:

Adding a subtree:

git subtree add --prefix=lib/library https://github.com/username/library.git main --squash

Updating a subtree:

git subtree pull --prefix=lib/library https://github.com/username/library.git main --squash

Contributing back to the original repository:

git subtree push --prefix=lib/library https://github.com/username/library.git contribution-branch

Submodules vs Subtrees

Feature Submodules Subtrees
Storage References only Full copy of code
Learning curve Steeper Gentler
Dependency tracking Explicit version Can be less clear
Updates Manual Can be easier
External changes Need explicit permission Can modify locally
Best for Third-party libraries Project splitting

Rewriting History

Filtering Repository History

For extensive history rewriting, use git-filter-repo (the modern replacement for git-filter-branch):

# Install git-filter-repo
pip install git-filter-repo

# Remove a file from all of history
git filter-repo --path path/to/large-file.bin --invert-paths

This is useful for:

  • Removing large files
  • Removing sensitive information
  • Extracting a subfolder into its own repository

Splitting a Repository

To extract a subdirectory into a new repository:

git filter-repo --subdirectory-filter path/to/subdirectory

This rewrites history as if the subdirectory had been the root of the repository all along.

Combining Repositories

To merge the history of multiple repositories:

# Add the other repository as a remote
git remote add other-repo https://github.com/username/other-repo.git
git fetch other-repo

# Merge with the allow-unrelated-histories flag
git merge other-repo/main --allow-unrelated-histories

# Resolve any conflicts
git commit

Advanced Merge Strategies

Octopus Merge

When merging more than two branches at once:

git merge branch1 branch2 branch3

This creates a single merge commit with multiple parents. It only works if there are no conflicts.

Merge with Strategy Options

Git offers several merge strategies with customizable options:

Recursive strategy with patience algorithm (better handling of complex changes):

git merge feature-branch -s recursive -X patience

Ignore whitespace changes during merge:

git merge feature-branch -X ignore-space-change

Favor one side in conflicts:

git merge feature-branch -X ours  # Keep our version in conflicts

Or:

git merge feature-branch -X theirs  # Use their version in conflicts

Refs and Refspecs

Understanding Git References

Git uses references (refs) to point to commits:

  • HEAD: The current commit you're working on
  • refs/heads/main: The main branch
  • refs/tags/v1.0.0: A tag named v1.0.0
  • refs/remotes/origin/main: The main branch on the origin remote

You can use symbolic references to refer to commits:

HEAD~3  # Three commits before HEAD
main^2  # The second parent of the main branch tip (for merge commits)
v1.0.0^{} # The commit object that the tag points to

Using Refspecs for Remote Operations

Refspecs define the mapping between remote and local references:

git fetch origin +refs/heads/*:refs/remotes/origin/*

This fetches all branches from origin and stores them as remote-tracking branches.

To fetch only specific branches:

git fetch origin main:refs/remotes/origin/main develop:refs/remotes/origin/develop

Creating Custom Refspecs

You can set up custom refspecs in your git config:

[remote "origin"]
    url = https://github.com/username/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
    fetch = +refs/pull/*/head:refs/remotes/origin/pr/*

With this configuration, git fetch will also fetch all pull requests from GitHub.

Git Hooks

Creating Useful Git Hooks

Git hooks are scripts that run automatically on certain Git events. They're stored in the .git/hooks directory.

Pre-commit hook to prevent committing large files:

#!/bin/bash
# .git/hooks/pre-commit

# Check for files larger than 5MB
large_files=$(find . -type f -size +5M -not -path "./.git/*")

if [ -n "$large_files" ]; then
  echo "Error: Attempting to commit large files:"
  echo "$large_files"
  echo "Please remove these files or add them to .gitignore"
  exit 1
fi

exit 0

Commit-msg hook to enforce commit message conventions:

#!/bin/bash
# .git/hooks/commit-msg

commit_msg_file=$1
commit_msg=$(cat "$commit_msg_file")

# Check if the message follows conventional commits format
if ! echo "$commit_msg" | grep -qE '^(feat|fix|docs|style|refactor|test|chore)(\(.+\))?: .+'; then
  echo "Error: Commit message does not follow the conventional commits format."
  echo "Example: feat(auth): add login functionality"
  exit 1
fi

exit 0

Sharing Hooks with Your Team

Git hooks aren't copied when a repository is cloned. To share hooks:

  1. Store hooks in a directory in your repository:

    project/
    ├── .git/
    └── git-hooks/
        ├── pre-commit
        ├── commit-msg
        └── ...
    
  2. Set up a script to install the hooks:

    #!/bin/bash
    # setup-hooks.sh
    
    cp git-hooks/* .git/hooks/
    chmod +x .git/hooks/*
    
  3. Document the process in your README

Alternatively, use a tool like Husky for npm projects to manage hooks in package.json.

Working with Patches

Creating and Applying Patches

Patches allow you to share changes without a common remote repository:

Creating a patch for the last commit:

git format-patch -1 HEAD

Creating patches for a range of commits:

git format-patch main..feature-branch

Applying a patch:

git apply path/to/0001-commit-message.patch

Applying and creating a commit:

git am path/to/0001-commit-message.patch

Creating Patch Series

For a series of related changes:

git format-patch -o patches/ main..feature-branch --cover-letter

This creates a series of patch files with a cover letter that you can edit to explain the entire series.

Git Attributes

Customizing Git's Behavior with Attributes

Git attributes, defined in .gitattributes, control how Git handles different file types:

# .gitattributes
*.txt text
*.jpg binary
*.sh text eol=lf
*.bat text eol=crlf

This ensures:

  • Text files have normalized line endings
  • JPG files are treated as binary
  • Shell scripts always use LF endings (for Unix)
  • Batch files always use CRLF endings (for Windows)

Defining Custom Diff and Merge Strategies

For specific file types:

*.png diff=image
*.docx diff=word

You'll need to configure these diff drivers:

git config diff.image.textconv exiftool
git config diff.word.textconv docx2txt

Now git diff will show meaningful differences for these file types.

Bundle and Archive

Creating Portable Git Repositories

Bundle creates a single file containing all commits and references:

git bundle create repo.bundle HEAD main

To clone from a bundle:

git clone repo.bundle -b main new-repo

This is useful for transferring Git data without a network connection.

Creating Archives of Your Code

To create a ZIP archive of your project:

git archive --format=zip HEAD > project.zip

To archive a specific tag:

git archive --format=zip v1.0.0 > project-v1.0.0.zip

To include submodules:

git submodule foreach --recursive 'git archive --prefix=$path/ --format=zip HEAD > $PWD/submodule-$name.zip'

Advanced Configuration

Configuring Git for Productivity

Some helpful configuration settings:

# Auto-correct typos in Git commands
git config --global help.autocorrect 20

# Show branch names in commit logs
git config --global log.decorate true

# Cache credentials
git config --global credential.helper cache

# Set a global .gitignore
git config --global core.excludesfile ~/.gitignore_global

# Use delta for improved diffs
git config --global core.pager "delta"

# Automatically prune deleted remote branches on fetch/pull
git config --global fetch.prune true

Aliases for Advanced Workflows

Setting up complex aliases:

# Undo the last commit but keep changes staged
git config --global alias.uncommit 'reset --soft HEAD^'

# Interactive rebase for cleanup
git config --global alias.cleanup 'rebase -i @{upstream}'

# Visualize log with graph
git config --global alias.graph 'log --graph --oneline --decorate --all'

# Show all configs
git config --global alias.aliases 'config --get-regexp ^alias\.'

# Clean up merged branches
git config --global alias.clean-branches '!git branch --merged | grep -v "\*" | xargs -n 1 git branch -d'

Git Internals

Understanding Git Objects

Git stores four types of objects:

  1. Blob: Content of a file
  2. Tree: Directory listing, containing blobs and other trees
  3. Commit: Points to a tree, with metadata
  4. Tag: An object pointing to a specific commit

You can examine these objects:

# View a blob
git cat-file -p blob_hash

# View a tree
git cat-file -p tree_hash

# View a commit
git cat-file -p commit_hash

# View an object's type
git cat-file -t object_hash

Git References

References (refs) are pointers to commits. They're stored in .git/refs/:

  • Branches: .git/refs/heads/
  • Tags: .git/refs/tags/
  • Remote branches: .git/refs/remotes/

View the commit a reference points to:

git rev-parse main
git rev-parse HEAD
git rev-parse --verify v1.0.0

Custom Scripts Using Git Plumbing Commands

Git's low-level "plumbing" commands can be used to build custom tools:

A script to find large objects in your Git repository:

#!/bin/bash
# find-large-objects.sh

# Create a temporary pack
git gc --quiet --prune=now

# Find the 10 largest objects
git verify-pack -v .git/objects/pack/*.idx |
  grep -v chain |
  sort -k3nr |
  head -10 |
  while read hash type size remainder; do
    if [ $size -gt 100000 ]; then
      echo "$size bytes: $(git cat-file -t $hash) $hash"
      git rev-list --all --objects | grep $hash | sed 's/^.*\t//'
    fi
  done

Git for Specialized Workflows

Monorepos with Git

For managing large monorepos:

  • Use sparse checkout to work with subsets of the repo:

    git clone --no-checkout https://github.com/username/monorepo.git
    cd monorepo
    git sparse-checkout set path/to/subdirectory1 path/to/subdirectory2
    git checkout main
    
  • Consider partial clones to reduce download size:

    git clone --filter=blob:none https://github.com/username/monorepo.git
    
  • Use tools like Git LFS for large binary files

Git for Continuous Integration/Deployment

In CI/CD pipelines, optimize Git operations:

  • Use shallow clones to speed up the process:

    git clone --depth 1 https://github.com/username/repo.git
    
  • Fetch only the necessary branches:

    git fetch origin $CI_COMMIT_REF_NAME
    
  • Use Git's built-in functionality to check for changes in specific directories:

    if git diff --name-only HEAD~1 HEAD | grep -q "^frontend/"; then
      run_frontend_tests
    fi
    

Git for Data Science and ML

For data science projects:

  • Use Git LFS for large model files and datasets
  • Consider tools like DVC (Data Version Control) for data and ML model versioning
  • Create hooks to prevent committing sensitive data or large untracked files
  • Use branch naming conventions that reflect experiments (e.g., experiment/feature-selection-1)

Conclusion

These advanced Git techniques provide powerful tools for handling complex scenarios and optimizing your workflow. While you may not need all of these features immediately, understanding them will help you solve challenging version control problems as they arise.

Remember that Git is highly flexible and extensible. As you become more comfortable with these advanced features, you can combine and customize them to create workflows that perfectly suit your needs and those of your team.

The most important skill is knowing when to use these advanced techniques. Start with the basics, and gradually incorporate these more powerful features as you encounter situations that require them.
EOF

Found an issue?