How to Compare Two Directory Trees and Find Files That Differ by Content
You have two directory trees - maybe a backup and the original, or two versions of a project - and you need to know which files have different content. How do you compare them efficiently?
TL;DR
Use diff -r dir1 dir2 for a quick recursive comparison showing differences. For a summary of which files differ without showing the actual differences, use diff -rq dir1 dir2. For large directories, use rsync -avn --delete dir1/ dir2/ to see what would change. To compare by checksums, use find with md5sum or sha256sum.
Comparing directory trees is common when verifying backups, checking synchronization, or finding what changed between versions.
Let's say you have two directories you want to compare:
project-v1/
├── src/
│ ├── main.js
│ └── utils.js
└── README.md
project-v2/
├── src/
│ ├── main.js
│ └── utils.js
└── README.md
Using diff for Recursive Comparison
The diff command with -r compares directories recursively:
diff -r project-v1 project-v2
This shows the actual differences in files that don't match. If the directories are identical, there's no output.
Example output when files differ:
diff -r project-v1/src/main.js project-v2/src/main.js
15c15
< const version = "1.0";
---
> const version = "2.0";
This shows line 15 changed from version 1.0 to 2.0.
Getting a Summary Without Showing Differences
If you just want to know which files differ, not the actual differences:
diff -rq project-v1 project-v2
The -q flag (brief mode) only reports which files differ:
Files project-v1/src/main.js and project-v2/src/main.js differ
Only in project-v2: package-lock.json
Only in project-v1: old-config.json
This is faster and easier to read when you have many differences.
Comparing with rsync
The rsync command can show what would be different if you synchronized two directories:
rsync -avn --delete project-v1/ project-v2/
Flags explained:
-a- archive mode (preserves permissions, times, etc.)-v- verbose (shows files being compared)-n- dry run (don't actually copy anything)--delete- show files that exist in destination but not source
Output shows what would be copied or deleted:
sending incremental file list
deleting package-lock.json
src/main.js
old-config.json
This means:
src/main.jswould be updated in project-v2old-config.jsonwould be copied to project-v2package-lock.jsonwould be deleted from project-v2
Using rsync to Compare by Checksum
By default, rsync compares files by size and modification time. To compare by content (checksum):
rsync -avn --checksum project-v1/ project-v2/
The --checksum flag makes rsync compute checksums for every file, which is slower but more accurate. Files with identical content but different timestamps won't show as different.
Comparing with find and md5sum
For a more manual approach, generate checksums for all files in both directories:
# Generate checksums for first directory
cd project-v1
find . -type f -exec md5sum {} \; | sort > ../checksums-v1.txt
# Generate checksums for second directory
cd ../project-v2
find . -type f -exec md5sum {} \; | sort > ../checksums-v2.txt
# Compare the checksum files
diff ../checksums-v1.txt ../checksums-v2.txt
This shows which files have different content based on MD5 checksums.
For more security (less chance of collisions), use SHA-256:
find . -type f -exec sha256sum {} \; | sort > ../checksums-v1.txt
Ignoring Certain Files or Directories
To exclude specific files or directories from comparison:
# Exclude node_modules and .git
diff -rq --exclude='node_modules' --exclude='.git' project-v1 project-v2
# Exclude multiple patterns
diff -rq --exclude='*.log' --exclude='*.tmp' --exclude='build' project-v1 project-v2
With rsync:
rsync -avn --exclude='node_modules' --exclude='.git' project-v1/ project-v2/
Comparing Only Specific File Types
To compare only certain files, combine diff with find:
# Compare only JavaScript files
find project-v1 -name "*.js" -type f | while read file; do
relative=${file#project-v1/}
if [ -f "project-v2/$relative" ]; then
diff "$file" "project-v2/$relative" > /dev/null || echo "Different: $relative"
else
echo "Missing in project-v2: $relative"
fi
done
This checks each .js file in project-v1 against the corresponding file in project-v2.
Using Tree to Visualize Structure Differences
The tree command can help visualize directory structures:
# Install tree if needed
sudo apt install tree # Ubuntu/Debian
# Compare structures side by side
tree project-v1 > tree-v1.txt
tree project-v2 > tree-v2.txt
diff tree-v1.txt tree-v2.txt
This shows which files or directories exist in one tree but not the other.
Finding Files That Exist in Only One Directory
To find files that exist in one directory but not the other:
# Files only in project-v1
diff -rq project-v1 project-v2 | grep "Only in project-v1"
# Files only in project-v2
diff -rq project-v1 project-v2 | grep "Only in project-v2"
Or use comm with sorted file lists:
# Get sorted file lists
find project-v1 -type f | sed 's|^project-v1/||' | sort > files-v1.txt
find project-v2 -type f | sed 's|^project-v2/||' | sort > files-v2.txt
# Show files only in project-v1 (column 1)
comm -23 files-v1.txt files-v2.txt
# Show files only in project-v2 (column 2)
comm -13 files-v1.txt files-v2.txt
# Show files in both (column 3)
comm -12 files-v1.txt files-v2.txt
Practical Example: Backup Verification Script
Here's a script to verify a backup matches the original:
#!/bin/bash
set -e
SOURCE="$1"
BACKUP="$2"
if [ -z "$SOURCE" ] || [ -z "$BACKUP" ]; then
echo "Usage: $0 <source-dir> <backup-dir>"
exit 1
fi
echo "Comparing $SOURCE with $BACKUP..."
echo
# Find files that differ
DIFF_OUTPUT=$(diff -rq --exclude='.git' "$SOURCE" "$BACKUP")
if [ -z "$DIFF_OUTPUT" ]; then
echo "✓ Backup is identical to source"
exit 0
else
echo "✗ Differences found:"
echo "$DIFF_OUTPUT"
exit 1
fi
Run it:
chmod +x verify-backup.sh
./verify-backup.sh /var/www/app /backup/app
Practical Example: Finding Recently Changed Files
Compare directories but only show files modified in the last day:
#!/bin/bash
DIR1="$1"
DIR2="$2"
# Find files modified in last 24 hours in DIR1
find "$DIR1" -type f -mtime -1 | while read file; do
relative=${file#$DIR1/}
file2="$DIR2/$relative"
if [ -f "$file2" ]; then
# Compare content
if ! cmp -s "$file" "$file2"; then
echo "Modified: $relative"
fi
else
echo "New file: $relative"
fi
done
Using git diff for Version-Controlled Projects
If both directories are git repositories, you can use git:
# Compare two branches or commits
git diff branch1..branch2
# Compare working directory with another directory
git diff --no-index project-v1 project-v2
The --no-index flag lets you use git diff on directories that aren't in a repository.
Performance Considerations
For large directory trees:
diff -rqis faster thandiff -rbecause it doesn't compute actual differencesrsync --checksumis slow because it reads entire filesrsyncwithout--checksumis faster but only compares size/time- Checksums (md5sum/sha256sum) are reliable but slow on large files
For quick checks, use:
diff -rq dir1 dir2
For thorough content comparison:
rsync -avn --checksum dir1/ dir2/
Comparing Binary Files
The methods above work for text and binary files. For binary-specific comparison:
# Use cmp for binary comparison
find dir1 -type f | while read file; do
file2="dir2/${file#dir1/}"
if [ -f "$file2" ]; then
cmp -s "$file" "$file2" || echo "Binary files differ: $file"
fi
done
The cmp command compares files byte-by-byte and works well with binaries.
Creating a Detailed Comparison Report
Generate a comprehensive report:
#!/bin/bash
DIR1="$1"
DIR2="$2"
REPORT="comparison-report.txt"
{
echo "Directory Comparison Report"
echo "==========================="
echo "Generated: $(date)"
echo "Directory 1: $DIR1"
echo "Directory 2: $DIR2"
echo
echo "Files that differ in content:"
diff -rq "$DIR1" "$DIR2" | grep "differ$"
echo
echo "Files only in $DIR1:"
diff -rq "$DIR1" "$DIR2" | grep "Only in $DIR1"
echo
echo "Files only in $DIR2:"
diff -rq "$DIR1" "$DIR2" | grep "Only in $DIR2"
echo
echo "Total files in $DIR1: $(find "$DIR1" -type f | wc -l)"
echo "Total files in $DIR2: $(find "$DIR2" -type f | wc -l)"
} > "$REPORT"
echo "Report saved to $REPORT"
cat "$REPORT"
Comparing directory trees is straightforward with tools like diff and rsync. Whether you need a quick check or detailed analysis, these methods help you identify differences, verify backups, and track changes between directory versions.
Found an issue?