2024-04-12
7 min read

How to Count Lines of Code in a Directory Recursively

Counting lines of code is essential for project analysis, code reviews, and understanding codebase complexity. Whether you're tracking development progress or preparing project documentation, knowing how to accurately count code lines across directory structures is a valuable skill.

Basic Line Counting with find and wc

The simplest approach uses find to locate files and wc to count lines:

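find . -type f -name "*.py" -print0 | xargs -0 wc -l

This command finds all Python files in the current directory and its subdirectories, then counts the lines in each one. The -print0 and -0 options pass filenames null-delimited, so names containing spaces are handled correctly. The output lists a count per file followed by a total; on very large trees xargs may split the work into several batches, each with its own total line.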

For a cleaner total-only output:

find . -type f -name "*.py" -exec cat {} + | wc -l

This concatenates all matching files and provides a single line count for the entire codebase.
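When the per-file numbers matter more than the grand total, it can help to sort them so the largest files come first. The following is a minimal sketch, assuming bash and a find that supports -print0 (both GNU and BSD find do); the function name per_file_counts is hypothetical and not part of any tool.

```shell
# Hypothetical helper: list "<lines><TAB><path>" for each matching file,
# largest first. Usage: per_file_counts <dir> <glob>
per_file_counts() {
    local dir="${1:-.}" pattern="${2:-*.py}"
    local file
    # Null-delimited names keep paths with spaces or odd characters intact
    find "$dir" -type f -name "$pattern" -print0 |
        while IFS= read -r -d '' file; do
            # $(( ... )) strips the padding some wc implementations add
            printf '%d\t%s\n' "$(( $(wc -l < "$file") ))" "$file"
        done |
        sort -rn
}
```

For example, per_file_counts . "*.py" | head -n 10 would show the ten longest Python files under the current directory.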

Counting Multiple File Types

Most projects contain various source file types. You can count them together using multiple patterns:

find . -type f \( -name "*.js" -o -name "*.ts" -o -name "*.jsx" -o -name "*.tsx" \) -exec cat {} + | wc -l

For a more readable approach with variables:

# Define file extensions for your project
extensions=("*.py" "*.js" "*.ts" "*.html" "*.css" "*.sql")

total_lines=0
for ext in "${extensions[@]}"; do
    count=$(find . -type f -name "$ext" -exec cat {} + 2>/dev/null | wc -l)
    echo "$ext: $count lines"
    total_lines=$((total_lines + count))
done

echo "Total: $total_lines lines"
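The loop above runs find once per extension. As an alternative sketch, the same breakdown can be produced in a single traversal by building the -name tests in a bash array and grouping the results with awk. The function name lines_by_extension is hypothetical, and the sketch assumes at least one extension is passed.

```shell
# Hypothetical helper: total lines per extension in one find pass.
# Usage: lines_by_extension <dir> <ext> [<ext> ...]   (extensions without the dot)
lines_by_extension() {
    local dir="$1"
    shift
    local -a name_args=()
    local ext file
    # Builds: -o -name "*.py" -o -name "*.js" ...
    for ext in "$@"; do
        name_args+=(-o -name "*.$ext")
    done
    # "${name_args[@]:1}" skips the leading -o
    find "$dir" -type f \( "${name_args[@]:1}" \) -print0 |
        while IFS= read -r -d '' file; do
            printf '%s\t%d\n' "${file##*.}" "$(( $(wc -l < "$file") ))"
        done |
        awk -F '\t' '{ sum[$1] += $2 }
                     END { for (e in sum) printf "%s\t%d\n", e, sum[e] }' |
        sort
}
```

Calling lines_by_extension . py js ts prints one tab-separated line per extension that has matching files, sorted by extension name.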

Excluding Directories and Files

Real projects often contain generated code, dependencies, or build artifacts that shouldn't be counted. Use -not -path to exclude these:

find . -type f -name "*.py" \
    -not -path "./venv/*" \
    -not -path "./.git/*" \
    -not -path "./build/*" \
    -not -path "./__pycache__/*" \
    -exec cat {} + | wc -l

For Node.js projects, exclude common directories:

find . -type f \( -name "*.js" -o -name "*.ts" -o -name "*.jsx" -o -name "*.tsx" \) \
    -not -path "./node_modules/*" \
    -not -path "./dist/*" \
    -not -path "./build/*" \
    -not -path "./.next/*" \
    -exec cat {} + | wc -l
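Patterns such as "./node_modules/*" only match a directory at the top level, and find still walks into the excluded tree before discarding its files. A sketch using -prune, which skips matching directories at any depth without descending into them, might look like the following; the function name count_pruned and the directory list are illustrative assumptions.

```shell
# Hypothetical helper: count lines in files matching a glob, skipping
# common dependency and build directories wherever they appear.
# Usage: count_pruned <dir> <glob>
count_pruned() {
    local dir="${1:-.}" pattern="${2:-*.py}"
    local n
    # Directories named in the first group are pruned (not entered);
    # everything else falls through to the file test after -o.
    n=$(find "$dir" \
            -type d \( -name node_modules -o -name venv -o -name .git \
                       -o -name dist -o -name build -o -name __pycache__ \) -prune \
            -o -type f -name "$pattern" -exec cat {} + | wc -l)
    echo "$(( n ))"
}
```

In a monorepo, count_pruned . "*.js" would then ignore packages/app/node_modules as well as ./node_modules.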

Advanced Filtering with Detailed Output

Create a script that provides comprehensive statistics:

#!/bin/bash

# Run find over the project with the common exclusions applied.
# $1 is the filename pattern; remaining arguments are appended as the action.
find_sources() {
    local pattern="$1"
    shift
    find . -type f -name "$pattern" \
        -not -path "./node_modules/*" \
        -not -path "./venv/*" \
        -not -path "./.git/*" \
        -not -path "./dist/*" \
        -not -path "./build/*" \
        "$@" 2>/dev/null
}

total=0

# Print the line and file counts for one file type and add the lines to $total.
# A global accumulator is used because a function's return status is limited
# to 0-255 and cannot carry a line count.
count_lines() {
    local pattern="$1"
    local description="$2"
    local count file_count

    count=$(( $(find_sources "$pattern" -exec cat {} + | wc -l) ))
    file_count=$(( $(find_sources "$pattern" -print | wc -l) ))
    printf "%-20s %8d lines in %3d files\n" "$description" "$count" "$file_count"
    total=$((total + count))
}

echo "Code Line Count Analysis"
echo "========================"

# Count different file types
count_lines "*.py" "Python"
count_lines "*.js" "JavaScript"
count_lines "*.ts" "TypeScript"
count_lines "*.jsx" "React JSX"
count_lines "*.tsx" "React TSX"
count_lines "*.html" "HTML"
count_lines "*.css" "CSS"
count_lines "*.scss" "SCSS"
count_lines "*.java" "Java"
count_lines "*.cpp" "C++"
count_lines "*.c" "C"
count_lines "*.go" "Go"
count_lines "*.rs" "Rust"
count_lines "*.php" "PHP"

echo "========================"
printf "%-20s %8d lines total\n" "TOTAL" "$total"

Excluding Comments and Blank Lines

For more accurate code analysis, you might want to exclude comments and empty lines:

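# Count only non-empty, non-comment lines in Python files.
# The files are concatenated first: running grep directly on several files
# prefixes each line with its filename, so blank lines would never match.
find . -type f -name "*.py" \
    -not -path "./venv/*" \
    -not -path "./.git/*" \
    -exec cat {} + | \
    grep -v '^[[:space:]]*#' | \
    grep -v '^[[:space:]]*$' | \
    wc -l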

For multiple comment styles:

#!/bin/bash

count_code_lines() {
    local file="$1"
    local ext="${file##*.}"

    case "$ext" in
        py)
            # Remove Python comments and empty lines
            grep -v '^\s*#' "$file" | grep -v '^\s*$' | wc -l
            ;;
        js|ts|jsx|tsx|java|cpp|c|go|rs)
            # Remove // comments, /* */ comments that open and close on one line,
            # and empty lines (block comments spanning several lines still count)
            sed 's|//.*||g' "$file" | \
            sed 's|/\*.*\*/||g' | \
            grep -v '^\s*$' | \
            wc -l
            ;;
        html|xml)
            # Remove HTML comments and empty lines
            sed 's|<!--.*-->||g' "$file" | \
            grep -v '^\s*$' | \
            wc -l
            ;;
        *)
            # Default: just remove empty lines
            grep -v '^\s*$' "$file" | wc -l
            ;;
    esac
}

export -f count_code_lines

# Find all source files and count meaningful lines
find . -type f \( \
    -name "*.py" -o -name "*.js" -o -name "*.ts" -o \
    -name "*.jsx" -o -name "*.tsx" -o -name "*.java" -o \
    -name "*.cpp" -o -name "*.c" -o -name "*.go" -o \
    -name "*.rs" -o -name "*.html" -o -name "*.xml" \
\) \
    -not -path "./node_modules/*" \
    -not -path "./venv/*" \
    -not -path "./.git/*" \
    -exec bash -c 'count_code_lines "$0"' {} \; | \
    awk '{sum += $1} END {print sum " lines of actual code"}'
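The sed expressions in the script above only strip block comments that open and close on the same line, so a multi-line /* ... */ header is still counted as code. One way to handle that, sketched here with awk under the assumption that comment markers never appear inside string literals (a URL such as "http://..." would still be cut), is to track whether the current line is inside a block comment. The function name count_c_style_code_lines is hypothetical.

```shell
# Hypothetical helper: count lines that still contain code after removing
# // comments and /* */ comments, including multi-line blocks.
# Usage: count_c_style_code_lines <file>
count_c_style_code_lines() {
    awk '
    {
        line = $0
        out = ""
        while (length(line) > 0) {
            if (in_block) {
                # Inside a block comment: skip ahead to the closing marker
                stop = index(line, "*/")
                if (stop == 0) { line = ""; break }
                line = substr(line, stop + 2)
                in_block = 0
            } else {
                open_at = index(line, "/*")
                slash_at = index(line, "//")
                if (slash_at > 0 && (open_at == 0 || slash_at < open_at)) {
                    # Line comment comes first: keep only the text before it
                    out = out substr(line, 1, slash_at - 1)
                    line = ""
                } else if (open_at > 0) {
                    # Block comment opens: keep the text before it
                    out = out substr(line, 1, open_at - 1)
                    line = substr(line, open_at + 2)
                    in_block = 1
                } else {
                    out = out line
                    line = ""
                }
            }
        }
        # Count the line only if something other than whitespace remains
        if (out ~ /[^ \t]/) count++
    }
    END { print count + 0 }
    ' "$1"
}
```

Because awk runs once per file, the in-comment state starts fresh for every file. A full parser or a dedicated tool is still the safer choice when exact numbers matter.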

Using External Tools

For professional code analysis, consider specialized tools:

# Install cloc (Count Lines of Code)
# On macOS: brew install cloc
# On Ubuntu: sudo apt install cloc

# Basic usage
cloc .

# Exclude specific directories
cloc . --exclude-dir=node_modules,venv,.git,dist,build

# Output to different formats
cloc . --json > code_stats.json
cloc . --csv > code_stats.csv

The cloc tool recognizes file types automatically and reports, for each language, the number of files, blank lines, comment lines, and code lines.

Project-Specific Analysis

Create a project analysis script that adapts to your specific needs:

#!/bin/bash

PROJECT_DIR="${1:-.}"
PROJECT_DIR="${PROJECT_DIR%/}"   # drop a trailing slash so the exclusion patterns line up
EXCLUDE_DIRS=("node_modules" "venv" ".git" "dist" "build" "__pycache__" ".next" "vendor")
SOURCE_EXTS=(py js ts jsx tsx html css java cpp c go rs php)

echo "Analyzing project: $PROJECT_DIR"
echo "================================"

# Build the find arguments as arrays so no eval or hand-written quoting is needed.
# Exclusions are anchored at PROJECT_DIR rather than "./" so they still match
# when the script is pointed at another directory.
exclude_args=()
for dir in "${EXCLUDE_DIRS[@]}"; do
    exclude_args+=(-not -path "$PROJECT_DIR/$dir/*")
done

# Produces: -name "*.py" -o -name "*.js" -o ...
name_args=()
for ext in "${SOURCE_EXTS[@]}"; do
    name_args+=(-o -name "*.$ext")
done
name_args=("${name_args[@]:1}")   # drop the leading -o

# Count total files
total_files=$(find "$PROJECT_DIR" -type f "${exclude_args[@]}" | wc -l)
echo "Total files: $total_files"

# Count source code files
source_files=$(find "$PROJECT_DIR" -type f \( "${name_args[@]}" \) "${exclude_args[@]}" | wc -l)
echo "Source code files: $source_files"

# Count total lines in source files
total_lines=$(find "$PROJECT_DIR" -type f \( "${name_args[@]}" \) "${exclude_args[@]}" -exec cat {} + | wc -l)
echo "Total lines of code: $total_lines"

# Calculate average lines per file (integer division)
if (( source_files > 0 )); then
    avg_lines=$((total_lines / source_files))
    echo "Average lines per file: $avg_lines"
fi

These methods give you flexibility to count lines of code according to your specific project requirements, whether you need simple totals or detailed analysis with exclusions and filtering.

Published: 2024-04-12|Last updated: 2024-04-12T16:00:00Z
