CodeQL
Master CodeQL for deep semantic code analysis. Learn the query language, run security queries, and integrate with GitHub Advanced Security.
CodeQL is GitHub's semantic code analysis engine. Unlike pattern-based tools, CodeQL treats code as data—you query it using a specialized language to find complex vulnerabilities that simpler tools miss.
How CodeQL Works
CodeQL operates in two phases:
1. Database Creation
Source Code → CodeQL Extractor → CodeQL Database
2. Query Execution
CodeQL Database + Queries → Results
Database creation parses your code into a relational database containing:
- Abstract Syntax Tree (AST) nodes
- Data flow information
- Control flow graphs
- Type information
- Call graphs
Query execution runs QL queries against this database to find patterns.
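To make the "code as data" idea concrete, here is a small Python sketch that uses the standard `ast` module as a stand-in for the extractor. It is only an analogy, not how CodeQL is implemented: the `extract_calls` and `query_calls` helpers and the row layout are hypothetical, chosen to mirror the two phases above.

```python
import ast

SOURCE = '''
import os

def run(cmd):
    os.system(cmd)
    return eval(cmd)
'''

def extract_calls(source: str) -> list[dict]:
    """Phase 1 (toy 'database creation'): flatten call nodes into rows."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            rows.append({"callee": ast.unparse(node.func), "line": node.lineno})
    return rows

def query_calls(rows: list[dict], callee: str) -> list[int]:
    """Phase 2 (toy 'query execution'): select the rows matching a condition."""
    return sorted(row["line"] for row in rows if row["callee"] == callee)

database = extract_calls(SOURCE)
print(query_calls(database, "eval"))
```

The point of the split is that extraction happens once, while any number of queries can then run against the same rows. A real CodeQL database also stores data flow, control flow, and type information that this toy version ignores.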
Installation
CodeQL CLI
# macOS
brew install codeql
# Download from GitHub
# https://github.com/github/codeql-cli-binaries/releases
# Verify installation
codeql --version
# CodeQL command-line toolchain release 2.16.0
Standard Query Packs
# Download standard libraries and queries
codeql pack download codeql/python-queries
codeql pack download codeql/javascript-queries
codeql pack download codeql/java-queries
Creating a CodeQL Database
Before running queries, you must create a database from your source code:
# Python project
codeql database create ./codeql-db \
  --language=python \
  --source-root=./src
# JavaScript/TypeScript project
codeql database create ./codeql-db \
  --language=javascript \
  --source-root=.
# Java project (requires build)
codeql database create ./codeql-db \
  --language=java \
  --command='mvn clean compile'
Note: For compiled languages (Java, C++, Go), CodeQL normally needs to observe the build, either through an explicit --command or through its autobuilder when no command is given. Interpreted languages (Python, JavaScript) don't need a build command.
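If you script database creation for several projects, it can help to build the argument list in one place. The sketch below is a hypothetical helper (`build_create_command` is not part of CodeQL) that uses only the flags shown above and returns the argv list instead of running anything.

```python
import shlex

def build_create_command(db_path, language, source_root=None, build_command=None):
    """Assemble a `codeql database create` invocation as an argv list."""
    argv = ["codeql", "database", "create", db_path, f"--language={language}"]
    if source_root is not None:
        argv.append(f"--source-root={source_root}")
    if build_command is not None:
        # Only needed when CodeQL has to observe a build (compiled languages)
        argv.append(f"--command={build_command}")
    return argv

python_cmd = build_create_command("./codeql-db", "python", source_root="./src")
java_cmd = build_create_command("./codeql-db", "java", build_command="mvn clean compile")

# shlex.join quotes arguments that contain spaces, so the output is copy-pasteable
print(shlex.join(python_cmd))
print(shlex.join(java_cmd))
```

Passing the list to `subprocess.run(argv, check=True)` would run it without a shell, which sidesteps quoting problems in the build command.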
Running Security Queries
Using Query Packs
# Run all security queries for Python
codeql database analyze ./codeql-db \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif
# Run specific query
codeql database analyze ./codeql-db \
  codeql/python-queries:Security/CWE-089/SqlInjection.ql \
  --format=sarif-latest \
  --output=sql-injection.sarif
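The SARIF output is plain JSON, so a few lines of Python are enough to get a quick overview before uploading it anywhere. The sketch below relies only on standard SARIF 2.1.0 fields (`runs`, `results`, `ruleId`, `locations`); the helper names and the inline `SAMPLE` data are made up for illustration and do not come from a real scan.

```python
import json
from collections import Counter

def load_sarif(path: str) -> dict:
    """Read a SARIF file such as the results.sarif written above."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def count_by_rule(sarif: dict) -> Counter:
    """Count results per rule ID across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            counts[result.get("ruleId", "<unknown>")] += 1
    return counts

def first_location(result: dict) -> str:
    """Format the first reported location as 'file:line' ('?' if missing)."""
    locations = result.get("locations") or [{}]
    physical = locations[0].get("physicalLocation", {})
    uri = physical.get("artifactLocation", {}).get("uri", "?")
    line = physical.get("region", {}).get("startLine", "?")
    return f"{uri}:{line}"

def _location(uri: str, line: int) -> dict:
    return {"physicalLocation": {"artifactLocation": {"uri": uri},
                                 "region": {"startLine": line}}}

# Synthetic sample, shaped like SARIF but not taken from a real analysis
SAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "results": [
            {"ruleId": "py/sql-injection", "locations": [_location("app/db.py", 42)]},
            {"ruleId": "py/sql-injection", "locations": [_location("app/api.py", 7)]},
            {"ruleId": "py/call-to-eval", "locations": []},
        ]
    }],
}

for rule, count in count_by_rule(SAMPLE).most_common():
    print(rule, count)
```

For a real scan, replace `SAMPLE` with `load_sarif("results.sarif")`.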
Query Suites
| Suite | Description | Use Case |
|---|---|---|
| security-extended.qls | Comprehensive security queries | Deep security audits |
| security-and-quality.qls | Security + code quality | General CI scanning |
| code-scanning.qls | GitHub Code Scanning defaults | PR checks |
Understanding the QL Language
QL is a declarative, object-oriented query language. Learning the basics helps you understand query results and write custom queries.
Basic Query Structure
/**
* @name Find calls to eval
* @description Detects calls to the eval function
* @kind problem
* @problem.severity warning
* @id py/call-to-eval
*/
import python
from Call call, Name name
where
call.getFunc() = name and
name.getId() = "eval"
select call, "Call to eval() detected"
Query components:
- import python: Import the CodeQL standard library for Python
- from ... where ... select: SQL-like query structure
- @kind problem: Query produces alert-style results
- @problem.severity: warning, error, or recommendation
Finding SQL Injection (Python)
/**
* @name SQL injection vulnerability
* @description User input flows to SQL query without sanitization
* @kind path-problem
* @problem.severity error
* @id py/sql-injection
*/
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts
module SqlInjectionConfig implements DataFlow::ConfigSig {
  // Sources: data arriving from remote users (for example, HTTP request parameters)
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }
  // Sinks: the SQL text passed to a modeled SQL execution call
  predicate isSink(DataFlow::Node sink) {
    exists(SqlExecution sql | sql.getSql() = sink)
  }
}
module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;
// Needed by @kind path-problem queries so the result paths can be rendered
import SqlInjectionFlow::PathGraph
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@", source.getNode(), "user input"
This query:
- Defines sources (user input from HTTP requests)
- Defines sinks (SQL execution functions)
- Uses taint tracking to find paths from sources to sinks
- Reports vulnerabilities with data flow paths
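For context, this is the kind of Python code such a query is meant to catch. The sketch below uses only the standard `sqlite3` module; the table, function names, and payload are illustrative. Note one assumption: here `name` is an ordinary function parameter, whereas CodeQL would only treat it as a source if it came from something modeled as remote input (for example, a web framework's request object).

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def find_user_unsafe(conn, name):
    # Tainted data is concatenated into the SQL text that reaches the sink
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Parameterized query: the input is passed as data, never as SQL text
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

conn = make_db()
payload = "x' OR '1'='1"  # stands in for attacker-controlled input

print(sorted(find_user_unsafe(conn, payload)))  # the WHERE clause is bypassed
print(find_user_safe(conn, payload))            # the payload is treated as a literal name
```

The fix is the same one the query's alert would point toward: keep user input out of the SQL string and pass it as a bound parameter.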
Common QL Concepts
Classes and predicates:
// Class representing calls to dangerous built-in functions
class DangerousCall extends Call {
  DangerousCall() {
    this.getFunc().(Name).getId() in ["eval", "exec", "compile"]
  }
}
// Predicate (reusable condition)
predicate isUserInput(Expr e) {
  exists(Call call |
    call.getFunc().(Attribute).getName() = "get" and
    call.getFunc().(Attribute).getObject().(Name).getId() in ["request", "args", "form"] and
    call = e
  )
}
Data flow:
// Track data from source to sink within a single function (local flow)
from DataFlow::Node source, DataFlow::Node sink
where
  source instanceof RemoteFlowSource and
  sink.asExpr() = any(Call c | c.getFunc().(Name).getId() = "eval").getAnArg() and
  DataFlow::localFlow(source, sink)
select sink, "User input flows to eval"
GitHub Code Scanning Integration
The easiest way to use CodeQL is through GitHub's Code Scanning feature, which is free for public repositories.
Enable Code Scanning
- Go to your repository on GitHub
- Click Security > Code scanning
- Click Set up code scanning
- Select CodeQL Analysis
- Review and commit the workflow
Default Workflow
# .github/workflows/codeql.yml
name: "CodeQL"
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1' # Weekly on Monday
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        language: ['python', 'javascript']
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: +security-and-quality
      - name: Autobuild
        uses: github/codeql-action/autobuild@v3
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{ matrix.language }}"
Custom Queries in CI
Add your own queries to the analysis:
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: ${{ matrix.language }}
    queries: +security-and-quality
    # Add custom query pack
    packs: my-org/my-queries@1.0.0
Or reference local queries:
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: python
    queries: ./codeql/custom-queries/
Writing Custom Queries
Query Pack Structure
my-queries/
├── qlpack.yml                  # Pack metadata
├── codeql-suites/
│   └── my-suite.qls            # Query suite definition
└── src/
    └── security/
        ├── HardcodedCredentials.ql
        └── InsecureRandomness.ql
qlpack.yml:
name: my-org/my-queries
version: 1.0.0
dependencies:
  codeql/python-all: "*"
Query suite (my-suite.qls):
- queries: .
- include:
    kind: problem
- include:
    kind: path-problem
Example: Detecting Hardcoded Secrets
/**
* @name Hardcoded credentials
* @description Credentials should not be hardcoded in source code
* @kind problem
* @problem.severity error
* @precision high
* @id py/hardcoded-credentials
* @tags security
* external/cwe/cwe-798
*/
import python
predicate isCredentialVariable(Name name) {
  name.getId().toLowerCase().regexpMatch(".*\\b(password|passwd|pwd|secret|api_key|apikey|token|auth)\\b.*")
}
from Assign assign, Name target, StrConst value
where
  assign.getATarget() = target and
  assign.getValue() = value and
  isCredentialVariable(target) and
  value.getText().length() > 4 // Ignore empty/short strings
select assign, "Potential hardcoded credential in variable '" + target.getId() + "'"
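Before relying on a name-matching pattern like this, it is worth checking which variable names it actually accepts. The sketch below mirrors the pattern with Python's `re` module, assuming that `\b` behaves the same way in QL's `regexpMatch` (Java-style regex matched against the whole string) as it does in Python; `is_credential_name` is a hypothetical helper, not part of the query.

```python
import re

CREDENTIAL_PATTERN = re.compile(
    r".*\b(password|passwd|pwd|secret|api_key|apikey|token|auth)\b.*"
)

def is_credential_name(name: str) -> bool:
    # fullmatch mirrors regexpMatch, which must match the entire string
    return CREDENTIAL_PATTERN.fullmatch(name.lower()) is not None

for candidate in ["password", "API_KEY", "db_password", "auth_token", "username"]:
    print(candidate, is_credential_name(candidate))
```

Because an underscore counts as a word character, `\b` does not fire between `_` and the keyword, so under this assumption names such as `db_password` or `auth_token` are not matched. If those should be flagged, the word boundaries would need to be relaxed, at the cost of more false positives.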
Testing Your Queries
CodeQL supports unit testing for queries:
my-queries/
└── test/
    └── security/
        └── HardcodedCredentials/
            ├── test.py                        # Test code
            ├── HardcodedCredentials.qlref     # Path to the query under test
            └── HardcodedCredentials.expected  # Expected results
test.py:
# Test file for hardcoded credentials detection
password = "secret123" # $result
api_key = "AKIA1234567890ABCDEF" # $result
username = "admin" # Safe - not a credential variable
safe_value = get_password_from_vault() # Safe - not a string literal
Run tests:
codeql test run my-queries/test/
Best Practices
1. Start with Standard Queries
CodeQL's built-in queries are well-tested. Customize only when you have specific needs.
2. Use Path Queries for Data Flow
Path queries (@kind path-problem) show the complete data flow from source to sink, making vulnerabilities easier to understand and fix.
3. Tune for Your Codebase
Exclude generated code and vendor directories:
# codeql-config.yml
paths-ignore:
- node_modules
- vendor
- "**/*.generated.py"
4. Cache Databases
CodeQL database creation is expensive. Cache it in CI:
- name: Cache CodeQL database
  uses: actions/cache@v3
  with:
    path: .codeql-db
    key: codeql-${{ hashFiles('**/*.py') }}
5. Review Alerts Systematically
GitHub Code Scanning shows alerts in the Security tab. Triage them:
- Dismiss with reason if false positive
- Create issue for real vulnerabilities
- Fix in PR for easy fixes
CodeQL vs. Other Tools
| Feature | CodeQL | Semgrep | SonarQube |
|---|---|---|---|
| Analysis depth | Deepest | Pattern-based | Deep |
| Speed | Slow | Fast | Medium |
| Custom rules | Complex (QL) | Easy (YAML) | Medium |
| Data flow | Excellent | Pro only | Good |
| Free for private repos | No (paid license) | Limited free tier | Community Edition |
| GitHub integration | Native | Good | Good |
Use CodeQL when:
- You need to find complex vulnerabilities
- You're using GitHub and want native integration
- You have time for thorough analysis (weekly scans)
- You're auditing security-critical code
GitHub Advanced Security Pricing
- Free for public repositories
- Paid for private repositories (GitHub Advanced Security license)
- Includes: Code Scanning, Secret Scanning, Dependency Review
Troubleshooting
Database Creation Fails
# Verbose output for debugging
codeql database create ./codeql-db \
  --language=python \
  --source-root=. \
  --verbosity=progress+++
Query Times Out
# Set a per-query evaluation timeout, in seconds
codeql database analyze ./codeql-db \
  codeql/python-queries:codeql-suites/python-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif \
  --timeout=1800  # 30 minutes
Too Many Results
# Switch to the default code-scanning suite, which favors higher-precision queries
codeql database analyze ./codeql-db \
  codeql/python-queries:codeql-suites/python-code-scanning.qls \
  --format=sarif-latest \
  --output=results.sarif \
  --sarif-add-snippets \
  --threads=4
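If you already have a large SARIF file, you can also narrow it down after the fact. The sketch below keeps only results whose rule is marked high precision. It assumes the SARIF file lists rule metadata under `tool.driver.rules` with a `properties.precision` value taken from the query's `@precision` tag; check your own output, since rules may be recorded elsewhere (for example under tool extensions). The helper name and sample data are hypothetical.

```python
def high_precision_results(sarif: dict, wanted=("high", "very-high")) -> list[dict]:
    """Return results whose rule metadata declares a precision in `wanted`."""
    kept = []
    for run in sarif.get("runs", []):
        # Assumption: rule metadata lives under tool.driver.rules
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        precision = {
            rule.get("id"): rule.get("properties", {}).get("precision")
            for rule in rules
        }
        for result in run.get("results", []):
            if precision.get(result.get("ruleId")) in wanted:
                kept.append(result)
    return kept

# Synthetic sample, not taken from a real analysis
SAMPLE = {
    "runs": [{
        "tool": {"driver": {"rules": [
            {"id": "py/sql-injection", "properties": {"precision": "high"}},
            {"id": "py/noisy-rule", "properties": {"precision": "low"}},
        ]}},
        "results": [
            {"ruleId": "py/sql-injection", "message": {"text": "kept"}},
            {"ruleId": "py/noisy-rule", "message": {"text": "dropped"}},
            {"ruleId": "py/unlisted-rule", "message": {"text": "dropped"}},
        ],
    }]
}

for result in high_precision_results(SAMPLE):
    print(result["ruleId"], result["message"]["text"])
```

Results whose rule has no recorded precision are dropped here; whether that is the right default depends on how much you trust unlisted rules.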
Key Takeaways
- CodeQL is the deepest analysis — Best for finding complex vulnerabilities
- Database creation is a separate step — Plan for build time
- QL is powerful but complex — Start with standard queries
- GitHub integration is seamless — Free for open source
- Combine with faster tools — Use Semgrep for pre-commit, CodeQL for weekly deep scans
You now have a comprehensive SAST toolkit: SonarQube for platform-wide visibility, Semgrep for fast custom rules, and CodeQL for deep semantic analysis. Layer these tools to catch vulnerabilities at every stage of development.