Case Study: SQL Injection in Java/Spring
A SAST tool flags a line of code for potential SQL Injection. Here's a checklist for determining whether it's a false positive:
1. Trace the Data Source: Is there any user input?
The most crucial question. If the data used in the query is entirely static, hardcoded, or system-generated (e.g., `System.currentTimeMillis()`), it cannot be influenced by a user and is not a vulnerability.
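For illustration, a hedged sketch contrasting the two cases; the class, table names, and servlet import are assumptions rather than code from any real finding:

```java
import jakarta.servlet.http.HttpServletRequest; // javax.servlet on pre-Boot-3 stacks

public class TaintExamples {

    // Not a vulnerability: the value is entirely system-generated,
    // so no user can influence the SQL that results.
    String auditSql() {
        return "INSERT INTO audit_log (created_at) VALUES ("
                + System.currentTimeMillis() + ")";
    }

    // Genuinely dangerous: request parameters are attacker-controlled,
    // so this concatenation is a real injection point.
    String searchSql(HttpServletRequest request) {
        return "SELECT * FROM users WHERE name = '"
                + request.getParameter("name") + "'";
    }
}
```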
2. Check for Parameterized Queries (The Right Way)
SAST tools are good, but sometimes they miss the context of modern frameworks. A query that looks like string concatenation might actually be safe if the framework handles parameterization under the hood.
SAST Flag (Potentially Vulnerable):
```java
String query = "SELECT * FROM users WHERE status = '" + status + "'";
```
Investigation:
- Is it Spring Data JPA? If the query is defined in a repository interface with `@Query("...")` and uses named (`:status`) or positional (`?1`) parameters, Spring binds the method arguments as query parameters rather than concatenating them. This is safe.
- Is it `JdbcTemplate`? If `JdbcTemplate` is used with `?` placeholders and arguments are passed separately, it is safe. Example: `jdbcTemplate.query(query, new Object[]{status}, ...)`.
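For reference, a minimal sketch of both safe patterns; the `User` entity, `UserRepository`, and `UserDao` are hypothetical:

```java
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;

// 'User' is an assumed JPA entity mapped to the users table.

// Spring Data JPA: ":status" is bound as a query parameter,
// never concatenated into the JPQL string.
interface UserRepository extends JpaRepository<User, Long> {
    @Query("SELECT u FROM User u WHERE u.status = :status")
    List<User> findByStatus(@Param("status") String status);
}

// JdbcTemplate: the "?" placeholder keeps user input out of the SQL text.
class UserDao {
    private final JdbcTemplate jdbcTemplate;

    UserDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    List<User> findByStatus(String status) {
        return jdbcTemplate.query(
                "SELECT * FROM users WHERE status = ?",
                new BeanPropertyRowMapper<>(User.class),
                status);
    }
}
```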
3. Analyze the Input's Constraints: Is the input strictly controlled?
Even if user input is part of a query, the risk can be mitigated if the input is strictly validated against a known-safe allowlist.
- Enum Validation: If the `status` variable from the example above is validated against an Enum (`Status.ACTIVE`, `Status.INACTIVE`), an attacker cannot inject arbitrary values, and the query is safe in this context (see the sketch after this list).
- Strict Regex: If an input must match a rigid pattern (e.g., a 5-digit zip code `^[0-9]{5}$`), the potential for malicious injection is effectively eliminated.
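Here is a minimal sketch of the enum case; the `Status` enum and the surrounding class are hypothetical:

```java
public class StatusFilter {

    enum Status { ACTIVE, INACTIVE }

    String buildQuery(String statusParam) {
        // valueOf throws IllegalArgumentException for any value that is
        // not exactly ACTIVE or INACTIVE, so no attacker payload survives.
        Status validated = Status.valueOf(statusParam.toUpperCase());

        // Concatenation is still poor style, but with input constrained
        // to two known-safe constants it is not injectable.
        return "SELECT * FROM users WHERE status = '" + validated.name() + "'";
    }
}
```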
Case Study: "Traffic is not Encrypted (HTTP)"
This is a classic example of a SAST tool lacking architectural context. A tool might see an internal service call like `http://product-service:8080` and flag it, but in modern cloud environments, this is often a false positive.
1. Where is TLS Termination Handled?
In most production systems, encryption is handled outside the application. The application code itself communicates over HTTP within a secure, private network, while an upstream component handles all public-facing HTTPS traffic.
User (HTTPS) → Load Balancer / Gateway → Service Mesh / Reverse Proxy → Application (HTTP)
TLS is terminated at the Load Balancer or Gateway. All internal traffic from that point on is within a trusted network.
2. How to Verify HTTPS is Enforced
Instead of looking at the code, test the running application's behavior from an external perspective.
- Use `curl`: Run `curl -I http://your-production-site.com`. You should see a `301` or `302` redirect to the `https://` version in the `Location:` header. To follow the redirect, use `curl -IL http://your-production-site.com` and verify the final response is a `200 OK` from the HTTPS URL.
- Browser Dev Tools: Open the Network tab in your browser's developer tools. Type `http://your-production-site.com` into the address bar. The first entry should be a `301` redirect, and all subsequent requests should be to the `https://` version.
- Check Infrastructure Configuration: Review the configuration for your Cloud Load Balancer (GCP, AWS, Azure), Ingress Controller (Kubernetes), or reverse proxy (NGINX) to confirm it is configured to redirect all HTTP traffic to HTTPS.
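If you want to automate the `curl` check, here is a hedged sketch using the `java.net.http` client built into Java 11+; the hostname is a placeholder:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HttpsRedirectCheck {
    public static void main(String[] args) throws Exception {
        // Never follow redirects automatically, so we can inspect the 301 itself.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://your-production-site.com")) // placeholder host
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());

        int status = response.statusCode();
        String location = response.headers().firstValue("Location").orElse("");

        // Expect a 301/302 pointing at the https:// version of the site.
        boolean enforced = (status == 301 || status == 302)
                && location.startsWith("https://");
        System.out.println("HTTPS enforced: " + enforced
                + " (status=" + status + ", Location=" + location + ")");
    }
}
```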
Other Common False Positives
Cross-Site Scripting (XSS)
A SAST tool might flag data being rendered in a template without seeing that the framework provides automatic protection.
Investigation: Is this a modern front-end framework like React, Vue, or Angular? These frameworks automatically escape or sanitize data by default when rendering, mitigating most common XSS risks. The finding is likely a false positive unless `dangerouslySetInnerHTML` or a similar unguarded method is being used.
Hardcoded Secrets
Tools are aggressive about finding anything that looks like a key or password.
Investigation: Check the context. Is the flagged "secret" an example key in a comment or documentation? Is it part of a test file using mock or placeholder credentials? Is it a publicly known key for a test environment (e.g., a Stripe public test key)? True positives are secrets that control access to production data or systems.
Leveraging File Path Analysis for Triage
Before you even start random sampling, you can often eliminate a huge number of findings by analyzing the file paths from your SAST tool's data export. A file's location provides critical context about its role and risk level.
1. Identify and Exclude Non-Production Code
Many findings reside in code that is not part of the production application. A quick analysis of the file paths can help you identify and bulk-close these irrelevant alerts.
Test Code
Findings in unit, integration, or end-to-end tests are generally low-risk. Look for common test directories:
- `src/test/java/` (Java)
- `tests/`, `__tests__/` (Python, JS)
- `spec/` (Ruby)
- `*_test.go` (Go files)
Build & CI/CD Scripts
Code for pipelines might contain test credentials or configurations that are not secrets in a production context.
- `.github/workflows/`
- `Jenkinsfile`
- `azure-pipelines.yml`
- `scripts/`
Third-Party Libraries
You shouldn't be responsible for fixing findings in vendor code; report these to the vendor and suppress them. Build output directories, which often contain compiled or bundled copies of dependencies, fall in the same bucket:
- `vendor/` (Go, PHP)
- `node_modules/` (JavaScript)
- `target/` (Maven build output)
- `build/` (Gradle build output)
Documentation & Examples
Code in these folders is illustrative and not part of the deployed application.
- `docs/`
- `examples/`
- `samples/`
- `demo/`
2. The Workflow
The process is straightforward and can be done with simple scripts or even spreadsheet software (a scripted sketch of the categorization step follows this list):
- Export All Findings: Get a complete list from your SAST tool, ensuring the output includes the full file path for each finding. A CSV format is ideal.
- Filter and Categorize: Use filtering rules based on the path patterns above to tag each finding. For example, create a new column called "Category" and label findings as "Test", "Build", "Vendor", etc.
- Prioritize the Rest: By filtering out these categories, your list of findings is now much smaller and focused exclusively on production source code. You can now apply statistical sampling to this reduced dataset for a more accurate analysis.
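As one way to script the Filter and Categorize step, here is a hedged sketch that tags each finding by its file path; the `findings.csv` name and layout (file path in the first column) are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

public class TriageByPath {

    // Path fragments that usually indicate non-production code.
    private static final Map<String, String> CATEGORIES = Map.of(
            "src/test/", "Test",
            "tests/", "Test",
            ".github/workflows/", "Build",
            "scripts/", "Build",
            "vendor/", "Vendor",
            "node_modules/", "Vendor",
            "docs/", "Docs",
            "examples/", "Docs");

    public static void main(String[] args) throws IOException {
        // Assumed layout: one finding per line, file path in the first column.
        List<String> lines = Files.readAllLines(Path.of("findings.csv"));
        for (String line : lines) {
            String path = line.split(",")[0];
            String category = CATEGORIES.entrySet().stream()
                    .filter(e -> path.contains(e.getKey()))
                    .map(Map.Entry::getValue)
                    .findFirst()
                    .orElse("Production");
            // Emit the original row with its triage category appended.
            System.out.println(line + "," + category);
        }
    }
}
```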
Tackling High-Volume Findings with Statistical Sampling
What if a SAST tool generates thousands of instances of the same finding across a large codebase? Reviewing each one is impractical. Instead, we can use statistical sampling to analyze a smaller subset and make a highly confident, data-driven decision about the entire population.
Sample Size Calculator
[Interactive calculator: given the total number of findings, a confidence level, and a margin of error, it reports the required sample size to review (278 in the example below).]
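Such calculators typically implement Cochran's formula with a finite population correction. A minimal sketch, assuming a population of 1,000 findings, which reproduces the 278 figure:

```java
public class SampleSize {

    // Cochran's formula with a finite population correction.
    // z = 1.96 for a 95% confidence level; e is the margin of error.
    static long required(long population, double z, double e) {
        double p = 0.5; // most conservative assumed proportion
        double n0 = (z * z * p * (1 - p)) / (e * e);
        double n = n0 / (1 + (n0 - 1) / population);
        return (long) Math.ceil(n);
    }

    public static void main(String[] args) {
        // 95% confidence, 5% margin of error, 1,000 total findings (assumed).
        System.out.println(required(1_000, 1.96, 0.05)); // prints 278
    }
}
```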
The Approach
- Define Your Parameters: Use the calculator to determine your sample size. A 95% confidence level with a 5% margin of error is a common industry standard. This means you can be 95% sure that the results from your sample reflect the entire set of findings, plus or minus 5%.
- Select a Random Sample: Export the list of all findings. Use a script or even a spreadsheet's random function to select the required number of findings randomly. It is critical that the sample is random and not just the "first N" findings (see the sketch at the end of this section).
- Analyze the Sample: Manually investigate each finding in your random sample, classifying it as a "True Positive" or "False Positive" based on the techniques described on this page.
- Extrapolate the Results: Calculate the false positive rate from your sample. For example, if you reviewed 278 findings and found that 270 were false positives, your false positive rate is ~97%.
- Make an Informed Decision: With a high false positive rate (e.g., >95%) backed by statistical data, you can now confidently recommend closing all findings of this type with a clear justification. This saves hundreds of hours of developer and security time.
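For the Select a Random Sample step, a minimal sketch; the finding IDs are hypothetical stand-ins for your tool's export:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomSample {
    public static void main(String[] args) {
        // Stand-in for the exported finding IDs.
        List<String> findings = new ArrayList<>();
        for (int i = 1; i <= 5000; i++) findings.add("FINDING-" + i);

        // Shuffle with a fixed seed so the selection is reproducible for audit.
        Collections.shuffle(findings, new Random(42));

        // Take the first 278 after shuffling: a uniform random sample,
        // not just the "first N" rows of the export.
        List<String> sample = findings.subList(0, 278);
        System.out.println(sample.size() + " findings selected for review");
    }
}
```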