As we reach the end of September 2024, ThreadFix version 3.x on-premises has officially reached its End-of-Life. This version of the product will no longer receive support or updates. We have fully transitioned our product and development teams to focus on ThreadFix SaaS and on migrating all customers over from the on-premises versions. Our Customer Success and Support teams are here to help you migrate to ThreadFix SaaS and maximize the value you see from this improved offering from Coalfire. This is the next phase of ThreadFix, and our team looks forward to continuing to support you on this journey.

Configurations and Tuning Guide

You will learn

About the various service configurations and tuning options in ThreadFix.

Prerequisites

Audience: IT Professional
Difficulty: Advanced
Time needed: Approximately 25 minutes
Tools required: N/A

General Recommendations

Recommendation: For large-scale deployments with more than a few thousand applications, contact ThreadFix Support for involvement in the deployment and configuration process.
Keywords: new deployment, optimal configuration
Importance: High

Recommendation: Prior to bringing the ThreadFix deployment down or restarting it, allow scans in the ingestion pipeline to flush and finish processing without interruption.
Keywords: restarting, upgrades, applying config changes, maintenance routines
Importance: Critical

Recommendation: Monitor average/peak CPU and RAM utilization on the database and application servers during peak scan ingestion activity (a container-level spot-check example follows these recommendations). Under-allocated resources can hinder the performance and stability of the ingestion pipeline.

If the database server resources are not sufficient, and increasing them is not an option, consider scaling down the scan ingestion services, especially the data (writer) service.

It is recommended to start with slightly over-allocated resources. Allocation can then be optimized after monitoring resource utilization in the environment under typical/average activity and load.
Keywords: resource allocation, utilization, and monitoring
Importance: Critical

Recommendation: Scan/Application/Team delete jobs acquire global locks that block other jobs while processing, so it is recommended to run these jobs during scheduled maintenance hours.
Keywords: scan ingestion throughput hindrance, maintenance routines
Importance: High
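To spot-check container-level CPU and RAM utilization during peak ingestion, the Docker CLI can be used. This is a minimal sketch assuming the ThreadFix services run under Docker Compose on Docker Engine; sustained monitoring should use your existing monitoring stack, and database-server utilization should be checked with the tooling appropriate to your database host.

  # One-time snapshot of CPU and memory usage for all running containers
  docker stats --no-stream

  # Stream live usage continuously (press Ctrl+C to stop)
  docker stats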

Configuration Name/Detail

Services


Overview

 

Importer Service

Vulnerability Ingestion Processor (VIP Service)

Data Writer Service

AppSec Core/Main Application

 

Contains 2 consumers that process the following:

Raw Scan File Consumer
Parses raw scan files, saves the parsed data to the staging storage, and places the scan on the pipeline for further processing and ingestion.

Pending Scan Statuses:

  • Queued for Import → Importing

  • Queued for Processing

Remote Provider Import Request Consumer
Handles import requests, initiated by a user or a scheduled job, to import scans for a single app or to bulk import scans for all mapped apps of a remote provider connection.

For a bulk import, this request imports new scans for all mapped Remote Provider Apps sequentially.

Remote Provider Import Request Statuses:

  • Queued → Processing → Finished/Failed

Remote Provider Application Import Attempt Statuses

 

Pending Scan Statuses:

  • Queued for Processing: a PendingScan is created with this start status after the scan is received and imported from the external Remote Provider.

 

Essentially, both of these consumers produce a parsed and normalized scan, which is stored in the staging storage (MinIO), while some metadata is written to the database.

Consumes a parsed and normalized scan

  • Deduplicates findings within the scan file

  • Deduplicates against application channel history and carries over updates

  • Merges scan findings with other findings and vulnerabilities application-wide. Also identifies and creates new vulnerabilities.

    Findings that report the same risk are clustered and merged into ThreadFix Vulnerabilities.

  • Processed results are produced to Kafka to be picked up by the Data Writer Service where they get written to the database in efficient batches.

Pending Scan Statuses:

  • Queued for Processing → Processing

  • Queueing for Ingestion

  • Queued for Ingestion

 

  • Handles the majority of database writes for scan ingestion

  • Includes a process to reconcile application vulnerability data and statistics

  • Rolls back and cleans up failed pending scans.

Pending Scan Statuses:

  • Queued for Ingestion → Queued for Reconcile

  • Reconciling

  • Completed

 

~ Application Threads
Average threads a single instance of the application/service utilizes at its peak. This doesn’t mean all threads will be utilized all the time.

Increasing available CPU cores can lead to better performance for services that have more processing threads/consumers.

**Does not include the Kafka consumer’s background heartbeat thread(s).

~2

~1

** Does not include additional threads that can be utilized by Kafka's asynchronous producers, which produce Processed Finding and Vulnerability results to Kafka for database ingestion.

~20

 

Docker Compose service name/overrides

appsec-importer:
appsec-vip:
appsec-data:

K8 service name/overrides

Max Processing Time

This translates to the Kafka consumer max.poll.interval.ms configuration, which dictates how long a message or job can take to process.

Increasing this time may cause consumers/workers to take longer to rebalance, especially when scaling up and down in a busy system/full pipeline.

TF Default: 2 hours

Consider increasing this to allow Remote Provider Bulk Import Requests to run for longer periods and to allow importing scans for all configured and mapped Remote Provider Applications.

Increase it if either of the following applies to a connection configuration:
a) It has more than 1000 Remote Provider Apps with small scan data (a typical scan has around 350 findings).
b) A bulk import takes longer than 2 hours.

TF Default: 6 minutes

Consider increasing this if very large scans are processed frequently and the “Processing” stage needs more time to complete successfully.

Kafka Default: 5 minutes

TF Default: 2 hours

Docker Compose Env Config

APPSEC_IMPORTER_MAX_PROCESSING_TIME_MS=7200000

APPSEC_VIP_MAX_PROCESSING_TIME_MS=360000

Override the Kafka max.poll.interval.ms for this service only if truly necessary.

Increasing it is not recommended; most operations by this service run quickly and efficiently and should not need more than a few seconds at most.
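As an illustration of how these environment variables could be supplied, a sketch of a Docker Compose override file is shown below. The file name and layout are assumptions; merge the variables into whatever compose file or environment mechanism your deployment already uses.

  # docker-compose.override.yml (hypothetical file name; adapt to your deployment)
  services:
    appsec-importer:
      environment:
        # Allow Remote Provider bulk import requests to run for up to 2 hours (value in milliseconds)
        - APPSEC_IMPORTER_MAX_PROCESSING_TIME_MS=7200000
    appsec-vip:
      environment:
        # Allow the Processing stage up to 6 minutes per scan (value in milliseconds)
        - APPSEC_VIP_MAX_PROCESSING_TIME_MS=360000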

K8 Env Config

Kafka Partition Count Configs

The number of partitions for the Kafka topics a service consumes from and produces messages/data to. Partitions allow for concurrency when services are scaled to process messages in parallel.

Important

  • The current recommended default to start with is between 16 - 20 partitions across the board.

  • In the interest of simplicity, ThreadFix currently recommends keeping partition counts close to the same value for the different partition configurations across the board.

  • Partition counts should only ever be increased; Kafka does not support reducing the partition count of an existing topic.

  • Allow the ingestion pipeline to clear and settle down before bringing the system down and changing partition configurations.

Docker Compose Env Config

APPSEC_VIP_MIN_PARTITION_COUNT=10

APPSEC_DATA_MIN_PARTITION_COUNT=10
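To verify the partition counts actually in effect on the broker, the standard Kafka CLI can be used. This is a sketch only; the bootstrap address is an assumption, and the ThreadFix topic names depend on your deployment and are not listed in this guide.

  # Run from a host or container that has the Kafka CLI tools and can reach the broker
  kafka-topics.sh --bootstrap-server localhost:9092 --list
  kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic <topic-name>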

 

K8 Env Config

 

 

Scaling Guide

 

When to scale?

 

Important

  • Scaling a service/consumer to N replicas requires the inbound Kafka topic partition counts to be at least N. See the partition configuration guide.

  • Use this report to review the current ingestion throughput percentiles and identify if improving throughput is desired.
    Scan Ingestion Percentile Throughput Report 

  • The following query report provides a breakdown of the average time spent in each PendingScan stage. This can be leveraged to identify pipeline bottlenecks and which service may need to be scaled accordingly.
    Average Scan Ingestion Stage Time Breakdown Report

A. Remote Provider Imports (Importer Service)
Imports for different Remote Provider Connection Configurations can run concurrently if the Importer Service is scaled to match the desired concurrency.

Ideally, the number of Importer Service instances (and thus the Remote Provider import concurrency) matches the number of Remote Provider connection configurations. Scaling beyond this number will likely result in idle services/consumers.

A bulk import for a single Remote Provider connection configuration is picked up and processed by a single Importer Service. The Bulk Import job will sequentially import scans for each mapped app and drop them on the ingestion pipeline.

B. Scan File Imports (Importer Service)
Scale the Importer Service if reducing time spent in the following Pending Scan stage is desired:

  • Queued for Import (Scan File)

Scale the VIP Service if reducing time spent in the following Pending Scan stage is desired:

  • Queued for Processing

 

Scale the Data Writer Service if reducing time spent in the following stages is desired:

  • Database Ingestion Time

  • Queued for Reconcile (conditional)

Warnings
Monitor average/peak CPU and RAM utilization on the database server during peak scan ingestion activity. An under-resourced database server can hinder the performance and stability of the ingestion pipeline.

If the database server resources aren’t enough, and increasing them isn’t an option, consider scaling down the number of data writer services.

 

 

 

AppSec Core/Main Application: limited to 1 instance.

Docker Compose Scaling Command

docker-compose scale appsec-importer=2

docker-compose scale appsec-vip=4

docker-compose scale appsec-data=4
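On newer Docker Compose releases where the standalone scale command has been deprecated, the equivalent invocation (an assumption about your Compose version; the service names and replica counts are taken from the commands above) is:

  docker-compose up -d --scale appsec-importer=2 --scale appsec-vip=4 --scale appsec-data=4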

K8 Scaling Command
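For Kubernetes deployments, the analogous operation is sketched below. The Deployment names and namespace are placeholders; use the names from your actual Helm chart or manifests, and keep replica counts within the partition limits described above.

  # Scale the VIP and data writer workloads (names and namespace are placeholders)
  kubectl scale deployment <vip-deployment-name> --replicas=4 -n <threadfix-namespace>
  kubectl scale deployment <data-writer-deployment-name> --replicas=4 -n <threadfix-namespace>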


www.threadfix.it | www.coalfire.com
Copyright © 2024 Coalfire. All rights reserved.

This document is classified Coalfire - Public: Distribution of this material is not limited.