Previously I covered using Talisman to scan code and check whether anyone had left a secret in the repo.
Today we will look at TruffleHog:
https://github.com/trufflesecurity/trufflehog

The project has quite a lot of stars and forks.
root@LP11-D7891:~# docker run --rm -it -v "$PWD:/pwd" trufflesecurity/trufflehog:latest --help
usage: TruffleHog [<flags>] <command> [<args> ...]

TruffleHog is a tool for finding credentials.

Flags:
      --help                     Show context-sensitive help (also try --help-long and --help-man).
      --debug                    Run in debug mode.
      --trace                    Run in trace mode.
      --profile                  Enables profiling and sets a pprof and fgprof server on :18066.
  -j, --json                     Output in JSON format.
      --json-legacy              Use the pre-v3.0 JSON format. Only works with git, gitlab, and github sources.
      --github-actions           Output in GitHub Actions format.
      --concurrency=8            Number of concurrent workers.
      --no-verification          Don't verify the results.
      --only-verified            Only output verified results.
      --filter-unverified        Only output first unverified result per chunk per detector if there are more than one results.
      --config=CONFIG            Path to configuration file.
      --print-avg-detector-time  Print the average time spent on each detector.
      --no-update                Don't check for updates.
      --fail                     Exit with code 183 if results are found.
      --verifier=VERIFIER ...    Set custom verification endpoints.
      --archive-max-size=ARCHIVE-MAX-SIZE
                                 Maximum size of archive to scan. (Byte units eg. 512B, 2KB, 4MB)
      --archive-max-depth=ARCHIVE-MAX-DEPTH
                                 Maximum depth of archive to scan.
      --archive-timeout=ARCHIVE-TIMEOUT
                                 Maximum time to spend extracting an archive.
      --include-detectors="all"  Comma separated list of detector types to include. Protobuf name or IDs may be used, as well as ranges.
      --exclude-detectors=EXCLUDE-DETECTORS
                                 Comma separated list of detector types to exclude. Protobuf name or IDs may be used, as well as ranges. IDs defined here take precedence over the include list.
      --version                  Show application version.

Commands:
  help [<command>...]
    Show help.
  git [<flags>] <uri>
    Find credentials in git repositories.
  github [<flags>]
    Find credentials in GitHub repositories.
  gitlab --token=TOKEN [<flags>]
    Find credentials in GitLab repositories.
  filesystem [<flags>] [<path>...]
    Find credentials in a filesystem.
  s3 [<flags>]
    Find credentials in S3 buckets.
  gcs [<flags>]
    Find credentials in GCS buckets.
  syslog [<flags>]
    Scan syslog
  circleci --token=TOKEN
    Scan CircleCI
docker run --rm -it -v "$PWD:/pwd" trufflesecurity/trufflehog:latest git file:///pwd
docker run --rm -it -v "$PWD:/pwd" trufflesecurity/trufflehog:latest filesystem /pwd
The old version, on the other hand, is very good at scanning for high-entropy strings.
Here is the Dockerfile:
FROM python:3-alpine
RUN apk add --no-cache git && pip install gitdb2==3.0.0 trufflehog
RUN adduser -S truffleHog
USER truffleHog
WORKDIR /proj
ENTRYPOINT [ "trufflehog" ]
CMD [ "-h" ]
docker run -v $BITBUCKET_CLONE_DIR:/target mrnim94/trufflehog:v2 file:///target

With that, we have detected a private key.
But it also has a problem:

It also flags strings from go.mod as secrets, which is an issue in its own right.
In the context of TruffleHog, high entropy is a measure used to identify potential secrets, such as passwords or API keys, in the code.
When TruffleHog talks about high entropy, it’s referring to a string of text that has a lot of randomness and might be a secret. It calculates the entropy of a string using the Shannon Entropy formula, which is a common method for measuring information entropy.
===> TruffleHog uses the Shannon entropy formula to judge whether a string might be a secret.
=====> This approach usually gives us only a broad overview, not a definitive answer.
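To make the idea concrete, here is a minimal Python sketch of Shannon entropy over the characters of a string. It is not TruffleHog's actual code: the function names and the threshold of 4.0 are my own assumptions for illustration, and the sample strings are made up.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = sum over distinct characters of -p * log2(p)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical cut-off, chosen only for this illustration.
THRESHOLD = 4.0


def looks_random(text: str, threshold: float = THRESHOLD) -> bool:
    """Flag a string whose entropy is above the threshold."""
    return shannon_entropy(text) > threshold


if __name__ == "__main__":
    # Made-up samples: a repetitive string and a string of 32 distinct characters.
    for sample in ("aaaaaaaaaaaaaaaa", "abcdefghijklmnopqrstuvwxyz012345"):
        print(sample, round(shannon_entropy(sample), 3), looks_random(sample))
```

A check like this only measures randomness, so any long hash-like string scores high whether or not it is a secret. That is consistent with the go.mod false positives mentioned above.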
To deal with this, use --exclude-paths:
Path to file with newline separated regexes for files to exclude in scan.
docker run --rm -it -v "$PWD:/pwd" trufflesecurity/trufflehog:latest git --help
usage: TruffleHog git [<flags>] <uri>

Find credentials in git repositories.

Flags:
      --help                     Show context-sensitive help (also try --help-long and --help-man).
      --debug                    Run in debug mode.
      --trace                    Run in trace mode.
      --profile                  Enables profiling and sets a pprof and fgprof server on :18066.
  -j, --json                     Output in JSON format.
      --json-legacy              Use the pre-v3.0 JSON format. Only works with git, gitlab, and github sources.
      --github-actions           Output in GitHub Actions format.
      --concurrency=8            Number of concurrent workers.
      --no-verification          Don't verify the results.
      --only-verified            Only output verified results.
      --filter-unverified        Only output first unverified result per chunk per detector if there are more than one results.
      --config=CONFIG            Path to configuration file.
      --print-avg-detector-time  Print the average time spent on each detector.
      --no-update                Don't check for updates.
      --fail                     Exit with code 183 if results are found.
      --verifier=VERIFIER ...    Set custom verification endpoints.
      --archive-max-size=ARCHIVE-MAX-SIZE
                                 Maximum size of archive to scan. (Byte units eg. 512B, 2KB, 4MB)
      --archive-max-depth=ARCHIVE-MAX-DEPTH
                                 Maximum depth of archive to scan.
      --archive-timeout=ARCHIVE-TIMEOUT
                                 Maximum time to spend extracting an archive.
      --include-detectors="all"  Comma separated list of detector types to include. Protobuf name or IDs may be used, as well as ranges.
      --exclude-detectors=EXCLUDE-DETECTORS
                                 Comma separated list of detector types to exclude. Protobuf name or IDs may be used, as well as ranges. IDs defined here take precedence over the include list.
      --version                  Show application version.
  -i, --include-paths=INCLUDE-PATHS
                                 Path to file with newline separated regexes for files to include in scan.
  -x, --exclude-paths=EXCLUDE-PATHS
                                 Path to file with newline separated regexes for files to exclude in scan.
      --exclude-globs=EXCLUDE-GLOBS
                                 Comma separated list of globs to exclude in scan. This option filters at the `git log` level, resulting in faster scans.
      --since-commit=SINCE-COMMIT
                                 Commit to start scan from.
      --branch=BRANCH            Branch to scan.
      --max-depth=MAX-DEPTH      Maximum depth of commits to scan.
      --allow                    No-op flag for backwards compat.
      --entropy                  No-op flag for backwards compat.
      --regex                    No-op flag for backwards compat.

Args:
  <uri>  Git repository URL. https://, file://, or ssh:// schema expected.
First, create a file named trufflehog-exclude_paths.txt.
I want to skip scanning this file: infra-structure/aws/62277851XXXX/prod-engines-us-east-2/coralogix-cloudwatchexporter/variables.tf
The content of the exclude file will be:
.*encrypted_file.*
Each line must be written in a syntax known as a regular expression, or regex.
Then run the command:
docker run -v $BITBUCKET_CLONE_DIR:/target mrnim94/trufflehog:v2 -x /target/trufflehog-exclude_paths.txt file:///target
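Before running the scan, it can help to check locally which paths a pattern file would actually exclude. The sketch below is only an approximation in Python: I am assuming each non-empty line is compiled as a regex and matched from the start of the file path (`re.match`), which is how I understand the old version to behave. The helper names, the `secrets/encrypted_file.json` path and the second pattern are hypothetical.

```python
import re


def load_patterns(text: str):
    """Compile one regex per non-empty line, like a newline-separated exclude file."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def is_excluded(path: str, patterns) -> bool:
    """Assumption: a path is excluded when any pattern matches from its start."""
    return any(p.match(path) for p in patterns)


# Content of trufflehog-exclude_paths.txt as shown in this post.
patterns = load_patterns(".*encrypted_file.*\n")

# The file I want to skip, copied from above.
target = (
    "infra-structure/aws/62277851XXXX/prod-engines-us-east-2/"
    "coralogix-cloudwatchexporter/variables.tf"
)

if __name__ == "__main__":
    # Hypothetical path that contains "encrypted_file".
    print(is_excluded("secrets/encrypted_file.json", patterns))  # True
    # The variables.tf path does not contain "encrypted_file".
    print(is_excluded(target, patterns))  # False

    # Hypothetical pattern that names the variables.tf path directly.
    by_path = load_patterns(r".*coralogix-cloudwatchexporter/variables\.tf$")
    print(is_excluded(target, by_path))  # True
```

Under these assumptions, `.*encrypted_file.*` on its own would not exclude the variables.tf file, so a line naming that path would need to be added to the exclude file.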