OSINT: Certificate Transparency Lists

NOTE: The wonderful people over at OSINT Curious wrote a similar blog post about this last year. The link is HERE. I recommend giving it a read if you want to know more about how to “hunt” using information from certificates.

It’s 2020 and I made myself a promise to try and write more blog posts about things so here is the first for the year. SSL/TLS certificates have been a great resource for a while now to identify, pivot from and hunt for infrastructure on the internet. The introduction of the Certificate Transparency Lists were first introduced by Google in 2013 after DigiNotar was compromised in 2011. It basically provides public logs of certificates issued by trusted certificate authorities and is a great source of intel for a number of functions (such as, but not limited to):

  • Detection of phishing websites
  • Brand/Reputation Monitoring
  • Tracking malicious infrastructure
  • Bug Bounties

For a long time newly registered/observed domains have been an indicator or potential malicious activity, for example if ebaythisisfake.com was registered you could guess that this was going to be a phishing website. The issue with watching for newly registered domains (it’s still a good indicator), is that a domain could be purchased and then not used for days/months/years.

The wonderful thing about using the CT lists is that the fact a certificate has been registered usually (probably 95% of the time) means that the infrastructure is active and in use, it also means that there is active DNS (required for requesting a certificate) for that domain and an IP address to investigate. It also allows you to catch when a legitimate domain has been compromised and is being used for malicious stuff, for example ebay.legitimatedomain.com which is why CT lists are great for catching phishing websites.

Since the initial launch of number of companies have started collecting the information published on the CT lists (why not it’s free), and either providing it as part of an existing service (Cyber Threat Intel), providing tools to look up specific domains (Facebook for example) or giving people access to the raw data in an easy to consume way (this is the one we are interested in).

It’s worth noting that while the CT lists are publicly available, the sheer volume of data generated by them means it’s much easier to use a feed from someone else than build your own, unless of course you want to..

For the rest of this post we are going to be focusing on one specific provider, which is from Calidog (https://certstream.calidog.io/), this company provide the data for free, and have a number of libraries for different programming languages and a command line tool as well.

The command line tool is python based so you can simply install it using Python’s PIP library (we use pip3 because Python 2.7 is end of life).

pip3 install certstream

Once it’s been installed you can then run “certstream” from the command line, running it without any options will just connect and start pulling down the firehose of certificate information.

certstream just being run from the command linecertstream just being run from the command line

Straight away you can start to see the certificates information flying across your terminal. The default option just shows a timestamp, certificate authority and the certificate common name that is being registered. OK its interesting but not very helpful (for our needs anyway). If you run “certstream — full” you will get the same information as well as “all domains” which are any other domain names assigned to that certificate.

certstream with the — full switchcertstream with the — full switch

This “all domains” information is really useful if you are for example tracking a domain or keyword that uses Cloudflare or similar platforms that provide SSL certificates. If you were to run certstream — full | grep cloudflare you would see all the new certificates for services behind Cloudflare.

certstream with — full and grepping for cloudflarecertstream with — full and grepping for cloudflare

Let’s have a look at some use cases, how about if you were interesting in bug bounties you can use CT lists to discover new infrastructure being created by your target company. If you were to run certstream — full | grep tesla.com for example you see any new certificates being created that contained the word “tesla.com”. You could even remove the “.com” part and search for the word tesla, this would generate more false positives but would show third party services that Tesla might be using (for example tesla.glassdoor.com). Another use case similar to this if you were to run certstream — full | grep amazon (for example) it would show any new certificates (as well as some false positives) that are generated by Amazon.

certstream with — full and grepping for amazoncertstream with — full and grepping for amazon

What about brand reputation monitoring, this is where someone is registering a certificate that will be part of a phishing website. This is important if you have customers that have to log into your website, or have to enter some kind of credentials. You can easily do this at the command line using similar commands to before, such as certstream — full | grep [keyword].

The problem with using the command line in this way is that it’s great for short time monitoring but if you want to include this as part of an overall monitoring solution or longer term research you will need to code up a solution. The good thing is that there are libraries available on the calidog.io website and they provide code examples and data pulled from the CT lists (via calidog.io) is JSON formatted so easy to use (well sometimes at least).

For this example we are going to use the Python library and extend the code example to allow the use of multiple keywords. The output will write to the command line but can easily be changed to send either the full JSON or the parts you want to another platform (NoSQL, Elastic, Splunk, Slack etc. etc.).

This is the code currently, it takes a list of keywords and then looks for those in each newly registered certificate, if there is a match it will print it out to the command line. The code uses “paypal”, “ebay” & “amazon” as our example keywords but you can could change this to anything you want.

    import logging
    import sys
    import datetime
    import certstream

    keyword = ['paypal', 'amazon', 'ebay']

    def print_callback(message, context):
        logging.debug("Message -> {}".format(message))

    if message['message_type'] == "heartbeat":
            return

    if message['message_type'] == "certificate_update":
            domains = message['data']['leaf_cert']['all_domains']
            if [k for k in keyword if k in ' '.join(domains)]:
                sys.stdout.write(u"[{}] {} {} \n".format(datetime.datetime.now().strftime('%m/%d/%y %H:%M:%S'), domains[0], ','.join(domains)))
                sys.stdout.flush()

    logging.basicConfig(format='[%(levelname)s:%(name)s] %(asctime)s - %(message)s', level=logging.INFO)

    certstream.listen_for_events(print_callback, url='wss://certstream.calidog.io/')

Now the code won’t necessarily output something straight away as it’s looking just for the keywords you’ve specified but in the screenshot below you can see the output from about 5 minutes using the example keywords.

Example output of the above python script after 5 minutesExample output of the above python script after 5 minutes

If you want to use the data retrieved from the CT list to pivot into other data sources, you can expand the number of fields returned (or use the whole thing) to get the certificate serial number, fingerprint etc.

All the available fields are shown in the Github repository for the calidog.io Python library which is available at the link below:
https://github.com/CaliDog/certstream-python

Python Code:

I’ve made the code used in this post a bit more robust and pushed it to my “Junk” Github repo which you can find HERE.

To run the code you first need to make sure you are using Python3, and have installed the “certstream” library. To install the certstream python library you just need to type on a terminal/command line.

pip3 install certstream

Once the certstream python library is installed you need to create a text file with your list of “keywords”. A single keyword per line is perfect so for example if you wanted to look for new certificate requests for Amazon, EBay and PayPal (for example) just create a file (doesn’t matter what the filename is) and add;

    amazon
    ebay
    paypal

To run the code just execute the following;

python3 osint-certstream.py [filename]

So if your keywords filename is keywords.txt for example you would just run;

python3 osint-certstream.py keywords.txt

To stop the code at any time you can just CTRL + C to kill it.

Let me know if you have any issues, the code works it’s just production ready but gives you an idea of what is possible.

OSINT: Getting email addresses from PGP Signatures

This morning I was reviewing some new Tor onion addresses one of my scripts had collected over the weekend, a lot of the websites I collect are either junk or no longer working.

One of the ones I checked was working, and on the website there was a PGP signature block. That got me thinking if you don’t know someones email address can you determine it from the PGP signature block??

Well the answer is yes, yes you can and with a little of time and some Python (who doesn’t love Python on a Monday morning) you can work out the email address(es) associated from just a PGP signature. The magic comes in determining the KeyId associated with a PGP signature and then using that Key ID querying a PGP key server to find the email address.

The code only currently works on PGP signatures that are the format as the one below (I may update it to work on other formats):

DISCLAIMER: I found this on the “darkweb” so I hold no responsibility/liability for what you do with it.

——-BEGIN PGP SIGNATURE ——-

iQIzBAEBCgAdFiEEekpX72xWccPpVa0cjllHEKAeV0MFAlzthwgACgkQjllHEKAe
V0N9pxAApaQWg36TkemWvTu5a4BNuDEVjRQlnYPXg3XGllnIhsMFbiJvUbUJ4JHB
9qe979q1NEjHUaj2Z9u5n7zEs3w6DseZ5/yHAX63ozYfzhTCoix1SZmd+s5bAuPD
EvxCWPDe9jZSnNEPdMSsUMaD47YzNhQW8ZZniefBh1VUArZoUN96EaTk3KNsWYza
lzDH5gs+dALohQYdQiXp65bOsX0jrs7ouqBbY0kEdr/KymPLtiHsOPzzOgOrkLX8
t0oide+/wowbgV/8efH2ryu9s+hr1imVOIeH//0sjm6l64elNv3gp+81hQwCYtTN
NaHP2xDLq/pceFBDksWcPBSrzg4Zm7zJ7Tq0DJ/2HakM0RlpHUVnUoel/SYPWjYY
46Hwom0m5uFE6Lwf8uFaoWnxVYlppt0RxBefIFFX1EBpTCgGgkU6ARInLt0OfHee
M5xmgIjv6DXuebfZ0C1I+tMHI/NcnypUt+fKFTQyI0j6GqJcmIWCRquKHb5Z9LXk
HP491lO/9irqvTf9HYXWTB6pS7pjV8oZQ1E07CEOKcsfBymYoiJ6cCWTNsGikmNM
M7YhxK+ebF78YO13jbKg2hvEtasRtHfrVfB1GLO8XchUkMQx0ib7sktZf6TRAO/F
E5UJnleebONjITCiNffP7yys3E5EbaHJkcq/0bbbUs0H0bqHIe0==

——-END PGP SIGNATURE -—-

There is a specific RFC associated with PGP formats, which you can find HERE. I’ve not read it to be honest (it’s not that kind of Monday morning). In order to get the Key ID you need to a few steps.

Firstly we need to strip out the start and end of the signature block (the bits with — — ).

regex_pgp = re.compile(
r” — — -BEGIN [^-]+ — — -([A-Za-z0–9+\/=\s]+) — — -END [^-]+ — — -”, re.MULTILINE)
matches = regex_pgp.findall(m)[0]

Then we need to decode the signature, which is base64 encoded.

b64 = base64.b64decode(matches)

Convert that output to hex, so you can pull out the values you need.

hx = binascii.hexlify(b64)

Then get the values you need.

keyid = hx.decode()[48:64]

Once you have the Key ID you can just make a web call to a PGP key server to search for the key.

server = 'http://keys.gnupg.net/pks/lookup?search=0x{0}&fingerprint=on&op=index'.format(keyid)

Then because well I’m lazy I used regex to find any email address.

regex_email = re.compile(r’([\w.-]+@[\w.-]+\.\w+)’, re.DOTALL | re.MULTILINE)
email = re.findall(regex_email, resp.text)

The full script can be found HERE

NOTE: I’ve not tested it exhaustively but it works enough for my needs.

Maltego: AWS Lambda

One of the awesome things about Maltego (and Paterva, the company that makes it), is that they allow people like me, to host remote Maltego transforms (Transform Host Server) using a mixture of their Community iTDS server and your own infrastructure.

For a few years now I’ve been running remote Maltego Transforms on an AWS instance (t2.micro). All you need to get it up and working is a Linux server, Apache2 (you could probably use nginx) and a little bit of time. Paterva provide all the files and installation notes you need HERE and it works a treat.

If you work in tech you can’t really escape the flood of people talking about “Serverless” deployments and all that kind of stuff so I thought it would be cool to try and recreate the Transform Host Server but using Python 3.x, Flask (instead of Bottle), Zappa, and AWS’s Lambda functions.

Paterva’s original Maltego.py (the Python library for remote transforms) is written in Python 2.x, which means the first job was to tweak the file so Python 3.x didn’t get all funny about print statements (the good news is the tweaked version is available in the Github repo below).

Once that was done, I then needed to recreate the web server component (which was originally written in Python’s bottle framework) to make use of the awesome Python Flask framework. This probably took the longest as I had to work out the differences between Flask & Bottle.

A skeleton Flask server is shown below, the transform I wrote to prove the concept was to simply connect to a website, and return the status code (200, 404 etc. etc.).

#!/usr/bin/env python

from flask import Flask, request
from maltego import MaltegoMsg
from transforms import trx_getstastuscode

app = Flask(__name__)

@app.route(‘/’, methods=[‘GET’])
def lambda_test():
    return ‘Hello World’


@app.route(‘/tds/robots’, methods=[‘GET’, ‘POST’])
def get_robots():
    return (trx_getstastuscode(MaltegoMsg(request.data)))

if __name__ == "__main__":
    app.run()

The lamba_test function, is just there so I could make sure it was working, you can (and probably should remove it).

Now the next step was to deploy this to AWS’s Lambda service, being lazy I decided to try Zappa;

Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as “serverless” web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance — and at a fraction of the cost of your current deployments!

Zappa is super easy to use, just follow the instructions and make sure to use Python virtual environments (not like someone we won’t mention, who forgot). Provide Zappa with some AWS credentials that have the level of access and within minutes you will be deploying your new app as a AWS Lambda function (seriously it only takes a few minutes). It’s important that you take note of the URL provided at the end of the deployment as you will need it in the final stage.

The final stage to this great masterpiece is to configure your account on Paterva’s Community iTDS server to point to your new transform. The documentation is HERE if you’ve never done before. Just one thing to note, the *Transform URL *is the URL outputted by Zappa above (it should stay the same not matter how many times you deploy).

The nice thing about using AWS’s Lambda functions is that its really easy and quick to deploy and the pricing model works great if you aren’t expecting heavy usage (1 million requests per month or 400,000 GB-seconds per month on the free-tier). Now there is no reason for you not to be hosting Maltego transforms for the world to share…

All the files you need to deploy your first Maltego Lambda function are in my Github repo below, clone the repo, configure your virtual environment (there is a handy requirements.txt to help) and off you go.

Simple example of how to use AWS’s Lambda functions to host Maltego remote transforms