OSINT: Certificate Transparency Lists

NOTE: The wonderful people over at OSINT Curious wrote a similar blog post about this last year. The link is HERE. I recommend giving it a read if you want to know more about how to “hunt” using information from certificates.

It’s 2020 and I made myself a promise to try and write more blog posts about things so here is the first for the year. SSL/TLS certificates have been a great resource for a while now to identify, pivot from and hunt for infrastructure on the internet. The introduction of the Certificate Transparency Lists were first introduced by Google in 2013 after DigiNotar was compromised in 2011. It basically provides public logs of certificates issued by trusted certificate authorities and is a great source of intel for a number of functions (such as, but not limited to):

  • Detection of phishing websites
  • Brand/Reputation Monitoring
  • Tracking malicious infrastructure
  • Bug Bounties

For a long time newly registered/observed domains have been an indicator or potential malicious activity, for example if ebaythisisfake.com was registered you could guess that this was going to be a phishing website. The issue with watching for newly registered domains (it’s still a good indicator), is that a domain could be purchased and then not used for days/months/years.

The wonderful thing about using the CT lists is that the fact a certificate has been registered usually (probably 95% of the time) means that the infrastructure is active and in use, it also means that there is active DNS (required for requesting a certificate) for that domain and an IP address to investigate. It also allows you to catch when a legitimate domain has been compromised and is being used for malicious stuff, for example ebay.legitimatedomain.com which is why CT lists are great for catching phishing websites.

Since the initial launch of number of companies have started collecting the information published on the CT lists (why not it’s free), and either providing it as part of an existing service (Cyber Threat Intel), providing tools to look up specific domains (Facebook for example) or giving people access to the raw data in an easy to consume way (this is the one we are interested in).

It’s worth noting that while the CT lists are publicly available, the sheer volume of data generated by them means it’s much easier to use a feed from someone else than build your own, unless of course you want to..

For the rest of this post we are going to be focusing on one specific provider, which is from Calidog (https://certstream.calidog.io/), this company provide the data for free, and have a number of libraries for different programming languages and a command line tool as well.

The command line tool is python based so you can simply install it using Python’s PIP library (we use pip3 because Python 2.7 is end of life).

pip3 install certstream

Once it’s been installed you can then run “certstream” from the command line, running it without any options will just connect and start pulling down the firehose of certificate information.

certstream just being run from the command linecertstream just being run from the command line

Straight away you can start to see the certificates information flying across your terminal. The default option just shows a timestamp, certificate authority and the certificate common name that is being registered. OK its interesting but not very helpful (for our needs anyway). If you run “certstream — full” you will get the same information as well as “all domains” which are any other domain names assigned to that certificate.

certstream with the — full switchcertstream with the — full switch

This “all domains” information is really useful if you are for example tracking a domain or keyword that uses Cloudflare or similar platforms that provide SSL certificates. If you were to run certstream — full | grep cloudflare you would see all the new certificates for services behind Cloudflare.

certstream with — full and grepping for cloudflarecertstream with — full and grepping for cloudflare

Let’s have a look at some use cases, how about if you were interesting in bug bounties you can use CT lists to discover new infrastructure being created by your target company. If you were to run certstream — full | grep tesla.com for example you see any new certificates being created that contained the word “tesla.com”. You could even remove the “.com” part and search for the word tesla, this would generate more false positives but would show third party services that Tesla might be using (for example tesla.glassdoor.com). Another use case similar to this if you were to run certstream — full | grep amazon (for example) it would show any new certificates (as well as some false positives) that are generated by Amazon.

certstream with — full and grepping for amazoncertstream with — full and grepping for amazon

What about brand reputation monitoring, this is where someone is registering a certificate that will be part of a phishing website. This is important if you have customers that have to log into your website, or have to enter some kind of credentials. You can easily do this at the command line using similar commands to before, such as certstream — full | grep [keyword].

The problem with using the command line in this way is that it’s great for short time monitoring but if you want to include this as part of an overall monitoring solution or longer term research you will need to code up a solution. The good thing is that there are libraries available on the calidog.io website and they provide code examples and data pulled from the CT lists (via calidog.io) is JSON formatted so easy to use (well sometimes at least).

For this example we are going to use the Python library and extend the code example to allow the use of multiple keywords. The output will write to the command line but can easily be changed to send either the full JSON or the parts you want to another platform (NoSQL, Elastic, Splunk, Slack etc. etc.).

This is the code currently, it takes a list of keywords and then looks for those in each newly registered certificate, if there is a match it will print it out to the command line. The code uses “paypal”, “ebay” & “amazon” as our example keywords but you can could change this to anything you want.

    import logging
    import sys
    import datetime
    import certstream

    keyword = ['paypal', 'amazon', 'ebay']

    def print_callback(message, context):
        logging.debug("Message -> {}".format(message))

    if message['message_type'] == "heartbeat":

    if message['message_type'] == "certificate_update":
            domains = message['data']['leaf_cert']['all_domains']
            if [k for k in keyword if k in ' '.join(domains)]:
                sys.stdout.write(u"[{}] {} {} \n".format(datetime.datetime.now().strftime('%m/%d/%y %H:%M:%S'), domains[0], ','.join(domains)))

    logging.basicConfig(format='[%(levelname)s:%(name)s] %(asctime)s - %(message)s', level=logging.INFO)

    certstream.listen_for_events(print_callback, url='wss://certstream.calidog.io/')

Now the code won’t necessarily output something straight away as it’s looking just for the keywords you’ve specified but in the screenshot below you can see the output from about 5 minutes using the example keywords.

Example output of the above python script after 5 minutesExample output of the above python script after 5 minutes

If you want to use the data retrieved from the CT list to pivot into other data sources, you can expand the number of fields returned (or use the whole thing) to get the certificate serial number, fingerprint etc.

All the available fields are shown in the Github repository for the calidog.io Python library which is available at the link below:

Python Code:

I’ve made the code used in this post a bit more robust and pushed it to my “Junk” Github repo which you can find HERE.

To run the code you first need to make sure you are using Python3, and have installed the “certstream” library. To install the certstream python library you just need to type on a terminal/command line.

pip3 install certstream

Once the certstream python library is installed you need to create a text file with your list of “keywords”. A single keyword per line is perfect so for example if you wanted to look for new certificate requests for Amazon, EBay and PayPal (for example) just create a file (doesn’t matter what the filename is) and add;


To run the code just execute the following;

python3 osint-certstream.py [filename]

So if your keywords filename is keywords.txt for example you would just run;

python3 osint-certstream.py keywords.txt

To stop the code at any time you can just CTRL + C to kill it.

Let me know if you have any issues, the code works it’s just production ready but gives you an idea of what is possible.

OSINT: Getting email addresses from PGP Signatures

This morning I was reviewing some new Tor onion addresses one of my scripts had collected over the weekend, a lot of the websites I collect are either junk or no longer working.

One of the ones I checked was working, and on the website there was a PGP signature block. That got me thinking if you don’t know someones email address can you determine it from the PGP signature block??

Well the answer is yes, yes you can and with a little of time and some Python (who doesn’t love Python on a Monday morning) you can work out the email address(es) associated from just a PGP signature. The magic comes in determining the KeyId associated with a PGP signature and then using that Key ID querying a PGP key server to find the email address.

The code only currently works on PGP signatures that are the format as the one below (I may update it to work on other formats):

DISCLAIMER: I found this on the “darkweb” so I hold no responsibility/liability for what you do with it.




There is a specific RFC associated with PGP formats, which you can find HERE. I’ve not read it to be honest (it’s not that kind of Monday morning). In order to get the Key ID you need to a few steps.

Firstly we need to strip out the start and end of the signature block (the bits with — — ).

regex_pgp = re.compile(
r” — — -BEGIN [^-]+ — — -([A-Za-z0–9+\/=\s]+) — — -END [^-]+ — — -”, re.MULTILINE)
matches = regex_pgp.findall(m)[0]

Then we need to decode the signature, which is base64 encoded.

b64 = base64.b64decode(matches)

Convert that output to hex, so you can pull out the values you need.

hx = binascii.hexlify(b64)

Then get the values you need.

keyid = hx.decode()[48:64]

Once you have the Key ID you can just make a web call to a PGP key server to search for the key.

server = 'http://keys.gnupg.net/pks/lookup?search=0x{0}&fingerprint=on&op=index'.format(keyid)

Then because well I’m lazy I used regex to find any email address.

regex_email = re.compile(r’([\w.-]+@[\w.-]+\.\w+)’, re.DOTALL | re.MULTILINE)
email = re.findall(regex_email, resp.text)

The full script can be found HERE

NOTE: I’ve not tested it exhaustively but it works enough for my needs.

OSINT: Etag you’re it…

Investigators, malware analysts and a whole host of other people spend time mapping out infrastructure that may or may not belong to criminal entities. Some of the time this infrastructure is hidden behind services that are designed to provide anonymity (think Tor and Cloudflare) which makes any kind of attribution difficult.

Now I’m a data magpie, I have a tendency to collect data and work out later what I’m going to do with it. A while back someone suggested ETag (HTTP Header) as something worth collecting, so I thought I would have a look at potential uses.

ETag’s are part of the HTTP response header and is used for web cache validation. ETag’s are an opaque identifier assigned by a web server to a specific version of a resource found at a URL. The method for how ETag’s are generated isn’t specified in the HTTP specification but generated ETag’s are supposed to use a “collision resistant hash function” (source: https://en.wikipedia.org/wiki/HTTP_ETag)..

I wanted to see if you could use ETag’s as a way to fingerprint web servers, however ETag’s are optional so not all web servers will return an ETag in the HTTP response header. Let’s look at an example, I have an image hosted on an Amazon S3 bucket, this S3 bucket is also fronted by Cloudflare (mostly for caching and performance). The HTTP headers for both requests are below;

Direct request:

HTTP/1.1 200 OK

Accept-Ranges: bytes
Content-Length: 1150
Content-Type: image/x-icon
Date: Thu, 25 Oct 2018 09:06:25 GMT
*ETag: “e910c46eac10d2dc5ecf144e32b688d6”*
Last-Modified: Mon, 06 Aug 2018 12:18:30 GMT
Server: AmazonS3
x-amz-id-2: LxghTaCpRKQhaP69Qm942BrdyhN87m+SIJTh1xz403c5nVEGj4Y7fsPtNrZ2uedLPRL3zCIeHas=
x-amz-request-id: 5116840933538C62

Through Cloudflare:

HTTP/1.1 200 OK

CF-RAY: 46f3876dbb4b135f-LHR
Cache-Control: public, max-age=14400
Connection: keep-alive
Content-Encoding: gzip
Content-Type: image/x-icon
Date: Thu, 25 Oct 2018 09:06:46 GMT
*ETag: W/”e910c46eac10d2dc5ecf144e32b688d6"
*Expect-CT: max-age=604800, report-uri=”https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Expires: Thu, 25 Oct 2018 13:06:46 GMT
Last-Modified: Mon, 06 Aug 2018 12:18:30 GMT
Server: cloudflare
Set-Cookie: __cfduid=d5bd0309883f2470956c30bdbca9c99751540458406; expires=Fri, 25-Oct-19 09:06:46 GMT; path=/; domain=.sneakersinc.net; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
x-amz-id-2: vJsCSPCMJFdBmxdPemrtoX07SCKHliZicOpUYM32Do9TPnCFFLiZxsNXeNo3SK/lv/90sqDBPa0=
x-amz-request-id: CDC9A8EF960ABE44

From this example you can see that the ETag’s are the same, with the exception that the Cloudflare returned ETag has an additional “W/” in the front. According to the Wikipedia article a “W/” at the start of an ETag means it’s a “weak ETag validator”, whereas the direct ETag (not through Cloudflare) had a “strong ETag validator”.

A search on Shodan shows over 23 million results where ETag’s are shown in the results, this gives us a big data set to compare results against. Let’s look at another example, if we search Shodan for this ETag “122f-537eaccb76800” we find 263 results, which shows that ETag’s aren’t necessarily unique. However, if you look at the results the web page titles are the same “Test Page for the Apache HTTP Server on Fedora”. So, it seems that the ETag does relate to the content of the web page, which is in line with the information from Wikipedia.

So, in some instances the ETag is unique, in others it might not be, but it does give you another pivot point to work from.

Let’s have another look at an example, the ETag “dd8094–2c-3e9564c23b600” provides 16,042 results. That’s a lot but if you look at the results (not all of them obviously), you will notice that they are all the same organisation “Team Cymru”. So, while the ETag in this example doesn’t provide a unique match it does provide a unique organisation, which again gives you an additional pivot point.

I’m going to continue looking at ways to collate ETag’s to web servers, if nothing else it’s an interesting way to match content being hosted on a web server. Have a look at this ETag “574d9ce4–264”.

OSINT: Email Verification API

When I started writing Python my focus was on building tools, then I realised that the tools I built I never actually used, mostly because at the time my job wasn’t security related and it was more of a hobby. Now I work in security and the majority of the code I write is focused around collecting, processing and displaying data. I’m lucky I love the work I do, for me the “fun” is around solving problems and because of that the code I write (and I still write a LOT of it) has evolved.

Open Source Intelligence (OSINT) is my new “hobby” there is some cross over into my job but for the most part I write code to perform OSINT related functions. One of the things I discovered on my OSINT journey is that there are lots of API’s, data feeds out there and all sorts that you can use but a lot you have to pay for and in some cases it’s not always clear on how the data is collected or stored. When faced with issues like this I tend to work on the following principal, “if in doubt, write it yourself”.

A while back I discovered a new Python library that is just awesome when you want to create your own API’s called flask_api (an extension of the Flask framework) and I’ve been using it a lot lately to create new “things”.

Today I wanted to share with you my email verification API, essentially you pass an email address to it and it will check to see if it can determine whether or not the email is “valid”. It will also check to see if the target email server is configured as a “catch-all” (in other words any email address at the specified domain will be accepted).

All of this is return in a lovely JSON formatted output which makes it really easy to use as part of a script of another service and even easier to dump the returned data into a database (if you like that sort of thing) and to ease installation there is also a Docker container available if you are into that sort of thing.

The README.md in the Github repository has the necessary installation instructions but its nice and simple once you’ve cloned the repo.

  • pip install -r requirements.txt
  • python server.py

Then to test it you simply use this URL:

http://localhost:8080/api/v1/verify/?q=anon@anon.com (replacing anon@anon.com with your target email address). If all works, you should get something similar to the screenshot below.


The code can be found HERE

Any questions, queries, feedback etc. etc. let me know.