General: Back to Blogging…

I started writing blog posts back in 2011, before I moved into “Cyber” and before I was introduced to the world of Maltego and Python. Over the last few years my blogging has died down a lot because the stuff I’ve been working on hasn’t been for public consumption (sorry about that).

What has happened is that over time I’ve ended up with blog posts in two different places, over 100 code repos (some public, mostly private) scattered all over the place, and generally just a different outlook on how I want to write, code and share information.

With that in mind I have a new domain, and I will migrate the blog posts from Medium and my old blog (where they still apply, or I might update some) and organise all my code repos. This will no doubt take me a while, but at least it will all be in one place again and we can move forward towards the 10 year anniversary of when I started blogging (ignoring the years without posts).

I’m going to be focusing on the things I enjoy doing, so the blog will contain lots of content relating to:

  • Python
  • OSINT type stuff
  • Maltego
  • Darkweb (I’ve spent years mapping darkweb services)
  • More Python…

OSINT: Certificate Transparency Lists

NOTE: The wonderful people over at OSINT Curious wrote a similar blog post about this last year. The link is HERE. I recommend giving it a read if you want to know more about how to “hunt” using information from certificates.

It’s 2020 and I made myself a promise to try and write more blog posts, so here is the first for the year. SSL/TLS certificates have been a great resource for a while now to identify, pivot from and hunt for infrastructure on the internet. Certificate Transparency lists were first introduced by Google in 2013, after DigiNotar was compromised in 2011. They provide public logs of certificates issued by trusted certificate authorities and are a great source of intel for a number of functions (such as, but not limited to):

  • Detection of phishing websites
  • Brand/Reputation Monitoring
  • Tracking malicious infrastructure
  • Bug Bounties

For a long time newly registered/observed domains have been an indicator of potential malicious activity; for example, if ebaythisisfake.com was registered you could guess that it was going to be a phishing website. The issue with watching for newly registered domains (it’s still a good indicator) is that a domain could be purchased and then not used for days/months/years.

The wonderful thing about using the CT lists is that the fact a certificate has been issued usually (probably 95% of the time) means the infrastructure is active and in use. It also means there is active DNS for that domain (required for requesting a certificate) and an IP address to investigate. It also allows you to catch when a legitimate domain has been compromised and is being used for malicious purposes, for example ebay.legitimatedomain.com, which is why CT lists are great for catching phishing websites.

Since the initial launch a number of companies have started collecting the information published on the CT lists (why not, it’s free), and either providing it as part of an existing service (Cyber Threat Intel), providing tools to look up specific domains (Facebook for example), or giving people access to the raw data in an easy-to-consume way (this is the one we are interested in).

It’s worth noting that while the CT lists are publicly available, the sheer volume of data they generate means it’s much easier to use a feed from someone else than to build your own, unless of course you want to…

For the rest of this post we are going to focus on one specific provider, Calidog (https://certstream.calidog.io/). They provide the data for free, and have libraries for a number of programming languages as well as a command line tool.

The command line tool is Python based, so you can simply install it using pip (we use pip3 because Python 2.7 is end of life).

pip3 install certstream

Once it’s installed you can run “certstream” from the command line; running it without any options will just connect and start pulling down the firehose of certificate information.

certstream just being run from the command line

Straight away you can start to see the certificate information flying across your terminal. The default output just shows a timestamp, the certificate authority and the certificate common name being registered. OK, it’s interesting but not very helpful (for our needs anyway). If you run "certstream --full" you will get the same information plus “all domains”, which are any other domain names assigned to that certificate.

certstream with the --full switch

This “all domains” information is really useful if, for example, you are tracking a domain or keyword that sits behind Cloudflare or a similar platform that provides SSL certificates. If you were to run certstream --full | grep cloudflare you would see all the new certificates for services behind Cloudflare.

certstream with --full and grepping for cloudflare

Let’s have a look at some use cases. If you are interested in bug bounties you can use CT lists to discover new infrastructure being created by your target company. If you were to run certstream --full | grep tesla.com, for example, you would see any new certificates being created that contain “tesla.com”. You could even remove the “.com” part and search for the word tesla; this would generate more false positives but would also show third party services that Tesla might be using (for example tesla.glassdoor.com). Similarly, running certstream --full | grep amazon would show any new certificates (as well as some false positives) generated by Amazon.

certstream with --full and grepping for amazon

What about brand/reputation monitoring, where someone is registering a certificate that will be part of a phishing website? This is important if you have customers that have to log into your website or enter some kind of credentials. You can easily do this at the command line using similar commands to before, such as certstream --full | grep [keyword].

The problem with using the command line in this way is that while it’s great for short-term monitoring, if you want to include this as part of an overall monitoring solution or longer-term research you will need to code up a solution. The good news is that there are libraries available on the calidog.io website along with code examples, and the data pulled from the CT lists (via calidog.io) is JSON formatted, so it’s easy to use (well, sometimes at least).

For this example we are going to use the Python library and extend the code example to allow the use of multiple keywords. The output will write to the command line but can easily be changed to send either the full JSON or the parts you want to another platform (NoSQL, Elastic, Splunk, Slack etc. etc.).

This is the code as it currently stands: it takes a list of keywords and looks for those in each newly registered certificate; if there is a match it prints it out to the command line. The code uses “paypal”, “ebay” & “amazon” as example keywords but you could change these to anything you want.

    import logging
    import sys
    import datetime
    import certstream

    # keywords to match against newly logged certificates
    keyword = ['paypal', 'amazon', 'ebay']

    def print_callback(message, context):
        logging.debug("Message -> {}".format(message))

        # ignore the keep-alive messages from the certstream server
        if message['message_type'] == "heartbeat":
            return

        if message['message_type'] == "certificate_update":
            domains = message['data']['leaf_cert']['all_domains']
            # print a line if any keyword appears in any of the certificate's domains
            if [k for k in keyword if k in ' '.join(domains)]:
                sys.stdout.write(u"[{}] {} {} \n".format(datetime.datetime.now().strftime('%m/%d/%y %H:%M:%S'), domains[0], ','.join(domains)))
                sys.stdout.flush()

    logging.basicConfig(format='[%(levelname)s:%(name)s] %(asctime)s - %(message)s', level=logging.INFO)

    certstream.listen_for_events(print_callback, url='wss://certstream.calidog.io/')

Now the code won’t necessarily output anything straight away, as it’s only looking for the keywords you’ve specified, but in the screenshot below you can see the output from about 5 minutes of running with the example keywords.

Example output of the above Python script after 5 minutes

If you want to use the data retrieved from the CT list to pivot into other data sources, you can expand the number of fields returned (or use the whole thing) to get the certificate serial number, fingerprint etc.

All the available fields are shown in the Github repository for the calidog.io Python library which is available at the link below:
https://github.com/CaliDog/certstream-python
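
As a quick illustration, pulling something like the serial number and fingerprint out of the same message inside the callback would look roughly like this (a sketch; check the repo above for the exact field names, as these are from memory):

    # inside the certificate_update branch of the callback (sketch;
    # field names should be confirmed against the certstream-python repo)
    leaf = message['data']['leaf_cert']
    serial = leaf.get('serial_number')
    fingerprint = leaf.get('fingerprint')
    sys.stdout.write("{} {} {}\n".format(domains[0], serial, fingerprint))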

Python Code:

I’ve made the code used in this post a bit more robust and pushed it to my “Junk” Github repo which you can find HERE.

To run the code you first need to make sure you are using Python 3 and have installed the “certstream” library. To install the certstream Python library you just need to type the following in a terminal/command line:

pip3 install certstream

Once the certstream Python library is installed you need to create a text file with your list of “keywords”, one keyword per line. For example, if you wanted to look for new certificate requests for Amazon, eBay and PayPal, just create a file (the filename doesn’t matter) and add:

    amazon
    ebay
    paypal

To run the code just execute the following:

python3 osint-certstream.py [filename]

So if your keywords file is called keywords.txt, for example, you would just run:

python3 osint-certstream.py keywords.txt
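
If you are curious how the keyword file gets wired in, the repo version is a bit more robust, but it boils down to something like this (a sketch, reusing the callback from earlier):

    import sys

    def load_keywords(path):
        # one keyword per line, skip blank lines
        with open(path) as f:
            return [line.strip().lower() for line in f if line.strip()]

    keyword = load_keywords(sys.argv[1])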

To stop the code at any time you can just CTRL + C to kill it.

Let me know if you have any issues; the code works, it’s just not production ready, but it gives you an idea of what is possible.

OSINT: Getting email addresses from PGP Signatures

This morning I was reviewing some new Tor onion addresses one of my scripts had collected over the weekend; a lot of the websites I collect are either junk or no longer working.

One of the ones I checked was working, and on the website there was a PGP signature block. That got me thinking: if you don’t know someone’s email address, can you determine it from the PGP signature block?

Well the answer is yes, yes you can. With a little time and some Python (who doesn’t love Python on a Monday morning) you can work out the email address(es) associated with just a PGP signature. The magic comes from determining the Key ID associated with a PGP signature and then using that Key ID to query a PGP key server and find the email address.

The code currently only works on PGP signatures in the same format as the one below (I may update it to work on other formats):

DISCLAIMER: I found this on the “darkweb” so I hold no responsibility/liability for what you do with it.

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEekpX72xWccPpVa0cjllHEKAeV0MFAlzthwgACgkQjllHEKAe
V0N9pxAApaQWg36TkemWvTu5a4BNuDEVjRQlnYPXg3XGllnIhsMFbiJvUbUJ4JHB
9qe979q1NEjHUaj2Z9u5n7zEs3w6DseZ5/yHAX63ozYfzhTCoix1SZmd+s5bAuPD
EvxCWPDe9jZSnNEPdMSsUMaD47YzNhQW8ZZniefBh1VUArZoUN96EaTk3KNsWYza
lzDH5gs+dALohQYdQiXp65bOsX0jrs7ouqBbY0kEdr/KymPLtiHsOPzzOgOrkLX8
t0oide+/wowbgV/8efH2ryu9s+hr1imVOIeH//0sjm6l64elNv3gp+81hQwCYtTN
NaHP2xDLq/pceFBDksWcPBSrzg4Zm7zJ7Tq0DJ/2HakM0RlpHUVnUoel/SYPWjYY
46Hwom0m5uFE6Lwf8uFaoWnxVYlppt0RxBefIFFX1EBpTCgGgkU6ARInLt0OfHee
M5xmgIjv6DXuebfZ0C1I+tMHI/NcnypUt+fKFTQyI0j6GqJcmIWCRquKHb5Z9LXk
HP491lO/9irqvTf9HYXWTB6pS7pjV8oZQ1E07CEOKcsfBymYoiJ6cCWTNsGikmNM
M7YhxK+ebF78YO13jbKg2hvEtasRtHfrVfB1GLO8XchUkMQx0ib7sktZf6TRAO/F
E5UJnleebONjITCiNffP7yys3E5EbaHJkcq/0bbbUs0H0bqHIe0==

-----END PGP SIGNATURE-----

There is a specific RFC associated with PGP formats, which you can find HERE. I’ve not read it to be honest (it’s not that kind of Monday morning). In order to get the Key ID you need to follow a few steps.

Firstly we need to strip out the start and end of the signature block (the bits with the dashes).

regex_pgp = re.compile(
    r"-----BEGIN [^-]+-----([A-Za-z0-9+\/=\s]+)-----END [^-]+-----", re.MULTILINE)
matches = regex_pgp.findall(m)[0]

Then we need to decode the signature, which is base64 encoded.

b64 = base64.b64decode(matches)

Convert that output to hex, so you can pull out the values you need.

hx = binascii.hexlify(b64)

Then get the values you need.

keyid = hx.decode()[48:64]

Once you have the Key ID you can just make a web call to a PGP key server to search for the key.

server = 'http://keys.gnupg.net/pks/lookup?search=0x{0}&fingerprint=on&op=index'.format(keyid)

Then, because I’m lazy, I used a regex to find any email addresses.

regex_email = re.compile(r'([\w.-]+@[\w.-]+\.\w+)', re.DOTALL | re.MULTILINE)
email = re.findall(regex_email, resp.text)
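
Putting those fragments together (hedging a little, as I’ve only tested this against signatures in the format shown above), the whole thing is only a handful of lines:

    import re
    import base64
    import binascii
    import requests

    def keyid_from_signature(sig_block):
        # pull the base64 payload out of the ASCII armour
        regex_pgp = re.compile(r"-----BEGIN [^-]+-----([A-Za-z0-9+\/=\s]+)-----END [^-]+-----", re.MULTILINE)
        payload = regex_pgp.findall(sig_block)[0]
        hx = binascii.hexlify(base64.b64decode(payload))
        # offset 48:64 is where the issuer Key ID sits for this signature format
        return hx.decode()[48:64]

    def emails_for_keyid(keyid):
        # ask a public key server about the key and regex out any email addresses
        url = 'http://keys.gnupg.net/pks/lookup?search=0x{0}&fingerprint=on&op=index'.format(keyid)
        resp = requests.get(url)
        regex_email = re.compile(r'([\w.-]+@[\w.-]+\.\w+)', re.DOTALL | re.MULTILINE)
        return list(set(re.findall(regex_email, resp.text)))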

The full script can be found HERE

NOTE: I’ve not tested it exhaustively but it works enough for my needs.

OSINT: Etag you’re it…

Investigators, malware analysts and a whole host of other people spend time mapping out infrastructure that may or may not belong to criminal entities. Some of the time this infrastructure is hidden behind services that are designed to provide anonymity (think Tor and Cloudflare) which makes any kind of attribution difficult.

Now I’m a data magpie; I have a tendency to collect data and work out later what I’m going to do with it. A while back someone suggested the ETag (HTTP header) as something worth collecting, so I thought I would have a look at potential uses.

ETags are part of the HTTP response header and are used for web cache validation. An ETag is an opaque identifier assigned by a web server to a specific version of a resource found at a URL. How ETags are generated isn’t specified in the HTTP specification, but generated ETags are supposed to use a “collision resistant hash function” (source: https://en.wikipedia.org/wiki/HTTP_ETag).

I wanted to see if you could use ETags as a way to fingerprint web servers; however, ETags are optional, so not all web servers will return one in the HTTP response header. Let’s look at an example: I have an image hosted on an Amazon S3 bucket, and this S3 bucket is also fronted by Cloudflare (mostly for caching and performance). The HTTP headers for both requests are below:

Direct request:

HTTP/1.1 200 OK

Accept-Ranges: bytes
Content-Length: 1150
Content-Type: image/x-icon
Date: Thu, 25 Oct 2018 09:06:25 GMT
ETag: "e910c46eac10d2dc5ecf144e32b688d6"
Last-Modified: Mon, 06 Aug 2018 12:18:30 GMT
Server: AmazonS3
x-amz-id-2: LxghTaCpRKQhaP69Qm942BrdyhN87m+SIJTh1xz403c5nVEGj4Y7fsPtNrZ2uedLPRL3zCIeHas=
x-amz-request-id: 5116840933538C62

Through Cloudflare:

HTTP/1.1 200 OK

CF-Cache-Status: REVALIDATED
CF-RAY: 46f3876dbb4b135f-LHR
Cache-Control: public, max-age=14400
Connection: keep-alive
Content-Encoding: gzip
Content-Type: image/x-icon
Date: Thu, 25 Oct 2018 09:06:46 GMT
ETag: W/"e910c46eac10d2dc5ecf144e32b688d6"
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Expires: Thu, 25 Oct 2018 13:06:46 GMT
Last-Modified: Mon, 06 Aug 2018 12:18:30 GMT
Server: cloudflare
Set-Cookie: __cfduid=d5bd0309883f2470956c30bdbca9c99751540458406; expires=Fri, 25-Oct-19 09:06:46 GMT; path=/; domain=.sneakersinc.net; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
x-amz-id-2: vJsCSPCMJFdBmxdPemrtoX07SCKHliZicOpUYM32Do9TPnCFFLiZxsNXeNo3SK/lv/90sqDBPa0=
x-amz-request-id: CDC9A8EF960ABE44

From this example you can see that the ETags are the same, except that the ETag returned via Cloudflare has an additional “W/” at the front. According to the Wikipedia article, a “W/” at the start of an ETag means it’s a “weak ETag validator”, whereas the direct ETag (not through Cloudflare) is a “strong ETag validator”.

A search on Shodan shows over 23 million results where ETags appear, which gives us a big data set to compare against. Let’s look at an example: if we search Shodan for the ETag “122f-537eaccb76800” we find 263 results, which shows that ETags aren’t necessarily unique. However, if you look at the results, the web page titles are all the same: “Test Page for the Apache HTTP Server on Fedora”. So it seems that the ETag does relate to the content of the web page, which is in line with the information from Wikipedia.

So, in some instances the ETag is unique, in others it might not be, but it does give you another pivot point to work from.

Let’s look at another example: the ETag “dd8094-2c-3e9564c23b600” returns 16,042 results. That’s a lot, but if you look through the results (not all of them, obviously) you will notice they all belong to the same organisation, “Team Cymru”. So while the ETag in this example doesn’t provide a unique match, it does point to a single organisation, which again gives you an additional pivot point.
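
If you want to run the same kind of searches programmatically rather than through the web interface, the official shodan Python library will take the same query string (the API key below is obviously a placeholder):

    import shodan

    api = shodan.Shodan('YOUR_API_KEY')  # placeholder API key
    results = api.search('"dd8094-2c-3e9564c23b600"')
    print('Total results: {}'.format(results['total']))
    for match in results['matches'][:10]:
        # ip_str is always present; the http section depends on the banner
        print(match['ip_str'], match.get('org'), match.get('http', {}).get('title'))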

I’m going to continue looking at ways to collate ETags against web servers; if nothing else it’s an interesting way to match content being hosted on a web server. Have a look at this ETag: “574d9ce4-264”.
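
In the meantime, collecting ETags yourself is trivial with the requests library; a minimal collector (the hostnames are just examples) looks something like this:

    import requests

    hosts = ['https://example.com/', 'https://example.org/']  # example targets

    for host in hosts:
        try:
            resp = requests.head(host, timeout=10, allow_redirects=True)
            etag = resp.headers.get('ETag')
            if etag:
                print('{} -> {}'.format(host, etag))
        except requests.RequestException as err:
            print('{} failed: {}'.format(host, err))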

Maltego: Meta Search Engines

The other day on the train I was reading “The Tao of Open Source Intelligence” by Stewart Bertram, and there is a section where he talks about “Metasearch Engines”. Basically, when you use search engines like Google or Bing you are searching their database of results. Metasearch engines aggregate the results from multiple different search engines, which is useful when performing OSINT related searches, or just if you want to save time.

A while back I came across searx:

Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.

Now I like searx for a number of reasons:

  • It’s written in Python (flask).
  • You can run it in a Docker container.
  • You can create your own (or add) search engines.
  • You can search for images, files, bittorrents etc.
  • It has the ability to output results in JSON, CSV and RSS.

The part of searx that really interests me is the ability to add your own search engines. I’ve yet to try this, but in theory not only can you add “normal” search engines, you should also be able to add your own internal data sources. For example, if you have an Elasticsearch index you could write a simple API to query it and then add it as a search engine within searx.

You could in theory (I am going to test this) remove all the normal search engines and use searx as a search engine over all your internal data sources (some assembly required).
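
I’ve not built one yet, but from the searx documentation an engine is essentially a Python module exposing a request()/response() pair (plus an entry in settings.yml). A rough sketch of one wrapping a hypothetical internal API might look like this (the endpoint and JSON fields below are made up):

    # custom_internal.py - sketch of a searx engine module (untested)
    from json import loads
    from urllib.parse import urlencode

    categories = ['general']
    base_url = 'http://internal-search.local/api?'  # hypothetical internal endpoint

    def request(query, params):
        # searx calls this to build the outgoing request
        params['url'] = base_url + urlencode({'q': query})
        return params

    def response(resp):
        # searx calls this with the HTTP response; return a list of result dicts
        results = []
        for item in loads(resp.text).get('hits', []):
            results.append({'url': item['url'],
                            'title': item['title'],
                            'content': item.get('summary', '')})
        return results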

Now what’s any of this got to do with Maltego? Well, Maltego (out of the box) has some transforms for passing entities through search engines; however, sometimes you just want to “Google” something and see what you get back (well, that might just be me).

The idea was to run a docker instance of searx, and then create a local Maltego transform to run a query and return the first 10 pages of results.

Let’s break down the steps:

  1. Build a docker image for searx.
  2. Write a local transform to query searx and return the results.

Before I get into the good stuff, let me quickly explain local transforms (in case you don’t know). Local transforms work the same as remote transforms except that they run locally on your machine. Local transforms have some benefits over remote transforms, but also some downsides (this isn’t an exhaustive list).

Benefits:

  • You get all the benefits of the processing power of your local machine.
  • You can run transforms that query local files on your machine, for example if you want to have a transform to parse a pcap file you can (more difficult with remote transforms).
  • It’s much quicker to develop/make changes to local transforms and you don’t have any other infrastructure requirements.

Downsides:

  • It’s more difficult to share local transforms with others (sharing is caring).
  • You have to worry about library dependencies and all that stuff.

OK, enough of that, let’s get onto the interesting stuff.

First off we need to get a Docker image for searx; luckily this is easy as the author’s documentation is really good. The instructions can be found HERE.

All being well you should have a running instance of searx which should look like the image below:

Have a play with searx, it’s well worth the time and has lots of cool features.

The local Maltego transform (it’s written in Python 3.x) is pretty simple: pass it your query (based on a Maltego Phrase entity) and it will return the results as URL entities.

Paterva provide a Python library for local transforms; like the remote TRX library it was written in Python 2.x, so I tweaked it to work with Python 3.x.

Here is the code for the transform (Github repo linked at the end).

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import sys
import json
from settings import BaseConfig as settings
from MaltegoTransform import *


def metasearch(query):
    m = MaltegoTransform()
    for page in range(1, settings.MAX_PAGES):
        url = '{0}{1}&format=json&pageno={2}'.format(
            settings.SEARX, query, page)
        response = requests.post(url).json()
        for r in response['results']:
            ent = m.addEntity('maltego.URL', r['url'])
            ent.addAdditionalFields('url', 'URL', True, r['url'])
            if r.get('title'):
                ent.addAdditionalFields(
                    'title', 'Title', True, r['title'])
            if r.get('content'):
                ent.addAdditionalFields(
                    'content', 'Content', True, r['content'])
    m.returnOutput()


if __name__ == '__main__':
    metasearch(sys.argv[1])
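
The transform pulls its configuration from a small settings module (the actual one is in the Github repo); mine is along these lines, with the searx URL and page count obviously being whatever suits your own setup:

    # settings.py - sketch of the config the transform expects
    class BaseConfig(object):
        # local searx instance; the query string gets appended to this
        SEARX = 'http://localhost:8888/?q='
        # number of result pages to pull per query
        MAX_PAGES = 10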

The transform maps across the URL, the title of the page (if it exists) and the content (a snippet of text, similar to Google results) as additional fields.

Below is a screenshot of what it looks like in Maltego.

To get the transform working, clone the Github repo (link below) and then add the new transform to Maltego; if you’ve never done it before, Maltego provide instructions HERE.

One of the things I love about Maltego is that it doesn’t need to be complicated: creating transforms is easy, and using them is just as simple. The ability to define what the tool can do for you makes it powerful and versatile.

The Github repo is HERE

Maltego: AWS Lambda

One of the awesome things about Maltego (and Paterva, the company that makes it), is that they allow people like me, to host remote Maltego transforms (Transform Host Server) using a mixture of their Community iTDS server and your own infrastructure.

For a few years now I’ve been running remote Maltego Transforms on an AWS instance (t2.micro). All you need to get it up and working is a Linux server, Apache2 (you could probably use nginx) and a little bit of time. Paterva provide all the files and installation notes you need HERE and it works a treat.

If you work in tech you can’t really escape the flood of people talking about “Serverless” deployments and all that kind of stuff so I thought it would be cool to try and recreate the Transform Host Server but using Python 3.x, Flask (instead of Bottle), Zappa, and AWS’s Lambda functions.

Paterva’s original Maltego.py (the Python library for remote transforms) is written in Python 2.x, which means the first job was to tweak the file so Python 3.x didn’t get all funny about print statements (the good news is the tweaked version is available in the Github repo below).

Once that was done, I needed to recreate the web server component (originally written in Python’s Bottle framework) using the awesome Flask framework. This probably took the longest, as I had to work out the differences between Flask and Bottle.

A skeleton Flask server is shown below; the transform I wrote to prove the concept simply connects to a website and returns the HTTP status code (200, 404 etc.).

#!/usr/bin/env python

from flask import Flask, request
from maltego import MaltegoMsg
from transforms import trx_getstastuscode

app = Flask(__name__)

@app.route('/', methods=['GET'])
def lambda_test():
    return 'Hello World'


@app.route('/tds/robots', methods=['GET', 'POST'])
def get_robots():
    return (trx_getstastuscode(MaltegoMsg(request.data)))

if __name__ == "__main__":
    app.run()

The lambda_test function is just there so I could make sure it was working; you can (and probably should) remove it.

The next step was to deploy this to AWS’s Lambda service; being lazy, I decided to try Zappa:

Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as “serverless” web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance — and at a fraction of the cost of your current deployments!

Zappa is super easy to use: just follow the instructions and make sure to use a Python virtual environment (not like someone we won’t mention, who forgot). Provide Zappa with AWS credentials that have the right level of access and within minutes you will be deploying your new app as an AWS Lambda function (seriously, it only takes a few minutes). It’s important that you take note of the URL provided at the end of the deployment, as you will need it in the final stage.
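
For reference, once your virtual environment is active and your AWS credentials are configured, the deployment itself is essentially two commands (zappa init walks you through generating a zappa_settings.json pointing at the Flask app object, and “dev” is just the stage name I used):

zappa init

zappa deploy dev

Subsequent changes can then be pushed with zappa update dev.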

The final stage of this great masterpiece is to configure your account on Paterva’s Community iTDS server to point to your new transform. The documentation is HERE if you’ve never done it before. Just one thing to note: the Transform URL is the URL output by Zappa above (it should stay the same no matter how many times you deploy).

The nice thing about using AWS’s Lambda functions is that it’s really quick and easy to deploy, and the pricing model works great if you aren’t expecting heavy usage (1 million requests per month or 400,000 GB-seconds per month on the free tier). Now there is no reason for you not to be hosting Maltego transforms for the world to share…

All the files you need to deploy your first Maltego Lambda function are in my Github repo below; clone the repo, configure your virtual environment (there is a handy requirements.txt to help) and off you go.

Simple example of how to use AWS’s Lambda functions to host Maltego remote transforms

OSINT: Email Verification API

When I started writing Python my focus was on building tools; then I realised that I never actually used the tools I built, mostly because at the time my job wasn’t security related and it was more of a hobby. Now I work in security and the majority of the code I write is focused around collecting, processing and displaying data. I’m lucky I love the work I do; for me the “fun” is in solving problems, and because of that the code I write (and I still write a LOT of it) has evolved.

Open Source Intelligence (OSINT) is my new “hobby”; there is some crossover with my job, but for the most part I write code to perform OSINT related functions. One of the things I discovered on my OSINT journey is that there are lots of APIs, data feeds and all sorts out there that you can use, but a lot of them you have to pay for, and in some cases it’s not clear how the data is collected or stored. When faced with issues like this I tend to work on the following principle: “if in doubt, write it yourself”.

A while back I discovered a Python library called flask_api (an extension of the Flask framework) that is just awesome when you want to create your own APIs, and I’ve been using it a lot lately to create new “things”.

Today I wanted to share with you my email verification API. Essentially you pass an email address to it and it will try to determine whether or not the email is “valid”. It will also check to see if the target email server is configured as a “catch-all” (in other words, any email address at the specified domain will be accepted).

All of this is returned in a lovely JSON formatted output, which makes it really easy to use as part of a script or another service, and even easier to dump the returned data into a database (if you like that sort of thing). To ease installation there is also a Docker container available.
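
For the curious, the validity check itself boils down to an SMTP conversation: look up the domain’s MX record and ask the mail server whether it would accept the recipient. This is a simplified sketch of the idea rather than the exact code in the repo, and it assumes the dnspython library is installed:

    import smtplib
    import dns.resolver  # pip3 install dnspython

    def smtp_check(email):
        domain = email.split('@')[1]
        # grab the highest priority MX record for the domain
        mx_records = sorted(dns.resolver.resolve(domain, 'MX'),
                            key=lambda r: r.preference)
        mx_host = str(mx_records[0].exchange).rstrip('.')
        server = smtplib.SMTP(mx_host, 25, timeout=10)
        server.helo()
        server.mail('probe@example.com')  # the sender address is just a placeholder
        code, _ = server.rcpt(email)      # 250 = accepted, 550 = rejected
        server.quit()
        return code == 250

A catch-all check works the same way, just with a random address at the same domain that almost certainly doesn’t exist.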

The README.md in the Github repository has the necessary installation instructions, but it’s nice and simple once you’ve cloned the repo.

  • pip install -r requirements.txt
  • python server.py

Then to test it you simply use this URL:

http://localhost:8080/api/v1/verify/?q=anon@anon.com (replacing anon@anon.com with your target email address). If all works, you should get something similar to the screenshot below.

emailverify-screenshot

The code can be found HERE

Any questions, queries, feedback etc. etc. let me know.

Beginners Guide to OSINT – Chapter 1

DISCLAIMER: I’m not an Open Source Intelligence (OSINT) professional (not even close). I’ve done some courses (got a cert), written some code and spend far too much time using Maltego. OSINT is a subject I enjoy; it’s like doing a jigsaw puzzle with most of the pieces missing. This blog series is MY interpretation of how I do (and view) OSINT related things. You are more than welcome to disagree or ignore what I say.

The first chapter in the OSINT journey is going to cover the subject of “What is OSINT and what can we use it for?”. Sorry, it’s the non-technical one, but I promise not to make it too long or boring.

What is OSINT??

OSINT is defined by Wikipedia as:

“Open-source intelligence (OSINT) is intelligence collected from publicly available sources. In the intelligence community (IC), the term “open” refers to overt, publicly available sources (as opposed to covert or clandestine sources); it is not related to open-source software or public intelligence.” (source)

For the purpose of this blog series we are going to be talking about OSINT from online sources, so basically anything you can find on the internet. The key point for me about OSINT is that it (in my opinion) only relates to information you can find for free. Having to pay to get access to information such as an API or raw data isn’t really OSINT material. You are essentially paying someone else to collect the data (that’s the OSINT part) and then just accessing their data. I’m not saying that’s wrong or should be a reason not to use data from paid sources, it’s just (and again just my opinion) not really OSINT in its truest form.

Pitfalls of OSINT

Before we go any further I just wanted to clarify something about collecting data via OSINT. This is something that I often talk to people about, and there are varying opinions about it. When you collect data via OSINT methods it’s important to remember that the data is only as good as the source you collect it from. The simple rule is: don’t trust the source, don’t use it.

You also need to think about the way the data is collected. Let me explain a bit more; consider this scenario (totally made up).

You spot someone (within a corporate environment) called Ronny emailing a file called “secretinformation.docx” to an external email address of ronnythespy@madeupemail.com. You decide to do some “OSINT” to work out if the two Ronnies are the same person. Using a tool or chunk of code (in a language you don’t know) you decide that you have enough information to link the two Ronnies together.

Corporate Ronny takes you to court claiming unfair dismissal, and during the proceedings you are asked (as the expert witness) how the information was collected. Now, you can explain the process you followed (running the code or clicking the tool), but can you explain how the tool or chunk of code provided you with that information, or the methods it used to collect it (were they lawful, for example)?

For me, that’s the biggest consideration when using OSINT material if you want it to provide true value to what you are trying to accomplish. Being able to collect the information is one thing; validating the methods and techniques used to collect it is another. Again, this is a conversation I have many, many times, and I work on the simple principle of “if in doubt, create it yourself”, which basically means I have to/get to write some code or build a tool.

This quote essentially sums up everything I just said, “In OSINT, the chief difficulty is in identifying relevant, reliable sources from the vast amount of publicly available information.” (source)

What is OSINT good for?

Absolutely everything!! Well, OK, nearly everything, but there are a lot of ways that OSINT can be used for fun or within your current job. Here are some examples:

  • Company Due Diligence
  • Recruitment
  • Threat Intelligence
  • Fraud & Theft
  • Marketing
  • Missing Persons

What are we going to cover??

At the moment I’ve got a few topics in mind to cover in this blog series. I am open to suggestions or ideas, so if you have anything let me know and I will see what I can do. Here are the topics I’ve come up with so far (subject to change).

  • Image Searching
  • Social Media
  • Internet Infrastructure
  • Companies
  • Websites

Hopefully you found this blog post of use (or interesting), leave a comment if you want me to cover another subject or have any questions/queries/concerns/complaints.

Building your own Whois API Server

So it’s been a while since I’ve blogged anything, not because I haven’t been busy (I’ve actually been really busy), but more because a lot of the things I work on now I can’t share (sorry). However, every now and again I end up coding something useful (well, I think it is) that I can share.

I’ve been looking at domain squatting recently and needed a way to script whois lookups for domains. There are loads of APIs out there, but you have to pay and I didn’t want to, so I wrote my own.

It’s a lightweight Flask application that accepts a domain, does a whois lookup and then returns a nice JSON response. Nothing fancy, but it will run quite happily on a low-spec AWS instance or on a server in your internal environment. I built it to get around having to fudge whois on a Windows server (let’s not go there).

In order to run the Flask application you need the following Python libraries (everything else is standard Python libraries).

  • Flask
  • pythonwhois (this is needed for the pwhois command that is used in the code)

To run the server just download the code (link at the bottom of the page) and then run:

python whois-server.py

The server runs on port 9119 (you can change this) and you can submit a query like this:

http://localhost:9119/query/?domain=example.com
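
If you’d rather see the shape of it before cloning, the whole thing boils down to something like this (a stripped-down sketch assuming pythonwhois’s get_whois function; the repo version has a bit more error handling):

    #!/usr/bin/env python
    import json
    import pythonwhois
    from flask import Flask, request, Response

    app = Flask(__name__)

    @app.route('/query/', methods=['GET'])
    def query():
        domain = request.args.get('domain')
        # the whois record contains datetime objects, hence default=str
        record = pythonwhois.get_whois(domain)
        return Response(json.dumps(record, default=str),
                        mimetype='application/json')

    if __name__ == '__main__':
        app.run(port=9119)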

You will get a response like the picture below:

From here you can either roll it into your own toolset or just use it for fun (not sure what sort of fun you are into, but…).

You can find the code in my GitHub repo MyJunk or if you just want this code it’s here.

There may well be some bugs but I haven’t found any yet; it runs best on Linux (or Mac OS X).

Any questions, queries etc. etc. you know where to find me.

Pandora – Maltego Graph Thingy

I talk to a lot of different people about Maltego, whether it’s financial institutions, law enforcement, security professionals or plain old stalkers (only kidding), and the question I usually end up asking them is this:

What do you want to do with the data once it’s in Maltego?

The reporting features in Maltego are good, but sometimes you want something a little bit “different”, usually because you want to add the data you collect in Maltego to another tool you may have, or you want to share the information with others (who don’t have Maltego), or just because you want to spin your own reports.

A few weeks ago on my OSINT course we talked about classifying open source intelligence against the National Intelligence Model (5x5x5), so I decided to see if I could write a tool that would take a Maltego graph and do just that. In addition (more as a by-product) you can now export Maltego graphs (including link information) to a JSON file.

I would like to thank Nadeem Douba (@ndouba) for the inspiration (and some of the code) which is part of the Canari Framework and originally allowed you to export a Maltego Graph to CSV format.

Pandora is a simple, lightweight tool that has two main uses. The first is a simple command line interface (pandora.py) that allows you to specify a Maltego graph and just spits out a JSON file.

The second provides the same functionality via a simple web interface (webserver.py): you can export your Maltego graph to JSON and then get a table-based view of the entities held within, and you can also click on a link that shows all the outgoing and incoming linked entity types.

This is still a BETA at the moment; the JSON export works but the web interface has a few quirks. Over the next few weeks I will be adding extras like reporting, the ability to send the JSON to Elasticsearch or Splunk (which has a new HTTP listener available) and some other cool stuff.
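
The trick that makes all of this possible (and which Canari takes advantage of too) is that a .mtgx file is just a zip archive with the graph stored inside as GraphML, so getting at the entities and links is mostly a case of unzipping and parsing XML. A bare-bones sketch of that idea (this only counts nodes and edges; Pandora itself digs into the Maltego-specific entity data held inside each node):

    import zipfile
    import xml.etree.ElementTree as ET

    GRAPHML_NS = '{http://graphml.graphdrawing.org/xmlns}'

    def summarise_graph(mtgx_path):
        # a .mtgx is a zip; the graph itself lives inside as a .graphml file
        with zipfile.ZipFile(mtgx_path) as z:
            graph_file = [n for n in z.namelist() if n.endswith('.graphml')][0]
            root = ET.fromstring(z.read(graph_file))
        nodes = root.findall('.//{}node'.format(GRAPHML_NS))
        edges = root.findall('.//{}edge'.format(GRAPHML_NS))
        return len(nodes), len(edges)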

You can find Pandora HERE and some screenshots are below:

Maltego Graph (example)

MaltegoGraphExample

Pandora – Web Interface

PandoraWebInterface

Pandora – Web interface with imported graph

PandoraWebInterfaceWithGRaph

Pandora – Graph Information

PandoraUploadedJSON

Pandora – Link Information

LinkInformation

As always, any questions, issues etc. etc., please let me know.