Maltego: Meta Search Engines

The other day on the train, I was reading “The Tao of Open Source Intelligence” by Stewart Bertram, and there is a section where he talks about “Metasearch Engines”. Basically, when you use a search engine like Google or Bing, you are only searching that engine’s own database of results. Metasearch engines aggregate the results from several different search engines, which is useful when performing OSINT-related searches, or just if you want to save time.

A while back I came across searx:

Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.

Now, I like searx for a number of reasons:

  • It’s written in Python (Flask).
  • You can run it in a Docker container.
  • You can add search engines, including ones you write yourself.
  • You can search for images, files, torrents, etc.
  • It can output results in JSON, CSV and RSS.
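The JSON output is what makes searx easy to script against. Below is a minimal standard-library sketch of querying it. The `q`, `format` and `pageno` parameters are ones searx’s search endpoint accepts. The address `http://localhost:8080/search` is an assumption, so use wherever your container is published. Some searx/SearXNG builds also need the JSON format enabled in `settings.yml` before they will answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: searx is published on localhost:8080 with the JSON format enabled.
SEARX_URL = 'http://localhost:8080/search'


def build_search_url(base_url, query, page=1, fmt='json'):
    """Build a searx query URL; q, format and pageno are searx parameters."""
    params = urlencode({'q': query, 'format': fmt, 'pageno': page})
    return '{}?{}'.format(base_url, params)


def parse_results(payload):
    """Extract (url, title, snippet) tuples from a searx JSON response."""
    data = json.loads(payload)
    return [(r['url'], r.get('title', ''), r.get('content', ''))
            for r in data.get('results', [])]


def search(query, page=1, base_url=SEARX_URL):
    """Run one query against a live searx instance and return parsed results."""
    with urlopen(build_search_url(base_url, query, page), timeout=30) as resp:
        return parse_results(resp.read().decode('utf-8'))
```

Calling `search('maltego osint')` against a running instance returns a list of `(url, title, snippet)` tuples for the first page. Keeping URL building and parsing separate from the network call makes both easy to test offline.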

The part of searx that really interests me is the ability to add your own search engines. I’ve yet to try this, but in theory you should be able to add not only “normal” search engines but also your own internal data sources. For example, if you have an Elasticsearch index, you could write a simple API to query it and then add it as a search engine within searx.
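Here is a rough sketch of what such an engine module might look like. searx engines are Python modules that expose two functions. `request(query, params)` fills in `params['url']`, and `response(resp)` turns the HTTP response into result dicts with `url`, `title` and `content` keys. This sketch is modelled on the bundled engines, so check it against your searx version. The internal API at `http://localhost:5000/search` and its JSON shape (`{"hits": [{"link", "name", "summary"}]}`) are entirely hypothetical.

```python
"""Hypothetical searx engine module wrapping an internal search API."""
from json import loads
from urllib.parse import urlencode

# Module-level settings that searx reads for each engine
categories = ['general']
paging = True

# Hypothetical API in front of an Elasticsearch index, returning
# {"hits": [{"link": ..., "name": ..., "summary": ...}, ...]}
base_url = 'http://localhost:5000/search?{query}'


def request(query, params):
    # searx supplies the page number in params['pageno']; we set the URL to fetch
    params['url'] = base_url.format(
        query=urlencode({'q': query, 'page': params['pageno']}))
    return params


def response(resp):
    # Convert the API's JSON hits into the url/title/content dicts searx displays
    results = []
    for hit in loads(resp.text).get('hits', []):
        results.append({
            'url': hit['link'],
            'title': hit.get('name', hit['link']),
            'content': hit.get('summary', ''),
        })
    return results
```

You would then list the module under `engines` in searx’s `settings.yml`, giving it a `name` and pointing `engine` at the module name.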

You could in theory (I am going to test this) remove all the normal search engines and use searx as a search engine over all your internal data sources (some assembly required).

Now, what’s any of this got to do with Maltego? Well, out of the box Maltego has some transforms for passing entities through search engines, but sometimes you just want to “Google” something and see what you get back (well, that might just be me).

The idea was to run searx in a Docker container and then create a local Maltego transform that runs a query and returns the first 10 pages of results.

Let’s break down the steps:

  1. Build a Docker image for searx.
  2. Write a local transform to query searx and return the results.

Before I get into the good stuff, let me quickly explain local transforms (in case you don’t know). Local transforms work the same way as remote transforms, except that they run locally on your machine. They have some benefits over remote transforms, but also some downsides (this is not an exhaustive list).

Benefits:

  • You can use all of the processing power of your local machine.
  • You can run transforms that query files on your machine; for example, a transform that parses a pcap file is easy to write locally (and more difficult with remote transforms).
  • It’s much quicker to develop and change local transforms, and you don’t need any other infrastructure.

Downsides:

  • It’s more difficult to share local transforms with others (sharing is caring).
  • You have to manage library dependencies and all that stuff on every machine that runs them.

OK, enough of that; let’s get on to the interesting stuff.

First off, we need a Docker image for searx. Luckily this is really easy, as the author’s documentation is really good. The instructions can be found HERE.

All being well, you should now have a running instance of searx, which should look like the image below:

Have a play with searx; it’s well worth the time and has lots of cool features.

The local Maltego transform (written in Python 3.x) is pretty simple: pass it your query (based on a Maltego Phrase entity) and it returns the results as URL entities.

Paterva provide a Python library for local transforms. Like the remote TRX library, it was written for Python 2.x, so I tweaked it to work with Python 3.x.

Here is the code for the transform (the GitHub repo is linked at the end).

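#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
from urllib.parse import quote_plus

import requests
from settings import BaseConfig as settings  # provides SEARX and MAX_PAGES
from MaltegoTransform import *  # Paterva's local transform library (Python 3 version)


def metasearch(query):
    m = MaltegoTransform()
    # searx page numbers start at 1, so include MAX_PAGES itself
    for page in range(1, settings.MAX_PAGES + 1):
        # URL-encode the query so spaces and symbols survive the request
        url = '{0}{1}&format=json&pageno={2}'.format(
            settings.SEARX, quote_plus(query), page)
        response = requests.post(url, timeout=30).json()
        results = response.get('results', [])
        if not results:
            break  # no more results, so stop paging
        for r in results:
            # One URL entity per result, with URL, title and snippet as extra fields
            ent = m.addEntity('maltego.URL', r['url'])
            ent.addAdditionalFields('url', 'URL', True, r['url'])
            if r.get('title'):
                ent.addAdditionalFields(
                    'title', 'Title', True, r['title'])
            if r.get('content'):
                ent.addAdditionalFields(
                    'content', 'Content', True, r['content'])
    # Print the Maltego XML response to stdout for Maltego to read
    m.returnOutput()


if __name__ == '__main__':
    # Maltego passes the Phrase entity's value as the first argument
    metasearch(sys.argv[1])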

The transform maps the URL, the title of the page (if it exists) and the content (a snippet of text, similar to what Google shows under each result) onto each URL entity as additional fields.
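Under the hood, `returnOutput()` prints an XML message to stdout, which is how a local transform hands its entities back to Maltego. Below is a dependency-free Python 3 stand-in that builds a simplified version of that message with ElementTree. `MiniTransform` and its methods are hypothetical names, not part of Paterva’s library. The element names reflect my understanding of Maltego’s response format. Real messages carry extra details such as a matching rule on each field, so treat this as a sketch.

```python
import xml.etree.ElementTree as ET


class MiniTransform:
    """Hypothetical stand-in for the parts of a local transform library used above."""

    def __init__(self):
        self.entities = []

    def add_entity(self, entity_type, value, fields=None):
        # fields: list of (name, display_name, value) tuples
        self.entities.append((entity_type, value, fields or []))

    def to_xml(self):
        msg = ET.Element('MaltegoMessage')
        resp = ET.SubElement(msg, 'MaltegoTransformResponseMessage')
        ents = ET.SubElement(resp, 'Entities')
        for entity_type, value, fields in self.entities:
            ent = ET.SubElement(ents, 'Entity', Type=entity_type)
            ET.SubElement(ent, 'Value').text = value
            ET.SubElement(ent, 'Weight').text = '100'
            if fields:
                extra = ET.SubElement(ent, 'AdditionalFields')
                for name, display, fvalue in fields:
                    field = ET.SubElement(extra, 'Field', Name=name, DisplayName=display)
                    field.text = fvalue
        ET.SubElement(resp, 'UIMessages')
        # ElementTree escapes &, < and > in values for us
        return ET.tostring(msg, encoding='unicode')

    def return_output(self):
        # Python 3: print is a function and str is already Unicode
        print(self.to_xml())
```

Letting ElementTree build the XML means URLs containing `&` or snippets containing `<` are escaped correctly without any hand-written sanitising.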

Below is a screenshot of what it looks like in Maltego.

To get the transform working, clone the GitHub repo (link below) and add the new transform to Maltego. If you’ve never done that before, Maltego provide instructions HERE.
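Maltego runs a local transform as an ordinary command, and the script above simply reads the phrase from `sys.argv[1]`. This sketch assumes the usual local-transform convention. The entity value is the first argument, and the entity’s other properties arrive as a second, `#`-separated list of `name=value` pairs. Under that convention, a hypothetical helper (`parse_local_args` is my name) for pulling both apart might look like this:

```python
def parse_local_args(argv):
    """Split local-transform arguments into (entity value, properties dict).

    Assumes argv[1] is the entity value and argv[2], if present, is a
    '#'-separated list of name=value pairs.
    """
    value = argv[1] if len(argv) > 1 else ''
    props = {}
    if len(argv) > 2 and argv[2]:
        for pair in argv[2].split('#'):
            # Split on the first '=' only, so values may themselves contain '='
            name, sep, val = pair.partition('=')
            if sep:
                props[name] = val
    return value, props
```

In the transform, `metasearch(sys.argv[1])` could become `value, props = parse_local_args(sys.argv)` followed by `metasearch(value)`. You can also test the script from a terminal without Maltego by passing a phrase as its first argument.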

One of the things I love about Maltego is that it doesn’t need to be complicated: creating transforms is easy, and using them is just as simple. Being able to define what the tool can do for you makes it powerful and versatile.

The GitHub repo is HERE.

Maltego: AWS Lambda

One of the awesome things about Maltego (and Paterva, the company that makes it) is that they allow people like me to host remote Maltego transforms (a Transform Host Server) using a mixture of their Community iTDS server and your own infrastructure.

For a few years now I’ve been running remote Maltego transforms on an AWS EC2 instance (t2.micro). All you need to get it up and working is a Linux server, Apache2 (you could probably use nginx) and a little bit of time. Paterva provide all the files and installation notes you need HERE, and it works a treat.

If you work in tech, you can’t really escape the flood of people talking about “serverless” deployments and all that kind of stuff, so I thought it would be cool to try to recreate the Transform Host Server using Python 3.x, Flask (instead of Bottle), Zappa and AWS Lambda functions.

Paterva’s original Maltego.py (the Python library for remote transforms) is written in Python 2.x, so the first job was to tweak the file so that Python 3.x didn’t get all funny about print statements (the good news is that the tweaked version is available in the GitHub repo below).

Once that was done, I needed to recreate the web server component (originally written with Python’s Bottle framework) using the awesome Python Flask framework. This probably took the longest, as I had to work out the differences between Flask and Bottle.

A skeleton Flask server is shown below. The transform I wrote to prove the concept simply connects to a website and returns the HTTP status code (200, 404, etc.).

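#!/usr/bin/env python3

from flask import Flask, request
from maltego import MaltegoMsg  # Paterva's remote transform library, tweaked for Python 3
from transforms import trx_getstastuscode

app = Flask(__name__)  # the WSGI app that Zappa deploys to Lambda


@app.route('/', methods=['GET'])
def lambda_test():
    # Simple check that the deployment is reachable
    return 'Hello World'


@app.route('/tds/robots', methods=['GET', 'POST'])
def get_robots():
    # Maltego sends the transform request as XML in the request body
    return trx_getstastuscode(MaltegoMsg(request.data))


if __name__ == "__main__":
    app.run()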

The lambda_test function is just there so I could make sure everything was working; you can (and probably should) remove it.
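The post doesn’t show trx_getstastuscode itself. Below is a rough, hypothetical equivalent that skips Paterva’s Maltego.py library. It parses the request XML with ElementTree, fetches the site with the standard library and returns the status code as a `maltego.Phrase` entity. The XML element names reflect my understanding of Maltego’s transform message format, and returning a Phrase entity is my own choice. `status_code_transform`, `fetch_status` and `first_entity_value` are illustrative names. In the Flask app it could be called as `status_code_transform(request.data)`.

```python
import xml.etree.ElementTree as ET
from urllib.error import HTTPError
from urllib.request import urlopen


def first_entity_value(request_xml):
    """Return the Value of the first Entity in a Maltego transform request."""
    root = ET.fromstring(request_xml)
    value = root.find('.//Entities/Entity/Value')
    return value.text.strip() if value is not None and value.text else ''


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url (urlopen follows redirects)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.getcode()
    except HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError, which carries the code
        return err.code


def status_code_transform(request_xml, fetch=fetch_status):
    """Hypothetical transform: website in, status code (as a Phrase) out."""
    site = first_entity_value(request_xml)
    url = site if site.startswith(('http://', 'https://')) else 'http://' + site
    code = fetch(url)

    msg = ET.Element('MaltegoMessage')
    resp = ET.SubElement(msg, 'MaltegoTransformResponseMessage')
    entities = ET.SubElement(resp, 'Entities')
    entity = ET.SubElement(entities, 'Entity', Type='maltego.Phrase')
    ET.SubElement(entity, 'Value').text = str(code)
    ET.SubElement(entity, 'Weight').text = '100'
    ET.SubElement(resp, 'UIMessages')
    return ET.tostring(msg, encoding='unicode')
```

Passing the fetch function in as a parameter keeps the XML handling testable without touching the network. Connection failures (e.g. DNS errors) are not caught here and would need handling in a real transform.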

The next step was to deploy this to AWS Lambda. Being lazy, I decided to try Zappa:

Zappa makes it super easy to build and deploy server-less, event-driven Python applications (including, but not limited to, WSGI web apps) on AWS Lambda + API Gateway. Think of it as “serverless” web hosting for your Python apps. That means infinite scaling, zero downtime, zero maintenance — and at a fraction of the cost of your current deployments!

Zappa is super easy to use: just follow the instructions and make sure to use a Python virtual environment (unlike someone we won’t mention, who forgot). Provide Zappa with AWS credentials that have the required level of access, and within minutes you will be deploying your new app as an AWS Lambda function (seriously, it only takes a few minutes). Take note of the URL provided at the end of the deployment, as you will need it in the final stage.

The final stage of this great masterpiece is to configure your account on Paterva’s Community iTDS server to point to your new transform. The documentation is HERE if you’ve never done it before. Just one thing to note: the Transform URL is the URL output by Zappa above (it should stay the same no matter how many times you deploy).

The nice thing about using AWS Lambda functions is that they are really easy and quick to deploy, and the pricing model works great if you aren’t expecting heavy usage (the free tier includes 1 million requests and 400,000 GB-seconds of compute per month). Now there is no reason for you not to be hosting Maltego transforms for the world to share…
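To put those free-tier numbers into perspective, here is a quick back-of-the-envelope calculator. It assumes Lambda compute is measured as allocated memory in GB multiplied by execution time in seconds, and it ignores billing-granularity rounding. The memory sizes and durations below are made-up examples, not measurements of the transform.

```python
FREE_REQUESTS = 1_000_000    # free-tier requests per month
FREE_GB_SECONDS = 400_000    # free-tier compute per month, in GB-seconds


def free_invocations(memory_mb, avg_duration_s):
    """Invocations per month that fit in the free tier, whichever limit bites first."""
    gb_seconds_each = (memory_mb / 1024) * avg_duration_s
    by_compute = int(FREE_GB_SECONDS / gb_seconds_each)
    return min(FREE_REQUESTS, by_compute)
```

For a 128 MB function averaging one second per call, each invocation uses 0.125 GB-seconds. Compute alone would allow 3.2 million calls, so the 1 million request limit is hit first. At 1024 MB and two seconds per call, the compute allowance runs out after 200,000 calls.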

All the files you need to deploy your first Maltego Lambda function are in my GitHub repo below. Clone the repo, configure your virtual environment (there is a handy requirements.txt to help) and off you go.

Simple example of how to use AWS’s Lambda functions to host Maltego remote transforms