WEB DATA COLLECTION

The SPIDA Suite web crawlers by Point Duty are a set of tools that collect unstructured data from the clear, deep and dark web. SPIDA comes in three configurations (Huntsman, Wolf and Funnelweb) that enable investigators to acquire and collate material in the most appropriate form for any task.

Huntsman SPIDA by Point Duty is designed specifically for investigators and analysts using IBM’s i2 Analyst’s Notebook charts. Huntsman helps break down documents and websites, extracting data into entities, links and attributes and allowing seamless addition to IBM i2 Analyst’s Notebook charts for further investigation.

  • Huntsman saves time – Huntsman extracts data straight from web, text and image sources into IBM i2 ANB charts as entities, links and attributes, while preserving the existing chart.
  • Huntsman is efficient – Huntsman extracts images, text and website data, and can archive a site or capture a screenshot, all from a right-click menu.
  • Huntsman is discreet – Huntsman features built-in support for I2P and TOR for private data extraction from anything on the clear, deep or dark web.

Huntsman SPIDA opens a window within i2 ANB that provides Huntsman’s working interface. Extraction is as simple as right-clicking and choosing an option from the context menu.

Huntsman can be used to extract text and images manually, and extracted data can be imported directly into i2 Analyst’s Notebook as entities, links and attributes. Huntsman extracts data from webpages, forums, bulletin boards and social networks, and from documents in formats such as PDF, Word, HTML or TXT. Using its built-in browser, Huntsman can extract web data from the clear, deep or dark web. Huntsman can capture data from an entire website; collected data is logged and preserved as a screenshot and as text, images and scripts. All items are linked to their original sources for archival and evidential purposes.

Wolf SPIDA by Point Duty is our automated site capture program, which captures a site in its entirety for further analysis. Its crawl function searches by keyword or URL to extract all available data relating to a target term.

Wolf SPIDA is unique: its heuristic learning engine enables Wolf to learn the layouts of website types such as forums, bulletin boards and social networks, along with the various formatting and layout conventions in which their data is presented. Wolf SPIDA learns date formats, naming conventions, post structures and reply formats.

  • Wolf SPIDA is adaptive – it learns the formatting and layouts of bulletin boards and forums.
  • Wolf SPIDA is efficient – fully automated keyword search and site capture.

Funnelweb SPIDA by Point Duty is our fully automatic program for extracting complete website structures from keywords, queries or URLs. Funnelweb can run multiple search and capture tasks in parallel, allowing scalable and extensive data collection. It also offers discreet collection options, using the built-in TOR Browser for anonymous collection of web data.

  • Funnelweb SPIDA is scalable – many searches can run simultaneously.
  • Funnelweb SPIDA is thorough – it captures complete website structures.