diskover - Elasticsearch file system crawler and storage analytics

diskover is a multi-threaded file system crawler that uses Elasticsearch and Kibana to index your file metadata and visualize your storage analytics. diskover crawls and indexes your files on a local computer or remote server using NFS or SMB.

File metadata is bulk added and streamed into Elasticsearch, allowing you to search and visualize your files in Kibana without having to wait until the crawl is finished. diskover is written in Python and runs on Linux, OS X/macOS and Windows.
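As a rough illustration of what "bulk added and streamed" can mean, the sketch below walks a directory tree and groups file metadata into fixed-size batches that become available while the walk is still running. It is not diskover's actual code: the function names `iter_file_docs` and `batched`, the document fields, and the batch size are all assumptions made for this example.

```python
import os


def iter_file_docs(root):
    """Yield one metadata dict per file under root (hypothetical document shape)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it and keep crawling.
                continue
            yield {"path": path, "size": st.st_size, "mtime": st.st_mtime}


def batched(docs, batch_size=1000):
    """Group an iterable of docs into lists of at most batch_size items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Because both functions are generators, each batch can be handed to a bulk indexing call as soon as it fills up, rather than after the whole tree has been walked.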

diskover aims to help you manage your storage by identifying old and unused files and giving better insight into file duplication and wasted space. It is designed to help manage large amounts of data growth and provide detailed storage analytics.

Screenshots

Kibana dashboards / saved searches and visualizations (included in diskover download)

diskover-web (diskover’s web file manager and file system search engine)

Gource visualization support (see videos below)

diskover Gource videos

Installation Guide

Requirements

Windows Additional Requirements

Optional Installs

Download

$ git clone https://github.com/shirosaidev/diskover.git
$ cd diskover

Download latest version

Requirements

You need at least Python 2.7 or Python 3.5, with the required Python dependencies installed using pip.

$ sudo pip install -r requirements.txt
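If you want to confirm the interpreter version before installing, a minimal sketch of such a check follows. The function name `python_version_ok` is hypothetical and is not part of diskover; the rule it encodes (2.7 or later on the 2.x line, 3.5 or later on the 3.x line) is simply a reading of the requirement stated above.

```python
import sys


def python_version_ok(version_info=sys.version_info):
    """Return True if the version meets the stated minimum (2.7+ or 3.5+)."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    if major == 3:
        return minor >= 5
    # Assumption: anything newer than the 3.x line is accepted,
    # anything older than 2.x is rejected.
    return major > 3


if __name__ == "__main__":
    if not python_version_ok():
        sys.exit("Python 2.7+ or 3.5+ is required")
```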

Getting Started

Start diskover as root user with:

$ cd /path/you/want/to/crawl
$ sudo python /path/to/diskover.py

On Windows, run the Cygwin terminal as administrator, then run diskover.

With no flags, a crawl defaults to indexing from . (the current directory) and includes files larger than 0 MB that were modified 0 or more days ago. Empty files are skipped. Use -h to see the CLI options.
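The default thresholds described above can be read as a simple per-file filter. The sketch below is one interpretation, not diskover's implementation: the function `should_index` and its parameters `min_size_mb` and `min_mtime_days` are names invented for this example, and treating the time threshold as "modified at least N days ago" is an assumption.

```python
SECONDS_PER_DAY = 86400


def should_index(size_bytes, mtime, now, min_size_mb=0, min_mtime_days=0):
    """Decide whether a file passes the size and modified-time thresholds.

    size_bytes: file size in bytes
    mtime, now: modification time and current time, in seconds since the epoch
    """
    if size_bytes == 0:
        # Empty files are always skipped.
        return False
    if size_bytes <= min_size_mb * 1024 * 1024:
        # The size threshold is exclusive: the file must be larger than it.
        return False
    age_days = (now - mtime) / float(SECONDS_PER_DAY)
    # The time threshold is inclusive: modified at least this many days ago.
    return age_days >= min_mtime_days
```

With both thresholds left at 0, every non-empty file passes, which matches the default behavior described above.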

A successful crawl should look like this:

  ________  .__        __
  \______ \ |__| _____|  | _________  __ ___________
   |    |  \|  |/  ___/  |/ /  _ \  \/ // __ \_  __ \ /)___(\
   |    `   \  |\___ \|    <  <_> )   /\  ___/|  | \/ (='.'=)
  /_______  /__/____  >__|_ \____/ \_/  \___  >__|   (\")_(\")
          \/        \/     \/   v1.2.0      \/
                      https://github.com/shirosaidev/diskover

2017-09-10 13:23:53,385 [INFO][diskover] Connecting to Elasticsearch
2017-09-10 13:23:53,437 [INFO][diskover] Checking ES index: diskover-2017.04.22
2017-09-10 13:23:53,581 [WARNING][diskover] ES index exists, deleting
2017-09-10 13:23:53,823 [INFO][diskover] Creating ES index
2017-09-10 13:23:54,055 [INFO][diskover] Crawling using 4 threads
Crawling: [100%] |########################################| 10684/10684
2017-09-10 13:24:37,443 [INFO][diskover] Finished crawling

********************************* CRAWL STATS *********************************
 Directories: 10684 / Skipped: 0
 Files: 68818 (56.99 GB) / Skipped: 899 (0B)
 Elapsed time: 0h:00m:44s
*******************************************************************************
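To turn the crawl stats above into a throughput figure, the elapsed time line can be parsed and divided into the file count. This is a small helper sketch, not something diskover ships: the names `parse_elapsed` and `files_per_second` are hypothetical, and the `0h:00m:44s` layout is taken from the sample output above.

```python
import re


def parse_elapsed(text):
    """Convert an elapsed string such as '0h:00m:44s' into seconds."""
    match = re.match(r"^(\d+)h:(\d+)m:(\d+)s$", text.strip())
    if match is None:
        raise ValueError("unrecognized elapsed time: %r" % text)
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def files_per_second(file_count, elapsed_text):
    """Return the average number of files per second, or None for a 0s crawl."""
    seconds = parse_elapsed(elapsed_text)
    if seconds == 0:
        return None
    return file_count / float(seconds)
```

For the sample crawl, `files_per_second(68818, "0h:00m:44s")` works out to roughly 1,564 files per second.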

User Guide

Read the wiki for more documentation on how to use diskover.

Discussions/Questions

For discussions or questions about diskover, please ask on the Google Group.

Bugs

To report bugs in diskover, please use the issues page.

License

See the license file.