A visualization utility for Apache Nutch data structures

reperio

Reperio is a visualization utility for Apache Nutch CrawlDB, LinkDB and HostDB data structures.

Reperio is written in Python and uses networkx and Bokeh to generate network graph visualizations.
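A Nutch LinkDB records, for each URL, the inlinks that point to it, so its records can be read as directed edges from a linking page to a linked page. The sketch below illustrates that idea under stated assumptions: it uses a hypothetical in-memory dict (linkdb) in place of real LinkDB data and plain Python in place of networkx. It is not Reperio's own code or input format.

```python
from collections import defaultdict

# Hypothetical LinkDB-like data (an assumption, not Reperio's input format):
# target URL -> list of source URLs that link to it (its inlinks).
# Real Nutch LinkDB entries also carry anchor text, omitted here.
linkdb = {
    "https://example.org/": ["https://a.example/", "https://b.example/"],
    "https://a.example/": ["https://b.example/"],
    "https://b.example/": [],
}


def build_edges(inlinks_by_url):
    """Turn inlink records into directed (source, target) edges."""
    return [
        (src, target)
        for target, sources in inlinks_by_url.items()
        for src in sources
    ]


def adjacency(edges):
    """Build an outgoing adjacency list: source -> set of targets."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
    return dict(adj)


def in_degree(inlinks_by_url):
    """Count distinct inlinking pages per URL."""
    return {url: len(set(sources)) for url, sources in inlinks_by_url.items()}


edges = build_edges(linkdb)
print(edges)
print(sorted(in_degree(linkdb).items(), key=lambda kv: kv[1], reverse=True))
```

With networkx installed, the same edge list can be passed straight to networkx.DiGraph(edges) to obtain a graph object that can then be laid out and rendered.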

Quick start

The Conda package manager is recommended. Create a conda environment:

conda create -n reperio python=3.10

Activate the conda environment and install Poetry:

conda activate reperio
pip install poetry

After installing the project dependencies (see make install below), run the client with:

reperio --help

or with Poetry:

poetry run reperio --help

Makefile usage

The Makefile provides several targets for faster development.

Install all dependencies and pre-commit hooks

Install requirements:

make install

Pre-commit hooks can be installed after git init via:

make pre-commit-install

Codestyle and type checks

Automatic formatting uses ruff.

make polish-codestyle

# or use the synonym
make formatting

Codestyle checks only, without rewriting files:

make check-codestyle

Note: check-codestyle uses the ruff and darglint libraries.
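darglint compares a function's docstring with its signature and body, flagging arguments, return values or raised exceptions that are missing from (or extra in) the docstring. A minimal sketch of a function whose docstring should satisfy it, assuming the Google docstring style (darglint's default); the function itself is hypothetical and not part of Reperio:

```python
from urllib.parse import urlsplit


def normalize_host(url: str) -> str:
    """Extract the lower-cased host name from an absolute URL.

    Args:
        url: An absolute URL such as "https://Example.org/path".

    Returns:
        The host component in lower case.

    Raises:
        ValueError: If the URL has no host component.
    """
    # urlsplit(...).hostname is already lower-cased, or None if absent.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host in {url!r}")
    return host
```

Removing the Raises section while keeping the raise statement is the kind of mismatch make check-codestyle would then report.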

Code security

make check-safety

This command runs Poetry integrity checks and identifies security issues with Safety and Bandit. It is only available if it was selected when the project was generated.

Tests with coverage badges

Run pytest

make test

All linters

A single command runs all linters:

make lint

which is the same as:

make check-codestyle && make test && make check-safety
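Because the targets are joined with &&, each one runs only if the previous one exited successfully, so a code-style failure stops make lint before the tests run. A minimal Python sketch of that short-circuit behaviour; the run_chain helper is hypothetical, and placeholder Python commands stand in for the make targets:

```python
import subprocess
import sys


def run_chain(commands):
    """Run commands in order, stopping at the first non-zero exit status,
    like `cmd1 && cmd2 && cmd3` in a shell.

    Returns the exit codes of the commands that actually ran.
    """
    codes = []
    for cmd in commands:
        code = subprocess.run(cmd).returncode
        codes.append(code)
        if code != 0:
            break  # remaining commands are skipped, as with `&&`
    return codes


# Placeholder commands; real usage would look like
# run_chain([["make", "check-codestyle"], ["make", "test"], ["make", "check-safety"]])
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]
```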

Docker

make docker-build

which is equivalent to:

make docker-build VERSION=latest

Remove the Docker image with:

make docker-remove

More information about docker.

Cleanup

Delete pycache files

make pycache-remove

Remove package build

make build-remove

Delete .DS_Store files

make dsstore-remove

Remove .mypy_cache

make mypycache-remove

Or, to remove all of the above, run:

make cleanup
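For illustration, the cleanup targets delete generated files with a few well-known names. The following is a hypothetical pure-Python approximation, not the Makefile's actual recipe. It assumes the targets remove __pycache__ and .mypy_cache directories, .pyc/.pyo files, .DS_Store files and a top-level build/ directory, and it only lists what it would delete unless dry_run is set to False:

```python
import shutil
from pathlib import Path

# Assumed patterns; the real Makefile recipes may match more or fewer paths.
DIR_NAMES = {"__pycache__", ".mypy_cache"}
FILE_NAMES = {".DS_Store"}
FILE_SUFFIXES = {".pyc", ".pyo"}
ROOT_DIRS = ("build",)


def find_cleanup_targets(root):
    """Return the paths under `root` that a cleanup would delete."""
    root = Path(root)
    targets = [root / d for d in ROOT_DIRS if (root / d).is_dir()]
    for path in root.rglob("*"):
        if path.is_dir() and path.name in DIR_NAMES:
            targets.append(path)
        elif path.is_file() and (
            path.name in FILE_NAMES or path.suffix in FILE_SUFFIXES
        ):
            targets.append(path)
    # Drop paths already covered by a directory that will be removed whole.
    chosen = set(targets)
    return [p for p in targets if not any(a in chosen for a in p.parents)]


def cleanup(root, dry_run=True):
    """Delete cleanup targets; with dry_run=True only print them."""
    targets = find_cleanup_targets(root)
    for path in targets:
        if dry_run:
            print("would remove", path)
        elif path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        else:
            path.unlink(missing_ok=True)
    return targets
```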

🛡 License

This project is licensed under the terms of the Apache Software License 2.0. See LICENSE for more details.

📃 Citation

@misc{reperio,
  author = {lewismc},
  title = {Reperio is a visualization utility for Apache Nutch data structures.},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/lewismc/reperio}}
}

Credits

This project was generated with 3PG