scrapinghub/wappalyzer-python

wappalyzer-python -- UNMAINTAINED

Python wrapper for Wappalyzer (a utility that uncovers the technologies used on websites)

Warning: this package is not maintained anymore.

Scrapinghub and Javier Casas, the original author, have no plans to support wappalyzer-python in the foreseeable future. This includes fixing bugs and supporting upgraded dependencies such as PyV8.

If you are interested in continuing the work, please get in touch via opensource@scrapinghub.com so that we can discuss transferring ownership of this repository.

How to use it

>>> from wappalyzer import Wappalyzer
>>> w = Wappalyzer()

>>> w.analyze('http://wikipedia.org')
{u'Apache': {u'confidence': 100, u'version': u'', u'categories': [u'web-servers']},
u'Varnish': {u'confidence': 100, u'version': u'', u'categories': [u'cache-tools']}}

>>> w.analyze('http://tripadvisor.com')
{u'Apache': {u'confidence': 100, u'version': u'', u'categories': [u'web-servers']},
u'Google Analytics': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']},
u'comScore': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']}}

>>> w.analyze('http://facebook.com')
{u'reCAPTCHA': {u'confidence': 100, u'version': u'', u'categories': [u'captchas']}}
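Each `analyze()` result above is a dict keyed by application name, where every value carries `confidence`, `version` and `categories`. The following minimal sketch regroups such a result by category and optionally drops low-confidence detections. `apps_by_category` and its `min_confidence` parameter are hypothetical helpers, not part of wappalyzer-python; the sketch relies only on the result shape shown in the examples.

```python
from collections import defaultdict


def apps_by_category(result, min_confidence=0):
    """Group an analyze()-style result by category name.

    Hypothetical helper: it only assumes the dict shape shown above
    (app name -> {'confidence': ..., 'version': ..., 'categories': [...]}).
    """
    grouped = defaultdict(list)
    for app, info in result.items():
        # Skip detections below the requested confidence threshold.
        if info.get('confidence', 0) < min_confidence:
            continue
        for category in info.get('categories', []):
            grouped[category].append(app)
    # Sort app names inside each category so the output is deterministic.
    return {category: sorted(apps) for category, apps in grouped.items()}


if __name__ == '__main__':
    # Data copied from the tripadvisor.com example above.
    result = {
        u'Apache': {u'confidence': 100, u'version': u'', u'categories': [u'web-servers']},
        u'Google Analytics': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']},
        u'comScore': {u'confidence': 100, u'version': u'', u'categories': [u'analytics']},
    }
    print(apps_by_category(result))
```

With a live instance this could be called as `apps_by_category(w.analyze('http://tripadvisor.com'), min_confidence=50)`.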

You can specify the User-Agent to use:

>>> w.analyze('http://www.google.com', user_agent='your_user_agent')

Or analyze a page you have already downloaded (in this case you also need its URL and response headers):

>>> w.analyze_from_data(url=the_url, html=the_html, headers=the_response_headers)
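One way to obtain the three values that `analyze_from_data()` takes is to download the page yourself. The sketch below is a hypothetical helper, `fetch_page_data`, built only on Python 3's standard library, so a Python 2 environment would need an adapted version. Passing the response headers as a plain dict, and the default User-Agent string, are assumptions rather than documented wappalyzer-python behaviour.

```python
import urllib.request


def fetch_page_data(url, user_agent='Mozilla/5.0', timeout=30):
    """Download a page and return keyword arguments for analyze_from_data().

    Hypothetical helper; the dict-of-headers format is an assumption.
    """
    request = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return {
            # geturl() returns the final URL, after any redirects.
            'url': response.geturl(),
            'html': raw.decode(charset, errors='replace'),
            'headers': dict(response.headers.items()),
        }
```

Because the keys match the keyword names used above, the result can be unpacked directly: `>>> w.analyze_from_data(**fetch_page_data('http://wikipedia.org'))`.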

Apps and Categories are available as dict objects:

>>> w.apps
{u'Google Wallet': {u'website': u'wallet.google.com', u'cats': [41], u'script': [u'checkout\\.google\\.com',
u'wallet\\.google\\.com']}, u'Lockerz Share': ...}

>>> w.categories
{u'42': u'tag-managers', u'48': u'network-storage', u'43': u'paywalls', u'49': u'feed-readers', u'24':
u'rich-text-editors', u'25': u'javascript-graphics', u'26': u'mobile-frameworks', ...}
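In the output above, an app's `cats` entry holds integer IDs (for example `[41]`), while the keys of `w.categories` are strings (`u'42'`, `u'48'`, ...). The following minimal sketch resolves an app's category names by converting each ID with `str()`. `category_names` is a hypothetical helper, and the test data in it is made up for illustration.

```python
def category_names(app_name, apps, categories):
    """Return the category names for one entry of an apps dict.

    Hypothetical helper: assumes apps[name]['cats'] holds integer IDs and
    that the keys of the categories dict are the same IDs as strings.
    """
    cat_ids = apps.get(app_name, {}).get('cats', [])
    # IDs that are missing from the categories dict are skipped rather than raising KeyError.
    return [categories[str(cid)] for cid in cat_ids if str(cid) in categories]


if __name__ == '__main__':
    # Hypothetical data shaped like w.apps and w.categories above.
    apps = {'ExampleApp': {'cats': [42, 48]}}
    categories = {'42': 'tag-managers', '48': 'network-storage'}
    print(category_names('ExampleApp', apps, categories))
```

Against real data this would be `category_names(u'Google Wallet', w.apps, w.categories)`.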

The data can also be updated to the latest version available in the Wappalyzer GitHub repository:

>>> from wappalyzer import updater
>>> updater.update_all()

By default, app icons are updated in the data/icons folder. If you need them somewhere else, you can specify the destination folder:

>>> from wappalyzer import updater
>>> updater.update_all(icons_folder='your_icons_folder')

Or update the icons separately:

>>> updater.update_icons(icons_folder='your_icons_folder')

Requirements

Note for macOS users: if you have problems installing PyV8, you can use PyV8-OS-X instead:

pip install -e git://github.com/brokenseal/PyV8-OS-X#egg=pyv8

Install

Using setup.py:

python setup.py install

Using PyPI:

pip install wappalyzer-python