Beautiful Soup is a Python library for scraping data from web pages. Python also has Scrapy, a larger and more full-featured framework for web scraping, but Beautiful Soup is a very lightweight library and gets the job done quickly.
You can install Beautiful Soup using either of the following two commands (pip is the preferred installer; easy_install is deprecated in newer setuptools releases):
pip install beautifulsoup4
easy_install beautifulsoup4
Some sample outputs...
[email protected]:~# pip install beautifulsoup4
Downloading/unpacking beautifulsoup4
Downloading beautifulsoup4-4.3.1.tar.gz (142Kb): 142Kb downloaded
Running setup.py egg_info for package beautifulsoup4
Installing collected packages: beautifulsoup4
Running setup.py install for beautifulsoup4
Successfully installed beautifulsoup4
Cleaning up...
[email protected]:~#
Some more sample outputs...
[email protected] [/home/d]# easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
Best match: beautifulsoup4 4.3.1
Downloading https://pypi.python.org/packages/source/b/beautifulsoup4/beautifulsoup4-4.3.1.tar.gz#md5=508095f2784c64114e06856edc1dafed
Processing beautifulsoup4-4.3.1.tar.gz
Running beautifulsoup4-4.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-gMGxK0/beautifulsoup4-4.3.1/egg-dist-tmp-0kAJy4
zip_safe flag not set; analyzing archive contents...
Adding beautifulsoup4 4.3.1 to easy-install.pth file
Installed /usr/local/lib/python2.7/site-packages/beautifulsoup4-4.3.1-py2.7.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4
[email protected] [/home/d]#
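Once the install finishes, a quick way to confirm it worked is to parse a small piece of HTML. The snippet below is only a minimal sketch: the HTML string, its title, and its link paths are made up for illustration, and it assumes the package imports as bs4 (which is how beautifulsoup4 is exposed) and uses Python's built-in "html.parser" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used here in place of a live web page
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python itself
soup = BeautifulSoup(html, "html.parser")

# Text of the <title> tag
title = soup.title.string

# The href attribute of every <a> tag, in document order
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(links)
```

If the import fails with an ImportError, the install most likely went to a different Python interpreter than the one running the script.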
Extracting or scraping data from public websites is common, but you should always evaluate the legality of any scraping before you do it.