Scrape, normalize and mine Google News with Python
If Google News had a Python library.
Created by Artem from newscatcherapi.com. You do not need anything from us or from anyone else to get the software going; it just works out of the box.
A Python wrapper of the Google News RSS feed.
It covers top stories, topic-related news feeds, geolocation news feeds, and an extensive full-text search feed.
This work is more of a collection of everything we could find out about how Google News functions.
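As a rough illustration of what such a wrapper deals with, here is a minimal sketch of composing feed URLs. It assumes the publicly visible Google News RSS layout (a `/rss` root, a `/search` path, and `hl`, `gl`, `ceid` query parameters); the helper name `build_feed_url` is hypothetical and is not part of this package.

```python
from urllib.parse import urlencode

# Assumed public root of the Google News RSS feeds.
BASE_URL = "https://news.google.com/rss"


def build_feed_url(path="", query=None, lang="en", country="US"):
    """Return a Google News RSS URL for a feed path and an optional search query.

    Hypothetical helper for illustration only; not the package's own code.
    """
    params = {}
    if query is not None:
        params["q"] = query
    # Language and country select the edition of the feed (assumed parameters).
    params.update({"hl": lang, "gl": country, "ceid": f"{country}:{lang}"})
    # Keep ":" unescaped so the ceid value stays readable.
    return f"{BASE_URL}{path}?{urlencode(params, safe=':')}"


top_stories_url = build_feed_url()
search_url = build_feed_url("/search", query="python programming")
```

The resulting URLs can be handed to any RSS parser; the package does the fetching and normalizing for you, so this is only meant to show what sits underneath.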
Example use cases:
1. Integrating a news feed into your platform, application, or website
2. Collecting data by topic to train your own ML model
3. Searching for the latest mentions of your new product
4. Media monitoring of people and organizations (PR)
Before we start: if you want to integrate Google News data into production, I would advise you to use one of the three methods described below. Why? Because every function call sends an HTTPS request to Google's servers, and you do not want your server's IP address to be blocked by Google. Don't get me wrong, this Python package still works out of the box.
1. NewsCatcher's Google News API: all the code is written for you, with clean, structured JSON output. Low price. You can test it yourself with no credit card.
2. ScrapingBee API, which handles proxy rotation for you. Each function in this package has a `scraping_bee` parameter where you paste your API key. You can also try it for free; no credit card is required. See example
3. Your own proxy: already have a pool of proxies? Each function in this package has a `proxies` parameter (a Python dictionary) where you just paste your own proxies.
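For the last two options, a minimal sketch of what the arguments might look like. It assumes the `proxies` dictionary follows the scheme-to-URL format used by the `requests` library, and that `top_news` is one of the package's functions accepting both parameters; the helper `make_proxies`, the wrapper `fetch_top_news`, and the proxy host are hypothetical.

```python
def make_proxies(host, port, user=None, password=None):
    """Build a requests-style proxies dictionary for a single HTTP proxy."""
    auth = f"{user}:{password}@" if user and password else ""
    proxy_url = f"http://{auth}{host}:{port}"
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def fetch_top_news(proxies=None, scraping_bee=None):
    """Fetch top stories through a proxy or ScrapingBee (pass one, not both).

    Needs pygooglenews installed and network access, so it is not called here.
    """
    # Imported lazily so make_proxies can be used without the package.
    from pygooglenews import GoogleNews

    gn = GoogleNews()
    return gn.top_news(proxies=proxies, scraping_bee=scraping_bee)


# Hypothetical proxy host, for illustration only.
proxies = make_proxies("proxy.example.com", 8080)
```

With a real proxy you would call `fetch_top_news(proxies=proxies)`; with ScrapingBee, `fetch_top_news(scraping_bee="YOUR_API_KEY")` instead.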
v0.1.1 -- fixed language-country issues