PyGoogleNews

Scrape, normalize and mine Google News with Python

If Google News had a Python library.

Link to the GitHub repo.

Created by Artem from newscatcherapi.com. You do not need anything from us or from anyone else to get the software going; it works out of the box.

Demo

About

A Python wrapper of the Google News RSS feed.

It covers top stories, topic-related news feeds, geolocation news feeds, and an extensive full-text search feed.

This work is essentially a collection of everything we could find out about how Google News functions.
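
To make the idea of wrapping an RSS feed concrete, here is a minimal sketch of how feed URLs for top stories and full-text search might be assembled. It is not this package's code: the base URL, the `hl`/`gl`/`ceid` parameters, and the helper name `build_feed_url` are assumptions based on publicly visible Google News RSS links, and they may change without notice.

```python
from urllib.parse import urlencode

# Assumed public RSS endpoint; not taken from this package's source code.
BASE_URL = "https://news.google.com/rss"


def build_feed_url(path="", query=None, lang="en", country="US"):
    """Build a Google News RSS URL (hypothetical helper, for illustration only)."""
    params = {}
    if query is not None:
        # Full-text search term; urlencode escapes spaces and special characters.
        params["q"] = query
    # Assumed language/country parameters as seen in public feed URLs.
    params["hl"] = lang
    params["gl"] = country
    params["ceid"] = f"{country}:{lang}"
    return f"{BASE_URL}{path}?{urlencode(params)}"


top_stories_url = build_feed_url()
search_url = build_feed_url("/search", query="python programming")
```

Fetching either URL returns an RSS document, which a wrapper would then parse and normalize into Python objects.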

Examples of Use Cases

  1. Integrating a news feed into your platform/application/website

  2. Collecting data by topic to train your own ML model

  3. Searching for the latest mentions of your new product

  4. Media monitoring of people/organizations — PR

Working with Google News in Production

Before we start: if you want to integrate Google News data into a production system, I would advise you to use one of the three methods described below. Why? Because you do not want your server's IP address to be blocked by Google. Every function call sends an HTTPS request to Google's servers. Don't get me wrong, this Python package still works out of the box.

  1. NewsCatcher's Google News API — all code is written for you, clean & structured JSON output. Low price. You can test it yourself with no credit card.

  2. ScrapingBee API, which handles proxy rotation for you. Each function in this package has a scraping_bee parameter where you paste your API key. You can also try it for free; no credit card is required. See example

  3. Your own proxy: already have a pool of proxies? Each function in this package has a proxies parameter (a Python dictionary) where you just paste your own proxies.
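
For the third option, a proxies dictionary could be prepared along these lines. This is only a sketch: it assumes the dictionary follows the scheme-to-URL mapping used by the requests library, and the proxy addresses, `make_proxies`, and `next_proxies` are hypothetical names introduced for illustration.

```python
from itertools import cycle

# Hypothetical proxy endpoints; replace with the proxies you actually own.
PROXY_POOL = [
    "http://10.10.1.10:3128",
    "http://10.10.1.11:3128",
]


def make_proxies(proxy_url):
    """Map both URL schemes to one proxy (assumed requests-style layout)."""
    return {"http": proxy_url, "https": proxy_url}


# Endless round-robin iterator over the pool.
_rotation = cycle(PROXY_POOL)


def next_proxies():
    """Return the proxies dictionary for the next proxy in the pool."""
    return make_proxies(next(_rotation))
```

Calling `next_proxies()` before each request spreads the traffic across the pool, so no single IP address carries every call to Google's servers.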

Stack Overflow thread from which it all began

Google XML reference for the search query

Google News Search parameters (The Missing Manual)

Change Log

v0.1.1 -- fixed language-country issues
