</newscatcher>
Scroll Through All Pages To Get All Found Articles
Learn how to iterate through found pages and extract articles

Introduction

Whenever you make an API call to the /v2/search or /v2/latest_headlines endpoint, the response includes key information about your search. Here is an example of a search API call:
curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Elon%20Musk"&countries=IN,GB&from=13%20days%20ago&page_size=50&page=1' -H 'x-api-key: your_key_1'
The output:
"status": "ok",
"total_hits": 1300,
"page": 1,
"total_pages": 26,
"page_size": 50,
"articles": [ ...
total_hits tells you how many articles were found.
total_pages tells you how many API calls you need to make to retrieve all of them.
A single API call returns at most 100 articles.
You can increase the page_size parameter (up to that maximum of 100) to retrieve more articles per call and make fewer calls.
curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Elon%20Musk"&countries=IN,GB&from=13%20days%20ago&page_size=100&page=1' -H 'x-api-key: your_key_1'
This returns:
"status": "ok",
"total_hits": 1300,
"page": 1,
"total_pages": 13,
"page_size": 100,
"articles": [ ...
With page_size set to 100, page 1 contains articles 1 to 100 out of the 1300 found.
Setting the page parameter to 2 returns articles 101 to 200, page 3 returns articles 201 to 300, and so on up to page 13.
To summarize, your goal is to iterate through all found pages and extract the articles from each one.
The process has 2 parts:
  • Make 1 call and read the total number of pages from total_pages.
  • Increment the page parameter until it reaches the total_pages value to get all articles.
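The relationship between total_hits, page_size and total_pages can be sketched in a few lines of Python. This is only an illustration inferred from the sample responses above (1300 hits give 26 pages at page_size=50 and 13 pages at page_size=100); the helper names page_count and article_range are hypothetical and not part of the API or the SDK. In real code, rely on the total_pages value returned by the API rather than computing it yourself.

```python
from typing import Tuple


def page_count(total_hits: int, page_size: int) -> int:
    """Pages needed to cover total_hits articles (ceiling division).

    Hypothetical helper: mirrors the numbers in the sample responses.
    """
    return (total_hits + page_size - 1) // page_size


def article_range(page: int, page_size: int, total_hits: int) -> Tuple[int, int]:
    """1-based positions of the first and last article on a given page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_hits)
    return first, last


print(page_count(1300, 50))          # 26, as in the first sample response
print(page_count(1300, 100))         # 13, as in the second sample response
print(article_range(2, 100, 1300))   # (101, 200)
```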

Python (SDK)

The Python library can be installed by running pip install from a terminal. Full details are available on the PyPI website and in our GitHub repository.
pip install newscatcherapi
Once installed, the package can be imported directly in a Python application.
We provide two dedicated functions, get_search_all_pages and get_latest_headlines_all_pages, to simplify extracting all articles.
from newscatcherapi import NewsCatcherApiClient

newscatcherapi = NewsCatcherApiClient(x_api_key='your_key_1')

# /v2/search Endpoint
all_articles = newscatcherapi.get_search_all_pages(
    q='"Elon Musk"',
    from_='13 days ago',
    countries='IN,GB',
    page_size=100,
    page=1)

Python (requests)

# Third-party packages (install beforehand)
import requests  # 2.24.0

# Standard library
import time

# URL of our News API
base_url = "https://api.newscatcherapi.com/v2/search"

# Your API key
X_API_KEY = 'PUT_YOUR_API_KEY'

# Define your desired parameters
params = {
    "q": "\"Elon Musk\"",
    "from": "13 days ago",
    "countries": "IN,GB",
    "page_size": 100,
    "page": 1
}

# Put your API key in the headers to authorize the call
headers = {"x-api-key": X_API_KEY}

# Variable to store all found news articles
all_news_articles = {}

# Ensure that we start from page 1
params['page'] = 1

# Loop until all pages are extracted or a call fails
while True:

    # Wait for 1 second between each call
    time.sleep(1)

    # GET call
    response = requests.get(base_url, headers=headers, params=params)
    results = response.json()
    if response.status_code == 200:
        print(f'Done for page number => {params["page"]}/{results["total_pages"]}')

        # Store the first response as is, then append articles from later pages
        if not all_news_articles:
            all_news_articles = results
        else:
            all_news_articles['articles'].extend(results['articles'])

        # Cover all pages by incrementing the "page" value at each iteration
        params['page'] += 1
        if params['page'] > results['total_pages']:
            print("All articles have been extracted")
            break
        else:
            print(f'Proceed extracting page number => {params["page"]}')
    else:
        print(results)
        print(f'ERROR: API call failed for page number => {params["page"]}')
        break

# Count the articles themselves, not the keys of the response dictionary
print(f'Number of extracted articles => {len(all_news_articles.get("articles", []))}')
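The same paging logic can also be written as a generator that is independent of how a page is fetched, which makes it easy to try out without spending API calls. The sketch below is an illustration only: iter_articles, fetch_page and fake_fetch are hypothetical names, and fake_fetch is a stand-in that imitates the response fields shown in the Introduction (total_pages, articles, and so on) with made-up data.

```python
from typing import Callable, Dict, Iterator


def iter_articles(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every article by walking pages 1..total_pages.

    fetch_page is a hypothetical callable: it takes a page number and
    returns the parsed JSON body of the response for that page.
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("articles", [])
        # Stop once the last page reported by the response has been read
        if page >= body.get("total_pages", 0):
            break
        page += 1


def fake_fetch(page: int, page_size: int = 2) -> Dict:
    """Offline stand-in for the API: 5 made-up articles, 2 per page."""
    data = [{"title": f"article {i}"} for i in range(1, 6)]
    start = (page - 1) * page_size
    return {
        "status": "ok",
        "total_hits": len(data),
        "page": page,
        "total_pages": 3,
        "page_size": page_size,
        "articles": data[start:start + page_size],
    }


titles = [article["title"] for article in iter_articles(fake_fetch)]
print(len(titles))  # 5
```

In real use, fetch_page would wrap the requests.get call from the script above, including the one-second pause between calls and the status code check.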