Get More Than 10 000 Articles

Learn to fetch more than 10,000 articles

When you make an API call using the /v2/search endpoint, our news API returns some key information. For example, an API call about "Tesla" for the last week

curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Tesla"&from=1%20week%20ago&page_size=100' -H 'x-api-key: your_key_1'

returns

    "status": "ok",
    "total_hits": 10000,
    "page": 1,
    "total_pages": 100,
    "page_size": 100,
    "articles": [ ...

Here, total_hits tells you how many articles are found.

One API search can yield a maximum of 10,000 hits.

When your API call returns 10,000 total_hits, chances are that there are more than 10,000 news articles for your query. The above Telsa example API call actually matches 25,290 articles. To get all of these results, you have two choices:

  1. Manually divide your search query into smaller time periods and then combine the results.

  2. Use the get_search_all_articles method from our Python SDK to automatically divide your query.

Manually Divide Your News Search Query

The idea is simple, break down your queries into smaller time periods so our API can accommodate all the matches.

Continuing the Tesla search example, to get all the 25,290 news articles you'll need to make a number of smaller search requests. Although technically you should be able to fetch all the results in just three API calls, that's not advisable because you don't know how many articles were published in a time period without making additional API calls.

As Day 4 has 7,841 articles, it would cause issues regardless of whether you combine it with Day 3 or Day 5. And you can't figure out if there is a case like Day 4 for your search queries without making a bunch of API calls.

The recommended approach for dividing your search query is to go down one level in the time scale. So if your search query spans a year or multiple years, you should break it down to individual months. If it spans months, break it down to weeks, and so on and so forth.

So our Tesla query would be divided into seven sub-queries:

curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Tesla"&from=1%20week%20ago&to=6%days%20ago&page_size=100' -H 'x-api-key: your_key_1'
curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Tesla"&from=6%days%20ago&to=5%days%20ago&page_size=100' -H 'x-api-key: your_key_1'
curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Tesla"&from=5%days%20ago&to=4%days%20ago&page_size=100' -H 'x-api-key: your_key_1'
curl -XGET 'https://api.newscatcherapi.com/v2/search?q="Tesla"&from=4%days%20ago&to=3%days%20ago&page_size=100' -H 'x-api-key: your_key_1'

and so on.

Once you have the responses to all these sub-queries you can combine the lists of articles to get all the news articles that match your search query.

Use get_search_all_articles Method

You can also automate this process of dividing your queries with the get_search_all_articles method. It takes your query and based on the by parameter divides it into a bunch of sub-queries. It then returns the combination of all their results.

To get all the articles for the Telsa query you can simply:

from newscatcherapi import NewsCatcherApiClient

API_KEY = "YOUR API KEY"

newscatcherapi = NewsCatcherApiClient(x_api_key=API_KEY)

news_articles = newscatcherapi.get_search_all_articles(q="Tesla", from_ = "1 week ago", by = "day")
    "status": "ok",
    "total_hits": 25290,
    "page": 257,
    "total_pages": 257,
    "page_size": 100,
    "articles": [ ...

The by parameter accepts four values: 'hour', 'day', 'week', and 'months'. And it is set to 'week' by default.

Last updated