v3 Search News

Main endpoint that allows you to find news articles by keyword, date, language, country, and more.
get
https://v3-api.newscatcherapi.com
/api/search?q=Apple&from_=1 day ago&countries=CA&page_size=1
Get News
Parameters
Query
q*
string
The keyword or keywords you're searching for. This is the most important part of your query. Please refer to the Advanced Query Parameter section below for more examples and explanations.
search_in
string
By default, we search for the value of the q parameter in both the title and the content of the article. However, you can choose between:
- title
- content
- summary (if enabled for your plan)
- title,summary
- content,summary
sources
array
One or more news sources to narrow down your search.
Each source should be given as the domain of its URL. Subdomains, like finance.yahoo.com, are also accepted. Comma-separated string or a list/array. For example, nytimes.com,theguardian.com,finance.yahoo.com
not_sources
array
One or more sources to be excluded from the search. Comma-separated string or a list/array.
For example, cnn.com,wsj.com
predefined_sources
string
Use our top predefined sources per country.
Later we are going to improve it and add more functionality, like top categories.
The format should be strictly like this:
- start with the word top
- then the number of top sources you want
- then the 2-letter country code (ISO 3166-1 alpha-2)
For example:
top 100 US
top 33 AT
top 5 GB
It is also possible to specify multiple countries, each with its own number of top sources. The entries should be comma-separated.
For example:
top 100 US, GB
top 33 AT, 55 IT
lang
array
Specifies the languages of the search. For example, en. The only accepted format is ISO 639-1 — 2 letter code. Refer to the language format section below for more details.
not_lang
array
The inverse of the lang parameter.
countries
array
Countries where the news publisher is located. Important: This parameter does not filter by the countries mentioned in the news article. One or multiple countries can be used in the search. The only acceptable format is ISO 3166-1 alpha-2. For example, US,CA,MX or just US
not_countries
array
The inverse of the countries parameter.
from_
string
From which point in time to start the search. Defaults to the past week. Available formats:
- YYYY/mm/dd
- YYYY/mm/dd HH:MM:SS
- English phrases like 1 day ago
to_
string
Until which point in time to search for. The default timezone is UTC. Available formats:
- YYYY/mm/dd
- YYYY/mm/dd HH:MM:SS
- English phrases like 1 day ago
published_date_precision
string
There are 3 types of date precision we define:
- full: day and time of an article is correctly identified with the appropriate timezone
- timezone unknown: day and time of an article is correctly identified without timezone
- date: only the day is identified without an exact time
by_parse_date
boolean
When set to True, your from_ and to_ parameters filter by parse_date instead of published_date.
Be aware that a new variable parse_date will be added to the output for each article.
sort_by
string
Accepted values:
- relevancy (default value): the most relevant results first
- date: the most recently published results first
- rank: the results from the highest-ranked sources first
ranked_only
boolean
Default: True. Limits the search to sources that are in the top 1 million online websites. Unranked sources are assigned a rank of 999999.
from_rank
integer
[0:999999] The lower boundary of the rank of a news website to filter by. Important: a lower rank means that a source is more popular.
to_rank
integer
[0:999999] The upper boundary of the rank of a news website to filter by.
is_headline
boolean
When set to True, only articles that were posted on the home page of a given news domain will be shown.
is_paid_content
[Still in development phase]
When set to False, only articles whose full content is publicly available will be shown.
Some news publishers partially block the content of their articles, so we get only a few sentences from them. This filter helps you get articles with full content.
parent_url
array
One or more category URLs to filter your search. Each should be the normal form of the URL. For example, https://www.washingtonpost.com/politics,https://www.washingtonpost.com/technology,https://www.washingtonpost.com/business
all_links
array
Search for articles that mention the given URL.
Please refer to the All Links And Domains Format section below for more examples and explanations.
all_domain_links
array
Search for articles that mention the given domain.
Please refer to the All Links And Domains Format section below for more examples and explanations.
word_count_min
integer
Set a minimum number of words that an article must contain.
Use it to avoid articles with too little content.
word_count_max
integer
Set a maximum number of words that an article may contain.
Use it to avoid articles with too much content.
page_size
integer
[1:100] How many articles to return per page.
page
integer
The number of the page. Use it to paginate through the results, because one API response cannot return more than 100 articles.
clustering_enabled
boolean
[Available only if NLP enabled for your plan]
When set to True, enables clustering on articles. Instead of a flat list of articles, you will be given a list of clusters that group similar articles together.
Please refer to the Deduplicate Data With Clustering section below for more examples and explanations.
clustering_threshold
float
[Available only if NLP enabled for your plan]
Set the similarity threshold used to decide whether articles are similar.
Default value: 0.6
The value can vary from 0 to 1.
clustering_variable
string
[Available only if NLP enabled for your plan]
Select the data on which you want the similarity to be calculated.
Accepted values:
content, title, summary
Default value:
content
include_domain_info
boolean
[Still in development phase] When set to True, shows additional information about the news domains.
include_nlp_data
boolean
When set to True, adds an NLP layer to each article.
Not available for all plans. Please contact us to enable it.
has_nlp
boolean
[Available only if NLP enabled for your plan]
When set to True, returns only articles that have an NLP layer.
theme
string
[Available only if NLP enabled for your plan]
Accepted values:
Business, Economics, Entertainment, Finance, Health, Politics, Science, Sports, Tech, Crime
Comma-separated string or a list/array.
Multiple themes can be selected.
For example:
Business
Business, Finance
Topic labelling is based on the actual content of an article.
ORG_entity_name
string
[Available only if NLP enabled for your plan]
ORG stands for Organisation.
We identify company names mentioned in articles and let you search by them.
More information on Search By Entity
PER_entity_name
string
[Available only if NLP enabled for your plan]
PER stands for Person.
We identify the names of people mentioned in articles and let you search by them.
More information on Search By Entity
LOC_entity_name
string
[Available only if NLP enabled for your plan]
LOC stands for Location.
We identify geographical locations mentioned in articles and let you search by them.
More information on Search By Entity
MISC_entity_name
string
[Available only if NLP enabled for your plan]
MISC stands for Others.
We identify product names and other names mentioned in articles and let you search by them.
More information on Search By Entity
title_sentiment_min
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news based on the sentiment of the article's title.
The value can vary from -1 to 1.
title_sentiment_max
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news based on the sentiment of the article's title.
The value can vary from -1 to 1.
content_sentiment_min
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news based on the sentiment of the article's content.
The value can vary from -1 to 1.
content_sentiment_max
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news based on the sentiment of the article's content.
The value can vary from -1 to 1.
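The predefined_sources format described above (the word top, a number, a 2-letter country code, with comma-separated entries for several countries) could be assembled with a small helper. This is only a sketch: the function name and the dictionary input are assumptions made for illustration, and it always writes an explicit number per country rather than the shared-number form such as top 100 US, GB.

```python
import re


def predefined_sources(spec):
    """Build a predefined_sources string from {country_code: number_of_sources}."""
    parts = []
    for country, count in spec.items():
        # The parameter description asks for ISO 3166-1 alpha-2 codes.
        if not re.fullmatch(r"[A-Z]{2}", country):
            raise ValueError(f"expected a 2-letter country code, got {country!r}")
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"expected a positive integer, got {count!r}")
        parts.append(f"{count} {country}")
    return "top " + ", ".join(parts)


single = predefined_sources({"US": 100})
multiple = predefined_sources({"AT": 33, "IT": 55})
print(single)    # top 100 US
print(multiple)  # top 33 AT, 55 IT
```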
Header
x-api-token*
string
Your unique authentication token
Responses
200
Success
403: Forbidden
Invalid API Key
406
Unsupported Parameters
408
Request Timeout
422: Unprocessable Entity
Parameter is not allowed
429
Too many API calls
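As a rough illustration of the get request above, the following sketch builds the URL from the documented endpoint, sends the token in the x-api-token header, and maps the listed response codes to their meanings. It uses only the Python standard library. The helper names build_search_request and search are assumptions made for this example, and YOUR_API_TOKEN is a placeholder.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

SEARCH_URL = "https://v3-api.newscatcherapi.com/api/search"

# Meanings copied from the Responses list above.
ERROR_MEANINGS = {
    403: "Invalid API Key",
    406: "Unsupported Parameters",
    408: "Request Timeout",
    422: "Parameter is not allowed",
    429: "Too many API calls",
}


def build_search_request(api_token, **params):
    """Build (but do not send) a GET request for /api/search."""
    flat = {}
    for name, value in params.items():
        if value is None:
            continue
        # Join lists into the comma-separated form the parameter list describes.
        if isinstance(value, (list, tuple)):
            value = ",".join(map(str, value))
        flat[name] = value
    query = urllib.parse.urlencode(flat, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}", headers={"x-api-token": api_token}, method="GET"
    )


def search(api_token, **params):
    """Send the request and return the decoded JSON body (makes a network call)."""
    try:
        with urllib.request.urlopen(build_search_request(api_token, **params)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        meaning = ERROR_MEANINGS.get(err.code, "Unexpected error")
        raise RuntimeError(f"HTTP {err.code}: {meaning}") from err


request = build_search_request(
    "YOUR_API_TOKEN", q="Apple", from_="1 day ago", countries=["CA"], page_size=1
)
print(request.full_url)
```

Building the request separately from sending it makes it easy to inspect the encoded URL before spending an API call.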
post
https://v3-api.newscatcherapi.com
/api/search?q=Apple&from_=1 day ago&countries=CA&page_size=1
Get News
The post request accepts the same parameters, header, and responses as the get request above.
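A post request could be prepared along the following lines. Note the assumption: this page only shows the parameters as a query string, so sending them as a JSON body with a Content-Type header is a guess made for this sketch and should be checked against your account's behaviour. The helper name build_post_request is hypothetical and YOUR_API_TOKEN is a placeholder.

```python
import json
import urllib.request

SEARCH_URL = "https://v3-api.newscatcherapi.com/api/search"


def build_post_request(api_token, payload):
    """Build (but do not send) a POST request carrying the parameters as JSON."""
    # Assumption: the endpoint accepts a JSON body with the same parameter names.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"x-api-token": api_token, "Content-Type": "application/json"},
        method="POST",
    )


post_request = build_post_request(
    "YOUR_API_TOKEN",
    {"q": "Apple", "from_": "1 day ago", "countries": ["CA"], "page_size": 1},
)
print(post_request.get_method(), json.loads(post_request.data))
```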

Successful Request Response

{
  "status": "ok",
  "total_hits": 6936,
  "page": 1,
  "total_pages": 6936,
  "page_size": 1,
  "articles": [
    {
      "title": "'I Love It': Elon Musk Elated After AI-Generated Photo of Tesla CEO in Desi Attire Leaves Netizens Impressed",
      "author": "",
      "authors": [],
      "published_date": "2023-06-03 05:37:20",
      "published_date_precision": "full",
      "updated_date": "2023-06-03 05:39:42",
      "updated_date_precision": "full",
      "link": "https://www.latestly.com/socially/social-viral/i-love-it-elon-musk-expresses-happiness-after-ai-generated-pictures-of-himself-as-indian-groom-leaves-netizens-impressed-5174266.html",
      "domain_url": "latestly.com",
      "full_domain_url": "latestly.com",
      "name_source": "LatestLY",
      "is_headline": false,
      "parent_url": "https://www.latestly.com/socially",
      "country": "IN",
      "rights": "latestly.com",
      "rank": 13754,
      "media": "https://st1.latestly.com/wp-content/uploads/2023/06/Elon-Musk-AI-Pics-784x441.jpg",
      "language": "en",
      "description": "Earlier, AI-generated pictures f Elon Musk as an Indian groom went viral after it was shared on Instagram. Reportedly, the series of pictures of Musk in Indian wedding attire was shared by Rolling…",
      "content": "AI-generated pictures of Tesla CEO Elon Musk have taken the internet by storm. The Artificial Intelligence (AI) artistic pictures of the Twitter owner as an Indian Groom has not only won the hearts of netizens but of Musk himself as well. Reacting to one AI-generated picture of himself as an Indian groom, Musk said, \"I love it!\" He also shared two emojis showing the Indian Flag. Earlier, AI-generated pictures f Elon Musk as an Indian groom went viral after it was shared on Instagram. Reportedly, the series of pictures of Musk in Indian wedding attire was shared by Rolling Canvas Presentations. The photography page used an AI application named Midjourney to create the images. Interestingly, the pictures show Musk wearing sherwani as he dances with the wedding guests and even rides a horse. Porn Will Kill Generative AI? Elon Musk Shares Meme Predicting Future of Artificial Intelligence. Midjourney Art of Elon Musk in an Indian Attire A midjourney art of Elon Musk in an Indian attire is going viral in India. 🇮🇳 pic.twitter.com/LD1KuIAHET — DogeDesigner (@cb_doge) June 3, 2023 I Love It, Says Elon Musk 🇮🇳 I love it! 🇮🇳 — Elon Musk (@elonmusk) June 3, 2023 When Elon Musk Had an Indian Wedding View this post on Instagram",
      "word_count": 209,
      "is_opinion": false,
      "twitter_account": "@latestly",
      "all_links": [
        "https://apps.apple.com/us/app/latestly/id1382168376?ls=1 ",
        "https://t.co/LD1KuIAHET",
        "https://twitter.com/cb_doge/status/1664798316775841795?ref_src=twsrc%5Etfw",
        "https://facebook.com/Latestly",
        "https://twitter.com/elonmusk/status/1664799183063138304?ref_src=twsrc%5Etfw",
        "https://www.youtube.com/channel/UC3Fci14HzLNhI9_stjLWV7g",
        "https://www.instagram.com/latestly/",
        "https://www.dailymotion.com/LatestLY",
        "https://news.google.com/publications/CAAqBwgKMOyHggsw6Mf-Ag?oc=3&ceid=IN:en&hl=en-IN&gl=IN",
        "https://play.google.com/store/apps/details?id=com.media.latestly.latestlymedia&hl=en_IN",
        "https://www.linkedin.com/company/13592851/",
        "https://telegram.me/LatestLYNewsBot?start",
        "https://news.google.com/publications/CAAqBwgKMOyHggsw6Mf-Ag?oc=3&ceid=IN:en",
        "https://www.instagram.com/p/Cs0k-68Ls4x/?utm_source=ig_embed&utm_campaign=loading",
        "https://twitter.com/Latestly"
      ],
      "all_domain_links": [
        "linkedin.com",
        "instagram.com",
        "facebook.com",
        "telegram.me",
        "youtube.com",
        "dailymotion.com",
        "apple.com",
        "google.com",
        "twitter.com",
        "t.co"
      ],
      "id": "b1559e2928cfc60cc451b484323645d7",
      "score": 23.17752
    }
  ],
  "user_input": {...}
}
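A response shaped like the one above could be reduced to a few fields per article as sketched below. The function name summarize is an assumption, the sample dictionary is a trimmed stand-in for a real response (its title is a placeholder), and the error branch relies on the error_code and message fields that the Return Body Fields section describes for failed requests.

```python
def summarize(response):
    """Return (title, published_date, domain_url, score) for each article."""
    if response.get("status") != "ok":
        # On errors the body is described as carrying error_code and message.
        raise RuntimeError(f"{response.get('error_code')}: {response.get('message')}")
    return [
        (a["title"], a["published_date"], a["domain_url"], a["score"])
        for a in response["articles"]
    ]


# Trimmed stand-in for a real response; the title is a placeholder.
sample = {
    "status": "ok",
    "total_hits": 6936,
    "articles": [
        {
            "title": "Example title",
            "published_date": "2023-06-03 05:37:20",
            "domain_url": "latestly.com",
            "rank": 13754,
            "score": 23.17752,
        }
    ],
}

rows = summarize(sample)
for title, published, domain, score in rows:
    print(f"{published}  {domain}  {score:.2f}  {title}")
```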

Return Body Fields

Object
Sub Object
Description
status
Returns ok if everything went well.
Returns error in case of an error (plus 2 additional fields in case of error — error_code and message)
total_hits
How many news articles match your search criteria. The maximum is 10,000.
page
The current page number
total_pages
How many pages you can access given your page_size parameter
page_size
How many news articles are in the returned JSON object
articles:
A list of the news articles found
title
The title of the article
author
The author of the article
authors
An array of all author names
published_date
Published date & time
published_date_precision
Accuracy of the published_date field.
There are 3 types of date precision we define:
full — day and time of an article is correctly identified with the appropriate timezone
timezone unknown — day and time of an article is correctly identified without timezone
date — only the day is identified without an exact time
updated_date
Updated date & time
updated_date_precision
Accuracy of the updated_date field.
There are 3 types of date precision we define:
full — day and time of an article is correctly identified with the appropriate timezone
timezone unknown — day and time of an article is correctly identified without timezone
date — only the day is identified without an exact time
link
Full URL where the article was originally published
domain_url
The domain URL of the article's source
full_domain_url
The full domain URL with a subcategory of the article's source
name_source
The common name of the News Source
is_headline
True when an article has been seen on the main page of the news source.
parent_url
The URL where an article was initially found
country
The country of the publisher
rights
Copyright
rank
The page rank of the source website (which is given in the clean_url)
media
A link to a thumbnail image of the article
language
The language of the article
description
Short summary of the article provided by the publisher
content
The full content of the article
word_count
Number of words in the article's content
is_opinion
True if the article is an "Opinion" article
twitter_account
The Twitter account of the publisher
all_links
All URL links embedded in the article's content HTML
all_domain_links
All domain URL embedded in the article's content HTML
nlp
Depending on your plan, you can have: summary, sentiment, theme, ner, embeddings
id
Newscatcher API's unique identifier for each news article
score
How well the article matches your search criteria. The score is different for each search you make. The best matching article has the highest score.
user_input
An object that returns how our backend saw your request. It shows you which parameters have been used to perform a search. Useful for debugging, especially to check if there is any problem with URL encoding
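The page and total_pages fields above can drive a simple pagination loop. The sketch below takes any callable that returns one decoded response per page number, so it can be tried without network access; iter_articles and fetch_page are names invented for this example, and the two fake pages stand in for real responses.

```python
def iter_articles(fetch_page, max_pages=None):
    """Yield every article by requesting pages 1..total_pages via fetch_page(page)."""
    page = 1
    while True:
        response = fetch_page(page)
        yield from response.get("articles", [])
        last_page = response.get("total_pages", page)
        if max_pages is not None:
            # Optional cap so a broad query does not consume too many API calls.
            last_page = min(last_page, max_pages)
        if page >= last_page:
            break
        page += 1


# Fake two-page result set used in place of real API responses.
fake_pages = {
    1: {"status": "ok", "page": 1, "total_pages": 2, "articles": [{"id": "a"}, {"id": "b"}]},
    2: {"status": "ok", "page": 2, "total_pages": 2, "articles": [{"id": "c"}]},
}

all_ids = [article["id"] for article in iter_articles(fake_pages.__getitem__)]
first_page_ids = [a["id"] for a in iter_articles(fake_pages.__getitem__, max_pages=1)]
print(all_ids)         # ['a', 'b', 'c']
print(first_page_ids)  # ['a', 'b']
```

In real use, fetch_page would wrap a call to the search endpoint with the page parameter set, and page_size set to at most 100.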

Search By Entity

In our three years of experience as a news data provider, we have discovered that relying solely on simple keyword-based searches may not yield satisfactory results. To address this, we have enhanced our data pipeline in v3 by introducing additional article enrichment modules and empowering users to filter article searches using these modules. One of the key modules we have integrated is Named Entity Recognition (NER).
If you search for something straightforward such as "inflation", or something specific like "Twitter" or "Elon Musk", you may not need the NER filter. However, the filter is especially valuable when your query contains a common word or name.
Let's say you are searching for articles about Apple, the renowned tech giant, and you want to exclude any references to apple prices, apple farmers, etc. The only way to achieve this in the past was by constructing a more detailed query and including additional related keywords like "iPhone" or "mac".
With the introduction of v3, you now have the option to utilize our NER parameters, making it much simpler to obtain precisely what you are looking for.
By leveraging NER, you can refine your search to focus on the entity "Apple" of type ORG while excluding unrelated articles:
ORG_entity_name=Apple
The same Advanced Query Rules work here as well:
ORG_entity_name=Apple OR "Apple Inc"
We can go further and find articles mentioning the CEO along with the company:
ORG_entity_name=Apple OR "Apple Inc"
PER_entity_name=Tim Cook
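The entity filters above could be combined into a query string roughly as follows. The helper name entity_params is hypothetical, and passing q="Apple" alongside the entity filters is an assumption based on q being marked as required in the parameter list.

```python
import urllib.parse


def entity_params(q, org=None, per=None, loc=None, misc=None):
    """Collect q and the NER filters, dropping the ones that are not set."""
    candidates = {
        "q": q,
        "ORG_entity_name": org,
        "PER_entity_name": per,
        "LOC_entity_name": loc,
        "MISC_entity_name": misc,
    }
    return {name: value for name, value in candidates.items() if value is not None}


params = entity_params("Apple", org='Apple OR "Apple Inc"', per="Tim Cook")
encoded = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
print(encoded)
```

Encoding the quotes and spaces explicitly avoids the URL-encoding problems that the Debugging section warns about.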

Language Format

You should use the ISO 639-1 two-letter code.
For example, English - en
Important: We distinguish Chinese (China) and Chinese (Taiwan), using cn and tw respectively. This is the only way our codes differ from ISO 639-1.
The list of languages we support:
af,ar,bg,bn,ca,cs,cy,cn,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,ja,kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,th,tl,tr,tw,uk,ur,vi
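Before sending a request, the lang or not_lang value could be checked against the supported list above. This is a sketch only: normalize_langs is a hypothetical helper, and lowercasing the input is a convenience chosen for this example rather than documented API behaviour.

```python
SUPPORTED_LANGS = frozenset(
    "af,ar,bg,bn,ca,cs,cy,cn,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,ja,"
    "kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,th,tl,"
    "tr,tw,uk,ur,vi".split(",")
)


def normalize_langs(langs):
    """Return a comma-separated lang value, rejecting unsupported codes."""
    items = langs.split(",") if isinstance(langs, str) else langs
    codes = [code.strip().lower() for code in items]
    unknown = [code for code in codes if code not in SUPPORTED_LANGS]
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return ",".join(codes)


print(normalize_langs("EN, fr"))       # en,fr
print(normalize_langs(["cn", "tw"]))   # cn,tw
```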

Countries Format

You should use the ISO 3166-1 alpha-2 code.
For example, France - FR
You can find the entire list via the following link: https://www.iso.org/obp/ui/#search
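
All Links And Domains Format

We save all the URLs that were mentioned in the article. Use the all_links parameter to search by a URL mentioned in the article, and the all_domain_links parameter to search by a domain name mentioned in the article.
For example, to find all the articles that mention Elon Musk's Twitter account, pass the URL of that account in all_links.
To find articles that mention particular domains, pass them in all_domain_links. For example:
facebook.com,twitter.com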
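As an illustration, the domain filter could be built as a comma-separated value, and the all_domain_links field of each returned article can be used to confirm which of the requested domains it mentions. Both helper names are hypothetical, the q value is only an example, and the sample article is a trimmed stand-in using domains from the response shown earlier.

```python
def domain_filter_params(q, domains):
    """Build query parameters that restrict results to articles mentioning the domains."""
    return {"q": q, "all_domain_links": ",".join(domains)}


def mentioned_domains(article, domains):
    """Return the requested domains that appear in the article's all_domain_links."""
    found = set(article.get("all_domain_links", []))
    return [domain for domain in domains if domain in found]


wanted = ["facebook.com", "twitter.com", "reddit.com"]
params = domain_filter_params("Elon Musk", wanted[:2])

# Trimmed stand-in for one article from a response.
article = {"all_domain_links": ["linkedin.com", "instagram.com", "facebook.com", "twitter.com", "t.co"]}
matches = mentioned_domains(article, wanted)
print(params["all_domain_links"])  # facebook.com,twitter.com
print(matches)                     # ['facebook.com', 'twitter.com']
```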

Debugging

While developing, look at the user_input object, which returns all of your parameters. If you made a mistake, or some characters were not parsed correctly because of URL encoding, you will see it there.
{
"q": "Elon Musk",
"search_in": [
"title_content"
],
"sources": null,
"not_sources": null,
"lang": [
"en"
],
"not_lang": null,
"countries": null,
"not_countries": null,
"from_": "2023-05-31T00:00:00",
"to_": null,
"published_date_precision": null,
"by_parse_date": false,
"sort_by": "relevancy",
"ranked_only": null,
"from_rank": null,
"to_rank": null,
"is_headline": null,
"parent_url": null,
"all_links": null,
"all_domain_links": null,
"word_count_min": null,
"word_count_max": null,
"page": 1,
"page_size": 1,
"include_domain_info": null,
"include_nlp_data": null,
"has_nlp": true,
"theme": null,
"ner_name": null,
"title_sentiment_min": null,
"title_sentiment_max": null,
"content_sentiment_min": null,
"content_sentiment_max": null
}
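One way to use user_input while debugging is to compare it with what you sent and list every parameter the backend saw differently. The helper below is a hypothetical sketch; note that it also reports harmless normalisation, such as lang being echoed back as a list, so the output is a starting point for inspection rather than a list of errors.

```python
def diff_user_input(sent, user_input):
    """Return {name: (sent_value, echoed_value)} for parameters that differ."""
    return {
        name: (value, user_input.get(name))
        for name, value in sent.items()
        if user_input.get(name) != value
    }


sent = {"q": "Elon Musk", "lang": "en", "page_size": 1, "has_nlp": True}

# Subset of the user_input object shown above.
echoed = {"q": "Elon Musk", "lang": ["en"], "page": 1, "page_size": 1, "has_nlp": True}

differences = diff_user_input(sent, echoed)
print(differences)  # {'lang': ('en', ['en'])}
```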