v3 Search News
The main endpoint, which lets you find news articles by keyword, date, language, country, etc.
GET https://v3-api.newscatcherapi.com/api/search?q=Apple&from_=1 day ago&countries=CA&page_size=1
Get News
Parameters
Query
q*
string
The keyword or keywords you are searching for. This is the most important part of your query.
Please refer to the Advanced Query Parameter section below for more examples and explanations.
search_in
string
By default, we search for what you specified in the q parameter in both the title and the content of the article.
However, you can choose between:
- title
- content
- summary (if enabled for your plan)
- title,summary
- content,summary
sources
array
One or more news sources to narrow down your search.
The format should be the domain part of the source's URL. Subdomains, like finance.yahoo.com, are also accepted. Comma-separated string or a list/array.
For example, nytimes.com,theguardian.com,finance.yahoo.com
not_sources
array
One or more sources to be excluded from the search.
Comma-separated string or a list/array.
For example, cnn.com,wsj.com
predefined_sources
string
Use our top predefined sources per country.
Later we are going to improve it and add more functionality, like top categories.
The format must strictly follow this pattern:
- start with the word top
- follow it with the desired number of top sources and the country code
For example:
top 100 US
top 33 AT
top 5 GB
It is also possible to specify multiple countries, each with a custom number of top sources, separated by commas.
For example:
top 100 US, GB
top 33 AT, 55 IT
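The format rules above can be sketched as a small helper that assembles the string for you. This is only an illustration: the function name `build_predefined_sources` and its list-of-pairs input are assumptions, not part of the API, and it only covers the form in which every country carries its own number.

```python
def build_predefined_sources(pairs):
    """Build a predefined_sources value such as 'top 33 AT, 55 IT'.

    `pairs` is a list of (count, country_code) tuples. Hypothetical helper,
    not provided by the API.
    """
    if not pairs:
        raise ValueError("at least one (count, country_code) pair is required")
    parts = []
    for count, country in pairs:
        if count < 1:
            raise ValueError("the number of top sources must be positive")
        if len(country) != 2 or not country.isalpha():
            raise ValueError(f"expected a 2-letter country code, got {country!r}")
        parts.append(f"{count} {country.upper()}")
    # The value starts with the word "top"; entries are comma separated.
    return "top " + ", ".join(parts)


single = build_predefined_sources([(100, "us")])
multiple = build_predefined_sources([(33, "AT"), (55, "IT")])
```

Here `single` is `top 100 US` and `multiple` is `top 33 AT, 55 IT`, matching the examples above.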
lang
array
Specifies the languages of the search. For example, en.
The only accepted format is the ISO 639-1 two-letter code.
Refer to the language format section below for more details.
not_lang
array
The inverse of the lang parameter.
countries
array
Countries where the news publisher is located.
Important: This parameter is not responsible for the countries mentioned in the news article.
One or multiple countries can be used in the search.
The only accepted format is ISO 3166-1 alpha-2.
For example, US,CA,MX or just US
not_countries
array
The inverse of the countries parameter.
from_
string
From which point in time to start the search. Defaults to the past week.
Available formats:
YYYY/mm/dd
YYYY/mm/dd HH:MM:SS
English phrases like 1 day ago
to_
string
Until which point in time to search for. The default timezone is UTC.
Available formats:
YYYY/mm/dd
YYYY/mm/dd HH:MM:SS
English phrases like 1 day ago
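As a rough sketch of the absolute date format above, the helper below renders Python datetimes as YYYY/mm/dd HH:MM:SS for from_ and to_. The function names are hypothetical, and treating naive datetimes as already being in UTC is an assumption based on the stated default timezone.

```python
from datetime import datetime, timedelta, timezone


def format_api_datetime(moment):
    """Render a datetime as YYYY/mm/dd HH:MM:SS, converting aware values to UTC."""
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    # Assumption: a naive datetime is already expressed in UTC.
    return moment.strftime("%Y/%m/%d %H:%M:%S")


def date_window(end, days):
    """Return from_/to_ values covering the `days` days that end at `end`."""
    start = end - timedelta(days=days)
    return {"from_": format_api_datetime(start), "to_": format_api_datetime(end)}


window = date_window(datetime(2023, 6, 3, 5, 37, 20), days=1)
```

`window` is `{"from_": "2023/06/02 05:37:20", "to_": "2023/06/03 05:37:20"}`.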
published_date_precision
string
There are 3 types of date precision we define:
full: the day and time of an article are correctly identified with the appropriate timezone
timezone unknown: the day and time of an article are correctly identified without a timezone
date: only the day is identified, without an exact time
by_parse_date
boolean
When set to True, the from_ and to_ parameters filter by parse_date instead of published_date.
Be aware that a new variable, parse_date, will be added to each article in the output list.
sort_by
string
relevancy (default value): the most relevant results first
date: the most recently published results first
rank: the results from the highest-ranked sources first
ranked_only
boolean
Default: True
Limits the search to sources that are in the top 1 million online websites. Unranked sources are assigned a rank of 999999.
from_rank
integer
[0:999999]
The lower boundary of the rank of a news website to filter by.
Important: a lower rank means that a source is more popular.
to_rank
integer
[0:999999]
The upper boundary of the rank of a news website to filter by.
is_headline
boolean
When set to True, only articles that were posted on the home page of a given news domain will be shown.
is_paid_content
[Still in development phase]
When set to False, only articles with full, publicly available content will be shown.
Some news publishers partially block the content of their articles, so we get only a few sentences from them. This filter helps you get full content.
parent_url
array
One or more category URLs to filter your search by. Each should be the normal form of the URL.
For example, https://www.washingtonpost.com/politics or https://www.washingtonpost.com/technology,https://www.washingtonpost.com/business
all_links
array
Search for a specific URL mentioned in the article.
all_domain_links
array
Search for a specific domain mentioned in the article.
word_count_min
integer
Set a minimum number of words that an article must contain.
Use it to avoid articles with little content.
word_count_max
integer
Set a maximum number of words that an article may contain.
Use it to avoid articles with a lot of content.
page_size
integer
[1:100]
How many articles to return per page.
page
integer
The number of the page. Use it to paginate through the results, because one API response cannot return more than 100 articles.
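One way to picture pagination is a generator that emits one parameter set per page. This is a minimal sketch under two assumptions: pages are numbered from 1, and the number of pages is the hit count divided by page_size, rounded up. The name `page_params` is hypothetical.

```python
import math

MAX_PAGE_SIZE = 100  # upper bound of page_size, as stated above


def page_params(base_params, total_hits, page_size=MAX_PAGE_SIZE):
    """Yield one parameter dict per page needed to cover `total_hits` articles."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 100")
    # Assumption: the last, partially filled page still counts as a page.
    total_pages = math.ceil(total_hits / page_size)
    for page in range(1, total_pages + 1):
        yield {**base_params, "page": page, "page_size": page_size}


pages = list(page_params({"q": "Apple"}, total_hits=250))
```

With 250 hits and the maximum page size, `pages` holds three parameter sets, for pages 1, 2 and 3.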
clustering_enabled
boolean
[Available only if NLP enabled for your plan]
When set to True, enables clustering on articles. Instead of a list of articles, you will be given a list of clusters that group similar articles together.
Please refer to the Deduplicate Data With Clustering section below for more examples and explanations.
clustering_threshold
float
[Available only if NLP enabled for your plan]
Sets the similarity threshold used to decide whether articles are similar.
Default value: 0.6
The value can vary from 0 to 1.
clustering_variable
string
[Available only if NLP enabled for your plan]
Selects the data on which the similarity is calculated.
Accepted values: content, title, summary
Default value: content
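The three clustering parameters can be checked client-side before a request is sent. The sketch below validates the documented range and accepted values; the helper name `clustering_params` is an assumption, and the API itself remains the authority on what it accepts.

```python
CLUSTERING_VARIABLES = ("content", "title", "summary")  # accepted values listed above


def clustering_params(threshold=0.6, variable="content"):
    """Return clustering parameters after checking the documented constraints."""
    if not 0 <= threshold <= 1:
        raise ValueError("clustering_threshold must be between 0 and 1")
    if variable not in CLUSTERING_VARIABLES:
        raise ValueError(f"clustering_variable must be one of {CLUSTERING_VARIABLES}")
    return {
        "clustering_enabled": True,
        "clustering_threshold": threshold,
        "clustering_variable": variable,
    }


defaults = clustering_params()
by_title = clustering_params(threshold=0.8, variable="title")
```

`defaults` reproduces the documented default values (0.6 and content) with clustering switched on.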
include_domain_info
boolean
[Still in development phase]
When set to True, shows additional information about the news domains.
include_nlp_data
boolean
When set to True, adds an NLP layer to each article.
Not available for all plans. Please contact us to enable it.
has_nlp
boolean
[Available only if NLP enabled for your plan]
When set to True, filters the data to only those articles that have an NLP layer.
theme
string
[Available only if NLP enabled for your plan]
Accepted values: Business, Economics, Entertainment, Finance, Health, Politics, Science, Sports, Tech, Crime
Comma-separated string or a list/array.
Multiple themes can be selected.
For example:
Business
Business, Finance
Topic labelling is based on the actual content of an article.
ORG_entity_name
string
[Available only if NLP enabled for your plan]
ORG stands for Organisation.
We identify company names mentioned in articles and let you search on them.
PER_entity_name
string
[Available only if NLP enabled for your plan]
PER stands for Person.
We identify the names of people mentioned in articles and let you search on them.
LOC_entity_name
string
[Available only if NLP enabled for your plan]
LOC stands for Location.
We identify geographical locations mentioned in articles and let you search on them.
MISC_entity_name
string
[Available only if NLP enabled for your plan]
MISC stands for Miscellaneous (other entities).
We identify product names and other names mentioned in articles and let you search on them.
title_sentiment_min
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news by setting the minimum sentiment of the article's title.
The value can vary from -1 to 1.
title_sentiment_max
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news by setting the maximum sentiment of the article's title.
The value can vary from -1 to 1.
content_sentiment_min
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news by setting the minimum sentiment of the article's content.
The value can vary from -1 to 1.
content_sentiment_max
float
[Available only if NLP enabled for your plan]
Narrow down your search to only positive or negative news by setting the maximum sentiment of the article's content.
The value can vary from -1 to 1.
Header
x-api-token*
string
Your unique authentication token
Responses
200: Success
403: Forbidden (invalid API key)
406: Unsupported parameters
408: Request timeout
422: Unprocessable Entity (parameter is not allowed)
429: Too many API calls
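To make the query parameters and the x-api-token header concrete, here is a minimal sketch that builds (but does not send) a GET request with Python's standard library. The helper name `build_search_request` and the placeholder token are assumptions; encoding spaces as %20 is one reasonable choice, not something the API documentation prescribes.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://v3-api.newscatcherapi.com/api/search"


def build_search_request(api_token, **params):
    """Return an unsent GET request for the search endpoint."""
    if not params.get("q"):
        raise ValueError("q is required")
    # Drop unset parameters and percent-encode the rest (spaces become %20).
    query = urlencode(
        {name: value for name, value in params.items() if value is not None},
        quote_via=quote,
    )
    return Request(f"{BASE_URL}?{query}", headers={"x-api-token": api_token}, method="GET")


request = build_search_request(
    "YOUR_API_TOKEN", q="Apple", from_="1 day ago", countries="CA", page_size=1
)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; with that function, 4xx responses such as 403 or 429 surface as `urllib.error.HTTPError`, whose `code` attribute carries the status code.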
POST https://v3-api.newscatcherapi.com/api/search?q=Apple&from_=1 day ago&countries=CA&page_size=1
Get News
The POST variant takes the same parameters and the same x-api-token header, and returns the same response codes, as the GET variant described above.
{
  "status": "ok",
  "total_hits": 6936,
  "page": 1,
  "total_pages": 6936,
  "page_size": 1,
  "articles": [
    {
      "title": "'I Love It': Elon Musk Elated After AI-Generated Photo of Tesla CEO in Desi Attire Leaves Netizens Impressed",
      "author": "",
      "authors": [],
      "published_date": "2023-06-03 05:37:20",
      "published_date_precision": "full",
      "updated_date": "2023-06-03 05:39:42",
      "updated_date_precision": "full",
      "link": "https://www.latestly.com/socially/social-viral/i-love-it-elon-musk-expresses-happiness-after-ai-generated-pictures-of-himself-as-indian-groom-leaves-netizens-impressed-5174266.html",
      "domain_url": "latestly.com",
      "full_domain_url": "latestly.com",
      "name_source": "LatestLY",
      "is_headline": false,
      "parent_url": "https://www.latestly.com/socially",
      "country": "IN",
      "rights": "latestly.com",
      "rank": 13754,
      "media": "https://st1.latestly.com/wp-content/uploads/2023/06/Elon-Musk-AI-Pics-784x441.jpg",
      "language": "en",
      "description": "Earlier, AI-generated pictures f Elon Musk as an Indian groom went viral after it was shared on Instagram. Reportedly, the series of pictures of Musk in Indian wedding attire was shared by Rolling…",
      "content": "AI-generated pictures of Tesla CEO Elon Musk have taken the internet by storm. The Artificial Intelligence (AI) artistic pictures of the Twitter owner as an Indian Groom has not only won the hearts of netizens but of Musk himself as well. Reacting to one AI-generated picture of himself as an Indian groom, Musk said, \"I love it!\" He also shared two emojis showing the Indian Flag. Earlier, AI-generated pictures f Elon Musk as an Indian groom went viral after it was shared on Instagram. Reportedly, the series of pictures of Musk in Indian wedding attire was shared by Rolling Canvas Presentations. The photography page used an AI application named Midjourney to create the images. Interestingly, the pictures show Musk wearing sherwani as he dances with the wedding guests and even rides a horse. Porn Will Kill Generative AI? Elon Musk Shares Meme Predicting Future of Artificial Intelligence. Midjourney Art of Elon Musk in an Indian Attire A midjourney art of Elon Musk in an Indian attire is going viral in India. 🇮🇳 pic.twitter.com/LD1KuIAHET — DogeDesigner (@cb_doge) June 3, 2023 I Love It, Says Elon Musk 🇮🇳 I love it! 🇮🇳 — Elon Musk (@elonmusk) June 3, 2023 When Elon Musk Had an Indian Wedding View this post on Instagram",
      "word_count": 209,
      "is_opinion": false,
      "twitter_account": "@latestly",
      "all_links": [
        "https://apps.apple.com/us/app/latestly/id1382168376?ls=1 ",
        "https://t.co/LD1KuIAHET",
        "https://twitter.com/cb_doge/status/1664798316775841795?ref_src=twsrc%5Etfw",
        "https://facebook.com/Latestly",
        "https://twitter.com/elonmusk/status/1664799183063138304?ref_src=twsrc%5Etfw",
        "https://www.youtube.com/channel/UC3Fci14HzLNhI9_stjLWV7g",
        "https://www.instagram.com/latestly/",
        "https://www.dailymotion.com/LatestLY",
        "https://news.google.com/publications/CAAqBwgKMOyHggsw6Mf-Ag?oc=3&ceid=IN:en&hl=en-IN&gl=IN",
        "https://play.google.com/store/apps/details?id=com.media.latestly.latestlymedia&hl=en_IN",
        "https://www.linkedin.com/company/13592851/",
        "https://telegram.me/LatestLYNewsBot?start",
        "https://news.google.com/publications/CAAqBwgKMOyHggsw6Mf-Ag?oc=3&ceid=IN:en",
        "https://www.instagram.com/p/Cs0k-68Ls4x/?utm_source=ig_embed&utm_campaign=loading",
        "https://twitter.com/Latestly"
      ],
      "all_domain_links": [
        "linkedin.com",
        "instagram.com",
        "facebook.com",
        "telegram.me",
        "youtube.com",
        "dailymotion.com",
        "apple.com",
        "google.com",
        "twitter.com",
        "t.co"
      ],
      "id": "b1559e2928cfc60cc451b484323645d7",
      "score": 23.17752
    }
  ],
  "user_input": {...}
}
| Object | Sub Object | Description |
|---|---|---|
| status | | Returns ok if everything went well. Returns error in case of an error (plus 2 additional fields, error_code and message). |
| total_hits | | How many news articles match your search criteria. The maximum is 10,000. |
| page | | The page you are currently on. |
| total_pages | | How many pages you can access given your page_size parameter. |
| page_size | | How many news articles are in the returned JSON object. |
| articles | | The list of news articles found. |
| | title | The title of the article |
| | author | The author of the article |
| | authors | An array of all author names |
| | published_date | Published date and time |
| | published_date_precision | Accuracy of the published_date field. There are 3 types of date precision we define: full (the day and time of an article are correctly identified with the appropriate timezone), timezone unknown (the day and time are correctly identified without a timezone), date (only the day is identified, without an exact time) |
| | updated_date | Updated date and time |
| | updated_date_precision | Accuracy of the updated_date field. The same 3 types of date precision apply: full, timezone unknown, date |
| | link | Full URL where the article was originally published |
| | domain_url | The domain URL of the article's source |
| | full_domain_url | The full domain URL, including the subdomain, of the article's source |
| | name_source | The common name of the news source |
| | is_headline | True when an article has been seen on the main page of the news source |
| | parent_url | The URL where an article was initially found |
| | country | The country of the publisher |
| | rights | Copyright |
| | rank | The page rank of the source website (which is given in domain_url) |
| | media | A link to a thumbnail image of the article |
| | language | The language of the article |
| | description | Short summary of the article provided by the publisher |
| | content | The full content of the article |
| | word_count | Number of words in the article's content |
| | is_opinion | True if the article is an "Opinion" article |
| | twitter_account | The Twitter account of the publisher |
| | all_links | All URLs embedded in the article's content HTML |
| | all_domain_links | All domains embedded in the article's content HTML |
| | nlp | Depending on your plan, you can have: summary, sentiment, theme, ner, embeddings |
| | id | Newscatcher API's unique identifier for each news article |
| | score | How well the article matches your search criteria. The score is different for each search you make. The best-matching article has the highest score. |
| user_input | | An object that returns how our backend saw your request. It shows which parameters were used to perform the search. Useful for debugging, especially to check whether there is any problem with URL encoding. |
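The fields described above can be read from a parsed response as in the sketch below. It assumes the JSON body has already been decoded into a Python dict; the function name `article_titles` and the choice of `RuntimeError` for error responses are illustrative, not something the API defines.

```python
def article_titles(response):
    """Return (title, score) pairs from a parsed search response."""
    if response.get("status") != "ok":
        # Error responses carry two extra fields: error_code and message.
        raise RuntimeError(f"{response.get('error_code')}: {response.get('message')}")
    return [(article["title"], article["score"]) for article in response.get("articles", [])]


sample = {
    "status": "ok",
    "total_hits": 1,
    "articles": [{"title": "Example headline", "score": 23.17752}],
}
titles = article_titles(sample)
```

`titles` is `[("Example headline", 23.17752)]`; the sample dict is a shortened, made-up stand-in for a real response.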
In our three years of experience as a news data provider, we have discovered that relying solely on simple keyword-based searches may not yield satisfactory results. To address this, we have enhanced our data pipeline in v3 by introducing additional article enrichment modules and empowering users to filter article searches using these modules. One of the key modules we have integrated is Named Entity Recognition (NER).
If you search for something straightforward such as "inflation", or something specific like "Twitter" or "Elon Musk", you may not need the NER filter. However, the filter proves exceptionally valuable when your query contains a common word or name.
Let's say you are searching for articles about Apple, the renowned tech giant, and you want to exclude any references to apple prices, apple farmers, etc. In the past, the only way to achieve this was to construct a more detailed query that included related keywords like "iPhone" or "mac".
With v3, you can use our NER parameters, which make it much simpler to get precisely what you are looking for.
By leveraging NER, you can refine your search to focus on the entity "Apple" of type ORG while excluding unrelated articles:
ORG_entity_name=Apple
ORG_entity_name=Apple OR "Apple Inc"
We can go further and find articles mentioning the CEO along with the company:
ORG_entity_name=Apple OR "Apple Inc"
PER_entity_name=Tim Cook
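Because these entity values contain spaces and double quotes, they need URL encoding when placed in a query string. The sketch below shows one way to do that with the standard library; the helper name `ner_query` is hypothetical, and combining the entity filters with q="Apple" is just an illustrative choice.

```python
from urllib.parse import quote, urlencode


def ner_query(q, org=None, per=None):
    """Encode q plus the optional ORG/PER entity filters as a query string."""
    params = {"q": q, "ORG_entity_name": org, "PER_entity_name": per}
    # Leave out filters that were not provided; spaces become %20, quotes %22.
    provided = {name: value for name, value in params.items() if value is not None}
    return urlencode(provided, quote_via=quote)


query = ner_query("Apple", org='Apple OR "Apple Inc"', per="Tim Cook")
print(query)
# q=Apple&ORG_entity_name=Apple%20OR%20%22Apple%20Inc%22&PER_entity_name=Tim%20Cook
```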
For example, English - en
Important: We distinguish between Chinese (China) and Chinese (Taiwan), using cn and tw respectively. That is the only difference between our codes and the ISO 639-1 codes. The list of languages we support:
af,ar,bg,bn,ca,cs,cy,cn,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,ja,kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,th,tl,tr,tw,uk,ur,vi
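A client can check language codes against this list before sending a request. The following is a small sketch; the name `validate_langs` is an assumption, and lowercasing the input is a convenience of this example rather than documented API behaviour.

```python
SUPPORTED_LANGS = frozenset(
    "af,ar,bg,bn,ca,cs,cy,cn,da,de,el,en,es,et,fa,fi,fr,gu,he,hi,hr,hu,id,it,"
    "ja,kn,ko,lt,lv,mk,ml,mr,ne,nl,no,pa,pl,pt,ro,ru,sk,sl,so,sq,sv,sw,ta,te,"
    "th,tl,tr,tw,uk,ur,vi".split(",")
)


def validate_langs(langs):
    """Return a comma-separated lang value, rejecting unsupported codes."""
    codes = [code.strip().lower() for code in langs]
    unknown = [code for code in codes if code not in SUPPORTED_LANGS]
    if unknown:
        raise ValueError(f"unsupported language codes: {', '.join(unknown)}")
    return ",".join(codes)


lang_value = validate_langs(["en", "cn"])
```

`lang_value` is `en,cn`. A code such as `zh` is rejected, since the list above uses cn and tw for Chinese.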
For example, France - FR
Add usefulness of all_links and all_domain_links
For example:
If you want to find all the articles where Elon Musk's Twitter is mentioned, you put:
We save all the URLs that were mentioned in the article. Use this parameter to search by the domain name mentioned in the article.
For example:
facebook.com,twitter.com
While developing, look at the user_input object, which returns all of your parameters. If you made a mistake, or some characters were not correctly parsed because of URL encoding, you will see it there.
{
"q": "Elon Musk",
"search_in": [
"title_content"
],
"sources": null,
"not_sources": null,
"lang": [
"en"
],
"not_lang": null,
"countries": null,
"not_countries": null,
"from_": "2023-05-31T00:00:00",
"to_": null,
"published_date_precision": null,
"by_parse_date": false,
"sort_by": "relevancy",
"ranked_only": null,
"from_rank": null,
"to_rank": null,
"is_headline": null,
"parent_url": null,
"all_links": null,
"all_domain_links": null,
"word_count_min": null,
"word_count_max": null,
"page": 1,
"page_size": 1,
"include_domain_info": null,
"include_nlp_data": null,
"has_nlp": true,
"theme": null,
"ner_name": null,
"title_sentiment_min": null,
"title_sentiment_max": null,
"content_sentiment_min": null,
"content_sentiment_max": null
}
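The debugging check described above can be automated by comparing what you sent with what user_input reports. This is a rough sketch: the function name `find_mismatches` is hypothetical, and wrapping a single value in a list before comparing is an assumption based on how lang appears in the example. Values the backend normalizes, such as dates, will show up as differences and still need a human look.

```python
def find_mismatches(sent, user_input):
    """Return {name: (sent_value, seen_value)} for parameters that differ."""
    mismatches = {}
    for name, value in sent.items():
        seen = user_input.get(name)
        # Assumption: array parameters come back as lists even for one value.
        if isinstance(seen, list) and not isinstance(value, list):
            value = [value]
        if seen != value:
            mismatches[name] = (value, seen)
    return mismatches


reported = {"q": "Elon Musk", "lang": ["en"], "page_size": 1}
clean = find_mismatches({"q": "Elon Musk", "lang": "en"}, reported)
broken = find_mismatches({"q": "Elon Musk & Tesla"}, {"q": "Elon Musk "})
```

`clean` is empty, while `broken` flags q, the kind of difference an unencoded `&` in the query string could produce.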