Search By Entity

Beyond pure keyword-based search

When conducting searches, the need for our NER (Named Entity Recognition) filter depends on the complexity of your query. If you're searching for straightforward topics like "inflation," "Twitter," or "Elon Musk," you may not require the NER filter. However, it becomes highly valuable when your query involves common words or names that have more than one meaning.

For instance, imagine you're looking for articles about the tech giant "Apple" and want to avoid references to apple prices or apple farmers. In the past, achieving this required constructing a more detailed query and adding related keywords like "iPhone" or "Mac."

With the introduction of version 3 (v3), you now have the option to use our NER parameters, simplifying the process of finding precisely what you need. By leveraging NER, you can refine your search to focus on the entity "Apple" of type ORG (organization) while excluding unrelated articles. Here's how it works:

ORG_entity_name='Apple'

You can also apply advanced query rules like:

ORG_entity_name='Apple OR "Apple Inc"'

To take it a step further and find articles mentioning the CEO along with the company:

ORG_entity_name='Apple OR "Apple Inc"'
PER_entity_name='Tim Cook'
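
As a rough illustration, the two filters above could be assembled into URL query parameters as follows. This is a minimal sketch, not an official client: the helper name build_entity_params is hypothetical, and it assumes the single quotes in the examples are string delimiters rather than part of the value sent to the API. The endpoint URL and authentication are deliberately left out.

```python
from urllib.parse import urlencode, parse_qs


def build_entity_params(org=None, per=None):
    """Return a dict of NER filter parameters, skipping any that are unset.

    Hypothetical helper: only the parameter names ORG_entity_name and
    PER_entity_name come from the documentation above.
    """
    params = {}
    if org:
        params["ORG_entity_name"] = org
    if per:
        params["PER_entity_name"] = per
    return params


# Company filter with an OR rule, plus a person filter for the CEO.
params = build_entity_params(org='Apple OR "Apple Inc"', per="Tim Cook")

# urlencode percent-encodes the inner double quotes and encodes spaces,
# so the value is safe to append to a request URL.
query_string = urlencode(params)
print(query_string)
```

Building the parameters as a dict first keeps the inner double quotes intact and leaves the escaping to the URL encoder rather than to hand-written strings.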

Search By Entity Count

Even entity-based searches can sometimes produce false positives. For example, consider an article that only mentions that a news publication has a show on 'Apple Podcasts'. If, like one of our clients, you want to gather and analyze all articles related to the company 'Apple', you'd want to exclude articles like this.

To address this challenge, we've introduced the Search by Entity Count (COUNT) functionality, which allows you to filter results based on the frequency of entity mentions. Here's how to use it:

COUNT(word, count, operator)

For example, if you want to find articles mentioning 'Apple' more than twice, you can use:

ORG_entity_name='COUNT("Apple", 2, "gt")'

By default, the operator in COUNT is set to greater than ('gt'). You can also set it to less than ('lt') or equal to ('eq').
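
To avoid quoting mistakes, the COUNT expression can be generated instead of typed by hand. The sketch below is illustrative only: count_filter is a hypothetical helper, not part of any official SDK, and it assumes the expression is passed as a plain string in the COUNT(word, count, operator) format shown above.

```python
VALID_OPERATORS = {"gt", "lt", "eq"}


def count_filter(word, count, operator="gt"):
    """Build a COUNT(word, count, operator) expression string.

    Hypothetical helper. The default operator mirrors the documented
    default of 'gt' (greater than).
    """
    if operator not in VALID_OPERATORS:
        raise ValueError(f"operator must be one of {sorted(VALID_OPERATORS)}")
    if count < 0:
        raise ValueError("count must be non-negative")
    return f'COUNT("{word}", {count}, "{operator}")'


# Articles in which the ORG entity "Apple" appears more than twice.
params = {"ORG_entity_name": count_filter("Apple", 2)}
print(params["ORG_entity_name"])
```

Validating the operator locally surfaces a typo such as "gte" before any request is sent.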
