
Search tool API

Thomas Piller edited this page Jul 16, 2021 · 16 revisions

The MapX Search tool API is built on top of MeiliSearch, an open-source (MIT License) search engine.

This document describes the MapX-specific configuration of MeiliSearch and gives an example of API usage.

⚠ Take a few minutes to read the MeiliSearch documentation: it covers everything you need to know about the search engine and how to use it.

How to use the MapX search tool via the API?

The Search tool API key and host needed to query the MapX public data catalog can be retrieved from the toolbox -> Get search API Key and configuration. To access this information, you must be logged into MapX with a valid account.

The index to query, which references MapX public views, depends on the language of the search (see next section for more details).

Once all parameters have been retrieved, it is possible to query the API like this (basic example using cURL):

curl '<search engine host>/indexes/<index>/search' \
-H 'X-Meili-API-Key: <search engine API key>' \
--data-raw '{"q":"water"}' \
--compressed;

To learn more about how to use the search parameters, refer directly to the MeiliSearch documentation.
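
The same request can also be issued from a script. The following is a minimal Python sketch using only the standard library. The function names `build_search_request` and `search` are illustrative and not part of MapX or MeiliSearch, and the `Content-Type` header is an addition that the cURL example does not set explicitly. Extra keyword arguments such as `limit` are forwarded as search parameters.

```python
import json
import urllib.request


def build_search_request(host, index, api_key, query, **params):
    """Build (but do not send) a POST request for <host>/indexes/<index>/search."""
    url = f"{host.rstrip('/')}/indexes/{index}/search"
    # Extra keyword arguments (e.g. limit, offset) become search parameters.
    body = json.dumps({"q": query, **params}).encode("utf-8")
    headers = {
        "X-Meili-API-Key": api_key,
        # Assumption: declaring JSON explicitly; the cURL example omits this header.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def search(host, index, api_key, query, **params):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(host, index, api_key, query, **params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `search(host, "views_en", api_key, "water", limit=5)` would return the parsed response, in which the matching documents are expected under the `hits` key.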

Special case: with an empty query and a sufficiently high limit, it is possible to extract an entire index:

curl '<search engine host>/indexes/<index>/search' \
-H 'X-Meili-API-Key: <search engine API key>' \
--data-raw '{"q":"", "limit":10000}' \
--compressed > index.json;

⚠ The number of requests per user is limited within a given time frame. Once the limit is reached, further requests are denied until the next time frame begins.
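
A script that sends many requests may want to pause and retry when a request is denied. The sketch below shows one possible approach in Python. It assumes that a denied request surfaces as an HTTP 429 response, which this page does not confirm, and the waiting time is a placeholder because the actual limit and time frame are not documented here. `call_with_retry` is a hypothetical helper, not part of any MapX or MeiliSearch client.

```python
import time
import urllib.error


def call_with_retry(send, max_attempts=3, wait_seconds=60.0, sleep=time.sleep):
    """Call send() and retry after a pause when the server answers HTTP 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Assumption: a denied request is reported as HTTP 429.
            # Any other error, or the last failed attempt, is re-raised.
            if error.code != 429 or attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

The `send` argument is any zero-argument callable that performs the request, so the helper stays independent of how the request itself is built.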

Indexes

Indexes referencing MapX public views are updated hourly by an automatic routine. A newly created view therefore appears in the MapX Search tool only after the next run of the routine. The same applies to view deletion.

One index is generated for each language supported by MapX, which lets users specify the language when searching. When indexes are generated from the database, any view or source metadata field that is not filled in for a given language falls back to English.

Indexes (format = views_{ISO 639-1 language code}) are available for the following languages:

  • Arabic: views_ar
  • Bengali: views_bn
  • Chinese: views_zh
  • English: views_en
  • French: views_fr
  • German: views_de
  • Pashto: views_ps
  • Persian: views_fa
  • Russian: views_ru
  • Spanish: views_es
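
The naming convention above can be captured in a small helper. This is an illustrative Python sketch: `index_for_language` is a hypothetical name, and rejecting unsupported codes with an error is a design choice of the sketch, not documented MapX behavior.

```python
# Language codes taken from the list above.
SUPPORTED_LANGUAGES = ("ar", "bn", "zh", "en", "fr", "de", "ps", "fa", "ru", "es")


def index_for_language(language):
    """Return the index name (views_<code>) for an ISO 639-1 language code."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no MapX search index for language {language!r}")
    return f"views_{code}"
```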

Searchable attributes

searchableAttributes designates the fields that are searchable in the indexes.

In MapX, some fields are more relevant to search than others. The attributes' order in searchableAttributes determines their impact on relevancy, from most impactful to least.

searchableAttributes: [
    'view_title',
    'view_abstract',
    'source_title',
    'source_abstract',
    'source_keywords', // custom keywords
    'source_keywords_m49', // geographic keywords
    'source_keywords_gemet', // keywords from the GEMET thesaurus
    'source_notes',
    'project_title',
    'project_abstract',
    'view_id',
    'project_id',
    'view_type',
    'view_modified_at',
    'view_created_at',
    'source_start_at', // start date if the data has a temporal component
    'source_end_at', // end date if the data has a temporal component
    'source_released_at',
    'source_modified_at',
    'range_start_at', // see below
    'range_end_at', // see below
    'range_start_at_year', // year extracted from range_start_at
    'range_end_at_year', // year extracted from range_end_at
    'range_years', // generate_series(range_start_at_year, range_end_at_year)
    'projects_data' // information related to projects where the view has been shared
]

Publishers do not always fill in every date field in MapX metadata, which harms date filtering in the MapX UI: in our initial tests, many views were not returned, which distorted the search results. The MapX team therefore decided to generate new date fields, derived from the existing ones, to work around the problem:

LEAST(
    source_start_at,
    source_released_at,
    source_modified_at,
    view_created_at,
    view_modified_at
) AS range_start_at

GREATEST(
    source_end_at,
    source_released_at,
    source_modified_at,
    view_created_at,
    view_modified_at
) AS range_end_at
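
To make the effect of the two expressions above concrete, here is a Python sketch of the same logic. It assumes PostgreSQL semantics, where LEAST and GREATEST ignore NULL arguments, and it represents a record as a plain dictionary of `datetime.date` values. Both are assumptions made for illustration, and `date_range` is a hypothetical helper rather than MapX code.

```python
from datetime import date

# Date fields used by both expressions, as listed in the SQL above.
SHARED_FIELDS = (
    "source_released_at",
    "source_modified_at",
    "view_created_at",
    "view_modified_at",
)


def date_range(record):
    """Derive range_start_at, range_end_at and range_years from a dict of dates."""

    def known(fields):
        # Skip missing dates, as PostgreSQL's LEAST/GREATEST skip NULL arguments.
        return [record[name] for name in fields if record.get(name) is not None]

    starts = known(("source_start_at",) + SHARED_FIELDS)
    ends = known(("source_end_at",) + SHARED_FIELDS)
    start = min(starts) if starts else None
    end = max(ends) if ends else None
    # Inclusive list of years, like generate_series(start_year, end_year).
    years = list(range(start.year, end.year + 1)) if start and end else []
    return {"range_start_at": start, "range_end_at": end, "range_years": years}


# A record with no temporal extent still gets a usable range.
example = date_range(
    {"source_released_at": date(2019, 5, 1), "view_modified_at": date(2021, 7, 16)}
)
```

Here `example["range_years"]` is `[2019, 2020, 2021]`, so the view can be matched by a date filter even though `source_start_at` and `source_end_at` are empty.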

See the dedicated page in the MeiliSearch documentation.

Ranking rules

Search responses are sorted according to a set of rules (i.e. ranking rules).

Whenever a search query is made, MeiliSearch uses a bucket sort to rank documents. The first ranking rule is applied to all documents, while each subsequent rule is only applied to documents that are considered equal under the previous rule (i.e. as a tiebreaker).

The order in which ranking rules are applied matters. The first rule in the array has the most impact, and the last rule has the least.

(Source: MeiliSearch Documentation v0.20)

rankingRules: [
    'attribute', // attribute ranking order (see searchableAttributes)
    'exactness', // similarity of the matched words with the query words
    'proximity', // increasing distance between matched query terms
    'words', // decreasing number of matched query terms
    'wordsPosition', // location of the query word in the field
    'typo', // increasing number of typos
    'asc(view_modified_at)' // custom rule: ascending sort on the view_modified_at attribute (see searchableAttributes)
]
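
The tiebreaker principle can be illustrated with a short Python sketch. It is a simplification and not MeiliSearch's actual implementation: the per-document scores are hypothetical (lower is better), and only three of the rules above are modelled.

```python
def rank(documents, rules):
    """Order documents by successive rules; each later rule only breaks ties."""
    # Sorting on a tuple compares the first rule's key first and moves on to the
    # next key only when all earlier keys are equal.
    return sorted(documents, key=lambda doc: tuple(rule(doc) for rule in rules))


# Hypothetical scores, for illustration only.
documents = [
    {"id": "a", "attribute_rank": 1, "typos": 0, "view_modified_at": 30},
    {"id": "b", "attribute_rank": 0, "typos": 1, "view_modified_at": 20},
    {"id": "c", "attribute_rank": 0, "typos": 0, "view_modified_at": 50},
    {"id": "d", "attribute_rank": 0, "typos": 0, "view_modified_at": 10},
]

# Same relative order as in rankingRules: attribute, then typo, then asc(view_modified_at).
rules = [
    lambda doc: doc["attribute_rank"],
    lambda doc: doc["typos"],
    lambda doc: doc["view_modified_at"],
]

ranked_ids = [doc["id"] for doc in rank(documents, rules)]
```

In this example `ranked_ids` is `["d", "c", "b", "a"]`: document `a` comes last because it loses on the first rule, `b` loses on typos, and `c` and `d` are separated only by the final custom rule.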

See the dedicated page in the MeiliSearch documentation.


Contact us in case of technical questions: [email protected]
