Full text search scenarios

User story: I want to see whether there are any suitable items in the federation, so that I can reuse them for my published data instead of creating my own extension.

Input

Search query: the name/keyword of the searched item.

 

EXAMPLE: A query with the keyword "addresses".

Expected result

A list of items that match my search keyword(s).

EXAMPLE: For the example content, a query with the keyword "addresses" will return the register item reg1/theme/addresses.

Components
  • Register of Registers (RoR)
  • Local registries
  • Search indexes

Figure legend: requests; responses; indexing.


1. Decentralized search index scenario

Description

  • The search indexes are kept and managed at the local registry level.
  • A query submitted at the RoR level is propagated to each local registry that is part of the federation.
  • Each registry returns its search results to the RoR, which then aggregates them and presents them to the user.

Workflow (Figure 1)

  1. The federated registries create, maintain and update their search indexes;
  2. The user sends a search request to the RoR, passing the search keyword(s) as input;
  3. The RoR propagates the query to the federated registries;
  4. The federated registries return the results (if available) to the RoR;
  5. The RoR aggregates the results;
  6. The RoR returns the results to the user.

Figure 1 - Decentralized search index scenario workflow
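
Steps 2 to 6 of this workflow can be sketched as a simple fan-out and merge. This is a minimal illustration, not a proposed implementation: the function names, the in-memory stand-ins for the registries and the choice to ignore a failing registry are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a registry's search API: keyword -> list of item URIs.
SearchFn = Callable[[str], List[str]]


def federated_search(keyword: str, registries: Dict[str, SearchFn]) -> List[str]:
    """Propagate the query to every registry and aggregate the results."""

    def query_one(name: str) -> List[str]:
        try:
            return registries[name](keyword)
        except Exception:
            # Assumption: a failing registry contributes no results
            # instead of failing the whole search.
            return []

    names = sorted(registries)
    # Query the registries in parallel; map() keeps the order of `names`.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        per_registry = list(pool.map(query_one, names))

    # Aggregate: flatten the lists and drop duplicates, keeping a stable order.
    merged: List[str] = []
    seen = set()
    for results in per_registry:
        for uri in results:
            if uri not in seen:
                seen.add(uri)
                merged.append(uri)
    return merged


# In-memory stand-ins for two federated registries (illustrative data only).
def reg1_search(keyword: str) -> List[str]:
    index = {"addresses": ["reg1/theme/addresses"]}
    return index.get(keyword.lower(), [])


def reg2_search(keyword: str) -> List[str]:
    return []


if __name__ == "__main__":
    print(federated_search("addresses", {"reg1": reg1_search, "reg2": reg2_search}))
```

The overall response time is bounded by the slowest registry, which is the delay listed among the cons of this scenario; a real RoR would probably also need a per-registry timeout.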

 

Implications

  • A common exchange format (query encoding and result encoding) has to be supported by all the federated registries.
  • The federated registries shall provide a search API. The entry point of the search API has to be provided when the registry registers with the RoR.
  • A query to the federated registers shall only return matches in the extended values; for example, in the case of reg2 the query q=addresses shall not return any results.
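
The last two implications can be illustrated with a sketch of what a local registry's search handler might do. Everything here is an assumption made for illustration: the `is_extension` flag, the JSON result encoding (`query`, `results`) and the sample items are hypothetical and not part of any agreed format.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    uri: str
    label: str
    is_extension: bool  # hypothetical flag: True if this registry extends the item


def search_extensions(items: List[Item], q: str) -> str:
    """Return a JSON result listing only extension items whose label matches q."""
    needle = q.strip().lower()
    hits = [
        {"uri": item.uri, "label": item.label}
        for item in items
        if item.is_extension and needle and needle in item.label.lower()
    ]
    return json.dumps({"query": q, "results": hits})


# Hypothetical content: this registry holds an "Addresses" item that is not
# one of its own extensions, so q=addresses must return no results.
sample_items = [
    Item("reg2/example/addresses", "Addresses", is_extension=False),
    Item("reg2/example/local-item", "Local item", is_extension=True),
]

if __name__ == "__main__":
    print(search_extensions(sample_items, "addresses"))
    print(search_extensions(sample_items, "local"))
```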

 

2. Centralized search index scenario

Description

  • There is one centralized search index at the RoR level.
  • The central index is populated in one of the following ways:
    • the RoR harvests the content of each registry's index into the central index (according to an agreed harvesting interval), or
    • the federated registers upload their index files to the RoR whenever there is a change in the federated register, or
    • the RoR crawls the federated registers in order to retrieve all the data and build the index.
  • The RoR uses the centralized index to find the results and returns them to the user.

Workflow (Figure 2)

  1. The RoR updates the central search index after each harvesting, upload or crawling of the federated registries;
  2. The user sends a search request to the RoR, passing the search keyword(s) as input;
  3. The RoR searches the central index to find the results;
  4. The RoR returns the results to the user.

Figure 2 - Centralized search index scenario workflow
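
The workflow above can be sketched with a small in-memory inverted index. This is only an illustration under assumptions: the `CentralIndex` class, its method names and the whitespace/word tokenization are hypothetical, and a real deployment would more likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class CentralIndex:
    """Hypothetical central index kept at the RoR level."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, str]] = {}  # registry -> {uri: text}
        self._postings: Dict[str, Set[str]] = {}    # token -> set of uris

    def harvest(self, registry: str, docs: Iterable[Tuple[str, str]]) -> None:
        """Step 1: replace one registry's entries with freshly harvested ones."""
        self._docs[registry] = dict(docs)
        postings: Dict[str, Set[str]] = defaultdict(set)
        for registry_docs in self._docs.values():
            for uri, text in registry_docs.items():
                for token in tokenize(text):
                    postings[token].add(uri)
        self._postings = dict(postings)

    def search(self, query: str) -> List[str]:
        """Steps 3-4: return the URIs that contain every query keyword."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = [self._postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*matches))


if __name__ == "__main__":
    index = CentralIndex()
    index.harvest("reg1", [("reg1/theme/addresses", "Addresses")])
    print(index.search("addresses"))
```

Because `search` only sees what the last `harvest` call stored, results can lag behind the federated registries until the next update.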

 

Implications

  • The RoR shall implement the index harvesting/crawling mechanism.
  • The federated registries shall agree on a common format for the index-data exchange.
  • The harvested/uploaded/crawled indexes of the federated registers shall only include the extension items.

Proposed common exchange format for the index

The following exchange format also handles multilingual content.

<add>
   <doc>
      <field name="id">http://_SPECIFIC_URL_/applicationschema/ad</field>

      <field name="label_en">Addresses</field>
      <field name="definition_en">text here</field>
      <field name="description_en">text here</field>

      <field name="label_de">Addresses</field>
      <field name="definition_de">text here</field>
      <field name="description_de">text here</field>

      <field name="label_fr">Addresses</field>
      <field name="definition_fr">text here</field>
      <field name="description_fr">text here</field>

      <field name="label_it">Addresses</field>
      <field name="definition_it">text here</field>
      <field name="description_it">text here</field>

      ...
 
   </doc>
</add>
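
As an illustration of how a federated registry might produce a document in this format, the sketch below builds the same `<add>`/`<doc>`/`<field>` structure with Python's standard library. The function name, the shape of the `translations` dictionary and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The three per-language properties used in the format above.
PROPERTIES = ("label", "definition", "description")


def build_add_document(item_id: str, translations: Dict[str, Dict[str, str]]) -> str:
    """Serialize one item as <add><doc><field name="...">...</field></doc></add>.

    `translations` maps a language code (e.g. "en") to a dictionary with the
    optional keys "label", "definition" and "description".
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    id_field = ET.SubElement(doc, "field", {"name": "id"})
    id_field.text = item_id

    for lang in sorted(translations):
        for prop in PROPERTIES:
            value = translations[lang].get(prop)
            if value is None:
                continue  # skip properties that have no translation
            field = ET.SubElement(doc, "field", {"name": f"{prop}_{lang}"})
            field.text = value

    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_document(
        "http://_SPECIFIC_URL_/applicationschema/ad",
        {"en": {"label": "Addresses", "definition": "text here", "description": "text here"}},
    ))
```

Using one field per property and language (`label_en`, `label_de`, ...) lets a new language be added without changing the structure of the document.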

 


Pros and cons

1. Decentralized index scenario

Pros
  • The indexes are kept at the local registry level (less index-update effort for the RoR);
  • The search results are always aligned with the search indexes of the federated registries.

Cons
  • Possible delays due to the distributed query (the response time depends on the network and on the performance of the local registries);
  • The federated registries have to provide the search API;
  • It is hard to rank the results of a distributed search.

2. Centralized index scenario

Pros
  • The federated registries only have to support the agreed index-update format;
  • Faster response (single query point).

Cons
  • All the workload is on the RoR side;
  • More effort is needed to keep the central index updated (the RoR has to copy all the index content from the federated registries' indexes to the central index);
  • The search results may not be up to date (depending on the index-update interval).