Mapping

The Elasticsearch bundle requires mapping definitions to perform complex operations such as inserting and updating documents or running a full-text search.

App mapping

For the app's own classes to be mapped, a fake bundle named App is added. This is not a perfect solution, but it was the easiest way to add support for app-level mappings.

It requires a class in the project's source root folder.

By default, the App\Kernel class is used. This class can be changed to any other class by using the configuration:

ongr_elasticsearch:
  app_root_class: 'App\YourKernel'

The referenced class must exist, because the bundle uses ReflectionClass instances of it to locate its folder and related paths, but it can otherwise be empty. If the class does not exist, no mappings for App can be configured.

Mapping configuration

Here's an example of configuration containing the definitions of filter and analyzer:

ongr_elasticsearch:
    analysis:
        filter:
            incremental_filter:
                type: edge_ngram
                min_gram: 1
                max_gram: 20
        analyzer:
            incrementalAnalyzer:  #-> analyzer name
                type: custom
                tokenizer: standard
                filter:
                    - lowercase
                    - incremental_filter
    managers:
        default:
            index:
                index_name: your_index_name
                hosts:
                    - 127.0.0.1:9200
            mappings:
                - App

Since version 5.0 the mapping has been enhanced, and you can now change the documents directory. See the example below:

#...
    managers:
        custom_dir:
            index:
                index_name: your_index_name
                hosts:
                    - 127.0.0.1:9200
            mappings:
                App: ~ # Documents are looked up in the default Document directory.
                CustomBundle:
                    document_dir: Entity # Documents for this bundle are looked up in the Entity directory.
                    
        default:
            index:
                index_name: your_index_name
                hosts:
                    - 127.0.0.1:9200
            mappings:
                - App

Both mapping formats are valid. As the example above shows, you can change the directory in which documents are looked up for a particular bundle. The default directory remains Document.
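The lookup described so far (a root class located through reflection, plus a document directory that defaults to Document) can be sketched in plain PHP. This is only an illustration under assumptions: DemoKernel and resolveDocumentDir() are hypothetical names, and the bundle's real implementation may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the class configured as app_root_class.
// As noted above, the class may be empty; it only has to exist.
final class DemoKernel
{
}

/**
 * Resolves the folder that would be scanned for documents, relative to the
 * file that declares the root class. Illustrative only, not the bundle's code.
 */
function resolveDocumentDir(string $rootClass, string $documentDir = 'Document'): string
{
    if (!class_exists($rootClass)) {
        throw new InvalidArgumentException("Root class {$rootClass} does not exist.");
    }

    $file = (new ReflectionClass($rootClass))->getFileName();
    if ($file === false) {
        // Built-in classes have no source file, so they cannot act as a root class.
        throw new InvalidArgumentException("Root class {$rootClass} is not defined in a file.");
    }

    return dirname($file) . DIRECTORY_SEPARATOR . $documentDir;
}

// Default directory name.
echo resolveDocumentDir(DemoKernel::class), PHP_EOL;
// Directory overridden the way document_dir: Entity does in the configuration.
echo resolveDocumentDir(DemoKernel::class, 'Entity'), PHP_EOL;
```

A missing root class raises an exception in this sketch, which mirrors the rule that no App mappings can be configured without it.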

At the very top, you can see the analysis node. It represents the Elasticsearch analysis settings. Here you can define analyzers, tokenizers, token filters and character filters. Once defined, an analysis component can be used in any document mapping.

For example, let's say you want to use an incremental analyzer and a custom lowercase keyword analyzer in your index. The Elasticsearch settings would look like this:

//PUT my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "incremental_filter": {
                    "type": "edge_ngram",
                    "min_gram": "1",
                    "max_gram": "100"
                }
            },
            "analyzer": {
                "keywordAnalyzer": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "keyword"
                },
                "incrementalAnalyzer": {
                    "filter": [
                        "lowercase",
                        "asciifolding",
                        "incremental_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            }
        }
    }
}

The same example expressed in the bundle configuration:

ongr_elasticsearch:
    analysis:
        analyzer:
            keywordAnalyzer:
                type: custom
                tokenizer: keyword
                filter: [lowercase]
            incrementalAnalyzer:
                type: custom
                tokenizer: standard
                filter:
                    - lowercase
                    - asciifolding
                    - incremental_filter
        filter:
            incremental_filter:
                type: edge_ngram
                min_gram: 1
                max_gram: 100
    managers:
        default:
            index:
                index_name: your_index_name
                hosts:
                    - 127.0.0.1:9200
            mappings:
                - App
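
To get a feel for what incrementalAnalyzer does to a value, here is a rough plain-PHP approximation. It is a sketch under assumptions: edgeNgrams() and incrementalAnalyze() are hypothetical helper names, the standard tokenizer is approximated by splitting on non-alphanumeric characters, input is assumed to be ASCII, and the asciifolding step is skipped.

```php
<?php

declare(strict_types=1);

/**
 * Returns the prefixes an edge_ngram token filter emits for a single token:
 * one prefix per length from $minGram up to $maxGram (or the token length).
 */
function edgeNgrams(string $token, int $minGram, int $maxGram): array
{
    $grams = [];
    $limit = min($maxGram, strlen($token));
    for ($length = $minGram; $length <= $limit; $length++) {
        $grams[] = substr($token, 0, $length);
    }

    return $grams;
}

/**
 * Approximates the analyzer chain: lowercase, tokenize, then expand every
 * token into edge n-grams. ASCII only; asciifolding is not reproduced.
 */
function incrementalAnalyze(string $text, int $minGram = 1, int $maxGram = 100): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $terms = [];
    foreach ($tokens as $token) {
        $terms = array_merge($terms, edgeNgrams($token, $minGram, $maxGram));
    }

    return $terms;
}

echo implode(', ', edgeNgrams('jeans', 1, 20)), PHP_EOL;                // j, je, jea, jean, jeans
echo implode(', ', incrementalAnalyze('Orange Jeans', 1, 3)), PHP_EOL; // o, or, ora, j, je, jea
```

Because every prefix is indexed as its own term, a search for "jea" can match a document containing "Jeans", which is why this kind of analyzer is commonly used for search-as-you-type fields.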

Document class annotations

Let's start with a document class example.

// src/App/Document/Product.php

namespace App\Document;

use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\Document(type="product")
 */
class Product
{
    /**
     * @ES\Property(type="text", name="title_in_es")
     */
    private $title;

    /**
     * Sets title
     *
     * @param string $title
     */
    public function setTitle($title)
    {
        $this->title = $title;
    }

    /**
     * Returns title
     *
     * @return string
     */
    public function getTitle()
    {
        return $this->title;
    }
}

It is not mandatory to have private properties, and public will work as well. However, we firmly recommend using private according to OOP best practices.

Document annotation configuration

  • @ES\Document(type="product") The annotation declares that this class represents the Elasticsearch type named product.
  • You can pass any valid Elasticsearch type options through the options variable. E.g. to add enabled: false, write: @ES\Document(type="product", options={"enabled":false})
  • The type parameter sets the type name. It is optional; if it is not set, ElasticsearchBundle creates a type named after the lowercased class name.

Properties annotations

For defining type properties, there is a @ES\Property annotation. The only required attribute is type - Elasticsearch field type to specify what kind of information will be indexed. By default, the field name is generated from property name by converting it to "snake case" string. You can specify a custom name by setting the name attribute.
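
The default field naming can be illustrated with a small conversion function. This is a minimal sketch of a camelCase to snake_case conversion, assuming simple property names; toSnakeCase() is a hypothetical helper, and the bundle's own converter may treat acronyms or digits differently.

```php
<?php

declare(strict_types=1);

/**
 * Converts a camelCase property name into a snake_case field name by
 * inserting an underscore before every uppercase letter except the first
 * character, then lowercasing the result.
 */
function toSnakeCase(string $propertyName): string
{
    $withUnderscores = preg_replace('/(?<!^)[A-Z]/', '_$0', $propertyName) ?? $propertyName;

    return strtolower($withUnderscores);
}

echo toSnakeCase('title'), PHP_EOL;         // title
echo toSnakeCase('categoryTitle'), PHP_EOL; // category_title
```

Setting the name attribute, as in name="title_in_es" from the first example, bypasses this default.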

Read more about elasticsearch supported types in the official documentation.

To add a custom setting for the property, such as an analyzer, include it in the options variable. Analyzer names must be defined in config.yml under the analysis node (see the Mapping configuration section above). Here's an example of how to add it:

// src/App/Document/Product.php
namespace App\Document;

use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\Document()
 */
class Product
{
    // ...

    /**
     * @ES\Property(
     *     type="text",
     *     options={"analyzer":"incrementalAnalyzer"}
     * )
     */
    private $title;
    
    //....

The options container accepts any parameters in annotation array format. We leave mapping validation to Elasticsearch and the elasticsearch-php client. If an annotation is malformed, the annotations reader throws an exception; otherwise, elasticsearch-php or Elasticsearch itself throws an exception if something is wrong with the mapping.
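
One way to picture the pass-through behaviour is a function that copies the options next to the type without inspecting them. This is a hedged sketch: buildFieldMapping() is a hypothetical helper written for illustration, not the bundle's API.

```php
<?php

declare(strict_types=1);

/**
 * Builds the mapping array for one field. Options are copied as they are,
 * with no validation, so a misspelled key only surfaces once Elasticsearch
 * (or the client) rejects the mapping.
 */
function buildFieldMapping(string $type, array $options = []): array
{
    // The array union keeps "type" first and lets it win over a stray "type" option.
    return ['type' => $type] + $options;
}

$mapping = [
    'title' => buildFieldMapping('text', ['analyzer' => 'incrementalAnalyzer']),
];

echo json_encode($mapping, JSON_PRETTY_PRINT), PHP_EOL;
```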

Object and Nested types

To define a nested or object type, you have to use the @ES\Embedded annotation and create a separate class for it. Here's an example: let's assume we have a Product type with a Category object field.

// src/App/Document/Product.php

namespace App\Document;

use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\Document()
 */
class Product
{
    /**
     * @ES\Property(type="text")
     */
    private $title;

    /**
     * @var CategoryObject
     *
     * @ES\Embedded(class="App\Document\CategoryObject")
     */
    private $category;

    //...
}

And the Category object will look like:

// src/App/Document/CategoryObject.php

namespace App\Document;

use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\ObjectType
 */
class CategoryObject
{
    /**
     * @ES\Property(type="text")
     */
    private $title;

    //...
}

The class name can be anything; we called it CategoryObject to make it more readable. Notice that it is an object, not a document.

For this particular example, the mapping in Elasticsearch will look like this:

{
    "product": {
        "properties": {
            "title": {
                "type": "text"
            },
            "category": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

Saving documents with relations

To insert a document with the mapping from the example above, you have to create two objects:

  $category = new CategoryObject();
  $category->setTitle('Jeans');

  $product = new Product();
  $product->setTitle('Orange Jeans');
  $product->setCategoryObject($category);

  // $manager is the manager that works with the Elasticsearch index
  $manager->persist($product);
  $manager->commit();

Multiple objects

As shown in the example above, ElasticsearchBundle by default saves only a single object in the document. Elasticsearch itself does not care whether an object is stored as a single value or as an array. If you need to store multiple objects (an array), add multiple=true to the annotation. In a document with multiple items, initialize the property with a new instance of ArrayCollection().

Here's an example:

// src/App/Document/Product.php

namespace App\Document;

use Doctrine\Common\Collections\ArrayCollection;
use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\Document()
 */
class Product
{
    /**
     * @ES\Property(type="text")
     */
    private $title;

    /**
     * @var ArrayCollection|VariantObject[]
     *
     * @ES\Embedded(class="App\Document\VariantObject", multiple=true)
     */
    private $variants;
    
    public function __construct()
    {
        $this->variants = new ArrayCollection();
    }
    
    /**
     * Adds variant to the collection.
     *
     * @param VariantObject $variant
     * @return $this
     */
    public function addVariant(VariantObject $variant)
    {
        $this->variants[] = $variant;

        return $this;
    }
    
    //...
}

And the object:

// src/App/Document/VariantObject.php

namespace App\Document;

use ONGR\ElasticsearchBundle\Annotation as ES;

/**
 * @ES\ObjectType
 */
class VariantObject
{
    /**
     * @ES\Property(type="text")
     */
    private $color;

    //...
}

Insert action will look like this:

<?php
  
  $product = new Product();
  $product->setTitle('Orange Jeans');
  
  $variant = new VariantObject();
  $variant->setColor('orange');
  $product->addVariant($variant);

  $variant = new VariantObject();
  $variant->setColor('red');
  $product->addVariant($variant);

  $manager->persist($product);
  $manager->commit();

There is no limit on defining objects within other objects.

Nested types can be defined the same way as objects, except that the @ES\Nested annotation must be used.

The difference between @ES\Embedded and @ES\Nested lies in how Elasticsearch indexes them. During indexing, the field values of embedded objects are extracted and merged into one array per field, together with the values of all other embedded objects in the same field, whereas the field values of each nested object are stored separately. This affects how the index is queried and filtered.
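
The effect on querying can be simulated in plain PHP, without Elasticsearch. The sketch below is an illustration under assumptions: the function names are hypothetical, and the size field is added only for the demonstration (VariantObject above defines just color).

```php
<?php

declare(strict_types=1);

/**
 * Mimics object indexing: values are grouped per field, so it is no longer
 * known which values belonged to the same object.
 */
function flattenObjectField(string $field, array $objects): array
{
    $flat = [];
    foreach ($objects as $object) {
        foreach ($object as $key => $value) {
            $flat["{$field}.{$key}"][] = $value;
        }
    }

    return $flat;
}

/** Object semantics: each wanted value may come from a different object. */
function objectMatches(string $field, array $objects, array $criteria): bool
{
    $flat = flattenObjectField($field, $objects);
    foreach ($criteria as $key => $value) {
        if (!in_array($value, $flat["{$field}.{$key}"] ?? [], true)) {
            return false;
        }
    }

    return true;
}

/** Nested semantics: one single object has to satisfy all criteria. */
function nestedMatches(array $objects, array $criteria): bool
{
    foreach ($objects as $object) {
        if (array_intersect_assoc($criteria, $object) === $criteria) {
            return true;
        }
    }

    return false;
}

$variants = [
    ['color' => 'orange', 'size' => 'S'],
    ['color' => 'red', 'size' => 'L'],
];
$criteria = ['color' => 'orange', 'size' => 'L'];

var_dump(objectMatches('variants', $variants, $criteria)); // bool(true)
var_dump(nestedMatches($variants, $criteria));             // bool(false)
```

No variant is both orange and size L, yet the flattened representation matches because the values are found independently. Keeping each object separate, as nested types do, avoids this kind of cross-object match.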

Multi field annotations and usage

Within the property annotation, you can specify the fields option. It lets you map the same value with several core types. This can come in very handy, e.g. when you want to map a text value both analyzed and not analyzed.

Let's take a look at the example below:

    /**
     * @var string
     * @ES\Property(
     *  type="text",
     *  name="title",
     *  options={
     *    "analyzer"="keywordAnalyzer",
     *    "fields"={
     *        "raw"={"type"="keyword"},
     *        "standard"={"type"="text", "analyzer"="standard"}
     *    }
     *  }
     * )
     */
    public $title;

The mapping in elasticsearch would look like this:

{
    "product": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "keywordAnalyzer",
                "fields": {
                    "raw": {
                        "type": "keyword"
                    },
                    "standard": {
                        "type": "text",
                        "analyzer": "standard"
                    }
                }
            }
        }
    }
}

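Notice that the title value is now mapped both with and without an analyzer. Querying these fields looks like this:

        // Each query gets its own Search object; reusing one would combine the queries.
        // Main field, analyzed with keywordAnalyzer.
        $search = $repo->createSearch();
        $search->addQuery(new TermQuery('title', 'Bar'));
        $result1 = $repo->execute($search);

        // "raw" sub-field, mapped as keyword (not analyzed).
        $search = $repo->createSearch();
        $search->addQuery(new MatchQuery('title.raw', 'Bar'));
        $result2 = $repo->execute($search);

        // "standard" sub-field, analyzed with the standard analyzer.
        $search = $repo->createSearch();
        $search->addQuery(new MatchQuery('title.standard', 'Bar'));
        $result3 = $repo->execute($search);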

Meta-Fields Annotations

There are specialized meta fields that introduce different behaviours of elasticsearch. Read the dedicated page about meta-field annotations here.

More information about mapping can be found in the Elasticsearch mapping documentation.