Multilingual Search (MLS) – Breaking the Language Barrier

Adobe is committed to giving its AEM customers a streamlined product that lets them serve people across different countries and regions. To remove language barriers, the AEM team at Adobe introduced Multilingual Search (MLS) with the release of AEM 6.2.

Let's try to understand it better through an example.

Suppose a German automobile giant uses AEM as its content management platform. Three people will help demonstrate how MLS works:

  • A businessman in London
  • An engineer in Germany
  • A car enthusiast in China

The businessman is having trouble with his car’s transmission, so he turns to the online community for help. Here is how the conversation among the three gentlemen goes:

Businessman: Hi! I am facing a problem with my transmission system. The shifts are making an unusual noise. Any help?

The engineer searches for “Getriebe problem” (transmission problem) to see what issues people have reported, and this post shows up in his results.

Engineer: Wann haben Sie zuletzt das Motoröl gewechselt? (When did you last change the engine oil?)

Meanwhile, the car enthusiast from China responds:

Car Enthusiast: 我的汽车变速器也出过问题，我写信给他们，后来换了新的。 (I also had an issue with my car’s transmission. I wrote to them and got it replaced.)

Businessman: I did change it recently. I guess I will go for a complete replacement. Thank you.

End of conversation.

Later, the businessman comes back and searches for “transmission system”, and he gets all of the replies above, even though they were written in different languages.

This simple demo shows how MLS works.

AEM 6.2 and Feature Pack 4 (FP4) for AEM 6.1 both include this feature. It is offered in two variants:

  • Simple MLS
  • Advanced MLS

The major difference between the two is that Advanced MLS can detect the language of the query, modify the query accordingly, and search the appropriate index. Note also that MLS currently works only with a MongoDB backend, together with Solr as the search platform.

Whenever a user makes a contribution, such as a comment, a reply, or a question or answer, the user-generated content (UGC) is stored in the verbatim_default index. Once the system detects the language of the content, it is stored in the matching language index, such as verbatim_en, verbatim_fr or verbatim_de.
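The storage step can be pictured with a small sketch. This is a minimal illustration, assuming a plain dictionary stands in for the Solr document and that the text stays in verbatim_default while also being copied into the language index; the function name build_ugc_document is hypothetical and is not AEM's actual indexing code.

```python
from typing import Optional


def build_ugc_document(text: str, detected_lang: Optional[str]) -> dict:
    """Hypothetical sketch: map one UGC post to its verbatim_* index fields.

    Assumption: the text always stays in verbatim_default and, once a
    language has been detected, is also copied into verbatim_<lang>.
    """
    doc = {"verbatim_default": text}
    if detected_lang:
        doc[f"verbatim_{detected_lang}"] = text
    return doc


post = "Wann haben Sie zuletzt das Motoröl gewechselt?"
print(sorted(build_ugc_document(post, "de")))  # ['verbatim_de', 'verbatim_default']
```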

With Simple MLS deployed, the system searches the following indexes:

  • verbatim_default
  • verbatim_lang, where lang is the user's preferred language

The user can choose the language from a drop-down list. If the user chooses German (de), the following indexes are searched:

  • verbatim_default
  • verbatim_de
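The Simple MLS index selection described above is easy to express as code. The sketch below is hypothetical (simple_mls_fields and verbatim_clauses are illustrative names, not AEM APIs); it only shows that, without any detection, the searched indexes follow directly from the language the user picked, and it produces clauses shaped like the verbatim_* clauses in the logged Solr query.

```python
def simple_mls_fields(user_lang: str) -> list[str]:
    """Hypothetical sketch: indexes searched by Simple MLS.

    No language detection happens here; the language comes from the
    user's drop-down choice.
    """
    return ["verbatim_default", f"verbatim_{user_lang}"]


def verbatim_clauses(search_text: str, fields: list[str]) -> str:
    """Build space-separated field:(text) clauses, one per index."""
    return " ".join(f"{field}:({search_text})" for field in fields)


fields = simple_mls_fields("de")
print(verbatim_clauses("sprechen Sie laut", fields))
# verbatim_default:(sprechen Sie laut) verbatim_de:(sprechen Sie laut)
```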

Sample query generated by AEM for the search text “sprechen Sie laut” (German for “speak loudly”):

INFO – 2016-07-26 10:19:51.171; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/select params={q=%2B(cqtags_ss(sprechen+Sie+laut)+author_username:sprechen+Sie+laut+verbatim_default:(sprechen+Sie+laut)+verbatim_de:(sprechen+Sie+laut)+title_t:(sprechen+Sie+laut)+author_display_name(sprechen+Sie+laut))+%2Bprovider_id:\/content/usergenerated/asi/mongo/content/sites/checkMLS/en/*+%2Bresource_type_s:*&df=provider_id&el=de&start=0&trf=verbatim&sort=timestamp+desc&fq={!cost%3D100}report_suite:mongo&rows=10&wt=javabin&version=2} hits=1 status=0 QTime=4
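The logged parameters are URL-encoded, which is most of what makes them hard to read. Python's standard urllib.parse.parse_qs can decode them. This is only a reading aid, not part of AEM; the parameter string is copied from the Simple MLS log line above. Reading el=de as the language chosen by the user is an inference from this single log line, not documented behaviour.

```python
import re
from urllib.parse import parse_qs

# Parameter string copied from the Simple MLS log line, split over
# several raw literals (raw, so the "\/" in provider_id is kept as-is).
LOGGED_PARAMS = (
    r"q=%2B(cqtags_ss(sprechen+Sie+laut)+author_username:sprechen+Sie+laut"
    r"+verbatim_default:(sprechen+Sie+laut)+verbatim_de:(sprechen+Sie+laut)"
    r"+title_t:(sprechen+Sie+laut)+author_display_name(sprechen+Sie+laut))"
    r"+%2Bprovider_id:\/content/usergenerated/asi/mongo/content/sites/checkMLS/en/*"
    r"+%2Bresource_type_s:*&df=provider_id&el=de&start=0&trf=verbatim"
    r"&sort=timestamp+desc&fq={!cost%3D100}report_suite:mongo&rows=10"
    r"&wt=javabin&version=2"
)

# parse_qs splits on "&", turns "+" into spaces and decodes %XX escapes,
# so %2B becomes a literal "+" (Lucene's "required clause" marker).
params = {key: values[0] for key, values in parse_qs(LOGGED_PARAMS).items()}

print(params["q"])
print(params["el"])
print(re.findall(r"verbatim_\w+", params["q"]))
```

The last line lists the verbatim indexes the query targets, which match the two indexes named for the German (de) choice.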

In the case of Advanced MLS:

The system itself detects the language of the query, modifies the query accordingly, and generates a new query that searches the appropriate index before returning the results.
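The detect-then-rewrite idea can be sketched in a few lines. To be clear, this swaps in a toy stop-word heuristic for language detection; it is not Adobe's detector (the logs name com.adobe.tat.langdetect.ShortTextLangDetector, whose internals are not described here). STOPWORDS, detect_language and advanced_mls_fields are hypothetical names, and starting from verbatim_default plus verbatim_en simply mirrors the logged example.

```python
import re

# Toy stop-word lists for illustration only (hypothetical); a real
# detector is far more robust than counting common words.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "sie", "ich", "wann"},
    "en": {"the", "and", "is", "not", "you", "i", "when", "my"},
    "fr": {"le", "la", "les", "et", "est", "pas", "vous", "je"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose stop words occur most often in text."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default when no stop word matched at all.
    return best if scores[best] > 0 else default


def advanced_mls_fields(search_text: str, base_fields: list[str]) -> list[str]:
    """Add the verbatim_<lang> index for the detected language if missing."""
    target = f"verbatim_{detect_language(search_text)}"
    if target in base_fields:
        return list(base_fields)
    return base_fields + [target]


base = ["verbatim_default", "verbatim_en"]
print(detect_language("sprechen Sie laut"))  # de
print(advanced_mls_fields("sprechen Sie laut", base))
# ['verbatim_default', 'verbatim_en', 'verbatim_de']
```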

Sample query generated in this case for the same search text, “sprechen Sie laut”:

INFO – 2016-07-26 10:47:53.633; com.adobe.tat.LangDetectRequestHandler; FOR TAT LOG,params:{params(q=%2B(cqtags_ss:*(sprechen+Sie+laut)+author_username:sprechen+Sie+laut+verbatim_default:(sprechen+Sie+laut)+verbatim_en:(sprechen+Sie+laut)+title_t:(sprechen+Sie+laut)+author_display_name(sprechen+Sie+laut))+%2Bprovider_id:\/content/usergenerated/asi/mongo/content/sites/checkMLS/en/*+%2Bresource_type_s:*&df=provider_id&start=0&trf=verbatim&bl=en&sort=timestamp+desc&fq{!cost%3D100}report_suite:mongo&pl=en&rows=10&wt=javabin&version=2),defaults(df=text&echoParams=explicit&rows=10)}

INFO – 2016-07-26 10:47:53.633; com.adobe.tat.LangDetectRequestHandler; FOR TAT LOG, q={+(cqtags_ss:*(sprechen Sie laut) author_username:sprechen Sie laut verbatim_default:(sprechen Sie laut) verbatim_en:(sprechen Sie laut) title_t:(sprechen Sie laut) author_display_name:(sprechen Sie laut))+provider_id:\/content/usergenerated/asi/mongo/content/sites/checkMLS/en/*+resource_type_s:*}

INFO – 2016-07-26 10:47:53.633; com.adobe.tat.LangDetectRequestHandler; translate fileds name :verbatim

INFO – 2016-07-26 10:47:53.633; com.adobe.tat.langdetect.ShortTextLangDetector; There are signals for this language detector

INFO – 2016-07-26 10:47:53.636; com.adobe.nlp.core.processor.AdobeNLPComponentRunner; using: 2315431 ns, to run process method: process

INFO – 2016-07-26 10:47:53.636; com.adobe.tat.LangDetectRequestHandler; new_q={+(cqtags_ss:*(sprechen Sie laut) author_username:sprechen Sie laut verbatim_default:(sprechen Sie laut) verbatim_de:(sprechen Sie laut) verbatim_en:(sprechen Sie laut) title_t:(sprechen Sie laut) author_display_name:(sprechen Sie laut))
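Comparing the verbatim indexes in the query before detection (q) and after it (new_q) makes the rewrite visible. The strings below are the first clause group of each logged query, with run-together words separated; the logged new_q is cut off after that group anyway. verbatim_fields is a hypothetical helper used only for this comparison.

```python
import re

# First clause group of the logged q (before detection) and new_q (after).
Q_BEFORE = (
    "+(cqtags_ss:*(sprechen Sie laut) author_username:sprechen Sie laut "
    "verbatim_default:(sprechen Sie laut) verbatim_en:(sprechen Sie laut) "
    "title_t:(sprechen Sie laut) author_display_name:(sprechen Sie laut))"
)
Q_AFTER = (
    "+(cqtags_ss:*(sprechen Sie laut) author_username:sprechen Sie laut "
    "verbatim_default:(sprechen Sie laut) verbatim_de:(sprechen Sie laut) "
    "verbatim_en:(sprechen Sie laut) title_t:(sprechen Sie laut) "
    "author_display_name:(sprechen Sie laut))"
)


def verbatim_fields(query: str) -> set[str]:
    """Collect the verbatim_* index names a query string targets."""
    return set(re.findall(r"verbatim_\w+", query))


added = verbatim_fields(Q_AFTER) - verbatim_fields(Q_BEFORE)
print(sorted(added))  # ['verbatim_de']
```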

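These sample queries may look like gibberish at first glance, but the key difference is in the last log line: after detecting German, Advanced MLS rewrites the query (new_q) so that it searches verbatim_de in addition to verbatim_default and verbatim_en. In the Simple MLS query, by contrast, verbatim_de is searched only because the user picked German from the drop-down. With Multilingual Search, Adobe has broken down the language barriers between community members who speak different languages. Stay tuned for more insights!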

I would like to thank Arun Rajan for his major contribution to this blog post.