Getting started with the new Domino Query Language (DQL)

Wednesday, August 22, 2018 at 10:54 AM UTC

During the beta 2 period of Domino V10 I dabbled around with DQL for the first time. I already saw what you can do in a video and screenshots that were made during the HCL factory tour.

The first throw of this new feature is not complete of course, you currently cannot utilize it in your coding in LotusScript or Java. This is planned for the final release which ships after October 9, 2018.

However, despite the documentation which shipped with the beta 2 there were some caveats and problems I faced during my testing which I want to write down here. Maybe this helps you taking your first steps when you are using it for yourself. Changes may be imminent for the GA version, this is based on beta 2.

Thanks

Special thanks goto John Curtis, who patiently worked together with me and others in the beta forum to get it finally working.

DomQuery

This is the executable currently usable via the command line. It is located in the program directory of Domino. For some reason this file got corrupted (acually replaced) by a Windows update so that I wasn't able to execute it anymore ("access denied"). I had to turn off system recovery and re-install the server to get the original file back.

The binary comes with several parameters- Some of them don't work yet, some of them act differently from what the description says. This should be fixed for GA. Personally I assume that I won't use the command line when programming is possible with LotusScript and Java.

The output of the query also isn't the final, we currently only get a result that tells us how many documents where found and how much time it consumed to perform the query. The numbers are already impressing.

Preparations

In order to use DQL you have to build indices for your databases as this won't necessarily use the view or full-text index for search. There are some important and required notes.ini settings to be set before you can process to build those indices:

UPDATE_DESIGN_CATALOG=1

This activates the build of the new indices in general.

DEBUG_GQF_QUERY=30

This enables a more verbose debug output when performing a query and may be helpful for troubleshooting. The debug parameter of DomQuery itself did not work for me yet.

Update catalog and indices

To create the new index you need to issue the updall command for either all databases (which may take some time) or a certain database by specifying the path to it relative from the data directory.

Just issue

load updall <path> -e
load updall <path> -d

in the server console, where the switch -e enables the index for data itself and -d updates the design data for the database(s).

I used the updall without a path and so all my databases where updated.

Performing a query

I used the famous fakenames.nsf to query 40.000 documents, so you may want to make sure to initialize this database, too:

load updall fakenames.nsf -e
load updall fakenames.nsf -d

Note: when you work with the fakenames for the first time and want to replicate it over several servers, please clear the replication history before.

Then I opened a command line window, navigated to the program directory of my Domino server and issued the query command:

C:\Program Files\IBM\Domino>domquery.exe -f "fakenames.nsf" -q "firstname in ('Adam', 'Ken', 'Zach')" -e

0. IN            (childct 3) (totals when complete:) Prep 0.0 msecs, Exec 346.541 msecs, ScannedDocs 40656, Entries 0, FoundDocs 117

        1.firstname = 'Adam' NSF document search estimated cost = 100
[0C68:0002-0F68]                Prep 0.343 msecs, Exec 346.537 msecs, ScannedDocs 40656, Entries 0, FoundDocs 95

        1.firstname = 'Ken' NSF document search estimated cost = 100
[0C68:0002-0F68]                Prep 0.106 msecs, Exec 0.0 msecs, ScannedDocs 40561, Entries 0, FoundDocs 22

        1.firstname = 'Zach' NSF document search estimated cost = 100
[0C68:0002-0F68]                Prep 0.97 msecs, Exec 0.0 msecs, ScannedDocs 40539, Entries 0, FoundDocs 0

[0C68:0002-0F68] Total Documents found = 117

Output explained

You see the different queries and the time that has being consumed to find data. Note that the second search contains less documents than the first (95 to be precise) as we already found documents during the first query - that saves some time, too, which is very smart!

Parameters explained

In the query above I used the required paramaters f (path to database) and q (the query itself). The e parameter stands for "explain" and shows how the query was executed. I used the "in" operator here to search for more than just a single value. Wildcards are currently not supported but should be in GA.

Using the server console

You can also issue the query via the server console:

load domquery

with all the parameters explained above.

The Linux version

Though the Linux version of beta 2 doesn't ship DomQuery, we got this file compiled via the beta forum from John Curtis. To get this running via the server console you have to treat the file this way:

  • rename the file to all lowercase (domquery)
  • makeit executable (as root): chmod u+x domquery
  • copy the file to /opt/ibm/domino/notes/latest/linux
  • create a symlink called "domquery" in /opt/ibm/domino/bin that points to /opt/ibm/domino/bin/tools/startup

Then you can execute the query via the server console with

load domquery

Smile

Thanks also go to Tihomir Pantović for the heads-up!

What's next?

As said, DQL will be available in LotusScript and Java - and the video demo shows how to use it even in Node.js. Of course there will be methods to return the collection of found documents to loop through etc. This post only shows the tip of the ice berg and the simple command line.

I guess it will completely replace full-text searches or building NoteCollections in future development!






Latest comments to this post

John Curtis wrote on 29.03.2019, 17:02

"DEBUG_GQF_QUERY=30

This enables a more verbose debug output when performing a query and may be helpful for troubleshooting. The debug parameter of DomQuery itself did not work for me yet."

I must say as a matter of course - please do NOT use this option if you want good performance - it is verbose.  Plus - do NOT depend upon its output in any code or even manual diagnostic practices - it is completely subject to change - it WILL change.

 Reply to this comment
 Link to this comment
John Curtis wrote on 02.11.2018, 18:54

No danger of copyright infringement, but I thought I'd post the actual values of the limits being discussed directly from the code

#define QUERY_MAX_DOCSSCANNED_DEFAULT 500000
#define QUERY_MAX_VIEW_ENTRIES_SCANNED_DEFAULT 200000
#define QUERY_MAX_TOTAL_MILLISECONDS 300000

They are self-describing and can be overridden in notes.ini (below names) and via parameters when invoking the DQL engine.

QUERY_MAX_DOCS_SCANNED

QUERY_MAX_VIEW_ENTRIES_SCANNED

QUERY_MAX_MSECS_TOTAL

We will get complaints of their being too high and too low; there simply is no ideal setting.  Indeed, there may be a HUGE query people run at night that sets them only for that one time.  The somewhat heartbreaking thing is that if they are exceeded, the query will fail even after running for a long time.  So, you need to probably set them high and then mediate them after a run.  Or just see what works in your environment.  But the intent is to keep a query from going on forever - server citizenship.

To Karsten's point, DQL does results injection - meaning once we run the cheapest of all sibling terms in a boolean "family", we feed the next and the next with those results, greatly cutting down on cost.
 

 Reply to this comment
 Link to this comment
Karsten Lehmann wrote on 03.09.2018, 15:36

DQL uses C API calls, not Elastic Search.

It combines view lookups with NSF searches. View looks are used if there are SELECT @All type of views that have user sortable columns that just contain an item name mentioned in one of the search query terms, e.g. Lastname="Lehmann".

In addition, the DB design needs to be scanned/indexed by the server using a server console command.

If multiple search terms are combined with AND, the terms are sorted first to optimize the documents to be scanned and later terms do not run on the whole NSF, but on a subset (a smaller IDTable).

Let's say you use the query

Lastname="Lehmann" & Firstname>"K"

then a view lookup in a "Lastname" column would result in a small IDTable of all documents that have "Lehmann" in the lastname item, which then could be NSF searched for Firstname>"K".

If a "Firstname" column exists, no NSF scan would be required at all.

 Reply to this comment
 Link to this comment
Oliver Busse wrote on 30.08.2018, 11:00

Sebastian,

there are default limits regarding scanned docs and maximum search time to prevent excessive CPU consumption. Interesting: the limits documented about scanned documents is 200.000, not 500.000 - maybe the documentation is wrong here.

However, you can overwrite them by notes.ini settings (which then is global) or (later in the GA version) via the API for a certain query.

 Reply to this comment
 Link to this comment
Sebastan Grund wrote on 30.08.2018, 10:11

Hi Oliver, thanks for your post!

I just tried it on a copy of a customer's database (>800K docs)- the first search took ~35 sec (via CMD), the following calls 4-5 sec (both CMD and console) - not bad!

But there seems to be a 500K documents limit ("ScannedDocs 500001") - do you know something about such a limit, whether it will be removed in future, or can be changed (e.g. via notes.ini)?.

Thanks and kind regards,

Sebastian

 

 Reply to this comment
 Link to this comment
Oliver wrote on 23.08.2018, 21:53

Nick, I think that DQL is the Elastic Search implementation, so it should also work properly with big data sets. Just a guess, not confirmed, yet, but ES was mentioned on the slides at the beginning of this year.

 Reply to this comment
 Link to this comment
Nick Wall wrote on 23.08.2018, 21:49

Looking forward to V10 release, going to Icon UK in Sept.

In production we have some Domino DBs that have 1 million+ docs, we are in process of pushing these to SQL.  I wonder what DQL perf is like querying these when there is high doc modifications per minute.

 Reply to this comment
 Link to this comment

Leave a comment right here