EntryStore Benchmark
The purpose of this benchmark is volume and speed testing of synthetic data loaded directly into EntryStore, bypassing the REST API endpoints. It uses the RDF4j storage engine through the RDF4j API.
Further documentation is available at https://docs.google.com/document/d/1lY26c56Z9Vty46K9HomdVL9l5fgdhKYTcrREFmu0vAI.
Synthetic data
Data is generated with the Java Faker library (https://github.com/DiUS/java-faker). The time needed to generate the data depends on the environment setup, but not on the RDF4j store type, the modulo, or any other processing parameter.
Two data models can be generated: simple or complex. See the documentation linked above for details on these models.
Simple model = 10 triples = 2 Entries
Model of a Person with identifier, firstName, lastName and iterator. Every Person has an Address with identifier, iterator, street, city and zip.
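To make the "10 triples = 2 Entries" arithmetic concrete, here is a minimal Java sketch that builds one simple-model object as plain strings. It assumes that each listed property is one triple and that the Person links to its Address with one extra triple (4 + 1 + 5 = 10). The ex: predicate names, the IRIs and the fixed value pools standing in for Java Faker are all hypothetical, not taken from benchmark commons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch of one "simple" object as plain triples.
// The ex: prefix, IRIs and predicate names are assumptions, not the benchmark's vocabulary,
// and the fixed value pools below stand in for Java Faker.
final class SimpleModelSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    private static final String[] FIRST_NAMES = {"Anna", "Erik", "Maja", "Lars"};
    private static final String[] LAST_NAMES = {"Berg", "Lind", "Holm", "Ek"};
    private static final String[] CITIES = {"Stockholm", "Uppsala", "Lund"};

    // Builds the triples for the i-th Person: 2 subjects (the Person and its Address).
    static List<Triple> buildSimpleModel(int i, Random random) {
        String person = "ex:person/" + i;
        String address = "ex:address/" + i;
        List<Triple> triples = new ArrayList<>();

        // Person: 4 properties plus 1 link to its Address = 5 triples.
        triples.add(new Triple(person, "ex:identifier", literal("person-" + i)));
        triples.add(new Triple(person, "ex:firstName", literal(pick(FIRST_NAMES, random))));
        triples.add(new Triple(person, "ex:lastName", literal(pick(LAST_NAMES, random))));
        triples.add(new Triple(person, "ex:iterator", literal(Integer.toString(i))));
        triples.add(new Triple(person, "ex:hasAddress", address));

        // Address: 5 properties = 5 triples.
        triples.add(new Triple(address, "ex:identifier", literal("address-" + i)));
        triples.add(new Triple(address, "ex:iterator", literal(Integer.toString(i))));
        triples.add(new Triple(address, "ex:street", literal("Storgatan " + (1 + random.nextInt(99)))));
        triples.add(new Triple(address, "ex:city", literal(pick(CITIES, random))));
        triples.add(new Triple(address, "ex:zip",
                literal(String.format(Locale.ROOT, "%05d", random.nextInt(100_000)))));
        return triples;
    }

    private static String pick(String[] values, Random random) {
        return values[random.nextInt(values.length)];
    }

    private static String literal(String value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        buildSimpleModel(0, new Random(42)).forEach(System.out::println);
    }
}
```

A fixed Random seed keeps the sketch deterministic; the real generator presumably draws its values from Java Faker instead.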
Complex model = 43 triples = 8 Entries
An extension of the simple model in which a Person also has age and phone. In addition to an Address, every Person has a Spouse and owns a Company. A Spouse is another Person with the same properties. A Company has identifier and legalName, as well as an Address.
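The stated counts of 43 triples and 8 Entries can be reproduced under one particular reading of the model, shown in this Java sketch: the Spouse carries the same Person properties plus its own Address and Company but no Spouse link back, and every Company has its own Address. This decomposition and all predicate names are assumptions made only to check the arithmetic (Person 9 + Address 5 + Company 8 + Spouse 8 + Spouse's Address 5 + Spouse's Company 8 = 43); they are not taken from benchmark commons.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decomposition of one "complex" object that yields 43 triples and 8 subjects.
// Assumptions: a Spouse has the Person properties, its own Address and Company, but no
// Spouse link back; every Company has its own Address. Names are illustrative only.
final class ComplexModelSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    private void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Address: identifier, iterator, street, city, zip (5 triples).
    private String address(String owner) {
        String a = owner + "/address";
        add(a, "ex:identifier", "\"" + a + "\"");
        add(a, "ex:iterator", "\"0\"");
        add(a, "ex:street", "\"Main Street 1\"");
        add(a, "ex:city", "\"Springfield\"");
        add(a, "ex:zip", "\"12345\"");
        return a;
    }

    // Company: identifier, legalName and a link to its Address (3 triples + 5 for the Address).
    private String company(String owner) {
        String c = owner + "/company";
        add(c, "ex:identifier", "\"" + c + "\"");
        add(c, "ex:legalName", "\"Example AB\"");
        add(c, "ex:hasAddress", address(c));
        return c;
    }

    // Person: 6 properties, links to Address and Company, and optionally a Spouse link.
    private String person(String p, boolean withSpouse) {
        add(p, "ex:identifier", "\"" + p + "\"");
        add(p, "ex:firstName", "\"Anna\"");
        add(p, "ex:lastName", "\"Berg\"");
        add(p, "ex:iterator", "\"0\"");
        add(p, "ex:age", "\"42\"");
        add(p, "ex:phone", "\"+46 00 000 00\"");
        add(p, "ex:hasAddress", address(p));
        add(p, "ex:owns", company(p));
        if (withSpouse) {
            add(p, "ex:spouse", person(p + "/spouse", false));
        }
        return p;
    }

    static List<Triple> build(int i) {
        ComplexModelSketch sketch = new ComplexModelSketch();
        sketch.person("ex:person/" + i, true);
        return sketch.triples;
    }

    public static void main(String[] args) {
        List<Triple> triples = build(0);
        long subjects = triples.stream().map(t -> t.subject).distinct().count();
        System.out.println(triples.size() + " triples, " + subjects + " subjects");
    }
}
```

Running main prints "43 triples, 8 subjects". Other readings of "same properties" (for example a Spouse link in both directions) would give a different total.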
Benchmark modules
Benchmark commons
- An abstract module containing the common logic used by the other benchmark modules. It also contains the default configuration, the data models and the synthetic data generator.
RDF4j benchmark
- This benchmark uses the RDF4j API directly. It inserts the generated data into the triplestore and then reads it back, using the selected storage type (memory, native or lmdb).
EntryStore benchmark
- This benchmark uses the EntryStore API, a wrapper around the RDF4j API that adds further logic: entry and context management and an ACL layer. This is the benchmark to run in order to get properly comparable results.
Solr benchmark
- This benchmark adds the overhead of Solr indexing on top of the EntryStore benchmark. Solr indexing is performed asynchronously, so the benchmark only waits until the last entry has been sent to Solr, not until Solr has actually indexed it. So far, the results have shown that Solr does not add critical speed overhead.
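Because the Solr benchmark stops its timer once the last entry has been handed to Solr rather than when indexing completes, the two moments can differ. The following Java sketch illustrates that distinction with a single-thread ExecutorService standing in for the asynchronous indexer; fakeIndex and its 1 ms simulated cost are assumptions, not EntryStore's Solr integration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "sent" vs "indexed" timing with an asynchronous indexer.
// The single-thread executor and fakeIndex() are stand-ins, not EntryStore's Solr code.
final class AsyncIndexTimingSketch {

    private static final AtomicInteger INDEXED = new AtomicInteger();

    // Simulated indexing work for one entry.
    private static void fakeIndex() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        INDEXED.incrementAndGet();
    }

    // Returns how many entries had been indexed once the executor drained.
    static int run(int entries) {
        INDEXED.set(0);
        ExecutorService indexer = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        for (int i = 0; i < entries; i++) {
            // The entry itself would be stored synchronously here; indexing is only queued.
            indexer.submit(AsyncIndexTimingSketch::fakeIndex);
        }
        // This is the point at which a "wait until sent" benchmark would stop its timer.
        long sentMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d entries sent after %d ms, %d indexed so far%n",
                entries, sentMs, INDEXED.get());

        indexer.shutdown();
        try {
            indexer.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long indexedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("indexing finished after %d ms%n", indexedMs);
        return INDEXED.get();
    }

    public static void main(String[] args) {
        run(100);
    }
}
```

The first printed line is what a "wait until sent" measurement sees; the second includes the queued indexing work.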
Prerequisites
- A Java Runtime Environment, version 11 or higher.
Installation
- This is a runnable application; nothing needs to be installed.
Running
java -jar benchmark-entrystore-5.5-SNAPSHOT.jar
-s: type of store. You can choose from native, memory, or the experimental lmdb. The store folder is created temporarily at runtime and removed as soon as the benchmark has finished the reading part.
-u: size of the universe, i.e. how many objects are generated and inserted.
-c: complexity of the model. The default is simple.
-t: transaction boolean switch; the default is true. RDF4j can store the data in one big transaction, or store each object in a separate transaction. EntryStore can only use multiple transactions, because of the extra processing of each object (placing it in a context, etc.).
-a: ACL boolean switch; the default is false. ACL adds some overhead, so it needs to be benchmarked as well. When it is turned on, all objects are stored with admin privileges.
-m: modulo by which the whole universe is divided into batches. Each batch is logged, so the insert time can be seen after every batch.
-i: intercontexts boolean switch; the default is false. EntryStore can create contexts for various objects. With this switch on, the benchmark creates a new context for every batch defined by the modulo; with it off, only one context is used.
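A minimal sketch of how -u, -m and -i could interact in the insert loop: objects are inserted one by one, a timing line is logged after every modulo objects, and with intercontexts on each new batch gets a fresh context. The Store interface and the context-N names are hypothetical stand-ins, not the EntryStore or RDF4j API, and the transaction switch is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the -u/-m/-i behaviour. Store and the "context-N" names are
// stand-ins, not EntryStore or RDF4j APIs; transactions are not modelled.
final class BatchingSketch {

    interface Store {
        void insert(String context, int objectId);
    }

    // Inserts `universe` objects, logs after every `modulo` objects and returns the contexts used.
    static List<String> run(int universe, int modulo, boolean interContexts, Store store) {
        if (modulo <= 0) {
            throw new IllegalArgumentException("modulo must be positive");
        }
        List<String> contexts = new ArrayList<>();
        String context = "context-0";
        contexts.add(context);
        long start = System.nanoTime();

        for (int i = 0; i < universe; i++) {
            store.insert(context, i);
            int inserted = i + 1;
            if (inserted % modulo == 0) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("batch done: %d/%d objects, %d ms%n", inserted, universe, elapsedMs);
                // With intercontexts on, the next batch (if any) goes into a fresh context.
                if (interContexts && inserted < universe) {
                    context = "context-" + contexts.size();
                    contexts.add(context);
                }
            }
        }
        return contexts;
    }

    public static void main(String[] args) {
        List<String> used = run(10, 3, true, (context, id) -> { });
        System.out.println(used);
    }
}
```

With a universe of 10 and a modulo of 3, main prints [context-0, context-1, context-2, context-3]: three full batches plus a final partial one. With intercontexts off, every object goes into context-0.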
usage: benchmark [-a <ACL>] [-c <COMPLEXITY>] [-i <INTERCONTEXTS>] [-m
       <MODULO>] -s <STORE> [-t <TRANSACTION>] -u <UNIVERSE>
 -a,--acl <ACL>                       Run with ACL: @boolean.
 -c,--complexity <COMPLEXITY>         Complexity of universe/objects to
                                      process: 'complex' | 'simple'.
 -i,--intercontexts <INTERCONTEXTS>   Run with interim contexts:
                                      @boolean.
 -m,--modulo <MODULO>                 Record intermediate requests with
                                      modulo: @int.
 -s,--store <STORE>                   Type of store: 'native' | 'memory'
                                      | 'lmdb'.
 -t,--transaction <TRANSACTION>       Run with multiple transactions:
                                      @boolean.
 -u,--universe <UNIVERSE>             Size of universe/objects to
                                      process: @int.
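The options above can be mirrored by a small configuration object. This is a hypothetical, stdlib-only Java parser (the benchmark's own parsing code is not shown in this README); it applies the defaults stated here (-c simple, -t true, -a false, -i false) and treats -s and -u as required, as the usage line indicates. No default for -m is documented, so the sketch leaves it unset.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stdlib-only parser for the benchmark options; not the benchmark's own code.
// Defaults follow the README: -c simple, -t true, -a false, -i false. -m has no documented default.
final class BenchmarkOptionsSketch {

    enum Store { NATIVE, MEMORY, LMDB }

    enum Complexity { SIMPLE, COMPLEX }

    Store store;
    int universe;
    Complexity complexity = Complexity.SIMPLE;
    boolean transaction = true;
    boolean acl = false;
    boolean interContexts = false;
    Integer modulo; // null means "not set"

    static BenchmarkOptionsSketch parse(String... args) {
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + args[i]);
            }
            values.put(canonical(args[i]), args[i + 1]);
        }
        if (!values.containsKey("store") || !values.containsKey("universe")) {
            throw new IllegalArgumentException("-s and -u are required");
        }

        BenchmarkOptionsSketch o = new BenchmarkOptionsSketch();
        o.store = Store.valueOf(values.get("store").toUpperCase(Locale.ROOT));
        o.universe = Integer.parseInt(values.get("universe"));
        if (values.containsKey("complexity")) {
            o.complexity = Complexity.valueOf(values.get("complexity").toUpperCase(Locale.ROOT));
        }
        if (values.containsKey("transaction")) o.transaction = strictBoolean(values.get("transaction"));
        if (values.containsKey("acl")) o.acl = strictBoolean(values.get("acl"));
        if (values.containsKey("intercontexts")) o.interContexts = strictBoolean(values.get("intercontexts"));
        if (values.containsKey("modulo")) o.modulo = Integer.parseInt(values.get("modulo"));
        return o;
    }

    // Maps short and long flag names to one key.
    private static String canonical(String flag) {
        switch (flag) {
            case "-a": case "--acl": return "acl";
            case "-c": case "--complexity": return "complexity";
            case "-i": case "--intercontexts": return "intercontexts";
            case "-m": case "--modulo": return "modulo";
            case "-s": case "--store": return "store";
            case "-t": case "--transaction": return "transaction";
            case "-u": case "--universe": return "universe";
            default: throw new IllegalArgumentException("unknown option: " + flag);
        }
    }

    // Rejects anything other than true/false instead of silently treating it as false.
    private static boolean strictBoolean(String value) {
        if (value.equalsIgnoreCase("true")) return true;
        if (value.equalsIgnoreCase("false")) return false;
        throw new IllegalArgumentException("expected true or false, got " + value);
    }
}
```

For example, parse("-s", "native", "-u", "1000") yields the native store, a universe of 1000, the simple model, transactions on, and ACL and intercontexts off. A strict boolean check is used because Boolean.parseBoolean would quietly turn a typo into false.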