
Bi-Weekly Newsletter #40 | Beta 2.8 with Array Indexes and AQL graph traversals

Our upcoming release, ArangoDB 2.8, is currently available as a beta, adding stunning new features to AQL and support for array indexing. In our beta announcement we list all the details and explain how to use graph traversals in pure AQL. Please give it a try and help us finish the release sooner.

Also in the past two weeks, we’ve published outstanding cluster performance benchmarks, scaling ArangoDB to Gigabyte/s bandwidth on Mesosphere.

Starting with 8 database nodes, we scaled ArangoDB in a Mesos cluster to 80 nodes, measuring throughput and latency for reads, writes and mixed loads. In the end, we wrote 1.1 million documents per second – or about 1 GB of data per second. That’s amazing.

All test details and results are published in a cluster performance white paper that you can download today. Interesting for those with big cluster ambitions: Deployment of ArangoDB in a running Mesosphere DCOS environment is just a single command! Furthermore, ArangoDB is the only operational database that is fully certified for the Mesosphere DCOS .

ArangoDB Releases

The first BETA of ArangoDB 2.8 is available for download.

New features include:

  • Hash indexes and skiplist indexes can now optionally be defined for array values so they index individual array members.
  • Added AQL keywords GRAPH, OUTBOUND, INBOUND and ANY for use in graph traversals
  • Several replication improvements
  • Bind parameters in the AQL editor
  • Automatic deadlock detection for transactions

You can find a full list of changes in our change-log . Also available: ArangoDB 2.7.2 with replication improvements.

Articles and Presentations

Documentation and Cookbook

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:


ArangoDB 2.8 beta 2

The second beta of ArangoDB 2.8 is available for testing. Now it’s your turn – give it a try, report bugs and provide us with your early feedback on the new features (Array Indexes and graph traversal in AQL). Do you like what you see?

Here’s what changed recently (beta 1 / beta 2):

  • added AQL query optimizer rule “sort-in-values”

    This rule pre-sorts the right-hand side operand of the IN and NOT IN operators so the operation can use a binary search with logarithmic complexity instead of a linear search. The rule is applied when the right-hand side operand of an IN or NOT IN operator in a filter condition is a variable that is defined in a different loop/scope than the operator itself. Additionally, the filter condition must consist of solely the IN or NOT IN operation in order to avoid any side-effects.

  • changed collection status terminology in the web interface for collections for which an unload request has been issued from "in the process of being unloaded" to "will be unloaded".

  • unloading a collection via the web interface will now trigger garbage collection in all V8 contexts and force a WAL flush. This increases the chances of performing the unload faster.

  • added the following attributes to the result of collection.figures() and the corresponding HTTP API at PUT /_api/collection/<name>/figures:

    • documentReferences: The number of references to documents in datafiles that JavaScript code currently holds. This information can be used for debugging compaction and unload issues.
    • waitingFor: An optional string value that contains information about which object type is at the head of the collection’s cleanup queue. This information can be used for debugging compaction and unload issues.
    • compactionStatus.time: The point in time the compaction for the collection was last executed. This information can be used for debugging compaction issues.
    • compactionStatus.message: The action that was performed when the compaction was last run for the collection. This information can be used for debugging compaction issues.

    Note: waitingFor and compactionStatus may be empty when called on a coordinator in a cluster.

  • the compaction will now provide queryable status info that can be used to track its progress. The compaction status is displayed in the web interface, too.

  • better error reporting for arangodump and arangorestore

  • arangodump will now fail by default when trying to dump edges that refer to already dropped collections. This can be circumvented by specifying the option --force true when invoking arangodump

  • fixed cluster upgrade procedure

  • the AQL functions NEAR and WITHIN now have stricter validations for their input parameters limit, radius and distance. They may now throw exceptions for invalid parameters that might not have led to exceptions in previous versions.

  • deprecation warnings now log stack traces

  • Foxx: improved backwards compatibility with 2.5 and 2.6

    • reverted Model and Repository back to non-ES6 “classes” because of compatibility issues when using the extend method with a constructor

    • removed deprecation warnings for extend and controller.del

    • restored deprecated method Model.toJSONSchema

    • restored deprecated type, jwt and sessionStorageApp options in Controller#activateSessions

ArangoDB 2.7.3 – Maintenance release

ArangoDB 2.7.3 – a maintenance release – is ready for download.

Changes:

  • fixed disappearing of documents for collections transferred via sync or syncCollection if the collection was dropped right before synchronization and drop and (re-)create collection markers were located in the same WAL file

  • fixed an issue where overwriting the system sessions collection would break the web interface when authentication is enabled

ArangoDB 2.8 (beta 3)

ArangoDB 2.8 (beta3) is available for testing.

The last beta release of ArangoDB 2.8 – at least for 2015 – comes with the following bugfixes and improvements:

  • web interface: fixed a graph display bug concerning dashboard view

  • web interface: fixed several bugs during the dashboard initialize process

  • web interface: included several bugfixes: #1597, #1611, #1623

  • added --create-collection-type option to arangoimp

    This allows specifying the type of the collection to be created when --create-collection is set to true.

  • AQL query optimizer now converts LENGTH(collection-name) to an optimized expression that returns the number of documents in a collection

  • slightly adjusted the V8 garbage collection strategy so that garbage collection eventually happens in all contexts that hold V8 external references to documents and collections.

    also adjusted the default value of --javascript.gc-frequency from 10 seconds to 15 seconds, as fewer internal operations are now carried out in JavaScript.

  • fixes for AQL optimizer and traversal

  • Foxx export cache should no longer break if a broken app is loaded in the web admin interface.

  • adjusted the behavior of the expansion ([*]) operator in AQL for non-array values

    In ArangoDB 2.8, calling the expansion operator on a non-array value will always return an empty array. Previous versions of ArangoDB expanded non-array values by calling the TO_ARRAY() function for the value, which for example returned an array with a single value for boolean, numeric and string input values, and an array with the object’s values for an object input value. This behavior was inconsistent with how the expansion operator works for the array indexes in 2.8, so the behavior is now unified:

    • if the left-hand side operand of [*] is an array, the array will be returned as is when calling [*] on it
    • if the left-hand side operand of [*] is not an array, an empty array will be returned by [*]

    AQL queries that rely on the old behavior can be changed by either calling TO_ARRAY explicitly or by using the [*] at the correct position.

    The following example query will change its result in 2.8 compared to 2.7:

    LET values = "foo" RETURN values[*]

In 2.7 the query returned the array [ "foo" ], but in 2.8 it will return an empty array [ ]. To make it return [ "foo" ] again, an explicit TO_ARRAY function call is needed in 2.8 (which in this case allows removing the [*] operator altogether). This also works in 2.7:

LET values = "foo" RETURN TO_ARRAY(values)

Another example:

LET values = [ { name: "foo" }, { name: "bar" } ]
  RETURN values[*].name[*]

The above returned [ [ "foo" ], [ "bar" ] ] in 2.7. In 2.8 it will return [ [ ], [ ] ], because the value of name is not an array. To change the results to the 2.7 style, the query can be changed to

LET values = [ { name: "foo" }, { name: "bar" } ]
  RETURN values[* RETURN TO_ARRAY(CURRENT.name)]

The above also works in 2.7. The following types of queries won’t change:

LET values = [ 1, 2, 3 ] RETURN values[*]
LET values = [ { name: "foo" }, { name: "bar" } ] RETURN values[*].name
LET values = [ { names: [ "foo", "bar" ] }, { names: [ "baz" ] } ] RETURN values[*].names[*]
LET values = [ { names: [ "foo", "bar" ] }, { names: [ "baz" ] } ] RETURN values[*].names[**]
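The new expansion rule itself can be mimicked in a few lines of plain JavaScript – a sketch of the semantics only, not ArangoDB’s implementation:

```javascript
// Sketch of the 2.8 [*] semantics for non-array values (illustrative only):
// an array operand is returned as is, anything else yields an empty array.
function expand(value) {
  return Array.isArray(value) ? value : [];
}

console.log(expand("foo"));           // [] – 2.7 returned [ "foo" ]
console.log(expand([ 1, 2, 3 ]));     // [ 1, 2, 3 ] – unchanged behavior
console.log(expand({ name: "foo" })); // [] – 2.7 returned the object's values
```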

Running ArangoDB on a Mac with Docker and Kitematic / Docker-Machine

When I work with ArangoDB on my Mac, I usually install ArangoDB via homebrew and do tests on the latest new developments based on the devel-branch, compiling ArangoDB right from source.

To test a feature in a specific version I use Docker images, which need a virtual machine on OS X. I struggled with Boot2Docker several times and recently tried Docker Machine – with the web UI Kitematic – currently in beta.

Here’s how to start with ArangoDB and Docker Machine:

First you need to install Docker Machine as described here. In my case, the installation process detects an earlier Boot2Docker Installation and migrates it to a new Docker Machine VM. Nice.


During the installation process you can choose to install Docker Machine and Kitematic (currently 0.8). I installed both to give them a try. After the installation I chose the fancy UI stuff and opened Kitematic.

Searching for ArangoDB, you get the official Docker Hub image as the first result – and, by default, the latest stable release.


Just one click on “create” and your ArangoDB container is up and running!


Voilà, ArangoDB is available – you can open the web interface via the stated ACCESS URL, or reach the arangosh console with one click on the EXEC icon and then typing arangosh in the console.

In the settings of my container I can specify a local folder on my Mac to store my Foxx based JavaScript extensions. That allows me to use my favorite editor (e.g. Sublime Text) to edit my Foxx apps right away.


Caution: There’s an option to change the database directory to a local folder, too. But that’s not supported on OS X, so you might not want to use that option; stick with the default path (or mount a folder into the virtual machine from the command line).

Next, I change the port to something I can remember:


Now I want to get productive with ArangoDB, so I want to import some data. Therefore, I go to the DOCKER CLI (Link is also provided in Kitematic on the bottom left) and start inspecting what’s running here…:

docker-machine ls

This command shows my running VM called default on tcp://192.168.99.100:2376 which I want to use with docker commands.

docker-machine env default
eval $(docker-machine env default)


Now I have my docker commands available and can copy some test data into the container, ready to be imported into my new database via arangoimp.

docker cp mydata.tar.gz arangodb:/tmp/mydata.tar.gz

Next, I’ll extract the JSON data and import everything in the collection customers using arangoimp:

arangoimp --file RESULTS_CUSTOMERS.json --type json --collection customers


That’s it.

Kitematic lets me get started with ArangoDB with really just one click, and changing settings like volumes is easy. Do you really need the UI…? Well, I haven’t decided yet.

Updated on 01/22: db volume could not be mounted to a local Mac folder.

AQL optimizer improvements for 2.8

With the 2.8 beta phase coming to an end, it’s time to shed some light on the improvements in the 2.8 AQL optimizer. This blog post summarizes a few of them, focusing on the query optimizer; a follow-up post explaining dedicated new AQL features will appear soon.

Array indexes

2.8 allows creating hash and skiplist indexes on attributes that are arrays. Creating such an index works similarly to creating a non-array index, with the exception that the name of the array attribute needs to be followed by [*] in the index fields definition:

db._create("posts");
db.posts.ensureIndex({ type: "hash", fields: [ "tags[*]" ] });

Now if the tags attribute of a document in the posts collection is an array, each array member will be inserted into the index:

db.posts.insert({ tags: [ "arangodb", "database", "aql" ] });
db.posts.insert({ tags: [ "arangodb", "v8", "javascript" ] });
db.posts.insert({ tags: [ "javascript", "v8", "nodejs" ] });

The index on tags[*] will now contain the values arangodb, database and aql for the first document, arangodb, v8 and javascript for the second, and javascript, v8 and nodejs for the third.

The following AQL will find any documents that have a value of javascript contained in their tags value:

FOR doc IN posts
  FILTER 'javascript' IN doc.tags[*]
  RETURN doc

This will use the array index on tags[*].

The array index works by inserting all members from an array into the index separately. Duplicates are removed automatically when populating the index.
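To illustrate the mechanics, here is a hedged sketch in plain JavaScript (not ArangoDB’s internals) of how the set of index keys for one document could be derived – each array member is kept individually and duplicates are dropped:

```javascript
// Derive the keys an array index would store for one document's tags value.
// Non-array values are skipped entirely, duplicates are removed.
function arrayIndexKeys(tags) {
  if (!Array.isArray(tags)) return [];   // non-array values are not indexed
  var seen = {};
  return tags.filter(function (t) {
    if (seen.hasOwnProperty(t)) return false;
    seen[t] = true;
    return true;
  });
}

console.log(arrayIndexKeys([ "arangodb", "v8", "javascript", "v8" ]));
// [ "arangodb", "v8", "javascript" ]
console.log(arrayIndexKeys("arangodb")); // [] – plain strings are not indexed
```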

An array index can also be created on a sub-attribute of array members. For example, the following definition will make sure the name sub-attributes of the tags array values will make it into the index:

db.posts.ensureIndex({ type: "hash", fields: [ "tags[*].name" ] });

That will allow storing data as follows:

db.posts.insert({ tags: [ { name: "arangodb" }, { name: "database" }, { name: "aql" } ] });
db.posts.insert({ tags: [ { name: "arangodb" }, { name: "v8" }, { name: "javascript" } ] });
db.posts.insert({ tags: [ { name: "javascript" }, { name: "v8" }, { name: "nodejs" } ] });`

The (index-based) selection query for this data structure then becomes

FOR doc IN posts
  FILTER 'javascript' IN doc.tags[*].name
  RETURN doc

Contrary to MongoDB, there is no automatic conversion to array values when inserting non-array values in ArangoDB. For example, the following plain strings will not be inserted into an array index, simply because the value of the index attribute is not an array:

db.posts.insert({ tags: "arangodb" });
db.posts.insert({ tags: "javascript" });
db.posts.insert({ tags: "nodejs" });`

Note that in this case a non-array index can still be used.

Use of multiple indexes per collection

The query optimizer can now make use of multiple indexes if multiple filter conditions are combined with logical ORs, and all of them are covered by indexes of the same collection.

Provided there are separate indexes present on name and status, the following query can make use of index scans in 2.8, as opposed to full collection scans in 2.7:

FOR doc IN users
  FILTER doc.name == 'root' || doc.status == 'active'
  RETURN doc

If multiple filter conditions match for the same document, the result will automatically be deduplicated, so each document is still returned at most once.
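A minimal JavaScript sketch of this merge-and-deduplicate step (the names are invented for illustration; this is not ArangoDB’s internal code):

```javascript
// Merge the results of two index scans and deduplicate by document key,
// so each document is returned at most once even if both filters match it.
function unionByKey(resultsA, resultsB) {
  var seen = {}, out = [];
  resultsA.concat(resultsB).forEach(function (doc) {
    if (!seen.hasOwnProperty(doc._key)) {
      seen[doc._key] = true;
      out.push(doc);
    }
  });
  return out;
}

var byName   = [ { _key: "1", name: "root", status: "active" } ];
var byStatus = [ { _key: "1", name: "root", status: "active" },
                 { _key: "2", name: "bob",  status: "active" } ];
console.log(unionByKey(byName, byStatus).length); // 2 – document "1" appears once
```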

Sorted IN comparison

Another improvement for the optimizer is to pre-sort comparison values for IN and NOT IN so these operators can use a (much faster) binary search instead of a linear search.

The optimization will be applied automatically for IN / NOT IN comparison values used in filters, which are used inside of a FOR loop, and depend on runtime values. For example, the optimization will be applied for the following query:

LET values = /* some runtime expression here */
FOR doc IN collection
  FILTER doc.value IN values
  RETURN doc

The optimization will not be applied to IN comparison values that are value literals, nor to those used in index lookups; in these cases the comparison values are already deduplicated and sorted.

“sort-in-values” will appear in the list of applied optimizer rules if the optimizer could apply the optimization.
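To see why the pre-sorting pays off, here is an illustrative plain-JavaScript sketch (not ArangoDB’s implementation) of a binary-search membership test on a pre-sorted operand:

```javascript
// Pre-sorting the right-hand side of IN turns each membership test
// from a linear O(n) scan into an O(log n) binary search.
function sortedIncludes(sortedValues, needle) {
  var lo = 0, hi = sortedValues.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (sortedValues[mid] === needle) return true;
    if (sortedValues[mid] < needle) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// The optimizer effectively hoists the sort out of the FOR loop:
var values = [7, 3, 9, 1, 5].sort(function (a, b) { return a - b; });
console.log(sortedIncludes(values, 5)); // true
console.log(sortedIncludes(values, 4)); // false
```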

Optimization for LENGTH(collection)

There are multiple ways for counting the total number of documents in a collection from inside an AQL query. One obvious way is to use RETURN LENGTH(collection).

That variant, however, was inefficient, as it fully materialized the documents before counting them. In 2.8, calling LENGTH() on a collection is automatically replaced by a call to a special function that can efficiently determine the number of documents. For large collections, this can be several thousand times faster than the naive 2.7 solution.

C++ implementation for many AQL functions

Many existing AQL functions have been backed with a C++ implementation that removes the need for some data conversion that would otherwise happen if the function were implemented in V8/JavaScript only. More than 30 functions have been changed, including several that may produce bigger result sets (such as EDGES(), FULLTEXT(), WITHIN() and NEAR()) and that benefit hugely from this.

Improved skip performance

2.8 improves the performance of skipping over many documents in case no indexes and no filters are used. This might sound like an edge case, but it is quite common when the task is to fetch documents from a big collection in chunks and there is certainty that there will be no parallel modifications.

For example, the following query runs about 3 to 5 times faster in 2.8, and these improvements can easily add up to notable speedups if the query is called repeatedly with increasing offset values for LIMIT:

FOR doc IN collection
  LIMIT 1000000, 10
  RETURN doc

Maintenance Release – ArangoDB 2.7.5

Still waiting for the 2.8 release announcement…
So in the meantime, let’s have a look at the latest maintenance release of ArangoDB 2.7.

Here are the changes that come with version 2.7.5:

  • backported added automatic deadlock detection for transactions

    In case a deadlock is detected, a multi-collection operation may be rolled back automatically and fail with error 29 (deadlock detected). Client code for operations containing more than one collection should be aware of this potential error and handle it accordingly, either by giving up or retrying the transaction.

  • improved internal datafile statistics for compaction and compaction triggering conditions, preventing excessive growth of collection datafiles under some workloads. This should also fix issue #1596.

  • Foxx export cache should no longer break if a broken app is loaded in the web admin interface.

  • Foxx: removed some incorrect deprecation warnings.

  • Foxx: mocha test paths with wildcard characters (asterisks) now work on Windows
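The deadlock-detection item above asks client code to handle error 29 by giving up or retrying. A hedged sketch of such a retry wrapper in plain JavaScript (runTransaction and the error shape are assumptions for illustration, not an ArangoDB API):

```javascript
// Hedged sketch of client-side handling for error 29 ("deadlock detected"):
// retry the rolled-back transaction a bounded number of times before giving
// up. `runTransaction` stands in for any multi-collection operation.
function withDeadlockRetry(runTransaction, maxAttempts) {
  for (var attempt = 1; attempt <= maxAttempts; ++attempt) {
    try {
      return runTransaction();
    } catch (err) {
      if (err.errorNum !== 29 || attempt === maxAttempts) {
        throw err; // not a deadlock, or out of retries
      }
      // deadlock detected and transaction rolled back – try again
    }
  }
}

// Example: a transaction that deadlocks twice, then succeeds.
var calls = 0;
function flakyTransaction() {
  if (++calls < 3) {
    var err = new Error("deadlock detected");
    err.errorNum = 29;
    throw err;
  }
  return "ok";
}
console.log(withDeadlockRetry(flakyTransaction, 5)); // "ok"
```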

ArangoDB 2.8 w/ AQL Graph Traversals, Array Indexes, Aggregation

We welcome 2016 with our first big news yet – the release of ArangoDB 2.8!

Now you can use new AQL keywords to traverse a graph even more conveniently – a big deal for those who like to get the maximum out of their connected data. ArangoDB is getting faster with every iteration; in this release we have implemented several AQL functions and arithmetic operations in super-fast C++ code, and further improved optimizer rules and indexing to help you get things done faster. Download ArangoDB 2.8 here.

Array Indexes

The added array indexes are a major improvement to ArangoDB that you will love and never want to miss again. Hash indexes and skiplist indexes can now be defined for array values as well, so it’s freaking fast to access documents by individual array values. Let’s assume you want to retrieve articles that are tagged with "graphdb" – you can now use an index on the tags array:

{
    text: "Here's what I want to retrieve...",
    tags: [ "graphdb", "ArangoDB", "multi-model" ]
  }

An added hash-index on tags (ensureHashIndex("tags[*]")) can be used for finding all documents having "graphdb" somewhere in their tags array using the following AQL query:

FOR doc IN documents
    FILTER "graphdb" IN doc.tags[*]
    RETURN doc

Have fun with these new indexes!


AQL Graph Traversal

Next, the mentioned AQL graph traversals. The query language AQL adds the keywords GRAPH, OUTBOUND, INBOUND and ANY for use in graph traversals. Using plain AQL in ArangoDB 2.8 you can create a shopping list for your friends’ birthday gifts, related to products they already own, with up to 5 ideas ordered by price.

FOR friend IN OUTBOUND @me isFriendOf
  LET toBuy = (
  FOR bought IN OUTBOUND friend hasBought
    FOR combinedProduct IN OUTBOUND bought combinedProducts
      SORT combinedProduct.price
      LIMIT 5
      RETURN combinedProduct
)
RETURN { friend, toBuy }

You can improve this list by limiting the result to friends who have a birthday within the next 2 months (assuming birthday: "1970-01-15"):

LET maxDate = DATE_ADD(DATE_NOW(), 2, 'months')
  ...
  FILTER DATE_ISO8601(DATE_YEAR(DATE_NOW()),DATE_MONTH(friend.birthday),DATE_DAY(friend.birthday)) < maxDate
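The same date logic can be sketched in plain JavaScript (illustrative only): rebuild the friend’s birthday in the current year and compare it against a date two months from now, just as the AQL filter does.

```javascript
// Rebuild the birthday in the current year and compare it against
// "now plus two months" – mirroring the AQL DATE_ISO8601/DATE_ADD filter.
function birthdayWithinTwoMonths(birthdayIso, now) {
  var parts = birthdayIso.split("-"); // [ year, month, day ]
  var thisYears = new Date(now.getFullYear(),
                           parseInt(parts[1], 10) - 1,
                           parseInt(parts[2], 10));
  var maxDate = new Date(now.getFullYear(), now.getMonth() + 2, now.getDate());
  return thisYears < maxDate;
}

console.log(birthdayWithinTwoMonths("1970-01-15", new Date(2016, 0, 1))); // true
console.log(birthdayWithinTwoMonths("1970-06-15", new Date(2016, 0, 1))); // false
```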


Using Dave as a bind parameter for @me, we get the following result for our shopping tour:

[
  {
    "friend": {
      "name": "Julia",
      "_id": "users/Julia",
      "_rev": "1868379126",
      "_key": "Julia"
    },
    "toBuy": [
      {
        "price": 12,
        "name": "SanDisk Extreme SDHC UHS-I/U3 16GB Memory Card",
        "_id": "products/SanDisk16",
        "_rev": "2012820470",
        "_key": "SanDisk16"
      },
      {
        "price": 21,
        "name": "Lightweight Tripod 60-Inch with Bag",
        "_id": "products/Tripod",
        "_rev": "2003514358",
        "_key": "Tripod"
      },
      {
        "price": 99,
        "name": "Apple Pencil",
        "_id": "products/ApplePencil",
        "_rev": "2019177462",
        "_key": "ApplePencil"
      },
      {
        "price": 169,
        "name": "Smart Keyboard",
        "_id": "products/SmartKeyboard",
        "_rev": "2020160502",
        "_key": "SmartKeyboard"
      }
    ]
  },
  {
    "friend": {
      "name": "Debby",
      "city": "Dallas",
      "_id": "users/Debby",
      "_rev": "1928803318",
      "_key": "Debby"
    },
    "toBuy": [
      {
        "price": 12,
        "name": "Lixada Bag for Self Balancing Scooter",
        "_id": "products/LixadaScooterBag",
        "_rev": "2018194422",
        "_key": "LixadaScooterBag"
      }
    ]
  }
]

Usage of these new keywords as collection names, variable names or attribute names in AQL queries will not be possible without quoting. For example, the following AQL query will still work as it uses a quoted collection name and a quoted attribute name:

FOR doc IN `OUTBOUND`
  RETURN doc.`any`

Please have a look in the documentation for further details.

Syntax for managed graphs:

FOR vertex[, edge[, path]] IN MIN[..MAX] OUTBOUND|INBOUND|ANY startVertex GRAPH graphName

Working on collection sets:

FOR vertex[, edge[, path]] IN MIN[..MAX] OUTBOUND|INBOUND|ANY startVertex edgeCollection1, .., edgeCollectionN


AQL COLLECT … AGGREGATE

Additionally, there is a cool new aggregation feature that was added after the beta releases: AQL introduces the keyword AGGREGATE for use in AQL COLLECT statements.

Using AGGREGATE allows more efficient aggregation (computed incrementally while building the groups) than previous versions of AQL, which built group aggregates afterwards from the total of all group values.

AGGREGATE can be used inside a COLLECT statement only. If used, it must follow the declaration of grouping keys:

FOR doc IN collection
  COLLECT gender = doc.gender AGGREGATE minAge = MIN(doc.age), maxAge = MAX(doc.age)
  RETURN { gender, minAge, maxAge }

or, if no grouping keys are used, it can follow the COLLECT keyword:

FOR doc IN collection
  COLLECT AGGREGATE minAge = MIN(doc.age), maxAge = MAX(doc.age)
  RETURN { minAge, maxAge }

Only specific expressions are allowed on the right-hand side of each AGGREGATE assignment:

  • on the top level the expression must be a call to one of the supported aggregation functions LENGTH, MIN, MAX, SUM, AVERAGE, STDDEV_POPULATION, STDDEV_SAMPLE, VARIANCE_POPULATION, or VARIANCE_SAMPLE

  • the expression must not refer to variables introduced in the COLLECT itself
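The incremental strategy can be sketched in plain JavaScript (illustrative only, not ArangoDB’s implementation): running aggregates are updated per input row while the groups are built, so the full list of group values never has to be materialized first.

```javascript
// Incremental MIN/MAX aggregation per group, updated row by row –
// the idea behind COLLECT gender = doc.gender AGGREGATE minAge = MIN(doc.age), ...
function aggregateByGender(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    var g = groups[doc.gender];
    if (!g) {
      groups[doc.gender] = { gender: doc.gender, minAge: doc.age, maxAge: doc.age };
    } else {
      if (doc.age < g.minAge) g.minAge = doc.age;
      if (doc.age > g.maxAge) g.maxAge = doc.age;
    }
  });
  return groups;
}

var result = aggregateByGender([
  { gender: "f", age: 36 }, { gender: "m", age: 42 }, { gender: "f", age: 29 }
]);
console.log(result.f.minAge, result.f.maxAge); // 29 36
```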

Within the last weeks we have already published blog posts on several new features and enhancements in ArangoDB 2.8, so have a look at the AQL function speedups and the automatic deadlock detection (which is backported to 2.7.5 as well). The blog post about using multiple indexes per collection is worth reading, as is the index speedups article. In the web interface you can now use bind parameters in the AQL editor.

There is a lot more to read in the changelog of ArangoDB 2.8 and we will proceed with the presentation of some features in detailed blog posts. You can find the latest documentation on docs.arangodb.com.


Killing a long-running query

Suppose there is an AQL query that’s executing in the server for a long time already and you want to get rid of it. What can be done to abort that query?

If a connection to the server can still be established, the easiest way is to use the ArangoShell to fetch the list of currently executing AQL queries and send a kill command to the server for the offending query.

To start, we can fetch the list of all running queries and print their ids, query strings and runtimes. This is only inspection and does not abort any query:

var queries = require("org/arangodb/aql/queries");
queries.current();

Here’s an example result for the list of running queries:

[
  {
    "id" : "190",
    "query" : "RETURN SLEEP(1000)",
    "started" : "2016-01-26T22:41:24Z",
    "runTime" : 218.49146389961243
  }
]

To now kill a query from the list, we can pass the query’s id to kill:

var queries = require("org/arangodb/aql/queries");
queries.kill("190");  /* insert actual query id here */

If a query was actually killed on the server, that call should return without an error, and the server should have logged a warning in addition.

If we wanted to abort one or many queries from the list solely by looking at query string patterns or query runtime, we could iterate over the list of current queries and kill each one that matches a predicate.

For example, the following snippet will abort all currently running queries that contain the string SLEEP anywhere inside their query string:

var queries = require("org/arangodb/aql/queries");

queries.current().filter(function(query) {
  return query.query.match(/SLEEP/);    /* predicate based on query string */
}).forEach(function(query) {
  print("killing query: ", query);      /* print what we're killing */
  queries.kill(query.id);               /* actually kill query */
});

Filtering based on current query runtime is also simple; just adjust the predicate. To abort all queries that have been running longer than 30 seconds, use:

var queries = require("org/arangodb/aql/queries");

queries.current().filter(function(query) {
  return query.runTime > 30;            /* predicate based on query runtime */
}).forEach(function(query) {
  print("killing query: ", query);      /* print what we're killing */
  queries.kill(query.id);               /* actually kill query */
});

Please make sure the predicates are correct so only the actually intended queries get aborted!

To test a predicate without killing a query, use the above code without the forEach part that did the killing.

Bi-Weekly Newsletter #41 | ArangoDB 2.8 release

Big news this week – ArangoDB 2.8 is generally available for download. This is a huge step forward for our still young project, great improvements and new features await exploration. To name just some of the biggest ones…

  • Graph pattern matching enables you to traverse a graph using plain AQL
  • Array indexes allow quick access to data stored in arrays, e.g. indexing tags in articles
  • Automatic deadlock detection for transactions

With 2.8 you will benefit from dozens of internal changes that improve performance and robustness of the database, e.g. more than 30 existing AQL functions have been backed with a C++ implementation.

More than that, we are delighted to share some new case studies of clients using ArangoDB in production. FlightStats Inc. (OR), the leading provider of real-time global flight data, and Liaison Technologies (GA), a global data management and integration company, shared how they use ArangoDB.

ArangoDB Releases

ArangoDB 2.8 is available via your favorite package manager, as an official Docker image, and for download from our homepage. Additionally, an ArangoDB 2.7.5 release is available, which includes the backported automatic deadlock detection and some other improvements and fixes.

You can find a full list of changes in our change-log (2.8).

Articles and Presentations

Documentation and Cookbook

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

Did you know?

We will organize an ArangoDB hackathon and workshop series for our open source community in the US and Europe soon… so stay tuned!

Our final info about the beauty of code: a visualization of our project on GitHub.

Small Things in ArangoDB 2.8: Explain Improvements, POW, Arangoimp

Explain Improvements

Explaining AQL queries becomes even easier in ArangoDB 2.8. While previous versions required writing a hard-to-memorize command like

require("org/arangodb/aql/explainer").explain(query);

to explain an AQL query from the ArangoShell, 2.8 reduces this task to a mere

db._explain(query);

Apart from that, explain in 2.8 is smarter when confronted with very lengthy query strings, and with queries that contain huge hard-coded string, array, or object values.

For example, when creating an array bind variable with 1,000 values and using them in an explained query, 2.7 would print the entire 1,000 array values in the explain output:

var keys = [];
for (var i = 0; i < 1000; ++i) {
  keys.push("test" + i);
}

var query = "FOR i IN @keys RETURN i";
require("org/arangodb/aql/explainer").explain({
  query: query,
  bindVars: {
    keys: keys
  }
});

[Screenshot: 2.7 explain output, printing all 1,000 array values]

2.8 will instead truncate longer arrays and objects in the explain output, for much-improved readability:

[Screenshot: 2.8 explain output with truncated values]

Automatic value truncation will occur for array and object values with more than 20 elements or for string values longer than 1,024 characters. The truncation for explain will occur if these values are hard-coded into the query or are passed via bind parameters.

Truncation only happens inside the explain results processing and thus cannot affect the actual query results.
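A hedged sketch of the truncation rule in plain JavaScript, using the thresholds stated above (ArangoDB’s actual display formatting differs; this only illustrates the rule):

```javascript
// Truncate values the way the explain output does: strings longer than
// 1,024 characters and arrays with more than 20 members get shortened.
function truncateForDisplay(value) {
  if (typeof value === "string" && value.length > 1024) {
    return value.slice(0, 1024) + "...";
  }
  if (Array.isArray(value) && value.length > 20) {
    return value.slice(0, 20).concat(["... (" + (value.length - 20) + " more)"]);
  }
  return value; // small values pass through unchanged
}

var keys = [];
for (var i = 0; i < 1000; ++i) { keys.push("test" + i); }
console.log(truncateForDisplay(keys).length); // 21 – 20 values plus a marker
```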


POW

ArangoDB 2.8 now provides a dedicated AQL function for exponentiation. This will save users a lot of trouble in case exponentiation is needed inside an AQL query, which up to 2.7 required writing and registering an AQL user-defined function.

With 2.8 it becomes as simple as RETURN POW(2, 16) to raise 2 to the power of 16 from inside AQL.


Collection type for Arangoimp

When trying to import data into ArangoDB from a JSON or CSV file using the arangoimp binary, there is always the chance that the target collection does not yet exist.

In order to create a missing target collection arangoimp has provided the option
--create-collection true:

arangoimp                       \
  --file users.json             \
  --collection users            \
  --create-collection true

However, there was no way of specifying the type of the target collection, so new collections were always created as document collections.

To import data into an edge collection, the target collection needed to be created by other means, e.g. using the ArangoShell. It would have been handier if arangoimp could create edge collections, too.

2.8 finally adds that feature, and it’s simple to use: to create an edge collection if the target collection does not exist, append the --create-collection-type edge option when invoking arangoimp:

arangoimp                       \
  --file users.json             \
  --collection users            \
  --create-collection true      \
  --create-collection-type edge
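Note that documents imported into an edge collection must contain _from and _to attributes referencing vertex documents; records without them will be rejected. A minimal JSON import file for an edge collection could look like this (collection and attribute names are made up for illustration):

```
{ "_from": "users/alice", "_to": "users/bob", "type": "knows" }
{ "_from": "users/bob", "_to": "users/carol", "type": "knows" }
```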

ArangoDB 2.8.2 with Replication Improvements

The ArangoDB 2.8.2 maintenance release comes with several replication improvements and bug fixes. You can download the latest version from our download page.

What’s changed:

  • the continuous replication applier will now prevent the master’s WAL logfiles from being removed if they are still needed by the applier on the slave. This should help slaves that previously suffered from the master garbage-collecting WAL logfiles that the slave would still have needed.

    The initial synchronization will block removal of still needed WAL logfiles on the master for 10 minutes initially, and will extend this period when further requests are made to the master. Initial synchronization hands over its handle for blocking logfile removal to the continuous replication when started via the setupReplication function. In this case, continuous replication will extend the logfile removal blocking period for the required WAL logfiles when the slave makes additional requests.

    All handles that block logfile removal will time out automatically after at most 5 minutes should a master not be contacted by the slave anymore (e.g. in case the slave’s replication is turned off, the slaves loses the connection to the master or the slave goes down).

  • added all-in-one function setupReplication to synchronize data from master to slave and start the continuous replication:

    require("@arangodb/replication").setupReplication(configuration);

    The command will return when the initial synchronization is finished and the continuous replication has been started, or in case the initial synchronization has failed.

    If the initial synchronization is successful, the command will store the given configuration on the slave. It also configures the continuous replication to start automatically if the slave is restarted, i.e. autoStart is set to true.

    If the command is run while the slave’s replication applier is already running, it will first stop the running applier, drop its configuration and do a resynchronization of data with the master. It will then use the provided configuration, overwriting any previously existing replication configuration on the slave.

    The following example demonstrates how to use the command for setting up replication for the _system database. Note that it should be run on the slave and not the master:

    db._useDatabase("_system");
    require("@arangodb/replication").setupReplication({
      endpoint: "tcp://master.domain.org:8529",
      username: "myuser",
      password: "mypasswd",
      verbose: false,
      includeSystem: false,
      incremental: true,
      autoResync: true
    });

  • the sync and syncCollection functions now always start the data synchronization as an asynchronous server job. The call to sync or syncCollection will block until synchronization is either complete or has failed with an error. The functions will automatically poll the slave periodically for status updates.

    The main benefit is that the connection to the slave does not need to stay open permanently and is thus not affected by timeout issues. Additionally the caller does not need to query the synchronization status from the slave manually as this is now performed automatically by these functions.

  • fixed undefined behavior when explaining some types of AQL traversals, fixed display of some types of traversals in AQL explain output.

ArangoDB Bi-Weekly #43 | Release 2.8.2 with Replication Improvements & more

ArangoDB gains more and more traction in 2016! The whole team is amazed by the feedback of our users and the accelerating number of Stargazers and Contributors joining the ArangoDB Community. Big thanks to all of you from the US, Brazil, India and Europe… you make everything possible!

On the technical side we just released ArangoDB 2.8.2 and it is now available for download.

ArangoDB Releases

ArangoDB 2.8.2 maintenance release with several replication improvements and bug fixes is available for download .

You can find a full list of changes in our change-log (2.8.2).

Articles and Presentations

Documentation and Cookbook

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

  • Feb 9, 2016 – Graz: Graz JavaScript Meetup. Big thanks to our community member Romana Dorfer! :)
  • Feb 25, 2016 – Cologne: ThoughtWorks

Did you know?

There is a cool POC for an ArangoDB front-end from our community member – ArangoDB-view. Remember /view when mounting. :)

By the way, you can vote for ArangoDB to get packaged in the Bitnami Cloud Hosting library.

Linenoise NG – a BSD licensed readline replacement with UTF8 support

For projects that are BSD or Apache 2 licensed, Linenoise (by Salvatore Sanfilippo) is a pretty small, portable GNU readline (GPL) replacement. Based on the work of Salvatore and 10gen Inc. this Linenoise NG implementation adds UTF8 and Windows support, uses a BSD license and can be used in any kind of program.

Features:

  • single-line and multi-line editing modes with the usual key bindings
  • history handling
  • completion
  • BSD-licensed source code
  • only uses a subset of VT100 escapes (ANSI.SYS compatible)
  • UTF8 aware
  • support for Linux, MacOS and Windows

Linenoise NG deviates from Salvatore’s original goal to have a minimal readline replacement for the sake of supporting UTF8 and Windows. It deviates from 10gen Inc.’s goal to create a C++ interface to linenoise. This library uses C++ internally, but to the user it provides a pure C interface that is compatible with the original linenoise API.

Contributions welcome! See the repository on Github for details.

Using GraphQL with NoSQL database ArangoDB

GraphQL is a query language created by Facebook for modern web and mobile applications as an alternative to REST APIs. Following the original announcement alongside Relay, Facebook has published an official specification and reference implementation in JavaScript. Recently projects outside Facebook like Meteor have also begun to embrace GraphQL.

Users have been asking us how they can try out GraphQL with ArangoDB. While working on the 2.8 release of our NoSQL database we experimented with GraphQL and published an ArangoDB-compatible wrapper for GraphQL.js. With the general availability of ArangoDB 2.8 you can now use GraphQL in ArangoDB using Foxx services (JavaScript in the database).

A GraphQL primer

GraphQL is a query language that bears some superficial similarities with JSON. Generally GraphQL APIs consist of three parts:

  • The GraphQL schema is implemented on the server using a library like graphql-sync. It defines the types supported by the API, the names of fields that can be queried and the types of queries that can be made. Additionally it defines how the fields are resolved to values using a backend (which can be anything from a simple function call to a remote web service or a database collection).

  • The client sends queries to the GraphQL API using the GraphQL query language. For web applications and JavaScript mobile apps you can use either GraphQL.js or graphql-sync to make it easier to generate these queries by escaping parameters.

  • The server exposes the GraphQL API (e.g. using an HTTP endpoint) and passes the schema and query to the GraphQL implementation, which validates and executes the query, then returns the output as JSON.

GraphQL vs REST

Whereas in REST APIs each endpoint represents a single resource or collection of resources, GraphQL is agnostic of the underlying protocols. When used via HTTP it only needs a single endpoint that handles all queries.

The API developer still needs to decide what information should be exposed to the client or what access controls should apply to the data but instead of implementing them at each API endpoint, GraphQL allows centralising them in the GraphQL schema. Instead of querying multiple endpoints, the client can pick and choose from the schema when defining the query and filter the response to only contain the fields it actually needs.

For example, the following GraphQL query:

query {
 user(id: "1234") {
   name
   friends {
     name
   }
 }
}

could return a response like this:

{
 "data": {
   "user": {
     "name": "Bob",
     "friends": [
       {
         "name": "Alice"
       },
       {
         "name": "Carol"
       }
     ]
   }
 }
}

whereas in a traditional REST API accessing the names of the friends would likely require additional API calls and filtering the responses to certain fields would either require proprietary extensions or additional endpoints.
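On the client side, queries like the one above are usually assembled with parameters escaped. The helper below is hypothetical (not part of GraphQL.js; the function name is made up) and only illustrates the idea that JSON.stringify produces a valid GraphQL string literal for plain strings:

```javascript
// Hypothetical client-side helper illustrating parameter escaping when
// building the query shown above (names made up; not a real GraphQL.js API).
function userQuery(id) {
  return "query { user(id: " + JSON.stringify(String(id)) +
         ") { name friends { name } } }";
}

console.log(userQuery("1234"));
// query { user(id: "1234") { name friends { name } } }
```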

GraphQL Demo Service

If you are running ArangoDB 2.8 you can install the Foxx service demo-graphql from the Store. The service provides a single HTTP POST endpoint /graphql that accepts well-formed GraphQL queries against the Star Wars data set used by GraphQL.js.

It supports three queries:

  • hero(episode) returns the human or droid that was the hero of the given episode or the hero of the Star Wars saga if no episode is specified. The valid IDs of the episodes are "NewHope", "Empire", "Jedi" and "Awakens" corresponding to episodes 4, 5, 6 and 7.
  • human(id) returns the human with the given ID (a string value in the range of "1000" to "1007"). Humans have an id, name and optionally a homePlanet.
  • droid(id) does the same for droids (with IDs "2000", "2001" and "2002"). Droids don’t have a homePlanet but may have a primaryFunction.

Both droids and humans have friends (which again can be humans or droids) and a field appearsIn mapping them to episodes (which have an id, title and description).

For example, the following query:

{
 human(id: "1007") {
   name
   friends {
     name
   }
   appearsIn {
     title
   }
 }
}

returns the following JSON:

{
 "data": {
   "human": {
     "name": "Wilhuff Tarkin",
     "friends": [
       {
         "name": "Darth Vader"
       }
     ],
     "appearsIn": [
       {
         "title": "A New Hope"
       }
     ]
   }
 }
}
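To try such a query yourself you can POST it over HTTP. The URL below assumes the service is mounted at /graphql in the _system database on a local instance, and that the service accepts the raw query as the request body; both are assumptions to adapt to your setup rather than a guaranteed recipe:

```
# mount point and body format are assumptions; adjust to your installation
curl -X POST \
  --data '{ human(id: "1007") { name friends { name } } }' \
  http://localhost:8529/_db/_system/graphql/graphql
```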

It’s also possible to do deeply nested lookups like “what episodes have the friends of friends of Luke Skywalker appeared in” (but note that mutual friendships will result in some duplication in the output):

{
 human(id: "1000") {
   friends {
     friends {
       appearsIn {
         title
       }
     }
   }
 }
}

Additionally it’s possible to make queries about the API itself using __schema and __type. For example, the following tells us the “droid” query returns something of a type called "Droid":

{
 __schema {
   queryType {
     fields {
       name
       type {
         name
       }
     }
   }
 }
}

And the next query tells us what fields droids have (so we know what fields we can request when querying droids):

{
 __type(name: "Droid") {
   fields {
     name
   }
 }
}

GraphQL: The Good

GraphQL shifts the burden of specifying which particular subset of information should be returned onto the client. Unlike traditional REST-based solutions, this is built into the language from the start: clients only see information they explicitly request, and they don’t have to know about anything they’re not already interested in.

At the same time a single GraphQL schema can be written to represent the entire global state graph of an application domain without having to hard-code any assumptions about how that data will be presented to the user. By making the schema declarative GraphQL avoids the necessary duplication and potential for subtle bugs involved in building equally exhaustive HTTP APIs.

GraphQL also provides mechanisms for introspection, allowing developers to explore GraphQL APIs without external documentation.

GraphQL is also protocol agnostic. While REST directly builds on the semantics of the underlying HTTP protocol, GraphQL brings its own semantics, making it easy to re-use GraphQL APIs for non-HTTP communication (such as Web Sockets) with minimal effort.

GraphQL: The Bad

The main drawback of GraphQL as implemented in GraphQL.js is that each object has to be retrieved from the data source before it can be queried further. For example, in order to retrieve the friends of a person, the schema has to first retrieve the person and then retrieve the person’s friends using a second query.

Currently, all existing demonstrations of GraphQL use external databases with ORMs or ODMs, with complex GraphQL queries causing multiple successive network requests to an external database. The added cost of network latency, transport overhead, serialization and deserialization makes using GraphQL slow and inefficient compared to an equivalent API using hand-optimized database queries.

This can be mitigated by inspecting the GraphQL Abstract Syntax Tree to determine what fields will be accessed on the retrieved document. However, it doesn’t seem feasible to generate efficient database queries ad hoc, foregoing a lot of the optimizations otherwise possible with handwritten queries in databases.
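The AST-inspection idea can be sketched as follows. The AST shape used here is simplified and made up; GraphQL.js's real AST nodes have a different structure, so treat this purely as an illustration of walking a selection set to find the requested fields before querying the database:

```javascript
// Schematic sketch of the mitigation described above. The AST format is
// invented for illustration; it is NOT GraphQL.js's real AST.
function collectFields(selectionSet) {
  var fields = [];
  selectionSet.selections.forEach(function (selection) {
    fields.push(selection.name);
    if (selection.selectionSet) {
      // recurse into nested selections, e.g. friends { name }
      fields = fields.concat(collectFields(selection.selectionSet));
    }
  });
  return fields;
}

// corresponds to the query: { name friends { name } }
var selectionSet = {
  selections: [
    { name: "name" },
    { name: "friends", selectionSet: { selections: [{ name: "name" }] } }
  ]
};
console.log(collectFields(selectionSet)); // [ 'name', 'friends', 'name' ]
```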

Conclusion

Although there doesn’t seem to be any feasible way to translate GraphQL requests into database-specific queries (such as AQL), the impact of having a single GraphQL request result in a potentially large number of database requests is much less significant when implementing the GraphQL backend directly inside the database.

While RESTful HTTP APIs are certainly here to stay and GraphQL like any technology has its own trade-offs, the advantages of having a standardized yet flexible interface for accessing and manipulating an application’s global state graph are undeniable.

GraphQL is a promising fit for schema-free databases and dynamically typed languages. Instead of spreading validation and authorization logic across different HTTP endpoints and relying on native database format restrictions, a GraphQL schema can describe these concerns in one place, guaranteeing that sensitive fields are not accidentally exposed and that data formats remain consistent across different queries.

We’re excited to see what the future will hold for GraphQL and encourage you to try out GraphQL in the database with ArangoDB 2.8 and Foxx today. Have a look at the demo-graphql from the Store. If you have built or are planning to build applications using GraphQL and ArangoDB, let us know in the comments.


ArangoDB Bi-Weekly #44 | ArangoDB 2.8.3 Maintenance Release & more

The year seems to continue quite well for ArangoDB. We are super excited about our community becoming bigger and bigger… we might hit the magic 2,000 stargazers mark next month! 😉

You can now download the ArangoDB 2.8.3 maintenance release with several bug fixes and Foxx improvements. Our architect Max joined ThoughtWorks Cologne this week to show the participants in-depth capabilities of ArangoDB. The slides are now available online.

ArangoDB Releases

The third maintenance release of ArangoDB 2.8 is available for download. After fixing the last issues we are happy to release the next version of ArangoDB with these improvements:

  • Deleting a Foxx service in the frontend should now always succeed even if the files no longer exist on the file system
  • Enhancements for AQL: added parentheses in AQL explain command output to correctly display precedence of logical and arithmetic operators

You can find a full list of changes in our change-log (2.8.3).

Articles and Presentations

Also an interesting read:

New and Updated Drivers

Projects and Integrations

Questions on StackOverflow

Answered:

Not answered yet:

Events

Did you know?

On another note, we got a workshop request from our community in Odense (Denmark) and were chuffed to pursue it. On March 5th, Michael from our core team will run a 6-hour hands-on workshop on how to create your own Foxx apps. All seats were booked in just a few days!

We are thinking about continuing these free workshops around Europe. If you are interested in having one for your local community, drop Jana a line at jana@arangodb.com so that we get an impression of your needs and preferred topics (multi-model, Foxx microservices or cluster).

How to put ArangoDB to Spartan-Mode

Most of us have seen the fantastic movie 300 (I watched it again last night…) or at least read the comics: 300 Spartans barely wearing anything but achieving a lot. This little how-to will show you how to put ArangoDB into Spartan-Mode and thereby reduce its memory footprint and CPU usage.

Big thanks to Conrad from L.A. for his time and for giving us the impulse for this little how-to!

Background

Recently, we had a lot of cool talks with users who are new to ArangoDB or are doing their PoC at the moment. Two topics came up several times. The first is ArangoDB’s memory footprint, which is higher than that of some other databases; we stated this already in our latest performance benchmark. The second issue is CPU usage in standby mode: users reported 4-6% CPU usage when ArangoDB isn’t really doing anything.

First of all, we use Google’s V8 engine for our JavaScript framework Foxx. At first glance V8 might seem memory-hungry, but most of that memory is virtual memory and therefore does not actually fill up your physical RAM. If you don’t need V8 (i.e. you are not using Foxx), you can run ArangoDB with fewer V8 contexts, which reduces memory usage significantly.

Let’s start with reducing the memory footprint

On our little test system (a laptop with x86_64 Ubuntu Linux) we installed a new ArangoDB instance. On this system we measured a memory footprint of ~300 MB and 5.6 GB of reserved disk space for an empty database (all numbers may vary a bit on other systems). With ArangoDB in Spartan-Mode we reduced the memory footprint to ~110 MB and the reserved disk space to under 1 GB… and here is how:

We need to change the configuration of ArangoDB, which you can either do by editing the configuration file or by using command line arguments when you start ArangoDB from the console. The location of the configuration file on Unix (Linux, BSD) with the normal binary packages is /etc/arangodb/arangod.conf; on other systems or with other installation options the place varies. For more details see this chapter in the manual.

You can reduce the number of V8 contexts by changing the configuration option javascript.v8-contexts to 1. For this, you would add the command line option

--javascript.v8-contexts 1

to the command with which you start ArangoDB on the console. In the configuration file, you would add

[javascript]
v8-contexts = 1

IMPORTANT NOTE: You can NOT set the number of contexts to 0 because ArangoDB uses them for large parts of the API as well as for example for the emergency shell and user generated functions.

The next step to Spartan-Mode is to reduce the memory needed for the Write Ahead Logs (WAL). This can be done using three different WAL options:

Reduce the number of historic WAL files which will reduce the memory usage when ArangoDB is in use. You can do this as follows (on the command line)

--wal.historic-logfiles 1

This is suitable if you don’t plan to use asynchronous replication.

Reduce the prepared WAL log files which are kept ready for future write operations

--wal.reserve-logfiles 1

This is suitable if you can live with a small delay when the write load suddenly increases.

In addition you can reduce the size of all WAL files to e.g. 8 MB by setting

--wal.logfile-size 8388608

With all these changes you should see a real decrease in memory used by ArangoDB.

So now that we reduced the memory footprint…

Let’s reduce CPU usage

With ArangoDB you’re able to reduce the CPU usage baseline as well. In our test we brought it down to nearly 0 when no requests were hitting ArangoDB.

You can reduce the CPU usage in two steps.

You can get rid of the Foxx queues which allow Foxx apps to run regular tasks or background jobs. You can add the following command line option and you are done with step 1:

--server.foxx-queues false

NOTE: If you use this configuration option your Foxx apps cannot run background jobs and regularly repeating tasks.

In some cases you might want to turn off all statistics of ArangoDB (request statistics, resource usage, etc). If you turn them off you’ll further reduce CPU usage.

--server.disable-statistics true

With these small configuration changes you can reduce the resources ArangoDB needs and put ArangoDB into full Spartan-Mode.

Note: Putting ArangoDB into Spartan-Mode might be useful for testing, little projects with low data volume or if you run ArangoDB on small machines. We DON’T recommend Spartan-Mode for high performance needs!
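Putting it all together, full Spartan-Mode combines every option discussed above. As a single command line (a sketch; pick only the options whose trade-offs you can live with):

```
arangod \
  --javascript.v8-contexts 1 \
  --wal.historic-logfiles 1 \
  --wal.reserve-logfiles 1 \
  --wal.logfile-size 8388608 \
  --server.foxx-queues false \
  --server.disable-statistics true
```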

If you have further questions on this topic, just leave a comment or ask on Stack Overflow.

Your Arangos

ArangoDB Bi-Weekly #45 | 2.8.4 Release, ArangoDB in Spartan-Mode & more

The fourth maintenance release of ArangoDB 2.8 is available for download. On another note, one of our core developers, Michael Hackstein (@mchacki), ran a 6-hour hands-on workshop for our community in Denmark, where participants learned how to build an “eBay-style” application using ArangoDB’s JavaScript framework, Foxx.

Interested in having a workshop or meetup for your local community focussed on multi-model, cluster, graphs or Foxx? Drop us a line! :) Don’t forget to mention your favourite topics.

ArangoDB Releases

The ArangoDB 2.8.4 Maintenance Release comes with some bug-fixes and Foxx improvements. It is now available for download. You can find a full list of changes in our change-log (2.8.4).

Articles and Presentations

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

Did you know?

New Recipe:

Jan tried it this week and wants to share the healthy spring news. Try it – it’s awesome for a nice vitamin dose.

Maintenance Release – ArangoDB 2.8.6

The ArangoDB 2.8.6 maintenance release comes with improved arangosh and some general bug fixes. You can download the latest version from our download page.

  • arangosh can now execute JavaScript script files that contain a shebang in the first line of the file. This allows executing script files directly. Provided there is a script file /path/to/script.js with the shebang #!arangosh --javascript.execute:

    cat /path/to/script.js
    #!arangosh --javascript.execute
    print("hello from script.js");

    If the script file is made executable

    chmod a+x /path/to/script.js

    it can be invoked on the shell directly and use arangosh for its execution:

    /path/to/script.js
    hello from script.js

    This did not work in previous versions of ArangoDB, as the whole script contents (including the shebang) were treated as JavaScript code. Now shebangs in script files will be ignored for all files passed to arangosh’s --javascript.execute parameter.

    The alternative way of executing a JavaScript file with arangosh still works:

    arangosh --javascript.execute /path/to/script.js
    hello from script.js

  • added missing reset of traversal state for nested traversals. The state of nested traversals (a traversal in an AQL query that was located in a repeatedly executed subquery or inside another FOR loop) was not reset properly, so that multiple invocations of the same nested traversal with different start vertices led to the nested traversal always using the start vertex provided on the first invocation.

  • fixed issue #1781: ArangoDB startup time increased tremendously

  • fixed issue #1783: SIGHUP should rotate the log

ArangoDB Bi-Weekly #46 | ArangoDB 2.8.6 Maintenance Release

Heading towards our 3.0 release, we get to know more and more teams working on innovative stuff: fraud detection, intellectual property management, business process management and much more. We learn so many things during those calls that we’d like to encourage even more of you to drop us a line about the things you are working on. What are the problems you want to solve, and which features would help you solve them? Drop a line to jan.stuecke@arangodb.com; we would be happy to learn more!

In other news, we just released ArangoDB 2.8.6, and it’s available for download on our website. Our awesome community member Mike Williamson from Ottawa, Canada gave a talk at a local graph user group. All of the participants enjoyed an interesting and interactive presentation. Have a look at the slides!

ArangoDB Releases

The ArangoDB 2.8.6 maintenance release is available for download. After fixing the last issues we are happy to release the next version of ArangoDB with these improvements:

  • arangosh can now execute JavaScript script files that contain a shebang in the first line of the file. This allows executing script files directly.
  • added missing reset of traversal state for nested traversals.

You can find a full list of changes in our change-log (2.8.6) or in the corresponding blog post.

Happy Easter from ArangoDB

Good luck with the Easter egg hunt! We hope that you find the Easter Bunny’s secret stash!

[Image: Happy Easter]

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

  • April 13th, 2016 – Ottawa, Canada: Ottawa JavaScript Meetup Talk by our community member – Mike Williamson “Building on Graphs with ArangoDB”.

Did you know?

We had an awesome starter at our Belgium off-site last week. The whole ArangoDB team liked the yummy snack. You should try it, it was that good. You can find a quick and easy recipe here.
