Channel: ArangoDB

Maintenance Release – ArangoDB 2.8.7


Our next maintenance release – ArangoDB 2.8.7 – comes with several bug fixes and improved Foxx backwards compatibility. Here is a list of changes:

  • optimized primary=>secondary failover

  • fix detection of TRUE for whole documents

  • expose the User-Agent http header since github requires it

  • work with http servers that only send

  • fixed potential race condition between compactor and collector threads

  • fix removal of temporary directories on arangosh exit

  • javadoc-style comments in Foxx services are no longer interpreted as Foxx comments outside of controller/script/exports files (#1748)

  • removed remaining references to class syntax for Foxx Model and Repository from the documentation

  • added a safe-guard for corrupted master-pointer

You can download the latest version of ArangoDB from our download page.


Index Free Adjacency or Hybrid Indexes for Graph Databases


Some graph database vendors promote index-free adjacency as the defining implementation of graph models. There has been some discussion on Wikipedia about what makes a database a graph database. These vendors tried to push index-free adjacency as the foundation of the definition of graph databases, but were stopped by the community.


Therefore a graph database remains “a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data” – independent of the way the data is stored internally. It’s really the model and the implemented algorithms that matter.

Example Graph with just small super-nodes

Still, the question remains whether index-free adjacency as an implementation detail is a suitable concept for a graph database. The idea is that each node contains a list of direct pointers to its edges, thereby avoiding look-ups. In a distributed world, such a definition is clearly meaningless: the edges can live on different servers, and there is no such thing as a "pointer" to an edge. In this case, all potential improvements discussed below are negligible in comparison to the communication latency, so much cleverer algorithms are needed in the distributed case.

So, let’s concentrate on the single server case and dive into the computer science behind it.

The key message of index-free adjacency is that the complexity to traverse the whole graph is O(n), where n is the number of nodes. In contrast, using any index will have complexity O(n log n). While this sounds plausible at first, it is simply wrong. Using a novel index that combines a hash table with linked lists, it is possible to achieve the same complexity O(n) when traversing the whole graph. Index-free adjacency, however, has some severe pitfalls, most notably with super-nodes: access is fast, but deleting or modifying the edges of such nodes becomes a nightmare. In a typical social network, celebrities have millions of followers. If you hit such a node, modifying operations suddenly become too complex to handle without indexes.

Can one do better? Yes, by using our novel hybrid index, one gets the best of both worlds. By combining a hash-index with a linked-list we were able to achieve the same access speed as an index-free implementation, while being way more efficient when deleting or modifying, especially when dealing with edges of super-nodes. Let’s explore the complexity analysis behind this approach.

If you store, at each vertex, a list of direct pointers to its edges, then traversing all neighbors has complexity O(k) if the vertex has k edges. Note that this is the best possible complexity, because O(k) is the size of the answer. Deleting a single edge, however, also has complexity O(k), since the edge must first be found in the list (even with a doubly linked list), which is much worse than optimal. Furthermore, one usually wants to traverse edges in both directions, which makes it necessary to store direct pointers on both vertices incident with an edge. A consequence is that deleting a super-node is even worse: to remove all incident edges, one has to visit every adjacent vertex and perform a potentially expensive removal operation for each of them.

Using indexes naively will not remedy the problem, as the complexity of traversing the edges rises to O(log E) + O(k), where E is the total number of edges. The choice of a hybrid index between a hash table and linked lists solves this dilemma: we store all edges in one big hash table, and at the same time we put, for each vertex V, all edges incident with V into a doubly linked list. Details of the implementation can be found in our github repository.

This allows us to proceed as follows. Finding the starting point of the linked list of a vertex is a single hash lookup by the vertex key, which is O(1) with a very good constant. If a vertex has k neighbors, traversing all of them has complexity O(k) because of the linked-list aspect of the index: we simply run through the linked list once we have found its start in O(1), and O(1) + O(k) = O(k). For single-edge modifications and deletions, we first look up the edge by the vertex key, in case it is the first in the linked list of the vertex; if this fails to find the edge, we do one further lookup by the edge's own key. Thus, we can find the edge in O(1) and then modify or delete it. This is only possible because the edge index is both a big hash table for all edges and a linked list for each vertex's neighbors. Overall, the complexity is now O(k) for neighbors – as good as theoretically possible – and O(1) for single-edge modifications – again the best possible result.
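To make the idea concrete, here is a toy sketch of such a hybrid index in JavaScript. It only illustrates the data structure described above; it is not ArangoDB's actual C++ implementation, and all names are made up for the example.

```javascript
// Toy sketch of a hybrid hash/linked-list edge index: all edges live in one
// hash table, and the edges incident to each vertex form a doubly linked
// list threaded through the edge records themselves.
class EdgeIndex {
  constructor() {
    this.edges = new Map(); // the big hash table: edgeKey -> edge record
    this.heads = new Map(); // vertexKey -> first edge of that vertex's list
  }

  // O(1): store the edge in the hash table and prepend it to the linked
  // list of each endpoint.
  insert(key, from, to) {
    const edge = { key, from, to, prev: {}, next: {} };
    for (const v of new Set([from, to])) {
      const head = this.heads.get(v) || null;
      edge.prev[v] = null;
      edge.next[v] = head;
      if (head) head.prev[v] = edge;
      this.heads.set(v, edge);
    }
    this.edges.set(key, edge);
  }

  // O(k): walk the vertex's linked list; k is the number of incident edges.
  neighbors(vertex) {
    const result = [];
    for (let e = this.heads.get(vertex) || null; e; e = e.next[vertex]) {
      result.push(e.from === vertex ? e.to : e.from);
    }
    return result;
  }

  // O(1): find the edge via the hash table, then unlink it from both lists.
  remove(key) {
    const edge = this.edges.get(key);
    if (!edge) return false;
    for (const v of new Set([edge.from, edge.to])) {
      if (edge.prev[v]) edge.prev[v].next[v] = edge.next[v];
      else this.heads.set(v, edge.next[v]);
      if (edge.next[v]) edge.next[v].prev[v] = edge.prev[v];
    }
    this.edges.delete(key);
    return true;
  }
}

const idx = new EdgeIndex();
idx.insert("e1", "alice", "bob");
idx.insert("e2", "alice", "carol");
console.log(idx.neighbors("alice")); // newest first: [ 'carol', 'bob' ]
idx.remove("e1"); // O(1), even if alice were a super-node
console.log(idx.neighbors("alice")); // [ 'carol' ]
```

The crucial point is that `remove` never scans a list: the hash lookup hands us the edge record, and the record itself carries its list links.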

ArangoDB Hybrid Hash/Linked-List Index

Because of the much better complexity for modifications, it is unnecessary to investigate the constants involved. For the traversal case it is worthwhile, though. So, how large are the speedups promised by direct pointers, at least when no modifications of the graph are involved? If the graph is unrealistically small (and fits into the L2 cache), then it is plausible that pointers will be faster. However, the potential savings of a direct-pointer approach are negligible, since the direct pointer also has to fetch at least one cache line. Even this small benefit is immediately lost if one does not use C++ or a similarly low-level language.

On the other hand, if you have a large graph that does not completely fit into memory, a drawback of index-free adjacency becomes apparent. Consider the iconic graph query, pattern matching in a large graph: an index might even be faster, because you do not need to look at the vertex data at all. You only need the index itself, which is much smaller than the complete node data. It is therefore much more memory- and cache-friendly, which matters most with today's hardware.

Conclusion

By selecting the right index, it is possible to reach the same read complexity with superior write complexity.

In addition, using an index keeps the payload out of the main memory. This makes the algorithms much more memory and cache friendly.

If you are interested in the performance of such a solution, have a look here.

Open Source DC/OS: The modern way to run a distributed database


The mission of ArangoDB is to simplify the complexity of data work. ArangoDB is a distributed native multi-model NoSQL database that supports JSON documents, graphs and key-value pairs in one database engine with one query language. Its cluster management is based on Apache Mesos, a battle-hardened technology. With the launch of DC/OS by a community of more than 50 companies, all ArangoDB users can easily scale.

Just a little while ago, setup, management, and maintenance of a database cluster were a world of pain. Everybody who has put effort into getting automatic failover to work, or who has updated a database cluster, knows what I am talking about. Many of us have experienced calls at 4 am notifying us that something within the cluster just went bad. Say hello to the Fail Whale.


Now we have stepped across the edge to a new era in which open-source technology is all you need to run distributed applications at scale. With DC/OS you can now put your data center on autopilot, automatically scale to current needs and the best thing about it … this technology is backed by a huge community and by reputable enterprises. The whole team of ArangoDB is thrilled about Mesosphere going open-source with DC/OS!

DC/OS is the easiest way to run an ArangoDB cluster in production. Deploying ArangoDB is as easy as typing

dcos package install arangodb

or clicking on a button in the Mesosphere UI. If you need to edit parameters, you can either edit a JSON file or use the GUI.


ArangoDB is a distributed and stateful application. There are many reasons for using distributed systems: scaling out for performance and/or data size reasons, and adding resilience and fault tolerance are probably the most common ones. Massive amounts of computing and storage power in the form of farms of commodity hardware have become relatively cheap and universally available. All this needs to be managed and having “cluster operating systems” is a logical consequence.

When we first started to look for a new basis for our cluster management some months ago, we wanted to realize two ideas. First and foremost, we needed a really simple way for our users to deploy, manage and maintain an ArangoDB cluster. Second, we strongly believe that it does not make sense for a database to implement its own low-level cluster management.

The following and similar questions should not be the business of the distributed database:

  • Has a machine or a task been lost?
  • Which machine or task has been replaced automatically by another one?
  • Where and with which resources do they run?

Rather, these details should be handled automatically by the infrastructure. Other issues are genuine duties of the database, like distributing the execution of a query across the cluster, handling replication of data, deciding when a transaction commit was successful and the like. In our terminology we call this “high level cluster management”.

In summary, we need to have an infrastructure that makes it easy to administrate a database cluster and that provides the low level cluster management. Furthermore, it needs to be a solution that has demonstrated that it works on enterprise level. The latter is crucial since it is a large leap of faith to rely on another system for the low level cluster management. We see the future of distributed computing in systems like DC/OS. We view Apache Mesos with its persistent primitives as a solid basis for our first version of a modern cluster management for databases. DC/OS extends this basis by a multitude of useful capabilities and thus provides a solid enterprise level foundation.

Teaming up with Mesosphere was an exciting and valuable time for us because we could leverage their broad and in-depth experience with handling large and distributed environments, gained at Twitter and Airbnb. The team of ArangoDB worked hard and became the first fully certified operational database for DC/OS, including the persistent primitives, and is by now the only multi-model database available for the DC/OS environment.

As we are open source ourselves, the team of ArangoDB is more than happy to see Mesosphere going open source as well and contribute even more to the community.

For users of ArangoDB this means:

  • Simple deployment of ArangoDB to clusters, with command-line installation and easy horizontal scaling.
  • A straightforward path to making ArangoDB available on all major cloud platforms that DC/OS supports (AWS, Azure, Google Cloud Platform) without relying on closed, proprietary and expensive software.
  • Use of the persistent volume primitives of Mesos, which solve the problem of persisting data in clusters and provide predictable restoring of data across failures.
  • Service discovery for connecting distributed applications with the distributed ArangoDB cluster instances.

ArangoDB Bi-Weekly #48 | Alpha Release & What’s Coming With ArangoDB 3.0


The whole team at ArangoDB has been hacking “day-and-night” and the alpha version of the upcoming ArangoDB 3.0 release is available for testing! All our tests (290,000 lines of test code) are green, so it’s worth giving it a spin. We would really appreciate your feedback, e.g. via our #feedback30 channel on Slack.

In other news, our CTO Dr. Frank Celler attended the great Percona Live conference in Santa Clara and presented the latest developments of ArangoDB alongside many other database experts and big names.

ArangoDB Releases

The alpha version of the upcoming ArangoDB 3.0 release is available for testing. You can download a technical preview here. Please note that this is an alpha version; it is not for production, but for testing only. To upgrade ArangoDB for testing, please refer to the Upgrading Changes log.

If you are running tests with the alpha version, we would love to hear from you in our Slack Community channel -> #feedback30

Also, have a look at a quick preview of what new features and improvements are coming with ArangoDB 3.0.

Download Technical Preview

Articles and Presentations

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

  • June 01 – 02, 2016 – Denver, CO: MesosCon NA – one of our core developers, Max Neunhöffer (@neunhoef), will talk about Persistence Primitives in Action 2.0 together with Jörg Schad & Neil Conway from Mesosphere.

Did you know?

We are great fans of avocados at ArangoDB :) Here is an awesome recipe we recently tried out. It is tasty and fresh – perfect for a warm spring morning.

And also, big thanks and loads of love to our fast-growing open-source community for supporting us on GitHub!

Getting closer: ArangoDB 3.0 alpha release


There is this German saying, “If it takes long enough, it will be all right in the end.” However, since just “all right” isn’t our quality standard, this first alpha of 3.0 took us a bit longer to finish up than planned. We’d like to invite you to give this fully tested alpha a serious spin, test the new functionality and share your thoughts and feedback with us on Slack in our #feedback30 channel.

Within this short release note you’ll find 1) a quick overview of the most important changes; 2) instructions on how to get the new version; and 3) how to get your (test) data from your 2.x version into the 3.0 alpha, which implements our new binary storage format VelocyPack.

For those who haven’t read about VelocyPack yet: we successfully performed a kind of open-heart surgery on the ArangoDB core by implementing our very own binary storage format, VelocyPack. VelocyPack is even more compact than MessagePack and promises to further improve ArangoDB’s query response time and memory usage. Before VelocyPack we used two formats (ShapeJSON, tri_json_t) internally, which led to a lot of duplicate code. We tested existing formats but didn’t find one that met our needs. Now that we only have one storage format, we can simplify and, hopefully, speed up our development cycle significantly.

We think that our alpha version is definitely worth testing. The 3.0 alpha consists of 173,000 lines of code, not to mention the 290,000 lines of test code. All our tests are green and even our performance tests are running smoothly (benchmark results are looking good; we’ll post a new performance benchmark blog post with the 3.0 release).

What’s new in 3.0 alpha:

Under the hood

  • We performed open-heart surgery on the ArangoDB core and implemented our own binary storage format, VelocyPack (VPack), for storing documents, query results and temporarily computed values.
  • _from and _to attributes of edges are now updatable and usable in indexes
  • ArangoDB 3.0 now has reduced memory allocation for queries and a significant speedup in query response times
  • Unified API for CRUD Operations

Cluster Management

  • Master/Master setup: Clients can talk to any node. No dedicated master.
  • Shared nothing architecture. No single point of failure.
  • Synchronous Replication: We have implemented synchronous replication with automatic failover. The number of replicas for each shard can be configured for each collection separately. This improves reliability in the distributed setup considerably.
  • Self-Organizing Cluster State Management: We developed our own cluster management agency which organizes itself. Cluster setup and maintenance got much easier with 3.0.
  • Preparations for the Jepsen test: We got rid of etcd, which turned out to be too slow for our goals, and implemented Raft with transactions and callbacks instead. With these improvements we can take on the challenge of the Jepsen test later this year.

AQL

AQL now uses VelocyPack internally for storing intermediate values. For many value types it can now get by without extra memory allocations and with fewer internal conversions. Values can be passed into internal AQL functions without copying them. This leads to reduced execution times for queries that use C++-based AQL functions.

Foxx Updates

  • Legacy Mode for 2.8 services
  • Dedicated Foxx Documentation with how-tos and examples
  • No more global variables and magical comments
  • Repository and Model have been removed: Instead of repositories use ArangoDB collections directly
  • Controllers have been replaced with nestable routers
  • Routers can be named and reversed: No more memorizing URLs
  • Simpler express-like middleware

Complete overhaul and simplification of graph functions

We have learned a lot over the past two years. Now we’re seizing the opportunity of this new major release to simplify ArangoDB. Our philosophy is that every minor release is backwards compatible. That’s why we use major releases for in-depth changes when needed.

Throughout the 2.x releases you have seen many updates and improvements to our graph capabilities. In this period we wanted to stay as backwards compatible as possible. As a consequence, the number of graph functions drastically increased and using them got more and more complicated. We have now streamlined the graph functions and created a more flexible and faster API, which is feature-rich while staying easy to learn. The graph functions are now implemented in C++ and allow better index utilization.

With the release of 3.0 we want to take the chance to unify and simplify the overall graph API in AQL. We took the radical decision to integrate all graph features natively into the language and get rid of all the functions that we were using until now. This nativeness enables us to optimize all graph features, for instance by using more efficient indexes, improving the overall query execution ordering, early filtering and so on. Furthermore, all graph features are now more flexible (you can apply ANY AQL-based filter, not only fixed examples).

In summary, your gains are:

  • Better performance
  • Less context switches
  • More automatic optimization
  • Use better suited indices
  • In-detail explain output of graph features
  • More flexible filters
  • Easier to use
  • No guessing which specialized function performs best in your case
  • No confusion between GRAPH_-prefixed and edge collection functions

We also simplified a lot of minor details. For example, we removed the double negation in the startup parameters: instead of --server.disable-authentication false you now write --server.authentication true. Much easier to read, isn’t it? We will document all these improvements and changes in the next days. The docker container is ready to run; all these changes have been incorporated there.

We will also make some further simplifications to the user manager and cluster start. Stay tuned.

How to get ArangoDB 3.0 alpha

Please note that this alpha is NOT for production use! It’s suitable for testing only!

If you want to try out the new 3.0 alpha, you can start a docker container. This will not disturb any existing ArangoDB 2.x installation. Just use

docker run -it -e ARANGO_NO_AUTH=1 -p 9000:8529 arangodb/arangodb-preview

You can now point your browser to port 9000 and play with ArangoDB 3. The login is “root” with empty password. If you are using docker machine or something similar, you need to use the IP of the DOCKER_HOST.

If you want to use the ArangoDB shell, you can start a second instance of the docker container.

docker run -it arangodb/arangodb-preview /usr/bin/arangosh --server.endpoint tcp://192.168.99.100:9000

Replace 192.168.99.100 with the IP address of the DOCKER_HOST.

Getting Test data into v3.0 alpha

Because the storage engine and the datafile format have changed considerably, you have to dump the data from your current ArangoDB version and restore it into the new one.

Assume you have a running 2.8 instance. You can export your data using

arangodump --output-directory example

This will create a dump in the directory “example”. In order to import the data into ArangoDB 3 use:

tar cf - example | docker run -i arangodb/arangodb-preview /bin/bash -c '/bin/tar xf - ; /usr/bin/arangorestore --server.endpoint tcp://192.168.99.100:9000 example'

Again replace the 192.168.99.100 with the IP address of the DOCKER_HOST.

Now you’re good to go! We’re excited about your feedback and first impressions to help us put the final touch on 3.0!

If you have any feedback, we would be happy to hear your thoughts on Slack in our #feedback30 channel.

ArangoDB 3.0 alpha5: Step-by-step instruction to setup an ArangoDB test cluster on DC/OS


As we get closer to the final release of ArangoDB 3.0 you can now test our brand new cluster capabilities with 3.0 alpha5.

In this post you will find step-by-step instructions on how to get a 3.0 alpha5 cluster up and running on Mesosphere DC/OS for testing. Any feedback is much appreciated. Please use our Slack channel #feedback30 for your hints, ideas and all other wisdom you’d like to share with us.

Please note that we are well aware of the clunkiness of the current cluster setup in 3.0 alpha5. The final release of 3.0 will make it much easier.

Launching the DC/OS cluster on AWS

Go to this page and choose a CloudFormation template in a region of your choice. It does not matter whether you use “Single Master” or “HA: Three Master”. Once you click, you will be taken to the AWS console login and then be presented with various choices. Do not bother with public slaves; just select a number of private slaves.

This will take approximately 15 minutes and in the end you have a Mesos Master instance and some Mesos Agents.

In the AWS console on the “CloudFormation” page under the tab “Outputs” you find the public DNS address of the service router, something like

ArangoDB3-ElasticL-A6TTKUWKO1AO-1880801302.eu-west-1.elb.amazonaws.com

You can now point your browser to the above DNS address of the service router to see the DC/OS web UI.

Acquire the public IP of your cluster from the UI (printed at the top left in the DC/OS UI, below the cluster’s name).

You will need this to setup an ssh tunnel to access your ArangoDB coordinators. We are hard at work to fix this nuisance by integrating a reverse proxy into our Mesos framework scheduler.

Setting up an sshuttle proxy to access servers

Install sshuttle from its github repository (https://github.com/sshuttle/sshuttle); you need a reasonably new version for this to work with DC/OS.

Make sure that the PEM file associated with the DC/OS cluster is loaded into ssh:

ssh-add /home/<USER>/.ssh/<AWS-PEM-FILE>.pem

Then issue:

sshuttle --python /opt/mesosphere/bin/python -r core@<PUBLIC-IP> 10.0.0.0/8

(replacing the machine address with that of your Mesos master, see above). Do not forget the core@ prefix. You will need the private ssh key which you used to create the DC/OS cluster.

This forwards all requests to local IP addresses in the range 10.0.0.0/8 over an ssh connection into your DC/OS cluster. It is necessary to conveniently reach Marathon with the curl utility, to inspect the sandboxes of your ArangoDB tasks and to access the web UIs of your ArangoDB coordinators.

Launching an ArangoDB3 cluster on DC/OS

Just use Marathon and the following JSON configuration arangodb3.json:

{
  "id": "arangodb",
  "cpus": 0.125,
  "framework-name": "arangodb",
  "mem": 4196.0,
  "ports": [0, 0],
  "instances": 1,
  "args": [
    "framework",
    "--framework_name=arangodb",
    "--master=zk://master.mesos:2181/mesos",
    "--zk=zk://master.mesos:2181/arangodb",
    "--user=",
    "--principal=princ",
    "--role=arangodb",
    "--mode=cluster",
    "--async_replication=false",
    "--minimal_resources_agent=mem(*):512;cpus(*):0.125;disk(*):512",
    "--minimal_resources_dbserver=mem(*):4096;cpus(*):1;disk(*):4096",
    "--minimal_resources_secondary=mem(*):4096;cpus(*):1;disk(*):4096",
    "--minimal_resources_coordinator=mem(*):4096;cpus(*):1;disk(*):4096",
    "--nr_agents=1",
    "--nr_dbservers=5",
    "--nr_coordinators=5",
    "--failover_timeout=30",
    "--secondaries_with_dbservers=true",
    "--coordinators_with_dbservers=true",
    "--arangodb_privileged_image=false",
    "--arangodb_image=arangodb/arangodb-preview-mesos:3.0.0a5"
  ],
  "env": {
    "ARANGODB_WEBUI_HOST": "",
    "ARANGODB_WEBUI_PORT": "0",
    "MESOS_AUTHENTICATE": "",
    "ARANGODB_SECRET": ""
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "arangodb/arangodb-preview-mesos-framework:3.0.0a4",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "path": "/v1/health.json",
      "gracePeriodSeconds": 3,
      "intervalSeconds": 10,
      "portIndex": 0,
      "timeoutSeconds": 10,
      "maxConsecutiveFailures": 0
    }
  ]
}

Edit the following fields to your needs:

  • --nr_dbservers=5 to change the number of database servers; you should use at most as many as your DC/OS cluster has nodes.
  • --nr_coordinators=5 to change the number of coordinators; you should use at most as many as dbservers. Our standard configuration is to use the same number.

Make sure that your instances are large enough to satisfy the resource requirements specified in the --minimal_resources_* options.

Do not change the name of the Docker images.

Acquire the internal IP address of the node running Marathon by clicking on

System => Marathon

in the DC/OS UI. We will use 10.0.7.72 in this example.

curl -X POST -H "Content-Type: application/json" http://10.0.7.72:8080/v2/apps -d @arangodb3.json -D - && echo

You should then see the ArangoDB cluster start up. It should appear as a Marathon service offering a web UI. From there you should be able to reach your coordinators once the cluster is healthy. Furthermore, you should see tasks on the Mesos console (use

http://ArangoDB3-ElasticL-A6TTKUWKO1AO-1880801302.eu-west-1.elb.amazonaws.com/mesos

with your service router URL). Your cluster is ready.

Shutting down the ArangoDB3 cluster

This needs a bit of care to make sure that all resources used by ArangoDB are properly freed. First you have to find out the endpoint of ArangoDB’s web UI. It can be found by opening Services => marathon. On the following page you will find an application called “arangodb”. At the bottom of it you will find two links pointing to the two ports of the arangodb framework. The first one points to the web UI of the framework scheduler; to properly clean up any state in our Mesos cluster we will need this one (first IP) and the Marathon IP (second IP):

curl -X POST -H "Content-Type: application/json" http://10.0.0.253:15091/v1/destroy.json -d '{}' ; curl -X DELETE http://10.0.7.72:8080/v2/apps/arangodb

Please run both commands on one command line to make sure that the ArangoDB service is uninstalled from Marathon as soon as it has properly shut down.

ArangoDB Bi-Weekly #49 | Beta Version Is Here!


Some “sleepless” nights and a few beers later, the beta version of the upcoming ArangoDB 3.0 is here! Hurray! It arrives with several bug fixes, improved docs and a more complete set of features. So you can get a better feel of the awesomeness soon to come :)

Feedback welcome -> feedback30 channel on Slack

Our CEO Claudius will do his regular US west-coast round-trip to visit our customers, attend MesosCon (Denver) and DockerCon (Seattle). He will enjoy the sun in San Francisco from 6th to 14th June. If you’d like to meet Claudius in person to discuss your project or learn about our newest ideas – contact Jan who is coordinating all meeting requests (jan.stuecke@arangodb.com).

ArangoDB Releases

Great news this week – we are a step closer to ArangoDB 3.0! The beta version is available for testing. Please note it is not for production, but for testing purposes only. You can download a technical preview here. ArangoDB 3.0 beta comes with several bug fixes, improved documentation and a more complete set of features. The documentation can be found here.

Let us know what you think in our Slack community channel -> #feedback30

Articles and Presentations

New and Updated Drivers

Projects and Integrations

Questions on Stack Overflow

Answered:

Not answered yet:

Events

  • June 01 – 02, 2016 – Denver, CO: MesosCon NA – our architect Max Neunhöffer (@neunhoef) will talk about Persistence Primitives in Action 2.0 together with Jörg Schad & Neil Conway from Mesosphere.
  • June 30, 2016 – Kassel, DE: Java User Group Hessen – one of our core devs Michael Hackstein (@mchacki) will talk about Polyglot Persistence & Multi-Model NoSQL Databases.

Did you know?

We have lots of fun @ ArangoDB. Not a day goes by without some silly geekery. This week we celebrated Towel Day, which “shocked” a few people on the street. :) #DontPanicButRather use ArangoDB!

Oh and also, we’ve hosted two great events in our HQ in Cologne: Kubernetes User Group and Cloud and AWS Meetup where Frank also did a talk on Mesosphere’s DC/OS and ArangoDB.

ArangoDB 3.0 new Cluster features


The 3.0 release of ArangoDB introduces a completely overhauled cluster and marks a major milestone on the road to “zero-maintenance”, where you can keep your focus on your product instead of your datacenter.

Synchronous replication

Earlier releases of ArangoDB already featured asynchronous replication. This was a great method for backups and allowed for failover in case of a disaster. However, failover was mostly a manual job and furthermore – due to its asynchronous nature – data loss could happen.

With the release of ArangoDB 3.0, the cluster features synchronous replication. When creating a collection you can now specify a replication factor. The cluster will then pick a leader and some followers from its dbserver pool. After creating such a replicated collection, writes to it will be synchronously distributed across your cluster.

From that moment on your collection is accessible in a fail-safe manner (unless a serious bomb takes your datacenter away). Should the leader of the collection bail out for whatever reason, the cluster will automatically find a solution and fail over to a follower without any data loss.
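As a mental model: synchronous replication means the client’s write is acknowledged only after the leader and all its in-sync followers have stored it, so a follower can always take over without losing acknowledged data. A toy sketch (with assumed names for illustration, not ArangoDB’s actual wire protocol):

```javascript
// Toy model of synchronous replication: the leader acknowledges a write
// only after every in-sync follower has stored the document.
// (Assumed names for illustration; not ArangoDB's real protocol.)
function syncWrite(doc, leader, followers) {
  leader.push(doc); // leader stores the document first
  for (const f of followers) {
    f.push(doc); // in reality a network call; a failure here delays the ack
  }
  // Only now does the client get a success response.
  return { acknowledged: true, replicas: 1 + followers.length };
}

// replicationFactor 3 => one leader plus two followers per shard
const leader = [];
const followers = [[], []];
const res = syncWrite({ _key: "order1" }, leader, followers);
console.log(res.replicas); // 3
// Failover: promote any follower; it already holds every acknowledged write.
const newLeader = followers[0];
console.log(newLeader.length === leader.length); // true
```

Contrast this with asynchronous replication, where the ack happens before the followers have caught up, which is exactly why data loss was possible in the pre-3.0 setup.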

Asynchronous replication

Why would you need asynchronous replication then anyway?

One word: Backups!

Failures during operation will cause the cluster to reorganize itself (which might result in a short period of slower performance). Asynchronous replicas (we call them secondaries), however, are simple, silent followers. Should there be any trouble with a secondary server, the cluster will not care at all. When the secondary comes back at some point, it will slowly catch up.

This makes them perfectly suitable for doing backups (apart from bombs there might also be floods, fire etc.):

Start a secondary, let it catch up, shut it down, backup, repeat.

Grow your cluster as your data grows

With ArangoDB 3.0 scaling your cluster is dead simple. Start a new primary DBServer, for example, and it will be included in your cluster straight away and used for new data shards out of the box. This is sufficient for scaling slowly but surely. Of course, you also want to be ready for the moment your product suddenly takes off and you are facing 20 times the traffic and load. In that case you need more computing power behind the existing shards. 3.0 introduces zero-downtime shard rebalancing: add a few servers, rebalance and watch the load go away.

 

cluster topology

 

Cluster setup greatly simplified

The cluster setup in ArangoDB 3.0 has been greatly simplified. We were getting a lot of questions about how to set up a cluster in different environments, and with earlier releases it was very hard to deploy a cluster outside of our officially supported environments. You should now finally be able to set it up on whatever platform you like, as it is really dead simple.

The missing puzzle piece: DC/OS

Nevertheless, we continue to pursue a perfectly smooth DC/OS experience, and our integration continues to be the model of a good cluster integration. When launched on a DC/OS cluster, ArangoDB will automatically offer up- and downscaling with just a click.

Furthermore DC/OS will handle task supervision and our integration will handle automatic failover, task rescheduling and so forth so you don’t have to care about your datacenter.

Striving for zero-maintenance

To sum it up, the revamped cluster mode offers near-zero-maintenance operation while still delivering cutting-edge scaling capabilities. A simple failure scenario:

1. A DBServer dies
2. ArangoDB’s internal cluster management will quickly discover it and fail over to the followers of all synchronously replicated collections
3. DC/OS will discover that the cluster setup is missing a DBServer
4. DC/OS will reschedule the start of a new DBServer
5. The new DBServer will announce itself to the ArangoDB cluster and integrate itself

No user interaction whatsoever was necessary, and the cluster was accessible and usable at any time. Only the logfiles will reveal that anything happened at all.

ArangoDB’s cluster mode strives to be much more than a simple master-master replication and offers a self-organizing and self-healing cluster management. A true happy-place for developers and operators alike.


ArangoDB 3.0 – A Solid Ground to Scale


After 6 months of development we are happy and excited to announce the fully production ready ArangoDB 3.0 today! Get ArangoDB 3.0 now!

We designed ArangoDB as a native multi-model DB from the first line of code. By providing three major NoSQL data models in one technology the ArangoDB team wants to fulfill its mission to simplify data work. With ArangoDB 3 we believe that our users will come more than one step closer to a dramatically simpler way to create their applications.

The current constraint isn’t hardware anymore. Computing power is spun up with a few clicks. The current constraint is software. Software is written and operated by people. It’s now all about making people as informed, enabled and efficient as possible. The team behind ArangoDB wants to contribute to this new era by providing a completely overhauled cluster, streamlined graph functionality, persistent indexes, a new binary storage format and simplified extensibility via our Foxx framework. With these upgrades we created a solid ground to scale for our community.

We want to thank all of you who tested the alpha, beta and release candidates! Your feedback and contributions made 3.0 the best ArangoDB ever! Thank you so much!

 

Main improvements in ArangoDB 3.0

ArangoDB 3.0 comes with largely improved cluster capabilities. You can now build self-organizing clusters which are managed by ArangoDB’s new Agency. Multi-master setups, synchronous replication and the new shared-nothing architecture with auto-failover make deployment and maintenance of clusters very easy. Read more about our cluster improvements.

All ArangoDB graph functions are now natively and deeply integrated into AQL which makes the usage of our graph capabilities way easier. All graph features are now much more flexible and you can use any AQL based filter. We didn’t just achieve better performance with it but streamlined the whole process of working with graphs.

Persistent indexes via RocksDB are the first step for ArangoDB towards persisting indexes in general. The new persistent index provides the same functionality as our skiplist index; thus, it is perfectly suitable for sorting and range queries. During the 3.x development phase we will provide many other types of persistent indexes.

ArangoDB’s new binary storage format VelocyPack has been implemented. With VelocyPack, ArangoDB is now able to store your data in a very compact way, reduce the overall memory usage and directly access subvalues without decoding the whole document. Together with many other improvements, we managed to increase ArangoDB’s performance by up to 5x while reducing memory usage by 20%. A detailed performance benchmark for 3.0 will follow soon!

If you liked Foxx before, you will definitely fall in love with Foxx 3.0 – our JS framework for data-centric microservices. You can now use ArangoDB collections directly; no more need for repositories. Controllers have been replaced with nestable routers. We cleaned up Foxx and added dedicated, detailed documentation for it. If you already use Express for Node.js, you will feel right at home with the redesigned Express-like middleware.

Read a full list of all the upgrades in our alpha release note.

ArangoDB is used by teams around the globe. Established companies like Demonware, Liaison or ICmanage leverage the power of our multi-model database, and up-and-coming startups like FlightStats, Fedger.io or AboutYou love the freedom to adapt quickly to customer needs and prototype their ideas fast. We are so happy to see widespread adoption of ArangoDB across different company sizes, markets and use cases.

Get started with ArangoDB 3.0 today and use one of our 10 minutes tutorials!

Tutorial for Java
Tutorial for PHP
Tutorial for NodeJS

Our core team would love to hear your feedback on ArangoDB 3.0

Have fun playing around with 3.0!

 

Running ArangoDB 3.0.0 on a DC/OS cluster


As you surely noticed, we’ve released ArangoDB 3.0 a few days ago. It comes with great cluster improvements like synchronous replication, automatic failover, easy up- and downscaling via the graphical user interface, and lots of other improvements. Furthermore, ArangoDB 3 is even better integrated with Apache Mesos and DC/OS.

Currently, the new release is undergoing the certification procedure to get the same “fully-certified” status from our friends at Mesosphere as v2.8 has. This may take a few more days.

But we have good news for all the passionate people out there who don’t want to wait… You can fire up an ArangoDB 3 cluster on your DC/OS cluster today. In this blog post, we describe how this is done.

Deploying ArangoDB 3

We assume that you have installed a DC/OS cluster using the tools and instructions here. You will probably have chosen OAuth authentication for your DC/OS cluster and have logged in using your GitHub account. Furthermore, you should install the dcos command line interface by clicking on “Install CLI” in the menu in the lower left corner of your DC/OS dashboard, and authenticate dcos by doing

dcos auth login

Once this is all done, you are all set and can simply deploy an ArangoDB cluster to your DC/OS cluster with the following command:

dcos marathon app add arangodb3.json

where the file arangodb3.json is the following:

{
  "id": "arangodb3",
  "framework-name": "arangodb3",
  "cpus": 0.125,
  "mem": 1024.0,
  "ports": [0, 0, 0],
  "instances": 1,
  "args": [
    "framework",
    "--framework_name=arangodb3",
    "--master=zk://master.mesos:2181/mesos",
    "--zk=zk://master.mesos:2181/arangodb3",
    "--user=",
    "--principal=arangodb3",
    "--role=arangodb3",
    "--mode=cluster",
    "--async_replication=false",
    "--minimal_resources_agent=mem(*):512;cpus(*):0.125;disk(*):512",
    "--minimal_resources_dbserver=mem(*):4096;cpus(*):1;disk(*):4096",
    "--minimal_resources_secondary=mem(*):4096;cpus(*):1;disk(*):4096",
    "--minimal_resources_coordinator=mem(*):4096;cpus(*):1;disk(*):4096",
    "--nr_agents=1",
    "--nr_dbservers=3",
    "--nr_coordinators=3",
    "--failover_timeout=30",
    "--secondaries_with_dbservers=false",
    "--coordinators_with_dbservers=false",
    "--arangodb_privileged_image=true",
    "--arangodb_image=arangodb/arangodb-mesos:3.0.0"
  ],
  "env": {
    "ARANGODB_WEBUI_HOST": "",
    "ARANGODB_WEBUI_PORT": "0",
    "MESOS_AUTHENTICATE": "",
    "ARANGODB_SECRET": ""
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "arangodb/arangodb-mesos-framework:3.0.0",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "path": "/framework/v1/health.json",
      "gracePeriodSeconds": 3,
      "intervalSeconds": 10,
      "portIndex": 0,
      "timeoutSeconds": 10,
      "maxConsecutiveFailures": 0
    }
  ]
}

In this file, you might want to edit the minimal resources as well as the initial number of coordinators and DBservers. You can scale these numbers up and down after the launch via the cluster user interface.

A few minutes later you should see a new service arangodb3 in your DC/OS dashboard and if you open that service, you will automatically be taken to the ArangoDB 3 cluster frontend.

Alternatively, you can use Marathon’s web UI, switch to JSON mode (upper right corner), and use the same JSON file.

Getting rid of the ArangoDB 3 cluster again

Shutting down the cluster is not yet as easy as we wished for, but we are working hard to improve this experience.

The easiest way is simply to destroy the DC/OS cluster as a whole.

However, you can keep your DC/OS cluster by doing the following:

1. Destroy the app in Marathon, either via the user interface or by doing

dcos marathon app remove arangodb3

2. Run our cleanup tool by doing

dcos marathon app add cleanup.json

using the file

{
  "id": "cleanup",
  "cpus": 1,
  "mem": 4196.0,
  "ports": [],
  "instances": 1,
  "args": [
    "/arangodb-cleanup-framework",
    "--name=arangodb3",
    "--master=zk://master.mesos:2181/mesos",
    "--zk=zk://master.mesos:2181/arangodb3",
    "--principal=arangodb3",
    "--role=arangodb3"
  ],
  "env": {},
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "m0ppers/arangodb-cleanup-framework",
      "forcePullImage": true,
      "network": "HOST"
    }
  }
}

This will properly clean out the persistent state in ZooKeeper, destroy all persistent volumes and unreserve all resources for arangodb3.

After a minute or so, simply destroy the cleanup app in Marathon or do

dcos marathon app remove cleanup

Have fun testing and playing around with ArangoDB 3.0 on DC/OS!

We’ll keep you posted via Twitter and our Slack channel as soon as ArangoDB 3.0 has completed the certification process.

Deploying an ArangoDB 3 Cluster with 2 Clicks


Hurray! Last week finally saw the release of ArangoDB 3.0 with lots of new features and in particular various improvements for ArangoDB clusters. In this blog post, I want to talk about one aspect of this, which is deployment.

DC/OS

As of last Wednesday, deploying an ArangoDB 3.0 cluster on DC/OS has become even simpler, because the new version of our framework scheduler has been accepted to the DC/OS Universe. Therefore, deployment is literally only two clicks:

Choose ArangoDB3 in the DC/OS Universe 

DCOSArangoDBInstall1

Hit “Install Package” and you are done!

DCOSArangoDBInstall2

More details about this process can be found in the manual. On the same page one can learn how to deploy ArangoDB on an Apache Mesos cluster using Marathon. This takes a tiny bit more effort, since DC/OS makes life for the user a lot simpler.

In both cases you get the additional benefit that permanently failing instances are automatically replaced, and that one can easily scale the ArangoDB cluster up and down via the web UI.

Local tests on your laptop

If you only want to try out an ArangoDB cluster on your laptop, we have you covered in the manual, too. We explain how to use a simple bash script from the source repository and describe in detail which processes, with which command line options, it fires up to create an ArangoDB cluster within seconds.

Although a number of processes are involved, version 3.0 of ArangoDB makes this exercise particularly simple, because all that needs to be done is to start a process for the “Agency”, which keeps the cluster configuration and metadata, and to point all other servers to the network endpoint of the Agency. For high availability, the single Agency process is replaced by three Agency processes tied together with our new implementation of the Raft consensus protocol.

More details can be found in this bash script, which does everything for you once a standard ArangoDB binary package is installed. Just run

./startArangoDBClusterLocal.sh 3 2 1

to start a cluster with 3 agents, 2 DBservers and 1 coordinator on your laptop, for example.

Manual deployment of ArangoDB clusters

Even if you do not want to use any orchestration framework and still want to deploy an ArangoDB 3.0 cluster on a bunch of machines using ssh, you can just follow the advice in the manual to start the necessary processes on multiple machines. The information in this chapter will also enable you to run an ArangoDB cluster on any orchestration framework you want to choose, even if we do not yet support it out of the box.

Docker

You are a Docker fan? Just read the manual to learn how all these nice things are achieved by just using the standard ArangoDB Docker image.

Conclusion

Aren’t these interesting times? Deploying a distributed database, which has traditionally been a huge headache, has never been easier and scaling it up and down is actually fun!

Learn About The Startup Accelerator Program


Blog Post Image

After giving the idea some thought we decided to launch a brand new ArangoDB service for startups. First, we need to explain how we arrived at this new service. As fellow startup founders we can only sympathize with ambitious and innovative new business founders and their teammates. The whole journey from getting ideas across to growing the number of customers is challenging as it is. Making sure that a product/market fit is found as fast as possible makes every day on the road to the finish line even more resource-draining. Oh, another mountain to climb. Correction: what an important milestone to achieve in business. Try getting high-value, long-term goals achieved while juggling brainstorming and testing ideas, talking to investors, managing daily operations, monthly encounters with the authorities and so much more.

Here is what others think about the Startup Accelerator Program:

“Awesome startup program! You can discuss anything with those guys and you’ll always get sophisticated answers. Thanks ArangoDB team for all your support!”

Benedikt Knobloch, Founder fedger.io

“Great support. With this program we saved 50% of development support.”

Florian Krause, Head of Development ABOUT YOU

 

If you work in a startup you know that developing your product fast and getting it to market fast is crucial. You also know that this is easier said than done, especially when you have limited resources. We understand and share your pain when it comes to bringing your product, the result of your hard labor, to market at a low cost. And there you have it: one sentence to explain why we started the Startup Accelerator Program.

What will startups get from the Startup Accelerator Program?

  • Discounted professional developer support
  • Reduced time to market & faster product development
  • Quicker response time to critical issues
  • More time to spend on other company efforts

Apply Now!

 

When we were designing the program, we set out several objectives for what it should bring our clients. Startups will get discounted professional developer support from our experts in the ArangoDB team. Professional support will reduce product development time which, in turn, will help get the product to market faster. The quicker response time to critical issues will also free up attention for other company efforts. Find out more about how other ArangoDB users leveraged the features of our multi-model database and how ArangoDB has been used in production.

Still not sure why we want to help startups? ArangoDB was a tiny startup as well just a few years ago. We didn’t have an established product. Just ideas and goals. We’ve been through what young companies struggle with – being in need of professional support but lacking resources. Think of the Startup Accelerator Program as helping your neighbor with math homework.

Release Candidate 2 of ArangoDB 3.1


We are glad to announce that the second release candidate (RC2) of ArangoDB 3.1 is publicly available. What makes this release particularly special to us is that it also includes an official release candidate of our new Enterprise Edition with a few extra add-ons up its sleeve. The upcoming ArangoDB 3.1 will be a significant release, building on the solid base of ArangoDB 3.0, which introduced our binary storage format VelocyPack, the ArangoDB Agency for a self-managing cluster architecture and the first persistent index based on Facebook’s RocksDB.

RC2 of ArangoDB 3.1 is available for download and evaluation: Community Edition & Enterprise Edition. The documentation for the ArangoDB 3.1 RC 2 can be found here.

Here is a short overview of the new features and changes.

General upgrades:

  • New Boost.Asio-based server infrastructure
  • Overhauled ArangoDB query optimizer
  • Improved internal abstraction for storage-engines as a preparation for MVCC and pluggable storage-engines
  • VelocyPack over HTTP: Use our binary storage format VelocyPack over HTTP
  • VelocyStream: Our new binary protocol

The Enterprise Edition features:

  • SmartGraphs: This is “the” big new feature of our Enterprise Edition. It enables you to shard really huge graphs in a cluster and achieve close to the same performance as on a single instance
  • Auditing: fail-safe logging of all operations to a central place
  • Encryption control: higher level of security thanks to increased control over SSL encryption

Cluster:

  • Parallel Intra-Cluster-Communication
  • HLC: The Hybrid Logical Clock is used for timestamps in revision strings which is part of the preparation for cluster-wide transactions
  • Auto-failover timeouts: you can now configure the timeouts for automatic failovers
  • Progress display when relocating shards in the administration frontend
  • Stand-Alone Agency: You can now use ArangoDB as a resilient, RAFT-based key/value store as an alternative to e.g. ZooKeeper or etcd. (You’ll surely ask yourself why we created it and we’ll answer this legitimate question with a blog post soon)

Graph Features:

  • Vertex-centric indexes for graph traversals in AQL: You can now create indexes on edge collections which combine the vertex with any suitable attribute
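For illustration, such a vertex-centric index combines the edge's `_from` attribute with a regular attribute. A hypothetical index definition, as it might be sent to the index-creation HTTP API (the `timestamp` attribute is just an example), could look like this:

```json
{
  "type": "hash",
  "fields": [ "_from", "timestamp" ]
}
```

A traversal filtering on `timestamp` can then pick the matching edges of a vertex directly from the index instead of scanning all edges of that vertex.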

WebUI:

  • New Graph Viewer: The new graph viewer got a serious upgrade to handle large amounts of nodes and edges. It also includes our first WebGL implementation. Your feedback is welcome!
  • AQL Editor: With the new Query Performance Profiler you can now get info on query performance – you can investigate which part of the execution took how long. As a final bonus you can choose between graph, JSON or tabular views of your query results

For the Java World:

We put a lot of effort into our new Java driver, which works only with ArangoDB 3.1 onwards. Our Java team completely refactored the driver, which is now up to 4x faster than the previous one. The new features include:

  • Multi document operations
  • Support for VelocyPack
  • Support for VelocyStream
  • Asynchronous request handling

We hope that you will enjoy the new features and updates that will soon arrive with the official ArangoDB 3.1 release. In the meantime, have fun evaluating the RC2 Community and Enterprise Editions of ArangoDB. We would really appreciate your feedback via our Slack Community channel or directly via our contact form. Your feedback keeps us going.

Have fun playing around with the RC2 of ArangoDB 3.1.

Updated Sync & Async Java Drivers with ArangoDB 3.1


The upcoming 3.1 release comes with a binary protocol – VelocyStream – to transport VelocyPack (the internal storage format of ArangoDB introduced with the 3.0 release) data between ArangoDB and client applications. VelocyPack stores a superset of JSON, is more compact and has fast attribute lookup. VelocyStream, in turn, allows sending VelocyPack in an optimized form over the network. We thought it was the right time to update and modernize our official Java driver and let it be the first to fully support VelocyStream.

VelocyStream is the new bi-directional async binary protocol of ArangoDB. It supports sending messages pipelined, multiplexed, uni-directional or bi-directional. The messages themselves are VelocyPack objects.

Alongside the HTTP protocol, ArangoDB now also speaks VelocyStream, carrying VelocyPack. We have decided that the new Java driver will exclusively use the new, more efficient binary protocol. The new version natively supports de-/serialization of Java objects to and from VelocyPack, as well as parsing of JSON documents into and out of VelocyPack. But this is not limited to client–server communication: as a developer you can work directly on the raw VelocyPack returned by ArangoDB via a lightweight API and profit from the performance and the data types supported by VelocyPack (like date, binary and different number types), which is much more comfortable than JSON.

To get a stable driver that takes full advantage of the new ArangoDB release, the updated version of our Java driver drops the HTTP protocol and goes with VelocyStream exclusively.

ArangoDB Java Driver 4.0:

In combination with all of the above and other improvements of ArangoDB 3.1, the new Java Driver (ArangoDB-Java-Driver 4.0) shows up to 4 times better performance in synchronous operations compared to the previous 3.0 version of the driver.

Performance Java-Driver – synchronous read – single Server:

velocystream-java-driver

We also have a detailed 10 min Java tutorial for the updated driver, explaining how to perform operations in ArangoDB.

Java Driver with asynchronous support:

ArangoDB 3.1 and VelocyStream enable the possibility of asynchronous communication. By using Java 8 it becomes possible to get the right API for asynchronous computation – CompletableFuture.

Keeping that in mind, we have decided to also offer a second driver which supports asynchronous calls showing an even better performance – ArangoDB-Java-Driver-Async 4.0. Here we went with Java 8 and built an additional very strong async driver powered by the CompletableFuture.

What about Multi-document operations?

What can be better than reducing the time of a request? Reducing the number of requests needed, of course.

The two new drivers now support multi-document operations for insert/delete/update/replace. So instead of batching thousands of documents into ArangoDB with just as many requests, you can now insert thousands of documents together with only one request, which dramatically reduces network traffic and allows for better performance.

So here we go, here are the two updated versions for the Java crowd of our community:

ArangoDB-Java-Driver 4.0
ArangoDB-Java-Driver-Async 4.0

Try them out with the RC2 of ArangoDB 3.1 and let us know what you think.

Webinar: ArangoDB and DC/OS Graph, Documents in a scalable Distributed Data-Store


Wednesday, October 26th

DC/OS provides ArangoDB with exactly the infrastructure it needs for implementing a modern distributed stateful service. Join this upcoming webinar to learn how DC/OS quickly and easily deploys ArangoDB to provide scaling and fault tolerance with automatic replacement of failed components. While DC/OS supplies the management of resources and hence allows multiple services to share a common infrastructure, ArangoDB provides a modern persistence layer with its multi-model, fault-tolerant datastore.

dcos-logo

In this webinar, we will discuss:

  • The trend from distributed, stateful applications to fault tolerant architecture
  • Advantages of a multi-model database
  • Scalable Data Storage
  • Distributed Graphs
  • Synchronous Replication

The webinar will include a live demo of ArangoDB cluster deployment, showing some queries and mixed data models of the database, followed by a presentation of fault tolerance in action.

Wednesday, October 26th at 9am PST/12pm EST/ 4pm GMT:

Join the webinar


ArangoDB Spark Connector


Currently we are diving deeper into the Apache Spark world. We started with an implementation of a Spark connector written in Scala. The connector supports loading data from ArangoDB into Spark and vice versa. Today we release a first prototype, with the aim of including our community in the development process early to build a product that fits your needs. Your feedback is more than welcome!

Usage

Setup SparkContext

First you need to initialize a SparkContext with the configuration for the Spark connector and the underlying Java driver (see the corresponding blog post here) to connect to your ArangoDB server.

Scala

val conf = new SparkConf()
    .set("arangodb.host", "127.0.0.1")
    .set("arangodb.port", "8529")
    .set("arangodb.user", "myUser")
    .set("arangodb.password", "myPassword")
    ...
val sc = new SparkContext(conf)

Java

SparkConf conf = new SparkConf()
    .set("arangodb.host", "127.0.0.1")
    .set("arangodb.port", "8529")
    .set("arangodb.user", "myUser")
    .set("arangodb.password", "myPassword");
    ...
JavaSparkContext sc = new JavaSparkContext(conf);

Load data from ArangoDB

To load data from ArangoDB, use the function load – from the object ArangoSpark – with the SparkContext, the name of your collection and the type of the bean to load the data into. If needed, there is an additional load function with extra read options like the name of the database.

Scala

val rdd = ArangoSpark.load[MyBean](sc, "myCollection")

Java

ArangoJavaRDD rdd = ArangoSpark.load(sc, "myCollection", MyBean.class);
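The load and save snippets above assume a bean class `MyBean`. A minimal sketch of such a bean (the fields are hypothetical; the connector simply maps bean properties to document attributes) could look like this:

```java
// Hypothetical bean for the load/save examples; its properties are
// mapped to the attributes of the documents in "myCollection".
class MyBean {

    private String name;
    private int value;

    // no-args constructor, needed for deserialization
    MyBean() {
    }

    MyBean(final String name, final int value) {
        this.name = name;
        this.value = value;
    }

    String getName() {
        return name;
    }

    int getValue() {
        return value;
    }
}
```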

Save data to ArangoDB

To save data to ArangoDB, use the function save – from the object ArangoSpark – with the SparkContext and the name of your collection. If needed, there is an additional save function with extra write options like the name of the database.

Scala / Java

ArangoSpark.save(rdd, "myCollection")

It would be great if you tried it out and gave us feedback on what you think.

ArangoDB 3.1 Enterprise Edition – A Solid Ground to Scale with Graphs


In addition to our community version of ArangoDB 3.1, we are excited to release our first Enterprise Edition today. The Enterprise Edition of ArangoDB focuses on enterprise-scale problems and provides useful features to meet the requirements of enterprise customers. You can download a free evaluation-only version here: Download Enterprise Edition. The ArangoDB Enterprise Edition also comes with an Enterprise subscription, including a comprehensive support SLA.

This first ArangoDB Enterprise Edition includes three major features:

  • SmartGraphs: Scale with graphs to a cluster and stay performant. With SmartGraphs you can use the “smartness” of your application layer to shard your graph efficiently to your machines and let traversals run locally
  • Encryption Control: Choose your level of SSL encryption
  • Auditing: Keep a detailed log of all the important things that happened in ArangoDB


Everywhere we see the rise of large datasets which easily grow beyond one machine, and this includes graph data. With the ArangoDB Enterprise Edition we introduce SmartGraphs to provide a performant solution for the many cases where you have to traverse a graph sharded across a cluster. SmartGraphs aim to solve the problem of network hops that occur when traversing a sharded graph; these hops are very expensive. With ArangoDB SmartGraphs, sharding and traversal execution become highly efficient, and traversal execution times close to those of a single instance can be achieved. If you would like to read more about SmartGraphs, please find a detailed description here.

Besides our new SmartGraphs feature, we’ve also put deeper thought into security. Building on our existing features, you can enjoy enhanced encryption and auditing as an Enterprise Edition user. Please find a brief overview of those features here.

If you want to give ArangoDB Enterprise a spin you can download an evaluation version here.

We are keen to learn more about your project and support it with all our experience – get in touch, we are happy to help.

Have fun giving ArangoDB Enterprise a spin; your feedback is highly appreciated.

ArangoDB 3.1 – A Solid Ground to Scale part II


It’s not been long since we released ArangoDB 3.0, in which we introduced our binary storage format VelocyPack, the ArangoDB Agency for a self-managing cluster and the first persistent index based on Facebook’s RocksDB. With all that we laid a solid foundation to scale with all three data models.

With today’s ArangoDB 3.1 release we take things a few steps further and make cluster usage of ArangoDB more performant and convenient. Get ArangoDB 3.1.

General upgrades in 3.1

  • Performance boost with our new Boost.Asio-based server infrastructure
  • Performance boost from the overhauled ArangoDB query optimizer
  • Improved internal abstraction for storage-engines as a preparation for MVCC and pluggable storage-engines
  • VelocyPack over HTTP: Use our binary storage format VelocyPack over HTTP
  • VelocyStream: for high performance needs you can now directly stream VelocyPack. This is already implemented in our Java driver (all other drivers maintained by ArangoDB will follow soon).

Cluster

  • Parallel Intra-Cluster-Communication
  • HLC: The Hybrid Logical Clock is used for timestamps in revision strings which is part of the preparation for cluster-wide transactions
  • Auto-failover timeouts: you can now configure the timeouts for automatic failovers
  • Progress Display when relocating shards
  • Stand-Alone Agency: You can now use ArangoDB as a resilient, RAFT-based key/value store as an alternative to e.g. ZooKeeper or etcd. (You’ll surely ask yourself why we created it and we’ll answer this legitimate question in a blog post soon).
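To give an idea of the concept behind the Hybrid Logical Clock mentioned above: an HLC combines physical wall-clock time with a logical counter, so that generated timestamps stay strictly increasing even when the wall clock stalls or jumps backwards. The following is a deliberately simplified sketch of that idea (not ArangoDB's actual implementation, and it ignores the merging of timestamps received from other servers):

```java
// Simplified hybrid logical clock sketch for local events only;
// real HLC implementations also merge timestamps received from other nodes.
class HybridLogicalClock {

    private long physical = 0; // highest wall-clock time seen so far (ms)
    private long logical = 0;  // counter ordering events within the same ms

    // Produce a monotonically increasing (physical, logical) timestamp.
    synchronized long[] now(final long wallClockMillis) {
        if (wallClockMillis > physical) {
            physical = wallClockMillis; // wall clock advanced: reset counter
            logical = 0;
        } else {
            logical++; // same or older wall clock: bump the counter
        }
        return new long[] { physical, logical };
    }
}
```

Two events within the same millisecond get the timestamps (t, 0) and (t, 1), so their order remains well defined, and a backwards-jumping wall clock cannot produce a smaller timestamp.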

Graph Features

  • Vertex-centric indexes for graphs in AQL: You can now create indexes on edge collections which combine the vertex with an attribute.
  • SmartGraphs: This is the big new feature of our Enterprise Edition and enables you to shard really huge graphs across a cluster and achieve close to the same performance as on a single instance. Read more about SmartGraphs and our Enterprise Edition.

For the Java World

We put a lot of effort into our new Java driver, which works only with ArangoDB 3.1 onwards. Our Java team completely refactored the driver, which is now up to 4x faster than the previous one. The new features include:

  • Multi-document operations
  • Uses VelocyPack
  • VelocyStream-ready
  • Asynchronous request handling

Read more about the features in the corresponding blog post. You can download the new Java drivers here: ArangoDB-Java-Driver 4.1.0 & ArangoDB-Java-Driver-Async 4.1.0. We also included a new detailed Java drivers documentation.

Web UI

  • New Graph Viewer: The previous solution was not suitable for large graph visualizations. With extended Canvas support, the Graph Viewer is now feature-rich and can handle large graph visualizations. As a second engine we added a first WebGL implementation. Feedback on our new Graph Viewer is highly appreciated.
  • AQL Editor: We invested a lot into usability and, for example, made it easier to investigate performance issues in your queries. With the Query Performance Profiler you can now see which part of the execution took how long. You can also choose between JSON, tabular and graph output for your results.

We hope that we included many useful features for you in ArangoDB 3.1. We appreciate your feedback about the new release a lot. If you're missing something, find a bug, or want to talk about an idea with us, feel free to get in touch via our Slack community channel or our contact form.

Have fun playing around with ArangoDB 3.1!

Learn ArangoDB while contributing

We are fortunate to live in an open-source world with a fairly large international community of users and contributors, which has only kept growing in the past year. (Big thanks for that, by the way 😉 ) Since we have recently received quite a few requests on how one can contribute to ArangoDB in an easy and quick way, we have decided that the time has come to get closer to our community and get even more involved.

A lot of great ideas came up during a chat with some of our long-term users on how we could improve on that front. The most straightforward one is asking the community for help and support on GitHub. So we took a bit of time to go through open issues and selected a few, now tagged 'Help Wanted'. Here is a selection of easy tasks that will get you started contributing to ArangoDB with code, or with ideas and concepts for features. These are only a few to start with, but if you want to get involved we would appreciate your helping hand!

[Screenshot: GitHub issues tagged 'Help Wanted']

If you have ideas for any other issue where you can help out, feel free to ping us – we are always happy to hear from you.

We have several other possibilities for how you can get involved in the ArangoDB world. If you have an interesting recipe idea in mind, here is how to add it to our Cookbook. Need a driver? You can contribute to existing community-run drivers or build your own. The most recent additions are the ArangoDB Elixir drivers: xarango by Michel Benevento (@beno) and arangoex by Austin Morris (@austinsmorris). You will find more options on how to get involved on our community page.

Please note: As we are a company registered in Germany (and everything has to be in “Ordnung”), we kindly ask you to sign a CLA if you do pull requests to the official ArangoDB repository. You can download it here, send it to cla[@]arangodb[.]com or fax it to +49-221-2722999-88

Happy contributing!

How to model customer surveys in a graph database

Use-Case

The graph database use-case we are stepping through in this post is the following: In our web application we have several places where a user is led through a survey, where she decides on details for one of our products. Some of the options within the survey depend on previous decisions and some are independent.

Examples:

  • Configure a new car
  • Configure a new laptop
  • Book extras with your flight (meal, reserve seat etc.)
  • Configure a new complete kitchen
  • Collect customer feedback via logic-jump surveys


We would like to offer a generic page that can be seeded with any decision tree and simply plays the Q&A game with the user. We also want to be able to easily create new decision trees and modify existing ones in case the products change.

Graph Database Data Model

When speaking of trees, a graph database immediately comes to mind as a good place to store the data: after all, every tree is a graph. The data model in this case is actually pretty simple. First we have the questions we would like to ask the user. These questions are modeled as vertices with an attribute query containing the string that should be displayed to the user. In a localized application we would not have a single query attribute but one for every language we support (e.g. en, de, jp); for simplicity we assume here that our shop is only available in English.

Attached to each question we have a list of possible answers, each leading the user to the next question or to the end of the survey. These answers are best realized as the edges connecting everything together. Each answer again has a text attribute that is presented to the user. Finally we have the products that best fit a user after completing the survey.
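To make the model concrete, here is a sketch of what the documents could look like, written as plain JavaScript objects (the collection names questions, answers and products follow the model above; the keys and values are made up for illustration):

```javascript
// Vertex in "questions": carries the text shown to the user.
const question = {
  _key: "ripeness",
  query: "When do you want to eat your avocado?"
};

// Vertex in "products": an end point of the survey. Note it has no
// "query" attribute, which is how the queries below detect the end.
const product = {
  _key: "ready-to-eat",
  name: "Ripe Hass avocado"
};

// Edge in "answers": connects a question to the next question or a product.
// _from and _to use ArangoDB's "collection/key" document id format.
const answer = {
  _from: "questions/" + question._key,
  _to: "products/" + product._key,
  text: "Today"
};

console.log(answer._from); // "questions/ripeness"
console.log(product.query == null); // true, i.e. this vertex is final
```

Only question vertices carry a query attribute; a missing query on the target vertex is the signal that the survey is over.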

Let’s now visualize a simple decision tree. We are in an online Avocado Shop and want to give our customer the best Avocado:

[Diagram: the avocado shop's decision tree modeled as a graph]

Tips and Tricks

If we set up the surveys using a graph, all vertices in the graph can be reused in several surveys.

This is especially important if you want to offer the same product in completely different surveys: you do not have to store the product twice, which makes it easier to update. It also matters if you have overlapping surveys, say two surveys with different starting points where, based on some decisions, the user ends up in the same dialog in both and can only reach identical products through identical questions.

An example for this is selling and leasing cars to a user. At first you have two different sub-dialogs, one asking the user for selling conditions and one for leasing conditions, but at some point in both dialogs the user finally has to decide on the car. This car selection process is identical for both surveys.
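The reuse can be illustrated with a tiny in-memory stand-in for the survey graph (plain JavaScript, not the database itself; all names here are made up): two entry questions converge on one shared car-selection dialog.

```javascript
// Adjacency map: question id -> list of answers, each pointing at the next node.
// "buyStart" and "leaseStart" are the two survey entry points; "pickCar" is
// the shared dialog both of them lead into.
const edges = {
  buyStart:   [{ text: "Financing sorted", next: "pickCar" }],
  leaseStart: [{ text: "Lease terms accepted", next: "pickCar" }],
  pickCar:    [{ text: "The red one", next: "product:redCar" }]
};

// Follow a path of answer indices from a starting question to its end node.
function walk(start, choices) {
  let node = start;
  for (const i of choices) {
    node = edges[node][i].next;
  }
  return node;
}

// Both surveys reach the identical product through the shared "pickCar" node.
console.log(walk("buyStart", [0, 0]));   // "product:redCar"
console.log(walk("leaseStart", [0, 0])); // "product:redCar"
```

Because both surveys route through the shared pickCar vertex, updating that dialog or its products automatically affects both of them.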

UI

Sorry, I am no UX designer 😉 In general I would require only two API endpoints:

  • showQuestion/:id Which enters the survey at any point and returns the question with its set of possible answers (the number of answers at this point is unknown, and my UI widget is able to display an arbitrary number of option boxes).
  • showProduct/:id Which shows the product because the user has reached the end of her survey.

If I want to hold some user state, the UI part is responsible for it: it can, for instance, maintain a list of questions the user has visited and, if the user presses "back", simply resend showQuestion/:id with the second-to-last question.
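That client-side state can be sketched as follows (a hypothetical helper, not part of ArangoDB or Foxx): a simple stack of visited question ids is enough to implement the "back" button.

```javascript
// Keeps the trail of visited questions so "back" knows where to return to.
class SurveyHistory {
  constructor() {
    this.stack = [];
  }
  // Called after each successful showQuestion/:id request.
  visit(id) {
    this.stack.push(id);
  }
  // Drops the current question and returns the id to re-request,
  // or undefined if the user is back at the start.
  back() {
    this.stack.pop();
    return this.stack[this.stack.length - 1];
  }
}

const history = new SurveyHistory();
history.visit("start");
history.visit("ripeness");
console.log(history.back()); // "start", so the UI resends showQuestion/start
```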

Queries

In the first endpoint we need to return a question and all its possible answers. Using the data model from above, this means we have to select one question by its _key together with all its outgoing edges.

In ArangoDB the query to achieve this is the following (including comments):

```
FOR question IN questions
  // Select the question. No worries, we use the primary index, O(1), no looping here ;)
  FILTER question._key == @id
  // Start a subquery that joins all answers into a single array
  LET answers = (
    // Select all outbound edges of the question
    FOR next, answer IN OUTBOUND question answers
      // Project only the text and the id of the next object.
      // With localisation, this is the place to select the correct translation.
      // If "next" has no "query" attribute, we have reached the end of the survey.
      RETURN {nextId: next._key, text: answer.text, isFinal: next.query == null}
  )
  // Finally return the question and the list of possible answers
  RETURN {
    query: question.query,
    answers: answers
  }
```
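For illustration, a result row of this query could look as follows (the values are made up); the isFinal flag tells the UI whether to call showQuestion or showProduct next:

```javascript
// Shape of one row returned by the question query above (made-up values):
const row = {
  query: "When do you want to eat your avocado?",
  answers: [
    { nextId: "ripeness", text: "In a few days", isFinal: false },
    { nextId: "ready-to-eat", text: "Right now", isFinal: true }
  ]
};

// The UI picks the follow-up endpoint based on isFinal:
const endpoints = row.answers.map(
  (a) => (a.isFinal ? "showProduct/" : "showQuestion/") + a.nextId
);
console.log(endpoints); // ["showQuestion/ripeness", "showProduct/ready-to-eat"]
```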

For the other endpoint we only have to display the product by its id.
This can be done via the following even simpler query:

```
FOR product IN products
  // Select the product. No worries, we use the primary index, O(1), no looping here ;)
  FILTER product._key == @id
  RETURN product
```

Creating an API out of it

ArangoDB also offers a nice little framework called Foxx. With it, both queries can be wrapped into API endpoints by adding only a couple of lines of JavaScript.
In this example we use the shorthand notations offered by ES6.

```
// Require the database object and set up a Foxx router:
const { db } = require('@arangodb');
const createRouter = require('@arangodb/foxx/router');
const router = createRouter();
module.context.use(router);

// Our two queries from above:
const questionQuery = `
FOR question IN questions
FILTER question._key == @id
LET answers = (
  FOR next, answer IN OUTBOUND question answers
    RETURN {nextId: next._key, text: answer.text, isFinal: next.query == null}
  )
  RETURN {
    query: question.query,
    answers: answers
  }`;


const productQuery = `
FOR product IN products
  FILTER product._key == @id
  RETURN product
`;


// Now define the routes:
router.get("/showQuestion/:id", (req, res) => {
  let cursor = db._query(questionQuery, {id: req.pathParams.id});
  res.json(cursor.toArray());
});


router.get("/showProduct/:id", (req, res) => {
  let cursor = db._query(productQuery, {id: req.pathParams.id});
  res.json(cursor.toArray());
});
```

And there we go.

See it in action

Luckily, Foxx can also serve static files, which can contain HTML and JS code. This feature is useful if you want to ship a small administration frontend with your Foxx app.

In this post I will misuse it to ship my entire survey-demo widget with it. You can simply install it from github using the following repository: https://github.com/arangodb-foxx/survey-generator

Have fun trying it out, or even enhance it for your product if you like it. It is all Apache 2 licensed (and so is ArangoDB).
