
ArangoDB Closes 2.2 Million Euro Investment Led by Target Partners


Funding to accelerate the company’s product development and international expansion

Munich/Cologne (Germany), November 24, 2016 – ArangoDB GmbH (www.arangodb.com), the company behind one of the fastest-growing next-generation databases, has landed a 2.2 million euro investment led by Munich-based venture capital firm Target Partners (www.targetpartners.de). The company develops the open-source NoSQL database ArangoDB, which combines three major data models (graph, key/value, JSON documents) in one database with one query language. ArangoDB allows startups and enterprises alike to speed up their innovation cycles, simplify their technology stacks and increase on-time and on-budget delivery of software projects.

Claudius Weinberger, co-founder and CEO, says: “The previous funding round allowed us to build a rock-solid product, and with this additional investment we can further accelerate our growth and expand internationally.”


Kurt Müller, Partner at Target Partners, says: “Modern applications increasingly need high-performance access to a variety of different types of data. ArangoDB’s innovative multi-model approach uniquely combines data flexibility with high performance. The company’s rapid growth is a testament to the large and growing need for such a product.”

ArangoDB has an enthusiastic global community with over 1.1 million downloads so far in 2016 and 2,800 GitHub stargazers. The product is currently used by over 3,000 organizations worldwide, including Fortune 50 companies, for use cases including e-commerce, IoT, analytics, fraud detection, personalization and genomic research. With its newly released Enterprise Edition the company enables enterprise-scale deployments, including high-performance graph cases.

AboutYou, the fast-growing e-commerce startup of the Otto Group, as well as the US-based companies FlightStats and Liaison Technologies, are just some of the successful product deployments. Research organizations like the Karlsruhe Institute of Technology (KIT) are also leveraging ArangoDB.

Weinberger and Frank Celler (co-founder and CTO) have been working on the database since 2012 and founded ArangoDB GmbH in May 2014. They met during their time at OnVista AG, where Weinberger was Head of Product Development and Celler, a PhD in mathematics, led the data team as Head of Database. Weinberger studied economics with a major in economic informatics. In 2004 the two founded their first company, triAGENS, a database consulting firm, where they worked on NoSQL solutions. In February 2015 ArangoDB received its first funding round of 1.85 million euro, led by Machao Holdings AG and triAGENS.

About ArangoDB GmbH
ArangoDB GmbH develops the NoSQL database ArangoDB. The team has over 15 years of experience designing and developing database solutions for the likes of the New York Stock Exchange/Euronext, the German Postal Service, DHL and several international banks. Find out more at www.arangodb.com or on Twitter: twitter.com/arangodb

About Target Partners
With €300 million under management, Target Partners is one of the leading early-stage venture capital firms in Germany. Target Partners invests in start-up and early-stage companies and supports them with venture capital during their build-out and expansion phases. With many years of experience as managers, entrepreneurs and venture capitalists, the team at Target Partners supports entrepreneurs in developing and marketing products and services, building organizations, raising money and taking companies public in Europe and the United States.
For more details: www.targetpartners.de, follow us on Twitter: twitter.com/targetpartners or Facebook: facebook.com/targetpartners


Introducing ArangoDB snapcraft.io Packages


ArangoDB Packaging

With ArangoDB 3.0 we reworked the build process to be based completely on CMake. Packaging was partly done using CPack (Windows, Mac); for the rest, regular packaging scripts on the SUSE OBS were used. With ArangoDB 3.1 we reworked all packaging to be included in the ArangoDB source code and to use CPack. Users can now easily build their own packages from source, just as we do with Jenkins. Community member Artur Janke (@servusoft) contributed the new Ubuntu snap packaging, assisted by Michael Hall (@mhall119). Big thanks for that!

Download the packages for Snap / Ubuntu Core 16.04

What are snaps?

Snaps are a new way of packaging and deploying software on Ubuntu Core 16 and Linux in general. As one would expect from a Linux software distribution channel, snaps are both simple to install and quick to update.

Simply put, snaps are like Docker containers without Docker. More precisely, they share some concepts and achieve similar results, but differ in other aspects:

  • Both use a system agent to administer the deployed stack
  • Both use layered filesystems, so packages can derive from each other
  • Both offer automated distribution channels
  • Both offer working deployments on several Linux distributions
  • While Docker uses kernel namespaces to isolate processes into lightweight containers, snap uses the Linux kernel's AppArmor framework to manage access permissions on the system
  • Docker uses virtual network interfaces and in-kernel routing to provide your app with network connectivity; snap manages access to the host system's network resources via AppArmor
  • You can also use snaps inside Docker containers
  • You could install Docker on a host via snap

By providing ArangoDB as a snap, you get the ease of installation and updates that you would get from your distro's own archives, while still getting the very latest versions directly from us. And because snaps are fully self-contained, you don't need to worry about them breaking other software on your system, or about other software interfering with your ArangoDB install.

Installing the ArangoDB snap

If you're running Ubuntu 16.04 or later, you already have the ability to install snap packages. For other distros, follow the installation instructions provided for your distro. Installing ArangoDB then becomes a simple sudo snap install arangodb3. If this is the first snap you've installed, you will see it also download the ubuntu-core package; this is the common runtime that all snap packages use, and you only need to install it once.

After installing the arangodb3 snap, the service will be running and ready for you to use! You can verify the service with systemctl status snap.arangodb3.arangod. You will also have all of the ArangoDB command-line tools available under /snap/bin/, including arangodump and arangosh, for managing your database.
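Putting the steps above together, a minimal install-and-verify session might look like this (the service and tool names are taken from the text above; tcp://127.0.0.1:8529 is ArangoDB's default endpoint):

```shell
# Install the ArangoDB snap (the ubuntu-core runtime is pulled in automatically
# the first time you install any snap)
sudo snap install arangodb3

# Verify that the database service is up and running
systemctl status snap.arangodb3.arangod

# The command-line tools live under /snap/bin
/snap/bin/arangosh --server.endpoint tcp://127.0.0.1:8529
```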

The ArangoDB snap package stores your data in one of two locations. The database files themselves are written to /var/snap/arangodb3/common/, including any Foxx services you install on your instance. Your log files and other metadata are in /var/snap/arangodb3/current/.

Your arangodb3 snap will receive updates as often as we publish new, stable releases; there's nothing you need to do to stay up to date. If at any time you want to remove the snap, just run sudo snap remove arangodb3, but be aware that this will delete your data files too, so be sure you've backed them up if you want to keep them!
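Since removal deletes the data under /var/snap/arangodb3/, taking a dump with the bundled arangodump first is good insurance. A sketch (the backup path is my own choice):

```shell
# Dump the database to a backup directory before removing the snap
BACKUP_DIR=/tmp/arangodb3-backup
/snap/bin/arangodump --server.endpoint tcp://127.0.0.1:8529 \
    --output-directory "$BACKUP_DIR"

# Only once the dump has succeeded, remove the snap (this deletes the data files!)
sudo snap remove arangodb3
```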

If you've installed the snap on your local machine, you can now reach the ArangoDB web interface via https://127.0.0.1:8529 or https://<your server's IP>:8529:

(screenshot: the ArangoDB web interface)

You may download one of our example graph datasets and explore it with our new graph viewer:

(screenshot: an AQL result displayed as a graph)

More resources

Download the packages for Snap / Ubuntu Core 16.04.
Get started with ArangoDB by learning about the basic operations in the database: go through our 10 min CRUD tutorial (Create, Read, Update, Delete) or complete one of the tutorials for drivers in your language of choice.

ArangoDB #FoxxChallenge


The Challenge

Starting today we launch the ArangoDB #FoxxChallenge, and the winner will receive a brand new Amazon Echo.

Use your knowledge about everyday needs in projects and create a Foxx service that could be helpful for others. If you need some inspiration, here are some ideas:

  • Enable role-based authorisation with Foxx: such a service might include users and user groups who have certain rights. The authentication via password or OAuth could be done separately to increase reusability for others
  • A Foxx service for document-based permissions: such a service could use an edge collection that connects users to certain permissions to manipulate the data
  • Connected services: one could also build both services and, by connecting them, allow e.g. user groups to manipulate data (similar to Django)
  • A webhook service: a simple but useful service that throws documents into a certain collection
  • A voting service demo: like the Hacker News voting logic. One could also create a service which offers a JS API with which an edge collection can be generated to connect users to content (e.g. likes)
  • We are really excited to see what other ideas you come up with!

For the winner

First prize is a brand new Amazon Echo. But all participants will get a little something from the team! 🙂

How is the winner picked?

The winner will be picked by our Foxx team. We will take into account the usefulness of the Foxx service as well as the quality of code and service description.

How to participate?

After creating your service and uploading it to your GitHub repository, you can make a pull request to the official ArangoDB Foxx repository.

Please don’t forget to fill in the quick form on our #FoxxChallenge page to enter the competition.

Useful tips

  • Please note that it is always a good idea to use module.context.collection so that collection names do not interfere with other services in the database
  • To make services more testable, we recommend extracting the data logic from routes into separate modules, allowing that code to be tested directly rather than through the routes

I’m new to Foxx but I want to participate

Great! Please find a first overview about Foxx here: https://www.arangodb.com/why-arangodb/foxx/ and a detailed description on how to work with Foxx here: https://docs.arangodb.com/3.1/Manual/Foxx/GettingStarted.html

The #FoxxChallenge starts today and will run until January 1st, 2017. The winner will be notified in January 2017.

If you have any questions join our #Foxx Channel on ArangoDB Community Slack: https://slack.arangodb.com/

Good luck and happy hacking!

Starting an ArangoDB cluster the easy way


Recently, we have received a lot of feedback that standing up an ArangoDB cluster “manually” is an awkward and error-prone affair. We have been aware of this for some time, but always expected that most users running ArangoDB clusters would do so on Apache Mesos or DC/OS, where deployment is a breeze thanks to our ArangoDB framework.

However, for various valid reasons people do not want to use Apache Mesos and thus are back to square one with the problem of deploying an ArangoDB cluster without Apache Mesos.

So we have listened, looked at what other distributed databases offer, and put together a tool called arangodb (as opposed to arangod) to help you. It essentially gives you the following experience:

You install ArangoDB in the usual way as a binary package. Then, on host A, simply do (in an empty directory):

arangodb

This will use port 4000 to wait for colleagues (3 are needed for a resilient agency). On host B (which can be the same as A), you do:

arangodb --join A

This will contact A on port 4000 and register. On host C (which can be the same as A or B), do:

arangodb --join A

This will contact A on port 4000 and register.

From the moment 3 have joined, each will fire up an agent, a coordinator and a DBserver, and the cluster is up. Ports are shown on the console.

Additional servers can be added in the same way.

If two or more of the arangodb instances run on the same machine, you have to use the --dataDir option to let each use a different directory.
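For a quick local test, the whole procedure above can be sketched on a single machine; since all three instances share one host, each gets its own --dataDir (the directory names are my own choice):

```shell
# Start three arangodb starters on one machine, each in its own data directory
mkdir -p node1 node2 node3

arangodb --dataDir ./node1 &
arangodb --dataDir ./node2 --join 127.0.0.1 &
arangodb --dataDir ./node3 --join 127.0.0.1 &

# As soon as all three have registered on port 4000, each fires up an agent,
# a coordinator and a DBserver; the assigned ports are printed on the console
```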

The arangodb program will find the ArangoDB executable and the other installation files automatically.

So far the tool is experimental, but it works under Linux, macOS and Windows and even allows you to mix different operating systems. You should use ArangoDB 3.1.4 or higher with it. The code can be found in this GitHub repository; the README.md contains building, installation and usage instructions.

Please give it a try; we are curious about your opinion and looking forward to your feedback, be it positive or negative. You can let us know what you think by opening an issue in the relevant GitHub repository.

If people like this, we might bundle it with the distribution and maybe even make it the default way to launch ArangoDB manually.

ArangoDB 2016 – A Year in Review


Important Steps this Year

2016 is about to see its final days and things are calming down, so Frank and I thought about the year that lies behind us. It was a really exciting year for the whole ArangoDB project and for us as founders. In 2016 our team doubled in size, the ArangoDB 3 series was launched and we became part of the Target Partners family. Many other great things happened this year, and with this post we want to take the chance to say “thank you” to all our supporters.

For the whole team it was, and is, super motivating to see that practically the same growth we experienced team-wise also happened to the ArangoDB community. Exceeding the 3,000-stargazer landmark right before Christmas was indeed a nice present, but it also reminds us that more and more people rely on what we create. Without the support of so many people on Slack, GitHub, Stack Overflow and other channels sharing their feedback, helping others or trying new things based on ArangoDB, the project would not be where it is today. Period. So: THANK YOU so much for all your ideas and detailed feedback on new releases!

Technical perspective

From a technical perspective, ArangoDB also made some huge steps forward this year. With a larger team we could finally tackle bigger items on our list. The resilient cluster management including our Raft-based agency, the implementation of our binary storage format VelocyPack, and the release of the first Enterprise Edition including ArangoDB SmartGraphs are important steps towards our vision of simplified data work.

Nonetheless, there is still a lot to accomplish in 2017, including upgrades like:

  • a pluggable storage engine
  • adjustable consistency
  • Satellite Collections
  • and much more

We are excited to hear more about your ideas that you can either describe by filling in our community survey or opening an issue on GitHub. We are looking forward to taking some ideas for a spin.

2016 from a business perspective

From a business perspective, 2016 went pretty well for ArangoDB. We nearly doubled our customer base over the year and even won five Forbes 100 companies for the multi-model idea. We hope we can share more in detailed case studies soon and show the great work of other teams around the globe. We also noticed that since our 3.0 release, which included the overhauled cluster management, the number of business-critical use cases has grown as well, and we are proud that teams have built trust in ArangoDB and its capabilities.

This positive news has been recognized by investors, too. Just a few weeks ago we announced the next funding round for ArangoDB and that we became a proud member of the Target Partners family. With Target Partners we won Germany's leading tech investor for our multi-model vision, and with their partner Kurt Müller a seasoned, smart and great mind to guide us through the next steps. Thank you, Kurt, and thank you, Target Partners team, for your trust, vision and guiding thoughts!

A big warm thanks to all our partners who contributed to the project’s development and growth! Especially our friends at Mesosphere and DC/OS for supporting us every step of the way.

Team growth

I already mentioned in the first lines of this post that none of this would be possible without a great team and even better teamwork. So this post is also about thanking the team members who have built ArangoDB from the first line of code onwards. We want to take the chance to thank our existing team for their deep and early trust in the multi-model idea and to welcome our new teammates.

Here are some of us:
(photo: some of the ArangoDB team, 2016)

Growing as a company in a highly competitive market also requires great minds on the business side of things. We are more than happy that we won four great minds for our advisory board, contributing their decades of experience and knowledge to ArangoDB.

Special “Thank You!” to our whole community

Without you guys, ArangoDB would not be possible. All the feedback, good or constructive, guides us in the right direction, lets us learn what we didn't think of and leads us to new ideas. I will not try to list everyone who supported us throughout the year, as I would end up writing a 100-page essay 🙂. But we are eternally grateful for every single contribution! All of you did remarkable things for the whole project and the community.

To mention just a few:

  • New Elixir, Ruby, Go, Swift, .NET etc. drivers developed by the community, and many useful integrations and extensions for ArangoDB, like Spring Data, Flask, a Kubernetes configuration generator and more
  • Of course, all the valuable maintainers and developers of already existing drivers
  • ArangoDB snap packages for Ubuntu
  • The ArangoDB port to ARM & ARM64
  • Continuous support of the Gentoo Portage overlay for ArangoDB
  • Packages to build ArangoDB on Arch Linux
  • arangochair, making ArangoDB realtime-push ready
  • A Django database backend for ArangoDB
  • Interesting and insightful community talks, presentations, revealing scientific papers, journals, blogs, tweets, GitHub stars, likes and everything that simply talked about ArangoDB
  • Every one of our dear community members helping to answer questions on Slack, resolve issues on GitHub and support us online

I could go on and on about all the wonderful, clever things the community did for us this year, but as mentioned before, the list would be endless. As founders, Frank and I are so grateful for all your support, input, thoughts and trust in what we are all building here together. We are already excited about the next steps and hope to see you, and many new faces, creating brilliant stuff based on ArangoDB in 2017.

Thank you again for a wonderful 2016! And a Happy New Year!

Reaching and harnessing consensus with ArangoDB

nihil novi nisi commune consensu
nothing new unless by the common consensus

– law of the Polish-Lithuanian Commonwealth, 1505

A warning beforehand: this is a rather long post, but hang in there; it might save you a lot of time one day.

Introduction

Consensus has its etymological roots in the Latin verb consentire, which, unsurprisingly, means to consent, to agree. The concept is nearly as old as the brief history of computer science itself. It designates a crucial necessity of distributed appliances. More fundamentally, consensus provides a fault-tolerant distributed animal brain to higher-level appliances such as deployed cluster file systems, currency exchange systems, or, specifically in our case, distributed databases.

While conceptually one could easily envision a bunch of computers on a network that share and manipulate some central truth base, the actual implementation of such a service poses paradoxical demands on fail-safety and synchronicity. In other words, making guarantees about the truth of state x at a given time t, or, more demanding, at any given time {t1,…,tn}, turns out to be entirely and radically non-trivial.

This is the reason why most implementations of consensus protocols deployed in real production systems have relied upon one of two major publications1,2, known as Paxos (including its derivatives) and RAFT, respectively.


Although it would be a beastly joy to discuss the differences and the pros and cons of each here, I suggest having a look at the extent of both papers, each in excess of 15 pages, to get a rough idea of the scope of such a discussion. As a matter of fact, we were audacious enough to try to define and implement a simpler protocol than the above. And we failed miserably, at least at arriving at a simpler solution.

Suffice it to say that we decided to use RAFT in our implementation. The choice fell on RAFT mainly because, in my humble view, it is overall the simpler method to understand.

In short, RAFT ensures at all times that one and only one instance of the deployed services assumes leadership. The leader permanently projects its reign over the other instances, replicates all write requests to its followers and serves read requests from its local replica. It is crucial to note that the entire deployment needs to maintain the order of write requests, as arbitrary operations on the replicated log are not commutative.

Furthermore, and not least importantly, RAFT guarantees that any functional majority of instances is capable of electing a leader and knows the replicated truth as promised to the outside world.

For more on RAFT and consensus in general, we would like to refer to the authors’ website.

What is already out there and why not just take one of those

Say you have realised by now that you need such a consensus among some set of deployed appliances. Should you do the implementation yourself? Most probably not, unless you have good reasons, as there are multiple, arguably very good, implementations of both of the above algorithms out there. They come as code snippets, libraries or full-fledged services, some of which have enjoyed praise and criticism alike.

Common wisdom suggests investing your development energy in more urgent matters that will put you ahead of your competition, or at least sparing yourself the tedious work of proving the correctness of the implementation all the way to deliberate abuse-case studies.

Initially, before version 3.0, we built ArangoDB clusters relying on etcd. etcd is a great and easy-to-use service, which is actively maintained and developed to date. We did hit limits, though, as we needed replication of transactions rather than single write requests, which etcd did not provide back then. We also dismissed other very good services such as ZooKeeper, as ZooKeeper, for example, would have added the requirement to deploy a Java virtual machine alongside. And so on.

But the need for our own consensus implementation imposed itself upon us for other reasons.

I mentioned earlier that consensus is the animal brain of a larger service. When we think of our brainstem, the first thing that comes to mind is not its great capability of storing things but its core function of controlling respiration and heartbeat. So how about being able to make guarantees not only about replicating a log or key-value store but also about running a particular process as a combined effort of all agents? In other words, could one think of a resilient program thread which follows the RAFT leadership? And if so, what would the benefits look like?

Letting the cat entirely out of the bag: we built into the RAFT environment a supervision process, which handles failure as well as maintenance cases at cluster runtime. None of the implementations we were aware of could provide us with that. In my view, this is the most intriguing argument.

This is how we use the agency

After all the nice storytelling, let us look at how we ended up using the agency.

(diagram: ArangoDB cluster topology)

ArangoDB cluster deployments consist of 3 types (or roles) of services, namely database servers, coordinators and agents. For details on the function of the first two roles, please refer to the cluster documentation. For them to function and interact properly, however, they rely on our distributed initialisation and configuration.

Every imaginable piece of meta information is stored there. Which instance is storing a replica of a particular shard? Which one is the primary database server currently responsible for shard X, or the follower for synchronous replication of shard Y, and vice versa? When was the last heartbeat received from which service? Etc.

ArangoDB-speak for the central cluster maintenance and configuration consensus is agency. The agency consists of an odd number of ArangoDB instances, which hold the replicated and persisted configuration of the cluster while maintaining integrity using the RAFT algorithm in their midst.

Database servers and coordinators interact with the agents through an HTTP API. Agents respond as a unit. Read and write API calls are redirected seamlessly by followers to the current leader. Changes in cluster configuration are stored in the “Plan” section of the key value store while actually committed changes are reported by the database servers and coordinators in the “Current” section. A single API call to the agency may consist of a bundle of transactions on the key value store, which are executed atomically with guarantees to transaction safety.
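As a sketch of what talking to the agency looks like, here is a write and a read via curl. The payload shapes (a list of transactions for writes, a list of key lists for reads) follow the agency API; the agent endpoint on port 8531 and the example key are assumptions of mine. The -L flag lets curl follow the redirect from a follower to the current leader:

```shell
# A write transaction: set a single key to a value
WRITE='[[{"/arango/Example": {"op": "set", "new": 42}}]]'
curl -sL -d "$WRITE" http://localhost:8531/_api/agency/write || true  # no-op without a live agency

# A read request: ask for one list of keys
READ='[["/arango/Example"]]'
curl -sL -d "$READ" http://localhost:8531/_api/agency/read || true  # no-op without a live agency
```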

Last but not least, the agency runs a supervision thread whose job it is to perform automated failover of DB servers, with all the consequences for individual shards of individual collections. It is also the place where deliberate maintenance jobs are executed, such as the orderly shutdown of a node for, say, a hardware upgrade and the like.

Having all the power of consensus in such a program, we can guarantee that no two machines start giving contradictory orders to the rest of the cluster nodes. Additionally, a job like handling the replacement of a failed server is continued in the very same way it was meant to be, even if the agency's leader catastrophically fails in the midst of its execution.

How could you use ArangoDB as a consensus service

The online documentation describes how such an ArangoDB RAFT service is deployed on an odd (and arguably low) number of interconnected hosts.

During the initial startup of such an agency, the instances find each other, exchange identities and establish a clear key-value and program store. Once the initialisation phase is complete, typically within a couple of seconds, the RAFT algorithm is established, the HTTP API is accessible and the replicated log is recording; the documentation is found here. Restarts of individual nodes of the agency do not influence the runtime behaviour and, on a good network, should go unnoticed.

The key-value store comes with some additional features, such as assigning a time-to-live to entries and entire branches, and the possibility of registering callbacks for arbitrary subsections. It also features transactions and allows one to precondition these transactions with high granularity.
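For illustration, a preconditioned transaction with a time-to-live might look as follows; a second object in the transaction array carries the preconditions. The key name, TTL value and endpoint are made-up examples of mine; the operator names (op, new, ttl, oldEmpty) follow the agency documentation:

```shell
# One transaction: acquire /lock for 60 seconds, but only if it is currently unset
# (a compare-and-swap-style distributed lock)
TXN='[[ {"/lock": {"op": "set", "new": "host-A", "ttl": 60}},
        {"/lock": {"oldEmpty": true}} ]]'

curl -sL -d "$TXN" http://localhost:8531/_api/agency/write || true  # no-op without a live agency
```

If the precondition fails, the agency rejects the whole transaction, which is what makes such a lock safe against two hosts grabbing it at once.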

Any distributed service could now use ArangoDB agencies for initialisation and configuration management in the very same way as we do with ArangoDB clusters, by integrating the HTTP API.

But in addition, agents run local Foxx threads, which can be used to deploy a resilient monitoring and maintenance thread with access to the RAFT process, its replicated log and the current local RAFT personality.

Some initial performance and compliance test

Making claims about fault tolerance in a distributed deployment requires some evidence to back them up. Failure scenarios range all the way from malfunctioning network switches and fabric to crashed hosts or faulty hard disks. While local sources of error, such as failing host hardware, can be dealt with in unit tests, it turns out that testing the correct function of such a distributed service is anything but trivial.

The main claim and minimum requirement: at all times, any response from any member of the RAFT is true. Such a response could be the value of a requested key, but it could also be that a preconditioned write to a particular key succeeded or failed. The nice-to-have and secondary goal: performance.

Just a brief word on the scalability of consensus deployments here: consensus does not scale. The core nature of consensus is most readily compared to that of a bottleneck. The number of operations that go through a consensus deployment is affected mostly by the number of members and the network's quality. Not a big surprise when you think about how consensus is reached over the back and forth of packets.

Jepsen

Thankfully, Kyle Kingsbury, who has been blogging extensively about the subject of distributed correctness, has published a framework on GitHub for running tests to that effect.

Kyle's framework is a Clojure library which distills the findings of a couple of fundamental papers, which Kyle discusses on his blog, into one flexible package. Kyle's blog has taken the distributed world by storm because of the way the tests are done and evaluated.

The tests are run and results are recorded as local time series on virtual machines, which are subject to random network partitioning and stability issues. The analysis then tries to find a valid ordering of the test results such that linearisability3 can be established.

Results

We have tested the ArangoDB agency with Jepsen to validate that the agency behaves linearisably as described above. We have also stress-tested the agency with hundreds of millions of requests over longer periods to check for both memory leakage and compaction-related issues. The tests show that ArangoDB is ready to be used as a fault-tolerant distributed configuration management platform for small to large network appliances.

What is to come

Look out for more in-depth technical blog posts coming up, which will demonstrate best-case scenarios for deploying ArangoDB agencies for your own network appliances. For 2017 we are also planning more comprehensive and large-scale testing of linearisability over the entire ArangoDB cluster.

Examples

We have put together a small collection of Go programs which demonstrate some of the above concepts, in the hope that you might find them helpful to get started with the ArangoDB agency: https://github.com/neunhoef/AgencyUsage

tl;dr

Consensus is a key asset in distributed appliances. Groups of ArangoDB instances may well be integrated into appliance farms for initialisation and configuration management. Our own exhaustive tests have proven excellent dependability and resilience in a vast range of network and hardware failure scenarios.

_____________________________

References

1 Pease M, Shostak R and Lamport L. Reaching Agreement in the Presence of Faults. Journal of the Association for Computing Machinery, 27(2). 1980. pp 228-234
2 Ongaro D and Ousterhout J. In search of an understandable consensus algorithm. 2014 USENIX Annual Technical Conference (USENIX ATC 14). 2014. pp. 305-319
3 Herlihy MP and Wing JM. Linearizability: A Correctness Condition for Concurrent Objects. ACM Transactions on Programming Languages and Systems, 12(3). 1990. Pp.463-492
4 Max Neunhöfer. Creating Fault Tolerant Services on Mesos. Recording of the talk, MesosCon Asia 2016

Florian Leibert (CEO, Mesosphere) and Luca Olivari (Oracle, MongoDB) joining Advisory Council of ArangoDB


We have some amazing news today. Two brilliant minds are joining ArangoDB and our recently founded Advisory Council: Florian Leibert, CEO of Mesosphere, and Luca Olivari, a former executive at Oracle and MongoDB. With their rare expertise we can further sharpen our focus on cutting-edge technologies and accelerate our growth.

Maybe a few words about Florian and Luca:
Florian Leibert is the CEO and Co-founder of Mesosphere. As one of the brightest technical minds in Silicon Valley, Florian enabled Twitter to establish a microservice architecture and built Airbnb’s big data team. At Mesosphere, he is now building DC/OS, the revolutionary open-source datacenter operating system based on Apache Mesos, which is already used at his former companies and in exciting deployments such as Verizon Wireless. Florian will help ArangoDB to further sharpen its focus on the needs of large-scale enterprises, their problems and solutions.
 
Luca Olivari is a software executive and advisor. He has held several senior leadership positions in general management, sales, presales and marketing at public companies as well as venture-backed, high-growth startups. Currently, Luca serves as Chief Data Officer at Contactlab, a leading engagement marketing cloud provider, and also advises several other startups. Previously Luca ran International Business Development and Channels at MongoDB, and before that he led MySQL Sales Consulting across EMEA at Oracle. Luca will work with ArangoDB to further strengthen product demand, market presence and the ecosystem, and ultimately grow revenue.

 
“On behalf of the whole ArangoDB team I want to extend a warm welcome to Luca and Florian! Our company is entering a phase of explosive international growth and we’re really excited to work with both of you.”

Luca and Florian will guide us on our path towards a world-class open adoption company providing a cutting edge open source database and solid products for large scale needs.

arangoexport – a tool for exporting data from ArangoDB


With the initial alpha release of ArangoDB version 3.2 we also include a preview of the new export tool arangoexport. Alpha2 of ArangoDB 3.2 can be downloaded here. Export functionality was initially requested by one of our community members who wanted to view an ArangoDB graph in the Cytoscape visualizer.

Arangoexport is capable of exporting a graph, or selected collections of a graph, to xgmml, Cytoscape’s graph format. But arangoexport is not limited to this: it can also generate JSON or JSONL exports of arbitrary collections.

For example, to export the collections collectionOne and collectionTwo to JSON just do:

arangoexport --type json --collection collectionOne --collection collectionTwo

If you would like to store the export data in a different directory just use the option:

--output-directory /my/path/to/export

The export type json will produce one big JSON array where every line is one document.

Export type jsonl will output line-wise JSON. So every line contains one document represented as standalone JSON.

JSONL files can be imported with arangoimp:

arangoimp --file "collectionOne.jsonl" --type jsonl --collection collectionOne

But the most anticipated feature is exporting a graph to xgmml. To do so just call arangoexport with type xgmml and a graph name:

arangoexport --type xgmml --graph-name myGraph

You can also specify an unnamed graph by additionally passing all collections that the graph consists of:

arangoexport --type xgmml --graph-name myGraph --collection vertexCollectionOne --collection vertexCollectionTwo --collection edgeCollectionOne

For xgmml there are two more options: one suppresses the output of attributes to generate a smaller xgmml file, and the other selects a different label attribute.

Also visit our documentation for arangoexport, or open an issue if your super-duper format is missing and should be added.

For questions please visit our Community Slack Channel.


Alpha2 of the upcoming ArangoDB 3.2 release


The official ArangoDB 3.2 release is just around the corner. In the meantime, you can play around with and test some of the upcoming new features as they come along. The alpha2 version of the upcoming ArangoDB 3.2 is available for testing and can be downloaded here. If you already have ArangoDB installed, please remember to back up your data and run an upgrade after installing the alpha2 release. Note that this version is not suitable for production usage and is supplied for testing purposes only.

Without getting into too much detail yet: one major change is that ArangoDB 3.2 will contain two storage engines – the current one based on memory-mapped files and a new one backed by RocksDB. This alpha2 release contains some steps towards this goal, as well as independent improvements and previews of new features.

This alpha release (alpha2) features:

  • upgraded to V8 5.7.0.0
  • ArangoDB export tool (Read corresponding blog post for more details)
  • better AQL statistics
  • removal of an explicit read-cache
  • improvements in parallel index loading
  • better stacktraces in Foxx apps to improve debugging experience

We will publish new preliminary releases including more updates as soon as they are stable.

Your comments, feedback and bug reports are very welcome – get in touch via our Community Slack #feedback32 channel.

Download alpha2 of ArangoDB 3.2

Happy testing!


arangochair – a tool for listening to changes in ArangoDB


The ArangoDB team gave me an opportunity to write a tutorial about arangochair. Arangochair is a first attempt at listening for changes in the database and executing actions such as pushing a document to the client or running an AQL query. Currently it is limited to single nodes.

This tutorial is loosely based on the example at baslr/arangochair-serversendevents-demo

arangochair is a Node.js module hosted on npm, which makes it fairly easy to install. Just run
npm install arangochair and it’s installed.

Now we can write our first lines of code.

We set up arangochair to listen for changes on the collection tweets, construct a server-sent event (SSE) message, and send it to all connected sockets. An SSE message consists of two lines: the first line is the event name and the second line is a line of stringified JSON.

const arangochair = require('arangochair');

const sses = []; // client sockets registered by the /sse middleware shown below
const changes = new arangochair('http://127.0.0.1:8529'); // ArangoDB node to monitor

changes.subscribe({collection:'tweets'});
changes.start();
changes.on('tweets', (docIn, type) => {
    const doc = JSON.parse(docIn);

    const message = 'event: ' + type + '\ndata: ' + JSON.stringify(doc) + '\n\n';
    for(const sse of sses) {
        sse.write(message);
    }
});

changes.on('error', (err, httpStatus, headers, body) => {
    console.log('on error', err);
    // arangochair stops on errors,
    // so check the last HTTP request and restart
    changes.start();
});

On the client side we use the EventSource interface to listen for the events sent by the server.

First we construct a new EventSource and add two event listeners, one for insert/update and one for delete. Separate events for insert and update are currently not possible, but will be part of a future update.

const events = new EventSource('/sse');

events.addEventListener('delete', (e) => {
      const doc = JSON.parse(e.data);
      // do something
}, false);
events.addEventListener('insert/update', (e) => {
      const doc = JSON.parse(e.data);
      // do something
}, false);

Handle socket connections on the server with express:

In this example we use Express as our framework to handle API calls. We write a middleware that stores the socket of a client that wants to receive SSEs. When the client connection ends, we remove the socket from the array of stored sockets.

app.use( (req, res, next) => {
    if ('/sse' === req.url) {
        sses.push(res);
        res.setHeader('Content-Type', 'text/event-stream');
        res.on('close', () => {
            const idx = sses.indexOf(res);
            if (-1 === idx) return;
            sses.splice(idx, 1);
        });

        res.write('data: initial\n\n');
    } else {
        next();
    }
});

Why not WebSockets?

Since we only want to push data to the client, we do not need a duplex connection. Also, SSE uses a traditional HTTP connection without a special protocol and reconnects itself on connection loss.

Announcing ArangoDB Online Meetup and the Upcoming Webinar


Today we are glad to announce the start of the ArangoDB online meetup. As our international open-source community grows with every passing day, we keep getting requests from members around the world for a tech meetup or a short demo of ArangoDB. Quite a few members have already taken the initiative of presenting at conferences and local meetups – a big “thank you” for that! Adding to that effort, it is high time we all had one place where we can connect and everyone has the chance to give or attend a talk. And what better way to bring us all together than meeting online?

So we invite you all to join our newly born ArangoDB meetup, where you can meet other developers and engineers who are using, testing or interested in learning more about ArangoDB. We will run a series of online talks presented by our developers on different subjects we think could be interesting. We would also like to welcome everyone from the community who wants to give talks, demos or coding sessions. The space will also be used to announce upcoming ArangoDB webinars and online training courses, or to share discounts to conferences we sponsor.

But we need your help – let us know what you like. Propose topics that excite you, that you are passionate about and want to learn more about. If you are willing to share your experience with ArangoDB in a short talk – get in touch and let’s schedule! Your thoughts, ideas and suggestions are welcome on the Community Slack #meetup channel or via our contact form.

Modern data modeling: Multi-model approach using ArangoDB

To give our Online meetup a nicer kick-off, we start with announcing an upcoming ArangoDB webinar.

Join Michael Hackstein on Thursday, March 30th at 6PM CEST/12PM ET/9AM PT to learn more about ArangoDB. In this webinar Michael will share insights on the “multi-model” movement, talk about the three data models of ArangoDB, do hands-on examples with AQL and stress what makes AQL more comprehensible for developers in comparison to SQL. He will also touch on the Foxx microservices framework, showing how it works with a few exercises.

Read a full abstract here and make sure to join us!

Thursday March 30th 6PM CEST/12PM ET/9AM PT

Introducing the milestone release model and why it is better


When developing ArangoDB, we want to share new features with you as early as possible. For example, the Pregel implementation that will become part of ArangoDB 3.2 was ready for testing weeks before the first beta release and the final release of 3.2.

Therefore we decided to create intermediate releases, called alpha releases, which contain new features as soon as they become stable. The benefits for the community, and also for our development team, are:

  • early feedback from the community about new features
  • enough time to integrate improvements based on the feedback
  • longer testing cycles, improved quality
  • more time for external contributors to integrate new features
  • better control over the timetable

That worked well; however, the name choice created a lot of confusion. The majority of people associate an alpha release with a feature-complete but untested product. As our intention is to bring features out for testing as soon as they become available, that naming did not fully work. Listening to community feedback and drawing on our own experience, we decided to introduce milestone releases. From ArangoDB 3.3 onward we will use the following naming convention.

The release model

Milestone releases contain major new features of the next release that can be tested. Not all planned features are available, but milestone releases give you a chance to test the upcoming features and enhancements. They are not suitable for production usage.

Alpha releases contain the first feature-complete version of the next release. We still need to do extensive testing ourselves, but if you would like to participate in this testing phase you are more than welcome to try out these versions. They are not suitable for production usage.

Beta releases are tested and we are confident that they do not contain major bugs. These versions are still only for testing purposes.

Release candidates (RCs) are the final builds where we test the packaging. These are stable and can potentially be used in production, but we recommend testing the installation process itself beforehand.

The final version will then be generally available on the download page.

Structure for ArangoDB 3.2 release

As mentioned above, we have already partially used this model with the current 3.2 builds, but to avoid confusion we will keep the current naming for ArangoDB 3.2, expecting a few more alpha releases this time (milestone releases according to the future structure).

Alpha3 of ArangoDB 3.2: Support for Distributed Graph Processing


The next alpha release of the upcoming ArangoDB 3.2 is available for testing. You can download and install alpha3 here.

Moving forward

As ArangoDB 3.2 will include several new features and improvements, we realized that our current release model has room for improvement. Going forward, we will introduce milestone releases with ArangoDB 3.3. For this major release you will therefore see a few more alphas. You can read detailed info about the new model here.

Pregel computing model

In this alpha we introduce support for incremental graph processing algorithms, both on a single server and in the cluster.

Internally we implement the Pregel computing model, which enables us to support arbitrary graph algorithms that scale with your data (or with the size of your database cluster).

The pregel computing model was developed at Google and published in “Pregel: A System for Large-scale Graph Processing” by Malewicz et al. in 2010 (full paper).

A directed graph in Pregel consists of vertices and edges, where each vertex knows only its outgoing edges. Crucially, a vertex cannot see the state of any other vertex. The core idea is to imagine each vertex as an independent computer program that is able to send messages to and receive messages from all other vertices.

Pregel Messages

The vertex program can process incoming messages and send messages to other vertices. To allow iterative processing there is the concept of supersteps. A superstep is an iteration in which every worker in a computing cluster can process and send messages; sending a message in one superstep guarantees that it will be received in the next superstep. Similarly, all messages processed in a superstep were sent during the previous superstep. This guarantee is called the global barrier between supersteps: all workers must wait until every other worker is finished before continuing.

Pregel Supersteps

A Pregel execution system thus becomes a distributed message-passing system, which can easily work in a distributed computing cluster.

Example algorithm

To better understand how this works, we describe the model with an example algorithm. “Single-Source Shortest Paths” (SSSP) calculates the shortest path distance of every vertex from a single source vertex.

The red lines in the image below represent the global superstep barriers; the graph is displayed after each superstep. In every superstep a vertex sends its current distance value plus the respective edge weight to its outgoing neighbour vertices.

  1. In superstep 1 the source vertex with value 0 sends the message “1” to its neighbour.
  2. In superstep 2 the second vertex processes the message “1” and updates its local value to “1” (because 1 < ∞). Then it sends the message “4” to its neighbour, connected by an edge with weight 3.
  3. In superstep 3 the third (rightmost) vertex receives the message “4” and updates its local value to “4” (because 4 < ∞).

Afterwards there are no more messages to send, therefore the algorithm ends.

ExampleSSSP (Pregel)
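The superstep loop described above can be sketched in a few lines of plain JavaScript. This is a toy, single-process illustration of the model only, not ArangoDB’s implementation; the pregelSSSP helper and its data layout are made up for this example:

```javascript
// Toy single-process sketch of the Pregel superstep model for SSSP.
// Each vertex only sees its outgoing edges and its incoming messages.
function pregelSSSP(vertices, edges, source) {
  const value = {};                       // current shortest distance per vertex
  vertices.forEach((v) => { value[v] = Infinity; });
  let inbox = { [source]: [0] };          // initial message to the source vertex

  while (Object.keys(inbox).length > 0) { // run supersteps until no messages
    const outbox = {};                    // messages delivered next superstep
    for (const [v, msgs] of Object.entries(inbox)) {
      const min = Math.min(...msgs);
      if (min < value[v]) {               // improved distance: update and notify
        value[v] = min;
        for (const edge of edges.filter((e) => e.from === v)) {
          (outbox[edge.to] = outbox[edge.to] || []).push(min + edge.weight);
        }
      }
    }
    inbox = outbox;                       // global barrier between supersteps
  }
  return value;
}

// The three-vertex chain from the example: A --1--> B --3--> C
const dist = pregelSSSP(['A', 'B', 'C'],
  [{from: 'A', to: 'B', weight: 1}, {from: 'B', to: 'C', weight: 3}], 'A');
console.log(dist); // { A: 0, B: 1, C: 4 }
```

The algorithm ends exactly as in the figure: once no vertex produces new messages, the superstep loop terminates.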

Run a Pregel Algorithm

The first step in running a Pregel algorithm is to import a graph. Our graph will be named “demo”, and you can just copy and paste the following into arangosh.

var graph_module = require("@arangodb/general-graph");
var graphName = "demo";
var vColl = "demo_v", eColl = "demo_e";
var graph = graph_module._create(graphName);
db._create(vColl, {numberOfShards: 4});
graph._addVertexCollection(vColl);
db._createEdgeCollection(eColl, {
                          numberOfShards: 4,
                          replicationFactor: 1,
                          shardKeys:["vertex"],
                          distributeShardsLike:vColl});

var rel = graph_module._relation(eColl, [vColl], [vColl]);
graph._extendEdgeDefinitions(rel);

var vertices = db[vColl];
var edges = db[eColl];

var A = vertices.insert({_key:'A'})._id;
var B = vertices.insert({_key:'B'})._id;
var C = vertices.insert({_key:'C'})._id;
var D = vertices.insert({_key:'D'})._id;
var E = vertices.insert({_key:'E'})._id;
var F = vertices.insert({_key:'F'})._id;
var G = vertices.insert({_key:'G'})._id;
var H = vertices.insert({_key:'H'})._id;
var I = vertices.insert({_key:'I'})._id;
var J = vertices.insert({_key:'J'})._id;
var K = vertices.insert({_key:'K'})._id;

edges.insert({_from:B, _to:C, vertex:'B'});
edges.insert({_from:C, _to:B, vertex:'C'});
edges.insert({_from:D, _to:A, vertex:'D'});
edges.insert({_from:D, _to:B, vertex:'D'});
edges.insert({_from:E, _to:B, vertex:'E'});
edges.insert({_from:E, _to:D, vertex:'E'});
edges.insert({_from:E, _to:F, vertex:'E'});
edges.insert({_from:F, _to:B, vertex:'F'});
edges.insert({_from:F, _to:E, vertex:'F'});
edges.insert({_from:G, _to:B, vertex:'G'});
edges.insert({_from:G, _to:E, vertex:'G'});
edges.insert({_from:H, _to:B, vertex:'H'});
edges.insert({_from:H, _to:E, vertex:'H'});
edges.insert({_from:I, _to:B, vertex:'I'});
edges.insert({_from:I, _to:E, vertex:'I'});
edges.insert({_from:J, _to:E, vertex:'J'});
edges.insert({_from:K, _to:E, vertex:'K'});

Then you can start for example PageRank on the “demo” graph:

var pregel = require("@arangodb/pregel");
var handle = pregel.start("pagerank", graphName, {maxGSS: 25});
pregel.status(handle);
// results are written back to the vertex collection:
db.demo_v.all().toArray();

Or shortest paths:

handle = pregel.start("sssp", graphName, {source: vColl + "/A"});
pregel.status(handle);

Supported Algorithms

In the beginning we will support a number of well-known graph algorithms:

If you are interested in adding your own algorithms have a look at the source. For more info check the documentation.

Your comments, feedback and bug reports are very welcome – get in touch via our Community Slack #feedback32 channel.

Download alpha3 of ArangoDB 3.2


ArangoDB Selected as Finalist for Red Herring Top 100 in Europe


“ArangoDB shows great promise and therefore deserves to be among the finalists.”

ArangoDB, the native multi-model database, announced today that it has been selected as a finalist for the Red Herring Top 100 award for the European business region. The nomination underlines the great success of the company and the growing momentum behind the multi-model movement it is heralding.

“The whole team is proud to be nominated for such a prestigious award,” says Claudius Weinberger, CEO of ArangoDB. “Looking back a few years, it was hard work to win people for our multi-model approach and unique query language. Today startups and enterprises enjoy the flexibility our database provides and use it to realize their brightest ideas.”

Red Herring Awards

Since 1996, Red Herring has kept tabs on the up-and-comers. Red Herring editors were among the first to recognize that companies such as Facebook, Twitter, Google, Yahoo, Skype, Salesforce.com, YouTube, and eBay would change the way we live and work.

This unique assessment of potential is complemented by a review of the track record and standing of a company, which allows Red Herring to see past the “buzz” and make the list a valuable instrument for discovering and advocating the greatest business opportunities in the industry.

“This year was rewarding, beyond all expectations” said Alex Vieux, publisher and CEO of Red Herring. “There are many great companies generating really innovative and disruptive products in Europe. We had a very difficult time narrowing the pool and selecting the finalists. ArangoDB shows great promise and therefore deserves to be among the finalists.”

Finalists for the 2017 edition of the Red Herring 100 Europe award are selected based upon their technological innovation, management strength, market size, investor record, customer acquisition, and financial health. During the months leading up to the announcement, Red Herring reviewed over 1200 companies in the telecommunications, security, cloud, software, hardware, biotech, mobile and other industries that completed their submissions to qualify for the award.

Webinar: The native multi-model approach and its benefits for developers, architects and DevOps


Tuesday, May 16th (6PM CEST/12PM ET/ 9AM PT)Join the webinar here.

This webinar gives a general overview of the multi-model database movement, in particular we will discuss its main advantages and technological benefits from an architect and devops perspective.

Since the first relational databases were invented, the needs of companies and the technological possibilities have changed completely. Luca Olivari (recently announced as President of ArangoDB) will dive deep into current trends in the database world, explain how native multi-model databases help companies of all sizes, and walk you through use cases where ArangoDB is beneficial. He will share decades of experience in the field and his views on the ever-changing needs of developers, companies and customers in modern times.


In particular we will discuss:

  • Quick history of database evolution since the 70s
  • Shortcomings of relational databases and the dawn of NoSQL
  • What do developers need and companies want?
  • Native Multi-model characteristics
  • One Core. One Query Language. Three Data Models
  • Open Source Advantages
  • Customers and their Use Cases
  • Q&A

Join Luca on Tuesday, May 16th (6PM CEST/12PM ET/ 9AM PT) to understand how ArangoDB, as a native multi-model database can benefit everyone.

Join the webinar

ArangoDB Promotes Luca Olivari to President


Company creates new role to accelerate global expansion

ArangoDB, the company behind the leading native multi-model database, today announced that Luca Olivari has formally joined as President. Luca will work with the founders, board of directors and advisory council to accelerate global expansion.

“It has been a privilege to work with the founders and the rest of the team for a few months as an Advisor and witness first hand how ArangoDB helps enterprise customers to unwind complexity and improve developer productivity,” said Olivari. “As President, I will be focused on building our global customer facing functions, a productive ecosystem and a committed community.”
Luca Olivari is a software executive and advisor with almost two decades of experience. He has held several senior leadership positions at public companies as well as venture-backed, high-growth startups. Before running International Business Development and Strategy at MongoDB, he led MySQL Sales Consulting across EMEA at Oracle. Prior to joining ArangoDB, Luca served as Chief Data Officer at ContactLab, a leading engagement marketing cloud provider.

As part of the global acceleration, Arun Kumar Dubey, former Sales Executive at Oracle and Corporate Sales Director at OrientDB, will also join the company. Further experienced hires will be announced shortly.

“Luca has been an ideal advisor over the past few months and has been instrumental in helping us establish firm plans to become a challenger to legacy relational and single-model databases,” said Claudius Weinberger, ArangoDB CEO. “I look forward to working with Luca and Arun as we execute on our vision of simplifying complexity and providing a new breed of database to developers with ambitious projects.”

The last year has brought tremendous progress for ArangoDB, with the company extending its product capabilities, increasing its global community, serving a number of Fortune 500 companies and increasing sales and support resources in key regions.

Luca Olivari will host a webinar on May 16th 2017 – “The native multi-model approach and its benefits for developers, architects and DevOps”. Register here.
 

ArangoDB wins Red Herring Top 100 Award


The jury selected ArangoDB out of 1,200 promising companies in Europe, confirming the growing importance of native multi-model databases.

ArangoDB, the native multi-model database, announced today that it was selected as a winner of the Red Herring Top 100 award for the European business region.

Red Herring Top 100 Europe enlists outstanding entrepreneurs and promising companies. Winners are selected from approximately 1,200 privately financed companies each year in the European Region. Since 1996, Red Herring has kept tabs on these up-and-comers. Red Herring’s editors were among the first to recognize that companies such as Alibaba, Facebook, Google, Kakao, Skype, SuperCell, Spotify, Twitter, and YouTube would change the way we live and work.

“It was exciting to see so many great ideas and teams from across Europe gathering in Amsterdam for Red Herring Award 2017,” said ArangoDB CEO Claudius Weinberger. “Developers, architects and CIOs around the globe have many options when it comes to databases. With ArangoDB they have a solid high performance technology for multiple purposes to simplify their tech stack. We are pleased that our vision got recognized by such a prestigious award. Now it’s about exceeding the rising expectations.”

Red Herring’s jury evaluated companies on both quantitative and qualitative criteria, such as financial performance, technological innovation and intellectual property, DNA of the founders, business model, customer footprint and addressable market. This assessment of potential is complemented by a review of the track record and standing of startups relative to their sector peers, which allows Red Herring to see past the “buzz” and makes the list a valuable instrument of discovery and advocacy for the most promising new business models in Europe.

ArangoDB was chosen for its innovative and highly flexible multi model approach and for its massive growth on the business and community side over the past year. “In 2017, selecting the top achievers was by no means a small feat,” said Alex Vieux, publisher and CEO of Red Herring. “In fact, we had the toughest time in years because so many entrepreneurs had crossed significant milestones so early in the European tech ecosystem. We believe ArangoDB embodies the vision, drive and innovation that define a successful entrepreneurial venture. ArangoDB should be proud of its accomplishment, as the competition was very strong.”

RocksDB Integration in ArangoDB – FAQs


The new release of ArangoDB 3.2 is just around the corner and will include some major improvements like distributed graph processing with Pregel and a powerful export tool. Most importantly, we have integrated Facebook’s RocksDB as the first pluggable storage engine in ArangoDB. With RocksDB you will be able to use as much data in ArangoDB as fits on your disk.

As this is an important change and many questions have reached us from the community, we want to share answers to the most common questions. Please find them below.

Will I be able to go beyond the limit of RAM?

Yes. By choosing RocksDB as your storage engine you will be able to work with as much data as fits on your disk.

What is the locking behaviour with RocksDB in ArangoDB?

With RocksDB as your storage engine, writes lock on the document level and reads do not lock at all. Concurrent writes to the same document cause write-write conflicts that are propagated to the calling code, so users can retry the operations when required.

… when you say “Concurrent writes of the same documents will cause write-write conflicts that will be propagated to the calling code”, does it mean that the behavior will differ from currently? Won’t writes try to acquire a lock on the document first?

Yes, it does mean the behavior will differ. The current (MMFiles) engine has collection-level locks, so write-write conflicts are not possible. The RocksDB engine has document-level locks, so write-write conflicts are possible.

Consider the following example of two transactions T1 and T2 both trying to write a document in collection “c”.

In the old (MMFiles) engine, these transactions would be serialized, e.g.:

T1 begins
T1 writes document “a” in collection “c”
T1 commits
T2 begins
T2 writes document “a” in collection “c”
T2 commits

so no write conflicts here.

In the RocksDB engine, the transactions can run in parallel, but as they modify the same document, it needs to be locked to prevent lost updates. The following scheduling will cause a write-write conflict:

T1 begins
T2 begins
T1 writes document “a” in collection “c”
T2 writes document “a” in collection “c”

Here one of the transactions (T2) will abort to prevent an unnoticed lost update. Concurrent writes to the same document cause write-write conflicts that are propagated to the calling code, so users can retry the operations when required.
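Application code can handle such conflicts with a simple retry loop. The sketch below is illustrative only: `WriteConflictError` is a hypothetical stand-in for whatever conflict error your ArangoDB driver raises, and `retry_on_conflict` is a helper we invent here, not part of any driver API.

```python
class WriteConflictError(Exception):
    """Hypothetical stand-in for the write-write conflict error a driver raises."""

def retry_on_conflict(operation, max_retries=5):
    """Run `operation`, retrying whenever it hits a write-write conflict."""
    last_error = None
    for _ in range(max_retries):
        try:
            return operation()
        except WriteConflictError as err:
            # Another transaction held the document lock; try again.
            last_error = err
    raise last_error
```

In a real application the operation would be the document write itself, and you may want to add a small backoff between attempts so competing transactions do not immediately collide again.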

When using RocksDB as the storage engine, will I need a fast disk/SSD if the indexes are disk-based?

It will be beneficial to use fast storage. This is true for the memory-mapped files storage engine as well as for the RocksDB-based storage engine.

Will I be able to choose how different collections are stored, or will it be a per-database choice?

It is a per-server / per-cluster choice. It is not yet possible to mix modes or to use different storage engines in the same ArangoDB instance or cluster.

Can I switch from RocksDB to memory-mapped files for a collection or a database?

It is a per-server / per-cluster choice that must be made before the first server start. The first start stores the storage engine selection in a file on disk, and this file is validated on all restarts. If the storage engine needs to be changed after the initial choice, the data can be dumped with arangodump, and arangod can then be restarted with an empty database directory and a different storage engine. The data produced by arangodump can then be loaded back with arangorestore.
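The dump-and-restore migration can be scripted. The sketch below only assembles the three command lines without executing them; the endpoint and directory are placeholders for your own deployment, and `build_migration_commands` is a hypothetical helper. The flags themselves (`--server.endpoint`, `--output-directory`, `--input-directory`, `--server.storage-engine`) are standard arangodump/arangorestore/arangod options.

```python
def build_migration_commands(endpoint="tcp://127.0.0.1:8529", dump_dir="dump"):
    """Assemble the three steps of a storage engine migration as argument lists.

    Run them with e.g. subprocess.run(); the arangod restart must point at an
    *empty* database directory for the engine change to take effect.
    """
    dump = ["arangodump",
            "--server.endpoint", endpoint,
            "--output-directory", dump_dir]
    restart = ["arangod",
               "--server.storage-engine", "rocksdb"]
    restore = ["arangorestore",
               "--server.endpoint", endpoint,
               "--input-directory", dump_dir]
    return dump, restart, restore
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems when paths or endpoints contain special characters.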

Do indexes always store on disk now? Or only persisted type of index?

If you choose RocksDB as your storage engine, all indexes will be persisted on disk.

I’m using Microsoft Azure where virtual machines have very fast local SSD disks that are unfortunately “temporary” (meaning they may not survive a reboot), compared to slower but persistent network-attached disks (that can be SSD as well). Would there be any way to leverage the local disk? I’m thinking about something like, using the local disk for fast queries but having the data persisted to the network-attached disk?

RocksDB in general allows specifying different data directories for the different levels of the database. Data on lower levels is newer data, so it would in principle be possible to write low-level data to SSD first and have RocksDB move it to slower HDDs or network-attached disks as it is compacted to higher levels. Note that this is an option RocksDB offers but that ArangoDB does not exploit yet. In general we don’t think the “read from fast SSD vs. read from slow disks” decision can be made on a per-query basis, because a query may touch arbitrary data. But recent data, or data that is accessed often, will likely sit in RocksDB’s in-memory block cache anyway.


If you would like to dig a bit deeper into the improvements in 3.2, you can find more information in our release notes. If you would like to take the latest technical preview including RocksDB for a spin, you can download ArangoDB 3.2 alpha4.

We hope to have covered all important questions. Please let us know if we missed something important via hackers@arangodb.com.
