
From Zero to Advanced Graph Query Knowledge with ArangoDB


Thinking about your data as a highly connected set of information is a powerful way to gain insights, solve problems and bring products faster into the hands of your users.

Unlike in other databases, relationships are first-class citizens in graph databases, and with ArangoDB’s multi-model approach to graphs, documents and key/value pairs you can even switch between models or combine them in a single query.

The graph concept is booming but still new to many. So we invested a few bazillion coffees and some night shifts to come up with a good plan for a Graph Course:

We also searched for an easy to understand real world dataset and found the US domestic flights data.

We put some thought into presenting the graph concept in a clear way…

… and guide you through the basics of our multi-model query language AQL (ArangoDB Query Language).

To spice things up a bit we thought “Why not take people all the way to advanced query techniques?”. So we added a step-by-step guide through the more advanced topics.

We hope that you will get as excited about the graph concept as we are. Please let us know your feedback via learn@arangodb.com.

Get the full graph course


ArangoDB Hires Seasoned Software Executives to Run Americas


Michael Guerra, former MariaDB, Basho and VMware leader, and Ramona Rolando from Oracle join to foster US growth

ArangoDB, the company behind the leading native multi-model database, winner of 2017 Red Herring Top 100 Europe, today announced that Michael Louis Guerra has formally joined as Head of Sales Americas. Mike’s responsibilities will include building a high-performance sales team and a network of partners across the region.

Mike has over 20 years of experience at public, private, and venture-backed startups, with a particular focus on open source technologies. He’s an accomplished software sales executive with strong leadership, team building, and business development skills. Most recently, Mike was responsible for growing the Western Territory at MariaDB and Basho Technologies. Over two decades, he has helped several companies to exits, including acquisitions by Morgan Stanley, Yahoo, and VMware.

Ramona Rolando, a seasoned Sales Leader from Oracle and Honeywell, also joins the team to run Inside Sales in the region.

“Americas is the largest market for ArangoDB, with customers massively adopting our product to simplify their stack and improve developer productivity,” said Luca Olivari, President. “Mike’s experience building sales teams and channels from the ground up will help accelerate the upward trajectory in the region, and we’re honored to have him and Ramona onboard.”

“This is an exciting time to join ArangoDB as the native multi-model vision is picking up steam and some of the largest US companies are adopting the technology,” said Michael Guerra. “ArangoDB helps customers to build more things with fewer things, thus simplifying their stack, in the process resulting in immediate ROI and reduced time to market.”

The last year has brought tremendous progress for ArangoDB, with the company extending its product capabilities, growing its global community, serving a number of Fortune 500 companies and increasing sales and support resources in key regions.

Meet the Team

ArangoDB is sponsoring Graph Day SF on June 17th. Mike, Claudius Weinberger, CEO and Max Neunhöffer, Architect will be there. If you’d like to meet our team members, get in touch via the “contact us” form.

ArangoDB 3.2 beta release: Pluggable Storage Engine with RocksDB, Distributed Graph Processing and a ClusterFoxx


We’re excited to release today the beta of ArangoDB 3.2. It’s feature rich, well tested and hopefully plenty of fun for all of you. Keen to take it for a spin? Get ArangoDB 3.2 beta here.

With ArangoDB 3.2, we’re introducing the long-awaited pluggable storage engine and its first new citizen, RocksDB from Facebook:

  • RocksDB: You can now use as much data in ArangoDB as you can fit on your disk. Plus, you can enjoy performance boosts on writes by having only document-level locks (more info below).
  • Pregel: Furthermore, we implemented distributed graph processing with Pregel to discover hidden patterns, identify communities and perform in-depth analytics of large graph data sets.
  • ClusterFoxx: Another important upgrade is what we internally and playfully call the ClusterFoxx. The Foxx management internals have been rewritten from the ground up to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.
  • Enterprise: Working with some of our largest customers, we’ve added further security and scalability features to ArangoDB Enterprise like LDAP integration, Encryption at Rest, and the brand new Satellite Collections.

The goal of the whole ArangoDB 3 release cycle has been to scale the multi-model idea to new heights. Getting a database ready for large-scale applications is not done overnight, and it’s definitely not possible without the help of a strong community. We’d like to invite all of you to lend us a helping hand to make ArangoDB 3.2 the best release ever. Please push this beta to its limits: test it for your use cases and compare the performance of the new features like RocksDB. Let us know on GitHub about any bug you find. Don’t worry about hurting our feelings: we want to fix any problems.

Join the Beta Bug Hunt Challenge and win a $200 Amazon Gift Card as first prize. You can find more details about this reward program at the end of this post.

New Storage Engine RocksDB

ArangoDB now comes with two storage engines: MMFiles and RocksDB. If you want to compare the engines, you can use arangodump to export data from either engine and arangorestore to import it into the other. MMFiles is generally well suited for use cases where the data fits into main memory, while RocksDB also handles larger-than-memory working sets.

RocksDB has plenty of configuration options; we have selected the general purpose options. Please let us know how it works for your use case so that we can further optimize the implementation. Also notice that we do many tests under Linux, Windows and macOS. However, we optimize for Linux. Any feedback regarding other operating systems is very welcome. Check out the step by step guide to compare both storage engines for your use case and OS!

Benefits of RocksDB Storage Engine:

  • Document-level locks: performance boost for write intensive applications. Writes don’t block reads, and reads don’t block writes
  • Support for large datasets: go beyond the limit of main memory and stay performant
  • Persistent indexes: faster index build after restart

Things to consider before switching to RocksDB

  • RocksDB allows concurrent writes: Write conflicts can be raised. Applications switching from MMFiles must be prepared for exceptions
  • Transaction Limit in RocksDB: The total size of transactions is limited in RocksDB. Modifying statements on large amounts of documents have to commit in between — with AQL this is done by default.
  • Engine Selection on Server/Cluster Level: It’s not possible to mix both storage engines within a single instance or cluster installation. Transaction handling and write ahead log formats are different.
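
For the transaction limit in particular, the relevant knobs are the intermediate-commit query options. A hedged arangosh sketch (the collection name and thresholds below are illustrative, and a running ArangoDB 3.2 server with the RocksDB engine is assumed):

```js
// arangosh sketch — assumes a running ArangoDB 3.2 server using RocksDB.
// Instead of building one huge transaction, the modifying query commits
// in between once either threshold is reached (values are illustrative).
db._query(
  `FOR e IN events UPDATE e WITH { processed: true } IN events`,
  {},
  {
    intermediateCommitCount: 100000,           // commit after this many operations
    intermediateCommitSize: 512 * 1024 * 1024  // ...or after this many bytes
  }
);
```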

Find all important details about RocksDB in the storage engine documentation, as well as answers to common questions about our RocksDB integration in the FAQ.

Please note that ArangoDB 3.2 beta is fully tested, but not yet fully optimized (known issues: RocksDB). If you find something that is much slower with RocksDB compared to your current queries with the MMFiles engine, please create a GitHub ticket. Please check the comparison guide here.

New Distributed Graph Processing

With the new implementation of distributed graph processing, you are now able to analyze even very large graph data sets as a whole. Internally, we implemented the Pregel computing model to enable ArangoDB to support arbitrary graph algorithms, which will scale with your data, or with the size of your database cluster.

Pregel Messages

You can already use a number of well-known graph algorithms:

  • PageRank
  • Weakly Connected Components
  • Strongly Connected Components
  • HITS (hubs and authorities)
  • Single-Source Shortest Path
  • Community Detection via Label Propagation
  • Vertex Centrality measures
    • Closeness Centrality via Effective Closeness
    • Betweenness Centrality via LineRank

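To build intuition for the vertex-centric model behind these algorithms, here is a minimal, single-machine PageRank sketch in plain JavaScript. It mirrors the superstep structure Pregel uses (compute messages, then update every vertex), but the graph, damping factor and iteration count are made-up example values, not ArangoDB’s actual implementation:

```javascript
// Toy, synchronous PageRank in the vertex-centric (Pregel) style:
// in each superstep every vertex "sends" rank/outDegree to its
// neighbours, then combines the incoming messages into its new rank.
function pageRank(vertices, edges, iterations = 50, damping = 0.85) {
  const n = vertices.length;
  const rank = new Map(vertices.map(v => [v, 1 / n]));
  const out = new Map(vertices.map(v => [v, []]));
  for (const [from, to] of edges) out.get(from).push(to);

  for (let i = 0; i < iterations; i++) {
    // Superstep part 1: every vertex distributes its rank along its out-edges.
    const incoming = new Map(vertices.map(v => [v, 0]));
    for (const v of vertices) {
      for (const t of out.get(v)) {
        incoming.set(t, incoming.get(t) + rank.get(v) / out.get(v).length);
      }
    }
    // Superstep part 2: every vertex combines its messages into a new rank.
    for (const v of vertices) {
      rank.set(v, (1 - damping) / n + damping * incoming.get(v));
    }
  }
  return rank;
}

// Tiny example graph: 'b' is linked to most often, so it ends up ranked highest.
const ranks = pageRank(['a', 'b', 'c'], [['a', 'b'], ['c', 'b'], ['b', 'a']]);
```

In the distributed version, each DBserver runs these supersteps for its own shards, and only the messages travel over the network.
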
By using these new capabilities, you are now able, for example, to detect communities within your graph, shard your data according to these communities and leverage ArangoDB SmartGraphs to perform highly efficient graph traversals even in a cluster setup.

New ClusterFoxx

Managing your JavaScript microservices is now easier and more reliable than ever before. The Foxx management internals have been rewritten to make sure multi-coordinator cluster setups always keep their services in sync and new coordinators are fully initialised even when all existing coordinators are unavailable.

Additionally, the new fully documented REST API for managing Foxx services enables you to install, upgrade and configure your services using your existing devops processes. And if your service only consists of a single JavaScript file, you can now forego the manifest and upload that file directly, instead of creating a full bundle.
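
As an illustration of how small such a single-file service can be (the route and payload here are made up, not taken from the release notes), a complete manifest-less Foxx service might look like this:

```js
// index.js — a complete single-file Foxx service, uploadable as-is.
'use strict';
const createRouter = require('@arangodb/foxx/router');
const router = createRouter();

// GET /hello answers with a static JSON document.
router.get('/hello', (req, res) => {
  res.json({ hello: 'world' });
});

module.context.use(router);
```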

Further useful new features included in ArangoDB 3.2 beta

  • geo_cursor: Get documents sorted by distance to a certain point in space. You can also apply filters and limits to geo_cursor.
  • arangoexport: Export data as JSON, JSONL and even graphs as XGMML for visualisation in Cytoscape. You can find details in the Alpha2 release post.
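
A sketch of the kind of query geo_cursor enables (the collection and attribute names are made up for illustration, and a geo index on the coordinates is assumed):

```js
// arangosh sketch — the ten documents nearest to Times Square,
// sorted by distance. DISTANCE() takes (lat1, lon1, lat2, lon2).
db._query(`
  FOR r IN restaurants
    SORT DISTANCE(r.latitude, r.longitude, 40.758, -73.985)
    LIMIT 10
    RETURN r
`).toArray();
```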

Download ArangoDB 3.2 beta Community

New Enterprise Edition Features in 3.2

The Enterprise Edition of ArangoDB is focused to solve enterprise-scale problems and meet high security standards. In version 3.1, we introduced SmartGraphs to bring fast traversal times to sharded graph datasets. With SatelliteCollections, we enable the same performance boost to join operations at scale.

SatelliteCollections

With SatelliteCollections, we enable a similar performance boost for join operations at scale. From genome-sequencing projects to massive online games and beyond, we see the need for join operations involving sharded collections with sub-second response times.

With SatelliteCollections, you can define collections to shard to a cluster and collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends the requests to the DBServers involved, which then execute the query locally. With this approach, network hops during join operations on sharded collections can be avoided, and response times can be close to those of a single instance.

In the example below, collection C is large and sharded to multiple machines while the smaller satellites (S1-S5) are replicated to each machine.

We are super excited to see what you will create with this new feature and welcome any feedback you can provide. The Enterprise Edition of ArangoDB is forever free for evaluation. So feel free to take it for a spin.

Encryption at Rest

With RocksDB, you can encrypt the data stored on disk using a highly secure AES algorithm. With this upgrade, ArangoDB takes another big step towards HIPAA compliance. Even if someone steals one of your disks, they won’t be able to access the data.

Enhanced Authorisation with LDAP

Normally, users are defined and managed in ArangoDB itself. With LDAP, you can use an external server to manage your users. We have implemented a common schema which can be extended. If you have special requirements that do not fit into this schema, please let us know (#feedback32 channel). A general note: The final release will also support read-only users. With this beta, only read/write users are supported.

Download ArangoDB 3.2 beta Enterprise
 

Bug Hunt Competition

We’d love to invite all of you to try ArangoDB 3.2 beta and report all bugs you can find. We’re hoping there won’t be any, but there always are some. Everyone reporting bugs for the 3.2 beta on GitHub will take part in the Beta Bug Hunt Competition. Every GitHub issue against the 3.2 beta that the ArangoDB team marks as a bug counts. The hunter with the most reported bugs wins, and the first three runners-up will receive an honorable mention in the Bug Hunt Challenge post.

How the Bug Hunt Competition works:

  • Duration: from June 13th until June 27th
  • Bugs: Any Github issue for 3.2 beta release which is marked as bug by ArangoDB team
  • Who can participate: everyone
  • First Prize: a $200 Amazon Gift Card + ArangoDB Swag Package
  • Second Prize: $100 Amazon Gift Card + ArangoDB Swag Package
  • Runner-Up: ArangoDB Swag Package

All winners will receive an honorable mention in the bug hunt blogpost and tweets after the challenge. Please note that our team has to be able to reproduce the bug.

A good bug report therefore:

  1. Includes only the necessary information
  2. Provides a step-by-step description to reproduce the bug
  3. Provides demo data, e.g. via a gist (if necessary)

We hope you will enjoy this new release – “Gute Jagd!”

Legal

In connection with your participation in this program you agree to comply with all applicable local and national laws. You must not disrupt or compromise any data that is not your own.

ArangoDB reserves the right to discontinue this program or change or modify its terms at any time. The ultimate decision over an award — whether to give one and in what amount — is entirely at ArangoDB’s discretion.
You are exclusively responsible for paying any taxes on any reward you receive.

Vulnerabilities obtained by exploiting ArangoDB users or instances on the Internet are not eligible for an award and will result in immediate disqualification from the program.

A Reminder on Database Security


With billions of objects and people connected to the internet, and your precious data sometimes exposed publicly, security is one of the most important topics to discuss. In light of recent ransomware attacks, exposed databases and other breaches, we’d like to share a quick reminder on how to secure your ArangoDB environment.

Attacks can be prevented with the security protections built into the product. We strive to prevent possible security issues by giving appropriate reminders in our web console when authentication is disabled:

That said, control is in the hands of users: you need to use the database features correctly, and our documentation will help you do so. Here are pointers to the relevant documentation and other useful resources:

If you need help to secure your installation please do reach out to us. We’d be happy to run a security assessment and help to configure ArangoDB correctly.

Updated GraphQL Sync in ArangoDB 3.2


Just in time for the upcoming 3.2.0 release, we have updated the graphql-sync module for compatibility with graphql-js versions 0.7.2, 0.8.2, 0.9.6 and 0.10.1. The graphql-sync module allows developers to implement GraphQL backends and schemas in strictly synchronous JavaScript environments like the ArangoDB Foxx framework by providing a thin wrapper around the official GraphQL implementation for JavaScript.

As a long-term database solution, ArangoDB is committed to API stability and avoids upgrades to third-party dependencies that would result in breaking changes. This means ArangoDB will continue to bundle the graphql-js 0.6.2 compatibility version of graphql-sync.

However, in order to allow developers to keep up with the rapid development of the GraphQL language and reference implementation, services that bring their own version of graphql-sync can, starting with ArangoDB 3.2, still benefit from the built-in Foxx GraphQL integration.

Simply pass the imported module as the new `graphql` option when creating the router:

```js
const graphql = require('graphql-sync');
const createGraphQLRouter = require('@arangodb/foxx/graphql');

const graphqlSchema = new graphql.Schema({
  //...
});

module.context.use(createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true,
  graphql: graphql
}));
```

If you want to try out Foxx with the latest release of graphql-sync, grab the ArangoDB 3.2 beta and make sure to include the node_modules/graphql-sync folder in your service bundle.

ArangoDB Finalizes 4.2 Million Euro Investment Led by Target Partners


Funding to accelerate and strengthen the company’s US-based presence

ArangoDB, the company behind one of the fastest growing next generation databases, closed the final tranche of a 4.2 million Euro investment led by Munich-based venture capital firm Target Partners.

The company is developing the open-source NoSQL database ArangoDB, which combines three major data models (graph, key/value, JSON-documents) in one database and with one query language. ArangoDB allows startups and enterprises to speed up innovation cycles, simplify technology stacks and increase on-time and on-budget delivery of software projects.

Claudius Weinberger, Co-Founder and CEO, declared: “We’ve found in Target Partners a supportive and knowledgeable investor. After reaching important business milestones and expanding our customer footprint globally, we will use this funding to further accelerate and strengthen our US-based presence.”

ArangoDB is on an upward growth trajectory. The company recently hired key executives from the likes of MongoDB, MariaDB, and Oracle, further enhancing its customer facing functions. Talented engineers joined from Dell/EMC research lab to extend product leadership.

Kurt Müller, Partner at Target Partners, said: “Our investment in ArangoDB has proved to be very successful. After gaining traction on the product side, the company aggressively moved to sign landmark deals with some of the largest global organizations. This motivated us to further invest in the team and the growth.”

ArangoDB has an enthusiastic global community with over 2 million downloads so far and already close to 4,000 GitHub stargazers. Over 3,000 organizations, including Fortune 50 companies, currently use ArangoDB worldwide for various use cases including e-commerce, Internet of Things, analytics, fraud detection, personalization, and genomic research. With its newly released Enterprise Edition, the company enables enterprises to scale deployments, including high-performance graph cases.

The new Satellite Collections Feature of ArangoDB


With the new version 3.2, we have introduced a feature called “Satellite Collections”. This post explains what it is all about and how it can help you, and describes a concrete use case for which it is essential.

Background and Overview

Join operations are very useful but can be troublesome in a distributed database. This is because quite often, a join operation has to bring together different pieces of your data that reside on different machines. This leads to cluster internal communication and can easily ruin query performance. As in many contexts nowadays, data locality is very important to avoid such headaches. There is no silver bullet, because there will be many cases in which one cannot do much to improve data locality.

One particular case in which one can achieve something is a join between a very large collection (sharded across your cluster) and a small one: here one can afford to replicate the small collection to every server, and all join operations can then be executed without network communication.


This is what Satellite Collections do. In ArangoDB, a Satellite Collection is one that has exactly one shard, which is automatically replicated synchronously to every DBserver in the cluster. Furthermore, the AQL query optimizer can recognize join operations with other sharded collections and organize the query execution such that it takes advantage of the replicated data and avoids a lot of costly network traffic.

The name “Satellite Collection” comes from the mental image of a smallish collection (the satellite) which orbits around all the shards of a larger collection (the bigger planets), always close at hand for a next join operation, replacing interplanetary traffic with a quick surface orbit shuttle.

A concrete use case

Imagine that you have an IoT use case with lots of sensor data from a variety of devices. The set of individual events is clearly too large for a single server, but the number of devices is small enough to replicate to each server. If you now need to select certain events, usually restricted by their timestamp, but also filtered according to some condition on the device which produced the event, you have to bring together the data of an event with the data of its device.

One approach is to embed the device data in every event, thereby avoiding any join operation. This needs a lot more storage in the first place, but it also has other disadvantages. First of all, it contradicts the principle of data normalization, because device data is stored redundantly. Secondly (and of course related), there is no longer a single place holding all device data, so it is hard to get an overview of all devices.

The other sensible approach is to keep the device data separately, say in a smaller collection, and join an event to its device via a foreign key. This is the usual relational approach, which makes a lot of sense.

However, it requires a join operation for a query like this:

    FOR e IN events
      FILTER e.timestamp >= "123123123223" && e.timestamp <= "125212323232"
      FOR d IN devices
        FILTER e.device == d._key
        FILTER d.firmwareVersion == "17"
        RETURN e

Obviously, we assume that the large events collection has a sorted index on the timestamp attribute.

This fetches all events in a prescribed time range whose corresponding device has firmware version 17, which is a query that could very well occur in practice. There are lots of similar use cases, the crucial ingredient is that a join between a large and a small collection is needed.

Performance gain

Let’s see what the query engine has to do to execute the above query.

It almost certainly wants to use the index on timestamp, so, on each shard of the events collection it uses this index to find the corresponding events. For each such event found it now has to find the corresponding device to do the filtering on the firmwareVersion attribute.

In the non-satellite case the query engine has to take all found events, move them to the place where the small devices collection is held, then perform a lookup for each such event, and then either discard the event or put it into the result set, if the firmware version matches.
ArangoDB will even collect all results from all shards on the coordinator and then send them on to the DBserver that holds the devices. The result set will be sent back to the coordinator and then to the client.

Let’s analyze this with a few concrete numbers. Assume that there are a billion events and that a million of them lie in the time range we are interested in. Assume furthermore that about one in 1,000 devices has the particular firmware version and that the events are evenly distributed across the devices.

Then the above query will find a million events out of a billion, distributed across the shards of the events collection. Altogether, it will send a million events across the network to the coordinator (each from one of the shards) and then send all of them on to the DBserver holding the devices. There it will perform a million lookups locally and discard 999 out of every 1,000 events, producing 1,000 results, which are sent back to the coordinator. Altogether, we have sent a million events over the wire twice; this is expensive.

If the devices collection is a Satellite Collection, then the data is resident on each server, that is, close to each shard of the events collection. Therefore, the query optimizer can select the events in the time range in question on each shard, and directly perform the lookup for the device, without any network traffic. It discards most of them because of the firmware version and only has to send the 1000 results back to the coordinator and then to the client. It could even perform this in a completely parallel fashion, but this is a future performance enhancement.

That is, the net savings effect, in this case, is not to have to send a million events across the wire twice. If you assume that an event uses 500 bytes, then this is network transfer of approximately 1GB. On a fast ethernet this takes approximately 100s. On gigabit ethernet it is only 8s under optimal conditions, but this is still noticeable.
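
The arithmetic above is easy to check; the following snippet merely restates the assumptions from the text (one million matching events, 500 bytes each, sent over the wire twice):

```javascript
// Back-of-the-envelope check of the network savings described above.
const matchingEvents = 1e6;   // events in the queried time range
const bytesPerEvent = 500;    // assumed average event size
const wireTrips = 2;          // shards -> coordinator -> devices server

const totalBytes = matchingEvents * bytesPerEvent * wireTrips; // 1e9 bytes ≈ 1 GB

const fastEthernetBps = 100e6 / 8; // 100 Mbit/s in bytes per second
const gigabitBps = 1e9 / 8;        // 1 Gbit/s in bytes per second

const secondsFast = totalBytes / fastEthernetBps;  // 80 s, i.e. roughly 100 s
const secondsGigabit = totalBytes / gigabitBps;    // 8 s
```
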

Our experiments actually confirm these savings in network traffic and time. In the coming weeks we will publish a benchmark including all the data used and measurements done, so watch out for more details.

How to try it out

This is fairly easy. First of all, you have to know that Satellite Collections are a feature of the Enterprise Edition. Once you have a cluster running it (we offer a free trial license), you simply create your collections as usual, creating the smaller one with "numberOfShards": 1 and "replicationFactor": "satellite". From then on, your join operations should be faster, provided the query optimizer discovers the situation and acts accordingly. Use the explain feature in the UI or in JavaScript to check the execution plan used. If in doubt, we are always happy to help diagnose what is going on.
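
In arangosh, this setup boils down to two create calls. A sketch (the collection names and shard count are illustrative, and a running Enterprise cluster is assumed):

```js
// The large collection is sharded across the cluster as usual...
db._create("events", { numberOfShards: 32 });

// ...while the small one becomes a satellite: exactly one shard,
// synchronously replicated to every DBserver.
db._create("devices", { numberOfShards: 1, replicationFactor: "satellite" });
```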

Obviously, a collection that is replicated synchronously to all DBservers incurs a higher latency on writes. However, we would assume that such a Satellite Collection receives far fewer writes than the larger events collection.

Let me close with another word of warning: Satellite Collections are a nice and very useful feature, but they need planning to unfold their full potential. Please contact us, if you have a particular use-case in mind.

ArangoDB 3.2 GA RocksDB, Pregel, Fault-Tolerant Foxx & Satellite Collections


We are pleased to announce the release of ArangoDB 3.2. Get it here. After an unusually long hackathon, we eliminated two large roadblocks, added a long overdue feature and integrated an interesting new one into this release. Furthermore, we’re proud to report that we increased the performance of ArangoDB by 35% on average, while at the same time reducing the memory footprint compared to version 3.1. In combination with a greatly improved cluster management, we think ArangoDB 3.2 is by far our best work. (See the release notes for more details.)

One key goal of ArangoDB has always been to provide a rock solid platform for building ideas. Our users should always feel safe to try new things with minimal effort by relying on ArangoDB. Today’s 3.2 release is an important milestone towards this goal. We’re excited to release such an outstanding product today.

RocksDB

With the integration of Facebook’s RocksDB as a first pluggable storage engine in our architecture, users can now work with as much data as fits on disk. Together with the better locking behavior of RocksDB (i.e., document-level locks), write-intensive applications will see significant performance improvements. With no memory limit and only document-level locks, we have eliminated two roadblocks for many users. If one chooses RocksDB as the storage engine, everything, including indexes, will persist on disk. This will significantly reduce start-up time.
See this how-to on “Comparing new RocksDB and mmfiles engine” to test the new engine for your operating system and use case.

Pregel

Distributed graph processing was a missing feature in ArangoDB’s graph toolbox. We’re willing to admit that, especially since we have now filled this gap by implementing the Pregel computing model.

With PageRank, Community Detection, Vertex Centrality Measures and further algorithms, ArangoDB can now be used to gain high-level insights into the hidden characteristics of graphs. For instance, you can use graph processing capabilities to detect communities. You can then use the results to shard your data efficiently to a cluster and thereby enable SmartGraph usage to its full potential. We’re confident that with the integration of distributed graph processing, users will now have one of the most complete graph toolsets available in a single database.

Test the new Pregel integration with this Community Detection Tutorial and further sharpen advanced graph skills with the new tutorial about Using SmartGraphs in ArangoDB.
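
For reference, a Pregel run is started from arangosh via the @arangodb/pregel module. A hedged sketch (the graph name and option values are illustrative):

```js
// arangosh sketch — assumes a server with a graph named "social".
const pregel = require("@arangodb/pregel");

const executionId = pregel.start("pagerank", "social", {
  maxGSS: 50,          // cap on global supersteps
  resultField: "rank"  // vertex attribute to store each result in
});

// Poll until the run reports that it is done:
pregel.status(executionId);
```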

Fault-Tolerant Foxx Services

Many people already enjoy using our Foxx JavaScript framework for data-centric microservices. Defining your own highly configurable HTTP routes with full access to the ArangoDB core on the C++ level can be pretty handy. In version 3.2, our Foxx team completely rewrote the management internals to support fault-tolerant Foxx services. This ensures multi-coordinator clusters will always keep their services in sync, and new coordinators are fully initialized, even when all existing coordinators are unavailable.

Test the new fault-tolerant Foxx yourself or learn Foxx by following the brand new Foxx tutorial.

Powerful Graph Visualization

Managing and processing graph data may not be enough; visualizing insights is often just as important. No worries: with ArangoDB 3.2, this can be handled easily. You can use the open-source option via arangoexport to export the data and then import it into Cytoscape (check out the tutorial).

Or you can just plug in the brand new Keylines 3.5 via Foxx and install an on-demand connection. With this option, you will always have the latest data visualized neatly in Keylines without any export/import hassle. Just follow this tutorial to get started with ArangoDB and Keylines.

Read-Only Users

To enhance basic user management in ArangoDB, we added Read-Only Users. The rights of these users can be defined on database and collection levels. On the database level, users can be given administrator rights, read access or denied access. On the collection level, within a database, users can be given read/write, read only or denied access. If a user is not given access to a database or a collection, the databases and collections won’t be shown to that user. Take the tutorial about new User Management.
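
A sketch of what this looks like with the @arangodb/users module in arangosh (the user, database and collection names are made up, and the exact calls may differ slightly between versions):

```js
// arangosh sketch — create a user and grant read-only access.
const users = require("@arangodb/users");

users.save("reporting", "change-me");            // name and password are examples
users.grantDatabase("reporting", "sales", "ro"); // read-only on the database level
users.grantCollection("reporting", "sales", "orders", "ro"); // read-only on one collection
```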

We also improved geo queries, since these are becoming more important to our community. With geo_cursor, it’s now possible to sort documents by distance to a certain point in space (take the tutorial). This makes queries like “Where can I eat vegan within a one-mile radius of Times Square?” simple. We plan to add support for other geo-spatial functions (e.g., polygons, multi-polygons) in the next minor release, so watch for that.

ArangoDB 3.2 Enterprise Edition: More Room for Secure Growth

The Enterprise Edition of ArangoDB is focused on solving enterprise-scale problems and secure work with data. In version 3.1, we introduced SmartGraphs to bring fast traversal response times to sharded datasets in a cluster. We also added auditing and enhanced encryption control. Download ArangoDB Enterprise Edition (forever free evaluation).

Working closely with one of our larger clients, we further explored and improved an idea we had about a year ago. Satellite Collections is the exciting result of this collaboration. It’s designed to enable faster join operations when working with sharded datasets. To avoid expensive network hops during join processing across machines, one ‘only’ has to find a way to execute the joins locally.

With Satellite Collections, you can define collections to shard to a cluster, as well as set collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends requests to the DBServers involved, which then execute the query locally. The DBservers will then send the partial results back to the Coordinator which puts together the final result. With this approach, network hops during join operations on sharded collections can be avoided, hence query performance is increased and network traffic reduced. This can be more easily understood with an example. In the schema below, collection C is sharded to multiple machines, while the smaller satellites (i.e., S1 – S5) are replicated to each machine, orbiting the shards of C.

Use cases for Satellite Collection are plentiful. In this more in-depth blog post, we use the example of an IoT case. Personalized patient treatment based on genome sequencing analytics is another excellent example where efficient join operations involving large datasets, can help improve patient care and save infrastructure costs.

Security Enhancements

From the very beginning of ArangoDB, we have been concerned with security. AQL is already protected from injections. By using Foxx, sensitive data can be contained within in a database, with only the results being passed to other systems, thus minimizing security exposure. But this is not always enough to meet enterprise scale-security requirements. With version 3.1, we introduced Auditing and Enhanced Encryption Control and with ArangoDB 3.2, we added even more protection to safeguard data.

Encryption at Rest

With RocksDB, you can encrypt the data stored on disk using a highly secure AES algorithm. Even if someone steals one of your disks, they won’t be able to access the data. With this upgrade, ArangoDB takes another big step towards HIPAA compliance.

Enhanced Authentication with LDAP

Normally, users are defined and managed in ArangoDB itself. With LDAP, you can use an external server to manage your users. We have implemented a common schema which can be extended. If you have special requirements that don’t fit into this schema, please let us know.

Conclusion & You

The entire ArangoDB team is proud to release version 3.2 of ArangoDB — this should not be a surprise considering all of the improvements we made. We hope you will enjoy the upgrade. We invite you to take ArangoDB 3.2 for a spin and to let us know what you think. We look forward to your feedback!
Download ArangoDB 3.2


Thank you!


“By developers for developers” has been our internal motto since the first lines of ArangoDB code. Building an open-source project at such level of complexity and at a market competitive standard, undoubtedly puts a lot of pressure and almost solely relies on the support and trust of the community.

Every victory counts, be it a small token of appreciation or a big success – it’s what inspires us and keeps us going forward. A while ago, on one of those rainy gray days here in Cologne, receiving over 10 stars put a smile on the faces of our whole team, motivating us to hack harder, brainstorm, fix bugs, build, release…

Today we have reached the 4,000-star mark on GitHub!

As we reach this milestone on our continued journey “to the stars”, we would like to thank you all for the trust and support you’ve given us. It is largely thanks to your feedback and support that we are able to deliver a great product and see a large international community adopt it around the globe.

Big thanks to you all from the ArangoDB family!

“To infinity and beyond!”

Webinar: Building powerful apps with ArangoDB & KeyLines


Wednesday, September 6th (5PM CEST/11AM ET/8AM PT) – Join the webinar here

As data gets bigger, faster and more complex, you need to arm yourself with the best tools. In this webinar, we’ll see how KeyLines and ArangoDB combine to create powerful and intuitive data analysis platforms.

Luca Olivari, President of ArangoDB, will introduce the world’s most popular multi-model database. We’ll see how its ‘triple model / single query’ approach helps developers build datastores with outstanding flexibility and power, without the risk and complexity of multiple stacks.

Next, Christian Miles, Technical Sales Manager at Cambridge Intelligence, will show how graph visualization makes that back-end power accessible to users. He’ll use a complex IT network visualization to demonstrate how a KeyLines component can help your users cut through complexity and overcome data scale to reveal insights.

By the end of the webinar, you’ll understand the advantages of a KeyLines + ArangoDB combination, and be ready to get started. Join the webinar and submit all your questions here.

Wednesday, September 6th (5PM CEST/11AM ET/8AM PT)

Join the webinar

Pronto Move Shard


In July, Adobe announced their plan to end Flash’s life around 2020.
As HTML5 has progressed, and given Flash’s long history of critical security vulnerabilities, this is, technologically speaking, certainly the right decision. However, it also made me a bit sad.

Flash was the first technology that brought interactivity to the web. We tend to forget how static the web was in the early 2000s. Flash brought life to the web, and there were plenty of silly trash games and animations which I really enjoyed at the time. As a homage to the age of trashy Flash games, I created a game which resembles the games of that era:

Game info

To start the game, select a “shard” in the top right corner. The shard will appear as a yellow “dot” and the servers as “cards” on the table. Finally, upon pressing the “Pronto move shard” button, the dealer will move the shard around; once he is finished, you have to guess where it is. More on shards and servers in the following sections. Make sure to enable sound (and put your headphones on if you are in an open office).

Please note that the game was not created in a responsive fashion. If you want you can open it in a new window.

Any similarities to a German trash TV show from the 80s are totally intended.

For the best and the most authentic experience you should have the Comic Sans MS font on your system. Windows and Mac should have it out of the box.

What is the game about?

The version above is the standalone version. The real version is only playable on an ArangoDB Cluster:

It will move around data in your cluster and you have to guess where the shard was being moved to. Please note that this is not at all a fake process. It will indeed move data around within the cluster!

A short introduction to some ArangoDB features

Foxx

The game has been implemented as a so-called Foxx app. ArangoDB allows you to run microservices within the database:

Install service

Once installed, the service is hosted inside the database. This offers several advantages:

  1. Data locality: your logic sits alongside the data and executes a lot faster, as you save the network overhead
  2. You have access to the internal ArangoDB APIs

This game is only possible because we have access to the ArangoDB cluster API and to the Agency. Data locality is not really relevant in this case, as we are not accessing any stored data; it is, however, the main benefit most users are after when building Foxx apps.
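As a minimal sketch of such a service (the route and payload are invented for illustration; this code runs inside ArangoDB, not in a standalone Node.js process):

```js
'use strict';
// Foxx runs inside ArangoDB: the router is registered with the service context
const createRouter = require('@arangodb/foxx/router');
const router = createRouter();
module.context.use(router);

// A hypothetical endpoint, answered directly by the database process,
// with no extra application server or network hop involved.
router.get('/status', (req, res) => {
  res.json({ alive: true });
});
```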

Cluster

In a clustered setup, ArangoDB organizes its data in so-called shards. The data is partitioned using a shard key and stored on multiple database servers. When creating a collection (collections are our “tables”), you can control the numberOfShards and the replicationFactor.

numberOfShards determines how many shards are created, and replicationFactor specifies how many replicas are kept across the cluster. By setting the replicationFactor, you allow the cluster to fail over to a different machine in case of an outage and achieve high availability.
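In arangosh, connected to a cluster coordinator, these options are passed at creation time (the collection name and numbers here are just examples):

```js
// 8 shards distributed over the DBservers, each shard kept in 2 copies
db._create("events", { numberOfShards: 8, replicationFactor: 2 });
```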

Supervision

As a user, you don’t really have to worry about failovers: the so-called Supervision, running within the Agency, manages the cluster all by itself and organizes failovers whenever a cluster node goes down. Apart from the failover case, there is also managed shard movement. This comes in handy if you, for example, want to upgrade a machine or manually move traffic away from some nodes.

Move shard

This is what this game will use internally and when deployed on the cluster it will look like this:

Pronto move shard

For the game, a shard is considered moveable when there are at least 3 possible servers that the shard could be moved to. If a shard has a replicationFactor of 3 and you have 5 servers in total, the shard is not considered moveable; if you add another server, it becomes moveable.
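The rule can be restated as a tiny helper function (my own restatement for illustration, not code from the game):

```javascript
// A shard is "moveable" in the game if at least 3 servers that do not
// already hold a replica of it are available as a move target.
function isMoveable(totalServers, replicationFactor) {
  const freeServers = totalServers - replicationFactor;
  return freeServers >= 3;
}

const fiveServers = isMoveable(5, 3); // false: only 2 servers without a replica
const sixServers = isMoveable(6, 3);  // true: 3 servers without a replica
```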

After pressing the “Pronto move shard” button the Foxx endpoint starting the move will choose a random free server.

It will then periodically check whether the move job has finished; while it is still running, it fakes some movements across the servers (in the background there is, of course, just ONE move job happening).

When the job has finally finished, it simulates a few more movements (it shouldn’t be that simple 😉 ) and ensures the shard is properly placed. Then you can make your guess.

Please note that it is not advisable to play this game on a production cluster. The game was put together very quickly, just for illustration purposes, and there are many edge cases it does not handle (like servers going down during a move). Also, moving shards around will put a lot of pressure on your cluster, especially if there is a lot of data in the shard.

Code

The code is available on GitHub (please note that the code quality also resembles the early days). Thanks also to Michael, our graph expert, for letting me use his photo for this game.

Your move

The game is installable via our Foxx store. So go ahead: install an ArangoDB cluster via DC/OS, via our starter or manually, install pronto-move-shard and make your guess. Be sure to reach out in case of questions.

If you like Foxx and have built something with it, feel free to create a pull request to our registry, so that others can install your experiments too.

For more information about the cluster architecture and how you can achieve high availability be sure to check out the chapter about the cluster architecture in our documentation.

VelocyStream 1.1 – async binary protocol


With the 3.2 release, ArangoDB comes with version 1.1 of the binary protocol VelocyStream. VelocyStream is a bi-directional async binary protocol which supports sending messages pipelined, multiplexed, uni-directional or bi-directional. The messages themselves are VelocyPack objects.

The story of VelocyStream began with version 3.0 of ArangoDB, which introduced VelocyPack as the new internal storage format of ArangoDB. VelocyPack as a binary format – which stores a superset of JSON, is more compact and has fast attribute lookup – allows us to use the same byte sequence of data for storage, read-only work and, last but not least, transport. Next to the advantages already present in 3.0, like reduced runtime memory and fewer data conversions, this created all the conditions needed to implement a protocol that transports our storage format in an optimized form directly between ArangoDB and client applications.

With 3.1, the first version of VelocyStream found its way into ArangoDB. At the same time, an overhauled version of the official Java driver was released which used VelocyStream as its transport protocol. With this release, we were able to show that our vision for VelocyStream was heading in the right direction and that we were already able to see performance gains with it. In addition, a second Java driver was built to provide an API for asynchronous operations, which became possible due to the async nature of VelocyStream.

For the 3.2 release, we improved VelocyStream further with version 1.1 and published the specification so that developers of third-party drivers can implement it too. We also released our C++ driver and Go driver with VelocyStream support, and an updated JavaScript driver is on its way.

How does it work?

To understand how VelocyStream achieves its benefits over HTTP, let’s take a look at its structure. VelocyStream messages consist of one or more VelocyPacks. Each message has a unique id and is split into one or more chunks, depending on the size of the message. These chunks can be sent in parallel over one connection and are then assembled back into consumable VelocyPack on the receiver. Unlike HTTP, this prevents time-intensive operations on the server from blocking the whole communication with the client.
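To make the chunking idea concrete, here is a toy simulation in plain JavaScript (this is not the actual wire format: real VelocyStream chunks carry binary VelocyPack and more header fields). Messages are split into chunks tagged with a message id, chunks of different messages may interleave on one connection, and the receiver reassembles each message once all of its chunks have arrived:

```javascript
// Split a payload into fixed-size chunks tagged with the message id.
function toChunks(messageId, payload, chunkSize) {
  const chunks = [];
  const total = Math.ceil(payload.length / chunkSize);
  for (let i = 0; i < total; i++) {
    chunks.push({
      messageId,
      index: i,
      total,
      data: payload.slice(i * chunkSize, (i + 1) * chunkSize)
    });
  }
  return chunks;
}

// The receiver collects chunks per message id and returns the full
// message once every chunk of that message is in, null otherwise.
function makeReceiver() {
  const pending = new Map();
  return function receive(chunk) {
    if (!pending.has(chunk.messageId)) pending.set(chunk.messageId, []);
    const parts = pending.get(chunk.messageId);
    parts[chunk.index] = chunk.data;
    if (parts.filter(p => p !== undefined).length !== chunk.total) return null;
    pending.delete(chunk.messageId);
    return parts.join('');
  };
}

// Two messages multiplexed over one "connection":
const a = toChunks(1, 'first message', 5);
const b = toChunks(2, 'second message', 5);
const receive = makeReceiver();
const results = [];
for (const chunk of [a[0], b[0], a[1], b[1], a[2], b[2]]) {
  const msg = receive(chunk);
  if (msg !== null) results.push(msg);
}
// results is now ['first message', 'second message']
```

Because neither message has to wait for the other to finish, a long-running request cannot block the connection the way a slow HTTP response would.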

An additional advantage of VelocyStream is the reduced overhead when it comes to authentication. Unlike HTTP, where every request needs to deliver the user credentials, with VelocyStream only the connection has to be authenticated after opening it: a specific message including the user credentials is sent once, and no following message has to provide any further credentials.

On the downside, VelocyStream does not work together with an HTTP proxy. As an alternative, ArangoDB since version 3.1 supports communicating over HTTP with VelocyPack as the content type. This won’t give you all the benefits of VelocyStream, but it still spares you the data conversions in the transport layer.

Webinar: Use ArangoDB Agency as fault-tolerant persistent data store


Join our Senior Distributed Systems Engineer, Kaveh Vahedipour, to learn more about the ArangoDB Agency on September 19th, 2017 (6PM CEST/12PM ET/9AM PT). Join the webinar here.

Distributed systems have become the standard topology on which modern applications run. While the advantages of distributing workload, both for performance and for fault tolerance, are obvious, configuring such a deployment flexibly at runtime is non-trivial.

ArangoDB clusters are no different in that regard. A potentially large database cluster’s configuration is manipulated at runtime by the addition, alteration and removal of collections, indexes, and even servers. All servers need to trust a fault-tolerant, centralized configuration tree, which we call “the Agency” in Arango-speak.

The Agency is in effect a cluster within the cluster, with very special qualities. It is a RAFT-based store for objects and services, which not only holds the larger cluster’s brain but also handles server monitoring and failover.

Distributed system webinar arangodb

This webinar:

  • offers a brief introduction to the RAFT consensus protocol and the specifics of its implementation in ArangoDB
  • discusses and shows how users can use the ArangoDB Agency for their own service deployments
  • enables you to administer the ArangoDB cluster’s Agency
  • illuminates performance aspects and related issues that need to be considered

September 19th, 2017 (6PM CEST/12PM ET/ 9AM PT)

Join the webinar

Updated GraphQL Sync in ArangoDB 3.2


Just in time for the upcoming 3.2.0 release, we have updated the graphql-sync module for compatibility with graphql-js versions 0.7.2, 0.8.2, 0.9.6 and 0.10.1. The graphql-sync module allows developers to implement GraphQL backends and schemas in strictly synchronous JavaScript environments like the ArangoDB Foxx framework by providing a thin wrapper around the official GraphQL implementation for JavaScript.

As a long-term database solution, ArangoDB is committed to API stability and avoids upgrades to third-party dependencies that would result in breaking changes. This means ArangoDB will continue to bundle the graphql-js 0.6.2 compatibility version of graphql-sync.

However, in order to allow developers to keep up with the rapid development of the GraphQL language and reference implementation, starting with ArangoDB 3.2, services that bring their own version of graphql-sync can still benefit from the built-in Foxx GraphQL integration.

Simply pass the imported module as the new graphql option when creating the router:

const graphql = require('graphql-sync');
const createGraphQLRouter = require('@arangodb/foxx/graphql');
const graphqlSchema = new graphql.GraphQLSchema({
  // ...
});
module.context.use(createGraphQLRouter({
  schema: graphqlSchema,
  graphiql: true,
  graphql: graphql
}));

If you want to try out Foxx with the latest release of graphql-sync, grab the ArangoDB 3.2 beta and make sure to include the node_modules/graphql-sync folder in your service bundle.

ArangoDB Finalizes 4.2 Million Euro Investment Led by Target Partners


Funding to accelerate and strengthen the company’s US-based presence

ArangoDB, the company behind one of the fastest growing next generation databases, closed the final tranche of a 4.2 million Euro investment led by Munich-based venture capital firm Target Partners.

The company is developing the open-source NoSQL database ArangoDB, which combines three major data models (graph, key/value, JSON-documents) in one database and with one query language. ArangoDB allows startups and enterprises to speed up innovation cycles, simplify technology stacks and increase on-time and on-budget delivery of software projects.

Claudius Weinberger, Co-Founder and CEO, declared: “We’ve found in Target Partners a supportive and knowledgeable investor. After reaching important business milestones and expanding our customer footprint globally, we will use this funding to further accelerate and strengthen our US-based presence.”

ArangoDB is on an upward growth trajectory. The company recently hired key executives from the likes of MongoDB, MariaDB, and Oracle, further enhancing its customer-facing functions. Talented engineers joined from the Dell/EMC research lab to extend product leadership.

Kurt Müller, Partner at Target Partners, said: “Our investment in ArangoDB has proved to be very successful. After gaining traction on the product side, the company aggressively moved to sign landmark deals with some of the largest global organizations. This motivated us to further invest in the team and the growth.”

ArangoDB has an enthusiastic global community with over 2 million downloads so far and already close to 4,000 GitHub stargazers. Over 3,000 organizations, including Fortune 50 companies, currently use ArangoDB worldwide for various use cases including e-commerce, Internet of Things, analytics, fraud detection, personalization, and genomic research. With its newly released Enterprise Edition, the company enables enterprises to scale deployments, including high-performance graph cases.


The new Satellite Collections Feature of ArangoDB


With the new version 3.2 we have introduced a feature called “Satellite Collections”. This post explains what it is all about, how it can help you, and presents a concrete use case for which it is essential.

Background and Overview

Join operations are very useful but can be troublesome in a distributed database. This is because quite often a join operation has to bring together pieces of your data that reside on different machines. This leads to cluster-internal communication and can easily ruin query performance. As in many contexts nowadays, data locality is very important to avoid such headaches. There is no silver bullet: in many cases one cannot do much to improve data locality.

One particular case in which something can be achieved is a join operation between a very large collection (sharded across your cluster) and a small one, because then one can afford to replicate the small collection to every server, and all join operations can be executed without network communication.


This is what the Satellite Collections do. In ArangoDB, a Satellite Collection is one that has exactly one shard which is automatically replicated synchronously to every DBserver in the cluster. Furthermore, the AQL query optimizer can recognize join operations with other sharded collections and organize the query execution such that it takes advantage of the replicated data and avoid a lot of costly network traffic.

The name “Satellite Collection” comes from the mental image of a smallish collection (the satellite) which orbits around all the shards of a larger collection (the bigger planets), always close at hand for a next join operation, replacing interplanetary traffic with a quick surface orbit shuttle.

A concrete use case

Imagine that you have an IoT use case with lots of sensor data from a variety of devices. The set of individual events is clearly too large for a single server, but the number of devices is small enough to replicate to each server. If you now need to select certain events, usually restricted by their timestamp, but also filtered according to some condition on the device which produced the event, you have to bring together the data of an event with the data of its device.

One approach is to embed the device data in every event, thereby avoiding any join operation. This needs a lot more storage in the first place, but it also has other disadvantages. First of all, it contradicts the principle of data normalization, because device data is stored redundantly. Secondly (and relatedly), there is no longer a single place holding all device data, so it is hard to get an overview of all devices.

The other sensible approach is to keep the device data separately, say in a smaller collection, and join an event to its device via a foreign key. This is the usual relational approach, which makes a lot of sense.
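As a sketch, documents in the two collections could look like this (the attribute values are invented; the attribute names are the ones used in the query below):

```js
// devices (small collection):
{ "_key": "dev42", "firmwareVersion": "17" }

// events (large, sharded collection), pointing at its device via a foreign key:
{ "device": "dev42", "timestamp": "123123124000" }
```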

However, it requires a join operation for a query like this:

FOR e IN events
  FILTER e.timestamp >= "123123123223" && e.timestamp <= "125212323232"
  FOR d IN devices
    FILTER e.device == d._key
    FILTER d.firmwareVersion == "17"
    RETURN e

Obviously, we assume that the large events collection has a sorted index on the timestamp attribute.

This fetches all events in a prescribed time range whose corresponding device has firmware version 17, which is a query that could very well occur in practice. There are lots of similar use cases, the crucial ingredient is that a join between a large and a small collection is needed.

Performance gain

Let’s see what the query engine has to do to execute the above query.

It almost certainly wants to use the index on timestamp, so on each shard of the events collection it uses this index to find the matching events. For each event found, it then has to find the corresponding device in order to filter on the firmwareVersion attribute.

In the non-satellite case, the query engine has to take all the events found, move them to the server where the small devices collection is held, perform a lookup for each such event, and then either discard the event or put it into the result set if the firmware version matches.
In fact, ArangoDB will collect all results from all shards on the coordinator and then send them on to the DBserver that holds the devices. The result set is then sent back to the coordinator and on to the client.

Let’s analyze this with a few concrete numbers. Assume that there are a billion events and that a million of them lie in the time range we are interested in. Assume furthermore that about one in 1,000 devices has the particular firmware version and that the events are evenly distributed across the devices.

The above query will then find a million events out of a billion, distributed across the shards of the events collection. Altogether, it sends a million events across the network to the coordinator (each from one of the shards), and then sends all of them on to the DBserver holding the devices. There it performs a million lookups locally and discards 999 out of every 1,000 events because their device does not have the right firmware, producing 1,000 results, which are sent back to the coordinator. Altogether, we have sent a million events over the wire twice; this is expensive.

If the devices collection is a Satellite Collection, then the data is resident on each server, that is, close to each shard of the events collection. Therefore, the query optimizer can select the events in the time range in question on each shard, and directly perform the lookup for the device, without any network traffic. It discards most of them because of the firmware version and only has to send the 1000 results back to the coordinator and then to the client. It could even perform this in a completely parallel fashion, but this is a future performance enhancement.

That is, the net savings effect in this case is not having to send a million events across the wire twice. If you assume that an event takes 500 bytes, this is a network transfer of approximately 1 GB. On Fast Ethernet this takes approximately 100 s; on Gigabit Ethernet it is only 8 s under optimal conditions, but that is still noticeable.
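These numbers can be checked with a quick back-of-the-envelope calculation (link speeds idealized, protocol overhead ignored):

```javascript
const events = 1e6;        // matching events in the time range
const bytesPerEvent = 500;
const transfers = 2;       // shards -> coordinator, coordinator -> devices server

const totalBytes = events * bytesPerEvent * transfers; // 1e9 bytes, roughly 1 GB

// transfer time in seconds = bits / (bits per second)
const fastEthernetSeconds = totalBytes * 8 / 100e6; // 100 Mbit/s: 80 s raw
const gigabitSeconds = totalBytes * 8 / 1e9;        //   1 Gbit/s:  8 s raw
```

The raw 80 s on Fast Ethernet ends up near the quoted 100 s once protocol overhead is taken into account.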

Our experiments actually confirm these savings in network traffic and time. In the coming weeks we will publish a benchmark including all the data used and measurements done, so watch out for more details.

How to try it out

This is fairly easy. First of all, you have to know that Satellite Collections are a feature of the Enterprise Edition. Once you have a cluster running it (we offer a free trial license), you simply create your collections as usual, creating the smaller one with "numberOfShards": 1 and "replicationFactor": "satellite". From then on, your join operations should be faster, provided the query optimizer discovers the situation and acts accordingly. Use the explain feature in the UI or in JavaScript to check the execution plan used. If in doubt, we are always happy to help diagnose what is going on.
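In arangosh on such a cluster, this amounts to something like (the collection names are just examples):

```js
// the big collection, sharded across the cluster
db._create("events", { numberOfShards: 16 });
// the small satellite, replicated to every DBserver
db._create("devices", { numberOfShards: 1, replicationFactor: "satellite" });
```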

Obviously, a collection that is replicated synchronously to all DBservers has a higher write latency. However, a Satellite Collection will typically see far less write traffic than the large collection it orbits – in our example, device data changes much more rarely than new events arrive.

Let me close with another word of warning: Satellite Collections are a nice and very useful feature, but they need planning to unfold their full potential. Please contact us if you have a particular use case in mind.

ArangoDB 3.2 GA RocksDB, Pregel, Fault-Tolerant Foxx & Satellite Collections


We are pleased to announce the release of ArangoDB 3.2. Get it here. After an unusually long hackathon, we eliminated two large roadblocks, added a long overdue feature and integrated an interesting new one into this release. Furthermore, we’re proud to report that we increased performance of ArangoDB on average by 35%, while at the same time reduced the memory footprint compared to version 3.1. In combination with a greatly improved cluster management, we think ArangoDB 3.2 is by far our best work. (see release notes for more details)

One key goal of ArangoDB has always been to provide a rock-solid platform for building ideas. Our users should always feel safe to try new things with minimal effort by relying on ArangoDB. Today’s 3.2 release is an important milestone towards this goal. We’re excited to release such an outstanding product today.

RocksDB

With the integration of Facebook’s RocksDB, as a first pluggable storage engine in our architecture, users can now work with as much data as fits on disk. Together with the better locking behavior of RocksDB (i.e., document-level locks), write intensive applications will see significant performance improvements. With no memory limit and only document-level locks, we have eliminated two roadblocks for many users. If one chooses RocksDB as the storage engine, everything, including indexes will persist on disk. This will significantly reduce start-up time.
See this how-to on “Comparing new RocksDB and mmfiles engine” to test the new engine for your operating system and use case.

Pregel

Distributed graph processing was a missing feature in ArangoDB’s graph toolbox. We’re willing to admit that, especially since we managed to fill this need by implementing the Pregel computing model.

With PageRank, Community Detection, Vertex Centrality Measures and further algorithms, ArangoDB can now be used to gain high-level insights into the hidden characteristics of graphs. For instance, you can use graph processing capabilities to detect communities. You can then use the results to shard your data efficiently to a cluster and thereby enable SmartGraph usage to its full potential. We’re confident that with the integration of distributed graph processing, users will now have one of the most complete graph toolsets available in a single database.

Test the new pregel integration with this Community Detection Tutorial and further sharpen advanced graph skills with this new tutorial about Using SmartGraphs in ArangoDB.

Fault-Tolerant Foxx Services

Many people already enjoy using our Foxx JavaScript framework for data-centric microservices. Defining your own highly configurable HTTP routes with full access to the ArangoDB core on the C++ level can be pretty handy. In version 3.2, our Foxx team completely rewrote the management internals to support fault-tolerant Foxx services. This ensures multi-coordinator clusters will always keep their services in sync, and new coordinators are fully initialized, even when all existing coordinators are unavailable.

Test the new fault-tolerant Foxx yourself or learn Foxx by following the brand new Foxx tutorial.

Powerful Graph Visualization

Managing and processing graph data may not be enough; visualizing insights is often just as important. No worries – with ArangoDB 3.2, this can be handled easily. You can use the open-source option via arangoexport to export the data and then import it into Cytoscape (check out the tutorial).

Or you can just plug in the brand new Keylines 3.5 via Foxx and install an on-demand connection. With this option, you will always have the latest data visualized neatly in Keylines without any export/import hassle. Just follow this tutorial to get started with ArangoDB and Keylines.

Read-Only Users

To enhance basic user management in ArangoDB, we added Read-Only Users. The rights of these users can be defined on database and collection levels. On the database level, users can be given administrator rights, read access or denied access. On the collection level, within a database, users can be given read/write, read only or denied access. If a user is not given access to a database or a collection, the databases and collections won’t be shown to that user. Take the tutorial about new User Management.
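In arangosh, these levels can be set via the users module (the user, database and collection names here are invented; the levels are "rw", "ro" and "none"):

```js
const users = require('@arangodb/users');
// read-only access to the whole database
users.grantDatabase('reporting', 'shop', 'ro');
// read-only access to a single collection within it
users.grantCollection('reporting', 'shop', 'orders', 'ro');
```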

We also improved geo queries, since these are becoming more important to our community. With geo_cursor, it’s now possible to sort documents by distance to a certain point in space (Take the tutorial). This makes queries like “Where can I eat vegan within a one-mile radius of Times Square?” simple. We plan to add support for other geo-spatial functions (e.g., polygons, multi-polygons) in the next minor release. So watch for that.
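A query in the spirit of the Times Square example could look like this (the collection and attribute names are invented; AQL’s DISTANCE() returns meters):

```
FOR r IN restaurants
  FILTER r.cuisine == "vegan"
  LET d = DISTANCE(r.latitude, r.longitude, 40.758, -73.9855)
  FILTER d <= 1609
  SORT d ASC
  RETURN r
```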

ArangoDB 3.2 Enterprise Edition: More Room for Secure Growth

The Enterprise Edition of ArangoDB is focused on solving enterprise-scale problems and secure work with data. In version 3.1, we introduced SmartGraphs to bring fast traversal response times to sharded datasets in a cluster. We also added auditing and enhanced encryption control. Download ArangoDB Enterprise Edition (forever free evaluation).

Working closely with one of our larger clients, we further explored and improved an idea we had about a year ago. Satellite Collections is the exciting result of this collaboration. It’s designed to enable faster join operations when working with sharded datasets. To avoid expensive network hops during join processing among machines, one has ‘only’ to find a solution to enable joins locally.

With Satellite Collections, you can define collections to shard to a cluster, as well as set collections to replicate to each machine. The ArangoDB query optimizer knows where each shard is located and sends requests to the DBServers involved, which then execute the query locally. The DBservers will then send the partial results back to the Coordinator which puts together the final result. With this approach, network hops during join operations on sharded collections can be avoided, hence query performance is increased and network traffic reduced. This can be more easily understood with an example. In the schema below, collection C is sharded to multiple machines, while the smaller satellites (i.e., S1 – S5) are replicated to each machine, orbiting the shards of C.

Use cases for Satellite Collection are plentiful. In this more in-depth blog post, we use the example of an IoT case. Personalized patient treatment based on genome sequencing analytics is another excellent example where efficient join operations involving large datasets, can help improve patient care and save infrastructure costs.

Security Enhancements

From the very beginning of ArangoDB, we have been concerned with security. AQL is already protected from injections. By using Foxx, sensitive data can be contained within the database, with only the results being passed to other systems, thus minimizing security exposure. But this is not always enough to meet enterprise-scale security requirements. With version 3.1, we introduced Auditing and Enhanced Encryption Control, and with ArangoDB 3.2, we added even more protection to safeguard data.

Encryption at Rest

With the RocksDB storage engine, you can encrypt the data stored on disk using the AES algorithm. Even if someone steals one of your disks, they won’t be able to access the data. With this upgrade, ArangoDB takes another big step towards HIPAA compliance.

Enhanced Authentication with LDAP

Normally, users are defined and managed in ArangoDB itself. With LDAP, you can use an external server to manage your users. We have implemented a common schema which can be extended. If you have special requirements that don’t fit into this schema, please let us know.

Conclusion & You

The entire ArangoDB team is proud to release version 3.2 of ArangoDB — this should not be a surprise considering all of the improvements we made. We hope you will enjoy the upgrade. We invite you to take ArangoDB 3.2 for a spin and to let us know what you think. We look forward to your feedback!
Download ArangoDB 3.2

Thank you!


“By developers for developers” has been our internal motto since the first lines of ArangoDB code. Building an open-source project at such a level of complexity and to a market-competitive standard undoubtedly puts a lot of pressure on us, and relies almost solely on the support and trust of the community.

Every victory counts, be it small appreciation or big success – it’s what gives us inspiration and keeps us going forward. A while ago we were having one of those rainy gray days here in Cologne. Receiving over 10 stars put a smile on the faces of our whole team, motivating us to hack harder, brainstorm, bug fix, build, release…

Today we have reached the 4,000-star mark on GitHub!

As we reach this milestone on our continuous journey higher and higher “to the stars”, we would like to thank you all for the trust and support you’ve given us. It is largely thanks to your feedback and support that we are able to deliver a great product and have a large international community adopting it around the globe.

Big thanks to you all from the ArangoDB family!

“To infinity and beyond!”

Webinar: Building powerful apps with ArangoDB & KeyLines


Wednesday, September 6th (5PM CEST/11AM ET/8AM PT) – Join the webinar here

As data gets bigger, faster and more complex, you need to arm yourself with the best tools. In this webinar we’ll see how KeyLines and ArangoDB combine to create powerful and intuitive data analysis platforms.

Luca Olivari, President of ArangoDB, will introduce the world’s most popular multi-model database. We’ll see how its ‘triple model / single query’ approach helps developers build datastores with outstanding flexibility and power, without the risk and complexity of multiple stacks.

Next, Christian Miles, Technical Sales Manager at Cambridge Intelligence, will show how graph visualization makes that back-end power accessible to users. He’ll use a complex IT network visualization to demonstrate how a KeyLines component can help your users cut through complexity and overcome data scale to reveal insights.

By the end of the webinar, you’ll understand the advantages of a KeyLines + ArangoDB combination, and be ready to get started. Join the webinar and submit all your questions here.

Wednesday, September 6th (5PM CEST/11AM ET/8AM PT)

Join the webinar

Pronto Move Shard


In July, Adobe announced that it plans the end of life for Flash around 2020.
Given the progress of HTML5, and Flash’s long history of critical security vulnerabilities, this is, technologically speaking, certainly the right decision. Still, it made me a bit sad.

Flash was the first technology that brought interactivity to the web. We tend to forget how static the web was in the early 2000s. Flash brought life to the web, and there were plenty of silly trash games and animations which I really enjoyed at the time. As a homage to the age of trashy Flash games, I created a game that resembles the games of this era:

Game info

To start the game, select a “shard” in the top right corner. The shard will then appear as a yellow “dot” and the servers as “cards” on the table. Finally, upon pressing the “Pronto move shard” button, the dealer will move the shard around, and once finished you have to guess where it is. More on shards and servers in the following sections. Make sure to enable sound (and put your headphones on if you are in an open office).

Please note that the game was not created in a responsive fashion. If you want, you can open it in a new window.

Any similarities to a German trash TV show from the 80s are totally intended.

For the best and the most authentic experience you should have the Comic Sans MS font on your system. Windows and Mac should have it out of the box.

What is the game about?

The version above is the standalone version. The real version is only playable on an ArangoDB Cluster:

It will move around data in your cluster and you have to guess where the shard was being moved to. Please note that this is not at all a fake process. It will indeed move data around within the cluster!

A short introduction to some ArangoDB features

Foxx

The game has been implemented as a so-called Foxx app. ArangoDB allows you to run microservices within the database:

Install service

Once installed, the service is hosted inside the database. This offers several possibilities:

  1. Data locality: your logic sits alongside the data and executes a lot faster because you save network overhead
  2. You have access to the internal ArangoDB APIs

This game is only possible because we have access to the ArangoDB cluster API and to the Agency. Data locality is not really relevant in this case, as we are not accessing any data. However, this is the use case most users will have when building Foxx apps.

Cluster

In a clustered setup, ArangoDB organizes its data in so-called shards. The data is partitioned using some key and stored on multiple database servers. When creating collections (our equivalent of “tables”), you can control the numberOfShards and the replicationFactor.

numberOfShards determines how many shards are created, and replicationFactor specifies how many replicas are held across the cluster. By setting the replicationFactor, you allow the cluster to fail over to a different machine in case of an outage, achieving high availability.
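How these two settings interact can be sketched in a few lines of plain JavaScript. Note that the hash function and the round-robin placement below are invented for illustration and are not ArangoDB's actual distribution algorithm:

```javascript
// Toy model: documents are hashed onto numberOfShards buckets, and each
// shard's replicationFactor replicas are placed on distinct servers.

function shardFor(key, numberOfShards) {
  // Invented hash: not what ArangoDB uses, just deterministic for the demo
  let h = 0;
  for (const ch of key) h = ((h * 31 + ch.charCodeAt(0)) >>> 0);
  return h % numberOfShards;
}

function placeShards(numberOfShards, replicationFactor, servers) {
  // Round-robin each shard's replicas onto distinct servers
  const placement = [];
  for (let s = 0; s < numberOfShards; s++) {
    const replicas = [];
    for (let r = 0; r < replicationFactor; r++) {
      replicas.push(servers[(s + r) % servers.length]);
    }
    placement.push(replicas);
  }
  return placement;
}
```

With numberOfShards = 4 and replicationFactor = 2 on three servers, every shard ends up with two copies on two different machines, so losing any single server still leaves a complete copy of the data.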

Supervision

As a user, you don’t really have to worry about failovers, because the so-called Supervision running within the Agency manages the cluster all by itself and organizes failovers in case a cluster node goes down. Apart from the failover case, there is also managed shard movement. This comes in handy if, for example, you want to upgrade a machine or manually move traffic away from some nodes.

Move shard

This is what this game will use internally and when deployed on the cluster it will look like this:

Pronto move shard

For the game, any shard is considered moveable when there are at least 3 possible servers that the shard could be moved to. For example, a shard with a replicationFactor of 3 in a cluster of 5 servers is not moveable; add a sixth server, and it becomes moveable.
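The moveability rule above can be written down as a small function. This is a sketch reconstructed from the description, not the game's actual code; the server names are hypothetical:

```javascript
// A shard can be moved only to servers that don't already hold a replica.
// The game requires at least 3 such free servers for the shard to count
// as moveable.

function freeServers(allServers, serversHoldingReplica) {
  return allServers.filter((s) => !serversHoldingReplica.includes(s));
}

function isMoveable(allServers, serversHoldingReplica) {
  return freeServers(allServers, serversHoldingReplica).length >= 3;
}
```

With 5 servers and replicas on 3 of them, only 2 free servers remain, so the shard is not moveable; a sixth server brings the count to 3 and makes it moveable.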

After pressing the “Pronto move shard” button the Foxx endpoint starting the move will choose a random free server.

It will then periodically check whether the move job has finished; if not, it fakes some movements across the servers (in the background there is of course just ONE move job happening).

When the job is finally finished, it simulates a few more movements (it is not that simple 😉) and ensures the shard is properly placed. Then you can make your guess.

Please note that it is not advisable to play this game on a production cluster. The game was put together very quickly, just for illustration purposes. There are many edge cases that are not handled (such as servers going down while moving). Also, moving shards around will put a lot of pressure on your cluster, especially if there is a lot of data in the shard.

Code

The code is available on GitHub (please note that the code quality also resembles the early days). Thanks also to Michael, our graph expert, for letting me use his photo for this game.

Your move

The game is installable via our Foxx store. So go ahead: set up an ArangoDB cluster via DC/OS, via our starter, or manually, install pronto-move-shard, and make your guess. Be sure to reach out in case of questions.

If you like Foxx and have built something with it, feel free to create a pull request to our registry so that others can install your experiments too.

For more information about the cluster architecture and how you can achieve high availability be sure to check out the chapter about the cluster architecture in our documentation.
