What is Graph Query Language (GQL)?
As the name suggests, Graph Query Language (GQL) is a query language for graphs, more specifically property graphs. GQL has its origins in a language called Cypher but is now standardized as Graph Query Language by the International Organization for Standardization (ISO), which describes the standard as:
"This document defines data structures and basic operations on property graphs. It provides capabilities for creating, accessing, querying, maintaining, and controlling property graphs and the data they comprise.
This document specifies the syntax and semantics of a data management language for specifying and modifying the structure of property graphs and collections thereof."
These types of graphs aren't bar charts, or pie charts! They are rooted in a branch of mathematics called graph theory and model data as relationships (edges) between objects (nodes or vertices). A property graph has explicit directions on the relationships between nodes and stores key-value pairs of data on each relationship and node.
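To make the property graph model concrete, here is a minimal Python sketch of the idea. The class names, the example people and the since property are illustrative assumptions, not part of the GQL standard or any particular database:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with one or more labels and key-value properties."""
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that also carries key-value properties."""
    start: Node
    end: Node
    type: str
    properties: dict[str, object] = field(default_factory=dict)


bob = Node({"Person"}, {"name": "Bob"})
alice = Node({"Person"}, {"name": "Alice"})

# Direction matters: Alice follows Bob, not the other way round.
follows = Relationship(alice, bob, "FOLLOWS", {"since": 2021})
```

Both the nodes and the relationship hold their own key-value data, which is what distinguishes a property graph from a plain mathematical graph.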
Graph Query Language is used by graph databases and graph database management systems (GDBMS) which store data as a property graph. Although the original implementation of Cypher was in the Neo4j database (more on this later), there are many modern graph databases which support GQL, or its precursor openCypher, directly or through an extension:
- Neo4j
- Memgraph
- KuzuDB
- FalkorDB
- ArcadeDB
- Apache AGE (for PostgreSQL)
- NebulaGraph
- TigerGraph
Although GQL is similar to SQL in many ways, it can look quite different as it uses ASCII art syntax to represent the patterns of data that you're searching for. The example below will look for a Person named Bob and return their followers.
MATCH (b:Person)<-[rel:FOLLOWS]-(f:Person)
WHERE b.name = 'Bob'
RETURN b, rel, f
As you can see from the example, nodes are represented in parentheses (), whilst relationships are shown with hyphens -, square brackets [] and, optionally, an arrowhead (< or >) to indicate the direction of the relationship.
Other key elements from the example:
- Variables: we assign the components being matched to variables (b, rel, f) and return them at the end.
- Labels: we indicate the labels to match on with colons (:Person) - here we're only matching nodes with a Person label.
- Relationship Types: similar to labels, we indicate the relationship type to match on after a colon (:FOLLOWS).
- Properties: in the WHERE clause, we use dot notation to access the name property. We could also have achieved this in the MATCH clause like this: (b:Person {name: 'Bob'}).
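To show what the pattern asks the database to do, here is a rough Python sketch of the same match run over a toy in-memory list of relationships. The data and the followers_of helper are hypothetical; a real graph database would plan and execute the query very differently:

```python
# Toy data: (start_name, relationship_type, end_name) for Person nodes.
# Each tuple is a directed relationship from start to end.
relationships = [
    ("Alice", "FOLLOWS", "Bob"),
    ("Carol", "FOLLOWS", "Bob"),
    ("Bob", "FOLLOWS", "Alice"),
    ("Dave", "BLOCKS", "Bob"),
]


def followers_of(name, rels):
    """Mimic MATCH (b:Person)<-[rel:FOLLOWS]-(f:Person) WHERE b.name = name."""
    return [
        (end, rel_type, start)  # same order as RETURN b, rel, f
        for start, rel_type, end in rels
        if rel_type == "FOLLOWS"  # relationship type filter (:FOLLOWS)
        and end == name           # the arrow points at b, and b.name = name
    ]


rows = followers_of("Bob", relationships)
```

Only relationships of type FOLLOWS that point at Bob are kept, so Bob following Alice and Dave blocking Bob are both ignored.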
Most graph databases offer specific drivers for different languages (such as Java, Python, Go, Rust), or you can use a more general GQL library such as Neontology (Python) which can talk to different GQL implementations.
Comparison to other query languages and concepts
GraphQL
GraphQL is a query language for APIs, not for databases (usually) and serves as an alternative to REST APIs rather than being a direct comparison/competitor to GQL.
GraphQL provides an intuitive, graph-like way of interacting with web APIs but is not a full graph query language in the same way as GQL.
The similarity in name can cause some confusion, and some GraphQL tools use GQL in their names or documentation. There are also libraries and extensions that integrate GraphQL and GQL databases (so that you can use GraphQL to access data in a graph database). However, GraphQL and Graph Query Language/GQL are completely different languages and concepts.
SQL
Structured Query Language (SQL) has been around since the 1970s and was standardised by ISO in 1987! SQL is widely used for managing data in relational database management systems (RDBMS) for table-oriented data.
Cypher was originally inspired by SQL, and therefore provides a SQL-like experience that is intuitive and efficient for interacting with property graphs rather than tables.
Although aspects of GQL are similar to SQL and GQL should be friendly for those coming to graphs from SQL, the two languages are not compatible and have some significant differences.
NoSQL
NoSQL is a broad, catch-all term for databases which do not use a tabular relational model in the same way as traditional SQL databases.
Therefore, graph databases can be considered one type of NoSQL database. Other popular types of NoSQL database include key-value stores (such as Redis) and document stores (such as MongoDB).
RDF
The Resource Description Framework (RDF) is a standardized framework for modelling data in which each statement is a triple: a relationship and the two things it connects, with URIs used to name them. A collection of triples (nodes and relationships) becomes a graph. RDF is developed and standardized by the World Wide Web Consortium (W3C).
RDF is a different way of representing graph data compared to the property graphs supported by GQL. RDF has a strict approach which may suit some applications but lacks the flexibility needed for others.
Some graph implementations support both RDF data and property graphs directly or through integrations.
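As a rough illustration of the difference, the Python sketch below writes the same facts first as RDF-style triples and then as a property graph. The example.org URIs and the dictionary layout are assumptions made for illustration; no RDF library is used here:

```python
# Hypothetical example URIs, used only for illustration.
EX = "http://example.org/"

# RDF: every fact is a (subject, predicate, object) triple named by URIs;
# literal values such as names are themselves the object of a triple.
triples = [
    (EX + "alice", EX + "follows", EX + "bob"),
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
]

# Property graph: the same facts, with properties stored on the nodes
# and on the relationship itself.
nodes = {
    "alice": {"labels": ["Person"], "properties": {"name": "Alice"}},
    "bob": {"labels": ["Person"], "properties": {"name": "Bob"}},
}
relationships = [
    {"start": "alice", "type": "FOLLOWS", "end": "bob", "properties": {}},
]

# Who does Alice follow, according to the triples?
followed = [o for s, p, o in triples if s == EX + "alice" and p == EX + "follows"]
```

In the triple form a name is just another statement about a resource, whereas in the property graph it lives inside the node it describes.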
SPARQL
SPARQL, pronounced 'sparkle', is a query language for RDF databases. There are some similarities between SPARQL and GQL - they are both query languages for graph data, and both have similarities with SQL - but they are not compatible.
Some graph databases may be able to support SPARQL and GQL.
Gremlin (TinkerPop)
Gremlin is another graph query language, developed as part of Apache TinkerPop - a graph computing framework. Gremlin supports property graphs but takes a different approach to syntax from GQL, using a functional, data-flow language. For example (from the Gremlin documentation):
// What are the names of Gremlin's friends' friends?
g.V().has("name","gremlin").
out("knows").out("knows").values("name")
Some databases, such as Amazon Neptune, may support Gremlin and Cypher/GQL (and SPARQL) for querying data.
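To give a feel for the data-flow style, where each step takes the current set of vertices and hands a new set to the next step, here is a toy Python sketch. It is not the TinkerPop API: the Traversal class, the graph layout and the names ann and raj are all made up for illustration.

```python
class Traversal:
    """Toy stand-in for a traversal: holds the current list of vertices."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current

    def has(self, key, value):
        # Keep only vertices whose property matches the given value.
        kept = [v for v in self.current if self.graph["props"][v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        # Follow outgoing edges with the given label from every current vertex.
        nxt = [
            end
            for v in self.current
            for start, edge_label, end in self.graph["edges"]
            if start == v and edge_label == label
        ]
        return Traversal(self.graph, nxt)

    def values(self, key):
        # Finish the chain by reading one property from each vertex.
        return [self.graph["props"][v][key] for v in self.current]


graph = {
    "props": {1: {"name": "gremlin"}, 2: {"name": "ann"}, 3: {"name": "raj"}},
    "edges": [(1, "knows", 2), (2, "knows", 3)],
}

names = (
    Traversal(graph, list(graph["props"]))
    .has("name", "gremlin")
    .out("knows")
    .out("knows")
    .values("name")
)
```

Each chained call narrows or moves the current set of vertices, in contrast to GQL, where the whole shape is declared as one pattern.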
GQL History
GQL largely developed out of the Cypher language. Cypher emerged at Neo4j in 2010 as a property graph query language which would fill the space for graphs which SQL fills for relational databases. The 1999 film The Matrix is a popular theme at Neo4j and can be seen in some of their naming - Neo, played by Keanu Reeves, is the hero of the film, whilst Cypher (aka Mr Reagan) is on the side of the bad guys (Agent Smith).
In 2015 Neo4j launched the openCypher group which aimed to take the query language from something specific to Neo4j databases to be more standardized and widely used as 'the' property graph query language. Alongside Neo4j, openCypher's initial supporters included Oracle, Databricks and Tableau.
Out of openCypher and wider work by Neo4j, academia and others, support grew for an ISO standard property graph query language. In 2019 ISO announced a new project for this purpose - GQL, Graph Query Language.