Documentation

Technical Documentation

Crystal is an outcome of a European research project (“IOStack: Software-Defined Storage for Big Data”). Due to its research-based nature, we suggest consulting both the research papers and the technical documentation provided here and in the GitHub repositories to gain a deep understanding of Crystal.


Crystal control plane

Repository: https://github.com/Crystal-SDS/controller

The control plane offers dynamic meta-programming facilities over the data plane. In particular, there are three components that enable management policies in the control plane: (i) a meta-language (DSL), (ii) a metadata store and APIs, and (iii) distributed controllers.

Crystal DSL

Crystal’s DSL hides the complexity of low-level policy enforcement, thus achieving simplified storage administration. The structure of our DSL is as follows:

  • Target: The target of a policy represents the recipient of a policy’s action (e.g., filter enforcement) and must be specified in every policy definition. To meet the specific needs of object storage, targets can be tenants, containers or even individual data objects. This provides high flexibility in management and administration.
  • Trigger clause (optional): Dynamic storage automation policies are characterized by the trigger clause. A policy may have one or more trigger clauses (separated by AND/OR operators) that specify the workload-based situation that will trigger the enforcement of a filter on the target. Trigger clauses consist of inspection triggers, operators (e.g., >, <, =) and values. The DSL exposes both types of inspection triggers: workload metrics (e.g., GETS_SEC) and request metadata (e.g., OBJECT_SIZE<512).
  • Action clause: The action clause of a policy defines how a filter should be executed on an object request once the policy is triggered. The action clause may accept parameters after the WITH keyword in the form of key/value pairs that are passed as input to customize the filter execution. Returning to the example of a compression filter, we may decide to enforce compression using a gzip or an lz4 engine, and even set the compression level.

To cope with object stores formed by proxies/storage nodes (e.g., Swift), our DSL makes it possible to explicitly control the execution stage of a filter with the ON keyword. Also, dynamic storage automation policies can be persistent or transient; a persistent action means that once the policy is triggered the filter enforcement remains indefinitely (the default), whereas actions to be executed only during the period in which the condition is satisfied are transient (keyword TRANSIENT). The illustrative policies below show how these keywords fit together.
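
For illustration only, two policies combining the elements above could read as follows. The connecting keywords (FOR, WHEN, DO, SET), the OBJECT_TYPE trigger and the target identifiers are hypothetical; WITH, ON, TRANSIENT and the remaining vocabulary items come from the description above, and the exact grammar is defined in the controller repository:

    P1: FOR TENANT:T1 WHEN OBJECT_TYPE = DOCS DO SET COMPRESSION WITH engine = lz4 ON PROXY
    P2: FOR CONTAINER:C1 WHEN GETS_SEC > 5 OR OBJECT_SIZE < 512 DO SET CACHING TRANSIENT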

The vocabulary of our DSL can be extended on-the-fly to accommodate new filters and inspection triggers. That is, in Fig. 1 we can use keywords COMPRESSION and DOCS in P1 once we associate “COMPRESSION” with a given filter implementation and “DOCS” with some file extensions, respectively.

Specialization, Pipelining and Grouping

There are three advanced DSL features to ease policy management: Policy specialization, action pipelining and target grouping.

Our DSL makes it possible to specialize policies based on the target scope. This is very useful to administer tenants with disparate behavior on different containers, and even objects. To draw an example of how it works, let us imagine a tenant T1 with containers C1 and C2. Defining an SDS policy P1 on T1 means that the policy is also applicable to the containers C1 and C2, as well as to the objects that they contain. Now, let us suppose that we define a new policy P2 on container C2. As C2 now has a more specialized policy definition, the system will only enforce P2 on object requests related to C2. Conversely, P1 is still applicable to C1, as the only policy defined for this container is at the tenant scope.
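
As a minimal sketch (not Crystal’s actual implementation), specialization can be thought of as picking the policies of the most specific scope that defines any:

    # Resolve policy specialization: the most specific scope with a policy wins.
    SCOPE_PRECEDENCE = ("object", "container", "tenant")   # most to least specific

    def effective_policies(policies, request):
        """`policies` maps (scope, identifier) -> list of policies; `request`
        carries the tenant, container and object name of the incoming call."""
        for scope in SCOPE_PRECEDENCE:
            identifier = getattr(request, scope, None)
            if identifier and (scope, identifier) in policies:
                return policies[(scope, identifier)]
        return []

With P1 defined on tenant T1 and P2 on container C2, a request addressed to C2 resolves to P2, whereas a request to C1 falls back to P1.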

Our DSL allows pipelining several actions on a single request (e.g., compression+encryption), similar to stream-processing frameworks. Pipelining filters can be done i) by explicitly defining multiple action clauses separated by commas in a single policy (see P1 in Fig. 2), or ii) by defining multiple individual automation policies. In both cases, filters are pipelined and executed in the order they are defined on PUTs, and in reverse order on GETs.
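
The following minimal sketch captures this ordering rule (illustrative only):

    # Filters run in definition order on PUTs and in reverse order on GETs
    # (e.g., compress then encrypt on upload, decrypt then decompress on download).
    def ordered_filters(filters, method):
        return list(filters) if method == "PUT" else list(reversed(filters))

    ordered_filters(["compression", "encryption"], "PUT")  # ['compression', 'encryption']
    ordered_filters(["compression", "encryption"], "GET")  # ['encryption', 'compression']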

To ease policy management at scale, our DSL supports policy recipes and target groups; for instance, we can create a GOLD policy recipe that defines caching and high bandwidth shares for premium tenants, as well as a BRONZE policy recipe that defines data compression and reduced IO bandwidth under multi-tenant workloads for freemium tenants. By doing this, administrators can rapidly enforce a set of policies on similar targets. Analogously, we can group targets to apply SDS policies to all of them. That is, we can create groups like WEB_CONTAINERS or LOGS_CONTAINERS to represent all the containers that are used to serve Web pages or to store log data, respectively. Then, we can define suitable policies for containers exhibiting similar usage patterns.

Crystal Metadata Store & APIs

The Crystal metadata store contains the metadata of policies at the control plane that allows the execution of filters at the data plane. Specifically, it contains metadata of policies, metadata of filters and a registry. First, metadata of policies refers to the relationship between targets and filters, as well as the conditions under which a certain filter should be enforced. Second, metadata of filters is the information necessary for the deployment and correct operation of a filter within the storage system. In addition, the Crystal registry is a space to dynamically extend the capabilities of our DSL. In Fig. 2, we can use the keywords COMPRESSION and DOCS in P1 once we register that “COMPRESSION” is associated with a given filter implementation and “DOCS” refers to some file extensions, respectively (see registry calls in Table 1). This makes the Crystal DSL extensible, in contrast with existing systems.

The management of the metadata store is done via APIs. That is, plugging a new SDS service into Crystal requires providing a REST API that manages its own metadata. As shown in Table 1, Crystal has its own management API; it exposes the DSL compiler, manages the registry, etc. Similarly, Fig. 1 shows that other Crystal services, like the filter framework (see Section 5) or the bandwidth differentiation service (see Section 6.2), integrate their own API. This design favors the development of new SDS services in Crystal, as their APIs are isolated from already existing ones.

As visible in Table 1, Crystal offers a DSL compilation service via an API call. For simple static policies, the compilation process translates the policy into a target→filter relationship in the metadata store. Next, we show how dynamic policies are materialized in the form of distributed controllers that extend the control plane.
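
Conceptually, compiling a simple static policy yields a record along the following lines; the field names are illustrative only, not Crystal’s actual schema:

    # Illustrative target -> filter record produced by compiling a static policy
    # and persisted in the metadata store.
    compiled_policy = {
        "target": "container:C1",
        "filter": "compression",
        "params": {"engine": "gzip", "level": "6"},
        "execution_stage": "proxy",   # from the ON keyword, or the filter's default
        "persistent": True,           # False if the policy was declared TRANSIENT
    }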

Distributed Controllers

Crystal resorts to distributed controllers, in the form of supervised micro-services, which can be deployed in the system at runtime to extend the control plane. We offer two types of controllers: automation and global controllers. On the one hand, the Crystal DSL compiles dynamic storage automation policies into automation controllers (e.g., P2 in Fig. 1). Their life-cycle consists of consuming the appropriate monitoring metrics and interacting with the filter framework API to enforce a filter when the trigger clause is satisfied.

On the other hand, global controllers are not generated by the DSL; instead, by simply extending a base class and overriding its computeAssignments method, developers can deploy controllers that contain complex algorithms with global visibility and continuous control of a filter at the data plane (e.g., P3 in Fig. 1). To this end, the base global controller class encapsulates the logic i) to ingest monitoring events, ii) to disseminate the computed assignments across nodes, and iii) to get the Service-Level Objectives (SLOs) to be enforced from the metadata layer. This allowed us to deploy distributed IO bandwidth control algorithms, for example.
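
As a hedged sketch of this extension point, the subclass below overrides computeAssignments (the only name taken from the description above); the base-class stand-in, its attributes and the toy assignment logic are assumptions for illustration only:

    class BaseGlobalController(object):
        """Hypothetical stand-in for Crystal's base global controller, which in
        reality ingests monitoring events, disseminates assignments across
        nodes and fetches SLOs from the metadata layer."""
        cluster_capacity = 1000.0   # MB/s, illustrative value

    class SimpleBandwidthController(BaseGlobalController):
        def computeAssignments(self, metrics, slos):
            """Return per-tenant bandwidth assignments from monitoring events
            (`metrics`, observed MB/s per tenant) and SLOs (guaranteed MB/s)."""
            spare = max(self.cluster_capacity - sum(slos.values()), 0)
            active = [t for t in slos if metrics.get(t, 0) > 0] or list(slos)
            # Every tenant keeps its SLO; spare capacity is shared among active tenants.
            return {t: slos[t] + (spare / len(active) if t in active else 0)
                    for t in slos}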

[Figure: controllers and metrics]

Extensible control loop: To close the control loop, workload metric processes are micro-services that provide controllers with monitoring information from the data plane. While running, a workload metric process consumes and aggregates events from one workload metric at the data plane. For the sake of simplicity, we advocate separating workload metrics not only per metric type, but also by target granularity.

Controllers and workload metric processes interact in a publish/subscribe fashion. For instance, Fig. 3 shows that, once initialized, an automation controller subscribes to the appropriate workload metric process, taking into account the target granularity. The subscription request of a controller specifies the target it is interested in, such as tenant T1 or container C1; this ensures that controllers do not receive unnecessary monitoring information from other targets. Once the workload metric process receives the subscription request, it adds the controller to its observer list. Periodically, it notifies the interested controllers of the activity of the different targets, which may trigger the execution of filters.
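
A minimal sketch of this publish/subscribe interplay follows; the class and method names are illustrative, not the actual Crystal implementation:

    class WorkloadMetricProcess:
        def __init__(self, metric_name):
            self.metric_name = metric_name    # e.g., "GETS_SEC"
            self.observers = {}               # target -> list of subscribed controllers

        def subscribe(self, controller, target):
            self.observers.setdefault(target, []).append(controller)

        def publish(self, target, value):
            # Periodically notify only the controllers interested in this target.
            for controller in self.observers.get(target, []):
                controller.update(self.metric_name, target, value)

    class AutomationController:
        def __init__(self, target, threshold, enforce_filter):
            self.target, self.threshold, self.enforce_filter = target, threshold, enforce_filter

        def update(self, metric, target, value):
            if value > self.threshold:
                self.enforce_filter(target)   # call into the filter framework API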

Crystal data plane

At the data plane, we offer two main extension hooks: Inspection triggers and a filter framework.

Crystal inspection triggers (monitoring & request metadata)

Repository: https://github.com/Crystal-SDS/metrics-middleware

Inspection triggers enable controllers to dynamically respond to workload changes in real time. Specifically, we consider two types of introspective information sources: object metadata and monitoring metrics.

First, some object requests embed semantic information related to the object at hand in the form of metadata. Crystal enables administrators to enforce storage filters based on such metadata. Concretely, our filter framework middleware is capable of analyzing the HTTP metadata of object requests at runtime to execute filters based on the object size or file type, among others.

Second, Crystal provides a metrics middleware to add new workload metrics on the fly. At the data plane, a workload metric is a piece of code that accounts for a particular aspect of the system operation and publishes that information. In our design, a new workload metric can inject events into the monitoring service without interfering with existing ones (Table 1). Our metrics framework allows developers to plug in metrics that inspect both the type of requests and their contents (e.g., compressibility). We provide the logic (i.e., the AbstractMetric class) to shield developers from the complexity of request interception and event publishing.
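
As a hedged sketch, a custom workload metric could look as follows; AbstractMetric is provided by the metrics middleware (exact module path omitted), while the hook and helper method names used below are assumptions for illustration only:

    class GetsPerContainer(AbstractMetric):            # assumes AbstractMetric is importable
        """Counts GET requests per container and publishes one event per request."""

        def on_request(self, request):                 # hypothetical hook name
            if request.method == "GET":
                self.publish_event(target=request.container, value=1)  # hypothetical helper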

Crystal interception (filter framework)

Repository: https://github.com/Crystal-SDS/filter-middleware

At the data plane, storage filters are components that perform management tasks or transformations on data objects. In this sense, a central feature of Crystal is the filter framework. The Crystal filter framework enables developers to run general-purpose code on object requests. Crystal borrows ideas from the active storage literature as a means of building filters to enforce policies. To orchestrate filters, the Crystal filter framework consists of:

Management API: The management API of the filter framework resides at the control plane of Crystal and allows managing i) filter binaries and dependencies, and ii) filter/target relationships (see Table 1). First, the filter framework management API enables an administrator to upload the executable filter binaries to be deployed in the system (e.g., .jar, .egg), which are stored at the Crystal metadata store. We also enable filters to use third-party libraries (i.e., dependencies) that can be managed via our API. Uploading a new filter via the API requires providing some metadata items: the type of operations the filter is designed for (e.g., PUT/GET or both), a flag that indicates whether the filter has a reverse transformation on GET requests (e.g., compression/decompression), the default execution stage (proxy/storage node) and the execution environment (e.g., sandboxed, native).

Besides, once a filter has been uploaded along with its metadata and potential dependencies, we can then enforce it on a certain target’s requests (e.g., container, tenant). The enforcement is performed via the deploy_filter call in Table 1, which internally deploys the filter executable on the selected target and persists the filter/target association at the metadata store.
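
For illustration only, the two steps could be driven from a script like the following; the endpoint paths, field names and authentication header are placeholders rather than the documented API (see the controller repository for the actual calls):

    import requests

    API = "http://controller:9000"                 # hypothetical controller address
    HEADERS = {"X-Auth-Token": "<admin-token>"}

    # 1) Upload the filter metadata (the binary itself would be sent alongside it).
    filter_metadata = {"filter_name": "compression-1.0.jar",
                       "supported_ops": ["PUT", "GET"],
                       "has_reverse": True,
                       "execution_stage": "proxy",
                       "execution_environment": "sandboxed"}
    requests.post(API + "/filters", json=filter_metadata, headers=HEADERS)

    # 2) Enforce the filter on a target (conceptually, the deploy_filter call of Table 1).
    requests.put(API + "/filters/tenant:T1/deploy/compression",
                 json={"params": {"engine": "gzip"}}, headers=HEADERS)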

Request classification: The filter/target associations at the metadata store enable our framework to discriminate the filters to be executed on a particular object request. Technically, a Crystal module built as a Swift middleware and placed at the proxy nodes performs the actual request classification. When a new object request enters the Swift pipeline it reaches the Crystal middleware, which contacts the metadata store to infer the filters to be executed on that request depending on the target. If the target has associated filters to be enforced, the Crystal middleware sets the appropriate HTTP headers in the request (e.g., GET, PUT) to trigger the filter execution. The interactions between the Crystal middleware and the metadata store, as well as the resulting filter executions, are shown in Fig. 4.
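
A simplified, illustrative sketch of this classification step follows; the header names and the metadata-store call are assumptions, not the actual middleware code:

    def classify(request, metadata_store):
        """Look up the filters associated with the request's target and set
        headers that trigger their execution downstream."""
        target = request.container or request.tenant
        filters = metadata_store.get_filters(target)          # hypothetical lookup
        for position, flt in enumerate(filters):
            # Downstream stages read these headers to run the filters in order.
            request.headers["X-Crystal-Filter-%d" % position] = flt["name"]
        return request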

[Figure: filter framework]
Moreover, filters that change the content of data objects may receive special treatment, such as in the case of compression or encryption filters. To wit, if we create a Storlet filter with the reverse flag enabled, the execution of the filter when the object was stored should always be undone upon a GET request. For instance, this means that we may activate data compression during certain periods, but tenants will always download decompressed data objects. To this end, we store data objects with extended metadata to keep track of the enforced reverse filters. Upon a GET request, such metadata is fetched by the Crystal middleware (object server) to build the HTTP headers that trigger reverse transformations on the data object prior to the execution of regular filters.

Filter execution environments: Upon the arrival of a tenant’s request with the appropriate HTTP headers, a filter can be executed either at the proxy or the storage node stage, a decision that depends on the policy definition or on the filter’s metadata (the default). As our middleware represents a hook to intercept data streams, it can support multiple execution platforms. Currently, Crystal features:

  • Isolated filter execution: We want to provide an isolated filter execution environment to manipulate object requests with higher security guarantees. To this end, we extended the IBM Storlets framework. Storlets provide Swift with the capability to run computations near the data in a secure and isolated manner, using Docker as the application container. With Storlets, a developer can write code, package and deploy it as a Swift object, and then explicitly invoke it on data objects as if the code were part of the Swift pipeline. Invoking a Storlet on a data object is done in an isolated manner so that the data accessible by the computation is only the object’s data and its user metadata. Moreover, a Docker container only executes filters of a single tenant. The Storlet engine executes a particular binary when the HTTP request for a data object contains the correct metadata headers specifying to do so. In this work, we extended the Storlets framework by adding two new features: i) executing several Storlet filters in a single request (i.e., pipelining), and ii) full control of the filter execution stage (i.e., executing a filter at the proxy and/or storage node for uploads and/or downloads).
  • Native filter execution: The isolated filter execution environment trades off higher security for lower communication capabilities and interception flexibility. For this reason, we also contribute an alternative way to intercept and execute code natively. Similarly to Storlets, a developer can install (Python) code modules as filters in Crystal at runtime by following simple design guidelines. There are two main differences with respect to Storlets: i) native filters can execute code at all the possible request life-cycle stages offered by Swift, and ii) native filters can communicate directly with external components (e.g., MOM, metadata store), as well as access storage devices (e.g., SSD). As Crystal focuses on executing non-adversarial/trusted code from administrators, we believe that this environment represents a more flexible alternative (see the sketch after this list).
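
As a hedged sketch, a native filter module could look as follows; the actual design guidelines live in the filter-middleware repository, and the interface used here is an assumption for illustration only:

    class IdentityFilter(object):
        """Hypothetical native filter that streams object data through unchanged."""

        def execute(self, request, data_iter, params):
            # Consume the object data stream and yield it (possibly transformed)
            # so that the next filter in the pipeline or the client can read it.
            for chunk in data_iter:
                yield chunk                   # a real filter would transform chunks here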

The ambition of Crystal is to ease development by the community so that it becomes a rich open-source SDS system. As we show next, our filter framework represents a solid ground to develop a variety of storage management filters (e.g., compression, caching).

Crystal dashboard

Repository: https://github.com/Crystal-SDS/dashboard

Crystal provides a user-friendly dashboard to manage policies, filters and workload metrics. The dashboard is completely integrated into the OpenStack Horizon project. Moreover, Crystal integrates advanced monitoring analysis tools that allow administrators to explore the behavior of tenants, containers and even objects in the system and devise appropriate policies.


Research papers

  • Raúl Gracia-Tinedo, Pedro García-López, Marc Sánchez-Artigas, Josep Sampé, Yosef Moatti, Eran Rom, Dalit Naor, Ramon Nou, Toni Cortés, William Oppermann, Pietro Michiardi. “IOStack: Software-Defined Object Storage”. IEEE Internet Computing, 20(3), 10-18, May 2016. [pdf]
  • Raúl Gracia-Tinedo, Josep Sampé, Edgar Zamora-Gómez, Pedro García-López, Marc Sánchez-Artigas, Yosef Moatti, Eran Rom. “Crystal: Software-Defined Storage for Multi-tenant Object Stores”. USENIX FAST’17, 2017. [pdf]
  • Yosef Moatti, Eran Rom, Raúl Gracia-Tinedo, Dalit Naor, Doron Chen, Josep Sampé, Marc Sánchez-Artigas, Pedro García-López, Filip Gluszak, Eric Deschodt, Francesco Pace, Daniele Venzano, and Pietro Michiardi. “Too Big to Eat: Boosting Analytics Data Ingestion from Object Stores with Scoop”. IEEE ICDE’17, 2017.