
Personal Cloud Measurement

We have created a script named measurement.py. It takes two arguments:

  • Provider: -p or --provider, followed by one of [dropbox|box|sugarsync]
  • Test: -t or --type, followed by one of [load_and_transfer|service_variability]

Depending on the test type, the script executes one of the following files (a launcher sketch follows the list):

  • load_and_transfer: File load_and_transfer_test.py
  • service_variability: File service_variability.py
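
For reference, here is a minimal sketch of how this dispatch could look using Python's argparse and subprocess modules. The option names and choices follow the description above; everything else (function names, the way the workload scripts are invoked) is an assumption rather than the actual contents of measurement.py.

    # Hypothetical launcher sketch; not the actual measurement.py.
    import argparse
    import subprocess

    TESTS = {
        "load_and_transfer": "load_and_transfer_test.py",
        "service_variability": "service_variability.py",
    }

    def main():
        parser = argparse.ArgumentParser(description="Personal Cloud measurement launcher")
        parser.add_argument("-p", "--provider", required=True,
                            choices=["dropbox", "box", "sugarsync"])
        parser.add_argument("-t", "--type", dest="test_type", required=True,
                            choices=sorted(TESTS))
        args = parser.parse_args()
        # Run the script implementing the chosen workload against the chosen provider.
        subprocess.call(["python", TESTS[args.test_type], "--provider", args.provider])

    if __name__ == "__main__":
        main()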

Load and Transfer Workload

The objective of this workload was twofold: measuring the maximum up/down transfer speed of operations, and detecting correlations between the transfer speed and the load of an account. The first objective was achieved by alternating upload and download operations, since the provider only needed to handle one operation per account at a time. We achieved the second by recording the load of the account in each API call. Each node executed this workload continuously as follows: first, the node created synthetic files with sizes chosen at random from a predefined set (see Implementation below). The node uploaded files until the capacity of the account was exhausted. At that point, it downloaded all the files, also in random order, deleting each file after its download finished.

Implementation

First of all, the script creates 4 files of different sizes (25, 50, 100 and 150 MB). Once done, it uploads randomly chosen files until the account is full and the provider returns an error. When this happens, the script starts downloading randomly chosen files from the account, removing each one when its download has finished.

This test runs for approximately 5 days.
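
The sketch below illustrates the upload-until-full / download-and-delete loop described above. The client object, its upload/download/delete/list_files methods and the QuotaExceededError class are hypothetical placeholders for the provider API wrapper used by load_and_transfer_test.py; only the overall control flow is taken from the description.

    import os
    import random

    FILE_SIZES_MB = [25, 50, 100, 150]

    class QuotaExceededError(Exception):
        """Placeholder for the provider's 'account is full' error."""

    def make_synthetic_files(workdir):
        # One random-content file per size; returns their local paths.
        paths = []
        for size in FILE_SIZES_MB:
            path = os.path.join(workdir, "synthetic_%dMB.dat" % size)
            with open(path, "wb") as f:
                f.write(os.urandom(size * 1024 * 1024))
            paths.append(path)
        return paths

    def load_and_transfer(client, workdir):
        files = make_synthetic_files(workdir)
        # Upload phase: upload random files until the provider rejects an
        # upload because the account quota is exhausted.
        while True:
            try:
                client.upload(random.choice(files))
            except QuotaExceededError:
                break
        # Download phase: fetch every stored file in random order and delete
        # each one right after its download completes.
        remote_files = client.list_files()
        random.shuffle(remote_files)
        for remote in remote_files:
            client.download(remote)
            client.delete(remote)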

Service Variability Workload

This workload maintained a nearly continuous upload and download transfer flow in every node to analyze the performance variability of the service over time. It provides an appropriate substrate for a time-series analysis of these services. The procedure was as follows: the upload process first created one file per defined file size and labeled them as "reserved", meaning they were never deleted from the account. This ensured that the download process was never interrupted, since at least the reserved files were always available for download. Then, the upload process started uploading synthetic random files until the account was full. When the account was full, this process deleted all files except the reserved ones and continued uploading. In parallel, the download process continuously downloaded random files stored in the account.

Implementation

The script creates 4 files as in the previous test. When the files are ready, it uploads a file named reserved.dat (which remains in the account until the test ends) with a size of 50 MB. As soon as this file is completely uploaded, the script creates two threads, one for download and one for upload (see the sketch after the list below):

  • The upload thread constantly uploads files sized [25, 50, 100, 150] MB until the account is full. Once it is full, the thread removes all files except "reserved.dat". Then, it starts its cycle again.
  • The download thread continuously lists all files in the account and downloads one chosen at random. There is always at least one file available ("reserved.dat").
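
A rough sketch of this two-thread structure is given below. As before, client, its methods and QuotaExceededError are hypothetical stand-ins for the provider wrapper used by service_variability.py; only the reserved.dat logic and the thread roles come from the description above.

    import random
    import threading

    RESERVED = "reserved.dat"

    class QuotaExceededError(Exception):
        """Placeholder for the provider's 'account is full' error."""

    def upload_worker(client, local_files):
        # Upload random files until the account is full, then delete everything
        # except reserved.dat and start the cycle again.
        while True:
            try:
                client.upload(random.choice(local_files))
            except QuotaExceededError:
                for remote in client.list_files():
                    if remote != RESERVED:
                        client.delete(remote)

    def download_worker(client):
        # Continuously pick a random stored file and download it; reserved.dat
        # guarantees there is always at least one candidate.
        while True:
            client.download(random.choice(client.list_files()))

    def service_variability(client, local_files):
        threads = [
            threading.Thread(target=upload_worker, args=(client, local_files)),
            threading.Thread(target=download_worker, args=(client,)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()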

Deployment

Finally, we executed the experiments in different ways depending on the chosen platform. In the case of PlanetLab, we employed the same machines in each test, and therefore we needed to execute all the combinations of workloads and providers sequentially. This minimized the impact of hardware and network heterogeneity, since all the experiments were executed under the same conditions. In contrast, in our labs we executed a given workload for all providers in parallel (i.e. assigning 10 machines per provider). This provided two main advantages: the measurement process was substantially faster, and a fair comparison of the three services over the same period of time was possible.

Traces

(NEW!) Here we provide measurement traces collected during our experiments.

Trace Format. Files are in .csv format. The column fields are listed below, followed by a short parsing sketch:

  • row_id: database row identifier.
  • account_id: Personal Cloud account used to perform this API call.
  • file_size: size of the uploaded/downloaded file in bytes.
  • operation_time_start: starting time of the API call.
  • operation_time_end: Finishing time of the API call.
  • time_zone (not used): Time zone of a node for PlanetLab tests (http://www.planet-lab.org/).
  • operation_id: Hash to identify this API call.
  • operation_type: PUT/GET API call.
  • bandwidth_trace: time-series trace of a file transfer (Kbytes/sec) obtained with vnstat (http://humdi.net/vnstat/).
  • node_ip: Network address of the node executing this operation.
  • node_name: Host name of the node executing this operation.
  • quota_start: Amount of data in the Personal Cloud account at the moment of starting the API call.
  • quota_end: Amount of data in the Personal Cloud account at the moment of finishing the API call.
  • quota_total: Storage capacity of this Personal Cloud account.
  • capped (not used): Indicates if the current node is being capped (for PlanetLab tests).
  • failed: Indicates if the API call has failed (1) or not (0).
  • failure_info: Includes the available failure information for this API call (if any).
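
As an illustration of how these fields can be used, the sketch below computes an average transfer speed per successful API call with Python's csv module. The column names follow the list above; the timestamp format is an assumption and should be checked against an actual trace file.

    import csv
    from datetime import datetime

    TIME_FMT = "%Y-%m-%d %H:%M:%S"   # assumed timestamp format

    def operation_speeds(path):
        # Yields (operation_type, MB/s) for every successful API call.
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row["failed"] == "1":
                    continue
                start = datetime.strptime(row["operation_time_start"], TIME_FMT)
                end = datetime.strptime(row["operation_time_end"], TIME_FMT)
                seconds = (end - start).total_seconds()
                if seconds <= 0:
                    continue
                mbytes = int(row["file_size"]) / (1024.0 * 1024.0)
                yield row["operation_type"], mbytes / seconds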

Files and Experiment Description.

  • Load and Transfer Test (University Labs, from 2012-06-28 18:09:36 to 2012-07-03 18:36:07) - Box (http://ast-deim.urv.cat/pc_measurement/measurement_box_load_test.csv)
  • Load and Transfer Test (University Labs, from 2012-06-28 17:23:05 to 2012-07-03 15:52:37) - Dropbox (http://ast-deim.urv.cat/pc_measurement/measurement_dropbox_load_test.csv)
  • Load and Transfer Test (University Labs, from 2012-06-29 14:12:37 to 2012-07-04 14:00:31) - SugarSync (http://ast-deim.urv.cat/pc_measurement/measurement_sugarsync_load_test.csv)
  • Load and Transfer Test (PlanetLab, from 2012-07-11 16:02:53 to 2012-07-16 06:05:29) - Box (http://ast-deim.urv.cat/pc_measurement/measurement_box_load_transfer_pl.csv)
  • Load and Transfer Test (PlanetLab, from 2012-06-22 17:05:05 to 2012-06-28 08:52:38) - Dropbox (http://ast-deim.urv.cat/pc_measurement/measurement_dropbox_load_transfer_pl.csv)
  • Load and Transfer Test (PlanetLab, from 2012-07-11 16:03:53 to 2012-07-17 09:37:24) - SugarSync (http://ast-deim.urv.cat/pc_measurement/measurement_sugarsync_load_transfer_pl.csv)
  • Service Variability Test (University Labs, from 2012-07-04 16:11:51 to 2012-07-09 10:19:24) - Box (http://ast-deim.urv.cat/pc_measurement/measurement_box_service_variability.csv)
  • Service Variability Test (University Labs, from 2012-07-03 18:30:47 to 2012-07-09 10:02:50) - Dropbox (http://ast-deim.urv.cat/pc_measurement/measurement_dropbox_service_variability.csv)
  • Service Variability Test (University Labs, from 2012-07-04 16:17:13 to 2012-07-09 14:34:07) - SugarSync (http://ast-deim.urv.cat/pc_measurement/measurement_sugarsync_service_variability.csv)
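
To get started with any of these traces, something like the following works, assuming pandas is installed and the CSV header matches the field names listed above (both assumptions are worth verifying):

    import pandas as pd

    URL = "http://ast-deim.urv.cat/pc_measurement/measurement_dropbox_load_test.csv"

    df = pd.read_csv(URL)
    ok = df[df["failed"] == 0].copy()
    ok["mbytes"] = ok["file_size"] / (1024.0 * 1024.0)
    # Quick per-operation-type summary of transferred file sizes.
    print(ok.groupby("operation_type")["mbytes"].describe())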

Citation Policy.

If you use this dataset in your research, please cite the original measurement paper: "Actively Measuring Personal Cloud Storage". Raúl Gracia-Tinedo, Marc Sánchez-Artigas, Adrián Moreno-Martínez, Cristian Cotes and Pedro García-López. 6th IEEE International Conference on Cloud Computing (Cloud'13), pages 301-308. June 27-July 2, 2013, Santa Clara Marriott, CA, USA. (http://www.thecloudcomputing.org/2013/)

Bibtex format:

    @conference{gracia_actively_cloud_13,
      author    = "Gracia-Tinedo, Ra{\'u}l and S{\'a}nchez-Artigas, Marc and Moreno-Mart{\'i}nez, Adri{\'a}n and Cotes-Gonz{\'a}lez, Cristian and Garc{\'i}a-L{\'o}pez, Pedro",
      booktitle = "IEEE CLOUD'13",
      pages     = "301-308",
      title     = "{A}ctively {M}easuring {P}ersonal {C}loud {S}torage",
      year      = "2013",
    }

Paper in PDF format: http://ants.etse.urv.es/web/administrator/components/com_jresearch/files/publications/Actively%20Measuring%20Personal%20Cloud%20Storage.pdf

For any questions, email any of the article authors.

Enjoy :)
