
datapark.io

Brief Overview and Introduction (needs Python 2.7 on the platform)

General Aspects and Trials

datapark.io is developed and maintained by The Python Quants GmbH. It offers Web-based data science in the browser for individuals, teams and organizations. Free registration is available at http://cloud.datapark.io.

You can choose any user_name you like, but you must provide a valid email address, to which your confirmation email will be sent. You can then log in at http://cloud.datapark.io using your user_name.

Please note that trial/test accounts are for illustration purposes only and can be closed at any time, with all data, code, etc. being permanently deleted. Please also read the Terms & Conditions as well as our Privacy Policy. If you have questions about the platform or run into any trouble, you can reach us at datapark@tpq.io.

Components & Features

At the moment, datapark comprises the following components and features:

  • IPython Notebook: interactive data and financial analytics in the browser with full Python integration and much more (cf. IPython home page).
  • Anaconda Python Distribution: complete Python stack for financial, scientific and data analytics workflows/applications (cf. Anaconda page); you can easily switch between Python 2.7 and 3.4.
  • R & Julia: use the leading statistics language R and the performance-oriented language Julia for numerical computing.
  • File Manager: a GUI-based File Manager to upload, download, copy, remove and rename files.
  • User Forum: a simple forum application where you can share thoughts, documents and more.
  • Collaboration: the platform features user/group administration as well as file sharing via public folders and individual Web pages/folders.
  • Linux Server: datapark is powered by Linux servers to which you have full shell access.
  • Deployment: datapark is easily scalable since it is cloud-based and can also be easily deployed on your own servers (via Docker containers).
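Because you can switch between Python 2.7 and 3.4, and some features shown below (such as the R extension) work only under Python 2.7, it can help to check which interpreter a notebook is actually running. A minimal sketch using only the standard `sys` module; the helper names `python_major_version` and `describe_interpreter` are hypothetical, not part of the platform:

```python
import sys

def python_major_version():
    """Return the major version of the running interpreter (e.g. 2 or 3)."""
    return sys.version_info[0]

def describe_interpreter():
    """Return a short description such as 'Python 3.4.3'."""
    major, minor, micro = sys.version_info[:3]
    # %-formatting works unchanged under Python 2.7 and 3.x
    return "Python %d.%d.%d" % (major, minor, micro)

if __name__ == "__main__":
    print(describe_interpreter())
    if python_major_version() == 2:
        print("Python 2 kernel")
    else:
        print("Python 3 kernel")
```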


IPython Notebook

In the left panel of the platform, you find the current working path (in black) as well as the current folder and file structure (as purple links). Note that, apart from folders, this panel displays only IPython Notebook files. You can navigate the folder structure by clicking on a link. Clicking on the double dots ".." moves you one level up. Clicking the refresh button right next to the double dots updates the folder/file structure. Clicking on a file link opens the IPython Notebook file.
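The panel's filtering (folders plus files with the `.ipynb` notebook extension, everything else hidden) can be mimicked with the standard library. A rough sketch, assuming the notebooks live in an ordinary directory on the server; `list_notebook_panel` is a hypothetical helper, not platform code:

```python
import os

def list_notebook_panel(path):
    """Mimic the left panel: return (folders, notebooks) found directly in `path`.

    Only sub-folders and files ending in '.ipynb' are returned; all other
    files are skipped, just as the panel hides non-notebook files.
    """
    folders, notebooks = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            folders.append(name)
        elif name.endswith(".ipynb"):
            notebooks.append(name)
    return folders, notebooks

if __name__ == "__main__":
    folders, notebooks = list_notebook_panel(".")
    print("..")  # the entry that moves one level up
    for name in folders:
        print(name + "/")
    for name in notebooks:
        print(name)
```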

Basic Approach

At the top of the left panel you find a link to open a new notebook. In IPython notebooks like this one, you can code Python interactively and do data/financial analytics.

In [1]:
print("Hello Data Science World.")
Hello Data Science World.

In [2]:
# simple calculations
3 + 4 * 2
Out[2]:
11
In [3]:
# working with NumPy arrays
import numpy as np
rn = np.random.standard_normal(100)
rn[:10]
Out[3]:
array([-2.03644984, -1.22943092,  1.89180996, -0.38089672, -1.56548998,
       -0.12738937,  1.91656909,  1.71945367,  0.44964487,  0.43329359])
In [4]:
# plotting
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(rn.cumsum())
plt.grid(True)
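The plotted curve is the cumulative sum of the 100 standard normal draws, i.e. a simple random walk; since the draws are random, every run yields a different path. A stdlib-only sketch of the same construction with a fixed seed, which makes the path reproducible (the function name `random_walk` and the seed value 42 are assumptions for illustration):

```python
import random
from itertools import accumulate

def random_walk(n_steps, seed=None):
    """Return the cumulative sum of n_steps standard normal draws.

    Mirrors rn.cumsum() for rn = np.random.standard_normal(n_steps),
    but uses only the standard library so it runs without NumPy.
    """
    rng = random.Random(seed)  # private generator; a fixed seed gives a reproducible path
    steps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    return list(accumulate(steps))

if __name__ == "__main__":
    path = random_walk(100, seed=42)
    print(path[:5])
```

With NumPy itself, calling `np.random.seed(...)` before `np.random.standard_normal(100)` has the same reproducibility effect.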

If you are new to IPython Notebook, a good starting point is the IPython home page and the videos linked there (cf. video page).

Efficient Financial Analytics

Combining the pandas library with IPython Notebook makes for a powerful financial analytics environment.

In [5]:
import pandas as pd
import pandas_datareader.data as web  # pandas.io.data now lives in the separate pandas-datareader package
In [6]:
# read AAPL quotes from Google Finance
AAPL = web.DataReader('AAPL', data_source='google')
# 42d and 252d trends (rolling means of the closing prices)
AAPL['42d'] = AAPL['Close'].rolling(42).mean()
AAPL['252d'] = AAPL['Close'].rolling(252).mean()
In [7]:
AAPL[['Close', '42d', '252d']].plot(figsize=(10, 5))
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd240570410>
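The `42d` and `252d` columns are simple moving averages: each value is the mean of the current close and the preceding 41 (or 251) closes, and the first window-minus-one entries are undefined (NaN in pandas). A stdlib-only sketch of that computation on made-up prices; the helper `rolling_mean` and the sample values are illustrative assumptions, not platform code:

```python
import math

def rolling_mean(values, window):
    """Simple moving average over a trailing window.

    Returns a list as long as `values`; the first window - 1 entries are
    NaN because no full window is available yet, matching pandas'
    rolling(window).mean() with its default min_periods.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running = 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        result.append(running / window if i >= window - 1 else float("nan"))
    return result

if __name__ == "__main__":
    closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
    print(rolling_mean(closes, 3))  # [nan, nan, 11.0, 12.0, 13.0]
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window; on long series this can drift by tiny floating-point amounts relative to pandas.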

Using R from within IPython

Loading the R extension for IPython.

In [8]:
# only Python 2.7
%load_ext rpy2.ipython

Pushing data to R.

In [9]:
AAPL_close = AAPL['Close'].values
In [10]:
%Rpush AAPL_close

Plotting data with R.

In [11]:
%R plot(AAPL_close, pch=20, col='red'); grid(); title("AAPL closing values")

Using Julia from within IPython

Julia is often much faster than Python for recursive function formulations. As an example, consider the Fibonacci sequence.

In [12]:
%%julia
# recursive formulation in Julia
fib_rec(n) = n < 2 ? n : fib_rec(n - 1) + fib_rec(n - 2)
@elapsed f = fib_rec(40)
f
fib_rec (generic function with 1 method)
1.55161821
102334155

In [13]:
# same in Python
def fib_rec(n):
    if n < 2:
        return n
    else:
        return fib_rec(n - 1) + fib_rec(n - 2)
%time fib_rec(40)
CPU times: user 50.8 s, sys: 61 ms, total: 50.9 s
Wall time: 50.9 s

Out[13]:
102334155
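Independent of the language, the recursive formulation itself is costly: it recomputes the same Fibonacci numbers an exponential number of times. A sketch of two standard alternatives in Python, an iterative loop and the same recursion memoized with `functools.lru_cache` (the function names are illustrative):

```python
from functools import lru_cache

def fib_iter(n):
    """Iterative Fibonacci: n additions instead of exponentially many calls."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion as fib_rec, but each fib_memo(k) is computed only once."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

if __name__ == "__main__":
    print(fib_iter(40))  # 102334155, the value computed above
    print(fib_memo(40))  # 102334155
```

`lru_cache` requires Python 3.2 or later, so `fib_memo` needs the Python 3 kernel; `fib_iter` runs under 2.7 as well.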

File Manager

The File Manager provides easy, GUI-based file management on the platform.

Left Column

In the left column you can navigate the file system. For instance, you find a folder called public which you can use to share files with others.


Right Column

In the right column, you find the contents of the folder currently selected in the left column. The contents are updated by clicking on the refresh button. You can, for example, drag and drop files and folders as well as upload files from your local disk. To upload files, do the following:

  1. press the add files button
  2. select (multiple) file(s) from your local disk
  3. press the upload button

Via a double click on a file, you can open it in the editor or with IPython Notebook. In addition, via a right click on a file, you can:

  • delete a file/folder
  • rename a file/folder
  • zip a file/folder
  • generate a new folder
  • generate a new text file
  • download a file
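Most of these right-click operations can also be scripted, for instance from a notebook, with Python's standard `os` and `shutil` modules. A minimal sketch run inside a temporary directory; all file and folder names are made up for illustration, and downloading is omitted because it has no server-side equivalent here:

```python
import os
import shutil
import tempfile

def demo_file_operations(base):
    """Perform File Manager-style operations under `base`; return the zip path."""
    folder = os.path.join(base, "new_folder")
    os.mkdir(folder)                                   # generate a new folder

    note = os.path.join(folder, "notes.txt")
    with open(note, "w") as f:                         # generate a new text file
        f.write("first draft\n")

    renamed = os.path.join(folder, "notes_v1.txt")
    os.rename(note, renamed)                           # rename a file

    archive = shutil.make_archive(os.path.join(base, "new_folder"), "zip",
                                  root_dir=folder)     # zip a folder

    shutil.rmtree(folder)                              # delete a folder and its contents
    return archive

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    archive = demo_file_operations(base)
    print(sorted(os.listdir(base)))  # ['new_folder.zip']
```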

All file operations are subject to the respective user's rights at the operating system level. For example, everybody can copy a file to the public folder. This file can then be read and executed by everybody else, but only the "owner" of the file can overwrite or delete it.
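On Linux, these rules come from the file's permission bits (and, for deletion, from the permissions of the containing folder, e.g. a sticky bit on a shared writable folder). A sketch that gives the owner read/write/execute and everyone else read/execute, then shows the mode the way `ls -l` does; the file name and the mode 0o755 are illustrative assumptions, since the platform's actual settings for the public folder are not documented here:

```python
import os
import stat
import tempfile

def share_read_execute(path):
    """Give the owner rwx and everyone else r-x on `path`; return the ls-style mode."""
    os.chmod(path, stat.S_IRWXU                      # owner: read, write, execute
             | stat.S_IRGRP | stat.S_IXGRP           # group: read, execute
             | stat.S_IROTH | stat.S_IXOTH)          # others: read, execute
    return stat.filemode(os.stat(path).st_mode)

if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    script = os.path.join(folder, "shared_script.py")  # hypothetical file name
    with open(script, "w") as f:
        f.write("print('hello')\n")
    print(share_read_execute(script))  # -rwxr-xr-x
```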

System & IPython Shell

This component of the platform provides shell-based access to the Linux server. For security reasons, it requires a separate login. For example, you can code interactively on the shell via IPython Shell, which you start by simply typing ipython in the system shell.


Of course, you can do anything else via the system shell that your personal rights at the operating system level permit. Among others, you can:

  • do file operations (copying, renaming, moving, etc.)
  • use Git repositories (to clone/pull projects, commit and push them)

Text Editing and Coding

Via the JavaScript-based code editor, you can edit almost any technical file type with syntax highlighting and lots of useful text editing functions (shortcuts).


Account Management

On datapark you can manage your personal information at any time under My Account.


Consulting and Development Services

The Python Quants group (The Python Quants GmbH, Germany, and The Python Quants LLC, New York City) provides consulting, training and development services with a focus on data and financial analytics.