This section describes how to install and run the Open edX Analytics Developer Stack.
The Open edX Analytics Developer stack, known as the Analytics Devstack, is a modified version of the Open edX Developer Stack.
It provides all of the services and tools needed to modify the Open edX Analytics Pipeline, Data API, and Insights projects.
In addition to the standard Devstack components, the Analytics Devstack includes all of the components needed to run the Open edX Analytics Pipeline, which is the primary ETL (extract, transform, and load) tool that extracts and analyzes data from the other Open edX services.
To install and run the Analytics Devstack, you must first install the software required by the standard Open edX Devstack, including VirtualBox and Vagrant.
Additionally, the Open edX Analytics Pipeline includes the remote-task tool, which is used to deploy and run the pipeline on a target machine. To set up this tool, follow these steps.
Clone the repository on your host, not on the Virtual Machine.
$ git clone https://github.com/edx/edx-analytics-pipeline
Install the project dependencies into a virtualenv on your host.
$ cd edx-analytics-pipeline
$ virtualenv venv
$ source venv/bin/activate
$ make bootstrap
The system is now ready to start running tasks on the Analytics Devstack using the remote-task tool.
To install the Analytics Devstack extensions directly from the command line, follow these steps.
Create the analyticstack directory and navigate to it in the command prompt.
$ mkdir analyticstack
$ cd analyticstack
Download the Analytics Devstack Vagrant file.
$ curl -L https://raw.github.com/edx/configuration/master/vagrant/release/analyticstack/Vagrantfile > Vagrantfile
Create the Analytics Devstack virtual machine.
$ vagrant up
Log in to the Analytics Devstack.
$ vagrant ssh
Switch to the edxapp user.
$ sudo su edxapp
Start the LMS.
$ paver devstack lms
Log in to the Analytics Devstack.
$ vagrant ssh
Switch to the analytics_api user.
$ sudo su analytics_api
Start the Data API.
$ ~/venvs/analytics_api/bin/python ~/analytics_api/manage.py runserver 0.0.0.0:8100 --insecure
Log in to the Analytics Devstack.
$ vagrant ssh
Switch to the insights user.
$ sudo su insights
Enable features that are disabled by default.
$ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py switch display_verified_enrollment on --create
$ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py switch enable_course_api on --create
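If you need to enable additional switches later, the two commands above can be generated in a loop. The sketch below only prints each command rather than running it (drop the echo to execute them inside the devstack as the insights user; the interpreter and manage.py paths mirror the ones used in this section):

```shell
# Sketch: print the feature-switch commands from the steps above in a loop.
# Drop the echo to actually run them as the insights user on the devstack.
PY=~/venvs/insights/bin/python
MANAGE=~/edx_analytics_dashboard/manage.py
SWITCHES="display_verified_enrollment enable_course_api"
for sw in $SWITCHES; do
    echo "$PY $MANAGE switch $sw on --create"
done
```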
Start Insights.
$ ~/venvs/insights/bin/python ~/edx_analytics_dashboard/manage.py runserver 0.0.0.0:8110 --insecure
Open the URL http://127.0.0.1:8110 in a browser on the host.
Important: Be sure to use the IP address 127.0.0.1 instead of localhost. Using localhost will prevent you from logging in.
In the Devstack LMS, register a new user and enroll in the demo course.
Navigate to the courseware and submit answers to a few problems.
Navigate to the location where the edx-analytics-pipeline repository was cloned on the host.
$ cd edx-analytics-pipeline
Run the enrollment task.
$ export WHEEL_URL=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise
# On Mac OS X replace the date command below with $(date -v+1d +%Y-%m-%d)
$ remote-task --vagrant-path <path to `analyticstack`> --remote-name devstack --override-config ${PWD}/config/devstack.cfg --wheel-url $WHEEL_URL --wait \
ImportEnrollmentsIntoMysql --local-scheduler --interval-end $(date +%Y-%m-%d -d "tomorrow") --n-reduce-tasks 1
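The --interval-end value above must be tomorrow's date so that events recorded today fall inside the interval. As the comment notes, GNU date (Linux) and BSD date (Mac OS X) take different flags; a small portable sketch:

```shell
# Sketch: compute the value passed to --interval-end (the day after today).
# GNU date (Linux) understands -d "tomorrow"; BSD date (macOS) uses -v+1d.
if date -d "tomorrow" +%Y-%m-%d >/dev/null 2>&1; then
    INTERVAL_END=$(date -d "tomorrow" +%Y-%m-%d)
else
    INTERVAL_END=$(date -v+1d +%Y-%m-%d)
fi
echo "$INTERVAL_END"
```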
Run the answer distribution task.
$ export WHEEL_URL=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/Ubuntu/precise
$ export UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
$ remote-task --vagrant-path <path to `analyticstack`> --remote-name devstack --override-config ${PWD}/config/devstack.cfg --wheel-url $WHEEL_URL --wait \
AnswerDistributionWorkflow --local-scheduler \
--src hdfs://localhost:9000/data/ \
--include '*tracking.log*' \
--dest hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution_raw/$UNIQUE_NAME/data \
--name $UNIQUE_NAME \
--output-root hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution/ \
--marker hdfs://localhost:9000/edx-analytics-pipeline/output/answer_distribution_raw/$UNIQUE_NAME/marker \
--n-reduce-tasks 1
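The UNIQUE_NAME timestamp keys both the --dest and --marker paths, so each run of the workflow writes to a fresh HDFS location rather than colliding with a previous run's output. A sketch of how those paths are assembled (the paths mirror the invocation above):

```shell
# Sketch: the UNIQUE_NAME timestamp keys the HDFS output paths used above,
# so repeated runs of the answer distribution workflow never collide.
UNIQUE_NAME=$(date +%Y-%m-%dT%H_%M_%SZ)
BASE=hdfs://localhost:9000/edx-analytics-pipeline/output
DEST=$BASE/answer_distribution_raw/$UNIQUE_NAME/data
MARKER=$BASE/answer_distribution_raw/$UNIQUE_NAME/marker
echo "$DEST"
echo "$MARKER"
```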