Yottamine Analytics: Getting Started

Getting Started with the Yottamine Predictive Platform

What is it? The Yottamine Predictive Platform is a tool that provides high-precision, large-scale predictive analytics using the latest machine learning techniques.

When to use it? The Yottamine Predictive Platform can be used on any data set for forecasting and analysis.

Why use it? The Yottamine Predictive Platform combines the power and flexibility of cloud computing with advanced predictive analytics in a user-friendly manner to help corporations solve real-world problems.

Who should use it? Businesses that require data analysis to perform forecasting will benefit from the Yottamine Predictive Platform.

How to use it? Using the Yottamine Predictive Platform is simple. There are 8 modules to choose from, depending on the tasks you wish to perform.

This is where you are most likely to start, as all data must be uploaded onto the cloud before any modeling or training can be carried out. See “Upload Data onto Cloud” below for details on how to upload your data in this module.

This module allows you to build and train a single model.

If you have built a model previously and want to test it, this is the right module for you.

This module allows you to build and test multiple models all at once.

In here you will find helpful tips and answers to FAQs that will get you started on the Yottamine Predictive Platform.

Watch video tutorials on some of the most commonly used features of the Yottamine Predictive Platform.

You can view and edit your account details as well as purchase more credits. This is also where you would go to view your credit history and sign up for additional services.

Download a copy of the Yottamine Predictor so you can load your model into the predictor to make predictions.

 

Upload Data onto Cloud

All data must be uploaded onto the cloud before any model building/testing can begin.

First, select the “Upload Files” module from the main menu, and the Upload Files pop-up window will appear. Under the “Cloud File System” panel, click on “Cloud” and the “New Folder” icon will appear at the bottom of the panel. You need to specify where you want to upload your files before you can start uploading. You can either upload to the main cloud directory or click on the “New Folder” icon to create a new folder to store your files.

After selecting where you want your files to upload to, click on the “Add” icon to use the file browser to select your file, or simply drag and drop the files you wish to upload into the pop-up window. You can also copy files with Ctrl+C (multiple files can be copied at once) and use the “Paste” button to add them to the selection. Once you have made your file selection, click on the “Upload” button and the upload process will start.

You may pause an upload at any time by clicking on the “Stop” button. You can resume uploading a file by selecting the file and clicking on “Retry failed”.

After a file has been successfully uploaded, click on “Refresh” and find the folder you saved the file to. You will see that the file is now under the previously selected folder.

File Formats

There are currently 3 types of file formats that the Yottamine Predictive Platform accepts:

Default (SVM format): This can take one of 2 forms, sparse or dense.

  • Sparse: Zeros (0) are not given as input, which makes this type of file smaller if many input components are zero.
  • Dense: Zeros (0) are given as input.

Example:
Assume you have 7 data points from a 3-class problem with 4-dimensional input, given below by their numerical values (the first column is the class label):

1 0 1.1 0.3 -1.1
2 -2 0 1.1 0.7
2 1.1 -3 0 1.1
2 0 0 0 2
3 5 -0.5 1 2.3
3 2 0 -4.1 0
2 0 1.1 0 3.7

Dense Format

1 1:0 2:1.1 3:0.3 4:-1.1
2 1:-2 2:0 3:1.1 4:0.7
2 1:1.1 2:-3 3:0 4:1.1
2 1:0 2:0 3:0 4:2
3 1:5 2:-0.5 3:1 4:2.3
3 1:2 2:0 3:-4.1 4:0
2 1:0 2:1.1 3:0 4:3.7

Sparse Format (Supported in Linear, Gaussian and Polynomial Classification and Regression)

1 2:1.1 3:0.3 4:-1.1
2 1:-2 3:1.1 4:0.7
2 1:1.1 2:-3 4:1.1
2 4:2
3 1:5 2:-0.5 3:1 4:2.3
3 1:2 3:-4.1
2 2:1.1 4:3.7
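
The relationship between the raw values and the two SVM layouts can be sketched in a few lines of Python. This is only an illustration of the "label index:value" layout described above, not part of the Yottamine Predictive Platform; the function name to_svm_line is hypothetical, and indices are assumed to start at 1 as in the dense example.

```python
def to_svm_line(label, values, sparse=False):
    """Format one sample as 'label index:value ...' using 1-based indices.

    When sparse is True, components equal to zero are left out.
    """
    pairs = [
        f"{index}:{value:g}"
        for index, value in enumerate(values, start=1)
        if not (sparse and value == 0)
    ]
    return " ".join([str(label)] + pairs)


# The 7 example samples: (class label, 4-dimensional input).
rows = [
    (1, [0, 1.1, 0.3, -1.1]),
    (2, [-2, 0, 1.1, 0.7]),
    (2, [1.1, -3, 0, 1.1]),
    (2, [0, 0, 0, 2]),
    (3, [5, -0.5, 1, 2.3]),
    (3, [2, 0, -4.1, 0]),
    (2, [0, 1.1, 0, 3.7]),
]

dense_lines = [to_svm_line(label, values) for label, values in rows]
sparse_lines = [to_svm_line(label, values, sparse=True) for label, values in rows]

print("\n".join(dense_lines))
print()
print("\n".join(sparse_lines))
```

A sample whose inputs are mostly zero shrinks to only its non-zero entries in the sparse output, which is where the file-size saving comes from.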

CSV (Comma-Separated Values)

  • Data is structured as a table, where each record is a list of fields separated by commas.
  • Each line in the CSV file corresponds to a row in the table.
  • Within a line, fields are separated by commas, each field belonging to one table column.

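Example:

Consider the following table of data:

1997 | Ford | E350 | ac, abs, moon | $3,000.00
1999 | Chevy | Venture "Extended Edition" | | $4900.00
1999 | Chevy | Venture "Extended Edition,Very Large" | | $5000.00
1996 | Jeep | Grand Cherokee | MUST SELL! air, moon roof, loaded | $4,799.00

The above table of data may be represented in CSV format as follows (the last record contains a line break inside a quoted field):

1997,Ford,E350,"ac, abs, moon","$3,000.00"
1999,Chevy,"Venture ""Extended Edition""",,$4900.00
1999,Chevy,"Venture ""Extended Edition,Very Large""",,$5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded","$4,799.00"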

The above example demonstrates that:

  • fields that contain commas, double-quotes, or line-breaks must be quoted
  • a quote within a field must be escaped by placing an additional quote immediately before the literal quote
  • a field that contains both a quote and a comma must be quoted, with each literal quote escaped in the same way
  • spaces before and after delimiter commas may not be trimmed
  • a line break within an element must be preserved

Source: http://en.wikipedia.org/wiki/Comma-separated_values
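
The quoting rules listed above are the ones applied by most CSV libraries, so it is usually safer to let a library do the escaping. The following is a minimal sketch using Python's standard csv module. The split of each example row into five fields and the line break after "MUST SELL!" are assumptions made for illustration.

```python
import csv
import io

# Example rows; the field boundaries are an assumption for illustration.
rows = [
    ["1997", "Ford", "E350", "ac, abs, moon", "$3,000.00"],
    ["1999", "Chevy", 'Venture "Extended Edition"', "", "$4900.00"],
    ["1999", "Chevy", 'Venture "Extended Edition,Very Large"', "", "$5000.00"],
    ["1996", "Jeep", "Grand Cherokee", "MUST SELL!\nair, moon roof, loaded", "$4,799.00"],
]

# Write the rows; fields with commas, quotes or line breaks get quoted,
# and literal quotes are doubled.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back to confirm that nothing was lost, including the
# line break inside the last record.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(parsed == rows)
```

Reading the generated text back gives the original rows, which is a quick way to check that a file follows the rules before uploading it.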

  • ARFF (Attribute-Relation File Format)
    • ASCII text file that describes a list of instances sharing a set of attributes
    • Made up of 2 distinct sections: Header Information, followed by Data Information
    • Header Information:
      • Contains the name of the relation, a list of the attributes (the columns in the data), and their types. Example:
        	% 1. Title: Iris Plants Database
        	%
        	% 2. Sources:
        	%      (a) Creator: R.A. Fisher
        	%      (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
        	%      (c) Date: July, 1988
        	%
        	@RELATION iris

        	@ATTRIBUTE sepallength  NUMERIC
        	@ATTRIBUTE sepalwidth   NUMERIC
        	@ATTRIBUTE petallength  NUMERIC
        	@ATTRIBUTE petalwidth   NUMERIC
        	@ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}
        • The relation name is defined on the first line of the ARFF file: @relation <relation-name>
        • <relation-name> is a string, and must be quoted if the name includes spaces
        • Attribute declarations take the form of an ordered sequence of @attribute statements. Each attribute in the data set has its own @attribute statement, which uniquely defines the name of that attribute and its data type. The order in which the attributes are declared indicates the column position in the data section of the file. The format is: @attribute <attribute-name> <datatype>
        • <attribute-name> must start with an alphabetic character. If the name contains spaces, the entire name must be quoted
        • <datatype> can be any of 4 types:
          • Numeric: real or integer numbers
          • <nominal-specification>: a list of the possible values: {<nominal-name1>, <nominal-name2>, ...}, e.g. {Iris-setosa, Iris-versicolor} (values containing spaces must be quoted)
          • String: arbitrary textual values, e.g. @ATTRIBUTE LLC string (string attribute declaration)
          • Date [<date-format>]: @attribute <name> date [<date-format>], where <date-format> is an optional string specifying how date values should be parsed and printed. Default: yyyy-MM-dd'T'HH:mm:ss
    • The @data declaration is a single line denoting the start of the data segment in the file
    • Data would look like the following:
      	5.1,3.5,1.4,0.2,Iris-setosa
      	4.9,3.0,1.4,0.2,Iris-setosa
      	4.7,3.2,1.3,0.2,Iris-setosa
      	4.6,3.1,1.5,0.2,Iris-setosa
      	5.0,3.6,1.4,0.2,Iris-setosa
      	5.4,3.9,1.7,0.4,Iris-setosa
      	4.6,3.4,1.4,0.3,Iris-setosa
      	5.0,3.4,1.5,0.2,Iris-setosa
      	4.4,2.9,1.4,0.2,Iris-setosa
      	4.9,3.1,1.5,0.1,Iris-setosa
    • Each instance is represented on a single line, with carriage returns denoting the end of the instance
    • Attribute values for each instance are delimited by commas, and must appear in the order in which they were declared in the header section (i.e., the data corresponding to the nth @attribute declaration is always the nth field of the instance)
    • Missing values are represented by a single question mark:
      	@data
      	4.4, ?, 1.5
    • Values of string and nominal attributes are case sensitive, and any that contain spaces must be quoted
    • Dates must be specified in the data sections using the string representation specified in the attribute declaration, example:
      	@RELATION Timestamps
      	@ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"
      	@DATA
      	"2001-04-03 12:12:12"
      	"2001-05-03 12:59:55"
    • Sparse ARFF: data with value 0 are not explicitly represented. A sparse ARFF file has the same header (@relation and @attribute tags), but the data section is different: the non-zero attributes are explicitly identified by attribute number and their values stated. Each instance is surrounded by curly braces, and the format for each entry is <index><space><value>, where index is the attribute index (starting from 0):
      	@data
      	{1 X, 3 Y, 4 "class A"}
      	{2 W, 4 "class B"}
    • Omitted values in a sparse instance are 0; they are not “missing” values. If a value is unknown, represent it with a question mark (?)
    • Lines that begin with a % are comments
    • The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive

    Source: http://www.cs.waikato.ac.nz/~ml/weka/arff.html
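
The header and sparse data rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a complete ARFF writer: the function names arff_header and to_sparse_row are hypothetical, names and values are assumed to be already quoted where they contain spaces, and every value equal to 0 is treated as omittable.

```python
def arff_header(relation, attributes):
    """Build the header section from a relation name and (name, datatype) pairs.

    The order of the pairs determines the column order of the data section.
    """
    lines = [f"@RELATION {relation}", ""]
    for name, datatype in attributes:
        lines.append(f"@ATTRIBUTE {name} {datatype}")
    return "\n".join(lines)


def to_sparse_row(values):
    """Render one instance in sparse ARFF form.

    Entries are '<index> <value>' with indices starting from 0. Values equal
    to 0 are omitted; an unknown value must be passed as '?' so it is kept.
    """
    entries = [
        f"{index} {value}"
        for index, value in enumerate(values)
        if value != 0
    ]
    return "{" + ", ".join(entries) + "}"


header = arff_header(
    "iris",
    [
        ("sepallength", "NUMERIC"),
        ("class", "{Iris-setosa,Iris-versicolor,Iris-virginica}"),
    ],
)
print(header)

# Reproduces the two sparse instances shown in the list above.
print(to_sparse_row([0, "X", 0, "Y", '"class A"']))
print(to_sparse_row([0, 0, "W", 0, '"class B"']))
```

Because a question mark is not equal to 0, an unknown value stays in the sparse row instead of being silently read as zero.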

Interpreting Results

Result Files

With the Build & Train New Model module, there will be 2 files under the “ProjectName→JobName→results” folder if the model was built and tested. The “.result” file contains detailed results and is in the following format:

[Header]

Modeling Approach=Nonlinear SVM Classification with Gaussian

T_column=3

P_Parameter=3

Num_of_run=9

T_column

P_Parameter

Num_of_run

[Result Table]

Error Rate in Percentage=0.038,C=1000,sigma=0.5

Error Rate in Percentage=0.036,C=1000,sigma=0.55

Error Rate in Percentage=0.03,C=1000,sigma=0.6,optimal model

Error Rate in Percentage=0.038,C=900,sigma=0.5

Error Rate in Percentage=0.036,C=900,sigma=0.55

Error Rate in Percentage=0.03,C=900,sigma=0.6

Error Rate in Percentage=0.038,C=950,sigma=0.5

Error Rate in Percentage=0.036,C=950,sigma=0.55

Error Rate in Percentage=0.03,C=950,sigma=0.6

[Plotting]

Error Rate in Percentage=0.038,C=1000,sigma=0.5

Error Rate in Percentage=0.036,C=1000,sigma=0.55

Error Rate in Percentage=0.03,C=1000,sigma=0.6,optimal model

Error Rate in Percentage=0.038,C=900,sigma=0.5

Error Rate in Percentage=0.036,C=900,sigma=0.55

Error Rate in Percentage=0.03,C=900,sigma=0.6

Error Rate in Percentage=0.038,C=950,sigma=0.5

Error Rate in Percentage=0.036,C=950,sigma=0.55

Error Rate in Percentage=0.03,C=950,sigma=0.6
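
If you want to post-process a “.result” file outside the platform, the layout above is simple enough to read with a short script. The sketch below is an assumption based only on the sample shown here: it expects bracketed section names such as [Result Table], comma-separated key=value fields, and an “optimal model” marker on the best row. The function name parse_result_table is hypothetical.

```python
def parse_result_table(text):
    """Return the rows of the [Result Table] section as dictionaries."""
    rows = []
    in_table = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between entries
        if line.startswith("["):
            # A bracketed line starts a new section.
            in_table = line == "[Result Table]"
            continue
        if not in_table:
            continue
        row = {"optimal": False}
        for field in line.split(","):
            if "=" in field:
                key, value = field.split("=", 1)
                row[key.strip()] = float(value)
            elif field.strip() == "optimal model":
                row["optimal"] = True
        rows.append(row)
    return rows


sample = """[Header]
Modeling Approach=Nonlinear SVM Classification with Gaussian
Num_of_run=9
[Result Table]
Error Rate in Percentage=0.038,C=1000,sigma=0.5
Error Rate in Percentage=0.036,C=1000,sigma=0.55
Error Rate in Percentage=0.03,C=1000,sigma=0.6,optimal model
[Plotting]
Error Rate in Percentage=0.038,C=1000,sigma=0.5
"""

rows = parse_result_table(sample)
best = next(row for row in rows if row["optimal"])
print(best)
```

Only the [Result Table] section is read, so the repeated rows under [Plotting] are not counted twice.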

 

Batch Results
Batch result files contain the same information as the individual result files and can be found under the “ProjectName→TrainingFileName→results” folder with the “.result” file extension. The best model file can be found under the same folder with the “.model” file extension.

Batch Processing

The batch processing module is very similar to the Build & Train New Model module and is designed for building and training multiple models all at once. There are 3 different ways you can use this module.

Scenario 1- Load existing batch file
If you already have a batch file saved and would like to use the same parameters and datasets with minimal tweaking, then loading your existing batch file is the most efficient way to build batch models. After loading your batch file, go through steps 1 to 4 and modify any settings you like. The attributes in all of the steps are auto-populated from the batch file, but you can still change them as you go through each step.

Scenario 2- Load workflow file
If you would like to use the same parameters on different sets of training files, load the workflow file and select the training files in step 1. Then go through steps 2 to 4 and change any of the attributes as you wish before building the batch models.

Scenario 3- Start new workflow
If you would like to build multiple models without using a batch or workflow file, then the “Start new workflow” option is the most appropriate.

 

Step 1: Training File Selection
In this step, you need to specify the problem type as well as select the training data.

Step 2: Attributes
Specify the attribute selection. The attribute selection will apply to all of the models built in this batch.

Step 3: Test Method
Select the test method for each model. The test methods are model specific.

Step 4: Parameters
Specify the parameters. You can either use the same parameters for all models or specify different parameters for each model. Once you are ready, click on “Build Models”. A pop-up window will ask you to save the batch file. Click on “Yes” and enter a Project Name; the location where the batch file will be saved is then shown. Click on “Save” and a confirmation of where the batch file was saved is shown. Click on “Ok” and the location where the result files will be saved is shown. Click on “Save” again to start the batch model building process.