RapidMiner

From Sinfronteras
Revision as of 13:05, 1 November 2018 by Adelo Vieira (talk | contribs) (Adding extensions - The RapidMiner Marketplace)

https://rapidminer.com/

RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization. RapidMiner is developed on an open core model. The RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. Commercial pricing starts at $2,500 and is available from the developer.

Installing RapidMiner

Download the package and follow the instructions on the official site: https://docs.rapidminer.com/latest/studio/installation/

./RapidMiner-Studio.sh

We will need to create a RapidMiner account.

Training Videos

https://rapidminer.com/training/videos/

Introductions

https://rapidminer.com/training/videos/#introductions

GUI Intro

The views
Design view

Work areas for specific tasks...

Process panel: this is where any process is designed, such as:

  • Data loading
  • Forecasting
  • ...


  • To get started with a very simple process we can place an operator into the process panel:
    • For example, we can go to Data Access Operators and place (Drag and Drop) a «Retrieve» operator into the process panel.
    • After doing this and selecting (clicking) the operator in the Process panel, the Parameters panel changes and lets us select, through the folder icon, the file we want to load. For example, we can select the «Titanic data set», which comes pre-loaded in RapidMiner Studio.
    • Then, in order to run the process, we need to connect the output port of the «Retrieve» operator to the Result port.
    • Then, to run the process, we click the Run button (>) (or press F11). RapidMiner executes the process and automatically opens the Results view, where the data set is displayed as a table by default.


Ports:

Results view

Work areas for specific tasks...

Auto Model view
Operators
Repository
  • Through the Repository panel you can access your data and your processes.
  • When starting a project, it is recommended to create a new repository with two sub-folders: data and processes
Parameters
Global Search

Adding extensions - The RapidMiner Marketplace

  • To add extensions go to: Extensions > Marketplace:
    • Top Downloads: some of the most popular extensions.
    • Installing the following extensions is recommended:
      • Text Processing
      • Web Mining
      • Python/R integration
      • Anomaly Detection
      • Series extension
      • RapidMiner Radoop


  • After an extension is installed, the «Extensions» folder in the «Operators» panel shows a new folder for each installed extension.
  • Some extensions also add a new «View». For example, the «Radoop» extension adds the «Hadoop Data» view.


  • To manage and uninstall extensions go to: Extensions > Manage Extensions


  • Another way to install extensions is to go directly to the Marketplace website at: https://marketplace.rapidminer.com
    • Download the .jar file and place it in the «extensions» folder: /home/adelo/.RapidMiner/extensions/
    • Restart RapidMiner and the extension will become available.
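
The manual installation described above amounts to copying a file into the extensions folder. A minimal Python sketch of that step follows, assuming the folder lives at ~/.RapidMiner/extensions/ as it does on the machine used for these notes. The helper name install_extension and the jar file name are hypothetical; this is not part of RapidMiner.

```python
import shutil
from pathlib import Path


def install_extension(jar_path, extensions_dir=Path.home() / ".RapidMiner" / "extensions"):
    """Copy a downloaded extension .jar into the RapidMiner extensions folder.

    Hypothetical helper for illustration only; the default folder is an
    assumption based on the path used in these notes.
    """
    jar_path = Path(jar_path)
    if jar_path.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got: {jar_path.name}")
    extensions_dir = Path(extensions_dir)
    # Create the folder if it does not exist yet.
    extensions_dir.mkdir(parents=True, exist_ok=True)
    # copy2 copies the file into the folder and returns the new file's path.
    return Path(shutil.copy2(jar_path, extensions_dir))
```

Usage would look like install_extension("/path/to/downloaded-extension.jar"), followed by a restart of RapidMiner so that it picks up the new file.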

Visualizing data

https://rapidminer.wistia.com/medias/w623uxkoga

  • Attribute = Column
    • Regular attributes
    • Special attributes:
      • Label: When an attribute is marked as «Label», it is the attribute that we want our model to learn to predict. The regular attributes are used to make that prediction.
  • Example = Row
  • Example set = The entire data
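
To make the terminology concrete, here is a small plain-Python sketch (not RapidMiner code) that maps the terms onto a toy table. The rows and attribute names are made up for illustration, and the label is assumed to be the only special attribute.

```python
# Toy example set: each dict is one example (row); the values are made up.
example_set = [
    {"Age": 29, "Sex": "female", "Survived": "yes"},
    {"Age": 41, "Sex": "male", "Survived": "no"},
    {"Age": 35, "Sex": "male", "Survived": "yes"},
]

# Special attribute: the one the model should learn to predict.
label = "Survived"

# Attributes are the columns. Here the label is the only special attribute,
# so every other column is a regular attribute.
attributes = list(example_set[0].keys())
regular_attributes = [name for name in attributes if name != label]

# Each example splits into model inputs (regular attributes) and a target (label).
inputs = [{name: example[name] for name in regular_attributes} for example in example_set]
targets = [example[label] for example in example_set]

print(regular_attributes)  # ['Age', 'Sex']
print(targets)             # ['yes', 'no', 'yes']
```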


  • Data tab:
    • When displaying data (in the Results view) we can sort the examples by clicking on an attribute's column header (one click for ascending, a second click for descending, and a third click to remove the sorting). Holding the Ctrl key while clicking lets us sort by multiple attributes.
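
As a rough analogy for what sorting by multiple attributes means, the following plain-Python sketch orders rows by several attributes, each either ascending or descending. The function name sort_examples and the sample rows are hypothetical; this is not how RapidMiner implements the feature.

```python
from operator import itemgetter


def sort_examples(examples, sort_keys):
    """Sort rows by several attributes.

    sort_keys is a list of (attribute, descending) pairs, most significant first.
    """
    ordered = list(examples)
    # Python's sort is stable, so apply the least significant key first.
    for attribute, descending in reversed(sort_keys):
        ordered.sort(key=itemgetter(attribute), reverse=descending)
    return ordered


# Made-up rows for illustration.
rows = [
    {"Class": 1, "Age": 40},
    {"Class": 3, "Age": 22},
    {"Class": 1, "Age": 25},
    {"Class": 3, "Age": 30},
]

# Class ascending, then Age descending within each class.
result = sort_examples(rows, [("Class", False), ("Age", True)])
print([(r["Class"], r["Age"]) for r in result])
# [(1, 40), (1, 25), (3, 30), (3, 22)]
```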


  • Statistics tab: RapidMiner does some automatic data discovery.
    • We can display the data in different chart styles (Histogram, Scatter, Pie, etc).
      • We can display multiple attributes in the same chart by clicking on the attributes shown in the «Plots box» while holding the Ctrl key.
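
The kind of per-attribute summary that such automatic discovery produces can be sketched in plain Python. The exact set of statistics below (min, max, average, missing count) is an assumption for a numeric attribute, and the function name summarize and the sample rows are hypothetical.

```python
from statistics import mean


def summarize(examples, attribute):
    """Return min, max, average, and missing count for one numeric attribute.

    Assumes at least one non-missing value; missing values are stored as None.
    """
    values = [row.get(attribute) for row in examples]
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "average": mean(present),
        "missing": len(values) - len(present),
    }


# Made-up rows for illustration; one example has a missing Age.
rows = [{"Age": 22}, {"Age": 38}, {"Age": None}, {"Age": 30}]
summary = summarize(rows, "Age")
print(summary)
```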


  • To change the standard colors of the chart you can go to:
    • Settings > Preferences:
      • Color for minimum value in chart keys
      • Color for maximum value in chart keys


  • Advanced charts tab:
    • Example (using the «customer-churn-data»):
      • Drag and drop the «Age» attribute to the «Domain dimension» (which represents the x axis) and the «Last transaction» attribute to the «Empty axis» («Numerical axis») (which represents the y axis).
      • With «Series: LastTransaction» selected (by clicking it) in the chart configuration box:
        • Title: Number of RapidMiner users in million
        • Visualization: Lines and shapes
        • Some format configurations:
          • Item shape: Diamond
          • Color: Yellow
          • Line style: solid
        • Aggregation: Average
        • Indicators:
          • Indicator type: Band
          • Indicator 1: Drag and drop age to this field.
          • Indicator 2: Drag and drop age to this field.


      • With «Domain dimension» selected in the chart configuration box:
        • Title: Weeks from today


      • With «Global configuration» selected in the chart configuration box:
        • Chart title: Prediction of RapidMiner studio users
        • Plot background: Change color to black


      • Then you can export the plot:
        • File > Print/Export Image

Data Preparation AND ETL

https://rapidminer.com/training/videos/#data-preparation

Model and Validate

https://rapidminer.com/training/videos/#model-validate

Operationalize

https://rapidminer.com/training/videos/#operationalize