RapidMiner

From Sinfronteras
Revision as of 14:49, 1 November 2018 by Adelo Vieira

https://rapidminer.com/

RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization. RapidMiner is developed on an open core model. The RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. Commercial pricing starts at $2,500 and is available from the developer.

Installing RapidMiner

Download the package and follow the instructions on the official site: https://docs.rapidminer.com/latest/studio/installation/

./RapidMiner-Studio.sh

We will need to create a RapidMiner account.

Training Videos

https://rapidminer.com/training/videos/

Introductions

https://rapidminer.com/training/videos/#introductions

GUI Intro

https://rapidminer.wistia.com/medias/dxnsrftr9i

The views
Design view

Work areas for specific tasks...

Process panel: used to design any process, such as:

  • Data loading
  • Forecasting
  • ...


  • To get started with a very simple process we can place an operator into the process panel:
    • For example, we can go to Data Access Operators and place (Drag and Drop) a «Retrieve» operator into the process panel.
    • After doing this and selecting (clicking) the operator in the Process panel, the Parameters panel changes and lets us, through the folder icon, select the file we want to load. For example, we can select the «Titanic» data set that comes pre-loaded in RapidMiner Studio.
    • Then, in order to run the process, we need to connect the output port of the «Retrieve» operator to the result port.
    • Then, to run the process, we click the Run button (>) (or press F11). RapidMiner executes the process and automatically opens the Results view, where the data set is displayed by default as a table.
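
The reason for the wiring step can be illustrated with a toy model. The sketch below is plain Python, not RapidMiner's API: the `Operator` and `Process` classes, the `connect_to_result` method, and the sample rows are hypothetical names and made-up values chosen only for illustration. It shows the idea that an operator's output reaches the results only once its port is connected to a result port.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Operator:
    """Hypothetical stand-in for an operator with a single output port."""
    name: str
    produce: Callable[[], object]  # what the operator delivers at its output port


@dataclass
class Process:
    """Toy process: a set of operators plus the wires to its result ports."""
    operators: List[Operator] = field(default_factory=list)
    result_connections: List[Operator] = field(default_factory=list)

    def add(self, op: Operator) -> Operator:
        # Equivalent of dragging an operator into the process panel.
        self.operators.append(op)
        return op

    def connect_to_result(self, op: Operator) -> None:
        # Equivalent of wiring the operator's output port to a result port.
        if op not in self.operators:
            raise ValueError(f"{op.name!r} is not part of this process")
        self.result_connections.append(op)

    def run(self) -> List[object]:
        # Only outputs wired to a result port are delivered as results.
        return [op.produce() for op in self.result_connections]


# Made-up placeholder rows, NOT actual values from the Titanic sample data set.
SAMPLE_ROWS = [
    {"Name": "passenger A", "Age": 29, "Survived": "Yes"},
    {"Name": "passenger B", "Age": 41, "Survived": "No"},
]


def retrieve_sample() -> list:
    return SAMPLE_ROWS


process = Process()
retrieve = process.add(Operator("Retrieve", retrieve_sample))

unconnected = process.run()          # nothing is wired to a result port yet
process.connect_to_result(retrieve)  # wire the output port to the result port
results = process.run()              # now the data set is delivered

print(len(unconnected), len(results))
```

In this toy model, running the process before the connection is made yields no results at all, which mirrors why the «Retrieve» output has to be wired to the result port before pressing Run.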


Ports:

Results view

Work areas for specific tasks...

Auto Model view
Operators
Repository
  • Through the repository panel you can access your data and processes.
  • When starting a project, it is recommended to create a new repository with two sub-folders: data and processes
Parameters
Global Search

Adding extensions - The RapidMiner Marketplace

https://rapidminer.wistia.com/medias/9nu4i7b5ea

  • To add extensions go to: Extensions > Marketplace:
    • Top Downloads: some of the most popular extensions.
    • It is recommended to install the following:
      • Text Processing
      • Web Mining
      • Python/R integration
      • Anomaly Detection
      • Series extension
      • RapidMiner Radoop


  • After an extension is installed, the «Extensions» folder in the «Operators» panel shows a new folder for each installed extension.
  • There are also extensions that add a new «View». For example, the «Radoop» extension adds the «Hadoop Data» view.


  • To manage and uninstall extensions go to: Extensions > Manage Extensions


  • Another way to install extensions is to go directly to the Marketplace website at: https://marketplace.rapidminer.com
    • Download the .jar file and place it in the extensions folder: /home/adelo/.RapidMiner/extensions/
    • Restart RapidMiner and the extension will become available.
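
The manual installation amounts to a file copy, which can be scripted. The following is a minimal sketch, assuming the extensions folder lives at `.RapidMiner/extensions` under the current user's home directory (a generalisation of the `/home/adelo/.RapidMiner/extensions/` path noted above). The function names `default_extensions_dir` and `install_extension` are hypothetical helpers, not part of RapidMiner.

```python
import shutil
from pathlib import Path


def default_extensions_dir() -> Path:
    # Assumption: same layout as /home/adelo/.RapidMiner/extensions/,
    # but resolved for whichever user runs the script.
    return Path.home() / ".RapidMiner" / "extensions"


def install_extension(jar_file: Path, extensions_dir: Path) -> Path:
    """Copy a downloaded extension .jar into the extensions folder.

    Returns the path of the copied file. RapidMiner still has to be
    restarted afterwards for the extension to become available.
    """
    jar_file = Path(jar_file)
    if jar_file.suffix.lower() != ".jar":
        raise ValueError(f"expected a .jar file, got: {jar_file.name}")
    if not jar_file.is_file():
        raise FileNotFoundError(jar_file)

    extensions_dir = Path(extensions_dir)
    extensions_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    target = extensions_dir / jar_file.name
    shutil.copy2(jar_file, target)  # copy the file, keeping its timestamps
    return target
```

A call such as `install_extension(Path("some-extension.jar"), default_extensions_dir())` would copy the file into place; the file name here is a placeholder for whatever was downloaded from the Marketplace.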

Data Preparation and ETL

https://rapidminer.com/training/videos/#data-preparation

Data preparation

Prepare and clean up the data.

In practice, data is rarely complete or free of issues.

Here we'll show you some of the operators that help to prepare and clean up the data.
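
To make "incomplete data" concrete, here is a minimal sketch of one common clean-up step: replacing missing values in a numeric column with the column mean. It is plain Python written for illustration, not how RapidMiner implements its operators, and the function name `replace_missing_with_mean` and the `age` column are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def replace_missing_with_mean(rows: List[Row], column: str) -> List[Row]:
    """Return new rows where missing (None) values in `column` are set to the mean.

    The input rows are left unchanged.
    """
    known = [row[column] for row in rows if row.get(column) is not None]
    if not known:
        raise ValueError(f"column {column!r} has no known values to average")
    fill = mean(known)
    return [
        {**row, column: fill if row.get(column) is None else row[column]}
        for row in rows
    ]


# Made-up example values, used only to show the behaviour.
rows = [{"age": 20.0}, {"age": None}, {"age": 40.0}]
cleaned = replace_missing_with_mean(rows, "age")
print(cleaned)
```

Mean replacement is only one possible strategy; dropping incomplete rows or using a fixed default value are alternatives, and which one is appropriate depends on the data set.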

Model and Validate

https://rapidminer.com/training/videos/#model-validate

Operationalize

https://rapidminer.com/training/videos/#operationalize