RapidMiner
RapidMiner is a data science software platform, developed by the company of the same name, that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization. RapidMiner is developed on an open core model. The RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. Commercial pricing starts at $2,500 and is available from the developer.
Installing RapidMiner
Download the package and follow the instructions on the official site: https://docs.rapidminer.com/latest/studio/installation/
Then start RapidMiner Studio with the launcher script:
./RapidMiner-Studio.sh
We will need to create a RapidMiner account.
Training Videos
https://rapidminer.com/training/videos/
Introductions
https://rapidminer.com/training/videos/#introductions
GUI Intro
https://rapidminer.wistia.com/medias/dxnsrftr9i
The views
Design view
Work areas for specific tasks...
Process panel: used to design any process, for example:
- Data loading
- Forecasting
- ...
- To get started with a very simple process, we can place an operator in the Process panel:
- For example, we can go to the Data Access operators and drag and drop a «Retrieve» operator into the Process panel.
- After doing this and selecting (clicking) the operator in the Process panel, the Parameters panel changes and lets us choose, via the folder icon, the data we want to load from the repository. For example, we can select the «Titanic» data set, which comes pre-loaded in RapidMiner Studio.
- Next, so that the process produces a result, we connect the output port of the «Retrieve» operator to a result port of the process.
- Finally, to run the process, we click the Run button (>) or press F11. RapidMiner executes the process and automatically switches to the Results view, where the data set is displayed as a table by default.
Ports:
Results view
Work areas for specific tasks...
Auto Model view
Operators
Repository
- Through the Repository panel you can access your data and your processes.
- When starting a project, it is recommended to create a new repository with two sub-folders: data and processes.
Parameters
Global Search
Adding extensions - The RapidMiner Marketplace
https://rapidminer.wistia.com/medias/9nu4i7b5ea
- To add extensions, go to Extensions > Marketplace:
- Top Downloads: some of the most popular extensions.
- Recommended extensions to install:
- Text Processing
- Web Mining
- Python/R integration
- Anomaly Detection
- Series extension
- RapidMiner Radoop
- After an extension is installed, the «Extensions» folder in the «Operators» panel shows a new sub-folder for each installed extension.
- Some extensions also add a new «View». For example, the «Radoop» extension adds the «Hadoop Data» view.
- To manage and uninstall extensions go to: Extensions > Manage Extensions
- Another way to install extensions is to go directly to the Marketplace website at: https://marketplace.rapidminer.com
- Download the .jar file and place it in the «extension folder»: /home/adelo/.RapidMiner/extensions/
- Restart RapidMiner, and the extension will become available.
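The manual installation steps above can be sketched as a small shell helper. This is a minimal sketch, not an official RapidMiner tool. The function name `install_rm_extension` is hypothetical. It assumes that the per-user extensions folder is `~/.RapidMiner/extensions/`, which is the generic form of the `/home/adelo/.RapidMiner/extensions/` path given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RapidMiner): installs an extension .jar
# manually by copying it into the per-user extensions folder.
# Assumption: the folder is ~/.RapidMiner/extensions/ as in the notes above.
install_rm_extension() {
  local jar="$1"
  local ext_dir="${HOME}/.RapidMiner/extensions"

  # Refuse to continue if the downloaded .jar does not exist.
  if [ ! -f "$jar" ]; then
    echo "error: '$jar' not found" >&2
    return 1
  fi

  # Create the extensions folder if it does not exist yet, then copy the .jar.
  mkdir -p "$ext_dir"
  cp "$jar" "$ext_dir/"
  echo "Installed $(basename "$jar") into $ext_dir; restart RapidMiner Studio to load it."
}
```

Usage example (the file name is hypothetical): `install_rm_extension ~/Downloads/some-extension.jar`. After running it, RapidMiner Studio still has to be restarted, as described in the last step.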
Data Preparation and ETL
https://rapidminer.com/training/videos/#data-preparation
Data preparation
Prepare and clean up the data.
In practice, data is rarely complete or free of issues.
Here we'll show you some of the operators that help to prepare and clean up the data.
Model and Validate
https://rapidminer.com/training/videos/#model-validate