RapidMiner
RapidMiner is a data science software platform, developed by the company of the same name, that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization. RapidMiner is developed on an open core model. The RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. Commercial pricing starts at $2,500 and is available from the developer.
Installing RapidMiner
Download the package and follow the instructions on the official site: https://docs.rapidminer.com/latest/studio/installation/
Once installed, RapidMiner Studio is started with its launch script:
./RapidMiner-Studio.sh
We will need to create a RapidMiner account.
Description of the User Interface
The views
Design view
Work areas for specific tasks...
Process panel
It is used to design any process, such as:
- Data loading
- Forecasting
- ...
- To get started with a very simple process, we can place an operator into the Process panel:
- For example, we can go to the Data Access operators and drag and drop a «Retrieve» operator into the Process panel.
- After doing this and selecting (clicking) the operator in the Process panel, the Parameters panel changes and lets us choose, through the folder icon, the file we want to load. For example, we can select the «Titanic» data set, which comes pre-loaded in RapidMiner Studio.
- Then, in order to run the process, we need to connect the output port of the «Retrieve» operator to the Result port.
- Finally, to run the process we click the Run button (>) (or press F11). RapidMiner executes the process and automatically opens the Results view, where the data set is displayed as a table by default.
Ports:
Results view
Work areas for specific tasks...
Auto Model view
Operators
Repository
- Through the Repository panel you can access your data and your processes.
- When starting a project, it is recommended to create a new repository with two sub-folders: data and processes
Parameters
Global Search
Adding extensions
- To add extensions go to: Extensions > Marketplace:
- Top Downloads: some of the most popular extensions.
- The following extensions are recommended:
- Text Processing
- Web Mining
- Python/R integration
- Anomaly Detection
- Series extension
- RapidMiner Radoop
- After installing an extension, the «Extension» folder in the «Operators» panel will show a new folder for each installed extension.
- Some extensions also add a new «View». For example, the «Radoop» extension adds the «Hadoop Data» view.
- To manage and uninstall extensions go to: Extensions > Manage Extensions
- Another way to install extensions is to go directly to the Marketplace website at: https://marketplace.rapidminer.com
- Download the .jar file and place it in the extensions folder: /home/adelo/.RapidMiner/extensions/
- Restart RapidMiner and the extension will become available.
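The manual installation above amounts to copying a file into the extensions folder. The following is a minimal sketch of that step, assuming the folder lives at ~/.RapidMiner/extensions (as in the path shown above); the function name install_extension and the file name example-extension.jar are hypothetical and only serve as illustration.

```python
from pathlib import Path
import shutil


def install_extension(jar_path, extensions_dir):
    """Copy a downloaded extension .jar into the given extensions folder.

    Hypothetical helper: it only copies the file; RapidMiner still has to
    be restarted afterwards for the extension to be picked up.
    """
    jar = Path(jar_path)
    if jar.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got: {jar.name}")
    target_dir = Path(extensions_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # copy2 returns the path of the newly created file inside target_dir
    return Path(shutil.copy2(jar, target_dir))


if __name__ == "__main__":
    # Assumed download location and file name; adjust to the real .jar.
    downloaded = Path.home() / "Downloads" / "example-extension.jar"
    if downloaded.exists():
        print(install_extension(downloaded, "~/.RapidMiner/extensions"))
```

Nothing is copied unless the assumed download exists, so the script is safe to run as-is.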
Importing data
- We start our new project by creating a new repository with 2 sub-folders. Let's call it:
- MyFirstPrediction
- data
- processes
- Then, to import the data we can click the button «Import Data» and look for the file or we can just Drag and Drop the file into RapidMiner.
- In the second step (format your columns) we can change some of the properties of the attributes (columns):
- Change type: Real, Integer, etc
- Rename column
- Change Role: The default for each column is «General Attribute». We can change the role to: id, label, weight...
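To make the column properties concrete, here is a small sketch in plain Python of what type, name, and role mean for a data set. It is only an analogy and does not use any RapidMiner API: the sample data, the COLUMNS mapping, and the function format_columns are all hypothetical, and "regular" stands in for the default «General Attribute» role.

```python
import csv
import io

# Hypothetical sample standing in for an imported file.
SAMPLE = """passenger,age,survived
1,22,yes
2,38,no
"""

# For each source column: new name, type converter, and role.
# "regular" stands in for the default «General Attribute» role.
COLUMNS = {
    "passenger": {"name": "id", "type": int, "role": "id"},
    "age": {"name": "age", "type": float, "role": "regular"},
    "survived": {"name": "survived", "type": str, "role": "label"},
}


def format_columns(text, columns):
    """Rename and convert every column; return the rows and a name-to-role map."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append(
            {spec["name"]: spec["type"](raw[src]) for src, spec in columns.items()}
        )
    roles = {spec["name"]: spec["role"] for spec in columns.values()}
    return rows, roles


rows, roles = format_columns(SAMPLE, COLUMNS)
print(rows[0])  # {'id': 1, 'age': 22.0, 'survived': 'yes'}
print(roles)    # {'id': 'id', 'age': 'regular', 'survived': 'label'}
```

The role does not change the values themselves; it only records how a column is meant to be used later, for example which column is the label to predict.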
Data preparation
Prepare and clean up the data.
In practice, data is rarely complete or free of issues.
Here we'll show you some of the operators that help to prepare and clean up the data.
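As an illustration of the kind of problem data preparation deals with, the sketch below counts missing values per column in a small CSV sample. It is plain Python rather than a RapidMiner operator; the sample data, the set of missing-value markers, and the function count_missing are assumptions made for this example.

```python
import csv
import io

# Hypothetical sample with gaps: an empty field and a "?" placeholder.
SAMPLE = """name,age,cabin
Alice,29,C85
Bob,,
Carol,41,?
"""

# Assumed markers that should be treated as "missing".
MISSING = {"", "?", "NA"}


def count_missing(text, missing_markers=MISSING):
    """Return a dict mapping each column name to its number of missing values."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value.strip() in missing_markers:
                counts[column] += 1
    return counts


print(count_missing(SAMPLE))  # {'name': 0, 'age': 1, 'cabin': 2}
```

A count like this is usually the first thing to look at before deciding whether to drop, replace, or keep the affected rows.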