- I started programming around 15 years ago, when I was studying geophysics. In that part of my career, as a geophysicist, I coded solutions to mathematical and engineering problems and worked on Data analysis and Data processing topics (Signal analysis in particular: a signal is a function that conveys information about a phenomenon; for example, sound, images and videos are considered to be signals). One of my main projects in this area was developing programs to perform Seismic Wave Propagation Simulations (Seismic Modelling). During this experience, I gained skills in Matlab (which is a data analysis and numerical computing environment), Scilab and Shell scripting.
- Research geophysicist at GRyDs
- As a Research Geophysicist, I was responsible for performing a set of Signal analysis/Time-series analysis/Data processing tasks and ensuring the correct integration and implementation of geophysical applications into a computer cluster platform. This platform was being designed in order to facilitate task scheduling and run Computationally intensive tasks/highly compute-intensive tasks on clusters. One of my main activities was shell script programming for Seismic Modeling and Processing.
Task automation using Shell scripting: Here I could mention the generation of images to create seismic wave propagation videos, or the automatic generation of PDF reports using LaTeX that contained details about the executed process: run time vs. the features of the data generated (the amount of data generated).
I have skills in Matlab, Scilab and Shell scripting that I got during my participation in an R&D Unit at Simón Bolívar University (The Parallel and Distributed Systems Group - GryDs).
MATLAB (matrix laboratory) is a language and numerical computing environment. MATLAB allows data analysis and data visualization, matrix manipulations, and numerical computations. Matlab contains a huge library of functions that facilitate the resolution of many mathematical and engineering problems. For example, I used it for Signal Analysis, specifically for Seismic data analysis, as in Ex. 1 and Ex. 2:
- Signal Processing in Geophysics
- Ex.1: A program that allows defining the coordinates of the layers of a geological model by opening an image file of the model and selecting, by clicking with the mouse, a set of points (coordinates) that define each of the layers. These coordinates are saved in a specific format that is used as input by another program, which is in charge of building the Geological model entity that a further program uses to perform Seismic Wave Propagation Modelling.
- In recent years I decided to reorient my career toward IT, specifically toward Data Science and Software Development.
- During my BSc in Information Technology, I have achieved an excellent academic level and a clear understanding of the most important Object-Oriented Principles and Concepts. I have developed several object-oriented Java applications.
- I have a special interest in Web Development. I have also developed several Web Applications using different technologies:
- HTML, CSS
- React for the Frontend
- Express.js for the backend. This is a Node.js framework: HTTP REST APIs
- Dash: Python web application framework for building data analytic applications
So, I'm a programmer. Even if I haven't worked in a programming position for a long time, I have programmed on several occasions during my academic and professional experience. As I said, I've been programming for 15 years, and during this time I have used many programming languages. I like programming so much that even when I'm writing a report I use a programming-based tool (LaTeX) rather than a word processor like Microsoft Word. So programming logic and the principles and concepts of object-oriented programming are things that I'm really proficient in. Of course, I don't have 10 years of experience working in a Software Developer role, so you can ask me something about programming that I don't know, but you can be sure that I know how to program and that I'm able to learn any new programming language or concept in a very short time. That is something I really wanted to make clear: I'm proficient in programming and very confident about my programming skills.
In this project, we have created a GUI Java (Swing) Application for a Zoo Management System.
In this project, we have created a GUI Java (Swing) Application that simulates a trading day of a simplified model of a stock market.
This Application was developed using:
Back-end: Node.js (Express) (TypeScript)
Front-end: React (TypeScript)
This Application was developed using Python-Django Web framework
- Some of the languages I've worked with:
Well, I've been working with Data Analytics, I mean topics like Machine Learning and Natural Language Processing (Text classification, Sentiment Analysis), for the last 2 years. So, I can say that I've been really diving into Data Analytics for the last 2 years... but Data Analysis in a wider sense (performing analysis based on data, Data Interpretation, Data modelling) is NOT something new for me at all; it's something that I have been working on for several years as a Geophysicist.
Well, I'm going to refer to my Data Analysis experience as a geophysicist later, because I wanted to start by talking about my experience in Machine Learning and NLP, which are the topics that I've been learning for the last 3 years. My experience in these topics is based, first, on a Diploma in Predictive Data Analytics that I recently obtained at CCT College, where I got a distinction. Secondly, on a couple of online courses that I have completed in Data Analysis (mostly related to Python for data analysis). And finally, on what I actually consider my most relevant experience: I have worked on these topics in my last 2 final degree projects, which are long projects; and in my opinion, there is no better way of learning something than working on a long academic project.
So this is about topics specifically related to Data Mining and Machine Learning, but, in a wider sense, as I said, Data Analysis is not something new for me at all. During my career as a Geophysicist, I had already worked on topics related to Data Analysis. I worked, for example, in Signal Analysis, which is a form of Time Series Analysis (and Time Series is an important topic in Data Analysis). So, there are many mathematical concepts related to signal analysis, and thus to time series analysis, that I've been using for a long time as a geophysicist, such as:
- Time series and Discrete signals
- Correlation, Auto-correlation, Cross-correlation
- Regression methods (Linear regression)
- Convolution and Deconvolution
- and, of course, concepts related to signal analysis like, Fourier series, Fourier transform etc.
- and many other concepts related to data analysis...
I can really say that I have a very good theoretical and practical base in topics related to Data Sciences.
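To make two of the concepts listed above concrete, here is a minimal Python sketch of cross-correlation and auto-correlation on discrete signals, used to estimate the delay between a pulse and a shifted copy of it. The function names and the toy signals are illustrative assumptions, not code from any of the projects described here.

```python
def cross_correlation(x, y, max_lag):
    """Return {lag: sum over n of x[n] * y[n + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, x_n in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):  # ignore samples that fall outside y
                total += x_n * y[m]
        result[lag] = total
    return result


def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which y best matches x, i.e. the cross-correlation peak."""
    cc = cross_correlation(x, y, max_lag)
    return max(cc, key=cc.get)


# Toy signals: a short pulse and the same pulse arriving 3 samples later.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]

delay = estimate_delay(pulse, delayed, max_lag=5)

# Auto-correlation is the cross-correlation of a signal with itself;
# its peak is always at lag 0.
auto = cross_correlation(pulse, pulse, max_lag=5)
print(delay, auto[0])
```

The same idea, applied to pairs of recorded traces, is one common way of measuring time shifts between signals.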
- Supervised Machine Learning for Fake News Detection:
In my final Bachelor in IT project, I worked on Text classification, specifically on Supervised Machine Learning for Fake News Detection using R. In this project, we created a Supervised Machine Learning Model for Fake News Detection based on three different algorithms: Naive Bayes, Support Vector Machine, and Gradient Boosting (XGBoost). Basically, this ML model is able to determine, with an accuracy of 79%, whether a News Article is Fake or Reliable. "Fake" here means News Articles that were deliberately created in order to deceive and manipulate.
- Developing a Web Dashboard for analyzing Amazon's Laptop sales data:
In my final Bachelor (Honours) in IT project, I worked on Sentiment Analysis using Python. I specifically developed a Web Dashboard for analyzing Amazon's Laptop sales data, mainly to perform a Sentiment Analysis on Amazon customer reviews.
- I have performed a Sentiment Analysis of Amazon customer reviews using both Lexicon-based and Machine Learning methods.
- Lexicon-based Sentiment Analysis: One of the purposes of this study is to evaluate different Sentiment Analysis approaches. That is why I performed a Lexicon-based Sentiment Analysis using two popular Python libraries: Textblob and Vader Sentiment.
- Machine Learning Sentiment Analysis: I have built an ML classifier for Sentiment Analysis using the Naive Bayes algorithm and an Amazon review dataset from Wang et al. (2010). It is important to note that this is an extra result with respect to the initial objectives. I hadn't planned to carry out this study. However, I realized that it was very beneficial to include another Sentiment Analysis approach. This has allowed me to evaluate and compare both approaches in terms of their performance.
- In addition, a Word Emotion Association Analysis has been also performed. This analysis complements the polarity analysis by adding more details about the kind of emotions or sentiments (joy, anger, disgust, etc.) in customer reviews. This analysis was performed by using the NRC Word-Emotion Association Lexicon.
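As a rough illustration of the lexicon-based idea described above, the following Python sketch scores a review by averaging the polarity of the words it finds in a lexicon. It does not use TextBlob or VADER and is not how those libraries work internally; the tiny lexicon, the threshold and the function names are all made-up assumptions for the example.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
TOY_LEXICON = {
    "great": 1.0,
    "love": 1.0,
    "good": 0.5,
    "slow": -0.5,
    "bad": -0.5,
    "terrible": -1.0,
}


def polarity(review, lexicon=TOY_LEXICON):
    """Average polarity of the lexicon words found in the review (0.0 if none)."""
    words = re.findall(r"[a-z']+", review.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a polarity score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


reviews = [
    "I love this laptop, good value",
    "The keyboard is bad and slow",
    "Great screen, but the battery is terrible",
]
for review in reviews:
    print(label(polarity(review)), "<-", review)
```

A Machine Learning classifier such as Naive Bayes replaces the hand-written lexicon with word statistics learned from labelled reviews, which is what makes the two approaches worth comparing.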
So, I've been working with Data Analytics, I mean topics like Machine Learning, Natural Language Processing and Sentiment Analysis, for 2 years... but working with data, performing analysis based on data, and data interpretation are NOT something new for me at all; it's something that I have been working on for several years as a Geophysicist.
This is the link to a Web Application that has been created to easily interact with the Machine Learning Models created. It allows us to determine if a News Article is Fake or Reliable by entering the text into an input field. The input text will be processed by the Machine Learning Models at the back-end and the result will be sent back to the client. This Web App was created using Shiny, an R package that can be used to build interactive web apps straight from R.
This is the link to a GitHub repository that contains an R library we have created to package the Machine Learning Models built. This package contains essentially three functions: modelNB(), modelSVM() and modelXGBoost(). These functions take a news article as argument and, using the Models created, return the authenticity tag («fake (1)» or «reliable (0)»).
- I've been using Linux for about 15 years as my main OS. I consider myself a Linux power user, capable of writing Shell Scripts and performing administrative tasks. I'm mostly a Debian-based systems user, but I have experience with the most popular flavors of Linux: Ubuntu, Red Hat, CentOS, Mint, SuSE.
- Throughout my career, I have worked on several occasions in activities related to Linux administration:
- Research geophysicist at GRyDs:
- I was, for example, responsible for developing shell scripts for task automation and signal analysis.
- I had the opportunity to work on the installation and administration of a LAMP stack (LAMP stands for Linux - Apache - MySQL - PHP), that is, all the software needed to host a web application on a Linux Server.
- I have also developed a personal project, in which I perform an automatic backup of my personal data that is stored in my computer (and my Wiki) into a hard drive and into the cloud (Linux server). To do so, I have developed a shell script using technologies such as: rsync, ssh, sshpass, tar, zip, MySQL database backup, sed, gpg.
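The backup project above is a shell script built on rsync, tar, gpg and related tools. As a loose analogue only, the following Python sketch shows the archiving step with the standard library: it packs a directory into a timestamped .tar.gz file. The function name, the file naming scheme and the demo paths are assumptions for illustration; the transfer, encryption and database-dump steps are not covered.

```python
import tarfile
import tempfile
from datetime import datetime
from pathlib import Path


def make_backup(source_dir, dest_dir, now=None):
    """Pack source_dir into dest_dir/<name>-<timestamp>.tar.gz and return the path."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source folder
        tar.add(source, arcname=source.name)
    return archive


# Small demo in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "notes"
    data.mkdir()
    (data / "todo.txt").write_text("back up the wiki\n")
    created = make_backup(data, Path(tmp) / "backups", now=datetime(2020, 1, 2, 3, 4, 5))
    with tarfile.open(created) as tar:
        demo_names = sorted(tar.getnames())
    demo_archive_name = created.name
    print(demo_archive_name, demo_names)
```

In a setup like the one described, the resulting archive would then be encrypted and copied to the external drive and the remote server.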
Wiki - Organize information into a cohesive, searchable and maintainable system.
- One of the most important skills I have, whose importance I usually find difficult to convey, is my Wiki management skills.
- A Wiki is a website on which users can collaborate by creating and modifying content from the web browser. The best example is Wikipedia: someone can create an article, and then it can be modified online by other users. A Wiki is an outstanding tool to organize information into a cohesive, searchable and maintainable system that can be accessed and modified online. The benefits of a wiki to organize information are remarkable.
- I have a personal Wiki (based on the MediaWiki engine) where I document everything I'm learning and working on. So, I use a Wiki as a Personal knowledge management tool that allows me to organize information into a cohesive, searchable and maintainable system. The benefits that I've had using a Wiki are amazing. It has allowed me to learn in a more effective way and, most importantly, to constantly review and improve on important topics by providing very convenient online access (so, from anywhere) to organized and structured information.
- Take a look at some of my Wiki pages: http://perso.sinfronteras.ws/index.php/Computer_Science_and_IT
- Academic assistant at USB: Communication, Presentation and Leadership Skills
- As an Academic Assistant, I was in charge of collaborating with the lecturer by teaching some modules of the Geophysical Engineering program at Simón Bolívar University. I was usually in charge of a group of between 20 and 30 students during theoretical and practical activities.
- This experience has contributed to my professional development in two major areas:
- By teaching modules, I have enhanced my technical geophysical knowledge.
- I have also developed communication and presentation skills, as well as the leadership strategies needed to manage a group of students and to transfer knowledge effectively.
Communication and Sale Skills
- My current job at IDG is about communication. First, because I'm working in a team, and we always have to reach targets as a team, and communication within the team is always the key to reach the targets. Secondly, because one of my main responsibilities is to call contacts (to call IT Managers) on behalf of our clients, and of course this is about effective and clear communication. I have to explain to the contact the reason for the call, the topic of the campaign, and most importantly, I have to communicate in a way that... well I have to create an atmosphere in the call where the contact is going to feel comfortable and is going to accept answering my questions.
- In this position, I have improved my communication skills in French and English. I have learned how to build and maintain a professional relationship and improved my Active Listening Skills.
- I also think that I have developed communication skills not only at work but also in other aspects of my life. I have always played team sports at a highly competitive level: Volleyball when I was a child, when I was a member of my state's Volleyball team and attended 1 national games; and Water polo at university, where I attended 5 National University Games. Those are activities where you develop many communication skills, sometimes without being aware of it.
- I have to call IT Managers and establish and maintain a professional conversation with them in order to identify their next investments. From this conversation we gather information about their next investment. This information is requested by our clients (IT Companies: IBM, DELL, NetApp, etc.), and they use it in the next step of the sales process.
- Let's say that IBM is looking to sell a particular product (a Cloud backup solution, for example). So, IBM requests IDG's services, asking for a number of contacts (IT Managers) that are planning to invest in backup solutions. Then, we establish a professional conversation with IT Managers from our database and identify those that are looking to invest in the product specified by the client.
- During the phone conversations, I have to explain the product that our clients are looking to sell and be able to handle objections. That is why this experience has made me aware of the latest solutions and technologies that the most important IT companies are working on.
- At IDG, I have also completed a Certified Sales training. During this course, I have learned and put into practice, the most important concepts of the sales process.
- Prospecting, Preparation, Approach, Presentation, Handling objections, Closing, Follow-up
Target and KPI
- I'm used to working in a Target Working Environment (TWE) because I'm currently working in one at IDG.
- At IDG we have to reach a daily target of about €650.
- To reach this target we need to generate what we call a «lead». A lead is a conversation that matches the criteria requested by the client. For example, if the client (let's say IBM) is asking for contacts that are looking to invest in Backup solutions, then every time we have a conversation in which the contact confirms they are looking for backup solutions, this contact represents a «lead».
- Each lead that we generate has a price, and we need to generate as many leads as needed to reach the target of €650. Normally an easy lead is worth about €65 and a complicated one about €180.
- So, every day we need to fight to reach the target performance. We usually have many challenges to reach the target performance:
- Data challenges: We make calls using particular data that has been prepared for a particular campaign. Often you can make many calls without reaching the contacts that you are looking for. So you can spend your day making calls but not having conversations with IT Managers, and if you are not reaching the contacts, you cannot make leads.
- Hard campaign challenges: That means that we have a campaign in which the client is asking for a difficult criterion. Let's say, for example, that the client is asking for contacts that are looking to invest in a particular solution (SAP applications, for example). That represents a campaign challenge because we have to reach a contact that is looking to invest, specifically, in this solution.
- Solutions: There are a few techniques that we apply when we face these challenges. Changing the data or the campaign you're working on is the first action we can take. But sometimes you cannot change the campaign, because we really need to deliver the number of leads the client is asking for. We usually make calls using a platform that dials automatically, taking the contacts from the database related to the campaign you're working on. So usually we don't need to worry about the criteria (company size, job title, industry) of the contacts we are calling, because the platform makes the calls. But when you have data problems, the solution is to search for contacts manually. That is a little tricky: you can try to call the best contact by doing manual research in the database, but you can spend a long time on this research, and it doesn't guarantee that you are going to reach the contact and get leads. So when you have good data you should use the platform; otherwise, you should search for contacts manually. This manual research is where you have to propose ideas and develop a good methodology to be able to find good contacts and get leads. One of the techniques we apply when we have a hard campaign is, for example, that if we get a lead from a particular company, we try to call other contacts from the same company, because we know that this company is likely to be interested in the kind of product the client is asking about.
- The other approach is to search for new contacts on the internet (usually on LinkedIn), but that is even more tricky because it is complicated to reach a new contact and to get the lead. Here is where I wanted to say that I made an important contribution. The problem with this external research is that most of the contacts that you are going to find on LinkedIn are already in our database, so it doesn't make sense. But I realized that when we are looking for business job titles (because sometimes we have campaigns in which the client is asking for business titles), it does make sense to do external research on LinkedIn: our database is composed mostly of IT Professionals (we have some business contacts in it, but not a lot), so the chance of finding a contact on LinkedIn that is not in our database increases a lot. By doing that, I was able to get a good number of leads for hard campaigns, and that is a concrete contribution that I made to my team.
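The daily target arithmetic described earlier in this section (a target of about €650, with leads worth roughly €65 or €180) can be sketched in a few lines of Python. The figures are the approximate ones quoted above; the function names are illustrative assumptions.

```python
import math

DAILY_TARGET_EUR = 650   # approximate daily target
EASY_LEAD_EUR = 65       # approximate value of an easy lead
HARD_LEAD_EUR = 180      # approximate value of a complicated lead


def leads_needed(target, lead_value):
    """Smallest whole number of leads of a single type that reaches the target."""
    return math.ceil(target / lead_value)


def day_value(easy_leads, hard_leads):
    """Total value of a day's mix of easy and complicated leads."""
    return easy_leads * EASY_LEAD_EUR + hard_leads * HARD_LEAD_EUR


easy_only = leads_needed(DAILY_TARGET_EUR, EASY_LEAD_EUR)
hard_only = leads_needed(DAILY_TARGET_EUR, HARD_LEAD_EUR)
mixed_day = day_value(easy_leads=5, hard_leads=2)

print(easy_only, hard_only, mixed_day, mixed_day >= DAILY_TARGET_EUR)
```

With these approximate prices, a day needs about ten easy leads, or four complicated ones, or a mix such as five easy and two complicated.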
Simón Bolívar University and background in Mathematics/Physics
I'm an engineer from the most important scientific university in Venezuela, Simón Bolívar University, and I really need to highlight its academic level and quality. If you check now, Simón Bolívar University still holds a good place in the LatAm University Rankings, but the university has been widely affected by the difficult political situation in the country. I don't know if you have heard about the critical political and economic situation in Venezuela. But the fact is that, back when I started my studies, Simón Bolívar University was always in the top 10 of the best LatAm universities with a scientific and technological orientation.
I have a very good background in formal and pure sciences, like mathematics and physics. I took 7 pure maths and 5 pure physics courses, without counting all the applied geophysics courses with a high content of mathematics, physics, or chemistry.
If you review the course content of an IT program you will find at most 2 mathematics courses. I really think that for an IT professional it is very important to have a good background in mathematics. For example, to be able to understand some computational concepts (functional programming, for example) you need a good mathematical background.
- Geophysics is an applied science, a multidisciplinary field that uses physics, mathematics, and geology to study the internal constitution of the earth.
- One of the main applications of Geophysics is in oil exploration, that is the area where I have experience. I specifically worked
- During my academic and professional experience as a Geophysicist, I was involved in several data analysis topics:
- Seismic exploration - Seismic processing
- I specialized in Seismic exploration for oil and gas, specifically in Seismic analysis and Seismic data processing, whose theory and mathematical foundation are related to Data Science. You can actually say that Seismic data processing is a form of Data Science.
- Seismic analysis is a kind of Signal analysis, and Signal analysis is closely related to Time series analysis. Statistical signal processing uses the language and techniques of mathematical time-series analysis, but also uses other concepts and techniques, like signal-to-noise ratio, time/frequency domain transforms and other concepts specifically related to the physical problem under study. Of course, there are also many other concepts used in time series analysis applied to business and economics, such as time-series forecasting, trend analysis, etc., that are not present in the material on statistical signal processing; but in general Signal Analysis (which is the area where I have experience) is closely related to Data Analytics and Time-Series analysis in particular: https://stats.stackexchange.com/questions/52270/relations-and-differences-between-time-series-analysis-and-statistical-signal-pr#:~:text=2%20Answers&text=As%20a%20signal%20is%20by,significant%20overlap%20between%20the%20two
- The signal that is analysed in Seismic analysis (the seismic signal) is a Seismic wave. A Seismic wave is an acoustic wave that propagates through the earth. This wave can be recorded to obtain a mathematical (or functional) representation of it. This function (or signal), which is called a Seismogram, represents ground motion measurements as a function of time; and of course, these ground motions are related to the wave propagating through the earth.
- The data that we analyse in Seismic Analysis (Seismic Data) consists of a large set of time series. These time series are called Seismograms or Seismic traces, but mathematically they are just time series.
- In physical terms, we can say that a seismogram is basically a representation of a seismic wave propagating in the subsurface. In mathematical terms, a seismogram (seismic trace) is a time series of ground motion values (the ground motions are related to the wave propagating in the subsurface). In other words, a seismogram describes ground motion as a function of time.
- In short, the purpose of seismic exploration is to create an image of the subsurface and to estimate the distribution of a range of properties - in particular, the fluid or gas content. This way the geophysicist is able to have a better idea of where oil or gas deposits can be located in the subsurface.
- So, after the Seismic acquisition phase (which I'm not going to explain now, because I want to focus on seismic data processing, which was my area, and explain its relationship with Data Science), the Seismic Data consists of a large set of time series. These time series are called Seismograms or Seismic traces, but mathematically they are just time series.
- I have worked in this area in my two thesis projects (bachelor's and master's degrees). I have experience as an academic assistant for the Seismic Data processing course at Simón Bolívar University; I have worked at the CGGVeritas processing center in Caracas and in an R&D Unit at PDVSA and Simón Bolívar University. So I have considerable experience in Seismic data processing, but I'm sure that the most important thing of all is that I have the motivation to further develop my skills in this area. I am now incredibly motivated to pursue my career in Seismic data processing.
- So, there are many mathematical concepts related to signal analysis and thus to time series analysis that I've been using for a long time as a geophysicist, such as:
- Time series and Discrete signals
- Correlation, Auto-correlation, Cross-correlation
- Regression methods (Linear regression)
- Convolution and Deconvolution
- and, of course, concepts related to signal analysis like, Fourier series, Fourier transform etc.
- This reference explains how Autocorrelation, Cross-correlation, and other time series analysis methods are applied to seismic data: https://www.sciencedirect.com/topics/earth-and-planetary-sciences/autocorrelation
- The concepts of Cross-correlation and Autocorrelation are also explained here
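To show how a seismic trace can be treated as an ordinary time series, here is a minimal Python sketch of the convolutional model, a common simplified description in which a trace is the convolution of a reflectivity series with a source wavelet. The Ricker wavelet, the 4 ms sampling interval, the 25 Hz peak frequency and the two reflectors are illustrative assumptions, not values from any real survey.

```python
import math


def ricker(peak_freq_hz, dt_s, half_length):
    """Ricker wavelet sampled at 2 * half_length + 1 points, centred on t = 0."""
    samples = []
    for i in range(-half_length, half_length + 1):
        a = (math.pi * peak_freq_hz * i * dt_s) ** 2
        samples.append((1.0 - 2.0 * a) * math.exp(-a))
    return samples


def convolve(signal, kernel):
    """Full discrete convolution: out[n] = sum over k of signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for k, s in enumerate(signal):
        if s == 0.0:
            continue  # skip zero samples; they contribute nothing
        for j, w in enumerate(kernel):
            out[k + j] += s * w
    return out


dt = 0.004            # sampling interval in seconds (assumed)
half_length = 10      # wavelet half-length in samples (assumed)
wavelet = ricker(25.0, dt, half_length)

# Reflectivity series: two isolated reflectors of opposite polarity.
reflectivity = [0.0] * 100
reflectivity[30] = 0.5
reflectivity[70] = -0.3

# Trim the full convolution so the wavelet peak lines up with each reflector.
full = convolve(reflectivity, wavelet)
trace = full[half_length:half_length + len(reflectivity)]

print(len(trace), trace[30], trace[70])
```

Deconvolution, also listed above, is the inverse problem: estimating the reflectivity from the recorded trace when the wavelet is known or estimated.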
- Well-Log (borehole log) Analysis
- I also worked on Geophysical Oil Well-Log (borehole log) Analysis (an oil well is a hole drilled in the Earth that is designed to bring petroleum to the surface; oil well ~ borehole). Well-Log analysis is also a kind of Data Analysis, where we analyse physical properties of the geologic formations (the rocks) in the subsurface.
- A well-log is a record of measurements of physical properties of the geologic formations (the rocks in the subsurface) penetrated by a borehole. In other words, a well-log is a record of measurements of physical properties of the rocks as a function of depth. Some of the physical properties that are measured are: Resistivity; Natural radioactivity of the rock formations (Gamma Ray Log), where, because radioactive elements tend to be concentrated in shales, the Gamma-ray log normally reflects the shale content of the formation; and Sound wave velocity, a measurement of the time required for a sound wave to travel a constant distance, based on the principle that the velocity of the rock decreases when the porosity increases.
- So, in the same way that we use a supervised algorithm (for example, a linear regression method) to predict the price of a house based on housing data (like number of rooms, age of the house, lot size, etc.), in geophysics (or in petrophysics) we can use physical properties of the rocks to estimate some property of interest, such as permeability or porosity.
- Learning algorithms (Linear regression, Naive Bayes, etc.) are used in Well-log analysis, for example:
- To classify rock formations in the subsurface using measurements of physical properties of the rocks.
- To predict some physical properties of the rocks (Porosity or Permeability) by using measurements of other properties. See this paper: Comparison of machine learning methods for estimating permeability and porosity of oil reservoirs via petro-physical logs - https://www.sciencedirect.com/science/article/pii/S2405656117301633
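The regression idea above can be sketched with a simple least-squares line that predicts porosity from a single log measurement. The numbers below are made-up synthetic values chosen only so that porosity grows with sonic transit time (that is, as velocity decreases); they are not real well data, and the function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Synthetic training data (assumed): sonic transit time in microseconds per foot
# and porosity as a fraction, e.g. as measured on core samples.
transit_time = [55.0, 60.0, 65.0, 70.0, 75.0, 80.0]
porosity = [0.05, 0.09, 0.12, 0.16, 0.19, 0.23]

slope, intercept = fit_line(transit_time, porosity)
estimate = predict(72.0, slope, intercept)
print(round(slope, 4), round(estimate, 3))
```

In practice several logs (gamma ray, resistivity, sonic) would be combined as features, and the paper cited above compares more flexible learning methods for the same task.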
- Advanced experience with the most popular flavors of Linux: Debian, Ubuntu, Red Hat, CentOS
- LAMP Administration: Apache, MySQL, PHP
- Installation and Post-installation configurations
- Users and Groups Administration
- Modify File Permissions
- Managing Processes
- Network File System (NFS)
- Remote Management with SSH