CV - Skills and Qualifications 1

From Sinfronteras

Programming and Software Development

Data Science

Other qualifications

  • I started programming around 15 years ago, while I was studying geophysics. In that part of my career, as a geophysicist, I coded solutions to mathematical and engineering problems and worked on data analysis and data processing topics, signal analysis in particular. (A signal is a function that conveys information about a phenomenon; for example, sound, images and videos are considered signals.) One of my main projects in this area was developing programs to perform Seismic Wave Propagation Simulations (Seismic Modelling). During this experience, I gained skills in Matlab (a data analysis and numerical computing environment), Scilab and Shell scripting.

  • Research geophysicist at GRyDs
  • As a Research Geophysicist, I was responsible for performing a set of signal analysis (seismic processing) tasks and for ensuring the correct integration and implementation of geophysical applications on a computer cluster platform. This platform was being designed to facilitate task scheduling and to run compute-intensive tasks on clusters. One of my main activities was shell script programming for Seismic Modeling and Processing.
  • Task automation using Shell scripting: for example, generating images to create videos of seismic wave propagation, or automatically generating PDF reports with LaTeX containing details about the executed process, such as time versus the features of the generated data (for example, the amount of data generated).
  • I have skills in Matlab, Scilab and Shell scripting that I gained during my participation in an R&D Unit at Simón Bolívar University (The Parallel and Distributed Systems Group - GRyDs). MATLAB (matrix laboratory) is a language and numerical computing environment. It allows data analysis and visualization, matrix manipulation, and numerical computation. MATLAB contains a huge library of functions that facilitate the resolution of many mathematical and engineering problems. For example, I used it for Signal Analysis, specifically for Seismic data analysis (Ex. 1 and Ex. 2):
    • Signal Processing in Geophysics
    • Ex.1: A program that allows defining the coordinates of the layers of a geological model: the user opens an image file of the geological model and selects, by clicking with the mouse, a set of points (coordinates) that define each layer. These coordinates are saved in a specific format that is used as input to another program, which builds the geological model entity used in turn by a further program to perform Seismic Wave Propagation Modelling.

  • In recent years, I decided to reorient my career toward IT, specifically toward Data Science and Software Development.

  • During my BSc in Information Technology, I have reached an excellent academic level and gained a clear understanding of the most important Object-Oriented principles and concepts. I have developed several object-oriented Java applications.

  • I have a special interest in Web Development. I have developed several Web Applications using different technologies:
  • PHP
  • JavaScript
  • But my main experience is using JavaScript frameworks:
  • React for the Frontend
  • Express.js for the backend: a Node.js framework for building HTTP REST APIs
  • Dash: Python web application framework for building data analytic applications

So, I'm a programmer. Even though I haven't worked in a programming position for a long time, I have programmed on several occasions during my academic and professional experience. As I said, I've been programming for 15 years, and during this time I have used many programming languages. I like programming so much that even when I'm writing a report I use a programming-based tool (LaTeX) rather than a word processor like Microsoft Word. So programming logic and the principles and concepts of object-oriented programming are things I'm really proficient in. Of course, I don't have 10 years of experience working in a Software Developer role, so you can ask me something about programming that I don't know; but you can be sure that I know how to program and that I'm able to learn any new programming language or concept in a very short time. That is something I really wanted to make clear: I'm proficient in programming.

Well, I've been working with Data Analytics, I mean topics like Machine Learning and Natural Language Processing (Text classification, Sentiment Analysis), for the last 2 years. So, I can say that I've been really diving into Data Analytics for the last 2 years. But working with data, performing analysis based on data (data analysis) and data interpretation are NOT new for me at all; they are things I have been working on for several years as a Geophysicist.

I just completed a Diploma in Predictive Data Analytics at CCT College, where I got a distinction, and I had previously completed a couple of online courses in Data Analysis. Also, and this is what I consider my most relevant experience, I worked on these topics in my last 2 final degree projects, which are long projects; in my opinion, there is no better way of learning something than working on a long academic project.

So that covers topics specifically related to Data Mining and Machine Learning; but, in a wider sense, as I said, Data Analysis is not new for me at all. During my career as a Geophysicist, I had already worked on topics related to Data Analysis. I worked, for example, in Signal Analysis, which is a form of Time Series Analysis (and Time Series is an important topic in Data Analysis). So there are many mathematical concepts related to signal analysis, and thus to time series analysis, that I've been using for a long time as a geophysicist, such as Fourier series, the Fourier transform, Convolution and Correlation, Deconvolution, Discrete signals, etc.
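
As a minimal illustration of the Fourier transform applied to a discrete signal, the following Python sketch finds the dominant frequency of a short time series. It is not code from the projects described here: the test signal, the sampling rate and the function names are assumptions made for the example, and the naive transform is written out only to show the definition (real work would use an FFT library).

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(n^2); fine for short signals."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(samples, sampling_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component."""
    n = len(samples)
    spectrum = dft(samples)
    # For a real-valued signal only the first half of the spectrum is unique.
    k_max = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return k_max * sampling_rate_hz / n


# Hypothetical test signal: a 5 Hz sine plus a weaker 12 Hz sine,
# sampled at 64 Hz for one second.
sampling_rate_hz = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sampling_rate_hz)
    + 0.3 * math.sin(2 * math.pi * 12 * t / sampling_rate_hz)
    for t in range(sampling_rate_hz)
]
peak_hz = dominant_frequency(signal, sampling_rate_hz)
print(peak_hz)
```

The same idea, moving from the time domain to the frequency domain to see which components carry the energy, applies whether the series is a seismic trace or any other sampled signal.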

I can really say that I have a very good theoretical and practical base in topics related to Data Sciences.

  • Supervised Machine Learning for Fake News Detection:
  • In my final Bachelor in IT project, I worked on Text classification, specifically on Supervised Machine Learning for Fake News Detection using R. In this project, we created a Supervised Machine Learning model for Fake News Detection based on three different algorithms: Naive Bayes, Support Vector Machine, and Gradient Boosting (XGBoost). Basically, this ML model is able to determine with an accuracy of 79% whether a News Article is Fake or Reliable, where Fake means News Articles that were deliberately created in order to deceive and manipulate.
  • Developing a Web Dashboard for analyzing Amazon's Laptop sales data:
  • In my final Bachelor (Honours) in IT project, I worked on Sentiment Analysis using Python. Specifically, I developed a Web Dashboard for analyzing Amazon's Laptop sales data, mainly to perform a Sentiment Analysis on Amazon customer reviews.
    • I performed a Sentiment Analysis of Amazon customer reviews using both Lexicon-based and Machine Learning methods.
    • Lexicon-based Sentiment Analysis: One of the purposes of this study was to evaluate different Sentiment Analysis approaches. That is why I performed a Lexicon-based Sentiment Analysis using two popular Python libraries: TextBlob and VADER Sentiment.
    • Machine Learning Sentiment Analysis: I built an ML classifier for Sentiment Analysis using the Naive Bayes algorithm and an Amazon review dataset from Wang et al. (2010). It is important to note that this is an extra result with respect to the initial objectives: I had not planned to carry out this study. However, I realized that it would be very beneficial to include another Sentiment Analysis approach. This allowed me to evaluate and compare both approaches in terms of their performance.
    • In addition, a Word Emotion Association Analysis was also performed. This analysis complements the polarity analysis by adding more details about the kind of emotions or sentiments (joy, anger, disgust, etc.) in customer reviews. This analysis was performed by using the NRC Word-Emotion Association Lexicon.
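
To make the lexicon-based idea concrete, here is a minimal Python sketch of how a polarity score can be computed from a word list. It is only an illustration: the tiny lexicon, the negation rule and the function names are hypothetical, and this is not how TextBlob or VADER are implemented internally, nor the code used in the project.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
# Real lexicons contain thousands of scored entries.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 2.0, "fast": 1.0,
    "bad": -1.0, "slow": -1.0, "terrible": -2.0, "broken": -2.0,
}
NEGATIONS = {"not", "never", "no"}


def polarity(review):
    """Sum word scores; flip the sign of a word that follows a negation."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        value = TOY_LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def label(review):
    """Map the numeric polarity to a coarse sentiment class."""
    score = polarity(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Great laptop, fast and excellent screen"))
print(label("The battery is not good and the fan is terrible"))
```

A Machine Learning classifier such as Naive Bayes learns these word weights from labelled reviews instead of reading them from a fixed list, which is the main difference between the two approaches compared in the project.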

So, I've been working with Data Analytics, I mean topics like Machine Learning, Natural Language Processing and Sentiment Analysis, for 2 years. But working with data, performing analysis based on data (data analysis) and data interpretation are NOT new for me at all; they are things I have been working on for several years as a Geophysicist.

  • Linux
  • I've been using Linux for about 15 years as my main OS. I consider myself a Linux power user, capable of writing Shell scripts and performing administrative tasks. I'm mostly a user of Debian-based systems, but I have experience with the most popular flavors of Linux: Ubuntu, Red Hat, CentOS, Mint, SuSE.

  • Throughout my career, I have worked on several occasions in activities related to Linux administration:
  • Research geophysicist at GRyDs:
I was, for example, responsible for developing automation scripts in shell.
  • WikiVox:
I had the opportunity to work on the installation and administration of a LAMP stack (Apache, MySQL, PHP) on a Linux Server.
  • I have also developed a personal project in which I perform an automatic backup of my personal data (and my Wiki) to a hard drive and to the cloud (Linux VM). To do so, I have developed a shell script using technologies such as: rsync, ssh, sshpass, tar, zip, MySQL database backup, sed, gpg.

  • Wiki - Organize information into a cohesive, searchable and maintainable system.
    • One of the most important skills I have, whose importance I usually find hard to convey, is Wiki management.
    • A Wiki is a website on which users can collaborate by creating and modifying content from the web browser. The best example is Wikipedia: someone can create an article, and other users can then modify it online. A Wiki is an outstanding tool to organize information into a cohesive, searchable and maintainable system that can be accessed and modified online. The benefits of a wiki to organize information are remarkable.
    I have a personal Wiki (based on the MediaWiki engine) where I document everything I'm learning and working on. So, I use a Wiki as a personal knowledge management tool that allows me to organize information into a cohesive, searchable and maintainable system. The benefits I've had from using a Wiki are amazing. It has allowed me to learn more effectively and, most importantly, to constantly review and improve on important topics by providing very convenient online access (so from anywhere) to organized and structured information.
    Take a look at some of my Wiki pages:

  • Academic assistant at USB: Communication, Presentation and Leadership Skills
  • As an Academic Assistant, I was in charge of collaborating with the lecturer by teaching some modules of the Geophysical Engineering program at Simón Bolívar University. I was usually in charge of a group of between 20 and 30 students during theoretical and practical activities.

  • This experience has contributed to my professional development in two major areas:
  • By teaching modules, I have solidified much of my technical geophysical knowledge.
  • I have also developed communication and presentation skills, as well as the leadership strategies needed to manage a group of students and to transfer knowledge effectively.

  • IDG: Communication and Sales Skills
    • I have to call IT Managers to gather information about their investments. To do so, I have to establish and maintain a professional conversation with IT Managers in order to identify their needs and their next investments. The gathered information is required by our clients (IT companies: IBM, DELL, NetApp, etc.) and used in the next step of the sales process.
    • Let's say that IBM is looking to sell a particular product (a Cloud backup solution, for example). So, IBM requires IDG's services, asking for a number of contacts (IT Managers) that are planning to invest in backup solutions. Then, we establish a professional conversation with IT Managers from our database and identify those that are looking to invest in the product requested by the client.
    • In this position, I have improved my communication skills in French and English. I have learned how to build and maintain a professional relationship and improved my Active Listening Skills.
    • During the phone conversations, I have to explain the topic of the product that our clients are looking to sell and be able to handle objections. That is why this experience has allowed me to be aware of the latest solutions and technologies that the most important IT companies are working on.
    • At IDG, I have also completed a Certified Sales training. During this course, I learned and put into practice the most important concepts of the sales process:
    • Prospecting, Preparation, Approach, Presentation, Handling objections, Closing, Follow-up

  • Target and KPI
    • At IDG we need to generate what we call a «lead». A lead is a conversation that matches the criteria requested by the client. For example, if the client (let's say IBM) is asking for contacts that are looking to invest in Backup solutions, then every time we have a conversation in which the contact confirms that they are looking for backup solutions, this contact represents a «lead».
    • At IDG we have to reach a daily target of about €650. Each lead that we generate has a price, and we need to generate as many leads as needed to reach the target of €650. Normally an easy lead is worth about €65 and a complicated one about €180.
    • So, every day we need to fight to reach the target performance. We usually have many challenges to reach the target performance:
    • Data challenges: We make calls using particular data that has been prepared for a particular campaign. Often you can make many calls without reaching the contacts you are looking for, so you can spend your day making calls without having conversations with IT Managers. If you are not reaching the contacts, you cannot make leads.
    • Hard campaign challenges: This means that we have a campaign in which the client is asking for a difficult criterion. Let's say, for example, that the client is asking for contacts that are looking to invest in a particular solution (SAP applications, for example). That represents a campaign challenge because we have to reach a contact that is looking to invest specifically in this solution.
    • Solutions: There are a few techniques that we apply when we face these challenges. Changing the data or the campaign you're working on is the first action we can take. But sometimes you cannot change the campaign, because we need to deliver the number of leads the client is asking for on that campaign. We usually make calls using a platform that dials automatically, taking the contacts from the database related to the campaign you're working on. So usually we don't need to worry about the criteria (company size, job title, industry) of the contacts we are calling, because the platform makes the calls. But when you have data problems, the solution is to search for contacts manually. That is a little tricky: you can try to call the best contacts by doing manual research in the database, but you can spend a long time on this research with no assurance that you will reach the contact and get leads. So when you have good data you should use the platform; otherwise, you should search for contacts manually. This manual research is where you have to propose ideas and develop a good methodology to find good contacts and get leads. One of the techniques we apply on a hard campaign is, for example, that when we get a lead from a particular company, we try to call other contacts from the same company, because we know that this company is interested in the kind of product that the client is looking to sell.
    The other approach is to search for new contacts on the internet (usually on LinkedIn), but that is even more tricky because it is complicated to reach a new contact and to get the lead. This is where I made an important contribution. The problem with this external research is that most of the contacts you find on LinkedIn are already in our database, so it doesn't make sense. But I realized that when we are looking for business job titles (because sometimes we have campaigns in which the client is asking for business titles), it does make sense to do external research on LinkedIn: our database is composed mostly of IT professionals (we have some business contacts, but not a lot), so the chance of finding a contact on LinkedIn that is not in our database increases a lot. By doing that, I was able to get a good number of leads for hard campaigns; that is a concrete contribution that I made to my team.

  • Simón Bolívar University and background in Mathematics/Physics

    I'm an engineer from the most important scientific university in Venezuela, Simón Bolívar University, and I really need to highlight its academic level and quality. If you check now, Simón Bolívar University is still well placed in the LatAm University Rankings, but the university has been badly affected by the difficult political situation in the country. I don't know if you have heard about the critical political and economic situation in Venezuela. But the fact is that, at the time when I started my career, Simón Bolívar University was always in the top 10 of LatAm universities with a scientific and technological orientation.

    I have a very good background in formal and pure sciences, like mathematics and physics. I took 7 pure maths and 5 pure physics courses, not counting all the applied geophysics courses with a high content of mathematics, physics, or chemistry.

    If you review the course content of an IT program, you will find at most 2 mathematics courses. I really think that it is very important for an IT professional to have a good background in mathematics. For example, to understand some computational concepts (functional programming, for example) you need a good mathematical background.

  • Geophysics:
Geophysics is an applied science, a multidisciplinary field that uses physics, mathematics, and geology to study the internal constitution of the earth.
One of the main applications of Geophysics is oil exploration, which is the area where I have experience.
During my academic and professional experience as a Geophysicist, I was involved in several data analysis topics:
  • Seismic exploration - Seismic processing
I specialized in Seismic exploration for oil and gas, specifically in Seismic data processing, whose theory and mathematical foundation are related to Data Science. You can actually say that Seismic data processing is a form of Data Science.
Seismic analysis is a kind of Signal analysis, and Signal analysis is closely related to Time series analysis. Statistical signal processing uses the language and techniques of mathematical time-series analysis, but also uses other concepts and techniques, like signal-to-noise ratio, time/frequency domain transforms and other concepts specifically related to the physical problem under study. Of course, there are also many other concepts used in time series analysis applied to business and economics, such as time-series forecasting, trend analysis, etc., that are not present in the material on statistical signal processing.
The signal that is analysed in Seismic analysis (the seismic signal) is a Seismic wave. A Seismic wave is an acoustic wave that propagates through the earth. This wave can be recorded to obtain a mathematical (or functional) representation of it. This function (or signal), which is called a Seismogram, represents ground motion measurements as a function of time; and of course, these ground motions are related to the wave propagating through the earth.
The data that we analyse in Seismic Analysis (Seismic Data) consists of a large set of time series. These time series are called Seismograms or Seismic traces, but mathematically they are just time series.
In physical terms, we can say that a seismogram is basically a representation of a seismic wave propagating into the subsurface. Now, in mathematical terms, a seismogram (seismic trace) is a time series of ground motion values (the ground motions are related to the wave propagating in the subsurface). In other words, a seismogram describes ground motions as a function of time.
In short, the purpose of seismic exploration is to create an image of the subsurface and to estimate the distribution of a range of properties - in particular, the fluid or gas content. This way the geophysicist is able to have a better idea of where oil or gas deposits can be located in the subsurface.
So, after the Seismic acquisition phase (which I'm not going to explain now, because I want to focus on seismic data processing, which was my area, and explain its relationship with Data Science), the Seismic Data consists of a large set of time series. These time series are called Seismograms or Seismic traces, but mathematically they are just time series.
I have worked in this area in my two thesis projects (bachelor's and master's degrees). I have experience as an academic assistant for the Seismic Data processing course at Simón Bolívar University; I have worked at the CGGVeritas processing center in Caracas and in an R&D Unit at PDVSA and Simón Bolívar University. So I have considerable experience in Seismic data processing, but I'm sure that the most important thing of all is that I have the motivation to further develop my skills in it: I am now incredibly motivated to pursue my career in Seismic data processing.
So, there are many mathematical concepts related to signal analysis and thus to time series analysis that I've been using for a long time as a geophysicist, such as:
  • Time series and Discrete signals
  • Correlation, Auto-correlation, Cross-correlation
  • Regression methods (Linear regression)
  • Convolution and Deconvolution
  • and, of course, concepts related to signal analysis, like Fourier series, the Fourier transform, etc.
This paper explains how Autocorrelation, Cross-correlation, and other time series analysis methods are applied to seismic data
The concepts of Cross-correlation and Autocorrelation are also explained here
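
As a small illustration of cross-correlation and autocorrelation on discrete time series, the Python sketch below estimates the delay between two traces that record the same pulse. It is a sketch under stated assumptions: the two traces are made-up toy data (not seismic recordings), the function names are hypothetical, and the correlation is left unnormalised for simplicity.

```python
def cross_correlation(x, y, max_lag):
    """Return {lag: sum over t of x[t] * y[t + lag]} for |lag| <= max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(x)):
            j = t + lag
            if 0 <= j < len(y):
                total += x[t] * y[j]
        result[lag] = total
    return result


def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which y best matches x."""
    corr = cross_correlation(x, y, max_lag)
    return max(corr, key=corr.get)


# Hypothetical traces: the same short pulse, arriving 3 samples later
# on trace_b than on trace_a.
pulse = [1.0, -2.0, 1.0]
trace_a = [0.0] * 5 + pulse + [0.0] * 12
trace_b = [0.0] * 8 + pulse + [0.0] * 9

delay = estimate_delay(trace_a, trace_b, max_lag=6)
# Autocorrelation is the cross-correlation of a trace with itself;
# its maximum is always at lag 0.
autocorr = cross_correlation(trace_a, trace_a, max_lag=6)
print(delay, max(autocorr, key=autocorr.get))
```

The lag of the cross-correlation peak gives the time shift between the two series, which is the basic quantity behind comparing arrival times on different traces.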

  • Well-Log (borehole log) Analysis
I also worked in Geophysical Oil Well-Log (borehole log) Analysis. (An oil well, or borehole, is a hole drilled in the Earth that is designed to bring petroleum to the surface.) Well-Log analysis is also a kind of Data Analysis, in which we analyse physical properties of the geologic formations (the rocks) in the subsurface.
A well-log is a record of measurements of physical properties of the geologic formations (the rocks in the subsurface) penetrated by a borehole. In other words, a well-log is a record of measurements of physical properties of the rocks as a function of depth. Some of the physical properties that are measured are: resistivity; natural radioactivity of the rock formations (Gamma Ray log), where, because radioactive elements tend to be concentrated in shales, the log normally reflects the shale content of the formation; and sound wave velocity, obtained by measuring the time required for a sound wave to travel a constant distance, on the principle that the velocity of the rock decreases when the porosity increases.
So, in the same way that we use a supervised algorithm (for example, a linear regression method) to predict the price of a house based on housing data (like number of rooms, age of the house, lot size, etc.), in geophysics (or in petrophysics) we can use physical properties of the rocks to estimate some property of interest, such as permeability or porosity.
Learning algorithms (Linear regression, Naive Bayes, etc.) are used in Well-log analysis, for example:
  • To classify rock formations in the subsurface using measurements of physical properties of the rocks.
  • To predict some physical properties of the rocks (Porosity or Permeability) by using measurements of other properties. See this paper: Comparison of machine learning methods for estimating permeability and porosity of oil reservoirs via petro-physical logs
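
A minimal Python sketch of the regression idea above: fitting a straight line that predicts porosity from a single log measurement. The numbers are made up for illustration (they are not real well data), the variable and function names are hypothetical, and a one-feature least-squares fit is an assumption chosen for brevity; it is not the method of the paper cited above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training samples: sonic transit time (microseconds per foot)
# and porosity measured on core samples (fraction of rock volume).
# A slower rock (longer transit time) is assumed here to be more porous.
transit_time = [60.0, 70.0, 80.0, 90.0, 100.0]
core_porosity = [0.05, 0.10, 0.15, 0.20, 0.25]

slope, intercept = fit_line(transit_time, core_porosity)


def predict_porosity(dt):
    """Estimate porosity at a depth where only the sonic log is available."""
    return slope * dt + intercept


print(round(predict_porosity(85.0), 3))
```

This is the same workflow as the house-price example: learn the relationship where both the measurement and the target are known, then apply it where only the measurement is available.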