Artificial lift plays an important role in the petroleum industry, sustaining production flowrates and extending the lifespan of oil wells. One of the most popular artificial lift methods is the Electric Submersible Pump (ESP), because it can deliver high flowrates even in wells of great depth. Although ESPs are designed to work under extreme conditions such as corrosion, high temperature and high pressure, their lifespan is often much shorter than expected. ESP failures lead to production loss and raise replacement costs, because the cost of intervention work for an ESP is much higher than for other artificial lift methods, especially in offshore wells. The prediction of ESP failures is therefore highly valuable in oil production and contributes greatly to the design, construction and operation of oil wells. The contribution of this study is to use three machine learning algorithms, namely Decision Tree, Random Forest and Gradient Boosting Machine, to build predictive models for ESP lifespan using both dynamic and static ESP parameters. The results of these models were compared to find the most suitable model for predicting the ESP life cycle. In addition, this study evaluated the influence of the various operating parameters to identify those with the greatest impact on ESP duration. The results of this study provide a better understanding of ESP behavior so that early action can be taken to prevent potential ESP failures.

Artificial lifts are widely used in production wells to optimize production flowrate [

In recent years, oil and gas experts have tried to identify the main causes of ESP failures and to predict the life cycle of ESP by different methods such as using the harmonic patterns in the electric supply [

The use of PCA was also mentioned in the work of Abdelaziz, et al. (2017) to predict failure of ESP [

This study presents a different way to approach this problem: building predictive models with several machine learning algorithms, namely Decision Tree, Random Forest and Gradient Boosting Machine, to predict ESP lifespan from both dynamic and static parameters. A total of 13 operating parameters were collected from 97 ESPs. Furthermore, the model also ranks the impact of these parameters on ESP lifespan. The results can be used to improve ESP performance by appropriately adjusting the parameters that most influence ESP lifespan.

Machine Learning (ML) is a subset of Artificial Intelligence (AI). Its principle is data acquisition and self-learning machines: ML is a data analysis method that automates the building of an analytical model. Using iterative algorithms to learn from data, ML allows computers to find deeply hidden patterns that cannot be obtained by explicit programming. The iterative aspect of ML is important because, when exposed to new data, these models can adapt independently. A model can learn from previous calculations to produce repeatable, reliable decisions and results.

According to the learning method, machine learning algorithms are usually divided into four groups: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. Supervised learning algorithms predict the outcome for new data (new inputs) based on known (input, outcome) pairs.

All three algorithms used in this study, including Decision Tree, Random Forest and Gradient Boosting Machine, are Supervised Learning algorithms.

A Decision Tree is a structured hierarchy that can be used to classify objects based on a series of rules. Given data about objects whose attributes and classes are known, the decision tree generates rules to predict the class of unknown objects (unseen data). A decision tree consists of three main parts: a root node, leaf nodes and branches. The root node is the starting point of the tree, and both the root node and the internal nodes contain questions or criteria to be answered. A branch represents the outcome of the test at a node. For example, if the question at the first node is answered with "yes" or "no", there will be one sub-node for the "yes" response and another for the "no" response. An example of a decision tree is illustrated in

Step 1: Select the best attribute a of the data set S using Information Gain (IG) and Entropy, with Information Gain = Entropy(parent) − [weighted average] × Entropy(children) and Entropy = −∑ p(x) log₂ p(x), where p(x) is the proportion of the number of elements in class x to the number of elements in set S.

Step 2: Partition the set S into subsets using the attribute for which the resulting entropy after splitting is minimized or, equivalently, the information gain is maximized.

Step 3: Make a decision tree node containing that attribute.

Step 4: Recurse on subsets using the remaining attributes.
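The study does not publish its code, but the entropy and information-gain computations in Steps 1-2 can be sketched in plain Python (a minimal illustration on toy labels, not the authors' implementation or the ESP data):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p(x) * log2 p(x)) over the classes x in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent_labels, child_partitions):
    """IG = entropy(parent) - weighted average of entropy(children)."""
    total = len(parent_labels)
    weighted = sum(len(child) / total * entropy(child)
                   for child in child_partitions)
    return entropy(parent_labels) - weighted

# Toy example: a split that separates 10 labels into two pure subsets
parent = ["fail"] * 5 + ["run"] * 5
split = [["fail"] * 5, ["run"] * 5]
print(entropy(parent))                     # 1.0 (maximum for 2 balanced classes)
print(information_gain(parent, split))     # 1.0 (a perfect split)
```

A pure subset has entropy 0, so a split that perfectly separates the classes recovers all of the parent's entropy as information gain.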

Random Forest builds a collection of decision trees and then uses voting to make decisions about the target variable. An example: suppose you want to tour Britain and intend to visit a city such as Manchester, Liverpool or Birmingham. To decide, you consult many opinions from friends, travel blogs, tour operators and so on. Each one corresponds to a decision tree answering questions such as: is the city beautiful, can you visit the stadiums, how much does the visit cost, how long does the visit take? You then have a forest of answers from which to choose a city. The Random Forest evaluates and combines the decision trees by voting to deliver the final result.

Mathematically, the algorithm can be explained as follows: a Random Forest is a collection of hundreds of decision trees, where each decision tree is randomly built from a resampling (random selection with replacement) of part of the data and a random subset of all the variables in the data (
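The bootstrap-and-average idea can be sketched as follows. This is a simplified illustration, not the authors' model: a depth-1 "stump" stands in for a full decision tree, and the (temperature, lifespan) pairs are hypothetical:

```python
import random
import statistics

def fit_stump(sample):
    """Stand-in for a tree: split at the median x, predict the mean y on each side."""
    xs = sorted(x for x, _ in sample)
    thr = xs[len(xs) // 2]
    left = [y for x, y in sample if x < thr]
    right = [y for x, y in sample if x >= thr]
    left_mean = statistics.mean(left) if left else statistics.mean(right)
    right_mean = statistics.mean(right)
    return lambda x: left_mean if x < thr else right_mean

def random_forest(data, n_trees=100, seed=0):
    """Fit each stump on a bootstrap resample of the data; average the predictions."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(s(x) for s in stumps) / n_trees

# Hypothetical data: lifespan (days) falling as motor temperature rises
data = [(70, 600), (90, 500), (110, 400), (130, 300), (150, 200), (170, 100)]
predict = random_forest(data)
assert predict(80) > predict(160)  # hotter motor -> shorter predicted lifespan
```

In a real forest each tree is a full decision tree and each split also draws from a random subset of the variables; averaging over many resampled trees is what reduces the variance of a single tree.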

The Gradient Boosting Machine is an ensemble technique that tries to create a strong model from a number of weak models. Instead of building one prediction model (such as a decision tree) with medium accuracy, we build several predictive models that are weak learners individually but achieve higher accuracy together. This is done by building a model from the training data, then creating a second model that tries to correct the errors of the first model (

We can imagine the ensemble as a group of weak, medium and excellent students, plus a teacher. The teacher's knowledge carries the highest weight and the weak students' the lowest. When a question is asked and the group must draw a conclusion, if many people reach the same conclusion, or if the weighted knowledge of those reaching it outweighs the rest of the group, that conclusion is likely to be right.
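The error-correcting loop can be sketched for squared-error regression, where each round fits a new weak learner to the residuals of the model so far. This is a simplified illustration with a single-split weak learner and hypothetical temperature/lifespan values, not the study's model:

```python
def fit_stump(xs, rs):
    """Weak learner: the single split that minimises squared error on residuals rs."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x <= thr]
        right = [r for x, r in zip(xs, rs) if x > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def gbm(xs, ys, rounds=100, lr=0.1):
    """Start from the mean; each round, a new stump corrects the remaining error."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        pred = [base + lr * sum(s(x) for s in stumps) for x in xs]
        residuals = [y - p for y, p in zip(ys, pred)]
        stumps.append(fit_stump(xs, residuals))
    return lambda x: base + lr * sum(s(x) for s in stumps)

xs = [70, 90, 110, 130, 150, 170]    # hypothetical motor temperatures
ys = [600, 500, 400, 300, 200, 100]  # hypothetical ESP lifespans (days)
model = gbm(xs, ys)
assert abs(model(70) - 600) < 50 and abs(model(170) - 100) < 50
```

The learning rate `lr` shrinks each correction so that no single weak learner dominates; many small corrections accumulated over the rounds give the strong combined model.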

The goal of building a predictive model is to clarify the relationship between a group of input variables, the parameters that affect ESP lifespan, and a target variable, the ESP lifespan itself. The models were built using supervised learning methods: Decision Tree, Random Forest and Gradient Boosting Machine. The accuracy of each model is evaluated with the Root Mean Squared Error (RMSE), which measures the difference between forecast and actual values. In theory, a perfect model would have an RMSE of 0, meaning absolute prediction, but in practice this is not possible given the variable nature of the data. The best model is the one with the lowest RMSE.
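The RMSE metric itself is straightforward to compute; a short sketch with hypothetical lifespan values (not the paper's data):

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# Hypothetical actual vs forecast lifespans in days
actual = [300, 450, 120, 600]
forecast = [320, 440, 100, 590]
print(round(rmse(actual, forecast), 1))  # 15.8
```

Because the residuals are squared before averaging, RMSE penalises a few large misses more heavily than many small ones, which suits failure forecasting where big errors are costly.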

The dataset used to build the forecast models was collected from 97 ESPs. It consists of input variables including static parameters (well parameters, fluid properties) and dynamic parameters (operating parameters) of the ESPs during operation, while the output is the number of operating days (lifespan) of each ESP (

Number | Parameter | Interval of Values | Unit |
---|---|---|---|
1 | Pump discharge pressure | 1500 - 3000 | psi |
2 | Pump intake pressure | 500 - 2500 | psi |
3 | Pump intake temperature | 45 - 200 | °C |
4 | Motor temperature | 70 - 200 | °C |
5 | Vibration x | 0.1 - 0.3 | inch |
6 | Vibration y | 0.05 - 0.2 | inch |
7 | Motor current | 30 - 120 | A |
8 | Motor speed | 45 - 70 | Hz |
9 | Wellhead temperature | 40 - 80 | °C |
10 | Wellhead pressure | 400 - 500 | psi |
11 | Production flow rate | 1000 - 5000 | bpd |
12 | Water cut | 20 - 45 | % |
13 | Gas oil ratio | 100 - 1500 | scf/bbl |
14 | Fluid viscosity | 0.2 - 0.5 | cP |
15 | ESP lifespan | 30 - 700 | day |

The results obtained from the models are presented in

The graphs comparing the actual and forecast values for the three models, Decision Tree, Random Forest and Gradient Boosting Machine, presented respectively in Figures 5-7, confirm this observation: ranked from lower to higher accuracy, the models are Decision Tree, Random Forest and Gradient Boosting Machine. Obviously,

This observation can be rooted in the fact that the Decision Tree is a single learner, so it may not be suitable for data sets with large numbers of variables, leading to bigger errors than the other two models. Meanwhile, Random Forest and Gradient Boosting Machine are both ensemble learning methods: the accuracy of Random Forest is improved by voting over the results of hundreds of decision trees, while in Gradient Boosting Machine each new decision tree fixes the errors of the previous ones.

The ranking of influence factors was extracted from the Random Forest and Gradient Boosting Machine models and is presented in

This study showed that the Gradient Boosting Machine can be chosen because it gave the smallest Root Mean Squared Error (RMSE), and it can also provide an accurate ranking of the parameters influencing the lifespan of Electric Submersible Pumps. The GBM model has Average Residual (AR) and RMSE values of 11.7 days and 21.2 days, respectively.

Model | Average Residual (days) | Root Mean Squared Error (RMSE, days) | Regression Coefficient (R^{2}) |
---|---|---|---|
Decision Tree | 28.25 | 51.26 | 0.3697 |
Random Forest | 18.8 | 34.12 | 0.7186 |
Gradient Boosting Machine | 11.7 | 21.2 | 0.8908 |

This paper proposed a proactive approach: building predictive models for Electric Submersible Pump lifespan based on machine learning algorithms. Unlike previous studies, this study applied different methods to the same data set to find the best method for real-life use. It is concluded that the Gradient Boosting Machine is the most suitable method for predicting the ESP life cycle, not only because it has the most accurate predictive results, but also because it can rank the influence factors. Although temperature has long been known to have a damaging effect on ESP duration, this had not been demonstrated through big data mining. Hence, this study is the first to show that temperature is the most influential factor on ESP run life. This knowledge will help further improve ESP operation worldwide.

This research is funded by Hochiminh City University of Technology-VNU-HCM under grant number T-DCDK-2018-93.

The authors declare no conflicts of interest regarding the publication of this paper.

Pham, S.T., Vo, P.S. and Nguyen, D.N. (2021) Effective Electrical Submersible Pump Management Using Machine Learning. Open Journal of Civil Engineering, 11, 70-80. https://doi.org/10.4236/ojce.2021.111005