1. Besides their core content, web pages contain other elements such as banners, navigation elements, external links, and copyright information. This noisy content occupies a large share of the page area. Data on web pages is generally represented in XML, HTML, or XHTML. These are semi-structured text documents that lack a formal document structure. Such documents do not separate the text from the schema, and the amount of structure used to represent the text depends on its purpose; no strict syntax is imposed on semi-structured documents. Retrieving relevant information therefore requires extracting the core content of these documents and interpreting its words and sentences. Although many existing methods formulate content recognition as a DOM tree node selection problem, each of them has some limitations. Here, a pattern-matching approach is proposed that uses simple searching to extract the core content of web pages, which are usually semi-structured. The approach visits the relevant news website through its URL, collects the links to each news page in a particular category, and extracts the data, including metadata, from each of these news pages. It relies on an algorithm devised for this purpose that applies regular expressions to recognize the exact patterns needed to extract the actual text content of these news documents. The proposed approach handles news web pages of any size and extracts core content efficiently and with high accuracy.

2. There is a well-established need for a better online water quality monitoring system to control drinking water treatment and supply processes. The water utility company in Surabaya, PDAM Surabaya, has a few reservoirs in its water supply system that are monitored by a WTW IQ SensorNet 2020 XT.
However, although the sensor device provides values for several water quality parameters, it is passive, and its internal data remains stored in the sensor itself. To solve this problem, we propose a data logger application that collects data for an online water quality monitoring system using web extraction. The application is built in Python and gathers data from the sensor through web extraction using the Beautiful Soup library, and it stores the data in an SQL database. In this research, we evaluated the application by measuring the accuracy of the data extracted from the sensors: the accuracy is about 99%, the error rate is less than 1%, and the MSE (mean squared error) is around 0.35%. We also tracked the growth of the data size, which increases steadily, and measured the correlation between data size and delay, obtaining a value of around 0.0000002739. This shows that the application can work in real time and that the delay is not affected by the size of the data.

3. The aim of the application is to offer its users a quick way to assemble computers that suit their needs. One of its features is price comparison based on data retrieved from five computer shops, which helps users save on the cost of buying PC components and assemble a computer easily. This comparison feature rests on a basic consumer priority: buyers want items not only at the cheapest price but also of the best quality. The research began by distributing questionnaires to respondents who had bought computer components or assembled a computer online. The questionnaire results were evaluated to confirm that all the features previously identified by the author are relevant to user needs.
Then, to extract the required data from the five computer shops, the author uses Pentaho as a web harvesting and web grabbing tool. These methods allow the application to extract data from the five shops. The result of this research is a web-based application built with PHP and JavaScript, with MySQL as its database.
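For item 1, the link-collection step (visiting a category page and picking up the links to its news pages) might be sketched with Python's `re` module as below. This is a minimal illustration, not the authors' algorithm: the function names and the assumption that article URLs contain `/<category>/` are hypothetical, and a regular expression over raw HTML only works for reasonably regular markup.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Captures the href value of an <a> tag (single- or double-quoted).
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_article_links(category_html: str, base_url: str, category: str) -> list[str]:
    """Return absolute article URLs for one category, in page order, without duplicates.

    Hypothetical URL convention: articles live under /<category>/<slug>.
    """
    article_path = re.compile(rf"/{re.escape(category)}/[^/?#]+")
    seen: set[str] = set()
    links: list[str] = []
    for href in HREF_RE.findall(category_html):
        absolute = urljoin(base_url, href)  # resolve relative links against the site URL
        if article_path.search(absolute) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

In use, `collect_article_links(fetch_html(category_url), category_url, "sport")` would yield the list of news pages to process next, where `category_url` is whatever category page the site exposes.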
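The content-extraction step of item 1 could then apply further regular expressions to each news page. The sketch below rests on stated assumptions: it takes the title from `<title>`, the date from a hypothetical `article:published_time` meta tag, and treats `<p>` paragraphs shorter than `min_words` words as noise. That word-count filter is a simple heuristic chosen here, not something the abstract specifies.

```python
import html
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Hypothetical markup: the date sits in an article:published_time meta tag,
# with the property attribute written before the content attribute.
DATE_RE = re.compile(
    r'<meta[^>]+property=["\']article:published_time["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)
PARA_RE = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.IGNORECASE | re.DOTALL)
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def clean(fragment: str) -> str:
    """Drop inline tags, decode HTML entities, and collapse whitespace."""
    text = html.unescape(TAG_RE.sub("", fragment))
    return " ".join(text.split())


def extract_article(page_html: str, min_words: int = 8) -> dict:
    """Extract title, publication date, and core body text from one news page."""
    page_html = SCRIPT_STYLE_RE.sub("", page_html)  # scripts may contain fake <p> tags
    title = TITLE_RE.search(page_html)
    date = DATE_RE.search(page_html)
    paragraphs = [clean(p) for p in PARA_RE.findall(page_html)]
    # Short paragraphs (menus, captions, copyright lines) are treated as noise.
    body = [p for p in paragraphs if len(p.split()) >= min_words]
    return {
        "title": clean(title.group(1)) if title else None,
        "published": date.group(1) if date else None,
        "body": "\n".join(body),
    }
```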
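For item 2, one web-extraction and logging cycle might look like the following. The abstract does not describe the sensor's web page, so the table layout (one parameter, value, and unit per row) and all names are hypothetical. The sketch also swaps in the standard-library `html.parser` and `sqlite3` for Beautiful Soup and the unspecified SQL server, so that it runs without third-party packages.

```python
import sqlite3
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.request import urlopen

Reading = Tuple[str, float, str]  # (parameter, value, unit)


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the cell's text fragments and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_readings(page_html: str) -> List[Reading]:
    """Turn [parameter, value, unit] rows into typed readings; skip header or non-numeric rows."""
    parser = TableCellParser()
    parser.feed(page_html)
    parser.close()
    readings = []
    for row in parser.rows:
        if len(row) < 3:
            continue
        try:
            value = float(row[1])
        except ValueError:
            continue  # header row or status text instead of a number
        readings.append((row[0], value, row[2]))
    return readings


def store_readings(conn: sqlite3.Connection, readings: List[Reading],
                   taken_at: Optional[str] = None) -> int:
    """Insert one batch of readings with a shared UTC timestamp; return the row count."""
    taken_at = taken_at or datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings "
            "(taken_at TEXT, parameter TEXT, value REAL, unit TEXT)"
        )
        conn.executemany(
            "INSERT INTO readings VALUES (?, ?, ?, ?)",
            [(taken_at, name, value, unit) for name, value, unit in readings],
        )
    return len(readings)


def log_once(url: str, conn: sqlite3.Connection, timeout: float = 10.0) -> int:
    """Fetch the sensor's status page once (hypothetical URL) and store its readings."""
    with urlopen(url, timeout=timeout) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    return store_readings(conn, parse_readings(page))
```

A logger would call `log_once` at a fixed interval, for example in a loop with `time.sleep`; the polling interval the authors used is not stated in the abstract.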
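The evaluation figures in item 2 (accuracy, error rate, MSE, and the correlation between data size and delay) could be computed along the following lines. The abstract does not define how accuracy was counted, so treating an extracted value as correct when it matches the reference reading within a tolerance `tol` is an assumption. The correlation here is the ordinary Pearson coefficient, for which a value near zero indicates no linear relationship between data size and delay.

```python
import math


def _check_pair(a, b, minimum=1):
    if len(a) != len(b) or len(a) < minimum:
        raise ValueError("need two sequences of equal length with enough elements")


def accuracy_and_error_rate(extracted, reference, tol=1e-9):
    """Share of extracted values matching the reference within tol, and its complement."""
    _check_pair(extracted, reference)
    matches = sum(abs(e - r) <= tol for e, r in zip(extracted, reference))
    accuracy = matches / len(reference)
    return accuracy, 1.0 - accuracy


def mean_squared_error(extracted, reference):
    """Average of squared differences between extracted and reference values."""
    _check_pair(extracted, reference)
    return sum((e - r) ** 2 for e, r in zip(extracted, reference)) / len(reference)


def pearson(xs, ys):
    """Pearson correlation coefficient, e.g. between data size (xs) and delay (ys)."""
    _check_pair(xs, ys, minimum=2)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```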
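For item 3, the comparison rule (cheapest price, but not at the expense of quality) could be expressed as choosing, for each component, the cheapest offer whose quality score meets a threshold. The `Offer` fields, the rating scale, and the `min_rating` threshold are hypothetical: the actual application was built in PHP on data harvested with Pentaho, and the abstract does not say how quality is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    shop: str        # one of the (hypothetical) harvested shops
    component: str   # e.g. "CPU", "RAM"
    product: str
    price: int       # integer price in the smallest currency unit, avoiding float rounding
    rating: float    # hypothetical quality score on a 0-5 scale


def best_offers(offers, min_rating=4.0):
    """Per component, pick the cheapest offer rated at least min_rating.

    Equal prices are broken in favour of the higher rating; components with
    no qualifying offer are left out of the result.
    """
    best = {}
    for offer in offers:
        if offer.rating < min_rating:
            continue
        current = best.get(offer.component)
        if current is None or (offer.price, -offer.rating) < (current.price, -current.rating):
            best[offer.component] = offer
    return best


def build_cost(selection):
    """Total price of one chosen offer per component."""
    return sum(offer.price for offer in selection.values())
```

Comparing the tuples `(price, -rating)` is what makes price the primary criterion and rating the tie-break; a weighted score would be an alternative if quality should be allowed to outweigh small price differences.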

