
Data Extraction dengan Teknik Scraping (Data Extraction Using Scraping Techniques)



ABSTRACT
Data on the internet usually comes in heterogeneous formats, is unstructured, and is generally
formatted as HTML (Hypertext Markup Language), which makes it difficult to analyze directly. Web
scraping is a technique that can separate a website's main content from other parts such as
headers and footers. It can also extract data or information from a website's HTML elements,
turning previously unstructured data into a structured form. Web scraping can be combined with
various methods, one of which is the HTML DOM (Document Object Model). The system presented here
was built to scrape and extract data from web pages, taking only the text contained in each
page's paragraph (p) elements. It was evaluated with black-box testing and with manual testing
to measure how successfully it scrapes text. Implementation and final testing of HTML DOM
scraping against several websites at once showed that data extraction succeeds and runs well.
However, scraping failed on some websites: those protected by additional security such as SSL
(Secure Sockets Layer) / TLS (Transport Layer Security), those using anti-scraping measures,
and those whose target pages contain no p elements.

Keywords: Web Scraping, PHP, HTML DOM.
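The following is a minimal sketch, not the thesis's own code, of the extraction step the abstract describes: keeping only the text inside paragraph (p) elements and scraping several URLs in one pass. The thesis implemented this in PHP with an HTML DOM approach; this sketch instead uses Python's standard-library html.parser, an event-driven parser that does not build a DOM tree. The helper names (ParagraphExtractor, extract_paragraphs, fetch_html, scrape_many) and the User-Agent string are hypothetical. Network, TLS, and HTTP errors (for example a 403 returned by an anti-scraping filter) are caught per URL, so one failing site does not stop the others.

```python
import urllib.request
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the whitespace-normalized text of every <p> element."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # In HTML a new <p> implicitly closes an unclosed previous one.
            if self._in_p:
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        # Text outside <p> (headers, footers, navigation) is ignored.
        if self._in_p:
            self._buffer.append(data)

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:  # skip empty or whitespace-only paragraphs
            self.paragraphs.append(text)
        self._buffer = []

    def close(self):
        super().close()
        if self._in_p:  # document ended inside an unclosed <p>
            self._flush()
            self._in_p = False


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url: str, timeout: float = 10.0) -> str:
    # Hypothetical User-Agent; some sites reject Python's default agent.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (paragraph-scraper sketch)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_many(urls: list[str]) -> dict[str, object]:
    """Maps each URL to its list of paragraphs, or to the exception raised."""
    results: dict[str, object] = {}
    for url in urls:
        try:
            results[url] = extract_paragraphs(fetch_html(url))
        except (OSError, ValueError) as exc:
            # OSError covers urllib.error.URLError/HTTPError, ssl.SSLError
            # and timeouts; ValueError covers malformed URLs.
            results[url] = exc
    return results


if __name__ == "__main__":
    sample = """
    <html><body>
      <header><h1>Site title</h1></header>
      <p>First   paragraph with <b>bold</b> text.</p>
      <p>Second &amp; last paragraph.</p>
      <footer>Copyright</footer>
    </body></html>
    """
    for text in extract_paragraphs(sample):
        print(text)
    # prints:
    # First paragraph with bold text.
    # Second & last paragraph.
```

In this sketch the header and footer text is dropped simply because it is not inside a p element, and a page with no p elements yields an empty list, which corresponds to one of the failure cases reported in the testing.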


Availability

Item code 19610191, call number 1961-019: Available

Detailed Information

Series Title
-
Call Number
1961-019
Publisher
FTI ITP : Padang
Physical Description
94 pages
Language
Indonesian
ISBN/ISSN
2015610020
Classification
NONE
Content Type
text
Media Type
computer
Carrier Type
other (computer)
Edition
20192
Subject
-
Specific Detail Info
-
Statement of Responsibility

Other/Related Versions

No other versions available

