Record Detail

Text

Data Extraction dengan Teknik Scraping

Agha Arfa Wahidin - Personal Name

ABSTRACT
Data on the internet is usually heterogeneous in format, unstructured, and generally
delivered as HTML (Hypertext Markup Language), so it is difficult to analyze directly.
Web scraping is a technique that can be used to separate the main content of a website
from other parts such as headers and footers. It can also be used to extract data or
information from a website's HTML elements, so that previously unstructured data can be
organized into a structured format. Web scraping can be combined with various methods,
one of which is the HTML DOM (Document Object Model). The system described here was built
to scrape and extract data from web pages, where the data taken is only the text contained
in the paragraph (p) elements of each page. Testing used the black-box method together
with manual checks to determine the system's success rate in scraping text. Implementation
and final testing of the system, scraping several websites at once with the HTML DOM
method, showed that the data extraction process succeeded and ran well. However, scraping
failed on some websites that use additional security such as SSL (Secure Sockets Layer) /
TLS (Transport Layer Security), on websites with anti-scraping measures, and on targeted
pages that contain no p elements.
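
The extraction step summarized in the abstract (keep only the text inside p elements and ignore headers, footers and other markup) can be illustrated with a minimal sketch. This is not the author's implementation: the record's keywords name PHP and an HTML DOM approach, whereas the sketch below uses Python's standard-library `html.parser`, which is event-driven and does not build a DOM tree. The names `ParagraphExtractor`, `extract_paragraphs` and `SAMPLE_HTML` are hypothetical and chosen only for illustration. The sketch works on an HTML string that is already in memory; fetching pages over HTTP or HTTPS is not shown.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper, not from the thesis)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraph strings
        self._inside = False   # True while the parser is inside a <p> element
        self._buffer = []      # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes any paragraph that is still open.
            self._flush()
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text outside <p> (headers, footers, navigation) is ignored.
        if self._inside:
            self._buffer.append(data)

    def _flush(self):
        # Join the fragments and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._inside = False


def extract_paragraphs(html):
    """Return the text of all <p> elements in an HTML string, in document order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return parser.paragraphs


# Made-up sample page used only to exercise the sketch.
SAMPLE_HTML = """
<html><body>
<header><h1>Site title</h1></header>
<p>First <b>paragraph</b> of the article.</p>
<div>Navigation text outside any paragraph</div>
<p>Second paragraph &amp; more.</p>
<footer>Copyright notice</footer>
</body></html>
"""

if __name__ == "__main__":
    for number, text in enumerate(extract_paragraphs(SAMPLE_HTML), start=1):
        print(number, text)
```

For a page without any p elements, `extract_paragraphs` returns an empty list, which corresponds to one of the failure cases reported in the abstract.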

Keywords: Web Scraping, PHP, HTML DOM.

Availability

19610191 1961-019 Available

Detailed Information

Series Title	-
Call Number	1961-019
Publisher	FTI ITP : Padang., 2019
Physical Description	94 pages
Language	Indonesian
ISBN/ISSN	2015610020
Classification	NONE
Content Type	text

Media Type	computer
Carrier Type	other (computer)
Edition	20192
Subject	-
Specific Detail Info	-
Statement of Responsibility	-

Other/Related Versions

No other version available
