Page Scraping of HTML Tables for WML Presentation

As the popularity of the Internet has grown, tremendous amounts of information have been uploaded for access via a common web browser. When this information involves tabular data, it is usually presented in the form of an HTML table.

Because HTML tables use code that is reasonably structured and repetitive, they lend themselves well to what is often called "page scraping". Page scraping is the process of using a computer program to access an HTML web page and extract particular information from it. The extracted information is then usually presented in a different, often simpler, form.
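To make the idea concrete, here is a minimal sketch of how the repetitive row and cell markup of an HTML table can be picked apart with regular expressions. It is not the code used for the WAGS page; the class name TableScraper, the method extractRows, and the sample schedule rows are hypothetical and exist only for illustration. It also assumes a simple, non-nested table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the original scraper: pulls cell text out of table rows.
public class TableScraper {
    // One <tr>...</tr> row; DOTALL lets '.' span line breaks, \b avoids tags like <track>.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // One <td> or <th> cell inside a row; \b keeps <thead> from matching.
    private static final Pattern CELL = Pattern.compile(
            "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any tag left inside a cell, such as <b> or <a href=...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = ROW.matcher(html);
        while (rowMatcher.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cellMatcher = CELL.matcher(rowMatcher.group(1));
            while (cellMatcher.find()) {
                // Drop inner markup and keep only the visible text of the cell.
                String text = TAG.matcher(cellMatcher.group(1)).replaceAll("");
                cells.add(text.replace("&nbsp;", " ").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real schedule page will be laid out differently.
        String html = "<table>\n<tr><th>Time</th><th>Home</th><th>Away</th></tr>\n"
                + "<tr><td>9:00</td><td><b>Team A</b></td><td>Team B</td></tr>\n</table>";
        for (List<String> row : extractRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Regular expressions work here only because the table markup is regular. Nested tables or unclosed tags would call for a more tolerant approach.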

In the example shown here, we use a Java InputStream to access the WAGS Tournament Schedule page of the Washington Area Girls Soccer website. The HTML code is processed using a series of regular expressions to collect just the schedule and scores for each of the tournament age and bracket divisions. Custom code in a Java Servlet then selects just the game-specific information and presents it as a WML page, which can be viewed in a mobile phone WML browser.
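The two ends of that pipeline, reading the page from an input stream and writing the selected games out as WML, might look roughly like the sketch below. It is an assumption-laden illustration rather than the servlet described above: the class WmlDeck, its methods, and the sample game lines are hypothetical, and the servlet plumbing is left out so the example runs with the standard library alone.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the original servlet code.
public class WmlDeck {
    // Reads a whole stream into one string. For a live page the stream
    // would typically come from new URL(pageAddress).openStream().
    public static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Escapes XML special characters, plus '$', which WML treats as a variable marker.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("$", "$$");
    }

    // Builds a single-card WML deck with one paragraph per game line.
    public static String toWml(String title, List<String> games) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<!DOCTYPE wml PUBLIC \"-//WAPFORUM//DTD WML 1.1//EN\" ")
          .append("\"http://www.wapforum.org/DTD/wml_1.1.xml\">\n");
        sb.append("<wml>\n<card id=\"games\" title=\"").append(escape(title)).append("\">\n");
        for (String game : games) {
            sb.append("<p>").append(escape(game)).append("</p>\n");
        }
        sb.append("</card>\n</wml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up lines standing in for text already scraped from the schedule page.
        byte[] sample = "9:00 Team A v Team B\n10:30 Team C v Team D\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = readAll(new ByteArrayInputStream(sample));
        List<String> games = Arrays.asList(text.trim().split("\n"));
        System.out.print(toWml("U12 Games", games));
    }
}
```

In a servlet, the string returned by toWml would be written to the response with the content type set to text/vnd.wap.wml so that the phone's browser treats it as a WML deck.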

The result of this page scraping technique can be viewed via a WML mobile web browser at http://www.fieldmaster.org/wags/wags.wml. The information on the WAGS tournament games is courtesy of the WAGS Tournament website, and is available at www.wagstournament.com/schedules/schedules.html. The Socker Fan is using this information only as an example of how to perform page scraping techniques. More information on the use of regular expressions can be found at Regular-Expressions.info. Sun Microsystems' Java programming language added support for regular expressions in version 1.4, which is available at Sun's Java website, The Source For Java Technology.
