Question: I want to extract data from a website. What is the best way to create a Wildcard Match to extract the text I want?

Answer: In our experience, the easiest way to construct a Wildcard Match is to start from the page’s HTML code. Follow these steps:

  1. Load the page you want to extract data from in your web browser.
  2. If you’re using Firefox, select “View | Page Source” from the main menu bar. If you’re using Internet Explorer, select “View | Source”.
  3. Locate the text you wish to extract in the HTML source.
  4. Select the text you wish to extract together with the surrounding HTML tags. For example, if we wanted to capture the title text from this page, we’d select:
    <TITLE>Inspyder - I want to extract data from a website, what is the best way to create a Wildcard Match?</TITLE>
  5. Now replace the text you want to capture with the name of the field that should receive it, wrapped in “#” characters. If we wanted to call the page title text “TitleText”, our query would look like:
    <TITLE>#TitleText#</TITLE>
  6. Finally, make sure “Include HTML” is checked in the Query options (otherwise the HTML will be ignored and the query will never match anything).
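
If it helps to see the idea outside the product, here is a minimal Python sketch of how a query like the one in step 5 can be read. It is only an illustration and not how Inspyder implements Wildcard Matches: the helper name wildcard_to_regex and the sample HTML are hypothetical, and we assume each #FieldName# placeholder captures the shortest run of text between the literal HTML on either side. The last lines mimic step 6 under the assumption that ignoring the HTML means the tags are removed before matching.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Convert a query such as '<TITLE>#TitleText#</TITLE>' into a regex.

    Hypothetical helper for illustration only; not Inspyder's actual code.
    """
    # Splitting on a pattern with one capture group yields alternating
    # pieces: literal text, field name, literal text, field name, ...
    parts = re.split(r"#([A-Za-z_]\w*)#", query)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            # Literal HTML around the field: match it exactly.
            pieces.append(re.escape(part))
        else:
            # Field placeholder: capture as little text as possible.
            pieces.append(f"(?P<{part}>.*?)")
    return re.compile("".join(pieces), re.IGNORECASE | re.DOTALL)


# Sample page source (made up for this example).
html = "<html><head><TITLE>Inspyder - Example page</TITLE></head></html>"
pattern = wildcard_to_regex("<TITLE>#TitleText#</TITLE>")

match = pattern.search(html)
title = match.group("TitleText") if match else None
print(title)  # Inspyder - Example page

# Assumed analogue of leaving "Include HTML" unchecked: with the tags
# stripped, the literal <TITLE> in the query has nothing to match.
text_only = re.sub(r"<[^>]+>", "", html)
no_html_match = pattern.search(text_only)
print(no_html_match)  # None
```

The second result shows why step 6 matters: a query built from HTML tags can only match when those tags are still present in the text being searched.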