Building a Search Engine for our Scraped Data on Elasticsearch Part 2

In Part 1, we scraped this website for data and ingested it into Elasticsearch.

In this post, we will build the search engine frontend that we will use to search for blog posts.

We will code the application so that each returned result is hyperlinked: when you click on a page title, it takes you directly to the post.

Requirements:

We need the Flask and Elasticsearch Python modules installed in order to proceed:

$ pip install flask
$ pip install elasticsearch

Our Python Flask Application:

Flask will be our web framework, and we will render our HTML files using Jinja templates.

In my case my application is named app.py:

from flask import Flask, render_template, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch('10.0.1.10', port=9200)

@app.route('/')
def home():
    return render_template('search.html')

@app.route('/search/results', methods=['GET', 'POST'])
def search_request():
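    # The route accepts both GET and POST, so read the term from the form
    # data or the query string; request.form["input"] alone fails on GET.
    search_term = request.values.get("input", "")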
    res = es.search(
        index="scrape-sysadmins", 
        size=20, 
        body={
            "query": {
                "multi_match" : {
                    "query": search_term, 
                    "fields": [
                        "url", 
                        "title", 
                        "tags"
                    ] 
                }
            }
        }
    )
    return render_template('results.html', res=res )

if __name__ == '__main__':
    app.secret_key = 'mysecret'
    app.run(host='0.0.0.0', port=5000)
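
The query body passed to es.search above can also be built in a small function of its own, which makes it easy to inspect or test without a running cluster. This is only a sketch: the function name build_search_body and its fields parameter are my own naming for illustration, not part of Flask or the Elasticsearch client.

```python
def build_search_body(search_term, fields=("url", "title", "tags")):
    """Return a multi_match query body like the one used in app.py.

    build_search_body is a hypothetical helper name; the default fields
    are the same three fields the search route queries.
    """
    return {
        "query": {
            "multi_match": {
                "query": search_term,
                "fields": list(fields),
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent for the search term "linux".
    print(build_search_body("linux"))
```

In the route, this would be used as es.search(index="scrape-sysadmins", size=20, body=build_search_body(search_term)).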

Our Index Page will be named templates/search.html:

<!DOCTYPE html>
<html lang="en">
<head>

  <meta charset="utf-8">
  <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Pacifico">
  <link rel="icon" href="http://obj-cache.cloud.ruanbekker.com/favicon.ico">
  <link href="//netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap.min.css" rel="stylesheet">

  <title>Bookmarks Search</title>

</head>
<body>

  <div class="container">
    <div style="background:transparent !important" class="jumbotron">
      <div style="font-family: 'Pacifico', cursive;">
        <p>
          <center>
          <font size="8">Search for Blog Posts</font>
          </center>
        </p>
      </div>
    </div>

    <form action="/search/results" method="post">
      <div class="input-group">
        <input type="text" class="form-control input-lg" name="input" placeholder="Search" autofocus>
        <div class="input-group-btn">
          <button class="btn btn-primary btn-lg" type="submit">
            <i class="glyphicon glyphicon-search"></i>
          </button>
        </div>
      </div>
    </form>

    <br><br>
      <footer class="footer">
        <p>&copy; 2017 Ruan Bekker </p>
      </footer>

  </div>
  </body>
</html>

After we enter a search query, the form sends a POST request to the /search/results route, which renders templates/results.html. This template has some logic in it: it takes the data that gets passed to it and runs through a for loop over the hits:

<!DOCTYPE html>
<html lang="en">
  <head>

    <meta charset="utf-8">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Pacifico">
    <link rel="icon" href="http://obj-cache.cloud.ruanbekker.com/favicon.ico">
    <link href="//netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap.min.css" rel="stylesheet">

    <title>Bookmarks Search</title>

  </head>
  <body>

  <div class="container">
    <div style="background:transparent !important" class="jumbotron">
      <div style="font-family: 'Pacifico', cursive;">
        <p>
          <center>
          <font size="8">Search for Blog Posts</font>
          </center>
        </p>
      </div>
    </div>

  <form action="/search/results" method="post">
    <div class="input-group">
      <input type="text" class="form-control input-lg" name="input" placeholder="Search"  autofocus>
      <div class="input-group-btn">
        <button class="btn btn-primary btn-lg" type="submit">
          <i class="glyphicon glyphicon-search"></i>
        </button>
      </div>
    </div>
  </form>

  <center>
      <h1>Results: ({{ res['hits']['total'] }}) </h1>
  </center>

  <table class="table">
    <thead>
      <tr>
        <th>Date Stamp</th>
        <th>Source</th>
        <th>Title</th>
        <th>Tags</th>
      </tr>
    </thead>

{% for hit in res['hits']['hits'] %}
    <tbody>
      <tr>
        <th scope="row">{{ hit['_source']['date'] }}</th>
          <td>{{ hit['_index'] }}</td>
          <td><a href="{{ hit['_source']['url'] }}">{{ hit['_source']['title'] }}</a></td>
          <td>{{ hit['_source']['tags'] }}</td>
      </tr>
    </tbody>
{% endfor %}
  </table>

        <footer class="footer">
          <p>&copy; 2017 Ruan Bekker</p>
        </footer>

  </div>
  </body>
</html>
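
The template above reads res['hits']['total'] and the _source fields directly. On Elasticsearch 7 and later, hits.total is an object with value and relation keys rather than a plain number, so the heading would print a dictionary. The sketch below shows one way to normalize the response in Python before handing it to the template. The helper names total_hits and to_rows are hypothetical, and the sample response is made-up data shaped like a search result, not real output.

```python
def total_hits(res):
    """Return the hit count as an int for both response shapes."""
    total = res["hits"]["total"]
    if isinstance(total, dict):
        # Newer clusters return {"value": N, "relation": "eq"}.
        return total.get("value", 0)
    return total


def to_rows(res):
    """Flatten each hit into the four columns shown in the results table."""
    rows = []
    for hit in res["hits"]["hits"]:
        source = hit.get("_source", {})
        rows.append({
            "date": source.get("date", ""),
            "index": hit.get("_index", ""),
            "url": source.get("url", ""),
            "title": source.get("title", ""),
            "tags": source.get("tags", ""),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = {
        "hits": {
            "total": {"value": 1, "relation": "eq"},
            "hits": [
                {
                    "_index": "scrape-sysadmins",
                    "_source": {
                        "date": "2017-01-01",
                        "url": "https://example.com/post",
                        "title": "Example Post",
                        "tags": ["linux"],
                    },
                }
            ],
        }
    }
    print(total_hits(sample))
    print(to_rows(sample))
```

With these helpers, the route could pass total=total_hits(res) and rows=to_rows(res) to render_template, and the template would loop over rows instead of reaching into the raw response.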

Running your Python Flask Web Application:

If everything went according to plan, you should be able to run your application, and it will listen on port 5000:

$ python app.py
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

When you access your endpoint on port 5000, you should see the main screen, which should look like this:

For this example I took a screenshot after searching for Linux, and found this output:

You can also update the Flask code to search over multiple indexes, so you could have different indexes for different blogs. The HTML will then show where each post came from in the Source column.
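
As a rough sketch of the multi-index idea: Elasticsearch accepts a comma-separated list of index names as the search target, so the index argument can be assembled from a list. The helper name build_index_target and the second index name scrape-otherblog are hypothetical and only here for illustration; scrape-sysadmins is the index used earlier in this post.

```python
def build_index_target(indexes):
    """Join index names into a comma-separated search target.

    Blank entries are dropped and duplicates are removed while
    keeping the original order.
    """
    cleaned = []
    for name in indexes:
        name = (name or "").strip()
        if name and name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one index name is required")
    return ",".join(cleaned)


if __name__ == "__main__":
    # "scrape-otherblog" is a made-up index name for this example.
    print(build_index_target(["scrape-sysadmins", "scrape-otherblog"]))
```

The result would then be passed as es.search(index=build_index_target([...]), ...), and since each hit carries its own _index value, the Source column shows which blog a result came from.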

I hope that everyone found this useful.