Learn Python Series (#20) - PyMongo Part 3

in #utopian-io7 years ago (edited)

Learn Python Series (#20) - PyMongo Part 3

python_logo.png

What Will I Learn?

  • You will learn about how to use the Logical Operators $and, $or, $nor and $not;
  • about some other operators including $text and $search;
  • how to address nested fields;
  • how to create and use an index for textual search;
  • how to use the $size operator;
  • and how to use the flexible $where operator, which will be of special interest to JavaScript-loving aspiring Pythonistas!

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution;
  • The ambition to learn Python programming;
  • A running MongoDB installation on your computer system, as explained in the previous two episodes .

Difficulty

Intermediate

Curriculum (of the Learn Python Series):

Learn Python Series (#20) - PyMongo Part 3

In the previous Learn Python Series episodes about PyMongo Part 1 and PyMongo Part 2, we learned what MongoDB & PyMongo are about, how to do "CRUD" (Create, Read, Update, Delete) operations - and how to use Query Operations for more advanced querying.

In this episode, we'll expand our knowledge regarding PyMongo with a number of techniques: we are going to discuss so-called Logical Operators. Let's get started!

Adding some (dummy) data to the test_mongo dataset

Up until now, we've kept the dataset of the test_mongo test database pretty simple. We didn't need more or more complex data to explain what was discussed in the PyMongo Parts 1 & 2 tutorial episodes anyway. But since we're going to discuss creating more complex data queries via using so-called Logical Operators regarding PyMongo, let's first add some more data to the test_mongo dataset.

import pymongo
from bson.objectid import ObjectId
from pprint import pprint

client = pymongo.MongoClient('mongodb://localhost:27017')
db = client.test_mongo
coll = db.accounts

scipio_posts = [
        {
            "title": "Learn Python Series (#19) - PyMongo Part 2",
             "url": "https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2"
        },
        {
            "title": "Learn Python Series (#18) - PyMongo Part 1",
             "url": "https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1"
        },
        {
            "title": "Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data",
             "url": "https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data"
        }
]

result = coll.update_one({"account": "scipio"}, {"$set": {"posts": scipio_posts}})
jedigeiss_posts = [
        {
            "title": "Das Tagebuch der (Steem)Leiden",
             "url": "https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden"
        },        
        {
            "title": "D-A-CH Support -- Discord / Steem Bridge -- added Autovoter",
             "url": "https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter"
        }
]

result = coll.update_one({"account": "jedigeiss"}, {"$set": {"posts": jedigeiss_posts}})

Okay, now as a result, 2 of the 4 documents in the accounts collection have a "posts" field, of which one holds 3 and the other 2 post items, each containing 2 fields ("title" and "url"); that will do for now.

1.png

Using Logical Operators for more complex data queries

PyMongo gives us support for logical query operators $and, $or, $nor and $not, with which we're able to construct more complex data queries. Let's go over them one by one to see how they work and what kind of queries we can come up with.

Using the $and operator

The $and operator can be used to perform logical AND operations (where all expressions involved need to be True) on a list (array) of 2 or more expressions. As a result, only the documents that satisfy all expressions will be selected.

result = list(coll.find(
    {"$and": [
        {"posts": {"$exists": True}},
        {"account_id": {"$exists": True}}
    ]}
))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
  'account': 'scipio',
  'account_id': 422033,
  'active': True,
  'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
            {'title': 'Learn Python Series (#18) - PyMongo Part 1',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
            {'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
                      'analyzing any-to-any multi-currency historical data',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
  'slogan': "Does it matter who's right, or who's left?"}]

Nota bene: As we can see, only 1 account (scipio) is returned, being the only account that has both a "posts" field as well as an "account_id". The account jedigeiss has a "posts" field as well but not an "account_id", where the account fabiyamada has an "account_id" field but not a "posts" field.

Using the $or operator

The $or operator can be used to perform logical OR operations (where only one of all expressions involved needsto be True) on a list (array) of 2 or more expressions. As a result, the documents that satisfy one or more of all expressions involved will be selected.

result = list(coll.find(
    {"$or": [
        {"posts": {"$exists": True}},
        {"account_id": {"$exists": True}}
    ]}
))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
  'account': 'scipio',
  'account_id': 422033,
  'active': True,
  'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
            {'title': 'Learn Python Series (#18) - PyMongo Part 1',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
            {'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
                      'analyzing any-to-any multi-currency historical data',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
  'slogan': "Does it matter who's right, or who's left?"},
 {'_id': ObjectId('5ae46693dd58330cd6660571'),
  'account': 'fabiyamada',
  'account_id': 261379,
  'active': True},
 {'_id': ObjectId('5ae4bc73dd58332709589486'),
  'account': 'jedigeiss',
  'active': True,
  'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
             'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
            {'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
                      'Autovoter',
             'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
  'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
            'Banker, Gamer'}]

Nota bene: The query we now ran using $or deliberately is the same, to show you the difference, as the example $and query we began with, because right now in total 3 documents were selected:

  • the account (scipio) has both a "posts" field as well as an "account_id" field, which satisfies the $or clause (at least 1 expression evaluates to True, and it doesn't matter both expressions do);
  • the account jedigeiss has a "posts" field;
  • the account fabiyamada has an "account_id" field.

Using the $nor operator

The $nor operator performs logical NOR operations on a list (array) of 2 or more expressions. As a result, the documents that satisfy NONE of all expressions, ergo when ALL expressions FAIL, will be selected.

result = list(coll.find(
    {"$nor": [
        {"posts": {"$exists": True}},
        {"account_id": {"$exists": True}}
    ]}
))
pprint(result)
[{'_id': ObjectId('5ae46693dd58330cd6660570'),
  'account': 'stoodkev',
  'active': True}]

Using the $not operator

The $not operator performs logical NOT operations on 1 expression and selects the documents that do not match that expression. $not affects other operators.

result = list(coll.find(
    {"account_id":{
        "$not": {"$exists": True}
    }}
))
pprint(result)
[{'_id': ObjectId('5ae46693dd58330cd6660570'),
  'account': 'stoodkev',
  'active': True},
 {'_id': ObjectId('5ae4bc73dd58332709589486'),
  'account': 'jedigeiss',
  'active': True,
  'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
             'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
            {'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
                      'Autovoter',
             'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
  'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
            'Banker, Gamer'}]

Other operators

The $size operator

If you use the $size operator on a list (array) field, it will match documents having that exact number of elements:

result = list(coll.find(
    {"posts": {"$size": 2}}
))
pprint(result)
[{'_id': ObjectId('5ae4bc73dd58332709589486'),
  'account': 'jedigeiss',
  'active': True,
  'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
             'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
            {'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
                      'Autovoter',
             'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
  'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
            'Banker, Gamer'}]

Nota bene: the $size operator expects an integer number as its argument, you cannot use an expression such as:

result = list(coll.find(
    {"posts": {"$size": {"$gte": 2}}}
))

The $where operator

For those of you that have experience in JavaScript programming, or for those aspiring Pythonistas feeling adventurous: special attention for the $where operator, since it allows for passing an argument string containing - you probably guessed it by now - a JavaScript expression (or even a JavaScript function).

However, please note that although the $where operator does provide a lot of flexibility, it is quite slow because it requires the MongoDB database to evaluate the JavaScript expression separately for every document in the selected collection.

A nice example, in which the $where expression shines, is by performing a "greater-than-or-equal-to"-type query for the number of elements in the "posts" list (array) field, which doesn't work with the same flexibility as we've just seen with the $size example:

result = list(coll.find(
    {"posts": {"$exists": True}, "$where":"this.posts.length >= 2"} ))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
  'account': 'scipio',
  'account_id': 422033,
  'active': True,
  'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
            {'title': 'Learn Python Series (#18) - PyMongo Part 1',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
            {'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
                      'analyzing any-to-any multi-currency historical data',
             'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
  'slogan': "Does it matter who's right, or who's left?"},
 {'_id': ObjectId('5ae4bc73dd58332709589486'),
  'account': 'jedigeiss',
  'active': True,
  'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
             'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
            {'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
                      'Autovoter',
             'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
  'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
            'Banker, Gamer'}]

and as you can see, in this case both the accounts of scipio as well as jedigeiss were selected correctly.

Perform a textual search using the $text & $search operators

The $text operator is pretty cool, in that it allows you to search for words contained within the textual content of specific fields, provided those field are indexed with a text index.

To do so, first place an index (using create_index()) where you pass in the field key(s) as a list (array) and TEXT as the index direction. Then combine the $text & $search operators and look for a specific word.

Like so:

# Since the "title" field is nested within the elements contained,
# inside the "posts" field, use the dot-notation `.` to index
# the field "posts.title"
coll.create_index([("posts.title", pymongo.TEXT)])

result = list(coll.find(
    {"$text": {"$search": "Discord"}}
))
pprint(result)
[{'_id': ObjectId('5ae4bc73dd58332709589486'),
  'account': 'jedigeiss',
  'active': True,
  'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
             'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
            {'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
                      'Autovoter',
             'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
  'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
            'Banker, Gamer'}]

What did we learn, hopefully?

In this episode, we again gained more knowledge on how to query MongoDB via the PyMongo package. We discussed the Logical Query Operators $and, $or, $nor and $not, plus we went over the operators $size, $where, $text and $search.

And because I deliberately kept the queries and underlying dataset simple enough to comprehend (hopefully?) yet rich enough to demonstrate the various querying techniques, I hope it became clear that MongoDB / PyMongo provides us with some pretty powerful mechanisms to "precisely pinpoint" the data we're looking for based on a number of criteria!

Thank you for your time!



Posted on Utopian.io - Rewarding Open Source Contributors

Sort:  

Hey @scipio

We're already looking forward to your next contribution!

Decentralised Rewards

Share your expertise and knowledge by rating contributions made by others on Utopian.io to help us reward the best contributions together.

Utopian Witness!

Vote for Utopian Witness! We are made of developers, system administrators, entrepreneurs, artists, content creators, thinkers. We embrace every nationality, mindset and belief.

Want to chat? Join us on Discord https://discord.me/utopian-io

Thank you for another well presented tutorial !


Need help? Write a ticket on https://support.utopian.io.
Chat with us on Discord.

[utopian-moderator]

Thank you too! :-)

python..the name of starange..feel like awesome

its really great information about Python. i think every python learner student is very benefit from this article.

Thank you from python student side.

dear @scipio your daily python learning series is really helpful for all.

Oh man I have waited for this tutorial my total life 😍 Thank you @scipio