Learn Python Series (#20) - PyMongo Part 3
Learn Python Series (#20) - PyMongo Part 3
What Will I Learn?
- You will learn about how to use the Logical Operators
$and
,$or
,$nor
and$not
; - about some other operators including
$text
and$search
; - how to address nested fields;
- how to create and use an index for textual search;
- how to use the
$size
operator; - and how to use the flexible
$where
operator, which will be of special interest to JavaScript-loving aspiring Pythonistas!
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution;
- The ambition to learn Python programming;
- A running MongoDB installation on your computer system, as explained in the previous two episodes .
Difficulty
Intermediate
Curriculum (of the Learn Python Series
):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
- Learn Python Series (#4) - Round-Up #1
- Learn Python Series (#5) - Handling Lists Part 1
- Learn Python Series (#6) - Handling Lists Part 2
- Learn Python Series (#7) - Handling Dictionaries
- Learn Python Series (#8) - Handling Tuples
- Learn Python Series (#9) - Using Import
- Learn Python Series (#10) - Matplotlib Part 1
- Learn Python Series (#11) - NumPy Part 1
- Learn Python Series (#12) - Handling Files
- Learn Python Series (#13) - Mini Project - Developing a Web Crawler Part 1
- Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2
- Learn Python Series (#15) - Handling JSON
- Learn Python Series (#16) - Mini Project - Developing a Web Crawler Part 3
- Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data
- Learn Python Series (#18) - PyMongo Part 1
- Learn Python Series (#19) - PyMongo Part 2
Learn Python Series (#20) - PyMongo Part 3
In the previous Learn Python Series
episodes about PyMongo Part 1
and PyMongo Part 2
, we learned what MongoDB & PyMongo are about, how to do "CRUD" (Create, Read, Update, Delete) operations - and how to use Query Operations for more advanced querying.
In this episode, we'll expand our knowledge regarding PyMongo with a number of techniques: we are going to discuss so-called Logical Operators. Let's get started!
Adding some (dummy) data to the test_mongo
dataset
Up until now, we've kept the dataset of the test_mongo
test database pretty simple. We didn't need more or more complex data to explain what was discussed in the PyMongo Parts 1 & 2 tutorial episodes anyway. But since we're going to discuss creating more complex data queries via using so-called Logical Operators regarding PyMongo, let's first add some more data to the test_mongo
dataset.
import pymongo
from bson.objectid import ObjectId
from pprint import pprint
client = pymongo.MongoClient('mongodb://localhost:27017')
db = client.test_mongo
coll = db.accounts
scipio_posts = [
{
"title": "Learn Python Series (#19) - PyMongo Part 2",
"url": "https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2"
},
{
"title": "Learn Python Series (#18) - PyMongo Part 1",
"url": "https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1"
},
{
"title": "Learn Python Series (#17) - Roundup #2 - Combining and analyzing any-to-any multi-currency historical data",
"url": "https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data"
}
]
result = coll.update_one({"account": "scipio"}, {"$set": {"posts": scipio_posts}})
jedigeiss_posts = [
{
"title": "Das Tagebuch der (Steem)Leiden",
"url": "https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden"
},
{
"title": "D-A-CH Support -- Discord / Steem Bridge -- added Autovoter",
"url": "https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter"
}
]
result = coll.update_one({"account": "jedigeiss"}, {"$set": {"posts": jedigeiss_posts}})
Okay, now as a result, 2 of the 4 documents in the accounts
collection have a "posts"
field, of which one holds 3 and the other 2 post items, each containing 2 fields ("title"
and "url"
); that will do for now.
Using Logical Operators for more complex data queries
PyMongo gives us support for logical query operators $and
, $or
, $nor
and $not
, with which we're able to construct more complex data queries. Let's go over them one by one to see how they work and what kind of queries we can come up with.
Using the $and
operator
The $and
operator can be used to perform logical AND
operations (where all expressions involved need to be True
) on a list (array) of 2 or more expressions. As a result, only the documents that satisfy all expressions will be selected.
result = list(coll.find(
{"$and": [
{"posts": {"$exists": True}},
{"account_id": {"$exists": True}}
]}
))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
'account': 'scipio',
'account_id': 422033,
'active': True,
'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
{'title': 'Learn Python Series (#18) - PyMongo Part 1',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
{'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
'analyzing any-to-any multi-currency historical data',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
'slogan': "Does it matter who's right, or who's left?"}]
Nota bene: As we can see, only 1 account (scipio
) is returned, being the only account that has both a "posts"
field as well as an "account_id"
. The account jedigeiss
has a "posts"
field as well but not an "account_id"
, where the account fabiyamada
has an "account_id"
field but not a "posts"
field.
Using the $or
operator
The $or
operator can be used to perform logical OR
operations (where only one of all expressions involved needsto be True
) on a list (array) of 2 or more expressions. As a result, the documents that satisfy one or more of all expressions involved will be selected.
result = list(coll.find(
{"$or": [
{"posts": {"$exists": True}},
{"account_id": {"$exists": True}}
]}
))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
'account': 'scipio',
'account_id': 422033,
'active': True,
'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
{'title': 'Learn Python Series (#18) - PyMongo Part 1',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
{'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
'analyzing any-to-any multi-currency historical data',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
'slogan': "Does it matter who's right, or who's left?"},
{'_id': ObjectId('5ae46693dd58330cd6660571'),
'account': 'fabiyamada',
'account_id': 261379,
'active': True},
{'_id': ObjectId('5ae4bc73dd58332709589486'),
'account': 'jedigeiss',
'active': True,
'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
{'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
'Autovoter',
'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
'Banker, Gamer'}]
Nota bene: The query we now ran using $or
deliberately is the same, to show you the difference, as the example $and
query we began with, because right now in total 3 documents were selected:
- the account (
scipio
) has both a"posts"
field as well as an"account_id"
field, which satisfies the$or
clause (at least 1 expression evaluates to True, and it doesn't matter both expressions do); - the account
jedigeiss
has a"posts"
field; - the account
fabiyamada
has an"account_id"
field.
Using the $nor
operator
The $nor
operator performs logical NOR
operations on a list (array) of 2 or more expressions. As a result, the documents that satisfy NONE of all expressions, ergo when ALL expressions FAIL, will be selected.
result = list(coll.find(
{"$nor": [
{"posts": {"$exists": True}},
{"account_id": {"$exists": True}}
]}
))
pprint(result)
[{'_id': ObjectId('5ae46693dd58330cd6660570'),
'account': 'stoodkev',
'active': True}]
Using the $not
operator
The $not
operator performs logical NOT
operations on 1 expression and selects the documents that do not match that expression. $not
affects other operators.
result = list(coll.find(
{"account_id":{
"$not": {"$exists": True}
}}
))
pprint(result)
[{'_id': ObjectId('5ae46693dd58330cd6660570'),
'account': 'stoodkev',
'active': True},
{'_id': ObjectId('5ae4bc73dd58332709589486'),
'account': 'jedigeiss',
'active': True,
'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
{'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
'Autovoter',
'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
'Banker, Gamer'}]
Other operators
The $size
operator
If you use the $size
operator on a list (array) field, it will match documents having that exact number of elements:
result = list(coll.find(
{"posts": {"$size": 2}}
))
pprint(result)
[{'_id': ObjectId('5ae4bc73dd58332709589486'),
'account': 'jedigeiss',
'active': True,
'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
{'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
'Autovoter',
'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
'Banker, Gamer'}]
Nota bene: the $size
operator expects an integer number as its argument, you cannot use an expression such as:
result = list(coll.find(
{"posts": {"$size": {"$gte": 2}}}
))
The $where
operator
For those of you that have experience in JavaScript programming, or for those aspiring Pythonistas feeling adventurous: special attention for the $where
operator, since it allows for passing an argument string containing - you probably guessed it by now - a JavaScript expression (or even a JavaScript function).
However, please note that although the $where
operator does provide a lot of flexibility, it is quite slow because it requires the MongoDB database to evaluate the JavaScript expression separately for every document in the selected collection.
A nice example, in which the $where
expression shines, is by performing a "greater-than-or-equal-to"-type query for the number of elements in the "posts"
list (array) field, which doesn't work with the same flexibility as we've just seen with the $size
example:
result = list(coll.find(
{"posts": {"$exists": True}, "$where":"this.posts.length >= 2"} ))
pprint(result)
[{'_id': ObjectId('5ae46285dd58330cd666056f'),
'account': 'scipio',
'account_id': 422033,
'active': True,
'posts': [{'title': 'Learn Python Series (#19) - PyMongo Part 2',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-19-pymongo-part-2'},
{'title': 'Learn Python Series (#18) - PyMongo Part 1',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-18-pymongo-part-1'},
{'title': 'Learn Python Series (#17) - Roundup #2 - Combining and '
'analyzing any-to-any multi-currency historical data',
'url': 'https://steemit.com/utopian-io/@scipio/learn-python-series-17-roundup-2-combining-and-analyzing-any-to-any-multi-currency-historical-data'}],
'slogan': "Does it matter who's right, or who's left?"},
{'_id': ObjectId('5ae4bc73dd58332709589486'),
'account': 'jedigeiss',
'active': True,
'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
{'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
'Autovoter',
'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
'Banker, Gamer'}]
and as you can see, in this case both the accounts of scipio
as well as jedigeiss
were selected correctly.
Perform a textual search using the $text
& $search
operators
The $text
operator is pretty cool, in that it allows you to search for words contained within the textual content of specific fields, provided those field are indexed with a text index.
To do so, first place an index (using create_index()
) where you pass in the field key(s) as a list (array) and TEXT as the index direction. Then combine the $text
& $search
operators and look for a specific word.
Like so:
# Since the "title" field is nested within the elements contained,
# inside the "posts" field, use the dot-notation `.` to index
# the field "posts.title"
coll.create_index([("posts.title", pymongo.TEXT)])
result = list(coll.find(
{"$text": {"$search": "Discord"}}
))
pprint(result)
[{'_id': ObjectId('5ae4bc73dd58332709589486'),
'account': 'jedigeiss',
'active': True,
'posts': [{'title': 'Das Tagebuch der (Steem)Leiden',
'url': 'https://steemit.com/deutsch/@jedigeiss/das-tagebuch-der-steem-leiden'},
{'title': 'D-A-CH Support -- Discord / Steem Bridge -- added '
'Autovoter',
'url': 'https://steemit.com/utopian-io/@jedigeiss/d-a-ch-support-discord-steem-bridge-added-autovoter'}],
'slogan': 'IT Nerd, Risk Specialist, Musician, Cryptocoin Enthusiast, '
'Banker, Gamer'}]
What did we learn, hopefully?
In this episode, we again gained more knowledge on how to query MongoDB via the PyMongo package. We discussed the Logical Query Operators $and
, $or
, $nor
and $not
, plus we went over the operators $size
, $where
, $text
and $search
.
And because I deliberately kept the queries and underlying dataset simple enough to comprehend (hopefully?) yet rich enough to demonstrate the various querying techniques, I hope it became clear that MongoDB / PyMongo provides us with some pretty powerful mechanisms to "precisely pinpoint" the data we're looking for based on a number of criteria!
Thank you for your time!
Posted on Utopian.io - Rewarding Open Source Contributors
Hey @scipio
We're already looking forward to your next contribution!
Decentralised Rewards
Share your expertise and knowledge by rating contributions made by others on Utopian.io to help us reward the best contributions together.
Utopian Witness!
Vote for Utopian Witness! We are made of developers, system administrators, entrepreneurs, artists, content creators, thinkers. We embrace every nationality, mindset and belief.
Want to chat? Join us on Discord https://discord.me/utopian-io
Thank you for another well presented tutorial !
Need help? Write a ticket on https://support.utopian.io.
Chat with us on Discord.
[utopian-moderator]
Thank you too! :-)
python..the name of starange..feel like awesome
Good Post. 👌
its really great information about Python. i think every python learner student is very benefit from this article.
Thank you from python student side.
dear @scipio your daily python learning series is really helpful for all.
Oh man I have waited for this tutorial my total life 😍 Thank you @scipio