Search the Schlumberger Oilfield Glossary programmatically using Selenium.
Schlumberger Petroleum Glossary
Browse the Schlumberger Petroleum Glossary using Python (in English and Spanish).
For best performance, use the Chrome browser and a fast, stable internet connection.
This package is intended for research or instructional use only.
Installation
- Install using pip:
pip install slb-glossary
Dependencies
Quick Start
import slb_glossary as slb

# Create a glossary object
with slb.Glossary(slb.Browser.CHROME, open_browser=True) as glossary:
    # Search for a term
    results = glossary.search("porosity")
    # Print the results
    for result in results:
        print(result.asdict())
Usage
Please note that this is just a brief overview of the module. The module is documented, and you are encouraged to read the docstrings for more information on the various methods and classes.
In this documentation, "topics" refers to the subject areas under which the glossary groups its terms.
Instantiate a glossary object
Import the module:
import slb_glossary as slb
To use the glossary, you need to create a Glossary object. The Glossary class takes a few arguments:
- browser: The browser to use. It can be any of the values in the Browser enum. Ensure that the selected browser is installed on your machine.
- open_browser: A boolean indicating whether to open a browser window when searching the glossary. If this is True, a browser window opens when you search for a term. This can be useful for monitoring and debugging the search process. If you don't need to see the browser window, set this to False. This is analogous to running the browser in headless mode. The default value is False.
- page_load_timeout: The maximum time to wait for a page to load before raising an exception.
- implicit_wait_time: The maximum time to wait for an element to be found before raising an exception.
- language: The language to use when searching the glossary. This can be any of the values in the Language enum. Presently, only English and Spanish are supported. The default value is Language.ENGLISH.
glossary = slb.Glossary(slb.Browser.CHROME, open_browser=True)
Get all topics/subjects available in the glossary
When you initialize a glossary, the available topics are automatically fetched and stored in the topics attribute.
topics = glossary.topics
print(topics)
This returns a mapping of each topic to the number of terms under that topic in the glossary:
{
"Drilling": 452,
"Geology": 518,
...
}
Use glossary.topics_list if you only need a list of the topics in the glossary. glossary.size returns the total number of terms in the glossary.
If you need to refetch all topics, call glossary.get_topics(). Read the method's docstring for more information on its use.
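Since the topics attribute is described as a plain topic-to-count mapping, ordinary dictionary operations should apply to it. The following is a minimal sketch that uses a hand-written stand-in dictionary (only the two counts shown above) rather than a live glossary:

```python
# Stand-in for glossary.topics; only the two entries shown above are used.
topics = {"Drilling": 452, "Geology": 518}

# Rank topics from most to fewest terms.
ranked = sorted(topics.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked:
    print(f"{name}: {count} terms")

# Topic names only, similar in spirit to glossary.topics_list.
topic_names = list(topics)
```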
Get a topic match
Do you have a topic in mind but are not sure if it is in the glossary? Use the get_topic_match method to check. It returns the single topic that best matches the input topic.
topic = glossary.get_topic_match("drill")
print(topic)
# Output: Drilling
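How get_topic_match ranks candidates is not described here. As an illustration of the general idea only, closest-match lookup can be sketched with the standard library's difflib; the helper name closest_topic and the topic list are hypothetical and this is not the library's implementation:

```python
import difflib
from typing import Iterable, Optional


def closest_topic(query: str, topics: Iterable[str]) -> Optional[str]:
    """Return the topic most similar to `query`, or None if there are no topics."""
    # Compare in lower case so that "drill" can match "Drilling".
    by_lower = {topic.lower(): topic for topic in topics}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=1, cutoff=0.0
    )
    return by_lower[matches[0]] if matches else None


print(closest_topic("drill", ["Drilling", "Geology", "Well workover"]))
```

A cutoff of 0.0 means some topic is always returned when the list is non-empty, which mirrors the "best match" behaviour described above; a higher cutoff would allow "no match" as an answer.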
Search for a term
Use the search method to search for a term in the glossary:
results = glossary.search("porosity")
This returns a list of SearchResult objects for "porosity". You can also pass some optional arguments to the search method:
- under_topic: Restrict the search to a specific topic.
- start_letter: Limit the search to terms starting with the given letter(s).
- max_results: Limit the number of results returned.
Search for terms under a specific topic/subject
results = glossary.get_terms_on(topic="Well workover")
The get_terms_on method returns a list of SearchResult objects for all terms under the specified topic.
The difference between search and get_terms_on is that search searches the entire glossary, while get_terms_on searches only under the specified topic. Hence, the results of search can contain terms from different topics.
The topic passed need not be an exact match to what is in the glossary. The glossary will choose the closest match to the provided topic that is available in the glossary.
Tip: If you want to base your search on multiple topics, pass a single string with the topics separated by commas, for example "Drilling, Well workover, Shale gas".
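If your topics start out as a Python list, the comma-separated string can be built with str.join. The sketch below also shows the reverse direction with a hypothetical helper, split_topics; how the library itself parses the string is not documented here, so treat this only as a way to build and inspect such strings:

```python
from typing import List


def split_topics(topics: str) -> List[str]:
    """Split a comma-separated topic string into clean topic names."""
    return [part.strip() for part in topics.split(",") if part.strip()]


wanted = ["Drilling", "Well workover", "Shale gas"]

# Build the single string described above from a list of topics.
topic_string = ", ".join(wanted)
print(topic_string)

# Splitting it again recovers the individual topics.
print(split_topics(topic_string))
```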
Search results
Search results are returned as SearchResult objects. Each SearchResult object has the following attributes:
- term: The term being searched for
- definition: The definition of the term
- grammatical_label: The grammatical label of the term, that is, its part of speech
- topic: The topic under which the term is found
- url: The URL to the term in the glossary
To get the search results as a dictionary, use the asdict method.
results = glossary.search("oblique fault")
for result in results:
print(result.asdict())
You could also convert search results to tuples using the astuple method.
results = glossary.search("oblique fault")
for result in results:
print(result.astuple())
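To show what working with these dictionaries and tuples can look like without opening a browser, here is a sketch built around a hypothetical stand-in class, StandInResult. It only mirrors the attribute names listed above; the field order, the values (all placeholders) and the implementation are assumptions, not the real SearchResult:

```python
import dataclasses
import json


@dataclasses.dataclass
class StandInResult:
    """Hypothetical stand-in with the attributes listed for SearchResult."""

    term: str
    definition: str
    grammatical_label: str
    topic: str
    url: str

    def asdict(self) -> dict:
        return dataclasses.asdict(self)

    def astuple(self) -> tuple:
        return dataclasses.astuple(self)


# Placeholder values, not real glossary content.
result = StandInResult(
    term="oblique fault",
    definition="(placeholder definition)",
    grammatical_label="n.",
    topic="Geology",
    url="https://example.com/oblique-fault",
)

row = result.asdict()
print(json.dumps(row, indent=2))  # dictionaries serialize directly to JSON
print(result.astuple())           # tuples suit CSV-style rows
```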
Other methods
Some other methods available in the Glossary class are:
- get_search_url: Returns the correct glossary URL for the given parameters.
- get_terms_urls: Returns the URLs of all terms found using the given parameters.
- get_results_from_url: Extracts search results from a given URL. Returns a list of SearchResult objects.
Closing the glossary
When you are done using the glossary, it is important that you close it to free up resources. This is done by calling the close method.
glossary.close()
If you used the Glossary object as a context manager, you don't need to call the close method. The Glossary object will automatically close itself when the context manager exits. Also, on normal termination of the program, the Glossary object will close itself (if it is not already closed).
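The closing behaviour described above (explicit close, context manager, and a fallback at program exit) is a common pattern for objects that hold a browser session. A minimal, library-free sketch of that pattern follows; StandInResource is a hypothetical class, and whether Glossary is implemented this way is an assumption:

```python
import atexit


class StandInResource:
    """Hypothetical stand-in for an object that holds a browser session."""

    def __init__(self) -> None:
        self.closed = False
        # Safety net: close at normal interpreter exit if still open.
        atexit.register(self.close)

    def close(self) -> None:
        # Idempotent, so calling it more than once is harmless.
        if not self.closed:
            self.closed = True

    def __enter__(self) -> "StandInResource":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


with StandInResource() as resource:
    print(resource.closed)  # still open inside the block

print(resource.closed)  # closed once the block exits
```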
Save/export search results to a file
A convenient way to save search results to a file is to use the saver attribute of the glossary object.
results = glossary.search("gas lift")
glossary.saver.save(results, "./gas_lift.txt")
The save method takes a list of SearchResult objects and the filename or file path to save the results to. The file format is determined by the file extension. By default, the supported file formats are 'xlsx', 'txt', 'csv' and 'json'. You can also check glossary.saver.supported_file_types.
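Because the format is chosen from the file extension, it can be handy to check a path before running a long search. The helper below, file_type, is hypothetical, and the tuple hard-codes the four formats named above; with a live glossary, glossary.saver.supported_file_types is the authoritative list:

```python
from pathlib import Path

# File types named in the text above (an assumption for this sketch).
SUPPORTED = ("xlsx", "txt", "csv", "json")


def file_type(path: str) -> str:
    """Return the lower-cased extension of `path`, or raise ValueError."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {ext!r}")
    return ext


print(file_type("./gas_lift.txt"))
```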
Customizing how results are saved
By default, the Glossary class uses a Saver class to save search results. This base Saver class only supports a few file formats, which should be sufficient. However, if you need to save in an unsupported format, you can subclass the Saver class like this:
from typing import List
import slb_glossary as slb

class FooSaver(slb.Saver):
    @staticmethod
    def save_as_xyz(results: List[slb.SearchResult], filename: str):
        # Assumes SearchResult is exposed as slb.SearchResult
        # Validate filename or path
        # Your implementation goes here
        ...
Read the docstrings of the Saver class to get a good grasp of how to do this. Also, you may read the slb_glossary.saver module to get an idea of how you would implement your custom save method.
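The save_as_xyz name suggests that the base class looks up a method from the file extension, though nothing here confirms that. A minimal, library-free sketch of such a dispatch pattern follows; BaseSaver and MarkdownSaver are hypothetical classes and the row is placeholder data, so check the real Saver docstrings before relying on this:

```python
import os
import tempfile
from typing import Dict, List


class BaseSaver:
    """Hypothetical stand-in; not the real slb.Saver."""

    def save(self, rows: List[Dict[str, str]], filename: str) -> None:
        ext = os.path.splitext(filename)[1].lstrip(".").lower()
        # Look up a method named after the extension, e.g. save_as_md.
        method = getattr(self, f"save_as_{ext}", None)
        if method is None:
            raise ValueError(f"Unsupported file type: {ext!r}")
        method(rows, filename)


class MarkdownSaver(BaseSaver):
    @staticmethod
    def save_as_md(rows: List[Dict[str, str]], filename: str) -> None:
        with open(filename, "w", encoding="utf-8") as file:
            for row in rows:
                file.write(f"- **{row['term']}**: {row['definition']}\n")


# Placeholder row, not real glossary content.
rows = [{"term": "gas lift", "definition": "(placeholder definition)"}]
path = os.path.join(tempfile.mkdtemp(), "gas_lift.md")
MarkdownSaver().save(rows, path)

with open(path, encoding="utf-8") as file:
    content = file.read()
print(content)
```

With this design, supporting a new format only requires adding one method to a subclass; the save method itself stays unchanged.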
There are two ways you can use your custom saver class.
- Create a Glossary subclass:
import slb_glossary as slb

class FooGlossary(slb.Glossary):
    saver_class = FooSaver
    ...

glossary = FooGlossary(...)
glossary.saver.save(...)
- Instantiate a saver directly:
saver = FooSaver()
saver.save(...)
Contributing
Contributions are welcome. Please fork the repository and submit a pull request.
Credits
This project was inspired by the 2023/24/25 PetroBowl Team of the Federal University of Petroleum Resources, Effurun, Delta State, Nigeria. It aided the team's preparation for the PetroQuiz and PetroBowl competitions organized by the Society of Petroleum Engineers (SPE).