
Mycroft's multilingual text parsing and formatting library


Lingua Franca


Lingua Franca (noun)

a framework that is adopted as the common language between speakers with different native tongues

Formatting

Convert data into spoken equivalents

Pronounce numbers

spoken versions of numbers

from lingua_franca.format import nice_number, pronounce_number

assert nice_number(25/6) == "4 and a sixth"
assert nice_number(201) == "201"
assert nice_number(3.14159269) == "3 and a seventh"

assert pronounce_number(3.14159269) == "three point one four"
assert pronounce_number(0) == "zero"
assert pronounce_number(10) == "ten"
assert pronounce_number(201) == "two hundred and one"
assert pronounce_number(102.3) == "one hundred and two point three"
assert pronounce_number(
    4092949192) == "four billion, ninety two million, nine hundred and forty nine thousand, one hundred and ninety two"

assert pronounce_number(100034000000299792458, short_scale=True) == \
       "one hundred quintillion, thirty four quadrillion, " \
       "two hundred and ninety nine million, seven hundred and ninety " \
       "two thousand, four hundred and fifty eight"

assert pronounce_number(100034000000299792458, short_scale=False) == \
       "one hundred trillion, thirty four thousand billion, " \
       "two hundred and ninety nine million, seven hundred and ninety " \
       "two thousand, four hundred and fifty eight"
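The two expected strings above differ only in the short_scale flag. In the short scale each new name is 1,000 times the previous one (a billion is 10^9, a trillion 10^12); in the long scale a new name appears every factor of 1,000,000 (a billion is 10^12, a trillion 10^18), which is also why extract_number("six trillion", short_scale=False) gives 6e+18. The sketch below groups an integer under each convention. The scale tables and the split_scales helper are illustrative assumptions, not Lingua Franca's internal data or API.

```python
from typing import List, Tuple

# Illustrative scale tables (assumption: not Lingua Franca's internal data).
# Short scale: a new name every factor of 1,000.
SHORT_SCALE = [
    (10**18, "quintillion"),
    (10**15, "quadrillion"),
    (10**12, "trillion"),
    (10**9, "billion"),
    (10**6, "million"),
    (10**3, "thousand"),
]
# Long scale: a new name every factor of 1,000,000. A count of 1,000 or more
# (e.g. 34000 billion) is spoken with its own "thousand": "thirty four thousand billion".
LONG_SCALE = [
    (10**18, "trillion"),
    (10**12, "billion"),
    (10**6, "million"),
    (10**3, "thousand"),
]


def split_scales(n: int, short_scale: bool = True) -> List[Tuple[int, str]]:
    """Split a non-negative integer into (count, scale name) groups, largest first."""
    table = SHORT_SCALE if short_scale else LONG_SCALE
    groups = []
    for value, name in table:
        count, n = divmod(n, value)
        if count:  # skip empty groups such as "zero billion"
            groups.append((count, name))
    if n:
        groups.append((n, ""))  # the remainder below one thousand
    return groups


print(split_scales(100034000000299792458, short_scale=True))
# → [(100, 'quintillion'), (34, 'quadrillion'), (299, 'million'), (792, 'thousand'), (458, '')]
print(split_scales(100034000000299792458, short_scale=False))
# → [(100, 'trillion'), (34000, 'billion'), (299, 'million'), (792, 'thousand'), (458, '')]
```

The groups line up with the two spoken forms above: "one hundred quintillion, thirty four quadrillion, ..." versus "one hundred trillion, thirty four thousand billion, ...".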

Pronounce datetime objects

spoken date for datetime.datetime objects

from lingua_franca.format import nice_date, nice_date_time, nice_time
import datetime

dt = datetime.datetime(2017, 1, 31, 13, 22, 3)

assert nice_date(dt) == "tuesday, january thirty-first, twenty seventeen"

assert nice_time(dt) == "one twenty two"
assert nice_time(dt, use_ampm=True) == "one twenty two p.m."
assert nice_time(dt, speech=False) == "1:22"
assert nice_time(dt, speech=False, use_ampm=True) == "1:22 PM"
assert nice_time(dt, speech=False, use_24hour=True) == "13:22"
assert nice_time(dt, speech=False, use_24hour=True, use_ampm=True) == "13:22"
assert nice_time(dt, use_24hour=True, use_ampm=True) == "thirteen twenty two"
assert nice_time(dt, use_24hour=True, use_ampm=False) == "thirteen twenty two"

assert nice_date_time(dt) == "tuesday, january thirty-first, twenty seventeen at one twenty two"
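With speech=False, the written forms above follow a few rules: a 12-hour clock by default, an "AM"/"PM" suffix only when use_ampm=True, and use_24hour=True taking precedence over use_ampm. The standard-library sketch below reproduces those examples. format_clock is a hypothetical helper, not nice_time itself, and its handling of midnight and noon (hour 0 or 12 shown as 12) and its zero-padded 24-hour output are assumptions.

```python
import datetime


def format_clock(dt: datetime.datetime, use_24hour: bool = False, use_ampm: bool = False) -> str:
    """Hypothetical written-time formatter modelled on nice_time(..., speech=False)."""
    if use_24hour:
        # 24-hour output ignores use_ampm, as in the "13:22" examples above.
        # Assumption: the hour is zero-padded (e.g. "09:05").
        return dt.strftime("%H:%M")
    hour = dt.hour % 12 or 12  # assumption: midnight and noon are shown as 12
    text = f"{hour}:{dt.minute:02d}"
    if use_ampm:
        text += " PM" if dt.hour >= 12 else " AM"
    return text


dt = datetime.datetime(2017, 1, 31, 13, 22, 3)
print(format_clock(dt))                                  # 1:22
print(format_clock(dt, use_ampm=True))                   # 1:22 PM
print(format_clock(dt, use_24hour=True, use_ampm=True))  # 13:22
```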

Pronounce durations

spoken number of seconds or datetime.timedelta objects

from lingua_franca.format import nice_duration


assert nice_duration(1) == "one second"
assert nice_duration(3) == "three seconds"
assert nice_duration(1, speech=False) == "0:01"
assert nice_duration(61) == "one minute one second"
assert nice_duration(61, speech=False) == "1:01"
assert nice_duration(5000) == "one hour twenty three minutes twenty seconds"
assert nice_duration(5000, speech=False) == "1:23:20"
assert nice_duration(50000) == "thirteen hours fifty three minutes twenty seconds"
assert nice_duration(50000, speech=False) == "13:53:20"
assert nice_duration(500000) == "five days  eighteen hours fifty three minutes twenty seconds"
assert nice_duration(500000, speech=False) == "5d 18:53:20"

from datetime import timedelta

assert nice_duration(timedelta(seconds=500000), speech=False) == "5d 18:53:20"
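The speech=False duration strings above can be reproduced by repeatedly splitting the total number of seconds with divmod: days, then hours, then minutes. The sketch below is a hypothetical helper written against those examples only; it is not assumed to match nice_duration for cases not shown here (for example a duration of exactly one day), and fractional seconds are simply truncated.

```python
import datetime
from typing import Union


def format_duration_digits(duration: Union[int, float, datetime.timedelta]) -> str:
    """Hypothetical helper modelled on nice_duration(..., speech=False)."""
    if isinstance(duration, datetime.timedelta):
        duration = duration.total_seconds()
    seconds = int(duration)  # assumption: fractional seconds are dropped
    days, seconds = divmod(seconds, 86400)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    if days or hours:
        text = f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        text = f"{minutes}:{seconds:02d}"
    if days:
        text = f"{days}d {text}"
    return text


print(format_duration_digits(5000))                                 # 1:23:20
print(format_duration_digits(datetime.timedelta(seconds=500000)))  # 5d 18:53:20
```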

Parsing

Extract data from natural language text

Extract numbers

from lingua_franca.parse import extract_number, extract_numbers

# extract a number
assert extract_number("nothing") is False
assert extract_number("two million five hundred thousand tons of spinning "
                      "metal") == 2500000
assert extract_number("six trillion") == 6000000000000.0
assert extract_number("six trillion", short_scale=False) == 6e+18

assert extract_number("1 and 3/4 cups") == 1.75
assert extract_number("1 cup and a half") == 1.5

## extracts all numbers
assert extract_numbers("nothing") == []
assert extract_numbers("this is a one twenty one  test") == [1.0, 21.0]
assert extract_numbers("1 dog, seven pigs, macdonald had a farm, "
                       "3 times 5 macarena") == [1, 7, 3, 5]
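Two details of extract_number are easy to miss: it returns False (not None or 0) when the text holds no number, and it reads a mixed number such as "1 and 3/4" as an exact fraction. The toy parser below illustrates both behaviours for digit-only input using Python's fractions module. parse_mixed_number and its regular expressions are assumptions for illustration; unlike extract_number, it understands no number words.

```python
import re
from fractions import Fraction
from typing import Union

# Toy patterns (assumption): digits only, no number words such as "seven".
_MIXED = re.compile(r"(\d+)\s+and\s+(\d+)/(\d+)")
_FRACTION = re.compile(r"(\d+)/(\d+)")
_INTEGER = re.compile(r"\d+")


def parse_mixed_number(text: str) -> Union[float, bool]:
    """Return a digit-based number from text, preferring mixed numbers, or False."""
    match = _MIXED.search(text)
    if match:
        whole, num, den = (int(g) for g in match.groups())
        return float(whole + Fraction(num, den))  # exact until the final conversion
    match = _FRACTION.search(text)
    if match:
        return float(Fraction(int(match.group(1)), int(match.group(2))))
    match = _INTEGER.search(text)
    if match:
        return float(match.group())
    return False  # mirrors extract_number("nothing") is False


print(parse_mixed_number("1 and 3/4 cups"))  # 1.75
```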

Extract durations

extract datetime.timedelta objects

## extract durations
from lingua_franca.parse import extract_duration
from datetime import timedelta

assert extract_duration("nothing") == (None, 'nothing')

assert extract_duration("Nineteen minutes past the hour") == (
    timedelta(minutes=19),
    "past the hour")
assert extract_duration("wake me up in three weeks, four hundred ninety seven"
                        " days, and three hundred 91.6 seconds") == (
           timedelta(weeks=3, days=497, seconds=391.6),
           "wake me up in , , and")
assert extract_duration(
    "The movie is one hour, fifty seven and a half minutes long") == (
           timedelta(hours=1, minutes=57.5),
           "the movie is ,  long")
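extract_duration returns a pair: a timedelta and the input text with the duration phrases removed, or (None, text) when nothing is found. The toy version below shows that contract for digits followed by a unit only (no number words, no "and a half"). toy_extract_duration, its unit table and its regular expression are assumptions for illustration; unlike the examples above, it also collapses the leftover whitespace.

```python
import re
from datetime import timedelta
from typing import Optional, Tuple

# Assumed unit vocabulary, mapped to timedelta keyword arguments.
_UNITS = {
    "second": "seconds", "seconds": "seconds",
    "minute": "minutes", "minutes": "minutes",
    "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
    "week": "weeks", "weeks": "weeks",
}
# Longest unit names first so "seconds" is preferred over "second".
_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s+(" + "|".join(sorted(_UNITS, key=len, reverse=True)) + r")\b"
)


def toy_extract_duration(text: str) -> Tuple[Optional[timedelta], str]:
    """Return (timedelta, leftover text), or (None, text) if no duration is found."""
    text = text.lower()
    matches = _PATTERN.findall(text)
    if not matches:
        return None, text
    total = timedelta()
    for amount, unit in matches:
        total += timedelta(**{_UNITS[unit]: float(amount)})
    leftover = " ".join(_PATTERN.sub("", text).split())
    return total, leftover


print(toy_extract_duration("Remind me in 2 hours and 30 minutes"))
```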

Extract dates

extract datetime.datetime objects

## extract date times
from datetime import datetime
from lingua_franca.parse import extract_datetime, normalize

def extractWithFormat(text):
    date = datetime(2017, 6, 27, 13, 4)  # Tue June 27, 2017 @ 1:04pm
    [extractedDate, leftover] = extract_datetime(text, date)
    extractedDate = extractedDate.strftime("%Y-%m-%d %H:%M:%S")
    return [extractedDate, leftover]


def testExtract(text, expected_date, expected_leftover):
    res = extractWithFormat(normalize(text))
    assert res[0] == expected_date
    assert res[1] == expected_leftover


testExtract("now is the time",
            "2017-06-27 13:04:00", "is time")
testExtract("in a couple minutes",
            "2017-06-27 13:06:00", "")
testExtract("What is the day after tomorrow's weather?",
            "2017-06-29 00:00:00", "what is weather")
testExtract("Remind me at 10:45 pm",
            "2017-06-27 22:45:00", "remind me")
testExtract("what is the weather on friday morning",
            "2017-06-30 08:00:00", "what is weather")
testExtract("what is tomorrow's weather",
            "2017-06-28 00:00:00", "what is weather")
testExtract("remind me to call mom next tuesday",
            "2017-07-04 00:00:00", "remind me to call mom")
testExtract("remind me to call mom in 3 weeks",
            "2017-07-18 00:00:00", "remind me to call mom")
testExtract("set an alarm for tonight 9:30",
            "2017-06-27 21:30:00", "set alarm")
testExtract("on the evening of june 5th 2017 remind me to call my mother",
            "2017-06-05 19:00:00", "remind me to call my mother")
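Every expected value above is relative to the fixed anchor in extractWithFormat, Tuesday 27 June 2017 at 13:04. The toy resolver below handles a few of those phrases with plain datetime arithmetic, which shows, for example, why "next tuesday" lands on 4 July rather than on the anchor day itself. resolve_relative and its small phrase list are hypothetical; extract_datetime understands far more.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

_WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_relative(phrase: str, anchor: datetime) -> Optional[datetime]:
    """Resolve a few relative date phrases against anchor; None if unsupported."""
    phrase = phrase.lower().strip()
    midnight = anchor.replace(hour=0, minute=0, second=0, microsecond=0)
    if phrase == "tomorrow":
        return midnight + timedelta(days=1)
    if phrase == "day after tomorrow":
        return midnight + timedelta(days=2)
    match = re.fullmatch(r"in (\d+) (day|week)s?", phrase)
    if match:
        unit = "days" if match.group(2) == "day" else "weeks"
        return midnight + timedelta(**{unit: int(match.group(1))})
    match = re.fullmatch(r"next (\w+)", phrase)
    if match and match.group(1) in _WEEKDAYS:
        target = _WEEKDAYS.index(match.group(1))
        # Assumption: "next <weekday>" never means the anchor day itself.
        days_ahead = (target - anchor.weekday()) % 7 or 7
        return midnight + timedelta(days=days_ahead)
    return None


anchor = datetime(2017, 6, 27, 13, 4)  # Tue June 27, 2017 @ 1:04pm
print(resolve_relative("next tuesday", anchor))  # 2017-07-04 00:00:00
print(resolve_relative("in 3 weeks", anchor))    # 2017-07-18 00:00:00
```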

Contributing to this project

We welcome all contributions to Lingua Franca. To get started:

0. Sign a Contributor Licensing Agreement

To protect yourself, the project, and users of Mycroft technologies, we require a Contributor Licensing Agreement (CLA) before accepting any code contribution. This agreement makes it crystal clear that, along with your code, you are offering a license to use it within the confines of this project. You retain ownership of the code; this is just a license.

You will also be added to our list of excellent human beings!

Please visit https://mycroft.ai/cla to initiate this one-time signing.

1. Set up a local copy of the project

  1. Fork the project to create your own copy.

  2. Clone the repository and change into that directory:

git clone https://github.com/your-username/lingua-franca/
cd lingua-franca

  3. Set up a lightweight virtual environment (venv) for the project. This creates an isolated environment that can have its own independent set of installed Python packages.

python3 -m venv .venv
source .venv/bin/activate

To exit the venv, run deactivate or close the terminal window.

  4. Install the package and its dependencies:

pip install wheel
python -m pip install .
pip install pytest
python setup.py install

  5. To check that everything is installed correctly, run the existing test suite:

pytest

2. Writing tests

We follow a Test Driven Development (TDD) methodology, so the first step is always to add tests for whatever you want to add or fix. If it's a bug, then no existing test covers that specific case, so we want to add a test that reproduces it. If you are starting on a new language, take a look at the tests for other languages to get started.

Tests are all located in lingua_franca/test. Each language should have two test files, with lang replaced by the language code:

  • test_format_lang.py
  • test_parse_lang.py

3. Run tests to confirm they fail

Generally, using TDD, all tests should fail when they are first added. If the test is passing when you haven't yet fixed the bug or added the functionality, something must be wrong with the test or the test runner.

pytest

4. Write code

Now we can add our new code. There are three main files for each language:

  • common_data_lang.py
    Common data that can be used across formatting and parsing such as dictionaries of number names.
  • format_lang.py
    All formatting functions for this language.
  • parse_lang.py
    All parsing functions for this language.
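One way to keep formatting and parsing consistent is to define each vocabulary table once in the common data file and derive the reverse lookup from it. The file names, the _NUM_STRING_XX table and both functions below are hypothetical, for a placeholder language xx with English words standing in; they only sketch how the three files can share data.

```python
from typing import Union

# common_data_xx.py (hypothetical): data shared by formatting and parsing.
_NUM_STRING_XX = {
    0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
    5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}
# Reverse lookup derived from the same table, so the two can never drift apart.
_STRING_NUM_XX = {word: number for number, word in _NUM_STRING_XX.items()}


# format_xx.py (hypothetical) would use the forward table...
def pronounce_digit_xx(number: int) -> str:
    return _NUM_STRING_XX[number]


# ...and parse_xx.py (hypothetical) the reverse one.
def extract_digit_xx(text: str) -> Union[int, bool]:
    for token in text.lower().split():
        if token in _STRING_NUM_XX:
            return _STRING_NUM_XX[token]
    return False


print(extract_digit_xx("I have Seven cats"))  # 7
```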

Since we have already written our unit tests, we can run these regularly to see our progress.

5. Document your code

Document code using Google-style docstrings. Our automated documentation tools expect that format. All functions and class methods that are expected to be called externally should include a docstring. (Those that aren't should be prefixed with a single underscore.)
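A minimal example of the convention: a public function with a Google-style docstring (summary line, then Args: and Returns: sections), next to an underscore-prefixed internal helper. format_ordinal and _ordinal_suffix are hypothetical functions written only to show the style.

```python
def format_ordinal(number: int) -> str:
    """Convert an integer to its written ordinal form.

    Args:
        number: The non-negative integer to convert.

    Returns:
        The ordinal as a string, for example "1st" or "22nd".
    """
    return f"{number}{_ordinal_suffix(number)}"


def _ordinal_suffix(number: int) -> str:
    # Internal helper: the leading underscore marks it as not for external use.
    if 10 <= number % 100 <= 20:  # 11th, 12th, 13th, ... are irregular
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")


print(format_ordinal(22))  # 22nd
```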

6. Try it in Mycroft

Lingua Franca is installed by default when you install Mycroft-core, but for development you generally have this repo cloned elsewhere on your computer. You can use your changes in Mycroft by installing it in the Mycroft virtual environment.

If you added the Mycroft helper commands during setup, you can simply use:

mycroft-pip install /path/to/your/lingua-franca

Otherwise you need to activate that venv manually:

cd ~/mycroft-core
source venv-activate.sh
pip install /path/to/your/lingua-franca

Now, when talking with Mycroft, it will be using your development version of Lingua Franca.
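To confirm which copy of Lingua Franca the Mycroft virtual environment will import, you can ask Python where it would load the package from. The sketch below uses importlib.util.find_spec from the standard library; run it with the Mycroft venv activated. The module_location helper is hypothetical, and after the install above the printed path should fall inside that venv's site-packages (None means the package is not installed there).

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(module_location("lingua_franca"))
```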

7. Commit changes

Make commits in logical units, and describe them thoroughly. If you are addressing a documented issue, put the issue identifier at the very beginning of each commit message. For instance:

git commit -m "Issues-123 - Fix 'demain' date extraction in French"

8. Submit a PR

Once your changes are ready for review, create a pull request.

Like commit messages, the PR title and description should properly describe the changes you have made, along with any additional information that reviewers who do not speak your language might need to understand.

9. Waiting for a review

While you wait for a review of your contribution, why not take a moment to review some other pull requests? This is a great way to learn and help progress the queue of pull requests, which means your contribution will be seen more quickly!



Download files

Files for lingua-franca, version 0.2.3:

  • lingua_franca-0.2.3-py3-none-any.whl (892.8 kB): wheel, Python py3
  • lingua_franca-0.2.3.tar.gz (191.2 kB): source distribution
