Formasaurus tells you the types of HTML forms and their fields using machine learning
Formasaurus is a Python package that tells you the type of an HTML form and its fields using machine learning.
It can detect whether a form is a login, search, registration, password recovery, “join mailing list”, contact, or order form (or something else), and it can tell which field is a password field, which is a search query, and so on.
License is MIT.
Check docs for more.
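A minimal usage sketch, assuming Formasaurus and its dependencies are installed. `formasaurus.extract_forms` is the function named in the changelog below. The result shape used here (a list of `(form_element, info)` pairs, where `info` has a `'form'` key and a `'fields'` key) is an assumption to check against the docs for your installed version. `detect_forms` and `count_form_types` are hypothetical helpers written for this illustration, and the sample data is hand-written, not real model output.

```python
from collections import Counter


def detect_forms(html):
    """Run Formasaurus on an HTML string (requires the package to be installed)."""
    import formasaurus  # imported lazily so the helper below works without it
    return formasaurus.extract_forms(html)


def count_form_types(results):
    """Count detected form types in extract_forms-style output.

    Assumes each item is a (form_element, info) pair and that info['form']
    holds the predicted form type.
    """
    return Counter(info["form"] for _form, info in results)


# Hand-written sample in the assumed result shape (not real model output).
sample = [
    (None, {"form": "search", "fields": {"q": "search query"}}),
    (None, {"form": "login", "fields": {"user": "username", "pass": "password"}}),
    (None, {"form": "search", "fields": {"query": "search query"}}),
]
print(count_form_types(sample))
```

In real use, `count_form_types(detect_forms(html))` would summarize the forms found on a downloaded page.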
- Support for scikit-learn < 0.18 is dropped;
- Formasaurus is no longer tested with Python 3.3;
- tests are fixed to account for upstream changes; Python 3.6 build is enabled.
- more annotated data for captchas;
- new formasaurus init command, which trains and caches the model.
- pip bug with pip install formasaurus[with-deps] is worked around; it should work now as pip install formasaurus[with_deps].
- fixed API documentation at readthedocs.org
- more annotated data;
- new form_classes and field_classes attributes of FormFieldClassifier;
- more robust web page encoding detection in formasaurus.utils.download;
- bug fixes in annotation widgets;
- fields=False argument is supported in formasaurus.extract_forms, formasaurus.classify, formasaurus.classify_proba functions and in related FormFieldClassifier methods. It lets you skip predicting form field types when they are not needed.
- formasaurus.classifiers.instance() is renamed to formasaurus.classifiers.get_instance().
- Bias is no longer regularized for form type classifier.
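As a rough sketch of the fields=False option mentioned in this list: the call below passes `fields=False` to `formasaurus.extract_forms`, as the changelog describes. It is an assumption here that the per-form result is a dict that simply lacks the `'fields'` key when field prediction is skipped, so verify that against the documentation. `classify_forms_only` and `field_types` are hypothetical helper names, and the example dicts are hand-written.

```python
def classify_forms_only(html):
    """Extract forms without predicting field types (needs Formasaurus installed)."""
    import formasaurus  # lazy import: the helper below runs without the package
    return formasaurus.extract_forms(html, fields=False)


def field_types(info):
    """Return the field-type mapping, or {} when field prediction was skipped.

    Assumption: with fields=False the per-form info has no 'fields' key.
    """
    return info.get("fields", {})


# Hand-written examples in the assumed shapes (not real model output).
with_fields = {"form": "login", "fields": {"pass": "password"}}
without_fields = {"form": "login"}
print(field_types(with_fields))
print(field_types(without_fields))
```

Code that consumes results this way keeps working whether or not field prediction was requested.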
This is a major backwards-incompatible release.
- Formasaurus can now detect field types, not only form types;
- API is changed - check the updated documentation;
- there are more form types detected;
- evaluation setup is improved;
- annotation UI is rewritten using IPython widgets;
- more training data is added.
- Python 3 support;
- fixed model auto-creation.