Formasaurus is a Python package that tells you the type of an HTML form
and its fields using machine learning.
It can detect whether a form is a login, search, registration, password recovery,
“join mailing list”, contact, order form or something else, and which field
is a password field, which is a search query, etc.
License is MIT.
Check docs for more.
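The following is a minimal sketch of how the detection results might be consumed. It assumes that formasaurus.extract_forms returns a list of (form_element, info) pairs, where info holds the form type under 'form' and a field-name-to-field-type mapping under 'fields', and that password fields are labelled 'password'. The helper names password_field_names and demo are hypothetical and not part of the package.

```python
def password_field_names(extracted):
    """Collect the names of fields labelled 'password'.

    `extracted` is assumed to look like the result of
    formasaurus.extract_forms(html): a list of (form_element, info)
    pairs, where info is a dict such as
    {'form': 'login', 'fields': {'user': 'username', 'pw': 'password'}}.
    """
    names = []
    for _form, info in extracted:
        # 'fields' may be absent if field types were not predicted.
        for name, field_type in info.get("fields", {}).items():
            if field_type == "password":  # assumed label name
                names.append(name)
    return names


def demo(html):
    """Run Formasaurus on an HTML string (hypothetical helper).

    Requires formasaurus to be installed with a trained model available;
    the import is deferred so the helper above works without it.
    """
    import formasaurus

    return password_field_names(formasaurus.extract_forms(html))
```

Keeping the post-processing separate from the call into the library means it can be exercised on hand-written results without training or loading a model.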
- more annotated data for captchas;
- formasaurus init command which trains & caches the model.
- pip bug with pip install formasaurus[with-deps] is worked around;
it should work now as pip install formasaurus[with_deps].
- fixed API documentation at readthedocs.org
- more annotated data;
- new form_classes and field_classes attributes of FormFieldClassifier;
- more robust web page encoding detection in formasaurus.utils.download;
- bug fixes in annotation widgets;
- fields=False argument is supported in formasaurus.extract_forms,
formasaurus.classify, formasaurus.classify_proba functions and
in related FormFieldClassifier methods. It lets you skip predicting
form field types when they are not needed.
- formasaurus.classifiers.instance() is renamed to
- Bias is no longer regularized for form type classifier.
This is a major backwards-incompatible release.
- Formasaurus can now detect field types, not only form types;
- API is changed - check the updated documentation;
- there are more form types detected;
- evaluation setup is improved;
- annotation UI is rewritten using IPython widgets;
- more training data is added.
- Python 3 support;
- fixed model auto-creation.