
Large LAnguage MOdels for Reference Extraction: extract and evaluate references from free-form text using LLM/VLMs.


Llamore

Large LAnguage MOdels for Reference Extraction

A framework to extract and evaluate scientific references and citations from free-form text and PDFs using LLM/VLMs.

Setup

pip install llamore

Quick start

Here are a few things you can do with Llamore.

Extract references

from llamore import GeminiExtractor

extractor = GeminiExtractor(api_key="MY_GEMINI_API_KEY")
references = extractor(pdf="path/to/my.pdf")

Or, to extract references from a string of free-form text:

text = """4 I have explored the gendered nature of citizenship at greater length in two complementary
papers: ‘Embodying the Citizen’ in Public and Private: Feminist Legal Debates, ed. M.
Thornton (1995) and ‘Historicising Citizenship: Remembering Broken Promises’ (1996) 20
Melbourne University Law Rev. 1072."""
references = extractor(text=text)

Export as TEI biblStructs

references.to_xml("./my_references.xml")
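The exact XML that Llamore writes is not shown here. As a rough picture of what a TEI biblStruct looks like, the following sketch builds one by hand with the Python standard library. The element layout (analytic, monogr, imprint) follows the general TEI convention and is an assumption, not a copy of Llamore's output; namespaces are omitted for brevity, and the values are filled in by hand from the footnote used in the quick start.

```python
import xml.etree.ElementTree as ET

# Hand-built illustration of a TEI biblStruct; NOT produced by Llamore.
bibl = ET.Element("biblStruct")

# Article-level information goes into <analytic>.
analytic = ET.SubElement(bibl, "analytic")
title = ET.SubElement(analytic, "title", level="a")
title.text = "Historicising Citizenship: Remembering Broken Promises"

# The containing publication (here a journal) goes into <monogr>.
monogr = ET.SubElement(bibl, "monogr")
journal = ET.SubElement(monogr, "title", level="j")
journal.text = "Melbourne University Law Rev."

# Publication details live in <imprint>.
imprint = ET.SubElement(monogr, "imprint")
ET.SubElement(imprint, "date").text = "1996"
ET.SubElement(imprint, "biblScope", unit="volume").text = "20"
ET.SubElement(imprint, "biblScope", unit="page").text = "1072"

ET.indent(bibl)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(bibl, encoding="unicode")
print(xml_text)
```

To see what Llamore itself produces, open the file written by references.to_xml.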

Evaluate with gold references

from llamore import F1

f1 = F1()
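# gold_references holds the gold-standard references to compare against (not defined in this snippet)
f1.compute_macro_average(references, gold_references)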
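For intuition about what a macro-averaged F1 score measures, here is a minimal stdlib-only sketch. It is not Llamore's implementation: it assumes references are plain dicts with string values, already aligned by position, and it counts a field as correct only on an exact match. The helper names `field_f1` and `macro_average_f1` are hypothetical, and the data is a toy example.

```python
def field_f1(pred: dict, gold: dict) -> float:
    """F1 over the non-null (field, value) pairs of one reference."""
    pred_items = {(k, v) for k, v in pred.items() if v is not None}
    gold_items = {(k, v) for k, v in gold.items() if v is not None}
    if not pred_items and not gold_items:
        return 1.0  # both empty: treat as a perfect match
    true_positives = len(pred_items & gold_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_items)
    recall = true_positives / len(gold_items)
    return 2 * precision * recall / (precision + recall)


def macro_average_f1(preds: list[dict], golds: list[dict]) -> float:
    """Mean of the per-reference F1 scores (assumes aligned lists)."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not preds:
        return 0.0
    return sum(field_f1(p, g) for p, g in zip(preds, golds)) / len(preds)


# Toy data: the second prediction gets the volume wrong.
gold = [
    {"analytic_title": "Embodying the Citizen", "publication_date": "1995"},
    {"journal_title": "Melbourne University Law Rev.", "volume": "20"},
]
pred = [
    {"analytic_title": "Embodying the Citizen", "publication_date": "1995"},
    {"journal_title": "Melbourne University Law Rev.", "volume": "21"},
]
print(macro_average_f1(pred, gold))  # 0.75
```

Macro averaging gives every reference the same weight, so a reference with few fields counts as much as one with many.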

You can also take a look at the Jupyter notebook at notebooks/quick_start.ipynb.

Reference JSON schema

Llamore internally defines a reference as a pydantic BaseModel, llamore.reference.Reference. The model is based on the TEI biblStruct model and has the following JSON schema:

{
  "$defs": {
    "Organization": {
      "description": "Contains information about an identifiable organization such as a business, a tribe, or any other grouping of people.",
      "properties": {
        "name": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains an organizational name.",
          "title": "Name"
        }
      },
      "title": "Organization",
      "type": "object"
    },
    "Person": {
      "description": "Contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc.",
      "properties": {
        "forename": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains a forename, given or baptismal name.",
          "title": "Forename"
        },
        "surname": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains a family (inherited) name of a person, as opposed to a given, baptismal, or nick name.",
          "title": "Surname"
        }
      },
      "title": "Person",
      "type": "object"
    }
  },
  "description": "A reference based on the TEI biblstruct format.",
  "properties": {
    "analytic_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to an analytic item, such as an article, poem, or other work published as part of a larger item.",
      "title": "Analytic Title"
    },
    "monographic_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to a monograph such as a book or other item considered to be a distinct publication, including single volumes of multi-volume works.",
      "title": "Monographic Title"
    },
    "journal_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to any serial or periodical publication such as a journal, magazine, or newspaper.",
      "title": "Journal Title"
    },
    "authors": {
      "anyOf": [
        {
          "items": {
            "anyOf": [
              {
                "$ref": "#/$defs/Person"
              },
              {
                "$ref": "#/$defs/Organization"
              }
            ]
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name or names of the authors, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.",
      "title": "Authors"
    },
    "editors": {
      "anyOf": [
        {
          "items": {
            "anyOf": [
              {
                "$ref": "#/$defs/Person"
              },
              {
                "$ref": "#/$defs/Organization"
              }
            ]
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.",
      "title": "Editors"
    },
    "publisher": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name of the organization responsible for the publication or distribution of a bibliographic item.",
      "title": "Publisher"
    },
    "publication_date": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the date of publication in any format.",
      "title": "Publication Date"
    },
    "publication_place": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name of the place where a bibliographic item was published.",
      "title": "Publication Place"
    },
    "volume": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the scope of a bibliographic reference in terms of the volume of a larger work.",
      "title": "Volume"
    },
    "issue": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains an issue number, or issue numbers.",
      "title": "Issue"
    },
    "pages": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the scope of a bibliographic reference in terms of page numbers.",
      "title": "Pages"
    },
    "cited_range": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the range of cited content, often represented by pages or other units.",
      "title": "Cited Range"
    },
    "refs": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines references to another location, possibly modified by additional text or comment. ",
      "title": "Refs"
    }
  },
  "title": "Reference",
  "type": "object"
}
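To make the schema concrete, here is a hand-written instance for the second citation in the quick-start footnote, together with a small stdlib-only check of field names and types. The field values are an illustration filled in by hand, not actual Llamore output, and `check_reference` is a hypothetical helper that is not part of the library. The check is also stricter than the schema in one respect: it rejects unknown keys, which the schema does not forbid.

```python
import json

# Field names taken from the "properties" of the schema above.
STRING_FIELDS = {
    "analytic_title", "monographic_title", "journal_title", "publisher",
    "publication_date", "publication_place", "volume", "issue", "pages",
    "cited_range", "refs",
}
NAME_LIST_FIELDS = {"authors", "editors"}  # arrays of Person / Organization
NAME_KEYS = {"forename", "surname", "name"}  # Person and Organization keys


def check_reference(ref: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    for key, value in ref.items():
        if key in STRING_FIELDS:
            if value is not None and not isinstance(value, str):
                problems.append(f"{key}: expected string or null")
        elif key in NAME_LIST_FIELDS:
            if value is None:
                continue
            if not isinstance(value, list):
                problems.append(f"{key}: expected array or null")
                continue
            for entry in value:
                # Loose check: only verifies that the keys are known.
                if not isinstance(entry, dict) or not set(entry) <= NAME_KEYS:
                    problems.append(f"{key}: unexpected entry {entry!r}")
        else:
            problems.append(f"{key}: not a field of the schema")
    return problems


# Hand-written illustration, not actual Llamore output.
reference = {
    "analytic_title": "Historicising Citizenship: Remembering Broken Promises",
    "journal_title": "Melbourne University Law Rev.",
    "publication_date": "1996",
    "volume": "20",
    "pages": "1072",
    "authors": None,  # the footnote does not name the author
}

print(json.dumps(reference, indent=2))
print(check_reference(reference))
```

Every field in the schema is optional and defaults to null, so a partial instance like this one is still valid.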


