Skip to main content

Large LAnguage MOdels for Reference Extraction: extract and evaluate references from free-form text using LLM/VLMs.

Project description

Llamore logo
Llamore

Large LAnguage MOdels for Reference Extraction

A framework to extract and evaluate scientific references and citations from free-form text and PDFs using LLM/VLMs.

Setup

pip install llamore

Quick start

A few things you can do with Llamore.

Extract references

Define your extractor. You can use the OpenaiExtractor for most of the open model serving frameworks like Ollama, vLLM, etc.

from llamore import GeminiExtractor, OpenaiExtractor

extractor = GeminiExtractor(api_key="MY_GEMINI_API_KEY")

Extract references from a PDF or a raw input string.

references = extractor(pdf="path/to/my.pdf")

or

text = """4 I have explored the gendered nature of citizenship at greater length in two complementary
papers: ‘Embodying the Citizen’ in Public and Private: Feminist Legal Debates, ed. M.
Thornton (1995) and ‘Historicising Citizenship: Remembering Broken Promises’ (1996) 20
Melbourne University Law Rev. 1072."""
references = extractor(text=text)

Export as TEI biblStructs

references.to_xml("./my_references.xml")

Evaluate with gold references

from llamore import F1

f1 = F1()
f1.compute_macro_average(references, gold_references)

You can also have a look at the quick start notebook.

Reference JSON schema

Llamore internally defines a reference via a pydantic BaseModel in llamore.reference.Reference. It is based on the TEI biblStruct model and its JSON schema is the following:

{
  "$defs": {
    "Organization": {
      "description": "Contains information about an identifiable organization such as a business, a tribe, or any other grouping of people.",
      "properties": {
        "name": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains an organizational name.",
          "title": "Name"
        }
      },
      "title": "Organization",
      "type": "object"
    },
    "Person": {
      "description": "Contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc.",
      "properties": {
        "forename": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains a forename, given or baptismal name.",
          "title": "Forename"
        },
        "surname": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Contains a family (inherited) name of a person, as opposed to a given, baptismal, or nick name.",
          "title": "Surname"
        }
      },
      "title": "Person",
      "type": "object"
    }
  },
  "description": "A reference based on the TEI biblstruct format.",
  "properties": {
    "analytic_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to an analytic item, such as an article, poem, or other work published as part of a larger item.",
      "title": "Analytic Title"
    },
    "monographic_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to a monograph such as a book or other item considered to be a distinct publication, including single volumes of multi-volume works.",
      "title": "Monographic Title"
    },
    "journal_title": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "This title applies to any serial or periodical publication such as a journal, magazine, or newspaper.",
      "title": "Journal Title"
    },
    "authors": {
      "anyOf": [
        {
          "items": {
            "anyOf": [
              {
                "$ref": "#/$defs/Person"
              },
              {
                "$ref": "#/$defs/Organization"
              }
            ]
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name or names of the authors, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.",
      "title": "Authors"
    },
    "editors": {
      "anyOf": [
        {
          "items": {
            "anyOf": [
              {
                "$ref": "#/$defs/Person"
              },
              {
                "$ref": "#/$defs/Organization"
              }
            ]
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains a secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.",
      "title": "Editors"
    },
    "publisher": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name of the organization responsible for the publication or distribution of a bibliographic item.",
      "title": "Publisher"
    },
    "publication_date": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the date of publication in any format.",
      "title": "Publication Date"
    },
    "publication_place": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains the name of the place where a bibliographic item was published.",
      "title": "Publication Place"
    },
    "volume": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the scope of a bibliographic reference in terms of the volume of a larger work.",
      "title": "Volume"
    },
    "issue": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Contains an issue number, or issue numbers.",
      "title": "Issue"
    },
    "pages": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the scope of a bibliographic reference in terms of page numbers.",
      "title": "Pages"
    },
    "cited_range": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines the range of cited content, often represented by pages or other units.",
      "title": "Cited Range"
    },
    "refs": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Defines references to another location, possibly modified by additional text or comment. ",
      "title": "Refs"
    }
  },
  "title": "Reference",
  "type": "object"
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llamore-0.1.0.tar.gz (19.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llamore-0.1.0-py3-none-any.whl (17.4 kB view details)

Uploaded Python 3

File details

Details for the file llamore-0.1.0.tar.gz.

File metadata

  • Download URL: llamore-0.1.0.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.10

File hashes

Hashes for llamore-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5b95294256632bd4a610695f8909eb5b1d8a42a55e48d4b3149b54173159266e
MD5 3574111564558a5c9bd8fbb4219b3c82
BLAKE2b-256 72fbdfa69dc45467e8d5e778d46e221decddbba1b0f0b1b47d0caded864ef465

See more details on using hashes here.

File details

Details for the file llamore-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llamore-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.10

File hashes

Hashes for llamore-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62c2a3252e6445027898e2082720348f2b7ab2877b8e8d7f5c02f37d8a34b1d4
MD5 f8700a100fc0a58765c20586b58262b9
BLAKE2b-256 1787200b4989634f2bd710cdccd4443b8acfe3fc1d762a7eaa9b43f010d7438b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page