This module extends the module account_invoice_import: it adds support for regular PDF invoices, i.e. PDF invoices that don’t have an embedded XML file. It uses the invoice2data library, which extracts the text of the PDF invoice, finds a matching invoice template and executes that template to extract the useful information from the invoice.
To know the full story behind the development of this module, read this blog post.
More information on creating templates can be found in the tutorial of the invoice2data library. The templates have to be created manually. A graphical template creator for Odoo is a work in progress.
WARNING: an alternative module, account_invoice_import_simple_pdf, developed in July 2021, provides similar features with one big advantage: accountants can add support for a new vendor by themselves, with no invoice templates that require technical skills. The module account_invoice_import_simple_pdf provides basic functionality, but does not support line-level accounting.
Installation
This module requires the Python library invoice2data, available on GitHub, in version >= 0.2.74 (February 2018).
To install the latest version of this library, run:
sudo pip3 install --upgrade invoice2data
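To confirm that the installed version satisfies the >= 0.2.74 requirement, a check along these lines can be run in the Python environment used by Odoo. This is a minimal sketch, not part of the module: `parse_version` and `meets_minimum` are hypothetical helpers, and the comparison only looks at the leading numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "0.2.74"  # minimum invoice2data version stated in this README


def parse_version(text):
    """Turn '0.2.74' into (0, 2, 74); stops at the first non-numeric part."""
    parts = []
    for chunk in text.split("."):
        match = re.match(r"\d+", chunk)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """Compare two dotted version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("invoice2data")
    print(installed, "OK" if meets_minimum(installed) else "too old")
except PackageNotFoundError:
    print("invoice2data is not installed")
```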
If you use Ubuntu 16.04 LTS or higher, you can use the pdftotext version 0.41.0 that is packaged in the distribution:
sudo apt install poppler-utils
If you want the invoice2data library to support mixed-type PDFs, or to fall back on OCR when the PDF doesn’t contain text, such as scanned receipts (only a very small minority of PDF invoices are image-based and require OCR), you should also install OCRmyPDF:
pip install -U ocrmypdf
If you want the invoice2data library to fall back on OCR when the PDF doesn’t contain text, such as scanned receipts (only a very small minority of PDF invoices are image-based and require OCR), you should also install ImageMagick (to get the convert utility that converts PDF to TIFF) and Tesseract OCR:
sudo apt install imagemagick tesseract-ocr
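After installing the optional packages, it can be useful to verify that their executables are visible on the PATH of the user running Odoo. The following is a minimal sketch using only the standard library; the mapping of executable names to packages follows the commands in this section, and `find_tools` is a hypothetical helper name.

```python
import shutil

# Executable name -> package that provides it, as mentioned in this section.
TOOLS = {
    "pdftotext": "poppler-utils",
    "convert": "imagemagick",
    "tesseract": "tesseract-ocr",
    "ocrmypdf": "ocrmypdf",
}


def find_tools(tools=TOOLS):
    """Return {executable: absolute path, or None if not found on PATH}."""
    return {name: shutil.which(name) for name in tools}


for name, path in find_tools().items():
    status = path if path else f"missing (package: {TOOLS[name]})"
    print(f"{name:<10} {status}")
```

Only pdftotext is needed for text-based PDFs; the other tools matter only if OCR fallback is wanted.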
If you want to use custom invoice templates for the invoice2data lib (in addition to the templates provided by the invoice2data lib), you should add a line in your Odoo server configuration file such as:
invoice2data_templates_dir = /opt/invoice2data_local_templates
and store your invoice templates in YAML format (.yml extension) or JSON format in the directory that you have configured above. If you add invoice templates to this directory, you don’t have to restart Odoo: they will be used automatically on the next invoice import.
If you want to use only your custom invoice templates and ignore the templates provided by the invoice2data lib, you should have in your Odoo server configuration file:
invoice2data_templates_dir = /opt/invoice2data_local_templates
invoice2data_exclude_built_in_templates = True
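To double-check how these two options will be read, the configuration file can be parsed with the standard library. This is only a sketch: it assumes the options live in the usual `[options]` section of the Odoo configuration file, it parses an inline sample instead of the real file, and `read_invoice2data_options` is a hypothetical helper, not the code the module itself uses.

```python
import configparser

# Inline sample standing in for the real Odoo server configuration file.
SAMPLE = """
[options]
invoice2data_templates_dir = /opt/invoice2data_local_templates
invoice2data_exclude_built_in_templates = True
"""


def read_invoice2data_options(config_text):
    """Extract the two invoice2data options from an INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    options = parser["options"]
    return {
        "templates_dir": options.get("invoice2data_templates_dir"),
        "exclude_built_in": options.getboolean(
            "invoice2data_exclude_built_in_templates", fallback=False
        ),
    }


print(read_invoice2data_options(SAMPLE))
```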
The YAML templates are loaded with [pyyaml](https://github.com/yaml/pyyaml), which is a pure Python implementation and thus rather slow. As an alternative, JSON templates can be used, which are natively supported by Python. The performance with YAML templates can be greatly increased (about 10x) by using [libyaml](https://github.com/yaml/libyaml). On Debian-based distributions it can be installed with:
sudo apt-get install libyaml-dev
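Whether the PyYAML installation in use actually has the libyaml bindings can be checked from Python. The sketch below relies on the `__with_libyaml__` flag that PyYAML sets at import time (read with `getattr` in case a given version lacks it); `libyaml_status` is a hypothetical helper name, and the import is guarded so the snippet also runs where PyYAML is absent.

```python
def libyaml_status():
    """Report whether PyYAML can use the C-based libyaml loader."""
    try:
        import yaml
    except ImportError:
        return "pyyaml not installed"
    if getattr(yaml, "__with_libyaml__", False):
        return "libyaml bindings available"
    return "pure Python loader only"


print(libyaml_status())
```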
French users should also install the module l10n_fr_business_document_import available in the French localization. Dutch users should also install the module l10n_nl_business_document_import available in the Netherlands localization.
Dependencies
| Command | Description |
| --- | --- |
| pip install invoice2data | The main dependency of this invoice import module |
| apt install poppler-utils | The default input reader for the invoice2data library |
| pip install dateparser | Required for parsing the invoice dates; this requirement is likely already satisfied by Odoo itself |
| apt install libyaml-dev | Template loader, recommended to greatly speed up the loading of YAML templates |
| apt install imagemagick | Input reader: pre-processes the PDF before feeding it into tesseract-ocr |
| apt install tesseract-ocr | Input reader: OCR for image-only PDF files |
| apt install tesseract-ocr- (see documentation) | Input reader: language pack for tesseract-ocr, greatly improves character detection |
| apt install ocrmypdf | Input reader: for image-only or mixed-type PDFs. It uses tesseract-ocr under the hood, but provides optimisations which greatly improve results |

Configuration
Go to the form view of the supplier and configure it with the following parameters:
- the VAT is set (the VAT number is used by default when searching the supplier in the Odoo partner database)
- in the Invoicing tab, create an Invoice Import Configuration.
For the PDF invoices of your supplier that don’t have an embedded XML file, you will have to create a template file in YAML format for the invoice2data Python library. It is quite easy to do: if you are familiar with regexps, it should not take more than 10 minutes for each supplier.
Here are some hints to help you add a template for your supplier:
- There is a tutorial in the repo of the invoice2data library
- Take the Free SAS template file as an example. You will find a sample PDF invoice for this supplier under invoice2data/test/pdf/invoice_free_fiber_201507.pdf
- Try to run the invoice2data library manually on the sample invoice of Free:
% python -m invoice2data.main --debug invoice2data/test/pdf/invoice_free_fiber_201507.pdf
In the output, you will first get the text of the PDF, then some debug info on the parsing of the invoice and the regexps, and, on the last line, the dict that contains the result of the parsing.
- If the VAT number of the supplier is present in the text of the PDF invoice, I think it’s a good idea to use it as the keyword. It is good practice to add 2 other keywords: one for the language (for example, match on the word Invoice in the language of the invoice) and one for the currency, to match only the invoices of that supplier in this particular language and currency.
The list of fields should contain the following entries:
- “vat” with the VAT number of the supplier (if the VAT number of the supplier is not in the text of PDF file, add a “partner_name” key)
- “amount” (“amount” is the total amount with taxes)
- “amount_untaxed” or “amount_tax” (one or the other, no need for both)
- “date”: the date of the invoice
- “invoice_number”
- “date_due”, if this information is available in the text of the PDF file.
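The way keywords and fields work together can be illustrated with plain Python regexps. This is a simplified sketch of the idea, not the invoice2data implementation: the sample text, the VAT number, the regexps, and the helpers `template_matches`, `extract_fields` and `missing_fields` are all invented for illustration, and real templates are YAML or JSON files executed by the library.

```python
import re

# Invented sample text, standing in for the output of pdftotext.
SAMPLE_TEXT = """\
ACME SARL - VAT FR12345678901
Invoice number: INV-2023-0042
Invoice date: 2023-03-03
Total excl. VAT: 100.00 EUR
Total incl. VAT: 120.00 EUR
"""

# Hypothetical template: VAT number, language and currency as keywords.
TEMPLATE = {
    "keywords": ["FR12345678901", "Invoice", "EUR"],
    "fields": {
        "vat": r"VAT (FR\d{11})",
        "invoice_number": r"Invoice number: (\S+)",
        "date": r"Invoice date: (\d{4}-\d{2}-\d{2})",
        "amount_untaxed": r"Total excl\. VAT: (\d+\.\d{2})",
        "amount": r"Total incl\. VAT: (\d+\.\d{2})",
    },
}
FLOAT_FIELDS = {"amount", "amount_untaxed", "amount_tax"}


def template_matches(text, template):
    """A template applies only if every keyword is found in the text."""
    return all(re.search(keyword, text) for keyword in template["keywords"])


def extract_fields(text, template):
    """Run each field regexp and keep the first captured group."""
    result = {}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            result[name] = float(value) if name in FLOAT_FIELDS else value
    return result


def missing_fields(result):
    """List the entries from the checklist above that were not extracted."""
    missing = [n for n in ("amount", "date", "invoice_number") if n not in result]
    if "vat" not in result and "partner_name" not in result:
        missing.append("vat or partner_name")
    if "amount_untaxed" not in result and "amount_tax" not in result:
        missing.append("amount_untaxed or amount_tax")
    return missing


if template_matches(SAMPLE_TEXT, TEMPLATE):
    fields = extract_fields(SAMPLE_TEXT, TEMPLATE)
    print(fields, missing_fields(fields))
```

Using three keywords keeps the template from matching invoices of the same supplier in another language or currency, which would need their own regexps.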
The invoice2data library is quite powerful. It supports multiple input methods (pdftotext, ocrmypdf, tesseract ocr, google cloud vision). Even invoice lines can be imported and mapped to products in the database. The invoice2data library does not have a strict standard on field names. This makes the module very flexible, but also makes it hard to create reusable templates.
If you want to use the advanced features, the following fields are supported.
## Supported fields
(note: the fieldname column contains the name to be used in the template file.)
Partner fields

| fieldname | type | Description |
| -------------- | :---------: | :-------------------------------------- |
| vat | char | The VAT code is unique for each partner; it has the highest priority for matching the partner |
| partner_name | char | self-explanatory |
| partner_street | char | self-explanatory |
| partner_street2 | char | self-explanatory |
| partner_street3 | char | self-explanatory |
| partner_city | char | self-explanatory |
| partner_zip | char | self-explanatory |
| country_code | char | use ISO format, e.g. fr or nl |
| state_code | char | use ISO format, e.g. NY (for New York) |
| partner_email | char | self-explanatory |
| partner_website | char | self-explanatory |
| telephone | char | can be used for matching the partner with the help of support modules |
| mobile | char | can be used for matching the partner contact with the help of support modules |
| partner_ref | char | reference name or number, can be used for partner matching |
| siren | char | French business code, can be used for matching the partner |
| partner_coc | char | General business identifier number, can be used for matching the partner |
Invoice fields (on document level)

| fieldname | type | Description |
| -------------- | :---------: | :-------------------------------------- |
| currency | char | The currency of the invoice in ISO format (EUR, USD) |
| currency_symbol | char | The currency symbol of the invoice (€, $) |
| bic | char | Bank Identifier Code |
| iban | char | International Bank Account Number |
| amount | float | The total amount of the invoice (including taxes) |
| amount_untaxed | float | The total amount of the invoice (excluding taxes) |
| amount_tax | float | The sum of the tax amounts of the invoice |
| date | date | The date of the invoice |
| invoice_number | char | self-explanatory |
| date_due | date | The due date of the invoice |
| date_start | date | The start date of the period covered by the invoice, when services are delivered. |
| date_end | date | The end date of the period covered by the invoice, when services are delivered. |
| note | char | The contents of this field will be imported in the chatter. |
| narration | char | The contents of this field will be imported in the narration field (at the bottom of the invoice). |
| payment_reference | char | If the invoice is pre-paid, a reference can be used for payment reconciliation |
| payment_unece_code | char | The UNECE code of the payment means, according to the 4461 code list |
| incoterm | char | The Incoterm 2000 abbreviation |
| company_vat | char | The VAT number of the company to which the invoice is addressed. Used to check that the invoice is actually addressed to the company which wants to process it. (Very useful in a multi-company setup) |
| mandate_id | char | A banking mandate is attached to a bank account and represents an authorization that the bank account owner gives to a company for a specific operation (such as direct debit). |
Invoice line fields

| fieldname | type | Description |
| -------------- | :---------: | :-------------------------------------- |
| name | char | The name of the product, can be used for product matching |
| barcode | char | The barcode of the product or product package, used for product matching |
| code | char | The (internal) product code, used for product matching |
| qty | float | The number of items/units |
| unece_code | char | The UNECE code of the product’s unit of measure |
| uom | char | The name of the unit of measure; internally it will be mapped to the UNECE code. Example: L will be mapped to unece_code LTR |
| price_unit | float | The unit price of the item (excluding taxes) |
| discount | float | The discount percentage for this line, e.g. 20 for a 20% discount or 0.0 for no discount |
| price_total | float | The total amount of the invoice line including taxes. It can be used to select the correct tax tag. |
| price_subtotal | float | The total amount of the invoice line excluding taxes. It can be used to create adjustment lines when the decimal precision is insufficient. |
| line_tax_percent | float | The percentage of tax |
| line_tax_amount | float | The fixed amount of tax applied to the line |
| line_note | char | Notes on the invoice can be imported. There is a special view available. |
| sectionheader | char | There is a special view available for section headers. |
| date_start | date | The start date of the period covered by the invoice line, when services are delivered. |
| date_end | date | The end date of the period covered by the invoice line, when services are delivered. |
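The relationship between the line-level amounts can be sketched as follows. This is an illustration under stated assumptions, not the module's computation: it assumes a single percentage tax applied to the discounted subtotal, rounding to two decimals, and `line_amounts` is a hypothetical helper name.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")  # assumed rounding precision (two decimals)


def line_amounts(qty, price_unit, discount=0, line_tax_percent=0):
    """Derive subtotal, tax and total from the template field values."""
    qty = Decimal(str(qty))
    price_unit = Decimal(str(price_unit))
    discount = Decimal(str(discount))  # percentage, e.g. 20 for 20%
    tax_percent = Decimal(str(line_tax_percent))

    subtotal = (qty * price_unit * (1 - discount / 100)).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    tax = (subtotal * tax_percent / 100).quantize(CENT, rounding=ROUND_HALF_UP)
    return {
        "price_subtotal": subtotal,  # excluding taxes
        "tax": tax,
        "price_total": subtotal + tax,  # including taxes
    }


print(line_amounts(qty=3, price_unit="10.00", discount=20, line_tax_percent=21))
```

Comparing such derived values with the price_subtotal and price_total extracted from the PDF is one way to spot a regexp that captured the wrong number.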
Known issues / Roadmap
- Implement support for lines with all tax included, used for some localizations like Switzerland or scanned receipts.
- A graphical template builder.
Known Issues
- The input module is hard-coded to use the pdftotext parser, with tesseract as a fallback.
- Creation of the templates is still quite hard.
- The address and company-specific fields are parsed, meaning it is possible to import an invoice which is issued to a company other than yours!
Changelog
14.0.2.2.0 (2023-03-03)
- [ADD] Support for invoicelines. (#74)
Bug Tracker
Bugs are tracked on GitHub Issues. In case of trouble, please check there if your issue has already been reported. If you spotted it first, help us to smash it by providing detailed and welcomed feedback.
Do not contact contributors directly about support or help with technical issues.
Credits
Contributors
- Alexis de Lattre <alexis.delattre@akretion.com>
Maintainers
This module is maintained by the OCA.
OCA, or the Odoo Community Association, is a nonprofit organization whose mission is to support the collaborative development of Odoo features and promote its widespread use.
This module is part of the OCA/edi project on GitHub.
You are welcome to contribute. To learn how, please visit https://odoo-community.org/page/Contribute.