TrialAndErrorOrg/docx-to-vfile
Note This repository is automatically generated from the main parser monorepo. Please submit any issues or pull requests there.

docx-to-vfile

Reads a .docx file and stores its components in vfile format to be processed by other tools, like reoff-parse.

Currently rather naive: everything is stored in memory and the result is not exposed as a stream. Reading the file itself does happen in streams.

Based on docxtract

What is this?

This package reads a .docx file and stores its components in vfile format to be processed by other tools, like reoff-parse. This is the first step in a pipeline to convert a .docx file to many other formats using the unified ecosystem.

A .docx document is just a zip file with a bunch of XML and other files (such as images) in it. This package unzips the .docx file, reads the XML files and images and stores them in a VFile object, which is a virtual file format that can be used by other tools in the unified ecosystem.
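
Because a .docx archive is an ordinary zip file, its first bytes carry the zip local file header signature (`PK\x03\x04`). The sketch below only illustrates that fact; it is not how docx-to-vfile works internally, and `looksLikeZip` is a hypothetical helper that is not part of this package.

```typescript
// Hypothetical helper (not part of docx-to-vfile): checks whether the given
// bytes start with the zip local file header signature "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]

function looksLikeZip(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_SIGNATURE.length) return false
  return ZIP_SIGNATURE.every((value, index) => bytes[index] === value)
}

// A non-empty zip archive (and therefore a .docx) starts with 50 4B 03 04.
const zipLike = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00])
// A bare XML file starts with "<?xm" instead.
const xmlLike = new Uint8Array([0x3c, 0x3f, 0x78, 0x6d])

console.log(looksLikeZip(zipLike)) // true
console.log(looksLikeZip(xmlLike)) // false
```

A check like this could be used to reject obviously wrong input early, before handing the bytes to docxToVFile.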

When should I use this?

Probably only to read a .docx file and feed it into reoff-parse or something similar, or if you want to access the raw data of a .docx file for some reason.

Install

This package is ESM only. In Node.js (version 12.20+, 14.14+, 16.0+, 18.0+), install as

pnpm add docx-to-vfile
# or with yarn
# yarn add docx-to-vfile
# or with npm
# npm install docx-to-vfile

Use

In Node

import { docxToVFile } from 'docx-to-vfile'

Pass a path to a .docx file

const file = await docxToVFile('path/to/file.docx')

Pass a Blob

const blob = await fetch('https://path/to/file.docx').then((res) => res.blob())
const file = await docxToVFile(blob)

Pass a Buffer

import { readFile } from 'fs/promises'
const buffer = await readFile('path/to/file.docx')
const file = await docxToVFile(buffer)

Pass a ReadStream

import { createReadStream } from 'fs'

const file = await docxToVFile(createReadStream('path/to/file.docx'))

In the browser

import { docxToVFile } from 'docx-to-vfile/browser'

Pass a File

<input type="file" />
document.querySelector('input[type="file"]')?.addEventListener('change', async (e) => {
  const file = await docxToVFile(e.target.files[0])
})

Output

Using the default settings, the main value of the VFile will be the content of the main document, and the data will contain the content of the other files in the .docx archive. Media files will be stored in the media property.

const output = {
  data: {
    'word/footnotes.xml': '<?xml version ...',
    '_rels/.rels': '<?xml version ...',
    // ...
    relations: {
      rId9: 'footnotes.xml',
      rId8: 'endnotes.xml',
      // ...
    },
    media: {
      'media/image1.png': new Blob([/* image bytes */]),
    },
  },
  value: '[the content of word/document.xml, the main document]',
  // other vfile stuff
  messages: [],
  history: [],
  cwd: './',
}

String(output) === output.value // true
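
As a rough illustration of how the relations and the XML entries fit together, the sketch below resolves a relation id to the XML string it points to. The sample objects are hand-written to mirror the shape shown above, `getRelatedXml` is a hypothetical helper, and the assumption that relation targets are relative to the `word/` folder is inferred from the example output rather than from a documented guarantee.

```typescript
type Relations = Record<string, string>
type XmlParts = Record<string, string | undefined>

// Hypothetical helper: resolve a relation id such as 'rId9' to its XML string.
// Assumption: relation targets are relative to the `word/` folder.
function getRelatedXml(
  parts: XmlParts,
  relations: Relations,
  id: string,
): string | undefined {
  const target = relations[id]
  if (target === undefined) return undefined
  return parts[`word/${target}`]
}

// Hand-written stand-ins for `file.data` and `file.data.relations`.
const sampleParts: XmlParts = {
  'word/footnotes.xml': '<?xml version ...',
}
const sampleRelations: Relations = {
  rId9: 'footnotes.xml',
  rId8: 'endnotes.xml',
}

// Resolves to the footnotes XML string.
console.log(getRelatedXml(sampleParts, sampleRelations, 'rId9'))
// undefined: the sample has no 'word/endnotes.xml' entry.
console.log(getRelatedXml(sampleParts, sampleRelations, 'rId8'))
// undefined: unknown relation id.
console.log(getRelatedXml(sampleParts, sampleRelations, 'rId1'))
```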

API


docxToVFile()

Takes a docx file as an ArrayBuffer and returns a VFile with the contents of the document.xml file as the root, and the contents of the other xml files as data.

Signature

docxToVFile(file: ArrayBuffer, userOptions: Options = {}): Promise<VFile>;

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| file | ArrayBuffer | The docx file as an ArrayBuffer |
| userOptions | Options | - |

Returns

Promise<VFile>

A VFile with the contents of the document.xml file as the root, and the contents of the other xml files as data.

Defined in: src/lib/docx-to-vfile-unzipit.ts:90


DocxData

The data attribute of a VFile. It is set to the DataMap interface in the vfile module.

Hierarchy

  • Data.DocxData

Indexable

[key: XMLOrRelsString]: string | undefined

Properties

media

object

The media files in the .docx file

Index signature
Type declaration

Overrides: Data.media

Defined in: src/lib/docx-to-vfile-unzipit.ts:45

relations

object

The relations between the .xml files in the .docx file

Index signature
Type declaration

Overrides: Data.relations

Defined in: src/lib/docx-to-vfile-unzipit.ts:49


DocxVFile

Extends VFile with a custom data attribute

This information should be on the VFile interface. This type is only used in contexts where you just want to know the type of the data attribute, e.g. when writing a library that does something with the output of docxToVFile.

Hierarchy

  • VFile.DocxVFile

Properties

cwd

string

Base of path (default: process.cwd() or '/' in browsers).

Inherited from: VFile.cwd

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:53

data

DocxData

Overrides: VFile.data

Defined in: src/lib/docx-to-vfile-unzipit.ts:80

history

string[]

List of filepaths the file moved between.

The first is the original path and the last is the current path.

Inherited from: VFile.history

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:47

map

undefined | null | Map

Source map.

This type is equivalent to the RawSourceMap type from the source-map module.

Inherited from: VFile.map

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:85

messages

VFileMessage[]

List of messages associated with the file.

Inherited from: VFile.messages

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:39

result

unknown

Custom, non-string, compiled, representation.

This is used by unified to store non-string results. One example is when turning markdown into React nodes.

Inherited from: VFile.result

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:76

stored

boolean

Whether a file was saved to disk.

This is used by vfile reporters.

Inherited from: VFile.stored

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:67

value

Value

Raw value.

Inherited from: VFile.value

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:59

Accessors

basename

Get the basename (including extname) (example: 'index.min.js').

Signature
basename(): undefined | string;
Returns

undefined | string

Inherited from: VFile.basename

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:123

Set basename (including extname) ('index.min.js').

Cannot contain path separators ('/' on unix, macOS, and browsers, '\' on windows). Cannot be nullified (use file.path = file.dirname instead).

Signature
basename(arg: undefined | string): void;
Parameters

| Name | Type |
| :--- | :--- |
| arg | undefined \| string |

Returns

void

Inherited from: VFile.basename

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:119

dirname

Get the parent path (example: '~').

Signature
dirname(): undefined | string;
Returns

undefined | string

Inherited from: VFile.dirname

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:111

Set the parent path (example: '~').

Cannot be set if there’s no path yet.

Signature
dirname(arg: undefined | string): void;
Parameters

| Name | Type |
| :--- | :--- |
| arg | undefined \| string |

Returns

void

Inherited from: VFile.dirname

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:107

extname

Get the extname (including dot) (example: '.js').

Signature
extname(): undefined | string;
Returns

undefined | string

Inherited from: VFile.extname

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:135

Set the extname (including dot) (example: '.js').

Cannot contain path separators ('/' on unix, macOS, and browsers, '\' on windows). Cannot be set if there’s no path yet.

Signature
extname(arg: undefined | string): void;
Parameters

| Name | Type |
| :--- | :--- |
| arg | undefined \| string |

Returns

void

Inherited from: VFile.extname

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:131

path

Get the full path (example: '~/index.min.js').

Signature
path(): string;
Returns

string

Inherited from: VFile.path

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:101

Set the full path (example: '~/index.min.js').

Cannot be nullified. You can set a file URL (a URL object with a file: protocol) which will be turned into a path with url.fileURLToPath.

Signature
path(arg: string): void;
Parameters
| Name | Type |
| :--- | :--- |
| arg | string |
Returns

void

Inherited from: VFile.path

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:95

stem

Get the stem (basename w/o extname) (example: 'index.min').

Signature
stem(): undefined | string;
Returns

undefined | string

Inherited from: VFile.stem

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:147

Set the stem (basename w/o extname) (example: 'index.min').

Cannot contain path separators ('/' on unix, macOS, and browsers, '\' on windows). Cannot be nullified (use file.path = file.dirname instead).

Signature
stem(arg: undefined | string): void;
Parameters

| Name | Type |
| :--- | :--- |
| arg | undefined \| string |

Returns

void

Inherited from: VFile.stem

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:143
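
To make the relationship between path, dirname, basename, stem and extname concrete, here is a minimal sketch that splits a '/'-separated path into those parts using the same examples as above. `splitPath` is a hypothetical helper written for illustration; it is not vfile's implementation and ignores Windows separators.

```typescript
interface PathParts {
  dirname: string
  basename: string
  stem: string
  extname: string
}

// Hypothetical helper: splits a '/'-separated path into the parts that the
// accessors above expose. Not vfile's own implementation.
function splitPath(path: string): PathParts {
  const slash = path.lastIndexOf('/')
  let dirname: string
  if (slash === -1) dirname = '.'
  else if (slash === 0) dirname = '/'
  else dirname = path.slice(0, slash)

  const basename = path.slice(slash + 1)
  const dot = basename.lastIndexOf('.')
  // A leading dot (as in '.gitignore') is not treated as an extension.
  const extname = dot > 0 ? basename.slice(dot) : ''
  const stem = extname ? basename.slice(0, -extname.length) : basename
  return { dirname, basename, stem, extname }
}

const exampleParts = splitPath('~/index.min.js')
console.log(exampleParts)
// dirname '~', basename 'index.min.js', stem 'index.min', extname '.js'
```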

Methods

fail()

Create a fatal error associated with the file.

Its fatal is set to true and file is set to the current file path. It is added to file.messages.

👉 Note: a fatal error means that a file is no longer processable.

Throws

Message.

Signature
fail(reason: string | VFileMessage | Error, place?: null | Node<Data> | NodeLike | Position | Point, origin?: null | string): never;
Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| reason | string \| VFileMessage \| Error | Reason for message, uses the stack and message of the error if given. |
| place? | null \| Node<Data> \| NodeLike \| Position \| Point | Place in file where the message occurred. |
| origin? | null \| string | Place in code where the message originates (example: 'my-package:my-rule' or 'my-rule'). |

Returns

never

Message.

Inherited from: VFile.fail

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:220

info()

Create an info message associated with the file.

Its fatal is set to null and file is set to the current file path. It is added to file.messages.

Signature
info(reason: string | VFileMessage | Error, place?: null | Node<Data> | NodeLike | Position | Point, origin?: null | string): VFileMessage;
Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| reason | string \| VFileMessage \| Error | Reason for message, uses the stack and message of the error if given. |
| place? | null \| Node<Data> \| NodeLike \| Position \| Point | Place in file where the message occurred. |
| origin? | null \| string | Place in code where the message originates (example: 'my-package:my-rule' or 'my-rule'). |

Returns

VFileMessage

Message.

Inherited from: VFile.info

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:195

message()

Create a warning message associated with the file.

Its fatal is set to false and file is set to the current file path. It is added to file.messages.

Signature
message(reason: string | VFileMessage | Error, place?: null | Node<Data> | NodeLike | Position | Point, origin?: null | string): VFileMessage;
Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| reason | string \| VFileMessage \| Error | Reason for message, uses the stack and message of the error if given. |
| place? | null \| Node<Data> \| NodeLike \| Position \| Point | Place in file where the message occurred. |
| origin? | null \| string | Place in code where the message originates (example: 'my-package:my-rule' or 'my-rule'). |

Returns

VFileMessage

Message.

Inherited from: VFile.message

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:174

toString()

Serialize the file.

Signature
toString(encoding?: null | BufferEncoding): string;
Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| encoding? | null \| BufferEncoding | Character encoding to understand value as when it’s a Buffer (default: 'utf8'). |

Returns

string

Serialized file.

Inherited from: VFile.toString

Defined in: node_modules/.pnpm/vfile@5.3.7/node_modules/vfile/lib/index.d.ts:157


Options

Properties

include?

string[] | RegExp[] | (key: string) => boolean | "all" | "allWithDocumentXML"

Include only the specified files on the data attribute of the VFile. This may be useful if you want to only do something with a subset of the files in the docx file, and don't intend to use 'reoff-stringify' to turn the VFile back into a docx file.

  • If an array of strings or regexps is passed, only files that match one of the values will be included.
  • If a function is passed, it will be called for each file and should return true to include the file.
  • If the value is 'all', almost all files will be included, except for 'word/document.xml', as that already is the root of the VFile.
  • If the value is 'allWithDocumentXML', all files will be included, including word/document.xml, even though that is already the root of the VFile. Useful if you really want to mimic the original docx file.

You should keep it at the default value if you intend to use 'reoff-stringify' to turn the VFile back into a docx file.

Default

'all'

Defined in: src/lib/docx-to-vfile-unzipit.ts:30
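
The rules for `include` can be read as a single predicate over archive entry names. The following is a minimal sketch of that reading, not the package's actual implementation: `shouldInclude` is a hypothetical helper, and it assumes that strings are compared by exact equality and that an explicit array or function may also select 'word/document.xml'.

```typescript
type Include =
  | string[]
  | RegExp[]
  | ((key: string) => boolean)
  | 'all'
  | 'allWithDocumentXML'

const MAIN_DOCUMENT = 'word/document.xml'

// Sketch only: decides whether an archive entry would be kept on `data`.
// Assumption: string patterns must equal the entry name exactly.
function shouldInclude(key: string, include: Include = 'all'): boolean {
  if (include === 'allWithDocumentXML') return true
  if (include === 'all') return key !== MAIN_DOCUMENT
  if (typeof include === 'function') return include(key)
  const patterns: Array<string | RegExp> = include
  return patterns.some((pattern) =>
    typeof pattern === 'string' ? pattern === key : pattern.test(key),
  )
}

console.log(shouldInclude('word/document.xml')) // false
console.log(shouldInclude('word/document.xml', 'allWithDocumentXML')) // true
console.log(shouldInclude('word/footnotes.xml', ['word/footnotes.xml'])) // true
console.log(shouldInclude('word/styles.xml', [/footnotes/])) // false
console.log(shouldInclude('_rels/.rels', (key) => key.endsWith('.rels'))) // true
```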

withoutMedia?

boolean

Whether or not to include media in the VFile.

By default, images are included on the data.media attribute of the VFile as an object of ArrayBuffers, which are accessible on both the client and the server.

Default

false

Defined in: src/lib/docx-to-vfile-unzipit.ts:16


XMLOrRelsString

${string}.xml | ${string}.rels

Defined in: src/lib/docx-to-vfile-unzipit.ts:71
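
When iterating over the keys of the data attribute, a type guard can narrow plain strings to this template literal type. The sketch below redeclares the type locally so it is self-contained; `isXMLOrRelsString` is a hypothetical helper and is not exported by this package.

```typescript
// Local copy of the type shown above, so the sketch stands on its own.
type XMLOrRelsString = `${string}.xml` | `${string}.rels`

// Hypothetical type guard: true for keys ending in '.xml' or '.rels'.
function isXMLOrRelsString(key: string): key is XMLOrRelsString {
  return key.endsWith('.xml') || key.endsWith('.rels')
}

const sampleKeys = ['word/document.xml', '_rels/.rels', 'media/image1.png']
const xmlKeys: XMLOrRelsString[] = sampleKeys.filter(isXMLOrRelsString)

console.log(xmlKeys) // [ 'word/document.xml', '_rels/.rels' ]
```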

Compatibility

Security

docx-to-vfile currently does not read macros, so it is not vulnerable to potential security issues with macros.

It does not, however, do any other security checks, so maliciously crafted docx files could cause problems when, for example, parsed with rehype.

Related

  • reoff-parse — Parse the output of docx-to-vfile into a VFile with an ooxast tree.

Contribute

License

GPL-3.0-or-later © Thomas F. K. Jorna
