Skip to content

Converting docx (downloaded from s3) to pdf (uploaded to s3) #255

Description

@Ayh4m

Hello guys,

I'm trying to convert a MS Word .docx file (that is being downloaded from S3 bucket) to a pdf one (and upload it to the same location in that S3 bucket).

NOTE: I've tried the layer solution and it did work for me.

I've followed all the mentioned steps:

1- Created my own docker image:

Where my Dockerfile looks like this:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64
COPY app.js package.json ${LAMBDA_TASK_ROOT}/
RUN npm install
CMD [ "app.handler" ]

and apps.js looks like this:

const {convertTo, canBeConvertedToPDF} = require('@shelf/aws-lambda-libreoffice');
const fs = require("fs");
const AWS = require("aws-sdk");

const S3_BUCKET = process.env.S3_BUCKET
const WORD_FILE_KEY = process.env.WORD_FILE_KEY
const TMP_DIR = "/tmp"

const s3 = new AWS.S3();


module.exports.handler = async () => {

    console.log(`# Bucket Name: ${S3_BUCKET}`)
    console.log(`# File Name: ${WORD_FILE_KEY}`)
    console.log(`[Initial]: ${getTmpDirContent()}`)

    // Download .docx file from S3 to /tmp
    console.log("# Downloading the .docx file from S3 ...")
    await downloadFileFromS3(WORD_FILE_KEY)
    console.log(`[After Download]: ${getTmpDirContent()}`)

    if (canBeConvertedToPDF(WORD_FILE_KEY)) {
        // Convert .docx to .pdf
        console.log("# Converting .docx file to .pdf file ...")
        const convertedFilePath = convertTo(WORD_FILE_KEY, "pdf");
        console.log(convertedFilePath)
        console.log(`[After Convert]: ${getTmpDirContent()}`)

        // Upload .pdf file to S3
        console.log("# Uploading the .pdf file to S3 ...")
        await uploadFileToS3(convertedFilePath)

        console.log("# Done")
    } else {
        console.log("# Can't be converted tp pdf!")
    }
}

const downloadFileFromS3 = async (fileKey) => {
    try {
        const filePath = `${TMP_DIR}/${fileKey}`
        const params = {
            Bucket: S3_BUCKET,
            Key: fileKey
        }
        const objData = await s3.getObject(params).promise();
        fs.writeFileSync(filePath, objData.Body.toString());
        console.log(`- File ${fileKey} has been downloaded to ${filePath} successfully`);
    } catch (e) {
        throw new Error(`- Download Error: ${e.message}`)
    }
}

const uploadFileToS3 = async (filePath) => {
    try {
        const fileKey = filePath.split(`${TMP_DIR}/`)[1]
        const fileData = fs.readFileSync(filePath)
        const params = {
            Bucket: S3_BUCKET,
            Key: fileKey,
            Body: fileData
        }
        const data = await s3.upload(params).promise();
        console.log(`- Upload successfully at ${data.Location}`);
    } catch (e) {
        throw new Error(`- Upload Error: ${e.message}`)
    }
};

const getTmpDirContent = () => {
    return fs.readdirSync(TMP_DIR)
}

& package.json is the following:

{
  "name": "libreoffice-lambda-container-image",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "@shelf/aws-lambda-libreoffice": "^5.0.1",
    "aws-sdk": "^2.1293.0"
  }
}

2- Pushed it to ECR.

3- Created a Lambda Function:

With the following configurations:
Snag_e5f871b
Snag_e603b86
Snag_e61d495

AND I'm getting the following error:
Snag_e650a4c

Any suggestions?!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions