A couple of days ago I decided to get my act together regarding incoming (physical) mail and other documents, so I scanned everything before throwing it into a big box labelled “Archive” and… started to think about making the scanned documents searchable.
Searching through documents is a topic for another post; here we’ll focus on getting text out of images using AWS Textract, the Optical Character Recognition (OCR) service from AWS.
Preparation
AWS Textract can read documents straight from an S3 bucket, which is the approach used here, so we need a bucket before we can start extracting text. Let’s create one:
aws s3api create-bucket --bucket YOUR_BUCKET_NAME --create-bucket-configuration LocationConstraint=eu-west-1
aws s3api put-bucket-lifecycle-configuration --bucket YOUR_BUCKET_NAME --lifecycle-configuration '{
  "Rules": [
    {
      "ID": "ExpireAfter1Day",
      "Status": "Enabled",
      "Prefix": "",
      "Expiration": {
        "Days": 1
      }
    }
  ]
}'
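Here we create a bucket and then set a lifecycle policy that automatically deletes every object one day after it was uploaded. This way processed documents are cleaned up without any manual work. S3 runs expiration in the background, so objects may linger a little past the one-day mark.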
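To confirm that the rule was actually stored, the configuration can be read back from the bucket. The helper below is only a sketch: the function name bucket_expiry_days is made up for this example, and it assumes jq is installed.

```shell
# Print the expiration of every enabled lifecycle rule on a bucket.
# bucket_expiry_days is a hypothetical helper name, not an AWS command.
bucket_expiry_days() {
  aws s3api get-bucket-lifecycle-configuration --bucket "$1" \
    | jq -r '.Rules[] | select(.Status == "Enabled") | "\(.ID): \(.Expiration.Days) day(s)"'
}

# Usage (needs AWS credentials):
# bucket_expiry_days YOUR_BUCKET_NAME
```

For the rule created above this should print a single line naming ExpireAfter1Day with an expiration of 1 day.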
Processing documents
Now it’s time to process a document and get its text. First we upload the file from the local disk to S3, then we call the AWS Textract detect-document-text action to get the results. The result is a fairly long JSON document, but we only need the text, so we post-process it with jq, keeping just the blocks of type LINE:
file=test.png
bucket=YOUR_BUCKET_NAME
aws s3 cp "$file" "s3://$bucket/"
aws textract detect-document-text --document "S3Object={Bucket=$bucket,Name=$file}" | jq -r '.Blocks[] | select(.BlockType == "LINE") | .Text'
The result is the text extracted from the image file. You can save it to a file, a database, or anywhere else you find useful for further processing 🙂
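With a whole pile of scans, the same two commands can be wrapped in a loop that stores the raw JSON and the extracted text next to each image. This is a rough sketch rather than a finished tool: the function names textract_lines and ocr_directory and the MIN_CONFIDENCE variable are invented for this example, and the confidence filter is an extra that the pipeline above does not have. It assumes PNG scans whose file names contain no commas, since those would clash with the CLI shorthand syntax.

```shell
# Keep only the LINE blocks of a Textract response read from stdin.
# Optional argument: minimum confidence (0-100); the default 0 keeps everything.
textract_lines() {
  jq -r --argjson min "${1:-0}" \
    '.Blocks[] | select(.BlockType == "LINE" and .Confidence >= $min) | .Text'
}

# OCR every PNG in a directory (hypothetical helper).
# Writes scan.png.json (raw response) and scan.txt (text) next to scan.png.
ocr_directory() {
  local dir="$1" bucket="$2" path name
  for path in "$dir"/*.png; do
    [ -e "$path" ] || continue   # the directory contained no PNG files
    name=$(basename "$path")
    aws s3 cp "$path" "s3://$bucket/$name" >/dev/null || return 1
    aws textract detect-document-text \
      --document "S3Object={Bucket=$bucket,Name=$name}" > "$path.json" || return 1
    textract_lines "${MIN_CONFIDENCE:-0}" < "$path.json" > "${path%.png}.txt"
  done
}

# Usage (needs AWS credentials):
# MIN_CONFIDENCE=80 ocr_directory ./scans YOUR_BUCKET_NAME
```

Keeping the raw JSON around means the text can be re-extracted later with a different filter without paying for another Textract call.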
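The nice thing about AWS Textract is that you don’t have to care much about the format of the file: it recognizes text in images (PNG, JPEG) as well as in PDF files 🙂 Note that detect-document-text handles single-page documents; multi-page PDFs go through Textract’s asynchronous API.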
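A minimal sketch of the asynchronous route, which multi-page PDFs need because detect-document-text is synchronous and limited to a single page. It starts a job with start-document-text-detection, polls get-document-text-detection until the job finishes, and then follows NextToken through the paginated results. The function name textract_pdf_lines, the POLL_SECONDS variable and the 5-second default are assumptions made for this example; the file is expected to be in the bucket already.

```shell
# Extract text lines from a (multi-page) PDF stored in S3 using the async API.
# textract_pdf_lines is a hypothetical helper name.
textract_pdf_lines() {
  local bucket="$1" key="$2" job_id status token page

  # Start the job; Textract answers immediately with a JobId.
  job_id=$(aws textract start-document-text-detection \
    --document-location "S3Object={Bucket=$bucket,Name=$key}" \
    --query JobId --output text) || return 1

  # Poll until the job is no longer IN_PROGRESS.
  while :; do
    status=$(aws textract get-document-text-detection --job-id "$job_id" \
      --query JobStatus --output text) || return 1
    [ "$status" != "IN_PROGRESS" ] && break
    sleep "${POLL_SECONDS:-5}"
  done
  if [ "$status" != "SUCCEEDED" ]; then
    echo "Textract job $job_id ended with status $status" >&2
    return 1
  fi

  # Results come in pages; keep requesting until NextToken is absent.
  token=""
  while :; do
    if [ -n "$token" ]; then
      page=$(aws textract get-document-text-detection \
        --job-id "$job_id" --next-token "$token") || return 1
    else
      page=$(aws textract get-document-text-detection --job-id "$job_id") || return 1
    fi
    printf '%s\n' "$page" | jq -r '.Blocks[] | select(.BlockType == "LINE") | .Text'
    token=$(printf '%s\n' "$page" | jq -r '.NextToken // empty')
    [ -z "$token" ] && break
  done
}

# Usage (needs AWS credentials):
# aws s3 cp scan.pdf "s3://YOUR_BUCKET_NAME/"
# textract_pdf_lines YOUR_BUCKET_NAME scan.pdf > scan.txt
```

The sketch treats every final status other than SUCCEEDED, including PARTIAL_SUCCESS, as a failure, which is the cautious choice for an archive you want to search later.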