Science Forums

Automating Book Preview Using Software



Can we write a program to automate book previews?

Here is how I would go about it:

1. Scan the book and store it in PDF format.
2. Convert the PDF into DOC format.
3. Create hashtables of verbs, adverbs, pronouns, interjections and conjunctions.
4. Identify the nouns (by exclusion, using the tables from step 3) and rank person names by frequency. The ranking will help to identify major and minor characters.
5. Create a hashtable of country names to identify the setting.
6. Scan the index for keywords that help identify the genre.
7. Check whether the author is a known celebrity. (This data could be collated in a hashtable.)
8. Search for foul language to see if the book is OK for kids.
9. Check whether translations are available.
10. Check the timeline: minimum year to maximum year.
11. Check for existing reviews.
12. Collate the data into readable sentences. For instance:

* The genre of the book is spy-thriller.
* The main characters are Jack, Jill and Bob.
* The story is set in Afghanistan.

And so on.

This is done by using a template and substituting the variables.
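
A minimal sketch of the template step, assuming the earlier steps have already produced a dictionary of facts. The names `PREVIEW_TEMPLATE`, `join_names` and `render_preview` are made up for illustration; only `string.Template` comes from the Python standard library.

```python
from string import Template

# Hypothetical template; the wording mirrors the example sentences in the post.
PREVIEW_TEMPLATE = Template(
    "The genre of the book is $genre.\n"
    "The main characters are $characters.\n"
    "The story is set in $setting."
)


def join_names(names):
    """Join names as 'A, B and C' (a single name is returned unchanged)."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def render_preview(facts):
    """Substitute the collected facts into the template."""
    return PREVIEW_TEMPLATE.substitute(
        genre=facts["genre"],
        characters=join_names(facts["characters"]),
        setting=facts["setting"],
    )


if __name__ == "__main__":
    print(render_preview({
        "genre": "spy-thriller",
        "characters": ["Jack", "Jill", "Bob"],
        "setting": "Afghanistan",
    }))
```

`substitute` raises `KeyError` when a variable is missing, which is probably what you want here: a half-filled preview would be worse than none.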

This will at least reflect key data about the book.
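
Steps 3 and 4 could be sketched roughly as below. This is only an illustration: `FUNCTION_WORDS` is a tiny hypothetical stand-in for the verb/adverb/pronoun/interjection/conjunction tables, and treating capitalized words as name candidates is an added assumption that the list does not spell out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stand-in for the step 3 hashtables.
FUNCTION_WORDS = {
    "the", "a", "and", "but", "then", "oh",
    "he", "she", "they", "it",
    "met", "ran", "hid", "saw", "quickly",
}


def rank_names(text, top_n=3):
    """Rank likely person names by frequency, by exclusion.

    Assumption: a name candidate is any capitalized word that is not
    in the function-word table. Real text would need a better filter.
    """
    words = re.findall(r"[A-Za-z]+", text)
    candidates = [
        w for w in words
        if w[0].isupper() and w.lower() not in FUNCTION_WORDS
    ]
    # Counter.most_common sorts by count, highest first.
    return Counter(candidates).most_common(top_n)


if __name__ == "__main__":
    sample = "Jack met Jill. Then Jack ran. Jill and Jack hid. Bob saw Jill and Jack."
    print(rank_names(sample))
```

The most frequent names would be treated as major characters and the rest as minor ones. Place names and other capitalized nouns will slip through this filter, which is where the country table from step 5 could help.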

Do you think this is feasible?

No.

Next silly question...


Exchemist?
This is an application of NLP (Natural Language Processing). I was writing recommendation engines, some based on how similar articles are. There you fish out the defining words (e.g. frequent in the current article but not frequent across all the articles), and if there are enough matches you say the articles are similar.

Petrush's idea is similar but easier. For 4 you'd use some TF-IDF (term frequency times inverse document frequency) variation, since you only have a corpus of 1 document.
For 6, some sentiment-analysis-style neural network to determine the genre,

etc.
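
To make the "defining words" idea concrete, here is a rough TF-IDF sketch. It assumes one possible variation for the single-book case: the IDF part is computed against a small background set of general texts. That background set, and the names `tokenize` and `defining_words`, are assumptions for illustration, not something from the thread.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def defining_words(book_text, background_docs, top_n=5):
    """Rank the book's words by TF-IDF against background documents."""
    book_counts = Counter(tokenize(book_text))
    total = sum(book_counts.values())
    # The book itself counts as one document, so df is never zero.
    doc_sets = [set(book_counts)] + [set(tokenize(d)) for d in background_docs]
    n_docs = len(doc_sets)
    scores = {}
    for word, count in book_counts.items():
        df = sum(1 for s in doc_sets if word in s)
        # Words that occur in every document get idf = log(1) = 0.
        scores[word] = (count / total) * math.log(n_docs / df)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


if __name__ == "__main__":
    book = "Jack met Jill. Jack and Jill found the spy. The spy ran."
    background = [
        "the cat and the dog ran",
        "a man met the dog and found a cat",
    ]
    print(defining_words(book, background, top_n=3))
```

Common words such as "the" score zero because they appear everywhere, while words that are frequent in the book and rare elsewhere float to the top.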


Classifying a book by genre, or identifying similarities, is not writing a review, which I assume is what was meant by "preview".

A review involves someone whose opinion might command a reader's respect giving an opinion on the qualities and demerits of the book, with reference to subject, plot, characters, style, etc.


OK, I concentrated on the points in the list, not the title. And the points given are feasible.

I could improve the design by:

  • Moving data from a database where a preview has been generated into an archive
  • Using a webservice to get the latest reviews
  • Storing generated previews as XML to enable efficient storage of data and transfer across firewalls
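
As a rough idea of what the XML storage could look like, here is a sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`preview`, `title`, `genre`, `characters`, `character`, `setting`) are invented for this example; no schema was given.

```python
import xml.etree.ElementTree as ET


def preview_to_xml(title, genre, characters, setting):
    """Serialize a generated preview to an XML string (hypothetical layout)."""
    root = ET.Element("preview", attrib={"title": title})
    ET.SubElement(root, "genre").text = genre
    chars = ET.SubElement(root, "characters")
    for name in characters:
        ET.SubElement(chars, "character").text = name
    ET.SubElement(root, "setting").text = setting
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(preview_to_xml(
        "Example Book", "spy-thriller", ["Jack", "Jill", "Bob"], "Afghanistan"
    ))
```

The same string can be read back with `ET.fromstring`, so the archive and the webservice could share one format.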