Automatic Indexing of PDF Real Books
Posted by . on Saturday, February 15, 2014 with 18 comments
Working with music in PDF books can be really troublesome. Indexing and naming the individual songs in a large collection is something few applications handle well. To be honest, the Fakebook app was no exception, and importing, e.g., a complete Real Book was a time-consuming chore.
Until now!
The new version of Fakebook (1.3.0, available on Google Play and Amazon) changes this. Thanks to PDFB script files, downloading, indexing, filtering, sorting and naming now take just a single click! The scripts are simple semicolon-separated text files containing all the data the Fakebook parser needs to automate the boring work. Use them to extract a single page from a huge document or to import and index a complete song book.
The syntax looks like this:
PDFB1;playlisttitle;playlistcomment;url;
page;title;last;first;aka;style;tempo;signature;comment;
page;title;last;first;aka;style;tempo;signature;comment;
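To make the structure concrete, here is a minimal Python sketch that splits a PDFB script into its header and song entries. It is not Fakebook's own parser; it assumes that every line ends with a trailing semicolon, that the header has exactly four fields starting with PDFB1, and that no field contains a semicolon. The names PdfbEntry and parse_pdfb are hypothetical.

```python
from dataclasses import dataclass

# Field order of a song line, as given in the syntax above.
ENTRY_FIELDS = ("page", "title", "last", "first", "aka",
                "style", "tempo", "signature", "comment")


@dataclass
class PdfbEntry:
    page: int
    title: str
    last: str = ""
    first: str = ""
    aka: str = ""
    style: str = ""
    tempo: str = ""
    signature: str = ""
    comment: str = ""


def split_fields(line: str) -> list[str]:
    """Split one semicolon-separated line, dropping the empty part after the trailing ';'."""
    fields = line.rstrip("\r\n").split(";")
    if fields and fields[-1] == "":
        fields.pop()
    return fields


def parse_pdfb(text: str):
    """Return (playlist title, playlist comment, url, entries) for a PDFB1 script."""
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines:
        raise ValueError("empty script")
    header = split_fields(lines[0])
    if len(header) != 4 or header[0] != "PDFB1":
        raise ValueError(f"not a PDFB1 header: {lines[0]!r}")
    _, title, comment, url = header
    entries = []
    for line in lines[1:]:
        fields = split_fields(line)
        if len(fields) != len(ENTRY_FIELDS):
            raise ValueError(f"expected {len(ENTRY_FIELDS)} fields: {line!r}")
        values = dict(zip(ENTRY_FIELDS, fields))
        values["page"] = int(values["page"])  # absolute page in the PDF
        entries.append(PdfbEntry(**values))
    return title, comment, url, entries
```

Strict field counting is a deliberate choice here: a stray semicolon in a title shifts every later field, so rejecting the line early is safer than guessing.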
An example, extracting some Sonny Rollins songs from a scanned Real Book that is stored in the Download folder on the phone:
PDFB1;Sonny Rollins Book;;file:///sdcard/Download/therealbookvolume1.pdf;
24;Airegin;Rollins;Sonny;;Swing;;4/4;;
340;Oleo;Rollins;Sonny;;Swing;;4/4;;
359;Pent Up House;Rollins;Sonny;;Swing;;4/4;;
506;Plain Jane;Rollins;Sonny;;;;;;
510;Valse Hot;Rollins;Sonny;;Waltz;;3/4;;
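Because the scripts are plain text, they can also be generated from a list of songs. The following Python sketch (format_pdfb is a hypothetical helper, not part of Fakebook) produces lines in the same shape as the example above, padding missing trailing fields with empty values. It rejects semicolons inside fields, on the assumption that the format has no escape mechanism for them.

```python
def format_pdfb(title, url, songs, comment=""):
    """Build PDFB1 script text from (page, title, last, first, ...) tuples."""
    lines = [f"PDFB1;{title};{comment};{url};"]
    for song in songs:
        # Pad to the 9 fields: page;title;last;first;aka;style;tempo;signature;comment
        fields = [str(f) for f in song] + [""] * (9 - len(song))
        if len(fields) != 9:
            raise ValueError(f"too many fields: {song!r}")
        if any(";" in f for f in fields):
            raise ValueError(f"semicolon inside a field: {song!r}")
        lines.append(";".join(fields) + ";")
    return "\n".join(lines) + "\n"


songs = [
    (24, "Airegin", "Rollins", "Sonny", "", "Swing", "", "4/4"),
    (506, "Plain Jane", "Rollins", "Sonny"),
]
print(format_pdfb("Sonny Rollins Book",
                  "file:///sdcard/Download/therealbookvolume1.pdf", songs))
```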
OK, but now you all say "writing these scripts still seems quite time-consuming". Yeah, right.
That's why we provide more examples and ready-made PDFB files for importing The Real Book (5th ed.) and The Vocal Book at http://www.skrivarna.com/p/fakebook.html. Just select Menu -> Import new songs -> Internet in the Fakebook app to get there.
More examples and book links will be added on request. And if you write a useful script, feel free to share it by mailing it to us or in the comment field below!
Categories: Fakebook
Oh wow, looks like we have been working on more or less exactly the same thing. Perhaps we can collaborate to find a single solution which works for everyone? My idea was to a) crowd-source compilation of the real book indices via https://github.com/aspiers/book-indices and b) provide tools to explode large PDF books into fragments (one PDF per song) plus an index PDF with hyperlinks: https://github.com/aspiers/PDFexploder
Hi,
for work purposes I need an indexed version of The Real Book Volume 1, sixth edition. I'm happy to create the PDFB file, but what do I do with it once it's created? I have all the charts as one long PDF, but it's easy enough for me to split them into separate charts. How do I go about using the PDFB file, and what format do I save it in?
Importing fakebooks via .pdfb scripts is a useful feature. But I would prefer using it with local files: download both the somefakebook.pdfb and the somefakebook.pdf, save both in the same folder and start the import. How can I do that?
With Import new songs -> File I can navigate to the .pdfb and run the import. But the import path in the .pdfb has to be a fully specified path like file:///sdcard/_SyncT/fakebook_django_2008.pdf. This is pretty inconvenient, as everyone I pass my files to has to edit the file and adapt the included path to their local setup. I tried file:///fakebook_django_2008.pdf, file:///./fakebook_django_2008.pdf and fakebook_django_2008.pdf (without file:///); none of them worked.
To my surprise, performance is not an issue: importing the 242 songs in fakebook_django_2008.pdf took less than 5 minutes when I had copied the PDF to SD0 beforehand, and about 20% longer when importing from a URL.
Separate 'Composer first name' and 'Composer last name' fields don't make sense for me: how would that work with something like "V. Guerino & Jean Peyronnin 1928"? So I put the complete author information into 'Composer last name' and misuse 'Composer first name' for the key (which would certainly deserve its own database field). I tried 'Comment' first, but it didn't show up anywhere. The same happens with 'aka', 'tempo' and 'signature': they cannot be edited within the Fakebook app and are not displayed anywhere.
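Since the header URL has to be an absolute path, one workaround for the sharing problem described above is to rewrite that URL for each recipient before importing. A minimal Python sketch, assuming the header is the first line, the recipient keeps the PDF's original file name, and the app accepts file:// followed by an absolute path as reported in this comment; relocate_pdfb and the /sdcard/Download folder used below are hypothetical choices.

```python
import posixpath


def relocate_pdfb(text: str, folder: str) -> str:
    """Point the PDFB1 header URL at a PDF with the same file name inside `folder`."""
    lines = text.splitlines()
    fields = lines[0].split(";")
    if fields[0] != "PDFB1" or len(fields) < 4:
        raise ValueError("first line is not a PDFB1 header")
    # Keep only the file name of the old URL and join it onto the new folder.
    pdf_name = posixpath.basename(fields[3])
    fields[3] = "file://" + posixpath.join(folder, pdf_name)
    lines[0] = ";".join(fields)
    return "\n".join(lines) + "\n"
```

For example, relocate_pdfb(text, "/sdcard/Download") turns file:///sdcard/_SyncT/fakebook_django_2008.pdf into file:///sdcard/Download/fakebook_django_2008.pdf and leaves the song lines untouched.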
One more question: which character encoding should I use? Most of my special characters, like German ä, ö, ü, French é, è, ' and many more, look fine in my text editor but are displayed strangely after importing into Fakebook.
Found out by trial and error: converting the file with Notepad++ to 'UTF-8 without BOM' fixed that issue.
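The same conversion can be scripted. A minimal Python sketch, assuming the original file is either already UTF-8 (with or without a byte order mark) or Windows-1252, the usual default for Western European Windows editors; the function name to_utf8_without_bom and the cp1252 fallback are assumptions, not something the app requires.

```python
import codecs
from pathlib import Path


def to_utf8_without_bom(path: str, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8 without a byte order mark."""
    raw = Path(path).read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 with BOM: strip the three BOM bytes and decode the rest.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    else:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: assume the legacy encoding instead.
            text = raw.decode(source_encoding)
    # The plain "utf-8" codec never writes a BOM (unlike "utf-8-sig").
    Path(path).write_bytes(text.encode("utf-8"))
```

Trying UTF-8 first makes the conversion safe to run twice, because valid UTF-8 input is written back unchanged apart from a removed BOM.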
iGigBook is now available for Android; you have a lot of catching up to do!
Hi Philip,
Yes, I noticed that the app was published the other day. Great! I think we both benefit from the friendly competition.
Welcome to Android!
Best regards
/Bernard
I need help, please. I have been trying over and over again to import Real Book 5, but it just won't work. I tried the Real Vocal Book, and that worked fine, as did Slickbook. But the Real Book just will not download. I'm using an Asus. Thank you! Jenny.
Hi Jenny,
Sorry to hear about the problems you are seeing, but I'm sure we will be able to work it out. Could you send us an email with the exact device model (which Asus) and the Android version you use? Also describe what happens (e.g. whether there is a crash or an error message). Send it to fakebook@skrivrna
Hi again Jenny,
No need to get back with more info; the problem should be fixed (one of the download links was broken). The scripts should all work now.
Thanks for your patience.
Hi, is it possible to display two pages at the same time?
Thanks
A couple of questions:
1. How is the page in the PDF file specified? E.g., if "Valse Hot" in Realbk1 is on page A13, should I enter A13 as the page?
2. Can I have one PDFB file with the same playlist name but extracting from several different PDF files (which would mean multiple PDFB header lines in the data file)?
3. Where is the folder for the Fakebook app, and how can I clean up unwanted imports?
OK, so I did a couple of experiments. It looks like, regarding #1, you need to enter the absolute page in the PDF file (counting the cover as page 1). Regarding #2, it might be a moot point, since the app builds a "master index" on its own, so all you need to do is import each fake book as a separate playlist using a separate PDFB file. With a master index PDF file for all fake books and a few lines of Perl, I can now index all my PDF files. Nice!
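The absolute-page rule above means a book's printed page numbers usually need a constant offset added, which is what the -offset parameter of the PowerShell script further down does. A minimal Python sketch, assuming the offset never changes within a book; page_offset and to_pdf_page are hypothetical helpers, and the numbers in the usage line are made up for illustration.

```python
def page_offset(printed_page: int, pdf_page: int) -> int:
    """Offset from printed page numbers to absolute PDF pages (cover = page 1).

    Measure it once by looking up where any numbered page sits in a PDF viewer.
    """
    return pdf_page - printed_page


def to_pdf_page(printed_page: int, offset: int) -> int:
    """Absolute PDF page for a printed page, assuming a constant offset."""
    page = printed_page + offset
    if page < 1:
        raise ValueError(f"printed page {printed_page} falls before the first PDF page")
    return page


# Hypothetical: printed page 1 is the 17th page of the PDF.
offset = page_offset(1, 17)
print(to_pdf_page(279, offset))
```

Books with unnumbered inserts or lettered pages such as "A13" break the constant-offset assumption; for those, each entry's absolute page has to be looked up individually.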
My suggestion for an immediate improvement would be to allow editing a tune's properties to link a local MP3 file.
Somehow I'm stuck in what feels like an endless loop importing the Django book. The import finishes and I can see the music fine, but switching applications and returning causes the import to start over from the beginning. I'm running Android 4.4.4 on an HP tablet.
I've hacked together a PowerShell script to do some of the heavy lifting. It leverages the realindex file, whose text I extracted. Maybe there's a better index online, but this gets the basics done.
(script follows)
# realindex.txt format:
# Basic Rhythm for Tango................................... TheBook 279
# Basie Eyes ............................................... Library 29
# Basin St. Blues ........................................... TheBook 398
# Usage, e.g.:
# .\parse.ps1 -bookname NewReal1 -offset 16
# .\parse.ps1 -bookname NewReal2 -offset 12
# .\parse.ps1 -bookname NewReal3 -offset 10
[CmdletBinding()]
Param(
    # Book name as it appears in realindex.txt; also used for the .pdf/.pdfb file names
    [Parameter(Mandatory=$True)] [string]$bookname,
    # Added to each printed page number to get the absolute PDF page (cover = page 1)
    [Parameter(Mandatory=$True)] [int]$offset
)
$phoneLocation = 'file:///sdcard/Download'
$pdfbFile = "$bookname.pdfb"
# Start from an empty file, then write the PDFB1 header line
Remove-Item $pdfbFile -ErrorAction SilentlyContinue
Add-Content $pdfbFile "PDFB1;$bookname;;$phoneLocation/$bookname.pdf;"
Get-Content realindex.txt |
    Where-Object { $_ -like "* $bookname *" } |
    ForEach-Object {
        # Title, a run of dots, the book name, then the printed page number.
        # Matching here (not in a separate Where-Object) keeps $matches local to this line.
        if ($_ -match '(.*?)(?:\.{3,100})[ ](?:.*)[ ](\d+)') {
            $page = [int]$matches[2] + $offset
            # Trim the space some index lines have between the title and the dots
            $title = $matches[1].Trim()
            $last = ''
            $first = ''
            $aka = ''
            $style = ''
            $tempo = ''
            $signature = ''
            $comment = ''
            Add-Content $pdfbFile "$page;$title;$last;$first;$aka;$style;$tempo;$signature;$comment;"
        }
    }
Write-Output "Done. Copy $pdfbFile and $bookname.pdf to $phoneLocation"
And here's the realindex.txt file I used:
Part 1: http://pastebin.com/byTn2tvN
Part 2: http://pastebin.com/rDME89SD
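For readers without PowerShell, here is a rough Python port of the same idea: filter the realindex lines for one book, pull out title and printed page with the same regular expression, add the offset, and emit PDFB lines with empty metadata fields. It is a sketch, not the commenter's script; index_to_pdfb is a hypothetical name, the default file:///sdcard/Download location is carried over from the script, and the semicolon replacement follows the fix posted in the next comment.

```python
import re

# Same pattern as the PowerShell script: title, a run of dots, the book name,
# then the printed page number.
INDEX_LINE = re.compile(r"(.*?)\.{3,100} (?:.*) (\d+)")


def index_to_pdfb(index_lines, bookname: str, offset: int,
                  location: str = "file:///sdcard/Download") -> str:
    """Build PDFB text for one book from realindex.txt-style lines."""
    out = [f"PDFB1;{bookname};;{location}/{bookname}.pdf;"]
    for line in index_lines:
        if f" {bookname} " not in line:
            continue  # entry belongs to another book
        m = INDEX_LINE.match(line)
        if not m:
            continue
        # Trim padding before the dots; commas instead of semicolons keep the field count intact.
        title = m.group(1).strip().replace(";", ",")
        page = int(m.group(2)) + offset
        out.append(f"{page};{title};;;;;;;;")
    return "\n".join(out) + "\n"
```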
I found a bug in my script. Importing a .pdfb containing
319;Reach Out; I’ll Be There;;;;;;;;
stops further imports because of the extra semicolon in the title.
The fix in my script is to use:
$title = $matches[1] -replace ';',','
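A stray semicolon like this is easy to catch before copying a script to the phone. A minimal Python sketch of such a check, assuming every song line must have exactly the nine fields of the documented syntax plus a trailing semicolon and a numeric page; find_bad_lines is a hypothetical helper, not a Fakebook feature.

```python
def find_bad_lines(text: str) -> list[int]:
    """Return 1-based numbers of song lines that break the 9-field layout.

    Expected: page;title;last;first;aka;style;tempo;signature;comment;
    which splits into 10 parts, the last one empty.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if number == 1 or not line.strip():
            continue  # skip the PDFB1 header and blank lines
        parts = line.split(";")
        if len(parts) != 10 or parts[-1].strip() or not parts[0].strip().isdigit():
            bad.append(number)
    return bad
```

Running it on a script containing the line above would flag that line, because the extra semicolon splits it into eleven parts.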